Assessing the Effectiveness of the Actuaries Climate Index for Estimating the Impact of Extreme Weather on Crop Yield and Insurance Applications

This paper investigates the effectiveness of the Actuaries Climate Index (ACI), a climate index jointly launched by multiple actuarial societies in North America in 2016, on predicting crop yields and (re)insurance ratemaking. The ACI is created using a variety of climate variables reflecting extreme weather conditions in 12 subregions in the US and Canada. Using data from eight Midwestern states in the US, we find that the ACI has significant predictive power for crop yields. Moreover, allowing the constituting variables of the ACI to have data-driven rather than pre-determined weights could further improve the predictive accuracy. Furthermore, we create the county-level ACI index using high-resolution climate data and investigate its predictive power on county-level corn yields, which are more relevant to insurance practices. We find that although the self-constructed ACI index leads to a slightly worse fit due to noisier county-specific yield data, the predictive results are still reasonable. Our findings suggest that the ACI index is promising for crop yield forecasting and (re)insurance ratemaking, and its effectiveness could be further improved by allowing for the data-driven weights of the constituting variables and could be created at higher


Introduction
In recent decades, global climate change has become one of the most complicated and serious challenges facing human society. Extreme weather risks impose non-negligible threats to many aspects, including resource shortage, food consumption, and human health ( [1]). The Intergovernmental Panel on Climate Changes (IPCC) reports that the average observed global mean surface temperature during the decade 2006-2015 was 0.87 • C higher than that over the last half of the 19th century. The National Oceanic and Atmospheric Administration (NOAA) reports that seven of the eight warmest years on record have happened since 2001, and all of the ten warmest years have taken place since 1995. This means that even though the temperature may not have increased significantly, extreme climate events are becoming more frequent.
Agriculture is the foundation of human development and social stability. The yields of many crops, and in many locations, are strongly related to weather variability, and severe weather conditions pose threats to agricultural production ( [2]). The increase in extreme climate events exacerbates the volatility of agricultural production and severe agricultural loss. For example, Ref. [3] posit that rising temperatures tend to lower maize and wheat yields in some countries. The reduction has been so large that it offsets the increase in crop yields brought by technological advances, such as biotechnology, farming equipment and practices, etc., and [4] found that extremely high temperatures can reduce corn yields by approximately 7% in the US. Moreover, Ref. [2] indicate that extreme weather events may have more negative impacts on crops planted in tropical areas and higher latitude regions. There is significant research that focuses on the impact of extreme weather risks on crop yield using specific climate variables or weather index variables, such as the Palmer Drought Severity Index (PDSI), Cooling Degree Day (CDD), temperature, precipitation, etc. Although the majority of the existing studies focus on the changes in the average yield over time, it is the frequency of severe losses that matters to insurers in agriculture ( [5]). In particular, for insurance purposes, it is not just frequency, but also severity that impacts agricultural loss.
In 2016, the American Academy of Actuaries (AAA), the Casualty Actuarial Society (CAS), the Canadian Institute of Actuaries (CIA), and the Society of Actuaries (SOA) launched the Actuaries Climate Index (ACI) [6]. The ACI is a climate index that measures the observations of extreme weather and sea levels within the continent of the United States and Canada. The ACI is intended to help actuaries, insurance companies, governments, and the general public better understand the potential effects of climate trends and extreme weather events. Actuaries not only focus on how to measure the risks facing people in their everyday lives, but also assist in mitigating, managing, and predicting the future risks facing human society. As climatologists, environmentalists, and agricultural scientists build models to assess the potential climate changes and their impacts on the environment and agriculture, actuaries establish models to analyze the effects of uncertain climate events on the financial losses of various entities. Therefore, the ACI may provide actuaries with reliable information about extreme weather conditions, which are important for estimating and modeling climate-related insurance and financial risks.
To examine the effectiveness and feasibility of the ACI, this paper adopts the ACI, as well as the constituting variables of the ACI, to build statistical models for estimating the impacts of extreme weather events on crop yields. As an illustration, our analysis focuses on the Midwest region in the US, i.e., the major crop production region in the country. In addition, this research develops a methodology for assessing and predicting the agricultural loss for crop insurance and reinsurance applications using a combination of linear regression and probit regression models. We aim to better understand the key components of the ACI that are most important for predicting crop yields. The ACI consists of multiple climate variables used in previous studies to predict crop yields (see more details in Sections 2 and 3). It could be treated as a standardized index that integrates climate information in multiple aspects (including average weather and extreme weather events) and is thus a potential standardized predictor that could be adopted by researchers and insurance practitioners to predict crop yields and price crop yield (re)insurance products. Since the ACI was recently launched on 30 November 2016, there is very little research examining the importance and feasibility of the index for forecasting and managing agriculture risks. Therefore, to the best of our knowledge, this paper provides the first discussion on the strengths and weaknesses of the ACI for agriculture (re)insurance applications.
In this paper, we apply five regression models based on different geographical scales and predictors. For each model, we consider both the linear regression model and probit regression. The probit model is used to identify the event of insurance claims. Specifically, the binary response variable is 1 when the (detrended) corn yield is below the 25th quantile of the empirical distribution of yields over the sample, and thus indicates an insurance claim. When an insurance claim occurs, the linear regression model will be used to predict the corn yield in that year, based on which the severity of insurance loss can be calculated.
The first three models are designed using corn yields from eight Midwestern states in the United States from 1961 to 2018 as the dependent variable and weather variables derived from the ACI as the independent variables [7,8]. Specifically, Model 1 considers the monthly ACI as the independent variable; Model 2 derives several individual components from the monthly ACI as independent variables, including temperature above the 90th percentile and below the 10th percentile, a maximum monthly 5-day rainfall period, annual consecutive dry days, and wind speed above the 90th percentile; Model 3 considers similar individual ACI components (excluding wind speed) but based on an annual, rather than monthly, temporal scale to investigate the impact of averaging out the monthly weather variations on corn yield prediction. Finally, a high-level spatial resolution climate dataset including monthly county-level climate information is employed to develop a high-level resolution ACI. Models 4 and 5 derive similar weather variables from high-level resolution data and apply them to predict county-level crop yields. Model 4 focuses on Iowa, while Model 5 covers the whole Midwest region. Models 4 and 5 serve as a comparison to Models 1 to 3 when more high-resolution yield data (county-level) are used in the prediction. From an insurance point of view, county-level data are more relevant as real-world insurance products rarely cover corn yields of a whole state, let alone the whole Midwest region.
The results suggest that the ACI has reasonable predictive powers on corn yields. Specifically, the linear regression Model 1 and probit Model 1 both include July and September ACI as two significant variables and produce an R 2 of 0.8917 and a pseudo R 2 of 0.1617, respectively. Further, the linear regression Model 2 presents an R 2 of 0.9535, which is the largest R 2 among the five models. The probit Model 2 produces a pseudo R 2 of 0.3451. The R 2 and pseudo R 2 of the linear regression Model 3 and probit Model 3 are reduced to 0.7504 and 0.0739, respectively, indicating that monthly variations are important in yield prediction. When high-resolution data are used, the goodness of fit was slightly worse compared to the Midwest region regression. One possible reason might be the data become noisier at the county level. Specifically, Model 4 generates an R 2 of 0.7982 for the linear regression and a pseudo R 2 of 0.2864 for the probit regression. Finally, the R 2 of linear regression Model 5 is 0.7126, while the pseudo R 2 of probit Model 5 is 0.2341.
Among the five models, Model 2 leads to the best goodness of fit. However, the county-level regressions, which are more valuable for insurance practices, also lead to reasonable goodness of fit. Therefore, our results indicate that the ACI index can be promising in modeling and predicting crop yields. Moreover, it would be beneficial to the insurance industry if the weights of the constituting variables of the ACI could be determined in a data-driven way, rather than being predetermined. Finally, the insurance sector could benefit if the ACI could be published at more high-resolution levels, such as the county level.
The rest of the paper is organized as follows. Section 2 provides a literature review. Section 3 introduces the construction of the ACI and its applications on (re)insurance. Section 4 introduces the data. Section 5 outlines the statistical models. Section 6 discusses the empirical results. Finally, Section 7 concludes the paper and Section 8 outlines future research directions. Other materials are delegated to the Appendix A.

Statistical Models for Crop Yields Prediction
Regression models are commonly adopted to assess the extreme climate risk probability distribution and the impacts of extreme weather events on agricultural production ( [9]). Production function models usually use cross-section or time-series data to evaluate the impacts of temperature, precipitation, labor, and fertilization on crop yields. Regression analysis is employed to quantify the marginal effects of these factors and predict crop yields in different climates. Compared to agronomic process-based models and economic models, statistical models are easy to implement and have lower data requirements ( [10]). For example, the most basic statistical model only needs historical yields and weather data. Furthermore, statistical models are more transparent in dealing with uncertainties compared to many other models. In particular, a statistical model is unable to reflect the way that weather affects crop yields if it has relatively low goodness of fit and many of the predictors are insignificant ( [11]).
Time horizon is shown to be an important factor in crop yield forecasting. Ref. [12] recommend using a reweighting method on weather data over a long period, such as over 100 years, and employ a Generalized Linear Model (GLM) linked with a fractional logit Sustainability 2022, 14, 6916 4 of 24 regression model to build a weather index to estimate crop yield loss more reliably. Further, ref. [13] shows that time horizon is an essential factor in crop insurance ratemaking and develops a conditional Weibull model to determine the effects of sample period span and weather variation on yield risk and pricing estimation.
In terms of regression methods, many researchers adopt three main types of statistical approaches, time series, cross-section, and panel data methods, to predict crop yields and related insurance losses ( [10,11,14]). Ref. [11] discuss the advantages and disadvantages of these methods and compare them to process-based simulation methods. Ref. [14] use panel regression to assess the impacts of temperature and precipitation on crop yields in the Pampas since 1971 and find that climate has relatively small but significant negative effects. Ref. [5] employ the Superposed Epoch Analysis (SEA) in a time-series regression to assess the impact of extreme weather events on crop production and observe that both drought and extreme heat lead to national production reduction globally.
Finally, crop yields are seasonal and are affected by a variety of weather variables. Variable selection is necessary to reduce the complexity of regression models and determine which variables contribute most to crop yield prediction. Three regularization techniques based on the Ordinary Least Squares (OLS) regression, including ridge regression, lasso regression, and elastic net, can be applied to obtain the optimum regression models. All of the approaches aim to shrink the regression coefficients: ridge regression shrinks the parameters close to zero, while lasso regression shrinks them exactly to zero. Therefore, ridge regression may prevent multicollinearity and reduce the model complexity, while lasso regression can select the most significant variables and reduce the model complexity. In other words, ridge regression works well when most parameters affect the result, while lasso works well when only a few parameters are related to the response. Elastic net combines the penalty approaches of both the lasso and ridge regression.

Machine Learning-Based Methods and Simulation Approaches for Crop Yields Prediction
The process of variable selection is also a machine learning (ML)-based method. ML methods are widely employed to assess and forecast the impacts of climate signals on crop yields. Ref. [15] use the random forest approach to explore how climate and other variables affect crop yields and the relationship between crop yields and scale-compatible climate data from 1962 to 2014 in sub-Saharan Africa. This model captures the negative response of crop yields to increasing maximum temperature and low precipitation. Ref. [16] adopt the lasso regression model with three ML approaches that include support-vector machines, random forest, and neural networks to build various empirical models for forecasting wheat yields from 2000 to 2014 in Australia. In general, they find that precipitation and wet day frequency have a positive correlation with crop yields, while maximum temperature and potential evapotranspiration have a negative correlation.
The empirical approach usually assumes that climate does not have a trend, so the estimation and forecasting process may disregard the sensitivity of crop yields to technology and climate trends. Ref. [17] demonstrate that statistical approaches tend to overestimate yield losses from warming, especially in dry years, while the Crop Environment Resource Synthesis Maize (CERES-Maize) model (a simulation method) captures the interaction effects of warming and moisture. The simulation model simulates dynamic weather factors to establish a mathematical model to predict crop yields under specific conditions. At present, representative crop simulation models include CERES, used in the US; Agricultural Production Systems Simulator (APSIM), employed in Australia; and World Food Studies (WOFOST), developed in the Netherlands ( [18]).

Extreme Weather Events for Crop Yields Prediction
Except for typical temperature and precipitation variables ( [3,14,15,19,20]), the PDSI is another weather variable commonly considered in many pieces of research ( [1,10,21]). PDSI subsumes the effects of both precipitation and temperature and provides a locally relative scale ranging from very wet to very dry conditions. Ref. [1] use positive PDSI values to represent wet spells and negative PDSI values to represent drought conditions to assess whether longer time series weather data can provide a more accurate weather index for capturing the impacts of weather events on crop losses. Ref. [21] develops a summer average PDSI index (June, July, and August) and employs it to monitor significant weather events affecting crop yield growth during the critical growing season. While other weather variables (such as temperature, precipitation, various degrees of integration or disaggregation levels, etc.) may have been used, the analysis does not suggest that using other PDSI index types, months or month combinations, or direct temperature and precipitation measures have a qualitative impact on the outcomes.
The normalized difference vegetation index (NDVI) and crop moisture index (CMI) are also used to represent drought conditions. NDVI is a satellite-derived remote sensing measure that can estimate the health of vegetation during any given period by measuring greenness, which is thought to be closely correlated to crop yields. NDVI is easy to measure on a regular basis and is not subject to manipulation by agricultural producers or insurers. Most research focuses on the applicability of NDVI in index-based crop insurance, not indemnity-based crop insurance. Ref. [22] investigate whether NDVI can be a reliable indicator in developing index-based agricultural insurance products and conclude there is a wide variation in relationships between crop yields and NDVI depending on locations and crop types. Ref. [23] show that while NDVI, as a factor of catastrophic drought index insurance in Zimbabwe, has a relatively high correlation with corn and cotton yield loss, the relationship between NDVI and crop yield varies across crops and districts. The CMI indicates general conditions and not local variations caused by isolated rain. It is the sum of the evapotranspiration anomaly (which is generally negative or slightly positive) and excess moisture (either zero or positive) and is developed from some of the moisture-accounting procedures used in computing the PDSI. The CMI gives the short-term or current status of purely agriculture drought or moisture surplus and can change rapidly from week to week.
Cooling degree days (CDD), growing degree days (GDD), and diurnal temperature range (DTR) are also widely considered to assess the impacts of temperature or extreme heat ( [1,10,14,22,[24][25][26]). CDD refers to the degrees that a day's average temperature is above 65 degrees Fahrenheit, and GDD represents the mean daily temperature above a certain temperature during the growing season. The DTR means the range of temperature between the high of the day and the cool of the night. Ref. [24] developed a weather index using whole-season CDD (from May to September) and June-July total CDD. The reason for using the June-July periods is that crop growth is frequently adversely affected by heat units. Ref. [22] use the GDD standard of 80 degrees Fahrenheit to represent the instances of extreme heat. Ref. [24] calculate the GDD regarding a base of 46.4 degrees Fahrenheit and a ceiling of 89.6 degrees Fahrenheit to measure the impacts of temperature on agricultural productivity. Ref. [14] calculate the monthly means of DTR and then average them across the growing season. They then estimate the impacts of precipitation, temperature, and DTR on crop yields on a county basis through time-series regression models and conclude that, compared to temperature, DTR has had relatively small negative impacts on corn, wheat, and soy in the Pampas since 1971. Ref. [26] demonstrate that a greater DTR indicates a higher warm temperature and a lower cool temperature in a day, both of which have negative effects on crop yields.

Other Variables Affect Crop Yields
Technology is one of the main reasons for growing yields for many crop types in many countries. Technology contributes positively to long-term crop yields by making the crops more resistant to extreme weather events, such as drought and strong wind ( [27]). Ref. [15] use time and country as fixed effect variables to represent technology changes and conclude that technological advances especially explained the increasing yields in sub-Saharan Africa from 1962 to 2014. Ref. [28] demonstrate that under similar drought conditions in 1988 and 2012 in 12 states in the Corn Belt, the yield losses, as well as loss-cost ratios and loss ratios that reflect crop insurance risk, were significantly lower in the second period. In 2012, 91% of corn planted in Iowa was a genetically engineered variety that is more resistant to heat and moisture stresses, but no genetically engineered variety was available in 1988.
In addition, other factors, including soil quality, the management skill of farmers, government policy, and solar radiation are considerable consequences of crop insurance ( [10,13,21,29,30]). Ref. [10] uses soil productivity as an independent variable to control farm heterogeneity through time and shows that better soil results in higher yields under the same weather and technology conditions. Ref. [13] document that failing to account for the spatial dependence of each dataset would lead to the erroneous estimation of some coefficients. The government also plays a vital role in responding to unusual and unexpected natural disasters ( [29]). In particular, if the government encourages farmers to plant a specific crop with agricultural subsidies, the yields of this crop in that period were found to be higher than normal. Finally, Ref. [30] find that solar radiation is correlated to the increase in corn yields in the US Corn Belt from 1984 to 2013.

Extreme Weather Events and Insurance Applications
Extreme weather incidents result in huge losses, including those from crop insurance, for insurance and reinsurance firms. Many large payments caused by extreme events depend on the risk loading included in the premium since the risk loading takes into account the likelihood and severity of extreme events ( [31]). Measuring the risk of extreme events provides more information to actuaries for climate insurance modeling. Ref. [32] use simulated Cooling Degree Day to build up a weather call option and measure the tails of the payments of the call option using conditional tail expectation (CTE) and Value-at-Risk (VaR). They then utilize a number of simulation-based methods to forecast the distributions of future daily temperature and insurance payments in California. The statistical ensemble models are essential to simulate and predict future indemnities or costs for insurance products so that actuaries could develop a better and more reasonable premium rate.

Background of the Actuarial Climate Index
The ACI has six components, including warm temperatures (T90: based on 90th percentile of both daily maximum and minimum temperature), cool temperatures (T10: based on 10th percentile of both daily maximum and minimum temperature), precipitation (P), drought (D), wind power (WP), and sea-level changes (S). In practice, the ACI employs the maximum number of consecutive dry days in a year (CDD) and the maximum consecutive 5-day precipitation amount in a month (Rx5day) to represent drought and precipitation, respectively. All components are presented as standardized anomalies and summarized on a monthly and seasonal basis. The standardized anomalies are determined by taking the difference between the component value in the current period and its mean value over the 1961-1990 reference period and dividing it by the standard deviation of the reference period. The ACI is the average of the standardized anomalies of six components (T10 is subtracted in the index) and thus focuses on extreme weather conditions.
The ACI is closely related to insurance rate-making and risk management. Insurance companies develop the premium for a given individual based on the information of the (homogeneous) groups or regions expected to have the same costs as the individual. Actuaries generally group individuals with the same anticipated costs together as one block and then derive a universal product premium for them. Risk grouping is an important step to utilize statistical methods more effectively and efficiently (On Risk Classification, 2011 [33]). Statistical models are typically used to predict the potential costs of providing coverage and other necessary additional expenses for these homogeneous blocks of individuals. The ACI data may provide reliable information on the frequency and severity of extreme weather events and could serve as useful variables in these statistical models.
Reinsurance is a type of insurance for insurance companies. Extreme climate events represent a possible failure for catastrophe reinsurance companies. By far, North America has the largest proportion of the reinsurance premiums capital ( [34]). In the past  30 to 40 years, the number of high-cost weather events in the US has increased by four times. In 2011 alone, the US experienced fourteen extreme climate-related events, each resulting in more than a $1 billion loss (Climate Science Watch). The occurrence of unusual extreme climate events and their potentially large losses are of concern to reinsurance companies ( [35]). Based on measurements from meteorological and coastal tide stations in the United States and Canada, the ACI is generated from observations of climate changes in temperature, sea level, precipitation, and wind speed. It is hypothesized that the ACI may improve the accuracy of estimating the frequency of rare events and contribute to building risk models that could be used by reinsurance companies in future business decisions.
The Actuaries Climate Risk Index (ACRI) is under development to represent the relationships between economic losses and climate variables using regression analysis. The intent is to develop several ACRIs that can be used for a wide range of fields related to climate risk ( [36]). However, the ACRI has not been developed for agriculture applications as of the writing of this paper. In fact, the results of this research could provide some insights into an ACRI applicable to assessing the climate risks in crop yields.

Data
This section introduces the crop data, the actuarial climate index, the data used to reconstruct the ACI, and the high-resolution climate data.

Crop Yields Data
One of the 12 subregions identified by the Actuaries Climate Index is the Midwest (MID) US, which contains eight states: Illinois, Indiana, Iowa, Michigan, Minnesota, Missouri, Ohio, and Wisconsin. The Midwest region is the region of focus for the empirical analysis in this research. A set of state-level data for yields (bushel/acre) and areas planted

The Actuaries Climate Index Data
The standardized ACI data over the same period were obtained from the Actuaries Climate Index official website (https://actuariesclimateindex.org/data/, accessed on 31 May 2022). The data include: (1) the monthly and annual ACI indices and (2) the five individual components of the ACI (there are no sea-level change records for the Midwest region). The individual components are available at a monthly frequency for the whole Midwest region, but only at an annual frequency for the individual states.
We selected the variables that are included in the statistical models based on the growing season of corn (from April to September). The combined ACI variables selected in this paper are ACI_4, ACI_5, ACI_6, ACI_7, ACI_8, and ACI_9. In addition, the individual components of the ACI are June to July CDD (CDD_6 and CDD_7), May to August Rx5Days (Rx5Day_5, Rx5Day_6, Rx5Day_7, and Rx5Day_8), April, May and September T10 (T10_4, T10_5, and T10_9), June to August T90 (T90_6, T90_7, and T90_8), and June WP90 (WP90_6). Summer is the growing season for corn, so most of the variables utilized in this research are summer variables. The reason for adopting the June wind speed is that tornados usually happen in June in the Midwest. Frozen nights often take place in the spring and autumn, so low-temperature variables in April, May, and September were selected.

The High-Level Resolution Climate Data
The ACI data is composed of grid-level data, where each grid is 2.5 degrees longitude (about 278.3 km) by 2.5 degrees latitude (about 277.5 km). Based on this, each grid is developed with state and region-level data. The grid level scale is relatively large since station-level data is commonly used in climate-related research, and the concern is that large-scale data might eliminate extreme local observations and lead to a misestimation of results. In order to investigate the impact of using more high-resolution data on crop yield prediction, we employ a high-level spatial resolution global climate dataset called TerraClimate in this research ( [37]). This dataset is based on an approximate 4 km by 4 km scale and contains monthly climate data, including historical weather data for global land surfaces, such as precipitation, PDSI, temperature, vapor pressure, etc. Variables similar to those used to construct the ACI index are selected, including PDSI, precipitation, minimum and maximum temperature, and soil moisture.

Statistical Models
This paper applies the linear regression and probit regression models as the two main statistical methods to estimate corn yields in the Midwest region of the US. Linear regression is widely used in estimation and prediction, measuring correlation and model fit.
In insurance pricing, the total expected claim amount is typically composed of the severity and frequency of the claims. A simple decomposition could be displayed by a continuous variable and a binary variable (such as the response variables discussed in Section 5.2). The binary variable could illustrate whether a claim happened, and the continuous variable could indicate the amount of a claim. GLM is a widely used method to estimate the expectation of the binary variable ( [38,39]). In particular, logit and probit models are typical models utilized in insurance pricing and are commonly used to estimate and predict the possibility of claims ( [40]). When the number of failures (the frequency of claims; also, the number of claims) is far less than the occurrence of success (the number of zeros in the response variable), and the independent variables include continuous variables, the probit model is shown to be preferred over the logit model ( [41]).
Five crop yield models were designed based on both linear and probit regression models, using different scale-level corn yields as the dependent variable and weather variables derived from the ACI and TerraClimate data as the independent variables. The details and results for each model will be discussed in Section 6.

Linear Regression Model
Linear regression is the first model applied, and is expressed as follows: where Y ik represents the weighted average corn yields of the whole Midwest (Models 1 and 2) for year i (i = 1961, . . . , 2018) or corn yields of state k in the Midwest (Model 3) for year i (i = 1961, . . . , 2016), X jik is the group of j explanatory variables described in Section 3 for year i and state k, and β j is the estimated coefficient of each explanatory variable. In addition, β 0k is the intercept for state k, t is the time effect, and ik represents the error terms following a normal distribution with (0, σ 2 ). Before estimating the models, we perform a KPSS test on the average corn yields of the Midwest region and the state-level corn yields. The results are shown in Table A7 in the Appendix A. We see that all corn yield series are non-stationary. Therefore, the time effect is included in the model specification (1) to capture this trend. For the probit model, the response variables (corn yields) are detrended and are therefore stationary. Hence, no time variable is included in the regressions.

Generalized Linear Model-Probit Regression Model
The second model is probit regression, which estimates corn yield losses with the selected variables. Since the corn yields have an increasing trend overall, detrending is necessary to prepare the data and ensure a more accurate and representative result. After detrending, we transform the corn yields into binary variables in the following two steps: Step 1: Compute the 25th percentile of the detrended corn yields; Step 2: Set the yields lower than the 25th percentile as "1", which means there is a loss, and the part above the 25th percentile data as "0", which means there is no loss.
Increased temperatures, drought conditions, and extreme weather disasters (EWDs) could reduce crop yields. For instance, corn yields decrease by about 10% with each degree Celsius increase in the US ( [42]), and EWDs would produce an average of 19.9% reduction in national cereal production in North America ( [5]). However, choosing the 25th percentile of detrended corn yields as the threshold of losses is a consideration for sample effectiveness and efficiency. A 10th percentile threshold would make the number of losses too small to generate a convincing result, particularly for Model 1, which includes only 58 observations. In addition, probit regression focuses on demonstrating that different resolution databases might produce inconsistent results for estimating crop yield losses.
The linear regression model is written as follows: The probit regression model is: where Y ik represents the weighted average corn yields of the whole Midwest (Models 1 and 2) for year I (i = 1961, . . . , 2018) or corn yields of state k in the Midwest (Model 3) for the year i (i = 1961, . . . , 2016), X jik is the group of j explanatory variables described in Section 3 for year i and state k, and β j is the estimated coefficient of each explanatory variable. Moreover, β 0k is the intercept for state k, and Φ(·) is the cumulative standard normal distribution function.

Empirical Analysis
In this section, we applied the five model specifications for crop yield prediction. Model 1 and Model 2 use the region-level annual weighted average corn yields as dependent variables and monthly combined ACI and individual components of the ACI as independent variables, respectively. Model 3 employs state-level annual corn yields and annual individual components of the ACI as dependent and independent variables. Model 4 utilizes the county-level annual corn yields in Iowa as dependent variables and monthly standardized high-level spatial resolution weather variables as independent variables. Model 5 is the same as Model 4, but the research area expands to the whole Midwest region.

Midwest Analysis with the Combined Actuaries Climate Index
In this sub-section, the effectiveness of the individual combined ACI variables is assessed for estimating the corn yields in the whole Midwest region. The linear regression results are shown in Table 1 column 2, which has an R 2 value of 0.8917 and two significant variables, July and September ACI. The coefficient of the July ACI illustrates that extreme weather conditions may have negative effects on corn yields. The coefficient of the September ACI is positive. One possible reason is that high daytime warm temperatures might offset the adverse effects of cold nights in September. The coefficients of ACI_4, ACI_5, ACI_6, and ACI_8 are not significant. Column 3 shows the linear regression results without the insignificant variables, dropping variables starting from the one with the highest p-value, step by step. All remaining variables are significant, and the R 2 value is as high as 0.8901. The fit of the linear regression model implies that changes in ACI variables explain about 89% of the changes in corn yields across the Midwest. Overall, the model is significant, as shown by the significance of the F-test statistic at a 0.1% significance level.
Column 4 is the results of probit regression and contains two significant variables as well. The pseudo R 2 value of 0.1617 is calculated by the formula:  Table A1. The second column is the result of the linear regression model. Column 3 is the result of the linear regression with fewer variables. Column 4 is the result of the probit regression model. " † ", "*",and "***" indicate significance at 10%, 5%, and 0.1% levels, respectively.

Midwest Analysis with Individual Component Variables of the Actuaries Climate Index
The results of the linear regression model with individual components of the ACI are shown in Table 2. The R 2 value is as high as 0.9548 and the F-test statistic is high enough to imply that all explanatory variables together significantly explain corn yields. The R 2 values are comparable to those reported in existing studies, such as [43], indicating reasonable goodness of fit. Furthermore, the R 2 values using the individual components are higher than using the original ACI variables. This indicates that relaxing the predetermined weights of the variables in the ACI construction could improve the predictive accuracy of corn yields.
Four individual variables are statistically significant, and one of them (T10_9) is significant at a 10% level. Using the Farrar-Glauder test to diagnose multicollinearity among variables, the results show that CDD_6 and CDD_7 are highly collinear variables. Therefore, the next step is to remove CDD_6 and CDD_7 from the linear regression. The results shown in Table 2 Table 2, column 4.
The backward stepwise selection method is employed as a comparison to check the accuracy of the manual stepwise selection results. The results shown in Table A4 column 2 have the same estimated coefficients and significance levels as the manual selection results. Variable changes during each dropping step of the manual selection method could be observed, such as the estimated coefficients, level of significance, and the value of R 2 and pseudo R 2 . This paper continues to use the manual selection method in the following analysis but provides the results from backward stepwise selection for comparison and reference in the Appendix A.
The results of the probit regression model with individual component ACI variables are shown in Table 3. We use the same variable selection approaches as for the linear regressions to obtain the simplest model, and the simplest model is shown in column 4. Only three significant variables are seen in the simplest probit model, and two of them, T10_5 and T10_9 (May and September minimum temperature), are significant at a 10% level. This result is not as good as the results from the linear regression model. However, the pseudo R 2 value of the probit model indicates a good fitness since it is close to 0.4. Furthermore, even though the pseudo R 2 is gradually decreasing, the AIC value and p-value of the Chi-square test are both declining as well. It might demonstrate that the simplest model, as a whole, has a better fit compared to the original probit model (ANOVA chi-square test is based on the current model and the null model). The backward stepwise selection results are the same as the simplest model and are shown in Table A4, column 3.  Table A1. The second column is the result of the original linear regression model. Column 3 is the result of the linear regression model without CDD variables. Column 4 is the result of the linear regression model that removed all insignificant variables. " † ", "*", and "***" indicate significance at 10%, 5%, and 0.1% levels, respectively.  Table A1. Column 2 is the result of the original probit regression model. Column 3 is the result of the probit regression model without CDD variables. Column 4 is the result of the probit regression model that removed almost all insignificant variables. " † ", "*", "**", and "***" indicate significance at 10%, 5%, 1%, and 0.1% levels, respectively.

Midwest Analysis with Annual State-Level Data
The results in the previous section show that some variables are not significant in predicting corn yields. One potential reason is that a relatively large spatial scale dataset is used and thus some extreme records may be averaged in some places. To further investigate this issue, the following analysis employs the Midwest state-level raw data of the ACI.
The ACI's state-level data is a yearly dataset rather than monthly data and only contains four independent variables, Rx5days, Tn10, Tx90, and CDD, from 1961 to 2016. Similarly, we use linear regression and probit regression models to estimate the impacts of extreme weather on the Midwest state-level corn yields. State-level corn yields are also detrended and transformed to follow a binomial distribution to prepare for the probit regression. Among the four variables, Tn10 and Tx90 represent the percentage of days when the daily minimum temperature was lower than the 10th percentile of the base period ) and the maximum temperature was higher than the 90th percentile of the base period (1961-1990), respectively. These definitions are slightly different from the terms of T10 and T90.
Before running the regression, we standardize the state-level raw data. Specifically, 30 years (from 1961 to 1990) are selected as the reference period, and then the two steps below are followed (using CDD as an example): Step 1: Calculate the mean and standard deviation of CDD in the reference period 1961 to 1990, written as µ f and σ f ; Step 2: Calculate the standardized CDD for year i in each state k within the Midwest from 1961 to 2016, using the formula The results in Table 4, columns 2 and 3, show that all explanatory variables together might be meaningful evidence since the F-test statistics are significant. Two out of the four variables are significant at least a 1% level of significance. The R 2 value of the linear regression model is 0.7504, and the pseudo R 2 value of the probit regression model is 0.0739. After adding a state effect variable to the linear regression model, the R 2 value grows to 0.7607, and the significance of the state variable might imply that locations are highly correlated to crop yields. However, all of these R 2 s are still smaller than the results in Section 6.1. This result might be due to the fact that yearly data is an averaged evidence of the entire year and ignores the monthly or daily extrema. In addition, the state-level might still be a large spatial scale and may eliminate local extrema.  Table A1. Column 2 is the result of the linear regression model. Column 3 is the result of the linear regression model added with a state effect variable. Column 4 is the result of the probit regression model. " † ", "*", and "***" indicate significance at 10%, 5%, and 0.1% levels, respectively.

Iowa Analysis with Standardized High-Level Resolution Climate Data
In this subsection, we use the TerraClimate dataset, which is generated based on approximate 4 km by 4 km grid-level data and contains monthly county-level historical weather data to predict crop yields. Especially, June to July PDSI (pdsi_6 and pdsi_7), May to August precipitation (pr_5, pr_6, pr_7, and pr_8), June to July soil moisture (soil_6 and soil_7), June to August maximum temperature (tmmx_6, tmmx_7, and tmmx_8), and April, May, and September minimum temperature (tmmn_4, tmmn_5, and tmmn_9) data from 1961 to 2018 in Iowa are selected based on the corn-growing season to estimate the corresponding county-level yearly corn yields.
We then replicate the ACI development approach by selecting a 30-year period from 1961 to 1990 as the reference period and standardizing Iowa's high-level resolution climate data in the following steps (using PDSI as an example): Step 1: Calculate the mean and standard deviation of PDSI for each month m in the reference period 1961 to 1990, written as µ f (m) and σ f (m); Step 2: Calculate the standardized PDSI for month m of year i in each county k within Iowa from 1961 to 2018, using the formula . Same as the estimation uses ACI data, the following linear regression is applied: where Y ik represents the corn yields of county k in Iowa for year i (i = 1961, . . . , 2018), X jik is the group of j explanatory variables described above for year i and county k, and β j is the estimated coefficient of each explanatory variable. β 0k is the intercept of county k, t is the time effect, and ik represents the error terms following a normal distribution with (0, σ 2 ). The results are shown in Table 5, column 2. The value of R 2 is 0.7982, indicating that this model with standardized high-level resolution climate data is appropriate. Further, except for two insignificant variables, July PDSI and June precipitation, all other variables are statistically significant. In addition, the F-test statistic is high enough to imply that all explanatory variables together significantly explain corn yields. Moreover, we remove pr_6 and pdsi_7 in the next two steps to get better-fit models as a result (shown in Table 5, columns 3 and 4), the significance level of the estimated coefficients is the same as the second column, and the R 2 value of 0.7982 does not change. Compared to the results in Section 6.1, this simplest linear regression model contains more effective coefficients. The backward stepwise selection results in the same as the simplest model and is shown in Table A6, column 2.   Table A1. Column 2 is the result of the linear regression model. Column 3 is the result of the linear regression model without June precipitation. Column 4 is the result of the same linear regression model without another insignificant variable, July PDSI. " † ", "*", and "***" indicate significance at 10%, 5%, and 0.1% levels, respectively.
The results of the probit regression model using standardized high-level resolution climate data in Iowa are shown in Table 6. Column 3 is the probit regression model removing the most insignificant variable, June soil moisture. We then apply the same approach in Section 6.1 to drop other insignificant variables, June precipitation, June maximum temperature, April minimum temperature, and July PDSI, step by step, to get the simplest probit model and display it in column 4. All remaining variables are statistically significant at a 0.1% level of significance, except for the May minimum temperature. The pseudo R 2 values of the original probit model and simplest probit model, 0.2867 and 0.2864, are very close and could represent that both models are a good fit.  Table A1. Column 2 is the result of the original probit regression model. Column 3 is the result of the probit regression model without June soil moisture. Column 4 results from the probit regression model without all other insignificant variables, July PDSI, June precipitation, and April minimum and June maximum temperature. " † ", "*",and "***" indicate significance at 10%, 5%, and 0.1% levels, respectively. Furthermore, from column 2 to column 4, the AIC value has a tiny decrease, and the p-value of the ANOVA Chi-square test is decreased but still insignificant. This demonstrates that the final simplest probit model could slightly improve the goodness of fit compared to the base model (ANOVA chi-square test is based on the current model and the original probit regression model). The backward stepwise selection results in the same as the simplest model and is shown in Table A6, column 3.

Midwest Analysis with Standardized High-Level Resolution Climate Data
We now expand the investigated area to the whole Midwest region and apply the ACI development approach to the Midwest high-resolution climate data. Similarly, we use the same self-selected variables in the linear regression model. Table 7 column 2 shows that all variables are statistically significant at a 0.1% level except for June soil moisture. The R 2 value of 0.7126 is acceptable, although it is smaller than the R 2 values of the models using ACI data. It might imply that changes in high-level resolution weather variables could explain about 71.26% of the corn yield changes across the Midwest. The F-test statistic is significant and demonstrates that all these variables together fit and explain corn yields well. Table 7. Midwest analysis with standardized high-level resolution climate data, the linear regression model.

Variables Linear Regression Linear Regression (Lasso Selection) Simplest Linear Regression (Lasso Selection)
Intercept  Table A1. Column 2 is the result of the linear regression models with variables selected based on experience. Columns 3 and 4 result from the linear regression models with variables determined by both the Lasso approach and experience, and column 4 removes one insignificant variable, May minimum temperature. "*", "**", and "***" indicate significance at 5%, 1%, and 0.1% levels, respectively.
The Least Absolute Shrinkage and Selection Operator (Lasso) regression approach is introduced to select variables and explore whether some variables could be included or dropped and increase the effectiveness of the high-level resolution climate data for estimating corn yields. Excluding the variables that are not in the growing season (broadly, April to September), variables selected to be supplementary variables in the linear regression model are May PDSI (pdsi_5), September precipitation (pr_9), June to August minimum temperature (tmmn_6, tmmn_7, and tmmn_8), and May and September maximum temperature (tmmx_5 and tmmx_9). The results are in Table 7, columns 3 and 4. All variables are statistically significant, except the May minimum temperature. After dropping it from the regression, all variables are significant. The R 2 value remains as 0.7297 and is slightly larger than 0.7126, indicating that all these variables together are fitted better than the regression with only subjectively selected variables. The F-test statistics and AIC values show the same conclusion. The backward stepwise selection method is applied to the linear regression model with the variables selected based on both the Lasso approach and experience. The results are the same as the simplest linear model and are shown in Table A7, column 2.
The results of the probit regression model with standardized high-level resolution climate data are shown in Table 8. Column 3 is the result of the simplest probit model without the insignificant variables, July PDSI and August precipitation. All remaining variables are statistically significant at a 0.1% level of significance, except for the June PDSI. The pseudo R 2 of the simplest model, 0.2341, is the same as the pseudo R 2 of the original probit regression and could represent a good fitness. Table 8. Midwest analysis with standardized high-level resolution climate data, the probit regression model.

Variables
Probit   Table A1. Columns 2 and 3 are the results of the probit regression models with the variables selected based on experience, and column 3 removes two insignificant variables, July PDSI and August precipitation. Columns 4 and 5 result from the probit regression models with variables selected based on both the Lasso approach and experience, and column 5 removes two insignificant variables, September precipitation and June minimum temperature. " † ", "*", "**", and "***" indicate significance at 10%, 5%, 1%, and 0.1% levels, respectively.
Columns 4 and 5 are the results of probit regressions using the variables selected through the Lasso approach. September precipitation and June minimum temperature are removed due to insignificance, and the simplest model is shown in column 5. With more variables in the probit regression, the pseudo R 2 of these two models is moderately improved to 0.2536. In addition, the AIC value and p-value of the ANOVA Chi-square test of the probit regression model with more variables (Lasso selection) are smaller than the values of the probit model only with self-selected variables and demonstrate that the probit model with more variables could better explain the effectiveness of the high-level resolution ACI for estimating corn yield losses (the ANOVA Chi-square test is based on the current model and the original probit regression). The backward stepwise selection method is applied to the original probit model and the probit model with the variables selected based on both the Lasso approach and experience. The results are the same as the simplest models and are shown in Table A7, columns 3 and 4.

Rolling Window Predictive Analysis
Prediction is critical in insurance pricing, as mentioned earlier, with which actuaries could develop and renew the premium of insurance products. To evaluate the effectiveness of the ACI and high-level resolution ACI for predicting crop yields, this paper employs rolling window regression and the Diebold and Mariano test to compare the predictive accuracy and forecast ability of the four linear regression models described in Section 5. Selecting a 30-year window width and running rolling window regression, we then compare each regression with a corresponding simple random walk model. The results are shown in Table 9, and p-values could reveal the predictive accuracy of the two methods in each model. Based on α = 0.05, the p-values suggest that the two methods in Model 1 and Model 2 have the same forecast accuracy, while the linear regressions of Model 3 and Model 5 are more accurate than the simple random walk model, which might illustrate that the higher-level resolution ACI is better at predicting corn yields than the ACI. Table 9. Predictive ability of different spatial scale ACIs.

Conclusions and Discussion
This paper examines the effectiveness of the Actuaries Climate Index (ACI), which is a climate index published recently by multiple actuarial associations in North America, in corn yield prediction and yield (re)insurance ratemaking. The ACI is intended to help actuaries, insurance companies, governments, and the general public understand the potential effects of climate trends and extreme weather events better. We constructed five linear regression models and five probit regression models to examine the predictive power of the ACI and related climate variables on different geographical and temporal scales. The probit regressions are used to identify insurance claims, which occur when the corn yield falls below a certain threshold calculated from the historical yields. When an insurance claim occurs, the linear regression models will then be used to predict the corn yield in that year, based on which the insurance severity can be calculated.
The empirical results show that the ACI provides valuable information in predicting crop yields and yield losses, as the goodness of fit from both the linear and the probit regression models are satisfying. However, we also see that when the weights of the climate variables which constitute the ACI could be determined using a data-driven method rather than being predetermined, the predictive accuracy could be further improved.
Moreover, in order to investigate the predictive power of the ACI at higher resolution levels, we construct the ACI index at the county level using a dataset (TerraClimate dataset) consisting of 4 km by 4 km resolution climate data. We find that, although the predictive accuracy at the county level is slightly worse than that at the Midwest region level, possibly due to noisier data at the county level, the predictive accuracy is also reasonable. The county-level data are more relevant to insurance practices because real-life insurance products typically do not cover corn yields of as large a geographical location as a state or the whole Midwest region (which consists of eight states). Hence, the ACI could further benefit the insurance industry if it could be developed to be a higher-resolution index.

Future Research
There are several directions for future research. First, since the aim of this paper is to evaluate the effectiveness of the Actuaries Climate Index, only the ACIs and the related climate variables are included in the regression models. It would be interesting to include additional variables that capture the potential spatial-temporal correlations of corn yields in the regression models in future research. Second, more advanced statistical or machine learning models, such as distributional forest and neural networks, could be used to develop accurate predictive models of corn yields. Third, utilizing the established predictive models, it would be interesting to explore the possibility of constructing indexbased corn yield insurance products, the payments of which are not determined by the actual loss of the insured farms but by the realization of an external index which cannot be affected by the behavior of the farmers (the insured). Data Availability Statement: Data are not publicly available, though the data may be made available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A Table A1. Abbreviations and descriptions of variables included in the dataset.

Variable Abbreviation
Variable Description CDD Maximum number of consecutive dry days in a year with precipitation less than one millimeter Rx5Day Maximum rainfall per month in five consecutive days T10 Change in frequency of cooler temperatures below the 10th percentile

Tn10
Percentage of days when the daily minimum temperature is less than the 10th percentile of the reference period T90 Change in frequency of warmer temperatures above the 90th percentile   174.2 *** 0.0001338 *** Note: Results from backward stepwise selection show the same significant variables as the results from the manual selection approach. While the results are the same, the process of the manual selection method would provide information about the changes in estimated coefficients, significance levels, R 2 (Pseudo R 2 ) values, and F-test statistics during each step. " † ", "*", "**", and "***" indicate significance at 10%, 5%, 1%, and 0.1% levels, respectively. Note: Results from backward stepwise selection show the same significant variables as the results from the manual selection approach. While the results are the same, the process of the manual selection method would provide information about the changes in estimated coefficients, significance levels, R 2 (Pseudo R 2 ) values, and F-test statistics during each step. " † ", "*", and "***" indicate significance at 10%, 5%, and 0.1% levels, respectively.   Results from backward stepwise selection show the same significant variables as the results from the manual selection approach. While the results are the same, the process of the manual selection method would provide information about the changes in estimated coefficients, significance levels, R 2 (Pseudo R 2 ) values, and F-test statistics during each step. " † ", "*", "**", and "***" indicate significance at 10%, 5%, 1%, and 0.1% levels, respectively. Results from backward stepwise selection show the same significant variables as the results from the manual selection approach. While the results are the same, the process of the manual selection method would provide information about the changes in estimated coefficients, significance levels, R 2 (Pseudo R 2 ) values, and F-test statistics during each step. " † ", "*", "**", and "***" indicate significance at 10%, 5%, 1%, and 0.1% levels, respectively.