## 1. Introduction

Damage from natural disasters in the United States, driven in large part by Hurricane Harvey and Hurricane Irma, reached a record high of $313 billion in 2017 (NOAA NCEI 2018). Hurricanes now account for seven of the top ten costliest disasters in the United States since 1980. They can have substantial impacts on local economic growth (Strobl 2011), fiscal outlays (Deryugina 2017), lending and borrowing patterns (Gallagher and Hartley 2017), and on where people live and work (Deryugina et al. 2018). Given the severity of these impacts, it is important to understand what drives the destructiveness of these events.

While there are many determinants of damage, hurricanes are local events whose impacts are affected by individuals’ and businesses’ adaptation choices. Decisions about how and when to protect property or evacuate are made in advance and rely on forecasts of the storm; see Shrader (2018), Kruttli et al. (2019) and Beatty et al. (2019). However, hurricane forecasts, despite dramatic improvements, are far from perfect and can exhibit large and unexpected errors. These errors, even up to just a few hours ahead, can lead individuals in a disaster area to protect their property less than they would have otherwise and lead to higher damage.

In this paper, I evaluate whether hurricane forecast accuracy matters for aggregate hurricane damage. I start by formulating an empirical model of damage with many determinants and then use model selection methods to determine the best damage model specification. Next, I exploit the natural variation in the forecast errors to quantify the impact of forecast accuracy on hurricane damage. Short-term forecast errors, together with a handful of other variables, explain most of the variation in aggregate hurricane damage over the past sixty years. I find that a one standard deviation increase in the hurricane strike location forecast error is associated with up to $9000 in additional damage per household affected by a hurricane. This is about 18 percent of the typical post-hurricane flood insurance claim.

Little attention has been paid to the role that forecasts of natural disasters can play in reducing their destructiveness. For example, Deryugina et al. (2018, p. 202) ignore the role of forecasts by claiming that Hurricane Katrina “struck with essentially no warning”. Others focus only on potential long-term impacts by arguing that improved forecasts can increase damage in the longer term as people perceive declining risks and relocate to more hurricane-prone areas; see Sadowski and Sutter (2005) and Letson et al. (2007). I take a different approach by using a semi-structural partial-equilibrium framework to show that the cumulative damage prevented due to improvements in forecast accuracy since 1970 is about $82 billion. Furthermore, the cumulative net benefit is between $30 and $71 billion after subtracting estimates of what the U.S. federal government has spent on hurricane operations and research. This illustrates that improvements in the forecasts produce benefits beyond the well-documented reduction in fatalities and have outweighed the associated costs.

I also assess the importance of forecast uncertainty by adapting Rossi and Sekhposyan’s (2015) measure of uncertainty to allow for time-varying forecast-error densities. I decompose this measure into the ex-post forecast error and the ex-ante standard deviation of the forecast-error distribution. This allows me to test whether errors in ex-ante beliefs about the storm or the strength of those beliefs play the greater role in altering damage from natural disasters. I find that the errors in ex-ante beliefs matter most for hurricane damage.

The rest of the paper is structured as follows: Section 2 describes the modeling framework and Section 3 describes the forecasts. The model selection methods are described in Section 4 and Section 5 presents the results and predictive performance. Section 6 considers alternative approaches for measuring the impact of forecast accuracy on damage and its implications, while Section 7 concludes.

## 2. Modeling Framework

I model hurricane damage using a Cobb-Douglas power function composed of natural forces $\mathbf{F}_i$, vulnerabilities $\mathbf{V}_i$, and adaptation $\mathbf{A}_i$. Taking logarithms gives an expression for hurricane damage in (1), where bold terms are vectors. This is an extension of Bakkensen and Mendelsohn (2016) to allow for adaptation under imperfect information. If adaptation is described by the relationship in Bakkensen and Mendelsohn (2016): $\ln(A_i(\mathbf{V}_i, \mathbf{F}_i)) = \boldsymbol{\gamma}_1 \ln(\mathbf{V}_i) + \boldsymbol{\gamma}_2 \ln(\mathbf{F}_i)$, where the $\boldsymbol{\gamma}_j$ are matrices, then (1) can be re-parameterized by adding and subtracting adaptation under perfect information to get (2), where the final term captures the distance between the actual and predicted natural forces. This allows me to study the implications of short-term accuracy for damage, which is the focus here, as well as longer-term accuracy from seasonal and climate predictions.

The formulation can be extended to capture the joint impact of forecast accuracy and uncertainty around the forecast. The final term in (2) can be replaced with a general measure of forecast uncertainty to give (3), where $\boldsymbol{\eta} = \boldsymbol{\delta}\boldsymbol{\gamma}_2$, $U_i(\cdot)$ jointly captures accuracy and uncertainty for the forecast, and $\boldsymbol{\epsilon}_i$ indicates the residual or approximation error when going from (2) to (3).
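The nesting structure described above can be sketched explicitly. The following is a reconstruction from the surrounding definitions, with $D_i$ denoting damage; it should be read as a sketch of (1)–(3) rather than the exact published forms:

```latex
% (1): Cobb-Douglas damage in logs, with adaptation chosen using the forecast
\ln(D_i) = \boldsymbol{\alpha}' \ln(\mathbf{F}_i) + \boldsymbol{\beta}' \ln(\mathbf{V}_i)
         + \boldsymbol{\delta} \ln\!\big(A_i(\mathbf{V}_i, \widehat{\mathbf{F}}_i)\big) + \epsilon_i

% (2): add and subtract adaptation under perfect information
\ln(D_i) = (\boldsymbol{\alpha}' + \boldsymbol{\delta\gamma}_2) \ln(\mathbf{F}_i)
         + (\boldsymbol{\beta}' + \boldsymbol{\delta\gamma}_1) \ln(\mathbf{V}_i)
         + \boldsymbol{\delta\gamma}_2 \big[\ln(\widehat{\mathbf{F}}_i) - \ln(\mathbf{F}_i)\big] + \epsilon_i

% (3): replace the final term with a general uncertainty measure
\ln(D_i) = \boldsymbol{\alpha} \ln(\mathbf{F}_i) + \boldsymbol{\beta} \ln(\mathbf{V}_i)
         + \boldsymbol{\eta} \ln\!\big(U_i(\mathbf{F}_i, \widehat{\mathbf{F}}_i)\big) + \boldsymbol{\epsilon}_i
```

Here $\boldsymbol{\eta} = \boldsymbol{\delta\gamma}_2$ and $\boldsymbol{\epsilon}_i$ absorbs the approximation error in moving from (2) to (3).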

This equation embeds many existing models of hurricane damage. Emanuel (2005), Nordhaus (2010) and Strobl (2011) implicitly set $\boldsymbol{\eta} \equiv \mathbf{0}$ and $\boldsymbol{\alpha} \equiv \mathbf{1}$ to examine the relationship between damage and natural hazards. Others set $\boldsymbol{\eta} \equiv \mathbf{0}$ to investigate the relationship between damage and vulnerabilities; see Kellenberg and Mobarak (2008) and Geiger et al. (2016). Bakkensen and Mendelsohn (2016) allow for $\boldsymbol{\eta} \ne \mathbf{0}$ but implicitly assume $U_i \equiv 1$. This means that they are unable to identify $\boldsymbol{\eta}$ directly but do so indirectly through the coefficient on income.

There are many ways to measure forecast uncertainty. A popular measure is the mean square forecast error (MSE); see Ericsson (2001). Jurado et al. (2015) propose a time-varying MSE measure across variables using a dynamic factor model with stochastic volatility. Alternatively, the log score (see Mitchell and Wallis 2011) evaluates the predicted density, $\widehat{g}_i(\cdot)$, at $\mathbf{F}_i$ conditional on the prediction $\widehat{\mathbf{F}}_i$. When $\mathbf{F}_i$ falls in the tail of $\widehat{g}_i(\cdot)$, it has a lower probability and so is associated with higher uncertainty. Another measure is the continuous ranked probability score, which compares against the cumulative distribution function.

Rossi and Sekhposyan (2015) propose a measure of forecast uncertainty based on the unconditional likelihood of the observed outcome. Their measure, in the context of a single variable, is computed by evaluating the predicted cumulative distribution function at $F_{1i}$, which captures how likely it is to observe $F_{1i}$ given the predicted distribution. Their innovation is that $\tilde{g}_i(\cdot)$ is computed using historical forecast errors. While Rossi and Sekhposyan (2015) focused on the full-sample distribution, it is possible to allow $\tilde{g}_i(\cdot)$ to change across events (or time) so as to capture any changes in the predicted distribution; see Hendry and Mizon (2014). This measures the forecast accuracy in the context of the ex-ante uncertainty when the forecast was produced.

When $\tilde{g}_i(\cdot)$ roughly follows a normal distribution, then for small distances between $F_{1i}$ and $\widehat{F}_{1i}$, (4) can be approximated as (5), where $\widehat{\sigma}_i$ is the time-varying standard deviation of the predicted distribution based on historical forecast errors. It represents the ex-ante risk (in a Knightian sense) ascribed to the forecast at the time of the forecast. If (5) has an absolute value greater than one, then the forecast error falls outside of its expected mid-range and is associated with greater uncertainty. An absolute value less than one indicates there is less uncertainty since the forecast error is within the expected range. This is related to other comparisons of ex-post and ex-ante uncertainty; see Clements (2014) and Rossi et al. (2017).

The measure is generalized further in (6) by taking logs and relaxing the fixed 1-to-1 relationship between forecast accuracy and ex-ante risk. Plugging (6) back into the model for damage in (3) allows me to evaluate which aspect of forecast accuracy or uncertainty matters the most for hurricane damage.
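The sequence of measures just described can be sketched as follows. This reconstruction uses the notation of the text and is indicative of (4)–(6) rather than their exact published forms:

```latex
% (4): predicted CDF evaluated at the realized outcome
U_i = \tilde{G}_i(F_{1i}), \qquad \tilde{G}_i(x) = \int_{-\infty}^{x} \tilde{g}_i(u)\, du

% (5): near-normal approximation -- the standardized forecast error
U_i \;\approx\; \frac{F_{1i} - \widehat{F}_{1i}}{\widehat{\sigma}_i}

% (6): logs, with the 1-to-1 link between error and ex-ante risk relaxed
\ln(U_i) = \delta_1 \ln\big|F_{1i} - \widehat{F}_{1i}\big| - \delta_2 \ln(\widehat{\sigma}_i)
```

Setting $\delta_1 = \delta_2 = 1$ in (6) recovers the log of the absolute standardized error in (5); the separate coefficients are what allow the ex-post error and the ex-ante standard deviation to be tested against one another.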

## 3. Hurricane Forecast Errors, Uncertainty and Damage

The National Hurricane Center (NHC) maintains all hurricane forecasts produced since it was established in 1954. The NHC’s ‘official’ forecasts form the basis for hurricane watches, warnings and evacuation orders and are widely distributed to and used by news outlets. The forecasts are not produced by a single model but are a combination of different models and forecaster judgment which changes over time; see Broad et al. (2007).

Forecasts of the track and intensity are generated every 6 h for the entire history of a storm. While forecasts can extend out to 120 h in advance, I focus on the 12-h-ahead track forecasts. This is motivated by the fact that individuals typically wait until the last minute for their adaptation efforts and often focus on the forecast track; see U.S. Army Corps of Engineers (2004) and Milch et al. (2018). The track is also an integral part of the NHC’s forecasts of intensity, including rainfall (Kidder et al. 2005), wind speed (DeMaria et al. 2009), and storm surge (Resio et al. 2017). The 12-h-ahead forecasts are available for virtually every U.S. hurricane going back to 1955.

I start by computing the 12-h-ahead forecast errors from every tropical storm in the North Atlantic since 1954. This includes 14,641 forecast errors from 744 storms. Forecast errors are calculated as the distance between the forecast and the actual track of the storm using Vincenty’s (1975) formula for the surface of a spheroid. Next, I estimate the time-varying forecast error densities for every year since 1955 using a rolling window of the past five years.
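As a concrete illustration, the distance between a forecast position and the verified position can be computed as below. The paper uses Vincenty’s (1975) formula on a spheroid; this sketch uses the simpler spherical haversine formula, which typically agrees to within a few tenths of a percent, and the coordinates are illustrative rather than actual NHC data.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees.

    Spherical approximation to the spheroid distance used in the paper."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# 12-h-ahead track error: distance between the forecast position and the
# verified position (illustrative coordinates near the Gulf coast)
error_km = haversine_km(29.0, -90.0, 29.0, -89.0)
```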

Figure 1 shows how the densities changed over time. Historically they were skewed to the right with long fat tails due to large outliers. Skewness declined dramatically after 1985 and a truncated normal distribution now appears to be a good approximation. Improved accuracy was driven by the use of satellites and supercomputers; see Rappaport et al. (2009).

The 67th percentile of each density is similar to how the NHC measures the radius of the ‘cone of uncertainty’. The radius is measured as the distance that captures 2/3 of all forecast errors in the past five years and denotes the uncertainty associated with the forecast location. Thus, I refer to this as the time-varying ex-ante uncertainty.
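The cone radius can be reproduced mechanically as a rolling 67th percentile. A minimal sketch, assuming forecast errors are stored in a dict keyed by year (the data structure and function name are illustrative, not from the paper):

```python
import numpy as np

def rolling_cone_radius(errors_by_year, years, window=5, q=0.67):
    """For each year, the q-th quantile of all forecast errors pooled over the
    previous `window` years -- analogous to how the NHC sets the radius of the
    'cone of uncertainty' from the past five years of errors."""
    radius = {}
    for y in years:
        pool = np.concatenate([errors_by_year.get(yr, np.array([]))
                               for yr in range(y - window, y)])
        if pool.size:
            radius[y] = np.percentile(pool, q * 100)
    return radius
```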

The most relevant forecasts for hurricane damage are those made just before landfall. In total, 88 hurricanes made landfall between 1955 and 2015. Accounting for the fact that some hurricanes struck multiple locations (e.g., Katrina [2005] first crossed southern Florida and then moved into the Gulf and struck Louisiana several days later), there were 101 strikes. I focus on the 98 strikes for which forecasts are available.

Since the forecast and location of the hurricane are only updated at six-hour intervals, I round the timing of each landfall to the closest point in that interval. Therefore, a hurricane landfall at 16:00 Universal Time (UTC) is rounded to 18:00 UTC. I subtract the length of the forecast horizon from the landfall time to get the time at which the 12-hour-ahead ‘landfall forecast’ was generated. So the 12-hour-ahead landfall forecast of a storm that made landfall at 18:00 UTC was generated at 6:00 UTC.
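This timing convention can be sketched in a few lines (the helper name is illustrative, not from the paper):

```python
from datetime import datetime, timedelta

def landfall_forecast_time(landfall, horizon_hours=12):
    """Round a landfall time to the nearest 6-h synoptic time (00/06/12/18 UTC),
    then step back by the forecast horizon to get the time at which the
    'landfall forecast' was issued."""
    midnight = landfall.replace(hour=0, minute=0, second=0, microsecond=0)
    hours = (landfall - midnight).total_seconds() / 3600
    rounded = midnight + timedelta(hours=round(hours / 6) * 6)
    return rounded - timedelta(hours=horizon_hours)

# A 16:00 UTC landfall rounds to 18:00 UTC, so its 12-h-ahead
# 'landfall forecast' was issued at 06:00 UTC the same day.
```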

I then compute the landfall forecast error by calculating the distance between the forecast made 12 hours before landfall and the location at landfall. Although the forecast errors are purely distance measures, it is possible to derive forecast error directions based on the location of the hurricane when the forecast was generated. Panel A of Figure 2 plots the spatial distribution of the landfall locations and forecast error directions for most hurricane strikes since 1955. It shows that there is a wide geographic distribution along the Gulf and Atlantic coasts, with landfalls ranging from southern Texas to eastern Maine. There also does not appear to be any systematic pattern in the error directions.

Panel B of Figure 2 plots the 12-h-ahead landfall forecast errors for all hurricane strikes along with the estimated ex-ante uncertainty from 1955–2015. The large variation in the forecast errors over time will allow me to identify whether they have any impact on damage. Although ex-ante uncertainty has declined gradually since the 1990s, the forecast errors themselves do not have a clear trend, despite the fact that larger errors do occur somewhat more often earlier in the sample. Forecast errors exceed the ex-ante uncertainty in about 12 percent of all strikes; most recently for Sandy [2012].

I collate hurricane strike damage from multiple sources. I primarily rely on annual Atlantic Hurricane Season reports following Pielke and Landsea (1998). I supplement and update these numbers using the latest tropical cyclone report for each storm and Blake et al. (2011). The resulting dataset is similar to Pielke et al. (2008). However, it is higher than NOAA’s ‘Storm Events’ database (which suffers from under-reporting; see Smith and Katz 2013) and is somewhat lower than NOAA’s ‘Billion-Dollar’ database, which uses a broader definition of damage; see Weinkle et al. (2018).

## 4. Model Selection Methods

While the relationship between hurricane damage and forecast accuracy is the primary interest of the analysis, there are many other potential determinants of damage. I use model selection to focus the analysis on those variables which are most important for damage. The approach I follow is broadly defined as an automatic general-to-specific (Gets) modeling framework, which is described in detail by Hendry and Doornik (2014). Developments by Hendry et al. (2008), Castle et al. (2012), Castle et al. (2015), Hendry and Johansen (2015) and Pretis et al. (2016) illustrate its usefulness across a range of applications.

The approach starts with the formulation of a general unrestricted model (GUM) that includes all potentially relevant explanatory variables. It is assumed that the residuals of the GUM are iid normal, but this can be relaxed in extensions of the framework. Theory-driven variables that are not selected over are denoted by ${x}_{s}$ and all variables that will be selected over are denoted by ${z}_{n}$. It is also assumed that the GUM is potentially sparse and nests the underlying local data generating process (LDGP). This ensures that Gets consistently recovers the same model as if selection began from the LDGP. Thus, formulation of the GUM is an integral part of the modeling process.

The properties of Gets model selection are easily illustrated when each ${z}_{n}$ is orthogonal. In this case, selection can be performed by computing the sample t-statistics under the null hypothesis that ${H}_{0}: {\gamma}_{n}=0$ for all $n = 1, \ldots, N$. The squared t-statistics are then ordered from largest to smallest and compared against a critical value, where $\alpha$ determines the critical value above which variables are retained. In this setup the decision to select is only made once, i.e., ‘one-cut’. Under the null hypothesis, the average number of irrelevant variables retained is $\alpha N$, which shows that $\alpha$ determines the false-retention rate of variables in expectation, i.e., the ‘gauge’ (Castle et al. 2011; Johansen and Nielsen 2016). The gauge plays the same role as regularization does in other machine learning or model selection procedures in that it controls the loss of information in the selection procedure; see Mullainathan and Spiess (2017). However, unlike other regularization parameters, which are determined based on in-sample model performance, the gauge has a theoretical interpretation and is typically chosen based on the size of the GUM, $\alpha =1/N$, so that on average one irrelevant variable is kept.
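The ‘one irrelevant variable retained on average’ property is easy to verify by simulation. Under the null with orthogonal regressors and a large sample, the t-statistics are approximately iid standard normal, so the gauge can be checked directly (a sketch, using N = 35 candidate variables):

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)
N = 35                                       # candidate variables in the GUM
alpha = 1 / N                                # target gauge
crit = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided critical value

# Under H0 every candidate is irrelevant, so the N t-statistics are
# approximately iid standard normal draws.
reps = 2000
t_stats = rng.standard_normal((reps, N))
retained = (np.abs(t_stats) > crit).sum(axis=1)
avg_retained = retained.mean()   # close to alpha * N = 1
```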

If the variables are correlated then ‘one-cut’ selection is no longer appropriate. However, the approach can be extended by sampling different blocks of variables at a time and ensuring that any reduction satisfies the ‘encompassing principle’ as determined by the gauge; see Doornik (2008). This can be augmented with a multi-path tree search and diagnostic tests so as to minimize potential path dependencies and to ensure a statistically well-specified model; see Hendry and Doornik (2014).

The final model provides a parsimonious explanation of the GUM conditional on the acceptable amount of information loss as determined by the gauge. When multiple models are retained then information criteria are used to select between otherwise equally valid models. Alternatively, ‘thick modeling’, as proposed by Granger and Jeon (2004) and discussed in a Gets framework by Castle (2017), can be used to pool selected models.

There are many different model selection algorithms available. I use the multi-path block search algorithm known as ‘Autometrics’ available in PcGive; see Doornik (2009) and Doornik and Hendry (2013). An alternative multi-path search algorithm is implemented using the ‘gets’ package in R; see Pretis et al. (2018). I also compare the results by performing model selection using regularized regression methods (i.e., Lasso) as implemented in the ‘glmnet’ package in R; see Friedman et al. (2010).

## 5. Selecting a Model of Damage

The GUM that I estimate includes most potential determinants of hurricane damage and several controls for spatial and temporal heterogeneity. It contains 37 explanatory variables and is estimated over a sample of 98 observations. Lower case variables are in logs:

The first line includes the vulnerabilities (V): housing unit density (hd), income per housing unit (ih) and ‘real-time’ hurricane strike location frequency (FREQ). The first two lines list the natural hazards (F): maximum rainfall (rain), storm surge (surge), negative minimum central pressure (press), maximum wind speed (wind), soil moisture relative to trend (MOIST), accumulated cyclone energy (ace) and global surface temperature (GST). The second line captures forecast accuracy and uncertainty (U): 12-hour-ahead forecast track errors (forc12) and the ex-ante uncertainty (radii12). The last line lists additional spatial and temporal controls: strike and annual trends, month dummies, hour dummies (to control for the six-hour period in which the landfall occurred), and U.S. state dummy variables. See Appendix B for further details on how the additional variables were constructed.

I use Gets model selection to discover the most important drivers of hurricane damage without selecting over forecast accuracy and uncertainty. Since the GUM has 35 variables to select over, I set the target gauge equal to $\frac{1}{35}\approx 0.03$. There are ${2}^{35}$ (>34 billion) possible model combinations when allowing for every variable to be selected over. For a target gauge of 3 percent, the selection algorithm narrows the search space to ${2}^{17}$ (<200 thousand). It eliminates entire branches of models and ultimately only estimates 304 candidate models. The algorithm finds 8 terminal models as acceptable reductions of the GUM; see Appendix A Table A1. The final model is selected from the terminal models using the Bayesian information criterion (BIC).

Column (1) of Table 1 illustrates the most important drivers of damage using Gets. Wind speed does not appear; instead, minimum central pressure is included. Even when wind speed is included in the model with central pressure (see Appendix A Table A1), its estimated coefficient has the wrong sign. This is consistent with Bakkensen and Mendelsohn (2016), who find that central pressure provides a more reliable explanation of damage. The selected model also retains more disaggregated measures of hazards, rainfall and storm surge: rainfall causes inland flooding whereas storm surges damage the coast. I also find that greater housing density and higher incomes are associated with more damage.

The results change with other selection methods. Lasso shrinks the estimated coefficients using a penalty, which is chosen here using BIC. The post-selection OLS coefficients (and standard errors) are shown in column (2). All of the variables retained by Gets are also retained by Lasso. However, an additional hazard and a proxy for long-term adaptation are both kept, which is consistent with the fact that BIC corresponds to a looser target gauge of 9 percent for the given sample size and number of variables; see Campos et al. (2003).
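The Lasso-with-BIC step can be sketched as follows. This is not the paper’s ‘glmnet’ implementation; it is a minimal numpy version with a hypothetical interface: coordinate descent for the Lasso, BIC over a penalty grid, and post-selection OLS on the retained variables.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Lasso via cyclic coordinate descent: minimizes
    (1/2n)||y - Xb||^2 + lam * sum_j |b_j|."""
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]      # partial residual
            rho = X[:, j] @ r / n
            denom = X[:, j] @ X[:, j] / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / denom
    return b

def lasso_bic_ols(X, y, lams):
    """Choose the penalty by BIC, then refit OLS on the selected support
    (post-selection OLS, as reported in the Lasso column)."""
    n = len(y)
    best_bic, best_support = np.inf, None
    for lam in lams:
        b = lasso_cd(X, y, lam)
        k = int(np.sum(b != 0))
        rss = float(np.sum((y - X @ b) ** 2))
        bic = n * np.log(rss / n) + k * np.log(n)
        if bic < best_bic:
            best_bic, best_support = bic, b != 0
    coef = np.zeros(X.shape[1])
    if best_support is not None and best_support.any():
        coef[best_support] = np.linalg.lstsq(X[:, best_support], y, rcond=None)[0]
    return coef
```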

To account for possible model misspecification and to assess the robustness of the selected model, I extend the Gets model by selecting over strike fixed effects to capture outliers relative to the model using impulse indicator saturation (IIS; see Hendry et al. 2008; Johansen and Nielsen 2016) and over squares of the explanatory variables to capture potential non-linearities. I find a non-linear relationship between income and damage, which is consistent with Geiger et al. (2016). I also find multiple outlying strikes, which capture measurement issues in the 1950s as well as glancing landfalls; see Appendix A Table A2. Since the outliers are all negative and roughly the same magnitude, I combine them into a single dummy variable to obtain the more parsimonious model presented in column (3).

Forecast accuracy is significant across the different models. In each case an increase in the errors is associated with an increase in damage even after controlling for ex-ante uncertainty. To put their relative importance into context, the standardized coefficients from the Robust Gets model indicate that it takes roughly a 5 standard deviation shock to forecast errors to have the same impact on hurricane damage as a 1 standard deviation shock to central pressure. This is roughly the same size as is required for rainfall. A simple back-of-the-envelope calculation indicates that, on average, a one standard deviation reduction in the distance between where a hurricane is expected to strike and where it actually strikes is associated with a decline in damage of up to $9000 per affected household. This represents about 18 percent of the typical insurance claim or about 4 percent of the total replacement value of a typical at-risk home in 2018.

Next, I consider how sensitive the results are to alternative measures of forecast accuracy and uncertainty. Table 2 presents the results. Column (1) presents a restricted Robust Gets model which excludes ex-ante uncertainty. Column (2) presents a theory-consistent restriction of the Robust Gets model by imposing a fixed relationship between the forecast errors and the radius of uncertainty; see (5). In both cases these restrictions are strongly rejected. Column (3) uses Rossi and Sekhposyan’s (2015) measure of uncertainty in (4) with a non-parametric time-varying forecast-error distribution based on a rolling window of the past five years of forecast errors. Column (4) considers a weighted RMSE measure which quickly decays after landfall.
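The non-parametric measure in column (3) amounts to evaluating an empirical CDF. A minimal sketch (the function name and inputs are illustrative):

```python
import numpy as np

def rs_uncertainty(past_errors, realized_error):
    """Rossi and Sekhposyan (2015)-style uncertainty: the empirical CDF of
    historical forecast errors (e.g., a rolling five-year pool) evaluated at
    the realized landfall error. Values near 1 flag an error that was very
    unlikely ex ante."""
    past_errors = np.asarray(past_errors, dtype=float)
    return float(np.mean(past_errors <= realized_error))
```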

The results in Table 2 indicate that there is a positive and statistically significant link between alternative measures of forecast accuracy or uncertainty and damage. The size of this relationship is fairly stable regardless of which measure of accuracy or uncertainty is considered. Columns (3) and (4) indicate that imposing a normal distribution on the historical forecast errors does not substantially alter the parameter estimate or increase the noise.

To assess the stability of the models and to ensure that they are not overfitting the data, I evaluate the out-of-sample performance for hurricane strikes between 2016 and 2019. I assess performance relative to two theory-driven models; see Nordhaus (2010) and Strobl (2011), and Bakkensen and Mendelsohn (2016). These models are much simpler in that they only include one natural hazard (central pressure) and one vulnerability (income). To ensure that outliers do not bias the results, I estimate each of these models with a glancing strike dummy variable and without ex-ante uncertainty due to its general insignificance. This ensures that no model has an undue advantage. Thus, the only difference between the Gets and Robust Gets models in this exercise is that the latter includes a non-linear term for income.

Table 3 shows that each of the selected models outperforms the theory-driven models across each of the accuracy metrics. However, the robust model does substantially better than any other model. This is driven by Harvey [2017], where the robust model almost perfectly predicts official damage. That said, as Appendix A Figure A1 shows, the robust model consistently outperforms the other models. This suggests that performance is substantially improved when accounting for the non-linear effect of income. It also shows that the models are robust without overfitting the data.

## 7. Conclusions

In this paper I evaluate the relationship between forecast accuracy and hurricane damage using a model of damage for all hurricanes to strike the continental United States in the past 60 years. I start with many possible drivers of damage and use model selection methods to determine the most important ones. I show that a small sub-set of drivers explains most of the historical damage and performs well for the latest hurricane strikes.

Despite specifying a richer model than previous studies, I find that forecast accuracy matters for damage. This relationship is positive, statistically significant and is robust to outliers, alternative measures of accuracy, model specifications, out-of-sample storms, and additional controls. On average, a one standard deviation increase in the distance between the storm’s predicted and actual landfall location translates into roughly $\$9000$ in additional damage per affected household. This value is equivalent to about 18 percent of the typical flood insurance claim or four percent of the total replacement value of a typical home at risk of hurricane damage in 2018.

Using a counterfactual exercise, I show that improvements in the forecasts since 1970 resulted in total damage being approximately $\$82$ billion less than it otherwise would have been. Although damage increased due to changes in vulnerabilities and natural hazards, improvements in forecast accuracy along with longer-term adaptation efforts have kept it from rising faster than it otherwise would have. I find that there is a net benefit of around $\$30$ to $\$71$ billion when compared against the cost of producing the forecasts. This illustrates that improvements in the forecasts produce benefits beyond the well-documented reduction in fatalities and have outweighed the associated costs.

This is particularly important since hurricanes are expected to become more difficult to forecast in the future. Knutson et al. (2010) argue that climate change will increase hurricane intensity, and there will be more hurricanes, such as Harvey [2017], whose dynamics are hard to predict (Emanuel 2017a, 2017b). In light of this reality, these findings support maintained investment in, and further measures to improve, hurricane forecasting capabilities, along with other longer-term adaptation efforts, so that any future loss of life and property is minimized.