Quantitative Evaluation of the Haines Index’s Ability to Predict Fire Growth Events

The Haines Index is intended to provide information on how mid-tropospheric conditions could lead to large or erratic wildfires. Only a few studies have evaluated its performance, and those are primarily single-fire studies. This study looks at 47 fires that burned in the United States from 2004 to 2017, with sizes from 9000 ha up to 218,000 ha based on daily fire management reports. Using the 0-h analysis of the North American Model (NAM) 12 km grid, it examines the performance of the start-day Haines Index, as Haines (1988) originally discussed. It then examines the performance of daily Haines Index values as an indicator of daily fire growth, using contingency tables and four statistical measures: true positive ratio, miss ratio, Peirce skill score, and bias. In addition to the original Haines Index, the index's individual stability and moisture components are examined. Operational forecasters often cite a positive trend in the index, so the study also examines how a positive trend, or a positive trend leading to an index of 6, performs. The Continuous Haines Index, a related measure, is also examined. Results show a positive relationship between start-day index and peak daily fire growth or number of large growth events, but not final size or duration. The daily evaluation showed that, for a range of specified growth thresholds defining a growth event, the Continuous Haines Index scores were more favorable than the original Haines Index scores, and the latter were more favorable than the use of index trends. The maximum Peirce skill score obtained for these data was 0.22, when a Continuous Haines Index of 8.7 or more was used to indicate that a growth event of 1000 ha/day or more would occur.


Introduction
Originally published by Haines [1] (hereafter H88) as the Lower Atmospheric Severity Index, the Haines Index is intended to indicate the potential for large or erratic fires. It filled a need expressed among fire weather forecasters and fire managers for information about atmospheric conditions above the ground: conditions believed capable of influencing a fire but not directly observable at the surface or incorporated in surface-based fire danger measures. The design of the Index built on the work of Brotak [2] and Brotak and Reifsnyder [3], which examined the co-occurrence of fire with such conditions as atmospheric instability, dry air advection, and wind shear.
The Index has two components and three elevation-based variants. The A component is a measure of static stability, in the form of the temperature difference between two specified standard pressure levels. The B component measures dryness as dewpoint depression at a specified pressure level. The pressure levels for both components depend roughly on surface elevation. The A and B components each have a value of 1 (more stable, or wetter), 2, or 3 (more unstable, or drier), and are added together to yield an Index value of 2 to 6, with 6 expected to represent high potential for large or erratic fire.
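As an illustration of how the two components combine into a single value, the sketch below computes the mid-elevation variant from continuous temperatures. This section does not give the pressure levels or cut points. The 850/700 hPa levels and the thresholds used here are the values commonly tabulated for H88's mid-level variant, and they are an assumption to be checked against H88 [1]. The function names are hypothetical.

```python
# Hypothetical sketch of the mid-elevation Haines Index.
# ASSUMPTION: levels (850/700 hPa) and cut points are the commonly
# tabulated mid-level values attributed to H88; verify before use.


def component(value: float, low: float, high: float) -> int:
    """Map a continuous value onto the 1/2/3 component scale.

    value <= low  -> 1 (more stable, or wetter)
    value >= high -> 3 (more unstable, or drier)
    otherwise     -> 2
    """
    if value <= low:
        return 1
    if value >= high:
        return 3
    return 2


def haines_mid(t850: float, t700: float, td850: float) -> tuple[int, int, int]:
    """Return (HI_A, HI_B, HI) from temperatures and dewpoint in deg C."""
    stability = t850 - t700             # A: temperature difference between levels
    dewpoint_depression = t850 - td850  # B: dryness at 850 hPa
    hi_a = component(stability, low=5.0, high=11.0)
    hi_b = component(dewpoint_depression, low=5.0, high=13.0)
    return hi_a, hi_b, hi_a + hi_b      # index ranges from 2 to 6


print(haines_mid(20.0, 8.0, 5.0))  # (3, 3, 6): unstable and dry
```

The high-elevation variant follows the same pattern with different levels and cut points. Mapping any value strictly between the two cut points to 2 is a choice made for this sketch, because the published tables use integer ranges.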
Potter [4] discusses the incomplete nature of the original work, which was acknowledged by Haines. The fire data were a small, subjectively chosen sample. The climatology was rudimentary,

Fire Data
Fire data suitable for evaluating danger, weather, or behavior indices are a perennial challenge for research. The only data consistently available, and even these are not without problems, are the daily size and growth data recorded for operational purposes. More recently, satellite measurements of fire radiative power have become available, but their spatial and temporal coverage is much narrower. Because this study seeks to examine numerous fires over multiple days, it uses the longer-established daily size records. While areal growth is not specifically what H88 states the index predicts, it is the only readily available fire characteristic, and it is of direct interest to fire managers.
Atmosphere 2018, 9, 177
Daily growth data were acquired for 47 fires, 45 from the western United States and two from the northeastern United States (Figure 1 and Table 1). The fires were originally selected for a separate study and comprise two sets: one of fires over 36,400 ha, and another of fires between 8100 and 30,400 ha, chosen primarily for their proximity to the fires in the first set. The separation of the two sets used in the separate study was not maintained for this analysis. The fires in the combined set range from 9000 ha to 218,000 ha in final size, and occurred between 2004 and 2017. Growth data came from four sources: archived fire progression maps, ICS-209 reports, incident infrared (IR) overflight measurements, and the national daily Incident Management Situation Report (IMSR). Comparing the four sources for individual fires made it clear that the daily IR overflight data were the most accurately date-stamped. These size measurements and time stamps generally, but not always, carried over to the progression maps. Because the IR flights occur at night, their observations typically do not appear in the ICS-209 reports until the next afternoon, and what appears in the ICS-209 reports is not carried into the IMSR until the following day. Thus, the ICS-209 sizes are often the fire's size at the end of the previous day, and the national report is yet another day behind. When this sort of lag could be confirmed, data were adjusted to match the progression or IR sizes and growth, and the ICS-209 or situation report sizes were used only to fill gaps in the progression or IR records. (Sometimes the IR data are not filed correctly, or the progression map skips a day.) Fires for which progression or IR measurements were not available were not individually adjusted in any way. Because of these quality control measures, final fire sizes and end dates listed in Table 1 may not agree with official fire sizes or dates.
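The reconciliation described above can be sketched as a priority merge. Date-stamped IR and progression sizes are taken first, and lag-corrected ICS-209 and IMSR sizes are used only to fill remaining days. The sketch assumes fixed lags of one day (ICS-209) and two days (IMSR), inferred from the reporting chain described in the text. The study itself applied a correction only when the lag could be confirmed for a given fire. Names and the record layout are hypothetical.

```python
from datetime import date, timedelta

# ASSUMED fixed reporting lags in days (hypothetical; the study corrected
# lags only when they could be confirmed for an individual fire).
ASSUMED_LAG_DAYS = {"ir": 0, "progression": 0, "ics209": 1, "imsr": 2}
# Preference order: date-stamped IR/progression first, others fill gaps.
PREFERENCE = ["ir", "progression", "ics209", "imsr"]


def reconcile_sizes(records: dict[str, dict[date, float]]) -> dict[date, float]:
    """Merge per-source size records (report date -> ha) into one series."""
    merged: dict[date, float] = {}
    for source in PREFERENCE:
        lag = timedelta(days=ASSUMED_LAG_DAYS[source])
        for reported_on, size_ha in records.get(source, {}).items():
            actual_day = reported_on - lag          # undo the reporting lag
            merged.setdefault(actual_day, size_ha)  # keep higher-priority value
    return dict(sorted(merged.items()))


records = {
    "ir": {date(2012, 7, 1): 500.0, date(2012, 7, 3): 2000.0},
    "ics209": {date(2012, 7, 3): 1200.0, date(2012, 7, 4): 2100.0},
    "imsr": {date(2012, 7, 6): 2600.0},
}
print(reconcile_sizes(records))
```

In this toy example, the ICS-209 value reported on 3 July fills the missing IR day of 2 July. The ICS-209 value reported on 4 July is ignored because IR already covers 3 July.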
Fire growth was measured in three different ways for this study. The simplest growth measure was the daily areal growth for each day i, ΔAi. This growth metric is the main focus of the analyses, as it is the one most readily usable by the operational fire community.
The second measure was the ratio of each day's areal growth to the lifetime average daily areal growth for that fire. This measure identifies anomalously large or small area increases relative to the rest of a given fire:
$$\phi_{Ai} = \frac{\Delta A_i}{A_{\mathrm{final}}/D}$$
where ϕAi is the areal growth ratio for day i, D is the fire's duration, and Afinal is the fire's final size. The third measure is slightly more complicated. It reflects the fact that a given increase in area represents a higher rate of spread when it occurs on a small fire than when it occurs on a large fire. To account for this, each day's area is converted to the radius of a circle of the same area. The difference in successive days' radii is then compared to the average radial growth rate over the fire's lifetime, producing what could be considered a relative measure of equivalent-circle radial growth. Mathematically, this is determined as:
$$\phi_{ri} = \frac{r_i - r_{i-1}}{r_{\mathrm{final}}/D}, \qquad r_i = \sqrt{A_i/\pi}, \qquad r_{\mathrm{final}} = \sqrt{A_{\mathrm{final}}/\pi}$$
where ϕri is the relative radial growth for day i. Whenever there was a gap in the growth data, the days of missing data and the first day with a new size reported were dropped from the analysis for all growth measures.
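The three growth measures and the gap rule can be sketched for a single fire as follows. Several conventions are assumptions rather than statements from the text: sizes are cumulative end-of-day values, the fire grows from zero area, D equals the number of days in the record, and Afinal is the last reported size. The function name is hypothetical.

```python
import math
from typing import Optional


def growth_measures(sizes: list[Optional[float]]):
    """Return daily ΔA (ha), ϕA, and ϕr lists for one fire.

    ASSUMPTIONS (hypothetical): sizes[i] is the cumulative size at the end
    of day i (None if missing), the fire grows from zero area,
    D = len(sizes), and A_final is the last reported size.
    """
    duration = len(sizes)
    a_final = next(s for s in reversed(sizes) if s is not None)
    mean_area_growth = a_final / duration                          # A_final / D
    mean_radial_growth = math.sqrt(a_final / math.pi) / duration   # r_final / D

    d_area, phi_a, phi_r = [], [], []
    for i, size in enumerate(sizes):
        prev = 0.0 if i == 0 else sizes[i - 1]
        if size is None or prev is None:
            # Missing day, or first day reported after a gap: drop it.
            d_area.append(None)
            phi_a.append(None)
            phi_r.append(None)
            continue
        growth = size - prev
        radial = math.sqrt(size / math.pi) - math.sqrt(prev / math.pi)
        d_area.append(growth)
        phi_a.append(growth / mean_area_growth)
        phi_r.append(radial / mean_radial_growth)
    return d_area, phi_a, phi_r


d_area, phi_a, phi_r = growth_measures([100.0, 300.0, None, 700.0, 1000.0])
print(d_area)  # [100.0, 200.0, None, None, 300.0]
print(phi_a)   # [0.5, 1.0, None, None, 1.5]
```

Both the missing day and the first day after it come out as None, following the gap rule stated in the text.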

Meteorological Data
Mid-tropospheric temperature and dewpoint data for the Haines Index were obtained from the National Weather Service's North American Model (NAM) 0000 UTC analysis (0-h forecast). The NAM grid 218, with 12 km grid spacing, was used. Data at the pressure levels required for the mid- or high-level HI variant were extracted for the grid point nearest to the ICS-209 listed location of each fire, which usually corresponds to the location of the fire start. None of the fires studied were in the low-elevation HI variant area.
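A nearest-grid-point lookup like the one described above can be sketched with great-circle distances. This section does not say how "nearest" was determined. The sketch is an assumption: it ignores the grid's map projection, searches a plain list of (latitude, longitude) pairs, and uses a spherical Earth of radius 6371 km. The function names are hypothetical.

```python
import math


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_grid_point(grid_points: list[tuple[float, float]],
                       fire_lat: float, fire_lon: float) -> int:
    """Return the index of the (lat, lon) grid point closest to the fire."""
    return min(range(len(grid_points)),
               key=lambda k: haversine_km(fire_lat, fire_lon, *grid_points[k]))


grid = [(40.0, -120.0), (40.1, -120.0), (40.0, -119.9)]
print(nearest_grid_point(grid, 40.09, -120.01))  # 1
```

A brute-force search is adequate for a few dozen fires. For many lookups, a spatial index would be faster.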
Haines Index A and B components (HIA and HIB) were computed from the NAM data, retaining the actual temperature differences and dewpoint depressions used to obtain the integer index values. These same continuous values are used to compute the C-Haines [6].
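To show how the retained continuous values feed the C-Haines, the sketch below uses the formulation commonly attributed to Mills and McCaw. Its stability term comes from the 850 to 700 hPa temperature difference. Its moisture term comes from the 850 hPa dewpoint depression, which is capped at 30 °C, and large moisture terms are damped. Two things are assumptions to be checked against [6]: that [6] uses exactly these constants and levels, and how the levels were chosen for high-elevation fires in this study. The function name is hypothetical.

```python
def c_haines(t850: float, t700: float, td850: float) -> float:
    """Continuous Haines Index (hypothetical sketch; constants assumed).

    Inputs are temperatures and dewpoint in deg C.
    """
    ca = 0.5 * (t850 - t700) - 2.0   # stability term
    dpd = min(t850 - td850, 30.0)    # dewpoint depression, capped at 30
    cb = dpd / 3.0 - 1.0             # moisture term
    if cb > 5.0:
        cb = 5.0 + (cb - 5.0) / 2.0  # damp the moisture term when very dry
    return ca + cb


print(c_haines(20.0, 6.0, -4.0))  # 11.0
```

Unlike the integer HI, the result is continuous and is not limited to a maximum of 6.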

Analysis and Statistical Methods
There are two primary analyses here, one of which has several subdivisions. The first analysis considers the index's performance in the terms originally proposed in H88 [1]. Specifically, the value of the HI on the day each fire started is considered as an indicator of that fire's overall potential for large or explosive growth. The 47 fires are sorted by start-day HI and examined to determine whether higher values of the index correspond to fires that are ultimately larger, last longer, or experience episodes of greater growth. For this analysis, growth is considered only in hectares per day.
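The start-day grouping can be sketched as a group-and-average over fires. The record layout and key names are hypothetical. The "spike" count uses the 1000 ha/day level that appears in the start-day results, and the growth lists are assumed to contain no gaps.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_start_index(fires: list[dict]) -> dict:
    """Average per-fire statistics within each start-day HI group.

    Each fire is a dict with hypothetical keys: 'start_hi', 'final_ha',
    'duration_days', and 'daily_growth_ha' (daily growth list, no gaps).
    """
    groups = defaultdict(list)
    for fire in fires:
        groups[fire["start_hi"]].append(fire)

    summary = {}
    for hi, members in sorted(groups.items()):
        summary[hi] = {
            "n_fires": len(members),
            "mean_final_ha": mean(f["final_ha"] for f in members),
            "mean_duration_days": mean(f["duration_days"] for f in members),
            "mean_peak_growth_ha": mean(max(f["daily_growth_ha"]) for f in members),
            # Spikes: days with growth exceeding 1000 ha.
            "mean_spikes": mean(
                sum(g > 1000 for g in f["daily_growth_ha"]) for f in members
            ),
        }
    return summary


fires = [
    {"start_hi": 6, "final_ha": 20000, "duration_days": 30, "daily_growth_ha": [500, 3000, 1200]},
    {"start_hi": 6, "final_ha": 10000, "duration_days": 20, "daily_growth_ha": [2000, 800]},
    {"start_hi": 3, "final_ha": 9000, "duration_days": 12, "daily_growth_ha": [400, 1500]},
]
print(summarize_by_start_index(fires)[6]["mean_peak_growth_ha"])  # 2500
```

With only a few fires in some groups, a single long-lived fire can dominate a group mean. The start-day results below show this effect.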
While H88 [1] used the start-day index, the first published studies examining its performance [7,8] compared daily index values with daily fire behavior observations. This is also how the index has been used operationally since its introduction. The second analysis examines the daily index values and daily growth. These are treated as dichotomous categorical events: the index either indicates a "growth event" or does not, and a growth event either does or does not occur. Weather forecasting has a long history of evaluating categorical forecast skill for severe storms, tornadoes, and heavy precipitation events [11-17]. The present study is in essence an evaluation of the skill of the 0-h Haines Index "forecast" in predicting a growth event. It uses contingency-table scores to examine whether there is a correlation between a categorical index and fire growth.
Based on the aforementioned forecast evaluation protocols, the second analysis uses the following metrics (see Table 2): true positive ratio (TPR), miss ratio (MR), bias (B), and Peirce skill score (PSS, also known as the true skill score or statistic, Hanssen-Kuipers discriminant, or Kuipers' performance index) [18]. The TPR answers the question "Of all the times the index predicted an event, how often were there actually events?" The MR answers the question "Of all the times the index predicted no event, how many times was there actually an event?" Bias is the ratio of event predictions to event occurrences; ideally, a predictor has a bias score of 1. The PSS is an equitable skill score, meaning that random predictions and constant predictions both score zero, and a perfect predictor scores 1. Values of PSS below zero indicate that using the predictor has lower skill than randomly assigning each day to a growth or non-growth category.
In operational meteorology, the hit rate and false alarm rate are more commonly used than TPR and MR. Wilks [17] refers to hit rate and false alarm rate as elements in the likelihood-base rate factorization and TPR and MR as elements in the calibration-refinement factorization. Questions answered by the latter measures are stated above. Hit rate answers the question "Of all the times the event occurred, how many were correctly predicted?" and false alarm rate answers the question "Of all the times there was no event, how many times was an event wrongly predicted". The calibration-refinement factorization has an advantage in operational application, in that it allows one to consider the predictor's performance at the time the prediction is made, rather than needing to wait until the predicted event or nonevent has occurred, or not.
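All of these scores can be written in terms of the four cells of a 2 × 2 contingency table. The sketch uses the conventional forecast-verification layout: a hits, b false alarms, c misses, and d correct negatives. Whether Table 2 uses the same letters is an assumption, and the class and method names are hypothetical. PSS is computed as hit rate minus false alarm rate, which is algebraically equal to (ad − bc)/((a + c)(b + d)). Ratios with a zero denominator are returned as NaN. For example, MR is undefined for a predictor that always forecasts an event.

```python
import math
from dataclasses import dataclass


def _ratio(num: int, den: int) -> float:
    """Safe division: NaN when the denominator is zero (score undefined)."""
    return num / den if den else math.nan


@dataclass
class Contingency:
    hits: int               # a: event predicted, event occurred
    false_alarms: int       # b: event predicted, no event
    misses: int             # c: no event predicted, event occurred
    correct_negatives: int  # d: no event predicted, no event

    def tpr(self) -> float:
        # Of all predicted events, the fraction that occurred: a / (a + b)
        return _ratio(self.hits, self.hits + self.false_alarms)

    def mr(self) -> float:
        # Of all predicted non-events, the fraction that were events: c / (c + d)
        return _ratio(self.misses, self.misses + self.correct_negatives)

    def bias(self) -> float:
        # Event predictions divided by event occurrences: (a + b) / (a + c)
        return _ratio(self.hits + self.false_alarms, self.hits + self.misses)

    def hit_rate(self) -> float:
        # Of all observed events, the fraction predicted: a / (a + c)
        return _ratio(self.hits, self.hits + self.misses)

    def false_alarm_rate(self) -> float:
        # Of all observed non-events, the fraction wrongly predicted: b / (b + d)
        return _ratio(self.false_alarms, self.false_alarms + self.correct_negatives)

    def pss(self) -> float:
        # Peirce skill score: hit rate minus false alarm rate
        return self.hit_rate() - self.false_alarm_rate()


table = Contingency(hits=30, false_alarms=20, misses=10, correct_negatives=40)
print(table.tpr(), table.mr(), table.bias())  # 0.6 0.2 1.25
```

TPR and MR are the calibration-refinement quantities, and hit rate and false alarm rate are the likelihood-base rate quantities. All four come from the same table.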
There is no clear or formal cutoff for either the index or growth events, so the skill metrics are computed for varying thresholds on each. Table 3 summarizes the threshold values considered for the indices examined and for each of the three growth measures discussed previously.
The second analysis, examining daily index performance, has three major components. The first directly examines the original index, HI, and its two components, HIA and HIB. The second examines two trend-based applications of the index. Many operational users consider an increase in the Haines Index more important than its actual value, so I examine the performance of an increase in the index regardless of its magnitude, and then the performance when only an increase that leads to an index value of 6 constitutes a prediction of a growth event. The third examines the performance of the C-Haines [6] for daily growth.
To reflect the uncertainty in actual growth dates for the fires, performance measures are computed for the data both according to the dates determined through the comparison of the various fire records, and with the growth data for all fires shifted to one day prior.
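How the one-day shift enters the pairing of index and growth can be sketched as follows. The sketch assumes that "shifted to one day prior" means each growth value is reassigned to the preceding date, so index day i is compared with the growth originally recorded for day i + 1. Pairing is done within one fire's record so that a shift never crosses into another fire. The function name is hypothetical.

```python
from typing import Optional


def pair_index_with_growth(index: list[float], growth: list[Optional[float]],
                           shift_days: int = 0) -> list[tuple[float, float]]:
    """Pair each day's index with growth `shift_days` later, within one fire.

    shift_days=0 pairs same-date values; shift_days=1 moves every growth
    value to the preceding date (index day i meets growth day i + 1).
    Days whose paired growth is missing or beyond the record are dropped.
    """
    pairs = []
    for i, hi in enumerate(index):
        j = i + shift_days
        if j < len(growth) and growth[j] is not None:
            pairs.append((hi, growth[j]))
    return pairs


print(pair_index_with_growth([4, 5, 6], [100.0, None, 900.0], shift_days=1))  # [(5, 900.0)]
```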
In the analyses of daily growth, the pairings of index measure and growth metric can require wordy descriptions. For brevity, an ordered-pair notation of the form (index = index threshold, growth metric = growth threshold) is adopted. Thus, (HI = 5, ΔA = 500 ha) refers to the case where a Haines Index of 5 or more is the predictor, and a size increase of 500 ha or more for the day is an actual event. The abbreviations HIA and HIB are used for the respective components of the index, +dt indicates an increase in the index from the previous day, +dt6 indicates "index increases to a value of 6" as noted above, and CH indicates the C-Haines.
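The three kinds of predictor in this notation can be sketched as functions that turn a daily index series into event predictions. A fourth function then tallies a contingency table against a growth threshold. The function names are hypothetical. The sketch also assumes that the first day of a record, which has no previous value for +dt, is excluded, along with any day missing growth data.

```python
from typing import Optional, Sequence


def threshold_predictor(index: Sequence[float], threshold: float) -> list[Optional[bool]]:
    """(HI = threshold): predict an event when the index is >= threshold."""
    return [value >= threshold for value in index]


def trend_predictor(index: Sequence[float],
                    target: Optional[float] = None) -> list[Optional[bool]]:
    """+dt: predict an event when the index rose from the previous day.

    With target=6 this becomes +dt6: the rise must reach at least the target.
    The first day has no previous value, so its prediction is None.
    """
    predictions: list[Optional[bool]] = [None]
    for prev, cur in zip(index, index[1:]):
        rising = cur > prev
        if target is not None:
            rising = rising and cur >= target
        predictions.append(rising)
    return predictions


def count_table(predictions, growth, growth_threshold):
    """Return contingency counts (a, b, c, d) for (predictor, ΔA = threshold)."""
    a = b = c = d = 0
    for predicted, g in zip(predictions, growth):
        if predicted is None or g is None:
            continue  # no previous day, or a gap in the growth record
        event = g >= growth_threshold
        if predicted and event:
            a += 1
        elif predicted:
            b += 1
        elif event:
            c += 1
        else:
            d += 1
    return a, b, c, d


index = [4, 5, 6, 6, 3, 5]
growth = [200.0, 1200.0, None, 1500.0, 100.0, 800.0]
print(count_table(threshold_predictor(index, 5), growth, 1000.0))  # (2, 1, 0, 2)
```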
Results are reported first using ΔA as the growth metric, without the growth days shifted to adjust report dates. These are followed by brief summary comments on the other two growth measures and on the shifted-date results.
Table 4 and Figures 2-4 summarize the start-day index results. Figure 2 shows that mean fire size was slightly greater for fires with a start-day index of 2 than for any other start-day index. The smallest mean fire size was for fires starting on days with an index of 3, and mean size increases thereafter. The relationship between fire duration and start-day index for the fires in this study appears in Figure 3. Mean duration was greatest for fires with a start-day index of 2; the minimum duration of any fire with a start-day index of 2 was 28 days, greater than the minimum duration of fires for any other start-day index value, and greater than the mean duration for start-day indices of 3 through 6.
Daily growth is a more commonly considered measure of a fire's behavior than size. It is the metric considered by Saltenberger and Barker [7] and Werth and Ochoa [8]. Figure 4 shows how start-day index related to daily growth for the study set. Figure 4a shows the mean of the peak growth day for all fires with a given starting index; Figure 4b shows the mean number of spikes exceeding 1000 ha for fires with a given starting index. Mean peak growth increases with increasing start-day index, with a spike at an index of 4 primarily due to one fire. Mean number of spikes is greatest for a start-day index of 2, and all three fires with a start-day index of 2 contributed to this high value.

Original Haines Index and Components
Figure 5 shows the statistical scores for the daily Haines Index, HIA, and HIB. For both the HI = 6 and HI = 5 thresholds (Figure 5a), the TPR and MR values are similar to one another and decrease with increasing growth threshold. For any given growth threshold, TPR is greater when the index threshold is HI = 6, while MR scores are almost equal for the two index thresholds tested. For the individual HIA and HIB components (Figure 5b), TPR and MR both decrease with increasing growth threshold, and each score is similar between the two components. Figure 5c shows PSS for the various indices and growth thresholds. Skill is lowest for the lower growth thresholds, and even negative for (HIA = 3, ΔA = 500 ha). It increases for both index thresholds and both index components, reaching a maximum of 0.11 for (HIA = 3, ΔA = 3000 ha). Skill was greater for the full index with a threshold of 5 than with a threshold of 6.
Bias scores, B, are shown in Figure 5d; they increase for all index thresholds and index components as the growth threshold increases. The increase occurs because the number of index-based predictions stays constant, while the number of growth events, the denominator of B, decreases as the growth threshold rises. Regardless of the growth threshold, B is lowest for the index threshold of 6, with a maximum value of 0.6, indicating that the index predicts events less often than they occur. Bias scores for the thresholds HI = 5 and HIB = 3 are similar to one another at all growth thresholds, with both reaching B = 1 for growth thresholds between 1500 and 2000 ha. The HIA curve lies between these two curves and the HI = 6 curve, and attains B = 1 near a growth threshold of 2500 ha.
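The mechanism behind the rising bias can be shown numerically. With the index threshold fixed, the number of predicted events is fixed, and only the count of observed events changes with the growth threshold. The data below are invented for illustration, and the function name is hypothetical.

```python
def bias_by_growth_threshold(predictions: list[bool], growth: list[float],
                             thresholds: list[float]) -> dict[float, float]:
    """Bias B = (number of predicted events) / (number of observed events)."""
    n_predicted = sum(predictions)  # fixed once the index threshold is chosen
    result = {}
    for threshold in thresholds:
        n_events = sum(g >= threshold for g in growth)  # shrinks as threshold rises
        result[threshold] = n_predicted / n_events if n_events else float("inf")
    return result


predictions = [True, True, False, False, True, False, False, False]
growth = [600.0, 1200.0, 2500.0, 300.0, 3100.0, 900.0, 1600.0, 700.0]
b = bias_by_growth_threshold(predictions, growth, [500.0, 1000.0, 2000.0, 3000.0])
# B rises from 3/7 (about 0.43) to 3.0 as the growth threshold increases.
```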

Index Trend
Results for the analyses using +dt and +dt6 appear in Figure 6. Figure 6a shows that the TPR and MR scores are similar to those for the basic index; for any given index measure and growth threshold, TPR and MR are similar to each other, and both decrease with increasing growth threshold. However, for both +dt and +dt6, MR exceeds TPR. For +dt, PSS is negative for all growth thresholds (Figure 6b). PSS is negative for +dt6 when growth threshold is 500 or 1000 ha, and zero for higher thresholds. Bias (Figure 6c) is less than 1 for +dt with growth thresholds below 2000 ha, but greater than 1 for higher growth thresholds. Bias for +dt6 is always less than 1, a result largely due to the relative rarity of events where the Haines Index increases to a value of 6.

C-Haines
All of the scores for CH are slightly higher than the basic index scores for a given growth threshold (Figure 7). The difference between TPR and MR (Figure 7a) is also larger for CH than for the basic index. Trends in the scores are similar to those for the basic index: TPR and MR decrease as the growth threshold increases, and B increases as the growth threshold increases (Figure 7b). All PSS values (Figure 7c) are greater for CH than for the basic index. Unlike PSS for the basic index with a threshold of 5 or 6, PSS for CH reaches a maximum at a growth threshold of 1000 ha.
Because one of the intentional changes incorporated in the CH is that it can exceed 6, index thresholds of 6 to 10 were examined. Figure 7d shows PSS values for a growth threshold of 1000 ha and indicates a peak PSS of 0.24 for a CH threshold of 7. Additional testing of both growth and index thresholds (not shown) revealed that the best combination of high PSS and B closest to 1 occurred for (CH = 8.7, ΔA = 1000 ha). For these thresholds, PSS = 0.21, TPR = 0.62, and MR = 0.41.
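Because CH is continuous, its threshold can be swept in small steps. The sketch below records PSS and B for each candidate threshold at a fixed growth threshold. The toy data are invented. Picking the maximum-PSS threshold is one possible rule; the text's preferred (CH = 8.7, ΔA = 1000 ha) pair instead accepts slightly lower PSS in exchange for B closer to 1. The function name is hypothetical, and the sketch assumes the sample contains both events and non-events.

```python
def sweep_ch_thresholds(ch, growth, ch_thresholds, growth_threshold=1000.0):
    """Return (threshold, PSS, B) for each candidate CH threshold.

    Assumes the sample contains both events and non-events, so the
    denominators below are non-zero.
    """
    observed = [g >= growth_threshold for g in growth]
    results = []
    for thr in ch_thresholds:
        predicted = [value >= thr for value in ch]
        pairs = list(zip(predicted, observed))
        a = sum(p and o for p, o in pairs)                  # hits
        b = sum(p and not o for p, o in pairs)              # false alarms
        c = sum((not p) and o for p, o in pairs)            # misses
        d = sum((not p) and (not o) for p, o in pairs)      # correct negatives
        pss = a / (a + c) - b / (b + d)  # hit rate minus false alarm rate
        bias = (a + b) / (a + c)         # predicted events / observed events
        results.append((thr, pss, bias))
    return results


ch = [5.0, 7.5, 9.0, 6.2, 10.1, 8.8, 4.1, 7.0]
growth = [300.0, 1500.0, 2200.0, 400.0, 900.0, 1800.0, 100.0, 1100.0]
results = sweep_ch_thresholds(ch, growth, [6.0, 7.0, 8.0, 9.0, 10.0])
best = max(results, key=lambda r: r[1])  # highest PSS
print(best)  # (7.0, 0.75, 1.25)
```

A finer grid, such as `[6.0 + 0.1 * k for k in range(41)]`, would be needed to resolve thresholds like 8.7.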

Other Growth Measures and Lagged Fire Data
The TPR and MR values when growth events were identified by ϕAi or ϕri over the ranges shown in Table 3 were roughly half those for growth in hectares. In contrast, B values were higher for the alternative growth measures (but comparable to one another), and PSS values were comparable for all three growth measures. Shifting the growth data by one day to allow for possible reporting lag made only minor differences in any of the scores for a given index, index threshold, and growth threshold (not shown).

Discussion
For the 47 fires in this study, high values of the Haines Index on the fire start day do not appear to correspond to overall fire size or duration. When peak-day growth is averaged by start-day index, there is a slight positive slope. The growth value for a start-day index of 4 is heavily influenced by one fire, but otherwise there is a positive trend in the growth values, from roughly 12,000 ha for an index of 2 to 16,000 ha for an index of 6. These averages likely reflect the range of fire sizes chosen for this study, and if smaller fires were included, the difference would necessarily decrease.
The number of growth spikes appears to be extremely high for fires starting on days with HI = 2, and then shows a positive slope for index values of 3 to 6. The high value for a start-day index of 2 arises because there are only three fires in that group, and one of them had a 73-day duration, allowing time for many spikes. The number of spikes roughly doubles, from 5 for a start-day index of 3 to 10 for a start-day index of 6.
Interpretation of the performance measures for daily comparisons is complex. Marzban and Lakshmanan [19] describe the importance of the relative costs of correct and incorrect forecasts when interpreting contingency table scores. When the cost of forecasting an event that does not occur differs greatly from the cost of forecasting that no event will occur but it actually does, the operational significance of the scores is not the same as it is when the two costs are comparable. Thus, the ultimate evaluation of what scores are acceptable is based on social values and beyond the scope of this number-driven paper. The discussion here focuses on relative values of the performance measures for the index thresholds and variants considered, and for the different thresholds used to define growth events.
The TPR and MR scores are similar to one another for any given growth threshold and for a specific choice of the basic Haines Index and its threshold (5 or 6, in this study). The same is true for the index components. Recall that these two performance measures answer the questions "Of all the times the index predicted an event, how often were there actually events?" (TPR) and "Of all the times the index predicted no event, how many times was there actually an event?" (MR). In this study, as long as the index threshold was held constant, as it was for each line in Figure 5a,b, the denominator in each measure was also constant, and only the numerator changed. The numerator decreases as the growth event threshold increases, and this accounts for all of the change in the measures. For low growth event thresholds, TPR and MR are closer in value, indicating that the index is right about events happening about as often as it is wrong about non-events. For higher growth event thresholds, the difference between the measures increases, showing that the index correctly predicts events more often than it incorrectly predicts non-events.
As noted in the Methods, the calibration-refinement factorization allows the TPR and MR scores to be considered at the time a forecast is made. For example, consider the case (HI = 6, ΔA = 1000 ha). If, at some time and location, the Haines Index is 6, this can be weighed together with the earlier result, based on TPR, that 51% of the time when an event is predicted (HI = 6), there really is an event (growth of 1000 ha or more that day). Conversely, if the Haines Index is less than 6, the MR shows that 47% of the time when the index does not predict an event, an event does in fact occur. Such a statement is not possible when the likelihood-base rate factorization is used.
The Peirce skill scores for the index with a threshold of either 5 or 6 to indicate an event, and for the separate HIA and HIB components, are less than 0.1 for all growth event thresholds, with one exception: (HIA = 3, ΔA = 3000 ha), which has PSS = 0.11. Even that highest PSS is closer to the random-forecast score (PSS = 0) than to a perfect-forecast score (PSS = 1), and a growth event threshold of 3000 ha in a day is very high for the fires in this sample, let alone for fires with smaller final sizes.
In terms of bias, using an index threshold of 6 to identify events and requiring a bias score of 1 would require a growth threshold of 14,000 ha in a day, and for this pair of thresholds one must accept a PSS of 0.1, a TPR of 0.2, and an MR of 0.1. In short, an index threshold of 6 yields too few event predictions to capture all actual events. Looking again at the best-case scenario for PSS, (HIA = 3, ΔA = 3000 ha), B is 1.3, indicating that events were predicted 30% more often than they occurred.
The fact that the performance measures did not change appreciably when the fire data were shifted one day is not entirely surprising. The PSS values for the unshifted data are close to what one would get for random predictions, and a one-day shift could be considered a random prediction. Serial correlation in the index would make the shifted data nonrandom, but the similarity of the measures suggests the index predictions are still, essentially, random.
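The one-day shift test and the role of serial correlation can be sketched as follows. Everything here is a hypothetical illustration: the series is invented, the function names are chosen for this sketch, and the shift direction (index on day t paired with growth on day t + 1) is an assumption. In real data the shift should be applied within each fire so that days from different fires are never paired. The toy series is far too short to reproduce the study's finding; it only shows the mechanics.

```python
from statistics import mean

def pss(pred, obs):
    """Peirce skill score from paired boolean predictions and observations."""
    a = sum(p and o for p, o in zip(pred, obs))          # hits
    b = sum(p and not o for p, o in zip(pred, obs))      # false alarms
    c = sum(o and not p for p, o in zip(pred, obs))      # misses
    d = sum(not p and not o for p, o in zip(pred, obs))  # correct negatives
    return a / (a + c) - b / (b + d)

def score(index, growth, index_thr=6, growth_thr=1000):
    """PSS for 'index >= index_thr' predicting 'growth >= growth_thr' (ha)."""
    pred = [hi >= index_thr for hi in index]
    obs = [g >= growth_thr for g in growth]
    return pss(pred, obs)

def lag1_autocorrelation(x):
    """Sample lag-1 autocorrelation; > 0 indicates day-to-day persistence."""
    m = mean(x)
    num = sum((x[t] - m) * (x[t + 1] - m) for t in range(len(x) - 1))
    den = sum((v - m) ** 2 for v in x)
    return num / den

# Hypothetical daily Haines Index and daily growth (ha) for one fire.
index = [4, 5, 6, 6, 5, 3, 4, 6]
growth = [300, 1200, 1500, 400, 900, 200, 1100, 2000]

print(score(index, growth))           # same-day pairing
print(score(index[:-1], growth[1:]))  # shifted: index on day t vs growth on day t+1
print(lag1_autocorrelation(index))    # persistence in the index series
```

If the index were strongly persistent (autocorrelation near 1), the shifted predictions would resemble the unshifted ones for that reason alone, which is why serial correlation matters when interpreting the shift test.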
Using an increasing index trend to predict growth events yields lower scores for TPR, MR, and PSS than any of the basic index applications did. Moreover, when an increasing index is used as the predictor of a growth event, MR is greater than TPR for every growth event threshold chosen: the predictor is wrong about non-events more often than it is right about events. The Peirce skill scores for trend-based predictions are on the order of 10⁻², where 0 is equivalent to a random or constant-value prediction. Some PSS values are negative, indicating that for those growth event thresholds the index-trend predictor performs worse than a constant or random prediction would.
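The two trend-based predictors discussed here (a positive trend, and a positive trend reaching an index of 6) can be sketched as follows. This is a hypothetical illustration: `trend_predictions` is a name chosen here, the index series is invented, and treating the first day of a fire as "no event predicted" is an assumption of the sketch. The comparison with the basic threshold rule shows one structural difference: a day on which the index stays at 6 is flagged by the basic index but not by a trend rule.

```python
def trend_predictions(index, require_six=False):
    """Predict a growth event on each day the index rose from the previous day.

    If require_six is True, the rise must also reach an index of 6.
    The first day of a fire has no previous day, so no event is predicted
    there (an assumption made for this sketch). Apply per fire."""
    if not index:
        return []
    preds = [False]
    for prev, cur in zip(index, index[1:]):
        rising = cur > prev
        preds.append(rising and (not require_six or cur == 6))
    return preds

index = [4, 5, 6, 6, 5, 6]  # hypothetical daily Haines Index for one fire
print(trend_predictions(index))                    # any rise
print(trend_predictions(index, require_six=True))  # rise that reaches 6
print([hi >= 6 for hi in index])                   # basic index, threshold 6
```

These boolean predictions can be scored against daily growth with the same contingency-table measures used for the basic index.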
For the C-Haines, three of the four performance scores are more favorable than the basic index scores. For a given growth threshold, the C-Haines TPR scores are higher, and the MR scores are lower and more separated from the TPR scores. For example, for the basic index with a threshold of 5 and a growth threshold of 1000 ha, TPR is 0.49 and MR is 0.47, but for C-Haines with the same threshold, TPR is 0.57 and MR is 0.31. Peirce skill scores are also higher for C-Haines than for the basic Haines Index, though still closer to a random or constant prediction than to a perfect predictor. The fourth score, bias, is less favorable: C-Haines bias scores are higher than those of the basic index at the same growth threshold, exceed one even at the lowest growth threshold tested, and increase more rapidly as the growth event threshold increases. For the maximum PSS case noted earlier, (CH = 7, ∆A = 1000 ha), B is 1.4, indicating more predicted than actual events. However, B can be decreased with only a small decrease in PSS by using (CH = 8.7, ∆A = 1000 ha).
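The PSS and bias trade-off described for the C-Haines can be explored by sweeping the index threshold at a fixed growth threshold. The sketch below is hypothetical throughout: the C-Haines values and growth data are invented, `scores` and `pick_threshold` are illustrative names, and the selection rule (highest PSS among thresholds whose bias lies within a tolerance of 1) is one possible choice, not the procedure used in the study.

```python
def scores(ch, growth, ch_thr, growth_thr=1000.0):
    """Return (PSS, bias) for 'C-Haines >= ch_thr' predicting 'growth >= growth_thr'.

    Assumes at least one observed event and one observed non-event."""
    a = b = c = d = 0
    for x, g in zip(ch, growth):
        pred, obs = x >= ch_thr, g >= growth_thr
        if pred and obs:
            a += 1  # hit
        elif pred:
            b += 1  # false alarm
        elif obs:
            c += 1  # miss
        else:
            d += 1  # correct negative
    pss = a / (a + c) - b / (b + d)
    bias = (a + b) / (a + c)
    return pss, bias

def pick_threshold(ch, growth, thresholds, max_bias_error=0.25):
    """Highest-PSS threshold among those with |B - 1| <= max_bias_error."""
    candidates = []
    for thr in thresholds:
        pss, bias = scores(ch, growth, thr)
        if abs(bias - 1.0) <= max_bias_error:
            candidates.append((pss, thr))
    return max(candidates)[1] if candidates else None

# Hypothetical daily C-Haines values and daily growth (ha).
ch = [5.0, 6.5, 7.2, 8.0, 8.8, 9.5, 10.1, 7.8, 6.0, 9.0]
growth = [200, 400, 1100, 600, 1500, 2100, 1300, 900, 300, 800]
thresholds = [6.0, 7.0, 8.7, 9.5]

for thr in thresholds:
    pss, bias = scores(ch, growth, thr)
    print(thr, round(pss, 2), round(bias, 2))
print(pick_threshold(ch, growth, thresholds))
```

Raising the threshold reduces the number of predicted events and therefore B; whether PSS rises or falls along the way depends on the data, which is why both scores are worth reporting together.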
The data set used in this study included only fires larger than 9000 ha, and this will have affected the results. Smaller fires necessarily have smaller growth events, and a growth threshold below 500 ha would likely yield different performance scores for one or all of the variations tested here. Because fires as large as those in this study are less common in the eastern United States, the performance of the Haines Index and the C-Haines on eastern fires cannot be determined with any confidence from the present data set and analysis.
To obtain a large sample, in terms of both the number of fires and the number of fire days, it was necessary to use fire size and growth data as the predictand. Such data remain as problematic and limiting as ever, with size errors, missing days, and, in most cases, inaccurate time stamps for the reported sizes. Haines (1988) did not specify what fire measure was used to develop the original index, but given what was available at the time, size or duration is the only realistic possibility. With currently available satellite and other remotely sensed data, other fire measures could possibly be used to look for relationships between fire and the Haines Index or the C-Haines.
The NAM meteorological data are among the highest resolution data available for an extended historical period. Because the NAM is also one of the primary National Weather Service operational models, it is used frequently by incident meteorologists and forecast offices. There are newer, higher resolution models, as well as coarser resolution models that have been run further into the past; a similar analysis using one of these, or using raw observational soundings, would produce results that differ from the current ones. Using the 0-h analysis rather than a model forecast provided the best estimate of the relevant meteorological properties, and so reduced the likelihood that model characteristics are the cause of any particular finding.

Conclusions
Based on a multi-fire, multi-day data set and the 0-h NAM analysis, this study characterized the ability of the Haines Index to indicate large fire growth. Both the start-day index, as used by Haines [1], and daily index values, as commonly used operationally, were considered using standard forecast verification measures. The results show that the measures depend on the definition of a growth event as well as on the index level used to predict an event. The results clearly showed, however, that using an increasing trend in the index, instead of the index itself, to identify high growth days leads to worse overall performance. The Continuous Haines Index [6], with a threshold of 8.7, correctly predicted growth events over 1000 ha more often than the original Haines Index did, mispredicted nonevents less often, had a relatively high Peirce skill score, and had no bias. Combining the Haines Index with near-surface TKE [5] was not examined in this study. Such an evaluation requires a number of decisions, such as which NAM pressure level(s) of TKE to use, and model resolution might affect the results. The scale of such an effort merits a study in its own right and is a potential topic for future work.
Management decisions for wildland fires incorporate a vast array of factors, such as infrastructure at risk, resources available, fuel conditions, weather conditions, firefighter safety, public safety from fire and smoke, and cost effectiveness. The relative weights of these factors are highly dependent on the specific situation, and the uncertainty or reliability of any data used in the decisions is an important piece of information. While it is not possible to say with authority that a certain TPR, MR, PSS, or B for the Haines Index is acceptable or not for all situations, these scores each provide fire weather forecasters and fire managers with more information than just the value of the index.