Considerations for Categorizing and Visualizing Numerical Information: A Case Study of Fire Occurrence Prediction Models in the Province of Ontario, Canada

Wildland fire management decision-makers need to quickly understand large amounts of quantitative information under stressful conditions. Categorization and visualization “schemes” have long been used to help, but how they are done affects the speed and accuracy of interpretation. Using traditional fire management schemes can unduly restrict the design of new products. Our design process for Ontario’s fine-scale, spatially explicit, daily fire occurrence prediction (FOP) models led us to develop guidance for designing new schemes. We show selected historical fire management schemes and describe our method. It includes specifying goals and requirements, exploring design options and making trade-offs. The design options include gradient continuity, hue selection, range completeness and scale linearity. We apply our method to a case study on designing the scheme for Ontario’s FOP models. We arrived at a smooth, nonlinear scale that accommodates data spanning many orders of magnitude. The colouring draws attention according to levels of concern, reveals meaningful spatial patterns and accommodates some colour vision deficiencies. Our method seems simple now but reconciles complex considerations and is useful for mapping many other datasets. Our method improved the clarity and ease of interpretation of several information products used by fire management decision-makers.


Introduction
Situational awareness and decision-making for operational wildland fire management are supported by a large amount of complex numerical information, often covering large areas and sometimes spanning multi-day forecasts. Comprehending and interpreting that quantity of information under time-limited and stressful conditions is challenging. Among other approaches, categorizing and visualizing the numerical information commonly makes this task faster and easier. There are many ways to do so, but how it is done can help or hinder interpretation, highlight or obscure valuable information and accurately portray or distort the data.
The other major CFFDRS subsystem that has outputs commonly communicated using classes is the Fire Behaviour Prediction (FBP) System, which provides quantitative estimates of fire behaviour outputs [6]. A primary output is fire intensity, the rate of energy or heat release per unit time per unit length of a spreading fire front [16], which ranges from 1 to ~100,000 kW/m in Canadian conditions. Fire intensity is also commonly categorized into fire intensity classes (ICs). Rather than adjective classes (i.e., low to extreme), they were given five numeric labels (IC 1-5 [17]), and later six (IC 1-6 [18]). Higher IC numbers correspond with higher intensity values, but the boundaries are not evenly spaced (Table 2). The most commonly used IC boundaries in Canada delineate distinct differences in fire type characteristics (for example, surface, torching or crowning) in mature jack pine (Pinus banksiana Lamb.) stands and the corresponding general effectiveness of different types of fire suppression activities (for example, hand tools, pumps and hose, airtankers). These ICs are described in the Field Guide to the FBP System [19] and are also used to map fire intensity by many fire management agencies, for various purposes (Table 2). Table 2 shows some of the different intensity values for higher-end class boundaries (i.e., additional thresholds beyond IC 6).
The choice of colour when mapping can be operationally significant because colours convey information rapidly and have strong associations with levels of alarm; for example, red for danger [20] and green for calm [21]. Such psychological factors are not always considered in the visualization, however, which could lead to misinterpretation. For example, IC 4 in Table 2, which is associated with the upper limit of direct fire suppression effectiveness [17,22], is variously coloured a calming light green, a cautionary yellow or a warning orange. In Table 2 there are also cases where the same colour refers to different ICs; for example, calming light green is used for IC 2 in Ontario, IC 3 in Alberta and IC 4 nationally.
Table 2. Examples of the diverse classification of fire intensity and colouring used in Canada. The Ontario, Alberta and national schemes are used in daily maps. The British Columbia (BC) scheme is used in a static map of the 90th percentile of historical fire intensity. The Field Guide to the Fire Behaviour Prediction System (Field Guide) scheme is used in printed tables. Intensity classes IC 1-IC 6 are as defined in [19]; the higher classes are informal. The colours are approximate.
An additional caveat in the classification is that simplifying numerical information by aggregation into few classes has a cost. For FWI and fire intensity, the wide range within some classes is operationally significant for some decisions; for example, FWIs of 11-22 become "High". Furthermore, for interpolated maps, neighbouring points that are displayed as different classes will not have operationally meaningful differences. To compensate, maps often include the raw point values that were used for interpolation (Figure 1b).
The numeric information cannot, however, be read and interpreted as quickly, thus reducing the benefit of categorization. The categorizing of data for spatial application such as FWI and FBP is common, and there are many recognized considerations (for example, [26]) and built-in solutions in geographic information systems. However, using a built-in classification option without a deep understanding may be unsuitable because potential distortions can lead to radically different interpretations, as others have noted [27]. Consequently, there is a strong need to use schemes that convey information accurately.

Methods
We propose five steps to categorize and visualize model outputs for use in fire situational awareness and decision-making:

1. Understanding and scoping the data
2. Understanding the decision-making uses of the information
3. Specifying the design goals and requirements
4. Designing the categorization and visualization scheme
5. Evaluating and revising the scheme

We first describe these steps in general, below, and then with further detail on their application, in our case study. Although the method is described in a linear sequence, the work is partly concurrent and highly iterative, especially within Step 4.

Step 1: Understanding and Scoping the Data
Our method applies to data with a continuous numerical scale of measure (real numbers). With minor modifications, it can also apply to data with a discrete numerical scale of measure (integers) and to ordinal categorical data (for example, Very Low, Low, Low-Moderate, ...). The modification is that colouring with continuous gradients (described below) does not apply unless there are a great many discrete values or ordinal categories.
The technical details of the raw model output data may be straightforward, but unfamiliar units, scaling, storage or other conditions can lead to misinterpretation. The following need to be understood by the designers:
• The data's precision of measurement and storage and the data's accuracy of measurement or estimation. The stored precision may not correspond with the accuracy. Measured data such as weather observations have low precision (for example, to 0.1 °C), but calculated data such as FWI System values should be calculated and may be stored with full machine precision (~16 decimal digits), which is well beyond the accuracy of typical fire management data.
• If the data are generated by a model, then the model's meaning, structure, assumptions, limitations, precision and accuracy.
Understanding data precision and accuracy is necessary for presenting information accurately and for having decision-makers understand it easily and correctly. For data storage and all subsequent calculations, full machine precision should be maintained to avoid accumulating rounding errors. For the numbers displayed to decision-makers, the displayed precision should not exceed the data's accuracy, because that could be misleading. Ideally, the displayed numbers have the lowest precision that is operationally significant, to minimize unnecessary mental processing. Further discussion and examples are given in the Supplementary Material.
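The distinction between stored and displayed precision can be illustrated with a minimal Python sketch. The cell values and function names are hypothetical and chosen only for illustration; they are not taken from the FOP models. The sketch compares rounding once, at display time, with rounding every value before it is summed.

```python
def displayed_total(cell_values):
    """Sum at full machine precision, then round once for display."""
    return round(sum(cell_values))


def total_of_rounded(cell_values):
    """Round each value first, then sum (rounding error accumulates)."""
    return sum(round(v) for v in cell_values)


# Hypothetical expected fire counts for 30 grid cells (illustrative only).
cells = [0.04] * 30

print(displayed_total(cells))   # the small per-cell values add up before rounding
print(total_of_rounded(cells))  # each small value is rounded away before summing
```

In this sketch, many small values that are individually negligible still contribute to the displayed total when full precision is kept until the final step.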

Step 2: Understanding the Decision-Making Uses of the Information
The purpose of the information is to support decision-making, so it is necessary to know who is using the information and how it is used. Working directly with fire management staff to understand their needs is necessary for ensuring that new model outputs are effectively integrated into the decision-making process [28,29].
There are key questions to consider. What decisions are being supported? Are certain parts of the range more important, needing higher attention? Is a higher resolution (smaller class size) needed in some parts of the range rather than others? For example, consider the categorizing and mapping of the accumulated 24-h rainfall from a precipitation radar [30]. For differentiating the degree and duration of the reduced fire behaviour potential, a high resolution is useful at the low end but not at the high end. Conversely, for differentiating the degree and duration of flood potential, a high resolution is useful at the high end but not at the low end. Moreover, a higher top category is appropriate.

Step 3: Specifying the Design Goals and Requirements
As stated in the introduction, the high-level goals of categorization and its visualization are simply to show the information completely and accurately and to have the information be understood quickly and easily. These goals are elaborated into criteria as follows.

• Regarding the complete and accurate display of information:
o Are the magnitudes shown with the original or reduced precision?
o Are the magnitudes undistorted or distorted by categorization, scale nonlinearity or truncation?
o Are the relative magnitudes evident by the colouring without or with referral to the legend?
• Regarding the quick and easy understanding of information:
o Can the colouring be easily matched to the legend's magnitude numbers?
o Does the colouring draw attention and convey a suitable psychological meaning for the degree of alarm?
o What is the overall ease of understanding?
Possible design requirements include the accommodation of colour vision deficiencies and other technical considerations such as the adequate appearance on low-quality displays, colour printers or standard photocopiers.

Step 4: Designing the Categorization and Visualization Scheme
There are four design options for categorizing and colouring the values in the scale (Figure 2):

1. Gradient continuity: whether to use the original values unaltered or categorized
2. Hue selection: the number and choice of colours and design of gradients
3. Range completeness: whether to show the full range or truncate the top or bottom of the range
4. Scale linearity: whether to have a linear or nonlinear scale or progression of category boundaries, and whether to have colour gradients that are linearly or nonlinearly proportional to the data magnitudes

Gradient Continuity
The alternatives for gradient continuity are either to use the original values or categorize them (Figure 2, parts 1a and 1b). For a continuous gradient, the percentage of colour saturation is proportional to the raw datum magnitude. Continuous gradients are therefore precise and accurate but harder to interpret using the legend compared to categorized gradients.
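The contrast between the two alternatives can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names, the clipping to a hypothetical upper limit `vmax` and the example boundaries are not part of the original method.

```python
from bisect import bisect_right


def continuous_saturation(value, vmax):
    """Saturation (%) proportional to the value on a linear 0..vmax scale."""
    clipped = min(max(value, 0.0), vmax)  # assumption: clip to the legend range
    return 100.0 * clipped / vmax


def categorized_saturation(value, boundaries):
    """Saturation (%) shared by every value that falls in the same category.

    `boundaries` are ascending limits separating the categories, so there
    are len(boundaries) + 1 categories in total.
    """
    category = bisect_right(boundaries, value)  # 0 .. len(boundaries)
    return 100.0 * category / len(boundaries)


# Hypothetical values: both lie in the same category but differ in magnitude.
for v in (1.4, 1.9):
    print(v, continuous_saturation(v, 4.0), categorized_saturation(v, [1, 2, 3]))
```

The continuous version preserves the difference between the two values, whereas the categorized version assigns them the same colour, which is easier to match to a legend but hides the within-category difference.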

Hue Selection
Colours have strong psychological associations that affect the inferred meaning of information and its speed and ease of interpretation. Blue and green are associated with relaxation, calm and hope [21], and red is associated with danger [20] (Figure 2, part 2f). Historically, variations of a blue-green-yellow-orange-red sequence (which alludes to water, growing vegetation, dried vegetation and flame) have been used to represent escalating fire danger (see examples in Table 2). Accommodating colour vision deficiencies reduces the colour combination choices, particularly most of those in the traditional fire danger sequence [31]. Tools are available to assist with the testing of colour palettes for accessibility [31,32]. The design task is to choose colours appropriate for the implications of the data magnitudes, particularly the degree of attention or alarm.
Regarding the gradient design, there are a few distinct alternatives [33] (Figure 2, parts 2a-2e). A single sequential gradient suits data whose meaning ranges from zero or neutral to either bad or good. A divergent sequential gradient suits data whose meaning ranges from good through neutral to bad. In Figure 2, parts 2b and 2c grade from a calming blue through white to an alarming red. The left and right ends each have a single hue. Part 2b is neutral in the middle, whereas part 2c has compressed and expanded ends. Part 2d, multiple sequential, is analogous to part 2a except that part 2d has multiple hues, which are in a rainbow spectrum in the example. Compared to a single hue, multiple hues provide more contrast over the range, making it easier to match the legend and signal levels of attention or alarm. Part 2e, multiple divergent, is analogous to part 2b except that part 2e has multiple hues for each side. The R software [34] package "inlmisc" [35,36] is useful for constructing continuous or categorized gradients.
A key concern is how these many design alternatives support or oppose the design goals. Only the single sequential and single divergent gradients (Figure 2, parts 2a and 2b) have the accuracy of continuous gradient continuity (Figure 2, part 1a), but they are difficult to interpret. The remaining gradient alternatives require matching the legend to identify the magnitudes, but this can become quick and easy with familiarity and an effective use of colour psychology.
Scale linearity has two components:
1. Scale: whether to have a linear or nonlinear scale or progression of category boundaries
• For categorized gradient continuity, a nonlinear scale is used to vary the resolution over the range
2. Colour gradient: whether to have colour gradients that are linearly or nonlinearly proportional to the data magnitudes
• For categorized gradient continuity, a nonlinear colour gradient is used to communicate the varying meaning or importance of the information over the range
For a continuous gradient, the same result can be achieved from nonlinearity in either of the above two components.
Nonlinear-systematic methods (Figure 2, part 4b) use a smooth function such as log or power to transform the output, while nonlinear-irregular methods use a non-smooth progression such as Jenks [37]. Nonlinear colouring requires referring to the legend to understand the magnitudes. This is a trade-off between the goals of drawing attention to where it is needed and improving the speed and ease of understanding.
There are copious settings and combinations of alternatives for the four design options. Getting to a result is an iterative process, with analysts and subject matter experts trying alternatives and making trade-offs, ideally without anchoring to tradition or early trials. We cannot recommend a path through the four design options other than saying it is iterative and concurrent. We do, however, recommend a starting point or baseline, which is the extreme of displaying all the data completely and accurately and ignoring the goals of quick and easy understanding. The baseline has continuous gradient continuity, an achromatic colour gradient of white through greys to black and a linear scale with no truncation (Table 3).
Table 3. The baseline case alternatives for the design options. This is a starting point that presents complete and accurate information, while ignoring the goals of a quick and easy understanding of the information.

[38]. The number of categories is a trade-off of accuracy (requiring more) and speed and ease of understanding (requiring fewer). Ideally, individual categories have no operationally significant physical differences, while adjacent categories do. In practice, all considerations require compromise. An example of determining categories for FWI System outputs based on physical differences is given by [9].

Step 5: Evaluating and Revising the Scheme
Once the design process is done and implemented, an essential further step is the ongoing work with decision-makers to evaluate the outputs and revise the design as necessary.

Case Study: Designing the Scheme for Ontario's FOP Models
We now describe the application of the above method to categorizing and visualizing FOP data.

Case Study-Step 1: Understanding and Scoping the Data
The data are outputs from process and statistical FOP models for Ontario, so we begin with their description. The lightning- and human-caused FOP models have been used operationally since the mid-2000s and 2015, respectively. The daily lightning-caused fire occurrence is modelled as two separate processes [3]: the probability that a lightning strike will lead to a holdover ignition and the probability that an existing ignition "arrives" (is reported). The ignition model is mainly driven by the forest floor organic layer's moisture content, which determines the sustainability of smouldering and the survivability of the ignition. Additional factors are other moisture indicators, ecoregional modifiers and lightning strike polarity. The "arrival" model, which is conditional on a holdover ignition being present, is influenced by the surface litter moisture, organic layer moisture, wind speed and ecoregional differences. Ontario's human-caused fire prediction system uses a set of logistic generalized additive models to model inherently nonlinear relationships with key drivers of human-caused fire occurrence, including seasonal and spatial patterns, fuel moisture and the characteristics of human land use [1]. The models are stratified regionally and by cause categories to account for different seasonal patterns in fire occurrence.
Both the lightning- and human-caused FOP models produce outputs for each of the 2574 cells in a grid that spans the province's approximately 91.9 million ha wildland fire management area (Figure 3).
Regarding the range and frequency distribution, we analysed historical FOP data for each cell for each day from 15 May to 31 August: 2016-2018 for human-caused and 1992-2006 for lightning-caused fires. For this analysis, the start and end dates in each fire season were chosen to avoid the variability in spring and fall snow-free conditions, when the models are not making predictions for the entire province. The lower limit of the range is zero; there is no theoretical upper limit. Table 4 presents summary statistics for the data, with zeros excluded to characterize the important data more clearly. Most of the distribution statistics of the human- and lightning-caused data differ by about an order of magnitude.
Figure 4 shows the empirical probability distributions of the non-zero data. Those data were mostly clustered close to zero in both models, so we log-transformed the data for illustration (untransformed data are required for operational use). The magnitudes of data between the lower tails of the two distributions differ by orders of magnitude. This presents a challenge for categorizing and colouring the data on a common scale for mapping.
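The scoping analysis described above (summary statistics with zeros excluded and a log transform for illustration) can be sketched in Python as follows. This is a minimal sketch, not the analysis code used for Table 4 or Figure 4; the function name, the chosen statistics and the sample values are assumptions for illustration.

```python
import math
import statistics


def nonzero_summary(values):
    """Summarize non-zero values and the span of their log10 transform."""
    nonzero = [v for v in values if v > 0]     # zeros excluded, as in the text
    logs = [math.log10(v) for v in nonzero]    # log scale is for illustration only
    return {
        "n_nonzero": len(nonzero),
        "median": statistics.median(nonzero),
        "max": max(nonzero),
        "orders_of_magnitude": max(logs) - min(logs),
    }


# Hypothetical per-cell, per-day FOP values (illustrative only).
sample = [0.0, 0.0, 0.001, 0.01, 0.1, 1.0]
print(nonzero_summary(sample))
```

A span of several orders of magnitude in such a summary is the signal that a linear scale will leave most of the map near one end of the colour range.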

Case Study-Step 2: Understanding the Decision-Making Uses of the Information
We used a variety of methods to understand the FOP information needed and how it is used for daily decision-making: reviewing documentation, observing operational decision-making and (for some) working part-time in operational functions where FOP information is used. Most importantly, we held a series of engagements with fire management agency personnel. For example, we hosted a workshop in 2017 attended by agency personnel including regional and provincial Fire Intelligence Officers, external researchers and students. The workshop's topics included the purposes and methods of subjective FOPs by experts.
To understand the agency's use of FOP information, it is necessary to outline the agency's hierarchical structure and the responsibilities of each level. Ontario's fire management area (Figure 3) has two main parts, the Northeast and Northwest Regions, each of which is divided into six or seven Fire Response Sectors. There is also a Provincial level. Each level has a sole or shared responsibility for various decisions; most are made with consultation or coordination between adjacent levels. The Regions are primarily responsible for strategic fire response decisions and management, and the Sectors are primarily responsible for tactical fire response decisions and operations. The Province is primarily responsible for adjusting the near-term (1- to 21-day) capacity according to the demand via temporary commercial hiring and inter-provincial and international resource sharing.
Several decisions are directly dependent on the potential number of fires anticipated in various parts of the province. These decisions are made at the stated levels: The various FOP-dependent decisions have diverse needs in terms of the spatial extent and resolution of FOP information. For example, detection route designers can use relatively fine resolution information on the order of kilometres, while the Province needs only aspatial, numeric FOPs by region for resource-sharing decision-making.

Case Study-Step 3: Specifying the Design Goals and Requirements
The primary and conflicting goals are of course to display complete and accurate information and have the information quickly and easily understood. In twice-daily briefings, decision-makers have limited time (minutes) to view, interpret and absorb each of many information items regarding, for example, weather values, FWI and FBP System outputs, FOP, active fires, logistics and personnel. Completeness and accuracy are important because the information supports the many decisions described above, which are made under uncertainty and have potentially significant consequences.
Regarding specific information requirements:
• There is a need for both maps and numeric subtotals and totals of fire occurrence by cause and location (i.e., Sectors, Regions, Province)
• All the maps need to use the same categories and colours for FOP magnitudes
• The FOP models' output is the expected or average occurrence, but the actual occurrence varies around the average, so an indication of the variability is needed
Decision-makers expressed strong preferences for the number of categories, ranging from three to many categories, and they desired a familiar colour sequence (blue-green-yellow-red). There was also a strong preference for integers for all numbers related to FOP. Finally, we wished to accommodate colour vision deficiencies.

Case Study-Step 4: Designing the Categorization and Visualization Scheme
Design has been described as a messy process with a tidy outcome. We do not detail our circuitous journey but show and describe some key alternatives, stages and considerations. Figure 5a illustrates the baseline alternative (Table 3) applied to the human-caused FOP and actual fire arrivals for a selected day. The range of the colour gradient is 0-3 fires/cell, the upper limit of which is between the maximum human- and lightning-caused FOP (Table 4). The map area looks mostly white, with three small, pale grey patches; there is little useful information, especially considering the six actual fires that day, which is a low-to-moderately busy day for this cause. Adding more hues alone would make no meaningful difference because the data are clustered very near zero. Categorizing at this stage would make it worse. Truncating the upper limit to a low value somewhat close to zero would add resolution and colour to the human-caused FOP here, but such truncation would lose all resolution of the rarer but critically important high lightning-caused FOP.
Figure 5. (a) The baseline alternative (Table 3), which shows little useful information; (b) 4 + 1 categories; (c) 10 + 1 categories; (d) 20 + 1 categories. The additional category in (b), (c) and (d) is for a "no forecast model" or true zero. Using more categories and hues greatly increases the information portrayed but makes matching the colour to the legend more difficult. The spatial pattern in (d) corresponds with roads and settlements.
Our original solution was to use separate scales for the two fire occurrence causes (Figure 6). The lightning-caused FOP scale was linear. For the human-caused FOP scale, we led subject matter experts through a scenario for an area of Ontario that has a relatively high occurrence. When the FFMC (a strong indicator of sustainable ignition) was 90, that area was considered to have an elevated concern suitable for a High classification. We used the corresponding FOP magnitude (0.4 fires/cell) as the upper limit for the then 10-category scale. We determined Moderate similarly and interpolated with equal linear steps. In Figure 6, this scale is shown by the blue line, except that categories 1 to 10 in the original have been mapped to a 0 to 20 scale for comparability. Scaling the original total-fires map required a creative logic. The maps by individual cause were acceptable to decision-makers, but the inconsistency between the causes was ultimately unacceptable (and motivated the present work).
Our original solution was to use separate scales for the two fire occurrence causes ( Figure 6). The lightning-caused FOP scale was linear. For the human-caused FOP scale, we led subject matter experts through a scenario for an area of Ontario that has a relatively high occurrence. When the FFMC (a strong indicator of sustainable ignition) was 90, that area was considered to have an elevated concern suitable for a High classification. We used the corresponding FOP magnitude (0.4 fires/cell) as the upper limit for the then 10-category scale. We determined Moderate similarly and interpolated with equal linear steps. In Figure 6, this scale is shown by the blue line, except that categories 1 to 10 in the original have been mapped to a 0 to 20 scale for comparability. Scaling the original total-fires map required a creative logic. The maps by individual cause were acceptable to decision-makers, but the inconsistency between the causes was ultimately unacceptable (and motivated the present work). Several alternatives were considered for a unified scale that accommodated the conflicting needs of fine resolution at the low end and a high upper limit. Discussions with decision-makers indicated that concern increases relatively quickly as the likelihood of fire rises from zero. Providing a high resolution at the low end while retaining a high upper limit would require a great many colours (for example, like those of precipitation radar maps). That would be unfamiliar and confusing and would not correspond with psychological colour associations nor accommodate colour vision deficiencies. Piecewise linear and irregular scales were explored, but their abrupt changes made them difficult to interpret. We wanted a smooth, systematic progression of category boundaries and tested logarithmic and power functions. Those functions can be made to match fairly closely, but the power function had a more suitable shape at the low end. 
Figure 6. The original and unified fire occurrence prediction (FOP) classification scales. Because of order-of-magnitude differences in FOP, a separate scale was originally used for each cause: linear for lightning (black) and subjectively determined, irregular, nonlinear for human (blue). The unified scale (red) is a systematic, nonlinear one generated by a power function; a cube root in this example.
Several alternatives were considered for a unified scale that accommodated the conflicting needs of fine resolution at the low end and a high upper limit. Discussions with decision-makers indicated that concern increases relatively quickly as the likelihood of fire rises from zero. Providing a high resolution at the low end while retaining a high upper limit would require a great many colours (for example, like those of precipitation radar maps). That would be unfamiliar and confusing and would not correspond with psychological colour associations nor accommodate colour vision deficiencies. Piecewise linear and irregular scales were explored, but their abrupt changes made them difficult to interpret. We wanted a smooth, systematic progression of category boundaries and tested logarithmic and power functions. Those functions can be made to match fairly closely, but the power function had a more suitable shape at the low end.

A power function takes the general form $f(x) = a x^{b}$, with the shape controlled by parameters $a$ and $b$. Our desired behaviour, a scale that increases quickly for low FOP but then increases progressively more slowly, is provided when $a > 0$ and $0 < b < 1$, since this family of power functions is monotonically increasing and concave down. The FOP scale is $x$, and the category scale is $f(x)$. We developed a parametric scaling tool with three inputs to generate and plot boundaries: the shape parameter, $1/b$; the upper limit of the FOP scale, $\mathit{FOPMax}$; and the number of categories, $\mathit{NumCat}$. The boundary for the top of category $\mathit{Cat}$ is

$$x_{\mathit{Cat}} = \mathit{FOPMax} \left( \frac{\mathit{Cat}}{\mathit{NumCat}} \right)^{1/b} \qquad (1)$$

Any FOP > $\mathit{FOPMax}$ stays in the highest category. A category for true zero can be added if required. For convenience, we parameterized $1/b$, for which we tested values in the range of 1.5 to 4.5; for example, 2 yields a square root shape. The parameter $a$ works out to be $\mathit{NumCat} / \mathit{FOPMax}^{\,b}$.

The red curve in Figure 6 illustrates the boundaries for $1/b = 3$, $\mathit{FOPMax} = 3$ and $\mathit{NumCat} = 20$; for example, the boundary for $\mathit{Cat} = 11$ (the top of category 11) is ~0.5. The tool lists the boundaries and graphs them as in Figure 6.
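The boundary calculation described above can be sketched in a few lines of Python. This is only an illustration, not the authors' tool (the maps in this study were produced in R). The function name `category_boundaries` and its argument names are assumptions made for this example. The sketch assumes that the top of category Cat lies at FOPMax·(Cat/NumCat)^(1/b), which follows from f(x) = a·x^b with f(FOPMax) = NumCat.

```python
def category_boundaries(shape, fop_max, num_cat):
    """Return the upper FOP boundary (fires/cell) of categories 1..num_cat.

    shape   -- the shape parameter 1/b (e.g. 2 = square root, 3 = cube root)
    fop_max -- upper limit of the FOP scale (top of the highest category)
    num_cat -- number of categories (excluding any true-zero category)
    """
    if shape <= 1:
        raise ValueError("shape (1/b) must exceed 1 so that 0 < b < 1")
    if fop_max <= 0 or num_cat < 1:
        raise ValueError("fop_max must be positive and num_cat at least 1")
    # Invert f(x) = a * x**b at the integer category values 1..num_cat.
    return [fop_max * (cat / num_cat) ** shape for cat in range(1, num_cat + 1)]


# Settings quoted in the text for the red curve in Figure 6.
bounds = category_boundaries(shape=3, fop_max=3.0, num_cat=20)
for cat, upper in enumerate(bounds, start=1):
    print(f"category {cat:2d}: up to {upper:.3g} fires/cell")
```

With these settings the first boundary falls near 0.0004 fires/cell and the eleventh near 0.5 fires/cell, which is the kind of fine resolution at the low end that a linear scale with the same upper limit cannot provide.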
While working directly with subject matter experts, the tool facilitated the joint testing of alternatives for the number of categories, truncation and nonlinearity design options. These alternatives plus colouring needed to be adjusted simultaneously when trading off the goals because their effects interact. The FOP outputs for a set of representative days were mapped using R [35] for candidate sets of boundaries and colouring. Table 5 lists ways in which the alternatives for the number of categories, the amount of truncation and nonlinearity and number of hues generally interact in affecting several attributes related to completeness and accuracy of information or speed and ease of understanding. There are exceptions for some combinations and edge conditions. Every alternative for the design options improves some attributes and worsens others. The trade-off behaviour is more straightforward to work with than it may seem because the tool shows most of the trade-offs immediately. The difficulty lies in subjectively assessing the results and compromising on the attributes and goals.

Table 5. Tabulation of how alternatives for the number of categories, amount of truncation and nonlinearity and number of hues generally interact in affecting several attributes related to completeness and accuracy of information or speed and ease of understanding. There are exceptions for some combinations and boundary conditions. Every alternative improves some attributes (blue-grey shading) and worsens others (orange-tan shading).

Case Study-Step 4: Results of the Design Process
We state our current category and colouring design and give the rationale for the trade-offs made. The pressure to have a small number of categories and colours (~4) could not accommodate the need for fine resolution at the low end; we used 20 categories (plus a true zero if needed). Figure 5b-d show the same FOP data as Figure 5a but with 4, 10 and 20 categories, respectively. Only the largest number of categories reveals the meaningful network pattern of lines and nodes that corresponds roughly with roads and settlements.
In addition, a highly nonlinear scale was needed to show the patterns in the data. We used a nonlinear-systematic scale with boundaries obtained using Equation (1) with parameters $1/b = 3$, $\mathit{FOPMax} = 3$ and $\mathit{NumCat} = 20$. The boundaries are given in Table 6, stated in units of fires/cell and cells/fire. The final shape of the nonlinear scaling corresponds to parameter settings of $a = 20 \cdot 3^{-1/3}$ and $b = 1/3$,

$$f(x) = 20 \cdot 3^{-1/3}\, x^{1/3} \approx 13.87\, x^{1/3}$$

which is illustrated by the red curve in Figure 6.

Regarding truncation, we considered the rarity and operational importance of extreme FOP magnitudes. The 20th category ends at 3 fires/cell according to Equation (1), but that category is used for all higher FOP magnitudes. The maximum FOP in Table 4 is ≈3.9 fires/cell.
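The truncation rule can be illustrated by assigning individual FOP values to categories. The sketch below is a hypothetical helper, not the operational code: the name `fop_to_category`, the use of category 0 for a true zero, and the choice to treat each upper boundary as inclusive are all assumptions made for this example.

```python
from bisect import bisect_left


def fop_to_category(fop, shape=3.0, fop_max=3.0, num_cat=20):
    """Map an FOP value (fires/cell) to a category number.

    Returns 0 for a true zero, otherwise 1..num_cat. Values above fop_max
    are truncated into the highest category.
    """
    if fop < 0:
        raise ValueError("FOP cannot be negative")
    if fop == 0:
        return 0  # assumed label for the optional true-zero category
    # Upper boundaries of categories 1..num_cat from the power function.
    bounds = [fop_max * (cat / num_cat) ** shape for cat in range(1, num_cat + 1)]
    # Index of the first boundary that is >= fop (upper boundaries inclusive).
    index = bisect_left(bounds, fop)
    # Anything beyond the last boundary stays in the top category.
    return min(index + 1, num_cat)


for value in (0.0, 0.0003, 0.4, 3.0, 3.9):
    print(value, "->", fop_to_category(value))
```

Under these assumptions, the maximum of about 3.9 fires/cell mentioned above lands in category 20 together with every value between the 19th boundary and 3 fires/cell.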
Regarding colouring, we used a nonlinear, divergent scale with one hue for the low end and multiple hues for the high end (Table 6). The hues transition from light blue to yellow through orange to red, which mostly follows the traditional blue-to-red progression. Avoiding green in that sequence accommodates some types of colour vision deficiency [31]. Even though there are 20 categories, having four main colours is easy to interpret and consistent with other CFFDRS outputs (for example, Table 1). The gradient within each main colour is difficult to match with the legend, but nonetheless reveals meaningful spatial patterns in the maps. Figure 7a-c show the final categorization and colouring scheme in example daily FOP maps of human- and lightning-caused fires and total fires, respectively. We intentionally show maps as formatted for operational use. They are intended for display on large monitors, but these reduced versions still show the colouring and categorization results adequately. Larger versions are provided in the Supplementary Material.

Note that a sequential scale is logical for FOP because any non-zero FOP is "bad" in this context. But the calming colours are assigned to very low magnitudes, and this provides a slightly greater distinction between the remaining colours. A much greater distinction could easily be achieved by adding more colours, for example magenta-purple-black for the highest categories, which would also draw more attention to the critical extremes. This, however, would not accommodate colour vision deficiencies.
Broad adjective categories were added to emulate the familiar four- or five-category pattern and simplify interpretation. Those boundaries fit the general association between Ontario's fire arrival density and fire situation severity. The three-significant-digit category boundaries were replaced in the legend by integers for selected category midpoints or boundaries (Table 6). Note that the units change from fires/cell to cells/fire to show integer magnitudes, which are far more meaningful than fractions.
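The switch of units in the legend amounts to taking a reciprocal: a boundary below 1 fire/cell is easier to read as a whole number of cells per fire. The sketch below shows one way this could be done. The function name `legend_label` and the rule of switching units at 1 fire/cell are assumptions for illustration, and the values shown are computed from Equation (1) rather than taken from Table 6.

```python
def legend_label(fires_per_cell):
    """Format a positive FOP boundary as an integer legend label.

    Boundaries of at least 1 fire/cell are shown as fires/cell; smaller ones
    are inverted and shown as cells/fire so the legend shows whole numbers.
    """
    if fires_per_cell <= 0:
        raise ValueError("boundary must be positive")
    if fires_per_cell >= 1:
        return f"{round(fires_per_cell)} fires/cell"
    return f"{round(1 / fires_per_cell)} cells/fire"


# Example boundaries from the power function with 1/b = 3, FOPMax = 3, NumCat = 20.
for boundary in (0.003, 0.499125, 3.0):
    print(boundary, "->", legend_label(boundary))
```

For example, a boundary of 0.003 fires/cell reads as roughly 333 cells/fire, and the boundary near 0.5 fires/cell reads as 2 cells/fire.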
Case Study-Step 5: Evaluating and Revising the Scheme
Several significant revisions were made in arriving at the current design in Figure 7. The first lightning-caused FOP maps had a linear scale with four equal-interval categories using traditional blue-green-yellow-red. When human-caused FOP was added, the scale was changed to 10 categories using a smooth green-yellow-orange-red transitional gradient and later to a special blue-yellow-orange-red to accommodate some colour vision deficiencies. As stated above, the inconsistent subjective scales were replaced completely in early 2020 per our method. The maps originally showed the expected number of fires/cell but were changed to show the density of the expected number of fires/unit area because of fractional cells. Note that the map legends intentionally omit this complication; density seems to be the automatic, intuitive interpretation. Additional revisions are planned, and further evaluation is always ongoing.

Discussion
While implementing the FOP models for operational evaluation, we arrived at and applied this method, which may seem well structured and straightforward now, but it was far from that during the design work. The method emerged as a by-product, having evolved during iterative deliberations. As illustrated in the introduction and identified in [9], the categorization of the FWI System outputs could benefit from this approach. We have since applied the method to categorize and colour outputs from other models for operational display [39][40][41], examples of which are in the Supplementary Material. The many considerations discussed in this paper need to be addressed to ensure that the models are interpreted and used appropriately. We emphasize that any schemes, including those built into software applications, need this careful consideration. Incorporating the outputs from scientific models into operational decision-making is not straightforward.
The key for a successful application is that researchers and practitioners work together closely throughout the process, from problem identification through to implementation and evaluation [1]. Through this collaborative approach, outcomes tend to have a higher acceptance and usefulness.
An additional factor not addressed by our method is that categories need to be meaningful for more than just map design, because there is a tendency to extend the use of categories to guidelines and standard operating procedures and vice versa [9]. The classifications and their boundaries can also be used as unwritten mental shortcuts or heuristics in the place of a more deliberative consideration of complex information. Well-designed, science-based classifications can be consistent with less time-constrained situational analyses, while poor classifications may lead to suboptimal decision-making.
Design considerations for presenting quantitative data for decision support go beyond the categorization and colouring of numerical scales. Also important are options for spatial resolution and the use of simulated three-dimensional displays, examples of which are given in the Supplementary Material [42].