Number Preference as a Source of Measurement Error in the U.S. National Forest Inventory

Abstract: Number preference, i.e., the human tendency to gravitate toward or away from specific numbers, is a potential source of measurement error in forest inventory. Identifying its presence is an important step to ensure unbiased results. This study evaluated U.S. national forest inventory data for number preference and identified factors that influence the proportion of tree cull volume, tree diameter, tree height, and seedling count observations ending with the digit zero or five (ED 0,5) and seedling count observations that were multiples of four (M 4). Two-sided hypothesis tests determined that ED 0,5 occurred significantly more frequently than expected by chance for all metrics tested, though not in every inventory region of the country nor to the same degree. Consistently, tree-level ED 0,5 was more likely when metrics were estimated visually rather than measured instrumentally. Logistic regression indicated that the effect of species class, species type, tree status, treetop status, and stem size on tree-level ED 0,5 and the effect of plot-level water depth on seedling count ED 0,5 also varied by region. Though the effect was small, findings suggest that some inventory regions may be employing an approved multiplicative shortcut that results in a greater-than-expected proportion of M 4 observations among seedling counts.


Introduction
The goal of forest inventory is to accurately characterize forest attributes and do so with reasonable precision. As such, estimates of forest attributes are typically presented in terms of a confidence interval $\bar{x} \pm c\,s_{\bar{x}}$ (1), where $\bar{x}$ is the attribute mean, $s_{\bar{x}}$ is the standard error of the mean, and $c$ is a constant determined by the desired level of confidence. For any given level of confidence, the width of the confidence interval is determined by the error, or variability, in the sample. In general, there are three sources of error for forest inventories: sampling error, modeling error, and measurement error [1]. Emphasis usually focuses on sampling error, with some attention given to modeling error. Measurement error typically receives little attention because it is commonly assumed that observations are made without any error or with an error that is small and inconsequential when compared to other sources of error [2]. In reality, observational errors are unavoidable and may be quite severe. Their presence can produce biased and imprecise estimates and mask true relationships [3,4]. Common sources of measurement error in a forest inventory include uncalibrated or faulty equipment, negligent record keeping, and lax or improper field techniques. A less cited source of measurement error occurs when data are coarsened by rounding or other approximation. Number preference (NP), i.e., the human tendency to gravitate toward or away from specific numbers [5], is a type of data coarsening. NP occurs in a variety of circumstances, ranging from baseball players modifying their at-bat strategy to end the season with a batting average over 0.300 [6] to diners leaving gratuities in whole-dollar amounts or in amounts that make the total bill a whole-dollar amount [7]. Other

Materials and Methods
The data used in this study were collected by the Forest Inventory and Analysis (FIA) program of the U.S. Department of Agriculture, Forest Service (Forest Service). FIA inventory plots are located across the U.S. quasi-systematically with a baseline sampling intensity of 1 plot per 2428 ha [20]. Some states, national forests, and other areas are sampled at intensities two or three times that of the baseline. Each plot consists of four 7.32 m fixed-radius subplots on which trees ≥ 12.7 cm in diameter are measured. Observations on trees < 12.7 cm in diameter are made on a 2.07 m fixed-radius microplot within each subplot. The cluster of subplots is arranged with one central subplot and three other subplots located 36.58 m from the central subplot at azimuths of 0°, 120°, and 240°. Each plot is monumented, georeferenced, and measured on an ongoing basis once every 5-10 years.
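The plot layout described above can be sketched numerically. The following is a minimal illustration, not part of the FIA protocol: it assumes azimuths are measured clockwise from north, and the names `subplot_centers` and `circle_area` are hypothetical helpers introduced here.

```python
import math

# Dimensions stated in the text (metres).
SUBPLOT_RADIUS_M = 7.32
MICROPLOT_RADIUS_M = 2.07
SPACING_M = 36.58
AZIMUTHS_DEG = (0, 120, 240)


def subplot_centers():
    """Return (east, north) offsets in metres for the four subplot centers.

    Assumption: azimuth is measured clockwise from north.
    """
    centers = [(0.0, 0.0)]  # central subplot
    for azimuth in AZIMUTHS_DEG:
        angle = math.radians(azimuth)
        centers.append((SPACING_M * math.sin(angle), SPACING_M * math.cos(angle)))
    return centers


def circle_area(radius_m):
    """Area of a fixed-radius circular plot in square metres."""
    return math.pi * radius_m ** 2


centers = subplot_centers()
subplot_area = circle_area(SUBPLOT_RADIUS_M)
microplot_area = circle_area(MICROPLOT_RADIUS_M)
```

Under these dimensions each subplot covers roughly 168 m² and each microplot roughly 13.5 m², so seedlings and saplings are observed on a much smaller area than larger trees.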
When plots are partially forested or straddle heterogeneous forest conditions, they are subdivided by a procedure known as condition mapping [20]. Multiple conditions are classified on the basis of reserved status, owner group, forest type, stand size class, regeneration status, and tree density [21]. Several ancillary attributes are used to further describe the condition classes but are not used to delineate new classes. Any number of condition classes may be recorded for each plot. NP 0,5 was evaluated for three tree-level attributes: rotten/missing cull volume, diameter, and actual height. NP 0,5 and NP 4 were evaluated for the microplot metric seedling tree count. Rotten/missing cull volume (cull) is the estimated percentage of tree volume that is rotten or missing. Cull is visually estimated and recorded to the nearest 1%. The diameter of timberland tree species is recorded at breast height (d.b.h.), typically 1.37 m above the ground line on the uphill side of the tree. For woodland (mostly multi-stemmed) species, diameter is recorded at the stem root collar (d.r.c.) or groundline, whichever is higher. Diameter is measured instrumentally unless circumstances warrant otherwise and recorded to the nearest 0.254 cm. Actual height (height) is the tree length from ground level to the highest remaining portion of the tree still present and attached to the bole. Height is measured instrumentally unless circumstances warrant otherwise and recorded to the nearest 30.48 cm. The seedling count is the number of live trees with a diameter < 2.54 cm present on the microplot. To qualify for counting, conifer (softwood) seedlings must be at least 15.24 cm tall, and hardwood seedlings must be at least 30.48 cm tall. Seedlings are tallied by species and condition class up to a count of five and estimated beyond that. Tree-, condition-, and subplot-level data for the most recent available inventory year of all states except Hawaii were included in the analysis (Figure 1). 
Data were collected with protocols outlined in FIA field guide versions 7-9 [21]. Only plots of the baseline sampling frame were included. The number of tree and seedling count observations available for analysis varied by region and ranged from <1000 to >235,000 (Table 1). All data are available to the public through the FIA online database [22] and were downloaded during the first week of July and the second week of August 2022.
To test for NP 0,5, each cull, diameter, height, and seedling count observation was assigned its end digit (ED i). For example, the end digits of numbers 7, 19, and 23.6 were 7, 9, and 6, respectively. End-digit assignments for diameter and height were based on the U.S. customary units of measure employed by FIA (inches and feet, respectively). The proportion of observations ending in zero or five (p 0,5) was estimated for each attribute with a logistic regression model. Cull values = 0%, seedling counts < 11, and seedling counts observed in condition classes with a stocking value < 10, i.e., non-stocked conditions, were not included. Confidence intervals for p 0,5 (α = 0.01) were computed with a Wald-type interval on the log odds scale and transformed to the probability scale. Analyses were completed with R [23] packages survey [24] and srvyr [25]. Tree and seedling count observations were treated as being clustered on plots by designating plot identification number as the primary sampling unit, i.e., cluster variable, in the survey design specification. Estimations of p 0,5 were made for each attribute by region: Interior West (IW), Northern, Pacific Northwest (PNW), and Southern (Figure 1). In the absence of NP, the digits i = 0, 1, . . . , 9 were expected to occur with equal frequency (p i = 0.1) at the end of a number. Therefore, the null hypothesis for NP 0,5 was $H_0\colon p_{0,5} = 0.2$. The null hypothesis was rejected if the 99% confidence interval for p 0,5 did not include 0.2.
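The end-digit assignment and the interval-based test can be illustrated with a short sketch. This is a simplified stand-in, not the study's procedure: it assumes simple random sampling and therefore ignores the plot-level clustering that the study handled through the survey design specification. The function names and the counts in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def end_digit(value, decimals=0):
    """Last recorded digit of a value stored with `decimals` decimal places.

    E.g. decimals=1 for diameters recorded to 0.1 inch, decimals=0 for
    heights recorded to the nearest foot and for counts.
    """
    scaled = round(abs(value) * 10 ** decimals)
    return scaled % 10


def wald_logit_interval(successes, n, alpha=0.01):
    """Wald interval built on the log odds scale, then back-transformed.

    Assumption: independent observations (no clustering); needs 0 < successes < n.
    """
    p = successes / n
    log_odds = math.log(p / (1 - p))
    se = math.sqrt(1 / (n * p * (1 - p)))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = 1 / (1 + math.exp(-(log_odds - z * se)))
    upper = 1 / (1 + math.exp(-(log_odds + z * se)))
    return p, lower, upper


def rejects_null(successes, n, p_null=0.2, alpha=0.01):
    """True when the interval excludes the proportion expected by chance."""
    _, lower, upper = wald_logit_interval(successes, n, alpha)
    return not (lower <= p_null <= upper)


# Hypothetical counts: 300 of 1000 observations end in 0 or 5.
p_hat, ci_low, ci_high = wald_logit_interval(300, 1000)
heaped = rejects_null(300, 1000)
```

Building the interval on the log odds scale keeps its back-transformed limits inside (0, 1), which a Wald interval computed directly on the proportion does not guarantee.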
The test for NP 4 was limited to seedling count and warranted by a recommended tally shortcut: when seedlings are distributed evenly on a microplot, inventory crew members may estimate the total count by multiplying the number of seedlings on one-quarter of the microplot by four [21]. Therefore, all seedling counts > 10 were categorized as either a multiple of four (M 4) or not a multiple of four. Procedures used to estimate p 0,5 were repeated to estimate the proportion of M 4 seedling counts (p 4). Seedling counts made on microplots with more than one condition class were excluded. The proportion of M 4 numbers from 11 to 999 (the maximum seedling count allowed) is approximately 0.25. Thus, the null hypothesis for NP 4 was $H_0\colon p_4 = 0.25$. The null hypothesis was rejected if the 99% confidence interval for p 4 did not include 0.25.
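The expected proportion of multiples of four can be checked by direct enumeration. The sketch below assumes every integer count from 11 to 999 is equally likely in the absence of number preference; the function name is illustrative.

```python
def proportion_multiples_of_four(low=11, high=999):
    """Share of integers in [low, high] that are divisible by four."""
    counts = range(low, high + 1)
    multiples = sum(1 for count in counts if count % 4 == 0)
    return multiples / len(counts)


p4_expected = proportion_multiples_of_four()
```

There are 247 multiples of four among the 989 integers in that range, so the exact value is slightly under 0.25, which supports using 0.25 as the null value.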
Multivariate logistic regression [26] was used to identify factors associated with ED 0,5 and M 4 . Four tree-level attributes were included as potential predictors of cull ED 0,5 : species group (hardwood, softwood), species type (timberland, woodland), tree status (live, standing dead), and treetop status (intact, broken/missing). Five tree-level attributes were included as potential predictors of diameter ED 0,5 : diameter point (at breast height, above breast height, below breast height, root collar), method (measured, estimated, different location), species group, stem size (sapling, tree), and tree status. Five tree-level attributes were included as potential predictors of height ED 0,5 : method (measured, estimated), species group, species type, stem size, and tree status. A detailed description of these factors is provided in Table S1. Seven condition-level attributes and one subplot-level attribute were included as potential influencers of seedling count ED 0,5 and M 4 : stand size (small, medium, large), stand origin (natural, artificial), disturbance (undisturbed, disturbed), treatment (untreated, treated), depth of water or snow on the subplot (<3 cm, 3-30 cm, >30 cm), owner group (Forest Service, other federal, state/local government, private), physiography (mesic, hydric, xeric), and slope (0%-155%). A detailed description of these factors is provided in Table S2. Some factors were omitted in some regional regressions due to inadequate sample sizes.
For the ED 0,5 regression, an end digit of zero or five was considered a success (S = 1), and any other end digit a failure (S = 0). For the M 4 regression, multiples of four were considered successes (S = 1), and other values were considered failures (S = 0). The probability that S = 1 was modeled for each attribute by region in the linear form as $\mathrm{logit}(\pi) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k$, where $\pi$ is the probability that S = 1 and parameter $\beta_i$ refers to the effect of attribute $x_i$ on the log odds that S = 1, controlling for all other attributes. Dichotomous (0/1) dummy variables were used to represent the categorical attributes. Parameters were estimated with a logit link function under a quasibinomial distribution with R [23] packages survey [24] and srvyr [25]. Tree and seedling count observations were treated as being clustered on plots by designating plot identification number as the primary sampling unit, i.e., cluster variable, in the survey design specification.
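To show how coefficients on the log odds scale translate into probabilities and odds ratios, the following sketch evaluates a logit model by hand. The intercept and the coefficient for an "estimated" dummy variable are hypothetical values chosen for illustration, not estimates from this study; only the per-unit slope effect of 0.98 is taken from the reported results.

```python
import math


def inverse_logit(eta):
    """Map a log odds value to a probability."""
    return 1 / (1 + math.exp(-eta))


def predicted_probability(alpha, betas, x):
    """P(S = 1) for a logit model with intercept alpha and coefficients betas."""
    eta = alpha + sum(beta * value for beta, value in zip(betas, x))
    return inverse_logit(eta)


def odds_ratio(beta, delta=1.0):
    """Multiplicative change in the odds for a change of `delta` in a predictor."""
    return math.exp(beta * delta)


# Hypothetical model: baseline probability 0.2, odds doubled when estimated.
alpha = math.log(0.2 / 0.8)
beta_estimated = math.log(2.0)

p_measured = predicted_probability(alpha, [beta_estimated], [0])
p_estimated = predicted_probability(alpha, [beta_estimated], [1])

# A per-unit odds multiplier of 0.98 compounds over a 10-unit change in slope.
slope_effect_10 = odds_ratio(math.log(0.98), delta=10)
```

In this illustration, doubling the odds moves the probability from 0.20 to about 0.33 rather than to 0.40, a reminder that odds ratios are not ratios of probabilities. Likewise, a multiplier of 0.98 per 1% of slope compounds to roughly 0.82 over a 10-percentage-point difference in slope.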

Rotten/Missing Cull Volume
Severe heaping at values ending in zero or five was evident in the frequency distribution of cull observations (Figure 2). The null hypothesis (Equation (1)) was rejected (α = 0.01) for all four FIA regions (Table 1). Factors influencing ED 0,5 (α = 0.01) varied among regions (Table 2). All other attributes being equal, the odds of ED 0,5 were significantly greater for woodland tree species in the Southern region and trees with broken/missing tops in the IW, Northern, and Southern regions than for timberland tree species and trees with intact tops, respectively. In contrast, the odds of ED 0,5 were significantly less for softwood trees than hardwood trees in the IW and Southern regions. Tree status was the only significant attribute associated with ED 0,5 in the PNW region. There, ED 0,5 was less likely for standing dead trees than live trees; the opposite was true in the IW and Southern regions.

Table 2. Estimated coefficients (and standard errors) for the model $\mathrm{logit}(\pi) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_4 x_4$ predicting the probability that rotten/missing cull volume ends with the digit 0 or 5 by region.

Diameter
Though heaping at values ending in zero or five was not readily apparent in the frequency distribution of tree diameters (Figure 3), the null hypothesis (Equation (1)) was rejected (α = 0.01) for all four FIA regions (Table 1); however, no p 0,5 was greater than 0.22 in any region. The method by which the diameter was acquired and the size of the tree being measured were the only significant (α = 0.01) predictors of ED 0,5 (Table 3). This was the case in all regions except the IW, for which no attribute proved to be significant. All other attributes being equal, estimated diameters were 1.5 to 2.1 times more likely to exhibit ED 0,5 than measured diameters, and diameters < 12.7 cm (saplings) were 9% less likely to exhibit ED 0,5 than diameters ≥ 12.7 cm.

Table 3. Estimated coefficients (and standard errors) for the model $\mathrm{logit}(\pi) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_8 x_8$ predicting the probability that diameter ends with the digit 0 or 5 by region.

Actual Height
Some heaping at values ending in zero or five was apparent in the frequency distribution of height (Figure 4). The null hypothesis (Equation (1)) was rejected (α = 0.01) for all four FIA regions, though p 0,5 was no greater than 0.24 in any one region (Table 1). All of the attributes included as potential predictors of ED 0,5 were significant (α = 0.01) in at least one region (Table 4). Similar to diameter, estimated heights were more likely to exhibit ED 0,5 than measured heights, but only in the Northern and PNW regions. Heights of softwood trees in the Northern and PNW regions were less likely to exhibit ED 0,5 than hardwood trees. Heights of woodland species in the IW region and heights of standing dead trees in the Northern region were also less likely to exhibit ED 0,5. ED 0,5 was more likely to be exhibited among heights of trees ≥ 12.7 cm d.b.h./d.r.c. than among heights of smaller (sapling-sized) trees in the Northern and Southern regions; this is the opposite of what was observed for diameter.

Table 4. Estimated coefficients (and standard errors) for the model $\mathrm{logit}(\pi) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_5 x_5$ predicting the probability that actual height ends with the digit 0 or 5 by region.

Seedling Count
Some heaping at values ending in zero or five was apparent in the frequency distribution of seedling count (Figure 5). The null hypothesis (Equation (1)) was rejected (α = 0.01) for the IW region only (Table 1). Disturbance and water class were the only qualitative attributes significantly (α = 0.01) associated with ED 0,5 and only in the Southern region (Table 5). There, seedling counts made on disturbed subplots or subplots with ≥3 cm of standing water were 1.5 and 2.1 times more likely to exhibit ED 0,5, respectively, than those made on undisturbed subplots or subplots with <3 cm of standing water. In addition, in the Southern region, a 1% increase in slope was estimated to have a multiplicative effect of 0.98 on the odds of seedling count ED 0,5.

Table 5. Estimated coefficients (and standard errors) for the model $\mathrm{logit}(\pi) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{14} x_{14}$ predicting the probability that seedling count ends with the digit 0 or 5 by region.

The single condition class requirement applied to the analysis of M 4 had a minimal effect on the data, reducing the number of observations by two in the IW region, three in the PNW region, and four in both the Northern and Southern regions. Among the remaining observations, p 4 was significantly (α = 0.01) greater than expected (Equation (2)) in the Northern (0.28 ± 0.02) and Southern regions (0.30 ± 0.03) but not in the IW (0.26 ± 0.05) and PNW (0.25 ± 0.04) regions.
Though small, the regional differences among p 4 suggest that the multiplication-by-four shortcut to estimate seedling count may be employed more often in the Northern and Southern regions than in the IW and PNW regions. Stand size was the only attribute significantly (α = 0.01) associated with M 4 and only in the PNW region (Table 6), where, all other attributes being equal, M 4 seedling counts made in medium-sized stands were 81% less likely than those made in small-sized stands.

Table 6. Estimated coefficients (and standard errors) for the model $\mathrm{logit}(\pi) = \alpha + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_{14} x_{14}$ predicting the probability that seedling count is a multiple of four by region.

Considerations for Data Quality Control
The application of statistical tools to minimize uncertainty and ensure that data are of sufficient quality is part of the Quality Assurance (QA) program implemented by the U.S. national forest inventory [27]. Evaluating measurement errors resulting from NP is a way to identify aspects of data collection that need adjusting and for which training should be improved. In general, results suggest that emphasized training is needed for attributes that are visually estimated either by design or because of special circumstances. This evaluation also suggests that cull may be measured best in 5% increments rather than 1% increments. Doing so would remove unwarranted expectations of precision and might speed up data collection. Modifying standard operating procedures, however, should be performed carefully and with a thorough evaluation of the consequences, especially when data are used for long-term monitoring because changes may introduce bias and disrupt trend analyses [4]. Moreover, the statistically significant presence of minimal heaping, e.g., in the case of diameter and height, may not translate to practical significance in subsequent analyses and applications.
Without evidence that seedlings are distributed more homogeneously across the forest floor in the eastern U.S. than in the western U.S., the observation that the multiplication-by-four shortcut to estimate seedling count may be employed more often in the Northern and Southern regions than in the IW and PNW regions may indicate a difference in training among the regions. Furthermore, that p 4 was closest to the expected value in the PNW region may be related to the fact that the shortcut is not stated explicitly in the PNW variant [28,29] of the national data collection manual [21].

Source of Number Preference
Identifying the reason for heaped data is complex because NP is the result of multiple interacting factors. These include the inherent nature of numbers, human psychology and behavior, and the circumstances of measurement. Unless there is a context for a different set of numbers, e.g., 7, 14, and 21 for days spent vacationing, numbers ending in zero or five are typically preferred [30]. This is largely due to the decimal place-value system, which makes factors of 10 and their halves convenient and readily understandable [5].
One theory behind NP is the behavior known as satisficing, i.e., providing a number that is considered "good enough". Giving a satisficed answer requires less knowledge and less effort than providing a precise answer. Krosnick [31], as cited in [10] (p. 191), expressed the probability of satisficing as $P_{\mathrm{satisficing}} = \dfrac{a_1(\text{task difficulty})}{a_2(\text{ability}) \times a_3(\text{motivation})}$ (5). In this model, task difficulty is a measure of the complexity required to retrieve information (from memory or elsewhere) in order to answer a question. Ability is a measure of cognitive competence and experience with the topic under questioning. Motivation is a measure of how important a precise answer is perceived to be and the respondent's interest in the topic of inquiry. When motivation and/or ability are high, the tendency for satisficing decreases. When difficulty is high, the tendency increases. Though initially proposed in the context of economic decision making, this model is applicable to forest inventory where task difficulty can be described primarily by forest conditions, ability by field personnel characteristics, and motivation by both forest conditions and personnel disposition, in addition to business expectations.
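The direction of each effect in Equation (5) can be made concrete with a small sketch. It reads the expression as a ratio, with task difficulty in the numerator and ability and motivation in the denominator, which is the reading consistent with the description above. The weights a1, a2, and a3 are treated as simple multipliers and set to 1, and the input scores are arbitrary; all of these are assumptions made for illustration, so the result is best viewed as a relative index rather than a calibrated probability.

```python
def satisficing_index(task_difficulty, ability, motivation, a1=1.0, a2=1.0, a3=1.0):
    """Relative tendency to satisfice under the ratio reading of Equation (5).

    Assumption: a1, a2, a3 act as positive multiplicative weights; the
    result is not constrained to lie between 0 and 1.
    """
    return (a1 * task_difficulty) / ((a2 * ability) * (a3 * motivation))


# Hypothetical scores on an arbitrary positive scale.
baseline = satisficing_index(task_difficulty=2.0, ability=4.0, motivation=2.0)
harder_task = satisficing_index(task_difficulty=4.0, ability=4.0, motivation=2.0)
more_able = satisficing_index(task_difficulty=2.0, ability=8.0, motivation=2.0)
```

Raising task difficulty increases the index, while raising ability or motivation lowers it, matching the qualitative behavior the model is meant to capture.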

Task Difficulty
The difficulty of conducting a forest inventory is largely governed by the size, form, and condition of trees, both individually and collectively. For example, McRoberts and others [32] observed greater discrepancies among repeated diameter measurements on larger trees than on smaller trees, and Westfall [33] found more frequent differences in cull proportions on trees that were minimally or mostly culled than on trees that were moderately culled. In this study, traits for which task difficulty, and thus NP, were expected to be greatest were:
1. Cull measurements of hardwood species due to their deliquescent crown forms [33];
2. Cull measurements of trees with broken/missing tops due to their irregular crown form;
3. Diameters measured above breast height or at root collar because of the awkward positioning observers must achieve in order to obtain the measurements, i.e., stretching high or crouching low;
4. Heights of hardwood species due to their deliquescent crown form [33];
5. Heights of trees greater than sapling size due to poorer lines of sight for the observer because the treetops are taller and farther from the observer and potentially obscured by understory vegetation;
6. Heights of timberland species due to poorer lines of sight for the observer because timberland species are generally taller than woodland species, which places the treetop farther from the observer, and because timberland stands typically have more crown cover than woodland stands [34,35];
7. Seedling counts in small-sized stands and disturbed conditions due to dense understory vegetation;
8. Seedling counts in conditions where the forest floor is obscured due to snow cover or water; and
9. Seedling counts on steep slopes because of the precarious stance observers must maintain in order to obtain the counts.
With the exception of numbers 3, 7, and 9, these expectations were met, though not necessarily in every region. Results were most consistent with expectation 2, which was met in three of the four regions (IW, Northern, Southern). Expectation 1 was met in two regions (IW, Southern), as were expectations 4 (Northern, PNW) and 5 (Northern, Southern). Expectation 6 was met solely in the IW region, and expectation 8 was met only for ED 0,5 in the Southern region.
In addition to tree size, form, and condition, accurate assessments based solely on personal judgment are more difficult to make than assessments with little to no room for individual judgment [32]. Thus, p 0,5 was expected to be greater for cull than for diameter and height, except in instances when the latter two were estimated. This was indeed the case: ED 0,5 among all cull observations was 1.5-5.3 times more likely than ED 0,5 among all diameter observations and 1.2-2.7 times more likely than ED 0,5 among all height observations. Likewise, ED 0,5 was 1.2-2.1 times more likely to occur when diameter and height were estimated visually than when they were measured instrumentally. Fortunately, estimations of diameter and height are relatively rare. In this study, diameter estimation occurred for <2% of the trees observed in any region. Estimated heights were also relatively rare in the IW and Southern regions (≤6% per region) but less so in the Northern and PNW regions (26% in each region).

Motivation and Ability
Neither motivation nor ability was evaluated in this study, but steps could be taken to do so in the future. Of the two, ability is more easily quantified. Practical experience increases familiarity with forest inventory methods and provides exposure to rare and unusual situations by which personal judgments can be refined. Assuming a positive correlation with experience, ability could be measured as the number of years employed or the cumulative number of plots completed.
Observers rarely seek to purposefully bias results [36]; therefore, NP may emerge unintentionally due to diminished motivation caused by mental and physical fatigue [37,38]. Forest inventory crew members are exposed to multiple sources of fatigue during the course of an inventory, including steep topography, dense stands, lengthy traverses, long commutes, and early waking hours. Quantifying these factors as surrogates for motivation could be accomplished at the plot or worker level. Because the inclination for NP varies from person to person, factors such as steps taken, heart rate, and hours slept might correlate more strongly with fatigue and motivation than factors such as slope, trees per hectare, and kilometers driven. Nevertheless, plot-related factors may be the better alternative because worker-specific factors are highly individualized, context-specific, and ideally kept private [39].
In addition to plot-related factors, decreased motivation may be caused by weather conditions. In a meta-analysis of temperature effects on worker performance, Pilcher and others [40] reported that performance declined by >7% at hot experimental temperatures ≥ 26.67 °C wet bulb globe temperature (WBGT) and at cold experimental temperatures < 18.3 °C. Productivity was especially affected (~14% decline) at the hottest experimental temperatures (≥32.22 °C WBGT) and coldest experimental temperatures (<10 °C). The length of exposure at higher or lower temperatures prior to and during the task also affected performance. Although Bowen and others [39] did not find a decrease in performance due to high temperatures during forest harvest operations in New Zealand, workers reported that the work felt harder when temperatures were higher (summertime vs. wintertime). Thus, temperature or season of the year might serve to quantify motivation in future NP analyses.

Study Limitations
Given that 20% of all numbers are expected to end in zero or five, Beaman and others [41] noted a conundrum with the NP concept. That is, p 0,5 includes some values ending in zero or five by chance and some by the preference of the observer. Separating the two so that p 0,5 is a true representation of observer preference requires more than simply subtracting the expected value from the observed value. Three components are required: a model of the underlying distribution as if the data had been reported without NP, an assumption about which true values were assigned to the heaps resulting from NP, and the set of heaps and probability with which the values were assigned to them [15]. Multiple approaches to this problem have been developed (e.g., [12,15,30,36]), but addressing such was beyond the scope of this study.
Data from only one inventory year were included in the study. This was more than enough to achieve an adequate sample size overall, and there is no reason to expect dissimilar results from inventory years not included in the study. Field protocols for the attributes included in the study have been stable for many years, and the spatially and temporally balanced design of the FIA inventory ensures even coverage across the country year to year. Furthermore, the FIA QA program includes a rigorous training and certification regimen for new employees, so yearly differences in data collection technique due to turnover in field personnel should be minimal.

Conclusions
Multiple factors at all levels of a forest inventory, from tree and forest conditions to the experience and training of data collection personnel, have the potential to influence the quality of field-collected data. In this study, NP 0,5 was identified as a potential source of measurement error, particularly among rotten/missing cull volumes and estimated diameters and heights. Though often overlooked due to its relatively small contribution to overall error, any measurement error that can be identified should be corrected. As such, improved training and/or modification of field protocols may alleviate unwarranted heaping of ED 0,5 values, especially for visually estimated metrics. This evaluation is just one of many internal feedback procedures promoting continuous improvement of the FIA program. Additional work is needed to fully understand the consequences of using heaped data in population estimates and practical applications.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/f14030459/s1. Table S1: Description of the factors used to predict number preference for rotten/missing cull volume, diameter, and actual height; Table S2: Description of the factors used to predict number preference for seedling count.