Association of Recent Incidence of Foliar Disease in Pine Species in the Southeastern United States with Tree and Climate Variables

: Pine forests in the southern United States are a major contributor to the global economy. Through the last three decades, however, there have been concerns about the decline of pine forests attributed mostly to pests and pathogens. A combination of biotic agents and environmental factors and their interaction often inﬂuences outbreaks and the resultant damage in the forests. Southern pines experience periodic mortality from bark beetles and root rot fungi and losses from fusiform rust and pitch canker have long been important for management. In recent years, there is also growing evidence of increasing damage from foliar disease in southern pines. Early detection of diseases following changes in foliar characteristics and assessment of potential risks will help us better utilize our resources and manage these forests sustainably. In this study, we used Forest Inventory and Analysis (FIA) data to explore the intensity of foliar disease in three common pines: loblolly ( Pinus taeda L.), longleaf ( Pinus palustris Mill.), and slash ( Pinus elliottii Engelm.) in spatial and temporal terms using tree-level and climatic variables. Results from a tree-level model suggests that crown ratio may be an important factor in pine foliar disease ( p < 0.1). We applied the MaxEnt model, a presence-only species distribution model (SDM), to explore any association of foliar disease incidences with the climatic variables at a landscape level. Results indicate that mean dew point temperature, maximum vapor pressure deﬁcit, and precipitation during cold months had more inﬂuence over disease incidences than other climatic variables. While the sample size is limited as this is an emerging disease in the region, our study provides a basis for further exploration of disease detection methods, disease etiology studies, and hazard mapping.


Introduction
A deeper understanding of forest health consequences from global economic trade, changing land use, climate change, and management practices is a key area for investigation to provide accurate projections of forest carbon storage and productivity. Emerging pests and pathogens pose a large threat to the ecological and economic stability of forested ecosystems. A triangular interaction between host trees, pests/pathogens, and environmental factors is often described as a major contributor in potential disease outbreaks [1]. With predicted changes in climate, there will be potential shifts in favorable environments for both hosts and pests or pathogens; however, there are counter arguments on whether there would be significant increase in damage from these biotic factors [2]. Early detection is key to

FIA Data
We initially screened Forest Inventory and Analysis (FIA) data from 1998 to 2019 [30], however we limited our analysis to data in the time period between 2013 and 2019, as there were no incidences of foliar diseases recorded prior to these years (refer Supplementary IV). We also selected only those FIA plots that had recorded at least one loblolly, longleaf or slash pine tree >12.7 cm in diameter at breast height. FIA crews are foresters who go through extensive training and quality checks to assure the validity of data. For this analysis, we considered a disease incident as any record of foliar disease which, according to FIA inventory guidelines, includes any damage to at least 20% of the foliage with at least 50% damage to the needles/leaves [31]. According to the FIA field manual, these damages are particularly caused by fungi with needle loss leading to growth reduction and potential mortality [31]. Although the exact codes that represent foliage damage in FIA data have changed, there has always been this damage available to record by crews in FIA inventory using this same definition. Therefore, since this category of damage was only recorded since 2013, we are assuming that this trend of emerging foliar disease in FIA data matches well with our recent field observations, but cannot definitely determine if the foliar disease was a needle cast. Finally, we subset the data to include only the plots that were measured during winter to late spring (between November and June) to focus on the time period when needle cast symptoms would be most probably observable [26,27].

Spatio-Temporal Distribution
We performed an exploratory spatial analysis to understand the distribution of the disease using hot spot analysis in ArcGIS (ArcGIS Desktop: Release 10.7.1, Redlands, CA: Environmental Systems Research Institute). We used each county as the unit to investigate the spatial pattern of foliar disease. Counties within the southeastern states were limited by the overall distribution of FIA plots with at least one individual of the species of interest. First, we calculated the total number of plots affected with disease within each county using the 'spatial join' tool. Next, we applied hot spot analysis in ArcGIS spatial tools to explore hot and cold spots in disease distribution, based on Getis-Ord Gi * statistic [32]. We assumed that there was an equal probability of disease incidence across the selected counties. The resulting map shows significant hot or cold spots at different significance levels based on the Getis-Ord Gi * statistic.

Tree-Level Analysis
We wanted to explore the relation between presence or absence of foliar disease with different tree-level variables (Table 1). We wanted to understand if initial tree conditions, size and crown ratio, influenced the likelihood of foliar disease, so we subset the data to include only trees that had a previous measurement reducing disease incidences further. A total of 9567 plots and 149,936 trees were selected for this portion of the study, where 21 plots and 44 individual trees were found to have foliar disease between 2013 and 2019 in the months between November and June. Table 1. List of predictor variables used in the generalized mixed model.

Predictor Variable
Diameter at breast height (cm) Crown ratio, (%) Diameter growth from past inventory (cm) Crown ratio change from past inventory (%) We used the following generalized linear mixed model with a binomial error distribution [33] Pr where, α j[i] is the grouping variables with random effects, and β k[i] are the individual covariates with fixed effects. We used this model at tree-level data, where trees were units under consideration for fixed effects and plots were the groups considered for random effects. Each tree would have presence or absence of the disease along with independent covariates including diameter, compacted crown ratio (hereafter referred as 'crown ratio'), and changes in those variables (Table 1). Crown ratio is defined as the percentage of tree bole supporting foliage which is visually compacted to fill in gaps in the crown [31]. The plot-level random intercept accounts for unmeasured plot level variables and reduces the weight of plots with many infected trees in the model.
We used the glmer function in the R 'lme4' package [34] in R version 4.0.2 [35] to model disease incidence. Predictor variables were centered on their means and scaled to one standard deviation to allow for direct comparisons of relative influence of each predictor on the odds that a tree would be infected, thus allowing direct comparison of the strength of each tree-level predictor variable in predicting disease incidence [33]. Variable inflation factor (VIF) for all the variables were calculated [36] to observe any correlation among the predictor variables and avoid over-fitting.

Range-Wide Climatic Analysis
To better understand the climatic correlates that might result in foliar disease, we used the machine learning method: MaxEnt. MaxEnt is a free software [37] that uses Maxentropy theory to predict species distribution probability based on climatic or environmental variables [7,38]. The MaxEnt model is a common approach used in species distribution modelling (SDM), with good prediction performance for a limited number of observations [39][40][41]. In this method, the SDM is based on presence-only data rather than presence-absence data. A major assumption in using the presence-only data is that the sampling should be well representative of the entire landscape [38,42,43], as it is more sensitive to this assumption compared to presence-absence data. Because FIA data is a representative sample of the landscape being investigated [44,45], this assumption was met.
In this analysis, we provide a list of samples with presence-only locations along with the environmental variables (predictors) in a grid format for the entire landscape under investigation. MaxEnt randomly selects background locations from within the landscape and compares them against the locations where the disease was observed. Predictors are used in a machine learning approach where the predictors are first transformed by simple mathematical transformations (e.g., linear, quadratic, and product functions) called features [42] to establish a relationship of the target variable with the environment variables before upscaling the probability to the entire study area. MaxEnt maximizes the gain function which is a penalized form of maximum likelihood function Supplementary Equation. While maximizing the gain function, the model tries to differentiate between presence and background locations. In addition, we adopted an optimum regularization coefficient to maximize AUC from the model validation.
We used annual subsets from the years 2016 and 2017 to run the MaxEnt model separately for each year because these two years had the highest number of observations, with 11 observations for 2016 and 12 observations for 2017. For the MaxEnt model, however, we did not exclude trees without previous measurements since we were only using spatial patterns at the landscape level. To derive the climatic predictor variables, which are listed in Table 2, we used climate mean data from 1981 to 2010 [46] and specific years of data from 2015 and 2016 [47]. For each of the climatic variables, we used deviation from the historical mean (data from respective year-historical mean) to better portray annual variation in the respective climate data. Coldest months from last season includes data from December of the previous year, January of the current year, and February of the current year. We masked out part of western Texas from these environmental layers because there were no representatives of the pine species of interest.

Spatio-Temporal Distribution of Pine Foliar Disease
Based on our Getis-Ord hot spot analysis, we identified those counties that had higher incidences compared to others. Specifically, we observed three hot spots with significant disease observations ( Figure 2), and of these, the most intense hot spot was observed in LA, east TX, and MS.

Tree Level Analysis
We explored the distribution of each tree level predictor variable against disease presence or absence ( Figure 4). Crown ratio and crown ratio change had the greatest apparent difference between trees with and without foliar disease, while only a small difference was observed for the diameter variables (i.e., diameter and diameter growth).

Tree Level Analysis
We explored the distribution of each tree level predictor variable against disease presence or absence (Figure 4). Crown ratio and crown ratio change had the greatest apparent difference between trees with and without foliar disease, while only a small difference was observed for the diameter variables (i.e., diameter and diameter growth).

Tree Level Analysis
We explored the distribution of each tree level predictor variable against disease presence or absence (Figure 4). Crown ratio and crown ratio change had the greatest apparent difference between trees with and without foliar disease, while only a small difference was observed for the diameter variables (i.e., diameter and diameter growth).  We explored the relationships of the following explanatory variables with the disease, using a mixed modeling approach as given below.
Disease~Diameter + Diameter Growth + Crown Ratio + Crown Ratio change + (1|Plot) The use of a mixed model was justified as a large portion of total variance was captured by the plot variable (random effect) which means there was a good deal of difference among plots in the baseline probability of having a diseased tree. We tested fixed effect components using both Z-test and chi-square test. VIF among all variables were lower than 1.41, suggesting weak multicollinearity among explanatory variables. While crown ratio showed some signal in predicting the disease (Table 3) it was not statistically significant, and the other predictors were not as strong. Trees with larger crown ratios were associated with lower disease probability.

Range-Wide Climatic Variables
We tested multiple MaxEnt models and selected models for 2016 and 2017 that had the highest AUC values by year (Table 4). For 2016, the best AUC was observed with the beta multiplier of 0.90 while for the 2017 model, best results were found at 0.10 beta multiplier. Model for 2016 had an AUC of 0.73 which would produce a prediction well above that of a random chance. The performance of the 2017 model was slightly lower compared to 2016 model, with AUC of 0.70. Most influential climatic predictors in both models were mean dew point temperature, maximum vapor pressure deficit, and precipitation during cold months in decreasing order for both models. Mean dew point temperature was found most dominant in the 2016 model, while no such strong influence of any single predictor was seen for the 2017 model. Three of the predictors in the 2016 model and two in the 2017 model were not used at all in the selected models owing either to their low influence or correlation with other predictors.
The jackknife test for the 2016 model further highlights that mean dew point temperature and maximum vapor pressure deficit are strong predictors of the disease even if used alone ( Figure S1), with 0.68 AUC for the former and 0.67 AUC for the later predictor. By leaving out mean dew point temperature or maximum vapor pressure deficit from the overall model, we would only achieve a AUC of 0.64 and 0.67, respectively. On the other hand, removing either of the unused variables from the overall model did not change the model's explanatory power. We removed annual mean temperature and annual mean precipitation from the 2017 model as these variables reduced overall AUC. Among the predictor variables used, results suggest that mean dew point temperature and maximum vapor pressure deficit have strong predictive power as well as marginal influence. Both the 2016 and 2017 models suggest that mean dew point temperature close to normal is related to higher disease probability and major deviation from normal would reduce probability. The response curve for maximum vapor pressure deficit behaved differently for two models ( Figure S2). Based on the 2016 model, higher vapor pressure deficit compared to normal would suggest higher disease presence, but in 2017 the relationship was the opposite until the point where maximum vapor pressure deficit was 3 hpa higher than the normal. After this threshold, however, it behaved similar to the 2016 model where disease probability increased with higher vapor pressure deficit compared to normal. Both SDM models suggest higher disease probability with mean precipitation during cold months being close or slightly less than normal. After specific points (corresponding to 50 mm for the 2016 model and 10 mm for the 2017 model), higher mean precipitation than normal suggested decrease in disease probability compared to lower precipitation.
MaxEnt models produced logistic output maps showing the probability of disease distribution ranging from 0 to 1 across the landscape ( Figure 5). The models were based on the gain function (Supplementary I) added from a total of 12 features (linear and quadratic features) derived from each of the 6 climatic variables.

Discussion
Early detection of emerging pests and diseases is a critical step to halting outbreaks [4,48]. However, the lack of extensive research regarding the epidemiology of their possible causal agents, which usually characterizes emerging diseases, makes the task even more difficult. While models built with limited data have limited statistical power, in the case of emerging diseases, they are nevertheless a useful tool, as they can identify trends that can provide support for the design of detection protocols for further investigation. Needle casts of southern pines are a relatively unstudied phenomenon, especially if compared with other more aggressive needle casts and blights such as Swiss needle cast [49] or Dothistroma needle blight [50]. The MaxEnt SDM model developed in this study suggested an association of the climatic variables with the emergence of needle casts on southern pines. We restricted the type of predictors used in the model to climatic variables as this was the primary focus of our study. The 2016 model produced more areas with higher disease probability compared to the 2017 model, but this was expected because the two models had different regularization constants (beta multipliers), which produce different levels of smoothing in the output probability. We would expect better model performance with larger sample sizes, but given our limited observational data, MaxEnt is the most appropriate SDMs as it outperforms most of other modeling approaches [39][40][41]. Hernandez et al. 2006 [41] found good model performance with MaxEnt compared to other SDMs with a sample size as low as 5 when the sample is well representative (unbiased) of the landscape like the FIA sample.
While a careful consideration of environmental factors is important for MaxEnt, further studies with edaphic and topographic variables may enhance the goodness of fit of the model [51]. We found that, at the range level, incidence of foliar disease was related to mean dewpoint, vapor pressure deficit, and cold season precipitation. The climatic factors we identified as important to disease presence-precipitation and humidity-are in agreement with some previous studies of needle cast diseases [24,29,52], however the direction of the associations with disease did not match well with these previous studies. For example, Drenkhan [29] found that Swiss needle cast incidence is related to higher precipitation and humidity using summer-time precipitation and humidity, and Wyka et al. [52] found that warmer climate and wetter springs favor development of needle diseases on white pine (P. strobus L.), but we found lower probability of disease with higher winter precipitation and humidity. One of the reasons for this mismatch could be the difference in the seasons used in each analysis. It is also noteworthy that previous studies, although within the broad umbrella of needle casts, focused on different fungal species and plant hosts. Our results suggest a degree of annual variability in the prevalence of foliar diseases in southern pine species, which is potentially driven by these climatic factors. Monitoring programs and remote sensing campaigns aimed at detecting needle casts in southern pines could be targeted towards areas experiencing such conditions during the critical late winter and early spring months to assess the degree and severity of infection. Additionally, raising the awareness of this emergent disease may help lead to greater reporting of incidents, which in turn would help provide better understanding of the phenomenon.
At the tree level, the vast majority of monitored trees did not meet the threshold for recording foliar disease during the years included in this study, but those that did, tended to have smaller crown ratios. This prompted us to ask the question whether this was a possible cause or the result of the disease. The FIA field manual [31] indicates that damaged or diseased foliage is not supposed to be excluded from estimating the crown ratio of a tree, which suggests that the crew members should not have reduced the crown ratio due to the presence of disease, but we cannot rule out bias in this measurement due to loss of needles. At any rate, the change in crown ratio between censuses was not significant, which suggests that the diseased trees had a stable crown ratio between censuses, which further suggests a causal relation between crown ratio and likelihood of foliar disease. The limited number of data on the disease incidence is likely a contributing factor to the poor association between tree-level traits and disease [53]. Moreover, the detail provided by FIA data is not enough to allow the discrimination of needle cast from other foliar diseases. To focus on highly probable cases of needle cast, we limited our disease observation to the seasonal window when needle cast diseases are likely to be symptomatic, but we cannot rule out some level of human error in misidentification of foliage damage from some factors other than fungi, hence introducing some noise to the observed data. Cases falling below the damage code thresholds could also lead to underrepresentation of the disease abundance across the study area.
While the term needle cast for southern pines is generic and includes several species, we have ongoing studies to better identify the causal agents of the recently emerging disease in the southern pine forests that will complement this study, and hopefully provide forest managers with useful information for the development of a management strategy. Further work with a higher number of observations, as well as better data filtering and ground truthing, could be helpful in confirming the identified associations of the disease at individual and range level. Limited data also suggests that foliar damage was not a widespread problem during the years included in this study, but continued monitoring is crucial to detect a potential increase in prevalence in a timely manner.

Conclusions
In this study, we explored whether tree-level variables could be related to foliar disease in the southeastern pine species. We observed a marginal association of crown ratio with the disease occurrence while there was no significant relationship with tree diameter. We also found that there was no significant change in crown ratio or diameter for the trees affected by the foliar disease between the last two inventories. A small number of foliar disease observations limits our inference about foliar diseases in southern pine species studied. The MaxEnt model, which we used to predict disease probability based on environmental variables, produced disease probability maps across the landscape despite the low incidence and supported our hypothesis that the foliar disease was influenced by the temporal patterns in precipitation and humidity. Continued monitoring and identification of causal agents will help pinpoint the disease dynamics and their impact on southern pines.

Supplementary Materials:
The supplement area available as designated. The following are available online at http://www.mdpi.com/1999-4907/11/11/1155/s1, Figure S1. Jackknife test using AUC on test data for different predictor variables for the 2016 and 2017 models. We removed annual mean temperature and annual mean precipitation from the 2017 model due to correlation with other predictor variables. Figure S2. Response curves of three most influential climatic variables for 2016 and 2017 models. These curves are based on the MaxEnt model created using only each of these variables.
Author Contributions: K.P., and D.J.J. conceived this study. K.P. and D.J.J. performed the analyses and wrote the initial drafts of the manuscript. K.P., J.S., T.Q., C.V., and D.J.J. contributed to the final version of the manuscript. All authors have read and agreed to the published version of the manuscript.
Funding: Support for this project was provided by the USDA FS FHP Emerging Pest Line Program (Challenge Cost Share Agreements with UGA no. 19-CS-11330129-036 and Challenge Cost Share Agreements with UFL P0128048).