Mapping Forest Health Using Spectral and Textural Information Extracted from SPOT-5 Satellite Images

Forest health is an important variable that we need to monitor for forest management decision making. However, forest health is difficult to assess and monitor based merely on forest field surveys. In the present study, we first derived a comprehensive forest health indicator using 15 forest stand attributes extracted from forest inventory plots. Second, Pearson’s correlation analysis was performed to investigate the relationship between the forest health indicator and the spectral and textural measures extracted from SPOT-5 images. Third, all-subsets regression was performed to build the predictive model by including the statistically significant image-derived measures as independent variables. Finally, the developed model was evaluated using the coefficient of determination (R²) and the root mean square error (RMSE). Additionally, the model was further validated using the leave-one-out cross-validation approach. The results indicated that the model could provide a reliable, fast and economic means to assess and monitor forest health. A thematic map of forest health was finally produced to support forest health management.


Introduction
Forests are the largest terrestrial ecosystems on the earth and play a significant role in providing ecological, economic and social benefits [1,2]. However, the total service that they can provide depends strongly on forest stand conditions. Pfilf et al. [3] stressed that how forest disturbance events play out depends on stand condition, and ignoring it "is tantamount to ignoring the health of the forest community." Forests of rich species composition and complex structure (good condition) are documented to provide far more ecological services as well as timber production than forests with a simple structure (poor condition) [4–6]. This might be the reason why irregular forest management towards complex structure and high diversity is currently being widely adopted [7–9]. Forest health, a more formal and scientific term, is normally used in forestry to describe the forest stand condition. While this term first appeared in the forestry literature in the 1980s [10,11], there was no widely accepted definition for almost 10 years. The lack of a universal definition hindered the assessment of forest health as well as the monitoring of its dynamics. In this context, O'Laughlin et al. [12] integrated the definitions of forest, ecosystem and health and defined forest health as a condition of forest ecosystems that sustains their complexity while providing for human needs. This definition sought to combine the social, ecological and economic perspectives [13]; it was adopted by the US Forest Service and is frequently used in the forestry literature [13–15].
In addition to a precise definition, the question of how to represent forest health quantitatively arises when we want to assess current forest health conditions and monitor their dynamics. A large number of studies have been dedicated to deriving quantitative measures, i.e., forest health indicators. In Europe and North America, the tree crown condition, e.g., defoliation and crown dieback, is the most widely used indicator for forest health evaluation [16–18]. For instance, the USDA Forest Health Monitoring (FHM) Program has systematically produced a series of crown condition indicators to quantify the forest health condition, and the related materials can be found at [19]. To assess forest health in East China, Wang et al. [20] first calculated five crown condition indicators (live crown, crown density, crown diameter, crown dieback and foliar transparency) and then assigned these indicators to different health categories according to the USFS (United States Forest Service) Crown Condition Classification Guide (CCCG) standard. Similarly, Wang et al. [21] reported considerable forest damage of Masson pine trees in southern China by monitoring the crown condition. The importance of the tree crown condition for representing forest health can be attributed to its significant role in regulating the forest ecosystem as well as its high sensitivity to natural or anthropogenic disturbance [22,23]. Zarnoch et al. [22] documented the significance of tree crowns in regulating solar energy, nutrient recycling, precipitation distribution and moisture retention in forests. Additionally, other indicators, e.g., soil properties [24], lichen communities [25], mycorrhizal mushroom diversity [26] and faunal taxa [27], have also been used to assess forest health. All these precedents provide a promising and quantitative means to understand and monitor forest health. These indicators, however, each focus on only one aspect of the health condition. In fact, forest health is determined by a complex process that may include complicated interactions between biotic and abiotic elements. As a result, forest health might not be fully represented if only one aspect is examined. Therefore, more aspects, such as site quality, forest species composition and forest structure, which might influence forest ecosystem processes as well as forest health, should be taken into consideration for a better exploration of forest health.
Regardless of the indicators employed for forest health evaluation, collecting the data for these indicators through a field survey is difficult, expensive, time-consuming and not spatially exhaustive. Fortunately, remote sensing provides a rapid and economic approach to obtaining such data. In fact, remote sensing has been extensively employed in the estimation of forest stand attributes such as forest biomass and carbon [26,28–30], forest diversity [31–35], leaf area index [35–37], forest age [38–40] and tree height [41–44]. In addition, it is also commonly used to produce forest maps, which serve as a fundamental basis for forest management and forest inventory [45–48]. For example, using waveform Lidar, Hyde [49] produced forest structure maps, which were prerequisites for wildlife habitat analysis. Chuvieco and Congalton [50] mapped forest fire hazards using remote sensing and geographic information systems.
The most commonly used remote sensing data in forestry are airborne Lidar data and optical multispectral satellite data. Although Lidar demonstrates a very promising capability to predict forest stand attributes, especially tree height [51–53], its application is largely restricted by its high cost [4,54]. In contrast, optical remote sensing methods are more cost-effective and repeatable, and their performance is also reported to be as promising as airborne Lidar for most forest attributes. For instance, Wallis et al. [55] compared the performance of Lidar and optical satellite data in modeling the spatial distribution of bird biodiversity across a complex tropical mountain forest ecosystem and found that, except for phylodiversity, the optical satellite data showed almost the same efficiency as the Lidar data in predicting the Shannon diversity and a measure of community structure. These authors therefore concluded that optical multispectral satellite data could replace costly Lidar data for modeling certain aspects of biodiversity. Similarly, Maack et al. [56] found that Stereo-VHR images showed great potential for canopy height model generation and could be an adequate alternative to Lidar and InSAR techniques. Using high spatial resolution aerial photos (1.0 m spatial resolution), Meng et al. [57] demonstrated the capability of Fourier-based textural ordination (FOTO) indices to obtain a higher prediction quality of forest aboveground biomass compared with Lidar.
In the present study, we investigated the potential of mapping forest health conditions using SPOT-5 satellite images. The objectives of the present study were to: (1) derive a comprehensive forest health indicator using forest stand attributes from field survey data; (2) investigate the correlations between the forest health indicator and the imagery-derived spectral and textural measures; (3) build a model predicting the forest health indicator using imagery-derived measures as potential independent variables; and (4) produce a forest health map based on the developed model.

Field Data
The field survey data were obtained from the eighth Chinese National Forest Inventory (CNFI) of the Guangxi Zhuang Autonomous Region in 2010. The sampling design of the CNFI was systematic sampling with a square grid of 4 × 6 km, i.e., one plot per 4 × 6 km grid cell (Figure 1). Square plots were employed with a size of 0.067 ha. In each plot, the DBH of all trees was measured, and each tree was identified to the species level. The coordinate information for each tree was recorded for spatial analysis. In addition to the sampled trees, information concerning the sample plots, e.g., elevation, slope, aspect, soil type, soil depth and canopy closure, was also documented. The CNFI is carried out every 5 years to monitor forest dynamics. In the present study, 233 sampling plots were covered by the satellite images, of which only 48 plots were identified as forest stands. Unfortunately, 9 plots were found to lack plot information. Therefore, 39 plots were employed to derive forest stand variables. These forests are mostly degraded secondary forests resulting from historic disturbance, which are now under strict protection for self-restoration. The main management objective of these forests is to reconstruct mixed-species irregular forest structures.

Remote Sensing Data and Processing
We used 3 scenes of SPOT-5 images to derive the imagery variables (independent variables). These images were captured on 21 September 2010, with the K-J numbers 275/300, 275/301 and 275/302. The multispectral images have 4 bands, i.e., near-infrared (0.78–0.89 µm), red (0.61–0.68 µm) and green (0.50–0.59 µm) bands at a resolution of 10 m and a shortwave infrared band (1.58–1.75 µm) at a resolution of 20 m. The panchromatic image was recorded at a resolution of 2.5 m.
Prior to analysis of the images, geometric and atmospheric corrections were conducted by the Survey & Planning Institute of State Forestry Administration, China. Ground control points measured with a differential GPS were used for geometric correction. Atmospheric correction was conducted using the improved Dark Object Subtraction method proposed by Castillo-Santiago et al. [58].

Forest Stand Attributes for Calculating the Forest Health Indicator
The candidate forest stand attributes used to derive the forest health indicator included both traditional stand attributes and forest structural diversity indices. The traditional forest variables, i.e., quadratic mean diameter (QMD), basal area (BA), number of trees (NT) and stand volume (SV), were calculated for each plot. Since forest structural diversity provides more details on forest structure and has an important underlying implication for formulating a forest management strategy [4,59,60], we also calculated structural diversity indices to derive the forest health indicator. Structural diversity can be subdivided into 3 categories: tree species diversity, tree dimension diversity and tree position diversity [61,62]. In the present study, we used the Shannon-Wiener index (SHI), Simpson index (SII) and Pielou's evenness index (PI) to characterize species diversity. Tree size diversity was measured by the Gini coefficient (GC) and the standard deviation of the DBHs (SDDBH). Tree position diversity was represented by the uniform angle index (UAI), tree species intermingling index (TSII), DBH dominance index (DBHDI) and diameter differentiation index (DDI). A detailed description of these indices can be found in Meng et al. [63]. Additionally, we also included humus depth (HD) and canopy closure (CC) in the calculation of the forest health indicator. HD can be representative of site quality, whereas CC might represent the competition status of the forest stands.

Forest Health Indicator Derivation
We performed factor analysis to produce the forest health indicator using all 15 stand attributes mentioned above. Prior to factor analysis, we conducted Bartlett's test of sphericity and calculated the Kaiser-Meyer-Olkin index to investigate whether these variables were correlated, which is a prerequisite for factor analysis [64,65].
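As an illustration of the first of these checks, the following minimal sketch computes the Bartlett sphericity statistic from a plots-by-attributes table, assuming the usual form chi² = −(n − 1 − (2p + 5)/6) · ln|R| with p(p − 1)/2 degrees of freedom, where R is the correlation matrix. The function names and the toy data are hypothetical and are not taken from the study.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix of a plots-by-variables table."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(p)]
    corr = [[0.0] * p for _ in range(p)]
    for a in range(p):
        for b in range(p):
            cov = sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
            corr[a][b] = cov / (sds[a] * sds[b])
    return corr

def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det
        det *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= f * a[c][k]
    return det

def bartlett_sphericity(rows):
    """Return (chi-square statistic, degrees of freedom)."""
    n, p = len(rows), len(rows[0])
    det = determinant(correlation_matrix(rows))
    chi2 = -(n - 1 - (2 * p + 5) / 6) * math.log(det)
    return chi2, p * (p - 1) // 2

# Hypothetical toy table: 6 plots, 3 attributes.
toy = [[1.0, 2.0, 5.0], [2.0, 3.0, 3.0], [3.0, 7.0, 6.0],
       [4.0, 6.0, 2.0], [5.0, 11.0, 7.0], [6.0, 10.0, 1.0]]
chi2, df = bartlett_sphericity(toy)
```

With 15 variables the same formula gives 15 × 14 / 2 = 105 degrees of freedom.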
Factor score and factor weight were calculated using the following formulas:
$$F_{jm} = \sum_{i} D_{mi} U_{ij} \qquad (1)$$
$$W_{m} = \frac{C_{m}}{\sum_{l} C_{l}} \qquad (2)$$
where $F_{jm}$ is the score for the m-th factor in plot j, $W_{m}$ is the weight of the m-th factor, $D_{mi}$ represents the score coefficient of the i-th variable for the m-th factor, $C_{m}$ accounts for the variance explained by the m-th factor, and $U_{ij}$ represents the i-th standardized forest stand attribute in plot j.
The standardized forest stand attributes have a mean value of 0 and a variance of 1. The standardization is conducted using the following formula:
$$U_{ij} = \frac{X_{ij} - \bar{X}_{i}}{S_{i}} \qquad (3)$$
where $X_{ij}$ is the observed value for the i-th variable in plot j, and $\bar{X}_{i}$ and $S_{i}$ represent the mean value and standard deviation of the i-th variable, respectively. The forest health indicator for plot j ($FHI_{j}$) is calculated as follows:
$$FHI_{j} = \sum_{m} W_{m} F_{jm} \qquad (4)$$
Because the produced FHI has both negative and positive values, it is not very convenient for comparison and classification. We therefore transformed this variable using the following formula to generate the transformed final forest health indicator (FFHI):
$$FFHI = \frac{FHI - H_{min}}{H_{max} - H_{min}} \times 10 \qquad (5)$$
where $H_{min}$ and $H_{max}$ are the minimum and maximum values of FHI, respectively.
The FFHI ranges from 0 to 10 and is therefore more convenient for use in assessing and monitoring forest health conditions.
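The chain from stand attributes to the FFHI can be sketched as follows. This is a minimal illustration under stated assumptions: attributes are standardized with the sample standard deviation, factor weights are taken as each factor's share of the explained variance, and the result is rescaled linearly to 0–10. The function names, score coefficients and plot values are hypothetical placeholders.

```python
import math

def standardize(column):
    """Scale one attribute to mean 0 and (sample) variance 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]

def compute_ffhi(attributes, score_coef, explained_var):
    """attributes: plots x variables; score_coef: factors x variables (D_mi);
    explained_var: variance explained by each factor (C_m)."""
    # Standardize column by column, then go back to a plots x variables layout.
    u_cols = [standardize(list(col)) for col in zip(*attributes)]
    u_rows = list(zip(*u_cols))
    total = sum(explained_var)
    weights = [c / total for c in explained_var]          # W_m
    fhi = []
    for u in u_rows:
        scores = [sum(d * x for d, x in zip(coefs, u))    # F_jm
                  for coefs in score_coef]
        fhi.append(sum(w * f for w, f in zip(weights, scores)))
    h_min, h_max = min(fhi), max(fhi)
    return [10 * (h - h_min) / (h_max - h_min) for h in fhi]

# Hypothetical example: 4 plots, 3 attributes, 2 factors.
plots = [[10, 0.5, 3], [20, 0.9, 5], [15, 0.2, 4], [30, 0.7, 9]]
coefs = [[0.6, 0.1, 0.4], [-0.2, 0.8, 0.1]]
ffhi_values = compute_ffhi(plots, coefs, explained_var=[60.0, 40.0])
```

By construction the plot with the lowest FHI maps to 0 and the plot with the highest FHI maps to 10, so the FFHI is relative to the sampled plots rather than an absolute scale.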

Spectral Measures
The average surface reflectance of each plot was first extracted from the multispectral bands (mean_green, mean_red, mean_swir and mean_nir) as well as the panchromatic band (mean_pan). Additionally, we calculated 10 vegetation indices that are documented to be widely used in forestry research. The formulas of these indices and their relevant applications in forestry are summarized in Table 1 [54,63,66–69].
Table 1. Spectral vegetation indices derived from the SPOT-5 images.
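For orientation, three of the indices named in this study (NDVI, SR and SAVI) can be sketched from plot-level reflectance as follows. The definitions used here are the commonly cited ones and are an assumption; the soil adjustment factor L = 0.5 in SAVI is likewise an assumed default, and the reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)

def simple_ratio(nir, red):
    """Simple ratio of near-infrared to red reflectance."""
    return nir / red

def savi(nir, red, soil_factor=0.5):
    """Soil-adjusted vegetation index with adjustment factor L (assumed 0.5)."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)

# Hypothetical mean reflectance of one plot.
mean_nir, mean_red = 0.4, 0.1
indices = {
    "NDVI": ndvi(mean_nir, mean_red),
    "SR": simple_ratio(mean_nir, mean_red),
    "SAVI": savi(mean_nir, mean_red),
}
```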

Textural Measures
First- and second-order textural measures were extracted for each plot. We used the standard deviation of gray levels (SDGL) as the first-order textural statistic, i.e., the standard deviation of the k reflectance values in the window around their mean reflectance value µ. We calculated the SDGL for all multispectral reflectance bands, i.e., the near-infrared, red, green and shortwave infrared bands.
In contrast to the first-order textural measures, the second-order textural measures were calculated using only the panchromatic band, which was documented to be well suited to textural analysis due to its relatively high spatial resolution [63,70]. In the present study, we employed statistics of the grey level co-occurrence matrix (GLCM) as the second-order textural measures. The 8 GLCM statistics and their relevant studies in forestry are summarized in Table 2 [58,61,71,72].
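To make the second-order measures concrete, the sketch below builds a symmetric, normalized GLCM for a single pixel offset and derives five of the statistics named in this study (mean, variance, contrast, dissimilarity and correlation). It assumes the standard Haralick-style definitions and an already quantized image; the function names, the horizontal offset and the 4-level toy image are illustrative choices, not the settings used by the authors.

```python
def glcm(image, levels, dx=1, dy=0):
    """Symmetric, normalized grey level co-occurrence matrix for one offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                i, j = image[r][c], image[r2][c2]
                counts[i][j] += 1   # pair (i, j)
                counts[j][i] += 1   # and its mirror, to make the matrix symmetric
    total = sum(sum(row) for row in counts)
    return [[v / total for v in row] for row in counts]

def glcm_statistics(p):
    """Mean, variance, contrast, dissimilarity and correlation of a GLCM."""
    cells = [(i, j, p[i][j]) for i in range(len(p)) for j in range(len(p))]
    mean = sum(i * v for i, _, v in cells)
    variance = sum((i - mean) ** 2 * v for i, _, v in cells)
    covariance = sum((i - mean) * (j - mean) * v for i, j, v in cells)
    return {
        "mean": mean,
        "variance": variance,
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "dissimilarity": sum(abs(i - j) * v for i, j, v in cells),
        "correlation": covariance / variance if variance > 0 else 0.0,
    }

# Hypothetical 4 x 4 image quantized to 4 grey levels.
toy_image = [[0, 0, 1, 1],
             [0, 0, 1, 1],
             [0, 2, 2, 2],
             [2, 2, 3, 3]]
toy_glcm = glcm(toy_image, levels=4)
toy_stats = glcm_statistics(toy_glcm)
```

In practice the matrix would be computed inside a moving window on the panchromatic band and often averaged over several offsets or directions; that choice is left open here.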

Table 2. Textural measures (columns: textural measure, formula, reference).
Textural measures are multi-scale phenomena, and the window size is therefore a significant component when calculating these textural statistics [20,73]. A smaller window might not contain sufficient information about the area, whereas a larger window might result in edge effects or border problems [20,58]. The most common approach for determining the optimum window size is to compare the correlation coefficient between the textural statistics and the dependent variables (forest stand attributes) [63], or the classification accuracy [20], at different window sizes. The optimum window size is the one that represents a trade-off between a desirably high correlation coefficient or classification accuracy and a desirably small window [58]. A window size of 9 × 9 pixels was reported to be optimal by Shaban and Dikshit [74], Castillo-Santiago et al. [58] and Meng et al. [63], who also employed SPOT-5 satellite images in their studies. Following them, we used a 9 × 9 pixel window to derive the textural measures.
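A window-based first-order measure can be sketched as follows: the SDGL is taken as the standard deviation of the reflectance values inside a square window centred on a pixel. The use of the sample standard deviation (divisor k − 1) is an assumption, as is the clipping of the window at the image border; the function names and the synthetic bands are hypothetical.

```python
import statistics

def window_values(band, row, col, size=9):
    """Collect the values of a size x size window, clipped at the image border."""
    half = size // 2
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < len(band) and 0 <= c < len(band[0]):
                values.append(band[r][c])
    return values

def sdgl(band, row, col, size=9):
    """Standard deviation of gray levels in the window (sample form, assumed)."""
    return statistics.stdev(window_values(band, row, col, size))

# Synthetic 20 x 20 bands: one constant, one with a left-to-right gradient.
flat_band = [[5 for _ in range(20)] for _ in range(20)]
gradient_band = [[c for c in range(20)] for _ in range(20)]

sdgl_flat = sdgl(flat_band, 10, 10)
sdgl_gradient = sdgl(gradient_band, 10, 10)
```

Repeating the calculation with, for example, size=3, 5, 7 and 9 and correlating each result with the field variable is one way to carry out the window-size comparison described above.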

Statistical Methods
We first conducted pairwise correlation analysis to identify the image-derived measures that were significantly correlated with the FFHI. These significant image-derived variables were then employed as independent variables to build the predictive models using all-subsets regression. As multicollinearity normally occurs between remotely sensed variables [38,39,75], we required a variance inflation factor (VIF) of less than 4 and also restricted the number of independent variables to 4 to avoid multicollinearity. In regression, it was assumed that there was no heteroscedasticity and that the residuals did not deviate significantly from normality [76]. We therefore produced residual plots. Additionally, the Shapiro-Wilk test and the Breusch-Pagan test were performed to investigate the normality and homoscedasticity of the residuals, respectively. The produced models were evaluated for precision using the coefficient of determination (R²) and the root mean square error (RMSE). The predictive models were further validated for their performance or robustness using the leave-one-out cross-validation procedure by calculating the corresponding cross-validated coefficient of determination (R²_cv) and root mean square error (RMSE_cv).
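The screening and validation steps can be sketched with a small ordinary least squares routine. The sketch assumes that the VIF of a predictor is 1/(1 − R²) from regressing it on the other predictors, that RMSE is the square root of the mean squared residual, and that R²_cv is computed from the leave-one-out predictions against the total sum of squares. All function names and data are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                for k in range(c, n + 1):
                    m[r][k] -= f * m[c][k]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols_fit(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    design = [[1.0] + list(row) for row in X]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(beta, row):
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

def r2_rmse(y, y_hat):
    """Coefficient of determination and root mean square error."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(y))

def vif(X, j):
    """Variance inflation factor of predictor j."""
    target = [row[j] for row in X]
    others = [[v for k, v in enumerate(row) if k != j] for row in X]
    beta = ols_fit(others, target)
    r2, _ = r2_rmse(target, [predict(beta, row) for row in others])
    return 1 / (1 - r2)

def loocv(X, y):
    """Leave-one-out cross-validated R2 and RMSE."""
    preds = []
    for i in range(len(y)):
        beta = ols_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(beta, X[i]))
    return r2_rmse(y, preds)

# Hypothetical data: two predictors and a response that is exactly linear in them.
X_toy = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0], [6.0, 4.0]]
y_toy = [1 + 2 * a - 3 * b for a, b in X_toy]
beta_toy = ols_fit(X_toy, y_toy)
r2_cv, rmse_cv = loocv(X_toy, y_toy)
vif_first = vif(X_toy, 0)
```

In an all-subsets search, each candidate subset of up to four predictors would be fitted this way and discarded if any VIF reached the cut-off of 4.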

Experimental Procedure
The workflow for the derivation of the forest health indicator, the extraction of imagery measures, the development of the predictive model and forest health mapping is shown in Figure 2. We first developed the forest health indicator from 15 stand variables using factor analysis. Second, spectral and textural measures were extracted from the SPOT-5 images with the optimum window size. Third, the model predicting forest health was built by all-subsets regression using the imagery-derived measures as independent variables. Finally, we produced the thematic map of forest health using the developed model.
Remote Sens. 2016, 8, 719

Forest Health Indicator Derivation
Bartlett's test of sphericity (chisq = 717.23, df = 105, Sig. = 0.00) and the Kaiser-Meyer-Olkin index (MSA = 0.66) indicated that these 15 variables were correlated, which suggested that factor analysis could be conducted to reduce the dimensions of the variables and form several comprehensive components (factors) to represent the forest health condition. The total variance explained by the factors is shown in Table 3. The cumulative variance explained by the first five components accounted for 83.9% of the total variance, which indicated that these five factors represented most of the original information (Table 3). The factor loadings, which are the correlation coefficients between the factors and the original variables (15 stand attributes), are listed in Table 4 for each factor. Variables with an absolute loading above 0.5 are normally said to be dominating and allow for meaningful interpretation of the factor [77,78]. The first factor (F1) accounted for 29.4% of the total variance and was highly correlated with TSII, SHI, PI and SII. Therefore, we regarded F1 as the indicator for species diversity. The second component (F2) represented 20.4% of the total variance and showed high correlations with UAI, BA, NT, SV and CC. F2 was considered to represent the competition status of the forest stand. The third component (F3) explained 19.9% of the total variance and was highly correlated with DDI, QMD and SDDBH. The fourth component (F4) accounted for 7.2% of the total variance and was highly correlated only with DBHDI. We might consider F3 and F4 as indicators of tree size diversity. Only 7.1% of the total variance was explained by the fifth component (F5), which was highly correlated only with HD, suggesting that F5 is a good indicator of site quality. In Table 4, numbers in bold font denote a dominating indicator (factor loading ≥ 0.5 or ≤ −0.5).
Table 5 depicts the score coefficients for each stand attribute, which can be used to calculate the factor score for each factor. Using Equations (1)–(5), the FFHI was derived for each plot, as listed in Table 6.
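As a worked illustration of how the explained variances could translate into factor weights, the sketch below assumes that each weight is the factor's share of the variance explained by the five retained factors. The percentages are those reported above; the variable names are hypothetical. The rounded percentages add up to 84.0 rather than the reported 83.9, which is presumably a rounding effect.

```python
# Variance explained by each retained factor, in percent (values from the text).
explained = {"F1": 29.4, "F2": 20.4, "F3": 19.9, "F4": 7.2, "F5": 7.1}

total_explained = sum(explained.values())

# Assumed weighting: each factor's share of the retained variance.
factor_weights = {name: share / total_explained
                  for name, share in explained.items()}

for name, weight in factor_weights.items():
    print(f"{name}: {weight:.3f}")
```

Under this assumption the species diversity factor F1 carries roughly a third of the total weight, while F4 and F5 together carry less than a fifth.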

Correlation Analyses
The correlation analyses between the FFHI and spectral measures are summarized in Table 7. The average surface reflectance of all bands was significantly negatively correlated with the FFHI, with correlation coefficients ranging from −0.548 to −0.606. In contrast, only half of the 10 vegetation indices, i.e., brightness, NDVI, SR, VI and SAVI, showed a significant correlation with the FFHI. Brightness and SAVI showed a negative correlation with the FFHI, whereas a positive pattern was observed for NDVI, SR and VI. Brightness had the largest absolute correlation coefficient with respect to the FFHI (0.656), followed by VI (0.633), SAVI (0.547) and SR (0.333).
Among the textural measures, the second-order statistics demonstrated much higher correlation with the FFHI than the first-order statistics. Five of the eight second-order statistics, i.e., Glcm_contrast, Glcm_correlation, Glcm_dissimilarity, Glcm_mean and Glcm_Variance, were significantly correlated with the FFHI, among which only Glcm_correlation had a positive correlation. The absolute correlation coefficient values were 0.607, 0.548, 0.543, 0.540 and 0.320 for Glcm_mean, Glcm_dissimilarity, Glcm_Variance, Glcm_contrast and Glcm_correlation, respectively. In comparison, the first-order statistics did not show any significant correlation with the FFHI except for SDGL_red. In summary, there were a total of 12 imagery-derived measures that were significantly correlated with the FFHI, i.e., Mean_green, Mean_swir, Mean_nir, Mean_red, Mean_pan, Brightness, Glcm_contrast, Glcm_correlation, Glcm_dissimilarity, Glcm_mean, Glcm_Variance and SDGL_red. All of these measures were considered as potential independent variables for producing the predictive model.
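The link between sample size and the smallest significant correlation can be sketched as follows. The sketch assumes a two-sided test at the 5% level, which is not stated explicitly in the text, and takes the critical t value for 37 degrees of freedom (about 2.026) from standard tables; the function names are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing r against zero with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def critical_r(n, t_crit):
    """Smallest |r| that reaches the given critical t value."""
    return t_crit / math.sqrt((n - 2) + t_crit ** 2)

N_PLOTS = 39          # plots used in the study
T_CRIT_DF37 = 2.026   # assumed: two-sided 5% critical t, 37 df, from tables

r_threshold = critical_r(N_PLOTS, T_CRIT_DF37)
t_weakest = t_statistic(0.320, N_PLOTS)   # weakest reported significant |r|
```

With 39 plots the threshold is an absolute correlation of about 0.316, so the weakest correlations reported as significant (0.320 and 0.333) lie just above it under these assumptions.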

Model Establishment and Forest Health Mapping
The predictive model and its general statistics are listed in Table 8. While 12 imagery-derived measures were significantly correlated with the FFHI (Table 7), only mean_swir and mean_pan were included as independent variables due to multicollinearity. We considered this predictive model to be appropriate for estimating the FFHI since it was statistically significant (p < 0.01) and its coefficient of determination (R²) was 0.47, which was very close to 0.5, the cut-off value. The performance of this linear model was further substantiated using the cross-validation scores calculated from the leave-one-out cross-validation approach (R²_cv = 0.43, RMSE_cv = 1.804). The residual plot of the model is presented in Figure 3, and we did not observe any particular patterns or trends. Furthermore, the Shapiro-Wilk (W = 0.97506, p-value = 0.5282) and Breusch-Pagan (BP = 0.0032012, df = 1, p-value = 0.9549) tests gave no evidence against the normality and homoscedasticity of the residuals. Based on this model, the thematic map of forest health was produced (Figure 4). In this thematic map, forest health was grouped into four categories with the same FFHI interval lengths, i.e., 0–2.5, 2.5–5.0, 5.0–7.5 and 7.5–10.0.
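The equal-interval grouping used for the map can be sketched as a simple lookup. The treatment of boundary values (each class includes its lower bound, and 10.0 falls in the top class) is an assumption, since the text gives only the interval limits; the function name and class numbering are hypothetical.

```python
def ffhi_class(value):
    """Map an FFHI value in [0, 10] to one of four equal-width classes (1-4)."""
    if not 0 <= value <= 10:
        raise ValueError("FFHI is expected to lie between 0 and 10")
    # Integer division by the interval width; the upper limit joins class 4.
    return min(int(value // 2.5), 3) + 1

# Hypothetical predicted FFHI values for a handful of pixels.
predicted = [0.0, 2.4, 2.5, 4.9, 6.1, 7.5, 10.0]
classes = [ffhi_class(v) for v in predicted]
```

Model predictions that fall outside 0–10 would need to be clipped or flagged before classification, because the linear model is not bounded to the range of the training FFHI.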

Forest Health Indicator Derivation
Forest indicators are crucial for forest health evaluation and might directly influence the final evaluation results [12]. Consequently, many studies have been conducted to investigate the selection and formulation of these indicators. For instance, O'Laughlin et al. [79] subdivided the indicators into seven categories, i.e., soil, water, vegetation, animals, ecosystem cycling, landscape patterns, and non-native plants and animals. They also identified individual indicators for some categories. For example, microbial activity, litter dynamics and the soil productivity index were suggested to represent the soil status. Visual symptoms of foliar damage, tree growth efficiency and understory vegetation were considered to be good indicators of the vegetation status. O'Laughlin and Cook [12] systematically grouped the forest health indicators into two classes, i.e., indicators for measurement of stand structure (e.g., stand density, species composition, mortality/growth rate and growth-to-removal ratio) and other ecosystem structures and processes (e.g., soil and watershed processes, tree physiology, insect populations, tree resistance to insects and micro-environmental variables as well as nutrient cycles, energy flows, and ecological processes facilitating recovery from damage). Woodall et al. [80] reported that the forest indicators for the US Forest Inventory and Analysis (FIA) included tree crown condition, lichen communities, forest soils, vegetation diversity, downed woody material and ozone injury, which represented a compromise between budgetary constraints and field sampling efficiency. In the present study, we selected 15 stand attributes to derive a comprehensive forest health indicator, and these stand attributes covered many aspects of the forest ecosystem. For instance, QMD, BA, NT and SV accounted for the tree growth efficiency; SHI, SII, PI, GC, SDDBH, UAI, TSII, DBHDI and DDI were representative of forest structural diversity; and HD represented the soil status or site productivity. In comparison to indicators focusing on only one aspect, e.g., tree crown condition [22], soil property [24], and mycorrhizal mushroom diversity and productivity [26], the forest health indicator generated here by factor analysis using these 15 stand attributes might contain slightly more information and might better represent the forest health condition. In factor analysis, it is generally understood that with large samples the sample factor loadings are more precise estimates of population loadings and are also more stable, or less variable, across repeated sampling [81,82]. For instance, Comrey [83] and Gorsuch [84] reported that 50 was a minimum sample size for factor analysis. In the present study, there were only 39 independent plot observations, and 15 variables were used in the factor analysis. This small sample size might make our forest health indicator less precise and stable. However, other authors have reported different opinions. For example, de Winter et al. [82] found that when data are well conditioned, i.e., high loadings, a low number of factors and a high number of variables, factor analysis can yield reliable results for a sample size well below 50, even in the presence of small distortions. In fact, there is considerable divergence of opinion and evidence related to the minimum sample size for generating reliable results using factor analysis, and the recommendations and findings are diverse and even contradictory [81].
While our indicator can provide a general description of forest health status, it might not be suitable for health assessment for a particular purpose because the health condition of the same forest might differ significantly under different management objectives. For example, if we used our indicator to evaluate the health status of a eucalypt plantation, which is mainly grown for timber production, an unhealthy status would be returned since eucalypt plantations have extremely low species diversity. However, it should be judged healthy from a timber production perspective. In fact, Tuominen et al. [12] already pointed out that, depending on the perspective, definitions of a healthy forest could even appear contradictory. Therefore, O'Laughlin et al. [79] suggested that forest scientists and managers should work together with their customers to design forest health indicators for desired conditions and particular purposes.
Although our health indicator derived from factor analysis contains much information on forest structure and diversity and might well represent the forest status, we consider that our current definition of forest health, i.e., a more complex structure characterizes a healthy forest, might be rather broad and simplistic. Rather than a forest health indicator, this indicator might be more appropriate as an indicator of forest structure. However, since forest functions are determined by forest structure [63], a more complex structure might mean multiple functions and higher stability and hence represent a healthier forest status. Additionally, because the forests we studied are mainly degraded secondary forests, which are under strict protection for self-restoration, we assumed that the more complex the forest structure, the healthier the forest, though this assumption is not applicable or is even wrong in most conditions. Regardless, our indicator allowed for a general assessment and comparison of forest status across space and time, which served as significant information for formulating forest management strategies. However, we would like to stress again that this health indicator might not be applicable under certain conditions, e.g., plantations grown mainly for timber production, and its validity should be judged by the management objectives. In these conditions, the so-called forest health indicator developed in the present study only represents an indicator of stand structure and should never be used for forest health assessment. Therefore, we strongly recommend that both the management objectives and the current forest status be carefully assessed before using the methods suggested in the present study for forest health assessment. Additionally, we simply classified our forests into four groups with the same FFHI interval rather than assigning them to different health categories. This was because it was extremely difficult to determine the category thresholds. However, the criteria to define a reasonable ecosystem threshold have been documented in detail by many authors [85–87]. Following these criteria, further research on the determination of reasonable forest health categories is therefore strongly encouraged.
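The equal-interval grouping described above can be illustrated with a minimal sketch. The function name, the 0 to 1 range and the sample values are hypothetical and chosen only for illustration; they are not the FFHI values or class limits of the present study.

```python
def equal_interval_class(value, vmin, vmax, n_classes=4):
    """Return the 1-based equal-interval class of `value` within [vmin, vmax].

    All classes have the same width (vmax - vmin) / n_classes; no
    ecological threshold is implied by the class limits.
    """
    if not vmin < vmax:
        raise ValueError("vmin must be smaller than vmax")
    if not vmin <= value <= vmax:
        raise ValueError("value lies outside [vmin, vmax]")
    width = (vmax - vmin) / n_classes
    # The maximum value would fall into class n_classes + 1, so clamp it.
    return min(int((value - vmin) / width), n_classes - 1) + 1


# Hypothetical indicator values on an illustrative 0-1 scale.
toy_ffhi = [0.05, 0.30, 0.55, 0.95]
toy_classes = [equal_interval_class(v, 0.0, 1.0) for v in toy_ffhi]
```

Because the class limits follow only from the observed range, such groups support relative comparison among stands but do not by themselves define health categories.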

Predictive Model Development
The coefficient of determination (R²) is normally used to evaluate the modeling efficiency [88]. Ozdemir and Karnieli [61] stated that 0.5 is the threshold for selecting a reliable predictive model and argued that models with R² values lower than 0.5 are incapable of providing reliable predictions. However, several studies have taken a different view. For instance, Murfitt et al. [89] developed seven models predicting a forest health score using seven individual vegetation indices as independent variables, and the R² values ranged from 0.23 to 0.38. They used a model with an R² of 0.38 to map ash health for an entire area. The individual basal area growth model for Pinus halepensis produced by Condés and Sterba [90] had an R² of 0.36, which was increased to 0.47 by including additional random effects. Therefore, we concluded that we succeeded in producing a model capable of predicting a forest health indicator, although the R² was only 0.47. Furthermore, the predictive ability of our model was also supported by the leave-one-out cross-validation.
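As an illustration of how R², RMSE and leave-one-out cross-validation relate to each other, the following sketch fits an ordinary least squares model with two predictors and re-evaluates it by leaving out one plot at a time. It is a minimal sketch on invented toy data; the function names and values are assumptions and do not reproduce the model or the data of the present study.

```python
from math import sqrt


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(X, y):
    """Least squares coefficients [intercept, b1, b2, ...] via normal equations."""
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, row):
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def r2_rmse(y, yhat):
    """Coefficient of determination and root mean square error."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot, sqrt(ss_res / len(y))


def loocv(X, y):
    """Predict each observation from a model fitted to all other observations."""
    preds = []
    for i in range(len(y)):
        beta = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(beta, X[i]))
    return r2_rmse(y, preds)


# Invented toy data: two predictors per plot and one response.
X = [[0.2, 0.5], [0.4, 0.1], [0.6, 0.9], [0.8, 0.3],
     [1.0, 0.7], [1.2, 0.2], [1.4, 0.8], [1.6, 0.4]]
y = [0.0, 1.3, -0.35, 1.6, 1.1, 2.65, 1.45, 2.95]

beta_all = fit_ols(X, y)
r2_fit, rmse_fit = r2_rmse(y, [predict(beta_all, r) for r in X])
r2_cv, rmse_cv = loocv(X, y)
```

For least squares models the cross-validated R² cannot exceed the in-sample R², so a small gap between the two is one indication that the fitted model is not merely describing the calibration plots.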
While 16 imagery-derived measures (10 spectral measures and six textural measures), which were significantly correlated with the health indicator, were employed as potential independent variables to build the model, the final predictive model contained only two variables, i.e., mean_swir and mean_pan, which made the model a simple measure of the amount of forest cover. This might be due to the problem of multicollinearity. Moreover, it is also noteworthy that these two variables were spectral measures, and no textural variables entered into the model. In contrast, many studies have reported the promising performance of textural measures in forest classification as well as in predicting forest stand attributes. For instance, Kayitakire et al. [91] produced models predicting top height, circumference, stand density and age variables using the second-order texture derived from IKONOS-2 imagery, and the correlation coefficients ranged from 0.76 to 0.82. Pu and Chen [92] demonstrated that texture-based features had higher capability than spectrum-based features for estimating and mapping forest LAI using WorldView-2 images. Johansen et al. [93] reported that the classification accuracy of vegetation structure was increased by 2%-19% with the inclusion of textural measures derived from QuickBird images. These studies had at least one aspect in common, i.e., the satellite images used were of very high spatial resolution. The absence of textural measures in our model might be attributed to the relatively lower spatial resolution of the SPOT-5 images, which could not detect the spatial variability of the forest health indicator. A similar result was reported by Castillo-Santiago et al. [58], who also employed both textural and spectral measures extracted from SPOT-5 images to predict stand variables; these authors reported that textural measures were not included in the predictive models. An even worse correlation between the textural measures extracted from TM images (much coarser spatial resolution) and forest stand variables was documented by Cohen and Spies [94]. In fact, considerable literature has already demonstrated that the significance of textural measures increases with the spatial resolution of the image [95,96]. This is because the higher the spatial resolution, the more pixels represent each ground object, and therefore the textural information becomes increasingly significant [97].
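How multicollinearity can crowd candidate variables out of a model can be sketched by screening pairwise Pearson correlations among the predictors. The 0.8 threshold, the function names, the measure name texture_a and all values below are assumptions made for illustration; only mean_swir and mean_pan are names taken from the text, and the numbers are not the SPOT-5 measurements of the present study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def collinear_pairs(measures, threshold=0.8):
    """List predictor pairs whose absolute correlation reaches the threshold."""
    names = sorted(measures)
    flagged = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(measures[a], measures[b])
            if abs(r) >= threshold:
                flagged.append((a, b, r))
    return flagged


def vif_two_predictors(x1, x2):
    """Variance inflation factor when the model holds exactly two predictors.

    With two predictors, the R-squared of regressing one on the other
    equals their squared Pearson correlation, so VIF = 1 / (1 - r^2).
    """
    r = pearson(x1, x2)
    return 1.0 / (1.0 - r * r)


# Invented per-plot values for three candidate measures.
toy_measures = {
    "mean_swir": [52.0, 48.5, 60.2, 45.1, 57.3, 50.0],
    "mean_pan": [70.1, 66.0, 79.5, 63.2, 75.8, 68.4],
    "texture_a": [3.1, 4.8, 2.9, 4.0, 3.3, 4.6],
}
toy_flagged = collinear_pairs(toy_measures)
```

A measure that is flagged together with a variable already in the model adds little independent information, which is one way a significantly correlated measure can still be left out by a subset-selection procedure.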
In this study, we used a regression technique to build a predictive model. Although this statistical approach has been extensively used in remote sensing studies of the environment and forestry [63,67,98–100], many authors do not recommend it due to its rigid assumptions about the data. In regression analysis, there are four principal assumptions, i.e., linearity and additivity of the relationship between dependent and independent variables, statistical independence of the errors, homoscedasticity of the errors and normality of the error distribution [76]. However, under most conditions, the assumptions cannot be met since ecological and remotely sensed data are highly complex, and nonlinear relationships are often observed between these two types of datasets [39]. Furthermore, in order to correct the multicollinearity of the independent variables, certain variables, which significantly account for the total variance, might be excluded from the predictive model. For instance, the textural measures in the present study, which did not enter into our final predictive model, might have been excluded due to multicollinearity concerns, though some of them showed significant correlations with the derived FFHI. Therefore, using simple linear regression, we might run the risk of losing information richness. Additionally, a simple linear regression approach might be inadequate in the case of possible non-linearity. In the present study, only 39 plots were used to derive the predictive model. However, many studies used a similar or even lower number of plots to produce models predicting forest attributes using variables extracted from satellite images, e.g., Means et al. [101] used 19 plots, Ozdemir and Karnieli [61] used 29 plots, and Cohen and Spies [94] used 41 plots. Consequently, we consider that 39 plots were sufficient to produce a model of moderate precision and generality.
Instead of simple linear regression, more robust statistical methods, which make few or no distributional assumptions about the data, have been extensively documented to explore the relationship between forest stand attributes and remotely sensed data. The most frequently used approaches in forestry include regression and decision trees [102,103], artificial neural networks [70,104], random forests [105–107], and support vector machines [108,109]. The size of the training data set for machine learning greatly influences the stability and accuracy of the trained model [110]. Koprinska [111] recommended a ratio of at least 10 times more training instances than features. In the present study, we had 15 features, and the minimum training data set should therefore contain at least 150 instances according to the recommendation of Koprinska [111]. Unfortunately, we only had 39 plots, which prevented us from using a machine learning algorithm to produce the predictive model. If the training data set is sufficient, these robust statistical approaches are encouraged, and the minimum training data set size could be determined by creating a learning curve that gives the (average) model performance as a function of the training sample size [112,113].
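The 10:1 rule of thumb and the learning-curve idea can be sketched as follows. This is a minimal illustration on synthetic data with a one-predictor linear model standing in for any learner; the function names, the noise level, the training sizes and the number of repeats are all assumptions and are unrelated to the data of the present study.

```python
import random
from math import sqrt


def min_training_instances(n_features, ratio=10):
    """Rule of thumb: at least `ratio` training instances per feature."""
    return ratio * n_features


def fit_line(xs, ys):
    """Simple least squares line; returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def learning_curve(xs, ys, sizes, n_repeats=200, seed=0):
    """Average hold-out RMSE for each training size over random splits."""
    rng = random.Random(seed)
    idx = list(range(len(xs)))
    curve = {}
    for size in sizes:
        errors = []
        for _ in range(n_repeats):
            rng.shuffle(idx)
            train, test = idx[:size], idx[size:]
            a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
            mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in test) / len(test)
            errors.append(sqrt(mse))
        curve[size] = sum(errors) / len(errors)
    return curve


# Synthetic data: a noisy linear relationship over 60 observations.
data_rng = random.Random(42)
xs = [i / 10 for i in range(60)]
ys = [2.0 + 0.5 * x + data_rng.gauss(0.0, 1.0) for x in xs]

required = min_training_instances(15)
curve = learning_curve(xs, ys, sizes=[3, 10, 40])
```

The training size at which the averaged error stops decreasing noticeably gives an empirical indication of the minimum sample needed for a given learner and feature set.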

Implication for Forest Management
Forest health management is widely applied to avoid catastrophic forest disturbance or to maintain forest stability, productivity and vitality. For instance, Oliver et al. [114] documented that thinning an overly dense stand by removing excess stems would reduce the susceptibility of the remaining trees to insects, diseases, and fires, thus reducing the potential for catastrophic fires. Using stand density management diagrams, López-Sánchez and Rodríguez-Soalleiro [115] and Castedo-Dorado et al. [116] designed optimal quantitative thinning schedules for Pseudotsuga menziesii and Pinus radiata plantations, respectively, to avoid or reduce the potential of windthrow or forest fire. A detailed review of forest health management was provided by Oliver et al. [114], who demonstrated the necessity of forest health management, explored the way in which forests should be managed to restore and maintain health, described the available tools and barriers associated with forest health management, and discussed potential ways to reduce the management cost and to overcome the barriers.
In the present study, we succeeded in producing quantitative tools, i.e., the predictive model and the forest health map, for forest health management. These tools provide forest managers with a detailed and quantitative picture of the distribution of forest health status and therefore provide significant insights for formulating forest management strategies. Similarly, Haywood and Stone [117] have already stated that the identification of forest health status could assist forest managers in prioritizing and formulating management strategies. For instance, forest fire monitoring should be intensified in poor health areas since these forests are more susceptible to forest fire. The thematic map could also be used to determine the area with top priority when conducting restoration activities. With chronological thematic maps, the change in forest health can be detected, and the corresponding forest management strategies and policy can be formulated. Similarly, many studies have been conducted where remote sensing techniques have been used to assist in forest health management. For example, Xiao [118] used multispectral remote sensing data and GIS techniques to determine tree health at the University of California, Davis. Using WorldView-2 (WV2) imagery, Murfitt et al. [89] produced a remote sensing-based method for mapping ash trees undergoing various infestation stages. Both of these efforts could directly indicate or estimate the exact health status of the trees and therefore provide solid and straightforward information for forest managers.
While our predictive model is a promising tool for prescribing forest management strategies, it should be used with great care. For example, incorrect results might be generated if we used the model to predict forest health outside the area in which the model was developed. Further, the prediction might not be reliable if we used the model in a forest with extremely poor health, for which the model has demonstrated poor predictability. Budget permitting, we encourage further improvement of the model by including additional ground plots and by using imagery of very high spatial resolution as well as more robust statistical approaches.
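One simple way to guard against applying a regression model outside its calibration domain is to compare each pixel's predictor values with the range observed in the calibration plots before predicting. The sketch below assumes a linear model in mean_swir and mean_pan; the intercept, coefficients and plot values are hypothetical placeholders and are not the fitted values reported in Table 8.

```python
def calibration_range(plots):
    """Minimum and maximum of each predictor over the calibration plots."""
    names = plots[0].keys()
    return {n: (min(p[n] for p in plots), max(p[n] for p in plots)) for n in names}


def predict_ffhi(pixel, intercept, coeffs, ranges):
    """Return the linear prediction, or None if any predictor lies outside
    the calibration range (i.e., the prediction would be an extrapolation)."""
    for name, (low, high) in ranges.items():
        if not low <= pixel[name] <= high:
            return None
    return intercept + sum(coeffs[n] * pixel[n] for n in coeffs)


# Hypothetical calibration plots and model parameters (not from Table 8).
toy_plots = [
    {"mean_swir": 40.0, "mean_pan": 60.0},
    {"mean_swir": 55.0, "mean_pan": 75.0},
    {"mean_swir": 48.0, "mean_pan": 66.0},
]
toy_intercept = 10.0
toy_coeffs = {"mean_swir": -0.5, "mean_pan": 0.25}
toy_ranges = calibration_range(toy_plots)

inside = predict_ffhi({"mean_swir": 50.0, "mean_pan": 70.0},
                      toy_intercept, toy_coeffs, toy_ranges)
outside = predict_ffhi({"mean_swir": 80.0, "mean_pan": 70.0},
                       toy_intercept, toy_coeffs, toy_ranges)
```

Pixels that return None could be masked on the thematic map rather than assigned an unreliable value. A range check of this kind addresses only the spectral domain; it does not replace judgment about whether the forest type and management objectives match those of the calibration area.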

Conclusions
Forest health assessments are of great importance for the formulation of forest management strategies as well as forest policy. However, the lack of rapid, economic and quantitative measures of forest health hinders our understanding of forest health status. In the present study, we derived a quantitative forest health indicator using 15 forest stand attributes. Additionally, we succeeded in building a model to predict the forest health indicator using imagery-derived variables. This predictive model allowed for a rapid, economic and quantitative assessment of the forest health status, which could facilitate the decision-making process of forest health management planning. Budget permitting, we strongly encourage improving the predictive model by increasing the number of survey plots and employing LiDAR and other satellite imagery with very high spatial resolution.

Figure 1. Geographical location of the study area and the images and sampling plots used in the study.

Figure 2. A flow chart of the forest health indicator derivation, imagery measures extraction, prediction model development and forest health mapping.

Figure 3. Plot of predicted FFHI against observed FFHI and the residual plot.

Figure 4. Thematic map of the final forest health index (FFHI) for a county in the Guangxi Autonomous Region.

Table 3. Total variance explained by the components generated using factor analysis.

Table 4. Rotated component matrix generated by factor analysis.

Table 5. Component score coefficient matrix generated by factor analysis.

Table 6. Final forest health indicator (FFHI) for each plot.

Table 7. Pearson correlation coefficients between the image-derived measures and the FFHI.

Table 8. Regression model predicting the FFHI derived using all-subsets regression.