Identifying Variables to Discriminate between Conserved and Degraded Forest and to Quantify the Differences in Biomass

The purpose of this work was to determine which structural variables present statistically significant differences between degraded and conserved tropical dry forest through a statistical study of forest survey data. The forest survey was carried out in a tropical dry forest in the watershed of the River Ayuquila, Jalisco state, Mexico between May and June of 2019, when data were collected in 36 plots of 500 m². The sample was designed to include tropical dry forests in two conditions: degraded and conserved. In each plot, data collected included diameter at breast height, tree height, number of trees, number of branches, canopy cover, basal area, and aboveground biomass. Using the Wilcoxon signed-rank test, we show that there are significant differences in canopy cover, tree height, basal area, and aboveground biomass between degraded and conserved tropical dry forest. Among these structural variables, canopy cover and mean height separate conserved and degraded forests with the highest accuracy (both at 80.56%). We also tested which variables best correlate with aboveground biomass, with a view to determining how carbon loss in degraded forest can be quantified at a larger scale using remote sensing. We found that canopy cover, tree height, and density of trees all show good correlation with biomass and these variables could be used to estimate changes in biomass stocks in degraded forests. The results of our analysis will help to increase the accuracy in estimating aboveground biomass, contribute to the ongoing work on REDD+, and help to reduce the great uncertainty in estimation of emissions from forest degradation.


Introduction
Forest biomes provide important ecosystem services and habitat for many species [1] and sequester carbon through photosynthesis [2]. Forest degradation affects forest structure, function, ecosystem processes, and the capacity for carbon sequestration, generally also diminishing the provision of most ecosystem goods and services [3][4][5]. Unlike for deforestation, there is no internationally accepted definition of forest degradation. Here, we adopt the definition from [6] that it is a human-induced disturbance in a forested landscape that results in carbon emissions but not a change in land cover. Forest degradation is usually a gradual process, which results in long-term damage while the forest remains in principle a forest. Forest degradation is often associated with anthropogenic activities, such as selective felling and cyclical use of forest, including shifting cultivation [7]. Cattle grazing causes forest degradation [...] highly labor intensive. Hence, the question arises of whether other variables, data on which could be acquired with less effort, could be used as proxies. Canopy cover, for example, has been assessed using medium- to high-resolution remote sensing platforms, such as Landsat [23,24], Sentinel-1 [25], and Sentinel-2 [26]. High spatial resolution canopy height models could be derived using images, such as GeoEye and Worldview-3, combined with a digital elevation model [27], and density of trees could potentially be obtained from photos taken from drones [28,29]. If such variables correlate strongly with aboveground biomass (AGB), then they could substitute for physical measurements of tree diameters in the calculation of emissions.

Study Focus and Area: Degradation in Forests of the Watershed of the Ayuquila River
The study area is located in the western Pacific area of Mexico, in the watershed of the Ayuquila River (Figure 1). Its topography ranges from 260 to 2500 m above mean sea level; the average annual precipitation is 800-1200 mm and occurs mostly between June and October; the average monthly temperature ranges from 18 to 22 °C [30]. Tropical dry forest (TDF) is distributed in the lower topographic areas. It is composed of deciduous and semi-deciduous woodlands with low biomass density, canopy height, and cover, and has low AGB compared to other areas of TDFs in the Neotropics [31]. TDFs in Mexico are widespread and they suffer higher rates of forest loss compared to the humid tropical forests due to higher population densities [32]. TDF in the Ayuquila watershed is heavily used for shifting cultivation, fuelwood extraction, cattle grazing, extraction of poles for constructing fences, and for mushrooms and medicinal plants [33]. As in the rest of Mexico, most of the forests in the Ayuquila watershed are under the authority of ejidos, which are communally managed rural agrarian settlements [33]. The Mexican government created ejidos in the 1930s as a type of community landholding for small-scale subsistence-based agriculture [34,35]. Across Mexico, about 52% of all land and 55% of all forestlands are in the hands of ejidos [36].

There are both permanent and shifting cultivation agriculture systems in the Ayuquila area. Permanent agriculture, which is under rain-fed and irrigated farming systems, is distributed over the lower elevations and flat areas, which have been completely cleared of TDF in the past 20-30 years [37], while areas of shifting cultivation are distributed within existing TDF in a mosaic landscape, which changes from year to year, as cultivated areas are left fallow and secondary forest regrows. Shifting cultivation is the main driver of forest degradation in the TDF in this area, although cattle grazing, fuelwood extraction, and extraction of poles for fences are contributory causes [38].

Sampling Design
For our analysis, we started with a recent land cover map (2018) to identify all patches in the study area that were covered by tropical dry forest at the time of our study. We then used Google Earth Pro, comparing images from 2016-2018 to identify patches of tropical dry forest where there had been visible changes in the texture between these dates. We randomly selected 30 points in areas where change had evidently occurred (i.e., where forest had clearly been degraded) and 30 in areas that had not changed (representing conserved forest). We then considered the accessibility of the sampling points following the criteria that the sampling points should not be further than 3000 m from principal roads and should have slopes of less than 30 degrees (for practical reasons). We also checked these 60 sample locations with the Junta Intermunicipal del Río Ayuquila (JIRA), a local environmental non-governmental organization (NGO), regarding security. As a result of these two filters, we excluded 5 of the plots identified in conserved forests and 12 of the plots in degraded forests, leaving 43 potential sampling sites, of which 25 were in conserved forest and 18 in degraded. In order to balance the sampling design, we eliminated at random 7 plots from the conserved forest group.
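The filtering and balancing steps described above can be sketched as follows. This is a minimal Python illustration only (the authors' own analysis was carried out in R); the plot identifiers, function names, and random seed are hypothetical, and only the thresholds and counts are taken from the text.

```python
import random

MAX_ROAD_DISTANCE_M = 3000  # from the text: no further than 3000 m from principal roads
MAX_SLOPE_DEG = 30          # from the text: slopes of less than 30 degrees


def is_accessible(road_distance_m, slope_deg):
    """Return True if a candidate point passes the accessibility filter."""
    return road_distance_m <= MAX_ROAD_DISTANCE_M and slope_deg < MAX_SLOPE_DEG


def balance_groups(conserved, degraded, seed=0):
    """Randomly thin the larger group so that both groups have the same size.

    The seed is an arbitrary choice made here for reproducibility.
    """
    rng = random.Random(seed)
    n = min(len(conserved), len(degraded))
    return sorted(rng.sample(conserved, n)), sorted(rng.sample(degraded, n))


# Counts reported in the text: 30 candidates per condition, of which
# 5 conserved and 12 degraded were excluded by the two filters.
conserved = [f"C{i:02d}" for i in range(1, 30 - 5 + 1)]   # 25 hypothetical IDs
degraded = [f"D{i:02d}" for i in range(1, 30 - 12 + 1)]   # 18 hypothetical IDs

conserved_final, degraded_final = balance_groups(conserved, degraded)
print(len(conserved_final), len(degraded_final))
```

Thinning the larger group at random, rather than by hand, avoids introducing a selection bias into the conserved sample.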

Forest Survey Data Collection
The field survey was carried out between May and June 2019, which is at the end of the dry season and the beginning of the rainy season. The field data were collected in 36 plots of 500 m² each. As explained above, of the total 36 plots, 18 were in conserved and 18 in degraded forest.
In each plot, data were collected on diameter at breast height (DBH), tree height, number of branches, canopy cover, and number of trees per plot, from which basal area, AGB, and density of trees per hectare were calculated. All trees with DBH greater than 2.5 cm were measured in a radius of 3 m from the center of the plot, and all individuals with DBH greater than 5 cm were measured up to a radius of 12.6 m (Figure 1). Canopy cover was measured as the percentage of a spherical densometer grid covered by forest canopy. In addition, the presence of anthropogenic activities, such as cattle and cattle feces, logged tree trunks, leaves, and seedlings, was noted, as this supported identification of plots as being "degraded".
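As a quick check on the plot geometry, a circle of radius 12.6 m covers about 498.8 m², which matches the stated plot size of 500 m². The sketch below also shows one way tree counts could be scaled to a per-hectare density. The scaling rule is an assumption made here for illustration (each size class is expanded by the area in which it was sampled); the text does not state how the two radii were combined, and the function names are hypothetical.

```python
import math

OUTER_RADIUS_M = 12.6  # trees with DBH > 5 cm were measured up to this radius
INNER_RADIUS_M = 3.0   # trees with DBH > 2.5 cm were measured within this radius
M2_PER_HA = 10_000.0


def circle_area_m2(radius_m):
    """Area of a circular plot in square metres."""
    return math.pi * radius_m ** 2


def trees_per_hectare(n_small_inner, n_large_outer):
    """Scale plot counts to trees per hectare.

    n_small_inner: stems with 2.5 cm < DBH <= 5 cm counted within 3 m
    n_large_outer: stems with DBH > 5 cm counted within 12.6 m
    Assumption: each size class is expanded by its own sampled area.
    """
    small = n_small_inner * M2_PER_HA / circle_area_m2(INNER_RADIUS_M)
    large = n_large_outer * M2_PER_HA / circle_area_m2(OUTER_RADIUS_M)
    return small + large


print(round(circle_area_m2(OUTER_RADIUS_M), 1))  # 498.8
print(trees_per_hectare(n_small_inner=2, n_large_outer=40))
```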

AGB and Basal Area
AGB was calculated using the allometric Equation (1) developed by Martínez-Yrizar et al. [39]. This equation is adjusted to tropical dry forests; AGB is in kg/m², A is the regression constant (here, we adopt the value −0.5352), and BA is the basal area in cm².
Basal area was calculated from the following formula (Equation (2)) using the diameter at breast height (DBH):
$$\mathrm{BA} = \pi \left( \frac{\mathrm{DBH}}{2} \right)^{2} \qquad (2)$$
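The basal area calculation can be illustrated with a short Python sketch based on the standard circular cross-section relation BA = π (DBH/2)². The function names and the example diameters are hypothetical, and the per-hectare scaling assumes, for simplicity, that all stems were measured over the full 500 m² plot. The allometric AGB step is deliberately left out, because its exact functional form is not reproduced in this text.

```python
import math


def basal_area_cm2(dbh_cm):
    """Basal area of one stem (cm^2) from its diameter at breast height (cm)."""
    return math.pi * (dbh_cm / 2.0) ** 2


def plot_basal_area_m2_per_ha(dbh_list_cm, plot_area_m2=500.0):
    """Sum stem basal areas in a plot and express the total in m^2 per hectare.

    Assumption: every stem was sampled over the same plot area.
    """
    total_cm2 = sum(basal_area_cm2(d) for d in dbh_list_cm)
    total_m2 = total_cm2 / 10_000.0              # 1 m^2 = 10,000 cm^2
    return total_m2 * (10_000.0 / plot_area_m2)  # 1 ha = 10,000 m^2


example_dbh_cm = [10.0, 20.0]  # hypothetical stems
print(round(basal_area_cm2(10.0), 2))
print(round(plot_basal_area_m2_per_ha(example_dbh_cm), 4))
```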

Statistical Methods
The study assumes that variables that are found from ground level data to be statistically different between degraded and conserved forest can potentially be modelled using spectral information. The values of the structural variables in forests of conserved and degraded status were first analyzed by computing statistics, such as the maximum, minimum, mean, and standard deviation. Since the distributions of the forest structure variables were found to violate the assumption of normality [40], we applied the Wilcoxon signed-rank test, the non-parametric version of a paired samples t-test, to check which variables were significantly different between degraded and conserved forest. The null hypothesis was that the mean value of the structural variables would be the same in degraded and conserved forests.
Afterwards, we tested which structural variables discriminate well between the two types of forest. We fitted six binomial logistic regressions in which each model included a single structural variable as the independent variable and the forest condition (conserved vs. degraded) as the dependent variable. Using these models, the threshold value was determined to separate degraded and conserved forest with a 50% probability for each structural variable. We predicted forest types according to these logistic regressions and computed the overall accuracy for each model as the proportion of correctly classified plots over the total number of plots. The complete set of plots was used to fit the model as well as to evaluate the accuracy.
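For a single-predictor logistic model, the 50% threshold follows directly from the fitted coefficients: the predicted probability equals 0.5 where β0 + β1·x = 0, that is, at x = −β0/β1. The sketch below illustrates this and the overall accuracy calculation in Python. The coefficients and plot values are hypothetical, the modelled outcome is assumed to be the probability of a plot being degraded, and the model fitting itself (done by the authors in R) is not reproduced.

```python
import math


def prob_degraded(x, b0, b1):
    """Predicted probability of the 'degraded' class for predictor value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def threshold_at_half(b0, b1):
    """Predictor value at which the predicted probability equals 0.5."""
    return -b0 / b1


def overall_accuracy(values, is_degraded, b0, b1):
    """Proportion of plots whose predicted class matches the observed class."""
    predicted = [prob_degraded(x, b0, b1) > 0.5 for x in values]
    correct = sum(p == t for p, t in zip(predicted, is_degraded))
    return correct / len(values)


# Hypothetical coefficients and canopy cover values (%), for illustration only.
b0, b1 = 9.0, -0.1
canopy = [98.0, 95.0, 85.0, 70.0, 92.0, 60.0]
degraded = [False, False, False, True, True, True]

print(threshold_at_half(b0, b1))
print(overall_accuracy(canopy, degraded, b0, b1))
```

Because the same plots are used to fit and to evaluate the model, as noted above, accuracies computed this way describe the fit to the sample rather than performance on new plots.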
Finally, we explored the correlation between the structural variables, first with forest types combined and then separately. Special emphasis was placed on identifying attribute variables that have a significant correlation with AGB. If these variables can be quantified with remote sensing sensors, they could be used to assess AGB and estimate emissions from forest degradation.
All statistical analyses, including the Wilcoxon signed-rank test, logistic regression, classification, and Kendall's correlation, were carried out in R [41]. The data and the scripts can be accessed from https://github.com/JonathanVSV/Conserved-vs-Degraded-Forest.

How Well Can the Structural Variables Differentiate These Two States of Forest?
We first evaluated whether the values of forest structural variables differ between the two types of forest (i.e., conserved and degraded). The structural parameters measured are summarized in Table 1. Looking at the data in Table 1, it appears that the variable most commonly used in remote sensing to distinguish degraded from conserved forest, canopy cover, is not necessarily a good indicator: although the mean value is clearly lower in degraded forests, there is considerable overlap at the top end, and the same is true for tree height. We note that because of the way TDF regrows (particularly after shifting cultivation), forests classified as degraded tend to have more individual trees with more branches than conserved forests, despite the fact that they have much lower basal area and AGB. In other words, most of the degraded forests are in fact recovering, and have larger numbers of thinner, more branching trees than conserved forest, presumably because light conditions following clearance support the production of large numbers of saplings.
Since none of the forest structural variables followed a normal distribution, we applied the non-parametric Wilcoxon signed-rank test to determine if the differences in their mean values could be considered significant. We found statistically significant differences between the mean values of four of the six variables: canopy cover, basal area, height, and biomass. Density of trees and density of branches are not significantly different between degraded and conserved forests (Table 2). Figure 2 identifies the structural variables that significantly differ between the two forest conditions. As can be seen, conserved forests show statistically higher values for canopy cover, tree height, and basal area. Since AGB is directly calculated from basal area, this variable obviously also differs between conserved and degraded forest.
Importantly, we also note that although there can be overlap between the maximum canopy cover (100%) of conserved and degraded forest (Table 1), in practice only a limited number of plots (2 in 18 plots of conserved forests, and 5 in 18 plots of degraded forests) present this overlap; the mean values are statistically different and the standard deviations sufficiently small, in theory, for these two classes to be distinguished. As expected, the logistic regression models showed a significant coefficient (β) for canopy cover (z = −2.41, p = 0.02), basal area (z = −2.60, p = 0.01), mean height (z = −2.75, p = 0.01), and AGB (z = −2.60, p = 0.01) as explanatory variables to classify the plots into degraded and conserved forests, while the models that used density of branches (z = 0.93, p = 0.35) and density of trees (z = −0.18, p = 0.86) did not. Using these models, we found the following threshold values to separate the degraded and conserved forests for the four variables: canopy cover: 90.9%; basal area: 9.45 m²/ha; height: 5.30 m; and AGB: 27.5 Mg/ha.
The overall accuracies obtained using these logistic regression models were canopy cover 80.56%, basal area 72.22%, mean height 80.56%, and AGB 72.22%. Basal area and AGB are not independent variables, and thus, it is not surprising that they show the same accuracy when used to classify the plots into degraded and conserved forest. It became evident that the two statuses of forests showed an overlap in each of the quantified structural variables (Figure 2). Canopy cover and mean height appear to be the most promising structural variables for classifying forests into degraded and conserved conditions, partially because of the very low variance on these variables in conserved forests.
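The reported thresholds amount to a simple decision rule, sketched below in Python. The threshold values are those given in the text, and the direction of the rule follows from conserved forests showing higher values for all four variables. The dictionary keys, function names, and the handling of values exactly at a threshold are assumptions. The plot counts in the last two lines are inferred from the reported percentages and the 36 plots (29 of 36 gives 80.56% and 26 of 36 gives 72.22%); they are not stated in the text.

```python
# Thresholds reported in the text (50% probability cut-offs).
THRESHOLDS = {
    "canopy_cover_pct": 90.9,
    "basal_area_m2_ha": 9.45,
    "mean_height_m": 5.30,
    "agb_mg_ha": 27.5,
}


def classify(variable, value):
    """Label a plot from one structural variable.

    Values below the threshold are labelled 'degraded', because conserved
    forests show higher values. A value exactly at the threshold is labelled
    'conserved' here, which is an arbitrary choice.
    """
    return "degraded" if value < THRESHOLDS[variable] else "conserved"


def accuracy_pct(n_correct, n_total=36):
    """Overall accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)


print(classify("canopy_cover_pct", 85.0))  # degraded
print(classify("mean_height_m", 6.1))      # conserved
print(accuracy_pct(29))                    # 80.56
print(accuracy_pct(26))                    # 72.22
```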

Correlation between Variables of Forest Attributes
Inter-correlations at plot level were calculated for all the forest structural variables except basal area, since basal area and biomass have a 100% correlation. The primary purpose of this analysis was to see to what extent the variables that can be measured using remote sensing methods-particularly canopy cover, density of trees, and tree height-might be used to predict variables, which, if measured over time, would reflect changes in AGB, and which could then be used to estimate emissions due to degradation. The Kendall tau correlation was applied, and the results are shown in Figure 3.

When all plots are taken together (Figure 3a), we see that AGB is correlated with canopy cover, tree height, and density of trees, at a high level of statistical significance (τ = 0.39, τ = 0.39, and τ = 0.48, respectively, at p < 0.001). AGB is also correlated with density of branches, although with a lower correlation coefficient (τ = 0.30, p < 0.01). Canopy cover is correlated with density of trees and tree height, although there is no correlation between canopy cover and density of branches. Density of trees and density of branches are also highly correlated (τ = 0.58, p < 0.001). For the case of conserved forest (Figure 3b), AGB is only correlated with density of trees and density of branches (both at τ = 0.34, p < 0.05). However, in degraded forest (Figure 3c) there is a very good correlation between AGB and density of trees (τ = 0.7, p < 0.001). There is also good correlation between AGB and density of branches (τ = 0.54, p < 0.05), canopy cover (τ = 0.4, p < 0.05), and tree height (τ = 0.44, p < 0.01). In addition, there is good correlation between density of trees and canopy cover (τ = 0.47, p < 0.01) and between density of trees and tree height (τ = 0.39, p < 0.05). Density of branches and tree height (τ = 0.34, p < 0.05), density of branches and canopy cover (τ = 0.38, p < 0.05), and density of trees and density of branches (τ = 0.69, p < 0.001) are all significantly correlated in degraded forest. However, canopy cover and tree height are not correlated.
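To make the τ values above easier to interpret, the following Python sketch computes Kendall's rank correlation from concordant and discordant pairs. It implements the tau-b variant, which adjusts for ties; whether this matches the exact variant used in the authors' R analysis is an assumption, and the example data are hypothetical. Significance testing is not included.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b for two equal-length numeric sequences."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue              # tied in both variables: not counted
        if dx == 0:
            ties_x += 1           # tied only in x
        elif dy == 0:
            ties_y += 1           # tied only in y
        elif dx * dy > 0:
            concordant += 1       # both variables rank the pair the same way
        else:
            discordant += 1       # the variables rank the pair in opposite ways
    denom = math.sqrt((concordant + discordant + ties_x)
                      * (concordant + discordant + ties_y))
    if denom == 0:
        return float("nan")       # undefined when one variable is constant
    return (concordant - discordant) / denom


# Hypothetical ranks for five plots: 7 concordant and 3 discordant pairs.
tree_density_rank = [1, 2, 3, 4, 5]
agb_rank = [3, 1, 2, 5, 4]
print(kendall_tau_b(tree_density_rank, agb_rank))
```

A value of τ = 0.4, as in this toy example, means that concordant pairs outnumber discordant pairs by 40% of all pairs, which is the same order of magnitude as the canopy cover and tree height correlations reported above.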

Discussion
In the context of programs such as REDD+, it is loss and gain of biomass that is the primary variable of concern, since it is from this that the flux of carbon can be calculated. The analysis shows that several forest attributes that are regularly recorded in forest surveys could be used to distinguish degraded and conserved forests at ground level. These variables include mean canopy cover, mean tree height, basal area, and AGB. The analysis also shows that the density of trees and density of branches do not significantly differ between these two states of forest. We observed that the variance of all variables is much higher in the degraded plots than in the conserved plots, indicating greater variability in the degraded forest. This is entirely to be expected, since these forests are in a state of constant change following human disturbances. This finding strongly underlines the importance of not using a standard default value for carbon stocks in degraded forest and the need for much more careful assessment of changes in the level of carbon stocks in degraded forest over time.
Applying the threshold obtained by the logistic model to classify the plots into degraded and conserved states, canopy cover and mean height are the variables that enable the highest accuracy (both at 80.56%). This makes canopy cover and mean height the most suitable indicators of forest status. Although mean canopy cover in theory discriminates reasonably well between conserved and degraded forest, this variable has a wide standard deviation in degraded forest, with one plot of degraded forest having 100% canopy cover, which means that it would be misclassified if only this criterion were used. Nevertheless, the mean value of canopy cover of all plots of degraded forest was much lower, and the standard deviations do not overlap (Figure 2). In the case of mean height, the two types of forest can also be discriminated, with a small fraction of plots wrongly classified using the threshold value. However, it is expected that if two or more variables were to be combined, the error in the discrimination of conserved and degraded forest would be reduced, enabling a much more definitive separation between conserved and degraded forest. Nevertheless, these results also suggest that, based only on structural variables from field measurements, these two types of forest can only be distinguished up to a certain degree.
In terms of the correlations between structural variables and AGB, we find that when both types of forest are analyzed together, density of trees has the highest correlation with AGB (τ = 0.48, p < 0.001), followed by canopy cover and canopy height (both at τ = 0.31, p < 0.01). In plots of only conserved forest, AGB is only correlated with density of trees and density of branches. In plots of only degraded forest, both density of trees (τ = 0.70, p < 0.001) and canopy cover (τ = 0.40, p < 0.05) appear to be correlated with AGB. A possible limitation of our study, however, was that we made a distinction between "conserved" and "degraded" forests. A more nuanced classification of forest into several levels of degradation (i.e., highly degraded, moderately degraded, and lightly degraded) might have produced better results in terms of variables to predict AGB.
Both canopy cover and canopy (tree) height are correlated with AGB both for forest in general (both forest types analyzed together), and for degraded forest only, while for conserved forest, these two attribute variables do not show significant correlation with AGB. Canopy cover is the most easily and commonly measured variable when using optical remote sensing of medium to high spatial resolution satellite imagery [42,43], and also when using Light Detection and Ranging (LiDAR) data [26,44], although the latter can also be used to estimate canopy height [45][46][47][48][49][50]. Density of trees is also a reasonably good predictor of AGB in both conserved and degraded forest (with more, though thinner, trees in degraded forest), but density of trees cannot be detected with low or medium spatial resolution optical imagery. Such data could, however, be obtained via aerial photographs or very high spatial resolution imagery or LiDAR borne on drones [51][52][53]. Most of the published work has been for temperate forest in northern Europe, Canada, or the USA, and few experiments have been reported for tropical forest, probably due to the complexity of its composition and seasonality. Possibly an index that combines tree height and density of trees measured from a drone or airborne LiDAR could be developed as a predictor of AGB in degraded TDF. It would certainly be worth exploring the potential of such an index for assessing changing carbon stock in degraded forest where significant changes are expected.
LiDAR estimates of AGB have been shown to be highly correlated with field-measured AGB data. For example, Hernandez-Stefanoni et al. [32] show a high association between AGB and LiDAR data with an R² = 0.87 using a linear regression analysis. An advantage of LiDAR data is that they do not saturate in areas of high biomass; however, the high cost of acquisition could be a limitation [6]. It is worth noting that the Global Ecosystem Dynamics Investigation (GEDI) LiDAR mission (https://gedi.umd.edu/mission/mission-overview/), launched in December 2018, makes accurate measurements of canopy height, vertical canopy structure, and surface elevation and shows promise in improving measurements of AGB and forest carbon [54]. Synthetic aperture radar (SAR) data have also shown potential in biomass estimation. In particular, more accurate results have been found with longer wavelengths, such as L-band (23 cm wavelength) SAR [55]. L-band SAR has been used to estimate biomass in forest areas with low biomass [56,57]. In addition, the BIOMASS sensor from the European Space Agency (ESA), to be launched in 2022, will collect data at the P-band (70 cm wavelength). It has potential to see through leafy treetops to build up maps of tree height and volume, and therefore it is expected to provide precise information on global forest biomass and carbon content and fill the gap of forest biomass density monitoring [58]. Since the data from L-band and P-band SAR become saturated in forest with a medium to high biomass level at approximately 60-100 and 100-150 Mg ha⁻¹, respectively, they are not suitable for mapping forest biomass in all conditions [55,59], although they should function well at the densities we found in degraded forests in the Ayuquila River watershed.
The application of texture data derived from L-band SAR has been reported to be able to reduce the saturation at high biomass values, since texture data capture variation in horizontal forest structure attributes, such as tree height and crown diameter [32]. The best result could probably be obtained with a combination of optical, LiDAR, and SAR data [60].

Conclusions
This paper presented findings of a statistical study of conserved and degraded forests based on a recent forest survey in TDF. The forest attributes that are significantly different between conserved and degraded forests include canopy cover, basal area, tree height, and AGB. Although canopy cover presents large variance in degraded forests, there is a large difference in mean AGB between degraded and conserved forest. Among the four variables, canopy cover and canopy (tree) height separate degraded and conserved forests with the least error, which makes these two variables the most suitable to discriminate these two forest conditions. In terms of variables that could be used to estimate biomass stocks in degraded forests, density of trees, canopy cover, and canopy (tree) height all show a good correlation with AGB. It is clear that for accurate estimation of emissions from degraded forest, very high spatial resolution satellite images, including drone-based imaging LiDAR and radar data, are essential. An important next step would be to calibrate canopy cover, canopy height, or density of trees against degradation using a sliding scale of intensity of degradation as observed at ground level. This would likely enable a more reliable estimation of area of degradation and quantification of AGB using remote sensing.