## 1. Introduction

Agroforestry denotes land use systems where woody perennials (trees, shrubs, palms, etc.) are cultivated on the same land units as agricultural crops and/or animals [1,2]. In this study, we use agroforestry as a general term to refer to the land use system of cultivating woody perennials, either as polyculture or monoculture (i.e., plantation), on agricultural land, regardless of the current existence of crops or animals. Over 10 million km^{2} of agricultural lands have greater than 10% tree cover [3]. Through the provision of diversified products, agroforestry has been advocated and practiced by many countries to offer a wide range of economic, social, and ecological benefits: (1) increasing the per capita farm income by planning high-value tree products [4]; (2) improving soil fertility and land productivity [5]; (3) increasing household resilience [4]; (4) mitigating the impacts of climate variability and change [6,7]; (5) conserving biodiversity [8,9,10]; and (6) improving air and water quality [11,12,13].

Agroforestry offers high potential for carbon (C) sequestration [14], not only because the carbon density of agroforestry is usually higher than that of annual crops or pasture [15,16,17] but also because the trees produce fuelwood and timber that otherwise would be harvested from natural forests [1]. The fine and coarse wood debris of plants is also stored in the soil C pool for long periods [18]. Thus, the role of agroforestry in C sequestration is increasingly recognized by the IPCC (Intergovernmental Panel on Climate Change) [19]. However, for agroforestry to be successful as a strategy for C sequestration, it is necessary to have sound detection and monitoring systems [20].

Remote sensing provides an effective way to estimate and monitor the biomass and C stock of different vegetation types [21], including agroforestry systems [22,23,24,25]. In particular, airborne lidar (Light Detection and Ranging) has emerged in the 21st century as the most accurate technology to quantify forest aboveground biomass (AGB) at the landscape level [21,26]. Lidar is especially powerful over forests of high biomass, where passive optical imaging or radar sensors have saturation problems. One of the main advantages of lidar is that it can penetrate small canopy gaps to detect vertical structure and extract ground elevation. The height information derived from lidar is strongly related to biomass for most tree species. A large number of studies have reported the use of airborne lidar for mapping AGB in boreal (e.g., [27,28,29]), temperate (e.g., [30,31]), and tropical (e.g., [32,33,34,35,36]) forests.

Compared to the large body of literature on lidar remote sensing of carbon and biomass for forests, the use of this frontier technology for quantifying agroforestry AGB is very limited. One of the best examples of relatively economically successful agroforestry arising from colonization in the Brazilian Amazon is found in Tomé-Açu [37]. Focusing on an agroforestry system in Tomé-Açu, this study aims to (1) predict plot-level agroforestry AGB using regular fixed-effects regression models that treat the regression coefficients as constants; (2) identify the causes of AGB prediction errors when such models are used; (3) propose new ways to stratify vegetation types and apply mixed-effects models to reduce the AGB prediction errors; and (4) discuss the challenges and future directions for modeling and mapping agroforestry AGB.

## 2. Study Area

This study was conducted in the Quatro Bocas district in the municipality of Tomé-Açu (2°28′S and 48°20′W), located in the northeastern region of Pará state in Brazil (Figure 1). The study area is 2 km by 5 km. Topography in the region is characterized by low flat plateaus, terraces, and lowlands with altitudes varying from 14 to 96 m. The soils are classified as Ferralsols, Plinthosols, and Fluvisols. The average annual rainfall is 2300 mm [38]. The Tomé-Açu region has a humid mesothermal climate (Ami according to the Köppen classification), with a high average annual temperature (26 °C) and relative air humidity of about 85%. The original vegetation is lowland dense ombrophilous forest, which has been intensely altered. The landscape mosaic is dominated by pasture, agricultural fields, and secondary forests. Forest remnants are observed especially at the margins of streams.

Tomé-Açu started its agricultural development in the 1920s, with the beginning of Japanese immigration to the region. The immigrants introduced horticulture and, later, black pepper (Piper nigrum L.). They were provided with land by the Brazilian government, which made technological development possible and turned Pará into the greatest black pepper producer in Brazil. With the decline of the black pepper cycle from the 1970s on, caused by fusarium blight, the farmers looked for new production alternatives. According to Homma [39], the way out of this ecological crisis for the immigrants was to diversify their activities, with emphasis on fruit crops, especially papaya, melon, acerola, orange, dende (oil palm), cupuacu, passion fruit, and other native and exotic fruit trees and vines that initiated a new economic cycle for the region. The current agroforestry systems (1 to 34 years old) have a great variety of fruit and timber tree species. Homma [40] pointed out that the success of the region’s agricultural development resulted from the Japanese-Brazilian farmers’ innovative thinking, their holistic view of future markets, and their social-minded spirit, which made possible the creation of the Cooperativa Agrícola Mista de Tomé-Açu (Camta) in 1931. The cooperative was originally intended to sell vegetables and nowadays markets agroforestry products (fruit, pulp, juice, and oil) in various countries.

**Figure 1.**
The study area: Tomé-Açu in Pará state, Brazil (Data source: lidar data were acquired in 2013 and 13 plots were measured in 2014) (Note: colors from red to blue represent the elevation of the laser points).

## 4. Results

With the multiplicative power model and our feature selection procedure, the final plot-level AGB prediction model is a simple power model based on the 90th percentile height:

The model R^{2} is 0.47 and the RMSE is 60.4 Mg/ha (see Figure 4a). Note that the mean AGB of all plots is 105.2 Mg/ha, which means that the coefficient of variation (CV) or relative prediction error is almost 60%. In particular, we found that plots 10 and 11 have relatively large residuals. Plot 11 is in an American oil palm plantation. The large residual of plot 11 is mainly caused by the unique DBH-H relationship of American oil palm trees, which implies that the AGB of American oil palm needs to be modeled separately from other vegetation types (see Section 5.1). However, we had only one plot of American oil palm, which eliminates the possibility of independently assessing the model prediction errors. Thus, plot 11 was excluded from our analysis, and the corresponding AGB model (see Figure 4b) developed after feature selection was:

where H_{80th} is the 80th percentile height of lidar points.
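As a quick check on the relative prediction error quoted for the model that includes all plots, the figure is simply the RMSE divided by the mean field AGB (the notation below is ours, used only for illustration):

$$
\frac{\mathrm{RMSE}}{\overline{\mathrm{AGB}}} = \frac{60.4\ \mathrm{Mg/ha}}{105.2\ \mathrm{Mg/ha}} \approx 0.574
$$

That is about 57%, which the text rounds to almost 60%.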

**Figure 4.**
Models for predicting plot-level AGB with the American oil palm plot included (**a**) and excluded (**b**); and the relationship between residual and wood density at the plot level (**c**).

Note that when the allometric model for non-palm trees (Equation (2)) is used to estimate AGB, wood density is an important variable. As shown in Table 2, the teak plantation plots (plots 9 and 10) have the highest plot-level wood density. Plot 14 also has high average wood density because some of the species within it, such as ameixa and goiaba, have high wood density (see Table 1 and Table 2). The large plot-level wood density of these three plots can explain why they all have positive residuals in the AGB prediction model (Figure 4b).

The patterns revealed in the residual errors of plot-level AGB model prediction (i.e., plots with large wood density led to large positive residuals) (Figure 4c) suggest that the residuals are not statistically independent, i.e., an obvious violation of the basic assumption of ordinary least squares (OLS) regression models. A statistical approach that can naturally account for the correlation among individual observations is mixed-effects modeling. The idea of mixed-effects models is to stratify the observations (i.e., plots in this study) into different groups and allow the model parameters to vary as realizations of random variables across the groups. By doing so, the residuals of model prediction within individual groups become more statistically independent. Mixed-effects models provide a trade-off between fitting all data points with one model and fitting models for each group independently. Hence, they are well suited for handling data with few observations within groups [50].

The use of mixed-effects models requires the grouping of individual plots. The relatively large wood density of plots 9, 10, and 14 and their positive residuals in the fixed-effects model imply that these three plots can be put into one group. The teak plantations (plots 9 and 10) not only have high wood density but also have the potential of being mapped over large areas from remotely sensed data. Thus, we also tried to put only plots 9 and 10 into one group. As a result, we consider two different grouping schemes, as shown in Table 3. As shown in Equation (8), the developed fixed-effects model was a simple power model (in the form of y = a*x^{b}, where a and b are model parameters) with the 80th percentile height of lidar points as the predictor. In all of our mixed-effects modeling experiments, we found that only the parameter b had statistically significant random effects. Therefore, in the mixed-effects models, only the parameter b varies with vegetation groups. For grouping scheme A, the model parameters are a = 8.63, b_{1} = 1.15, and b_{2} = 0.98, where b_{1} and b_{2} are the parameters for groups 1 and 2, respectively. For grouping scheme B, the model parameters are a = 7.23, b_{1} = 1.22, and b_{2} = 1.03. Note that here the mixed-effects model for a given grouping scheme is equivalent to two simple power models for the two vegetation groups within the scheme (Figure 5).
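Because only the exponent b carries a random effect, predicting with the mixed-effects model reduces to picking the group-specific exponent and evaluating the power model. The minimal sketch below illustrates this with the parameter values reported above. The function name `predict_agb`, the dictionary layout, and the assumption that heights are supplied in metres are ours for illustration and are not part of the original analysis.

```python
# Parameter values as reported in the text for the two grouping schemes.
# The dictionary layout and all names here are illustrative assumptions.
PARAMS = {
    "A": {"a": 8.63, "b": {1: 1.15, 2: 0.98}},  # group 1: teak; group 2: non-teak
    "B": {"a": 7.23, "b": {1: 1.22, 2: 1.03}},  # group 1: high wood density; group 2: other
}


def predict_agb(h80, group, scheme="A"):
    """Predict plot-level AGB (Mg/ha) as a * h80 ** b_group.

    h80 is the 80th percentile height of lidar points
    (assumed here to be expressed in metres).
    """
    if h80 < 0:
        raise ValueError("height must be non-negative")
    p = PARAMS[scheme]
    return p["a"] * h80 ** p["b"][group]


if __name__ == "__main__":
    # Compare the two group curves of each scheme at the same height.
    for scheme in ("A", "B"):
        for group in (1, 2):
            print(scheme, group, round(predict_agb(10.0, group, scheme), 1))
```

Since a is shared within a scheme and b_{1} > b_{2}, group 1 always receives the higher AGB estimate at any given height above 1 m, which is consistent with the positive residuals of the high wood density plots under a single fixed-effects curve.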

**Table 3.**
Different schemes of grouping plots for developing mixed-effects models.

| Scheme | Group 1 | Group 2 |
|---|---|---|
| A | Teak plantation (plots 9, 10) | Non-teak (other plots) |
| B | High wood density plots (plots 9, 10, 14) | Other (other plots) |

When we used the same plots for calibration and validation (i.e., re-substitution), the R^{2} was much higher and the RMSE was much smaller for mixed-effects models than for the fixed-effects model (Table 4). The increase in goodness-of-fit in these statistics was expected because mixed-effects models are more complex (i.e., have more parameters) than fixed-effects models. The Akaike information criterion (AIC) is a statistic that penalizes the increase in goodness-of-fit due to the use of more complex models [51]. When we fitted mixed-effects models based on two groups, the AIC values decreased slightly for scheme A and by about four for scheme B (note that lower AIC values are preferred). Figure 5 shows the lidar-based AGB models developed from the field plots as well as their AGB predictions in comparison to the field-based estimates.
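For reference, the standard definition of AIC is given below, together with the AIC decreases implied by the values in Table 4. The symbols k (number of estimated parameters) and \hat{L} (maximized likelihood) are standard notation introduced here for illustration. Whether the original analysis used this exact form or a small-sample variant is not stated in this section.

$$
\mathrm{AIC} = 2k - 2\ln\hat{L}
$$

$$
\Delta\mathrm{AIC}_{A} = 122.6 - 122.0 = 0.6, \qquad \Delta\mathrm{AIC}_{B} = 122.6 - 118.7 = 3.9
$$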

We also used leave-one-out cross-validation to calculate R^{2} and RMSE and further assess the models’ prediction performance (Table 4). We found that the fixed-effects model had an R^{2} as low as 0.38, which increased to 0.64 and 0.75 when mixed-effects models with scheme A and scheme B, respectively, were used. This confirms the importance of stratification and mixed-effects modeling for predicting AGB.
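Leave-one-out cross-validation refits the model once per plot, each time predicting the plot that was held out. A minimal sketch for a single power model is given below. It assumes the model is fitted by ordinary least squares on log-transformed values and that R^{2} is computed as one minus the ratio of residual to total sum of squares; neither choice is specified in this section. The function names are hypothetical and the example values are made up, not the study's plot data.

```python
import math


def fit_power(x, y):
    """Fit y = a * x**b by least squares on log-log transformed data."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = math.exp(my - b * mx)
    return a, b


def loocv(x, y):
    """Return (R^2, RMSE) computed from leave-one-out predictions."""
    preds = []
    for i in range(len(x)):
        # Hold out observation i and refit on the remaining ones.
        x_train = x[:i] + x[i + 1:]
        y_train = y[:i] + y[i + 1:]
        a, b = fit_power(x_train, y_train)
        preds.append(a * x[i] ** b)
    mean_y = sum(y) / len(y)
    ss_res = sum((o - p) ** 2 for o, p in zip(y, preds))
    ss_tot = sum((o - mean_y) ** 2 for o in y)
    r2 = 1.0 - ss_res / ss_tot  # assumed R^2 convention
    rmse = math.sqrt(ss_res / len(y))
    return r2, rmse


if __name__ == "__main__":
    # Synthetic values (NOT the study's plots): heights in m, AGB in Mg/ha.
    heights = [4.0, 6.5, 8.0, 10.0, 12.5, 15.0, 18.0]
    agb = [30.0, 52.0, 70.0, 85.0, 120.0, 140.0, 175.0]
    r2, rmse = loocv(heights, agb)
    print(f"LOOCV R^2 = {r2:.2f}, RMSE = {rmse:.1f} Mg/ha")
```

Because every prediction comes from a model that never saw the held-out plot, the resulting statistics are typically less optimistic than re-substitution values, which matches the gap between the two halves of Table 4.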

**Table 4.**
Comparison of fixed-effects and mixed-effects models for plot-level AGB estimation.

| Model type | Re-substitution R^{2} | Re-substitution RMSE (Mg/ha) | Re-substitution AIC | Cross-validation R^{2} | Cross-validation RMSE (Mg/ha) |
|---|---|---|---|---|---|
| Fixed-effects model | 0.74 | 40.4 | 122.6 | 0.38 | 56.4 |
| Mixed-effects model, scheme A | 0.91 | 25.9 | 122.0 | 0.64 | 42.9 |
| Mixed-effects model, scheme B | 0.94 | 21.6 | 118.7 | 0.75 | 35.9 |

**Figure 5.**
Comparison of fixed-effects and mixed-effects models for estimating plot AGB. The solid lines in the first-row figures are model curves. The dashed lines in the second-row figures are 1:1 lines.

Figure 6 shows the vegetation distribution (Figure 6a) and the AGB estimated over the whole agroforestry area using mixed-effects (Figure 6b) and fixed-effects (Figure 6c) models. The mean AGB densities estimated from the fixed- and mixed-effects models were 57.0 Mg/ha and 51.7 Mg/ha, respectively. Thus, the average AGB density estimate from the fixed-effects model was about 10% higher than the one from the mixed-effects model. Moreover, the spatial pattern of their difference at the pixel level (Figure 6d) shows that, compared with the mixed-effects model, the fixed-effects model underestimated the teak plantation AGB by up to ~70 Mg/ha (or 33%) and overestimated the AGB of other agroforestry by up to ~60 Mg/ha (or 25%).
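The map comparison above amounts to a pixel-wise difference between two AGB rasters, summarized by the mean density of each. A rough sketch follows, using plain nested lists in place of real rasters. The function names, the use of `None` for masked pixels, and the toy pixel values are illustrative assumptions, not the study's data or processing chain.

```python
def mean_density(grid):
    """Mean of all unmasked (non-None) pixels in a 2-D grid."""
    vals = [v for row in grid for v in row if v is not None]
    return sum(vals) / len(vals)


def difference_map(fixed, mixed):
    """Pixel-wise fixed minus mixed AGB; masked pixels stay None."""
    return [
        [None if f is None or m is None else f - m for f, m in zip(rf, rm)]
        for rf, rm in zip(fixed, mixed)
    ]


def relative_difference(fixed_mean, mixed_mean):
    """Difference of the fixed-effects mean relative to the mixed-effects mean."""
    return (fixed_mean - mixed_mean) / mixed_mean


if __name__ == "__main__":
    # Toy 2 x 3 rasters in Mg/ha (made-up values); None marks masked pixels.
    fixed = [[60.0, 55.0, None], [140.0, 48.0, 52.0]]
    mixed = [[50.0, 45.0, None], [200.0, 40.0, 44.0]]
    print(difference_map(fixed, mixed))
    print(mean_density(fixed), mean_density(mixed))
    # Mean densities reported in the text: 57.0 and 51.7 Mg/ha.
    print(f"{relative_difference(57.0, 51.7):.1%}")
```

With the reported means of 57.0 and 51.7 Mg/ha, the relative difference evaluates to roughly 10%, in line with the figure quoted in the text. Negative values in the difference map correspond to pixels where the fixed-effects model predicts less AGB than the mixed-effects model.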

**Figure 6.**
Maps of vegetation type (**a**); and AGB predicted with mixed-effects model (**b**); and fixed-effects model (**c**); and the difference between AGB predicted with fixed- and mixed-effects models (**d**). Black color indicates the area masked for analysis.
