High-Precision Stand Age Data Facilitate the Estimation of Rubber Plantation Biomass: A Case Study of Hainan Island, China

Rubber (Hevea brasiliensis Muell.) plantations constitute one of the most important agro-ecosystems in the tropical region of China and Southeast Asia, playing an important role in the carbon budget there. Accurately obtaining their biomass over a large area is challenging because of difficulties in acquiring the Diameter at Breast Height (DBH) through remote sensing and the problem of biomass saturation. The stand age, which is closely related to the forest biomass, was proposed for biomass estimation in this study. A stand age map at an annual scale for Hainan Island, which is the second largest natural rubber production base in China, was generated using all Landsat and Sentinel-2 (LS2) data (1987–2017). Scatter plots and the correlation coefficient method were used to explore the relationship (e.g., biomass saturation) between rubber biomass and different LS2-based variables. Subsequently, a regression model fitted with the stand age (R2 = 0.96) and a Random Forest (RF) model parameterizing with LS2-based variables and/or the stand age were respectively employed to estimate rubber biomass for Hainan Island. The results show that rubber biomass was saturated around 65 Mg/ha with all LS2-based variables. The regression model estimated biomass accurately (R2 = 0.79 and Root Mean Square Error (RMSE) = 14.00 Mg/ha) and eliminated the saturation problem significantly. In addition to LS2-based variables, adding a stand age parameter to the RF models was found to significantly improve the prediction accuracy (R2 = 0.82–0.96 and RMSE = 4.08–10.59 Mg/ha, modeling using samples of different biomass sizes). However, all RF models overestimated the biomass of young plantations and underestimated the biomass of old plantations. A hybrid model integrating the optimal results of RF and regression models reduced estimation bias and generated the best performance (R2 = 0.83 and RMSE = 12.48 Mg/ha). The total rubber biomass of Hainan Island in 2017 was about 5.40 × 107 Mg. The northward and westward expansions after 2000 had great impact on the biomass distribution, leading to a higher biomass density for the inland coastal strip from south to northeast and a lower biomass density in the northern and western regions. Remote Sens. 2020, 12, 3853; doi:10.3390/rs12233853 www.mdpi.com/journal/remotesensing Remote Sens. 2020, 12, 3853 2 of 20


Introduction
The rubber tree (Hevea brasiliensis Muell.), which is widely planted in tropical regions such as Southeast Asia and the tropical region of China, is of great importance in producing natural rubber and in providing timber when the rubber latex harvest is finished [1,2]. The establishment of new rubber plantations-usually converted from old rubber plantations, tropical forest, and cropland [3]-and timber harvesting in old rubber plantations will greatly alter the carbon budget [4]. Therefore, accurately estimating biomass, which is an important indicator of the carbon storage capacity and the potential carbon pool size of a forest ecosystem [5,6], is a widespread concern of ecologists, plantation managers, and policy makers related to the rubber industry.
The biomass of rubber plantations at a plot level is usually estimated by allometric equation (AE) models, linked with field measurements of the Diameter at Breast Height (DBH) and Height (H), or a combination of other factors, such as the stand age and wood density [7][8][9][10][11]. In China, four rubber biomass allometric models have been developed thus far [8,10,12,13]. All these models use DBH or the combination of DBH and H as independent variables, yielding R 2 values greater than 0.97. Three of these models were developed in Xishuangbanna prefecture, Yunnan Province [8,10,12], and one was developed in Leizhou Peninsula (close to Hainan island), Guangdong Province [13]. Jia et al.'s [12] model only considers the aboveground biomass but covers biomass variation at low (550-600 m), medium (750-800 m), and high (950-1050 m) elevations above the mean sea level (AMSL), while the other three models consider the belowground biomass [8,10,13]. These AE models are very valuable because the purpose of planting rubber trees is to harvest latex for a long period of time (usually 25-30 years) instead of wood. The cost of felling mature rubber trees in a large area to establish AE models is very high.
There are several ways to extend biomass at the tree/plot level to a landscape/regional scale. One of the commonly used methods is to convert tree volume data from forest inventories into biomass using biomass conversion and expansion factors (BCEFs) [14][15][16]. A generalized curve of BCEFs for different species or species groups by stand age and site index has been developed in most countries, such as Poland and China, which signed up for the United Nations Framework Convention on Climate Change (UNFCCC) and its supplementary Kyoto Protocol [14,17]. Another approach is to multiply the average or time-averaged biomass obtained at a plot scale by the area in a specific region [11,18]. For example, Petsri et al. [9] calculated the biomass per hectare through a meta-analysis of DBH and H against stand age and then estimated the national rubber biomass in Thailand with annual statistics for area data and a fixed rotation length of 25 years. The accuracy obtained in this way was much better than the result obtained from multiplying the average/time averaged biomass by the total area. However, large uncertainties may exist because the actual age structure is unknown and the rotation length is not fixed at 25 years. In addition, the use of remote sensing to estimate forest biomass over a large area is becoming increasingly popular. Many studies has been conducted to estimate different forest (e.g., broadleaf and coniferous) biomass through optical satellite imagery [19][20][21], the Synthetic Aperture Radar (SAR) [6,22,23], Light Detection and Ranging (LiDAR) [24,25], or their combination [23,[26][27][28][29]. More recently, the European Space Agency planned to launch a fully polarimetric P-band SAR satellite-BIOMASS-for which the main tasks are mapping the global aboveground biomass, forest height, and severe forest disturbances [30]. Generally, LiDAR is the top-ranked approach for forest biomass estimation as it can remove data saturation, followed by longer wavelength radar and optical remote sensing [31]. However, the data availability is the opposite; LiDAR data are the scarcest due to limited satellites carrying LiDAR instruments, and optical remote sensing data are the most abundant. Many studies have used remote sensing to estimate the biomass of rubber plantations. The relationship between rubber biomass, spectral bands, and vegetation indices derived from Landsat Thematic Mapper (TM) was explored in Xishuangbanna Prefecture, Yunnan Province, China, and different biomass models using stepwise regression [32,33] and Random Forest (RF) [34] were developed (R 2 = 0.43-0.67 and Root Mean Square Error (RMSE) = 30.27-46.05 Mg/ha). The total rubber biomass in Xishuangbanna was about 1.97 × 10 7 Mg, and the average biomass was 75.90 Mg/ha [32]. Yasen and Koedsin found that an artificial neural network based on WorldView-2 multispectral imagery could estimate biomass more accurately than stepwise multiple linear regression in Phuket Province, Thailand [2]. Charoenjit et al. [35] first estimated tree girth-using spectral bands, textural parameters, and vegetation indices from Thaichote multispectral imagery by a linear multiple regression model-and then used it with an AE model to estimate the rubber biomass in eastern Thailand. Avtar et al. [36] found that the Horizontal transmitting with Vertical receiving (HV) band of the Phase-Array L-band Synthetic Aperture Radar (PALSAR) positively correlated with the biomass of rubber plantations. The correlation was higher than for the Horizontal transmitting with Horizontal receiving (HH) band, but it began to saturate when the biomass was about 50 Mg/ha. Trisaongko and Paull estimated the biomass of rubber plantations with a very high accuracy (R 2 = 0.98 and RMSE = 1.88 Mg/ha), indicating that the acquisition time of PALSAR-2 and the choice of AE model significantly affect the accuracy [37]. By summarizing this literature, two main approaches for estimating the biomass of rubber plantations on a large scale can be obtained: (1) establishing direct relationships between biomass and spectral/SAR bands, vegetation indices, and textural parameters and (2) obtaining proxy factors such as DBH from remote sensing data and then using them as the parameters of AE models. In fact, all these approaches use canopy features from optical or SAR satellite imagery to estimate biomass. As the relationship between rubber biomass and canopy features gradually weakens with an increase in the stand age [38], the estimated biomass of middle-aged and old rubber plantations often exhibits large deviations.
There is also a close relationship between rubber biomass and stand age [39]. Good models (R 2 = 0.97-0.99) of the biomass/carbon stock of rubber plantations and stand age have been developed in Xishuangbanna, Yunnan Province [8,40,41] and in Thailand [9]. Compared to DBH, the rubber stand age is easier to acquire accurately from time-series satellite imagery. Thanks to the free access of Landsat data policy [42], the establishment year (same as the stand age) of rubber plantations can be accurately traced back to the end of the 1980s at an annual resolution and RMSE of about 2 years [3,43], demonstrating great potential for biomass estimation. In addition to the stand age, the rubber biomass is also closely related to the geographical environment in which it grows, such as light conditions, soil moisture, and fertility. Differences in the growth condition of rubber plantations may affect the intensity of the spectral reflectance that can be used to estimate the biomass [35]. Therefore, we assume that combining the stand age and spectral characteristics can more accurately estimate the biomass of rubber plantations. The objectives of this study are (1) to explore the relationship between rubber biomass and stand age and LS2-based variables (e.g., spectral bands and vegetation indices); (2) to develop biomass models using stand age and LS2-based variables; and (3) to generate a biomass map of rubber plantations in Hainan Island, China and to investigate its spatial distribution pattern.

Study Area
Hainan Island (19 • 20 N-20 • Figure 1) is the second largest island in China and has an area of about 3.4 million ha. The topography of the island is mountainous in the interior, with lowlands along the coast. Wuzhi Mountain (1867 m AMSL) is the highest mountain on the island [44]. The climate varies from tropical to subtropical. The annual mean temperature is approximately 23-25 • C, and the monthly temperature varies between approximately 16 • C in January and approximately 30 • C from May to July. The average annual precipitation is 1500 to 2000 mm, Remote Sens. 2020, 12,3853 4 of 20 about 80% of which occurs during the rainy season from May to October. However, the spatial distribution of precipitation can be as high as 2400 mm in the central and eastern area and as low as 900 mm in the coastal areas of the southwest [45].
Field data consist of two parts: the main field data sampled in 2016 and supplementary data derived from [38] or collected from the experimental farm of the Chinese Academy of Tropical Agricultural Sciences (CATAS) from 2007 to 2014. The main field data are new and have a uniform acquisition time, so are used to explore the relationship between rubber biomass and remote sensing variables, to develop Random Forest (RF) models, and to evaluate their accuracy. Both the main field data and supplementary data were used to fit a biomass model using the stand age as the independent variable. The supplementary data were mainly collected for increasing samples in the 1990s-a period established with limited rubber plantations, which made it difficult to acquire wealthy field data by random sampling.  Hainan Island is the second largest rubber plantation base in China. In the 1950s, a large area of rubber trees was planted to meet China's national defense and economic needs. Statistics indicate that Hainan Island had about 5.43 × 10 5 ha of rubber plantations by the end of 2017, accounting for more than a quarter of the total forest area, and is the largest plantation ecosystem there [44].

Field Inventory Data
Field data consist of two parts: the main field data sampled in 2016 and supplementary data derived from [38] or collected from the experimental farm of the Chinese Academy of Tropical Agricultural Sciences (CATAS) from 2007 to 2014. The main field data are new and have a uniform acquisition time, so are used to explore the relationship between rubber biomass and remote sensing variables, to develop Random Forest (RF) models, and to evaluate their accuracy. Both the main field data and supplementary data were used to fit a biomass model using the stand age as the independent variable. The supplementary data were mainly collected for increasing samples in the 1990s-a period Remote Sens. 2020, 12, 3853 5 of 20 established with limited rubber plantations, which made it difficult to acquire wealthy field data by random sampling.
The main field data were collected in October of 2016 by randomly selecting rubber plantations with the aid of Google high-resolution satellite images. In each plantation, the DBH of each rubber tree at a 1.3 m height in an area of 25 × 25 m was measured, and the mean and standard deviation of the DBH were calculated. The stand age was obtained by interviewing farmers or was evaluated according to the height of tapped bark-a relatively fixed height of bark is consumed each year for latex harvest-and double checked with the plantation establishment year map of 2017 (described in Section 2.3). Field photos were taken by a Casio EX-H20G camera embedded with a Global Positioning System (GPS) receiver. The GPS information of the photos was extracted using these steps: (1) converting it into a Keyhole Markup Language (KML) file format by PhotoKML (v1.3) (https://www.softpedia. com/get/Multimedia/Graphic/Graphic-thers/PhotoKML.shtml), (2) importing the KML file in ArcGIS (v10.2, www.esri.com) and converting it into a shapefile format, and (3) extracting the latitude and longitude of each plot. A total of 51 rubber plantations were collected during the field campaign. The spatial distribution of field plots is shown in Figure 1 and descriptive statistics on the stand age and DBH are presented in Table 1. Among the 18 counties where rubber trees have been cultivated, 16 of them have sampling plots. The DBH and stand age of rubber plantations included in the supplementary data were collected in a similar way, but accurate GPS information of some plantations was not clear. Actually, the stand age is the primary concern when building a stand age-based biomass model. The supplementary data consist of 19 rubber plantations with different stand ages. The total biomass (below + aboveground) for a single rubber tree was calculated by the AE model (Equation (1)) [13]: where B G_tree is the biomass (kg/tree) and G is the girth at 1.3 m (cm). This model has a good accuracy (R 2 = 0.99, systematic error ≤ 1.469% [13]). On Hainan Island, rubber trees were generally planted at a density of 3 × 7 m or 4 × 5 m in columns and rows, resulting in about 495 trees/ha at the initial stage of plantation establishment. However, the tree density decreased with the year because of typhoons, which frequently hit the island and caused huge damage in the rubber industry [46]. Tree density data of more than 80 state farms (not distinguishing the stand age), from 1952 to 2001, were collected by Hainan State Farm Bureau [47], showing an average density of 381.45 trees/ha. Equation (1) was converted into Equation (2) at a hectare scale by multiplying the average tree density and by dividing it by 1000 (converting kg to Mg): where B G is the biomass (Mg/ha). The girth and biomass of each plantation were stored in the attribute table of the shapefile. According to the size of biomass, the field data were split into eight groups (30,40,50,60,70,80,110, and 140 Mg/ha), which were further used as parameters for stratified k-fold cross-validation. Stratified k-fold cross-validation is similar to k-fold cross-validation, but it conducts stratified sampling instead of random sampling to ensure that relative class frequencies Landsat 7/8 and Sentinel-2 multispectral imagery was used in this study. Landsat 7 Enhanced Thematic Mapper Plus (ETM+) and Landsat 8 Operational Land Imager (OLI) imagery were obtained from the U.S. Geological Survey (USGS, www.usgs.gov), and the Sentinel-2 MultiSpectral Instrument (MSI) imagery was obtained from the European Space Agency (ESA, https://sentinel.esa.int). All the top-of-atmosphere (TOA) reflectance data of Landsat 7/8 collection 1 Tier 1 and Sentinel-2 Level-1C were acquired during the day of year (DOY) between 120 and 330 (April 1 to November 30) and from 2016 to 2017. Surface Reflectance (SR) data were not used because Sentinel-2 SR (Level-2A) for the study area was not available at the GEE platform when we first carried out this study, and the mix use of Sentinel-2 TOA and Landsat SR may have the problem of cross-calibration between different sensors. Landsat 7/8 data have a 30-m spatial resolution, and Sentinel-2 MSI has a 10-m spatial resolution. These TOA reflectance data had already been orthorectified with a high geometric accuracy and reduced radiometric uncertainties between different acquisition dates and different sensors, such as ETM+ and OLI. All these data were available on the GEE cloud computing platform and could be readily accessed.
The cloud and cloud shadow of Landsat ETM+ and OLI were masked using bitmask information from the associated quality assessment band, which was generated by the widely used C Function of Mask (CFMask) algorithm [48]. Google's cloudScore algorithm, together with the Temporal Dark Outlier Mask (TDOM), were used for masking the cloud and shadow of Sentinel imagery [49,50]. Regression models [50] were used to minimize TOA reflectance differences among the sensors of Sentinel-2 MSI, Landsat 7 ETM+, and Landsat 8 OLI, using Landsat 8 OLI TOA reflectance as a benchmark.
A median value composite (similar to a maximum value composite [51], but the median value in each band over time was selected) was applied to Landsat 7/8 and Sentinel-2 (LS2) to generate 20-m resolution cloud-free imagery. Then, two commonly used vegetation indices-the Normalized Difference Vegetation Index (NDVI) [52] and Enhanced Vegetation Index (EVI) [53]-were calculated using Equations (3) and (4), respectively: where ρ blue , ρ green , ρ red , and ρ NIR are the blue, green, red, and near-infrared (NIR) bands of the composite image (Table 2). Previous studies indicated that principal component analysis (PCA) of Landsat images had stronger relationship with forest biomass than individual bands [31]. Therefore, PCA was computed based on the LS2 composite image.

Terrain Data
The Shuttle Radar Topography Mission (SRTM) digital elevation data (V3) at a resolution of 1 arc-second (approximately a 30-m spatial resolution) were provided by the National Aeronautics and Space Administration (NASA) Jet Propulsion Laboratory (JPL) [54]. The elevation was directly obtained from SRTM V3 data, and the corresponding slope was calculated using GEE's Application Programming Interface (API).

Rubber Plantation Biomass Estimation
Most AE biomass models are built on DBH, H, or their combination [12,13]. However, the biomass of rubber plantations also displays a close relationship with stand age-the R 2 of the established biomass model exceeds 0.93 [8,18]. A stand age-based biomass model is more popular in the remote sensing community because the stand age can be estimated more accurately by dense time-series satellite images than DBH or H [3,43], which are more easily and precisely obtained on the ground. In addition to the stand age, factors such as geography and climatic will affect the growth, meaning that rubber plantations of the same age have different canopy characteristics and biomasses. We made a hypothesis that integrating the stand age and spectral signatures from satellite imagery acquired in the growing season (April to December) can improve the estimation accuracy of the biomass model. Two methods were used to estimate the biomass of rubber plantations, and their differences were compared. The first was a regression model solely fitted by stand age, and the second was a Random Forest (RF) model built on LS2-based variables and/or the stand age. To generate a wall-to-wall biomass map of Hainan Island, both models require a reliable stand age map of rubber plantations in advance.

Mapping the Stand Age of Rubber Plantations
In a previous study, we developed a new algorithm to identify the establishment year (same as the stand age, where the stand age equals the target year minus the establishment year) of rubber plantations based on all Landsat images of Hainan Island [3]. Specifically, signatures that topsoil exposed during plantation establishment and canopy closure linearly increased during nonproductive periods (rubber seedling to mature plantations) were employed to identify the establishment year. Annual images of the minimum and average Land Surface Water Index (LSWI) from 1987 to 2015 were respectively composited from all Landsat images (about 2000 scenes in total) acquired in the growing season of the corresponding year and were then used to generate an establishment year map of rubber plantations for Hainan Island in 2015. The establishment year map at an annual resolution since 1987 shows a very high accuracy (R 2 = 0.85/0.99 and RMSE = 2.34/0.54 year at a pixel/plantation scale). The establishment year established before 1987 was uniformly set as 1986 because the earliest collection of Landsat 5 imagery of Hainan Island started in 1987. We used the same algorithm and thresholds to generate the stand age map in 2017, using the newly obtained map of rubber plantations in 2017 as a benchmark and adding annual composited images of 2016 and 2017 to the previous annual composited data set.

Regression Biomass Model
Both the main field data (51 plots) and supplementary field data (19 plots) were used to build a regression biomass model. The field data used were obtained from rubber plantations of less than 30 years old. It is difficult to accurately obtain a stand age of over 30 years from both remote sensing and ground surveys-the former case is because of a lack of time-series images for mapping (the earliest collection of Landsat 5 images of Hainan Island began in 1987), while the latter case is due to difficulties in determining the stand age from blurred tapping bark of old trees. The biomass was grouped according to stand age, and their average and standard deviation values were calculated. The stand age was used as the independent variable, and the average biomass was the dependent Remote Sens. 2020, 12, 3853 8 of 20 variable. An "S"-shaped sigmoidal biomass model was built using an online curve-fitting utility named MyCurveFit (mycurvefit.com). The biomass estimated by the regression model was referred to as B RG .

Random Forest (RF) Biomass Model
The RF machine learning algorithm is generally immune to data noise and overfitting [55] and has been widely used within the remote sensing community such as classification [56][57][58][59], biomass estimation [25,60], disease monitoring [61,62], and hazards detection [63]. The good performance of the RF model over other methods, such as maximum likelihood and single decision trees, has been demonstrated by many studies and was summarized by Belgiu et al. [64].
Three tests were conducted during RF model building: (1) biomass saturation (sensitivity to biomass is lost up to a point) with different variables from LS2 composite images, (2) model building using LS2-based variables, and (3) model building using LS2-based variables and the stand age. Based on previous studies on forest biomass [31], we selected variables such as B3, B4, B5, B7, PC1, PC2, PC3, NDVI, and EVI to construct an independent variable data set. A scatter plot and the Pearson correlation coefficient between different variables and biomass were used to assess the biomass saturation. We calculated multiple correlation coefficients using samples of different biomass sizes, starting from 40 Mg/ha, with 5 Mg/ha as the step until 80 Mg/ha. Every possible variable combination by gradually adding a distinct variable from the independent variable set-B3 and B4, and B3 and B5 as combination examples-was used to build RF models. We also tested the model performance under conditions of with/without stand age as a fixed independent variable. We kept the best 10 RF models each time and finally compared their performance under different variable combinations. Considering the problem of biomass saturation, model building was also conducted for samples from different biomass groups (smaller than mean values of 50, 60, 70, 80, 110, and 140 Mg/ha, respectively). Due to the limited sample size, we did not perform model testing on samples with a biomass of less than 50 Mg/ha, even using k-fold cross-validation (K = 4 was used here). In order to improve the efficiency of the test, we downloaded the sample data containing LS2-based variables from GEE to a local computer and then used the scikit-learn (https://scikit-learn.org/stable/index.html) python package for model testing. The number of trees in the forest was set as 100 and random_state was set as 1. The biomass estimated by the RF model was referred to as B RF .

Accuracy Assessment
Stratified k-fold cross-validation (StratifiedKFold) was used to evaluate the model accuracy. The 51 samples were split into four smaller sets (each set is called a fold) using grouping field (values close to 30,40,50,60,70,80,110, and 140 Mg/ha) for stratification. The model was trained using k-1 of the folds as training data and was validated with the remaining part. The performance measure reported by k-fold cross-validation was then the average of the values computed in the loop. Model performances were evaluated by the determination of coefficients (R 2 ) and Root Mean Square Error (RMSE). The scatter of B G against B RG and B RF was plotted, respectively. Matplotlib (https://matplotlib.org) and Seaborn, which is a Python data visualization library based on matplotlib (https://seaborn.pydata.org), were used for plotting.

Variable Sensibility to the Biomass of Rubber Plantations
The scatter plots and correlation coefficients of biomass against different spectral bands, vegetation indices, and principal components are presented in Figure 2. The reflectance of B3, B5, and B7 decreases with biomass when the biomass is lower than 65 Mg/ha and then increases slowly. Among them, B3 drops more significantly, followed by B7 and B5. PC1, PC2, PC3, NDVI, and EVI first increase with the increase in biomass, but there are almost no changes after the biomass is greater than 65 Mg/ha, Remote Sens. 2020, 12, 3853 9 of 20 except for PC2, showing an obvious decrease in a high biomass range. The relationship between B4 and biomass is the weakest, even if the biomass is divided into two parts at 65 Mg/ha. When the biomass is lower than 65 Mg/ha, its correlation coefficient with B3, B7, PC2, and NDVI is quite high among these variables, ranging from −0.27 to −0.59, −0.20 to −0.72, 0.37 to 0.64, and 0.45 to 0.62, respectively. When the biomass continues to increase, the correlation coefficient between it and all variables weakens.

Biomass Estimated by the Regression Model
A nonlinear symmetrical sigmoidal model, named the four parameter logistic (4PL) regression model, was used to establish a regression model between the biomass of rubber plantations and stand age (Equation (5), 95% confidence) (Figure 3 where is the biomass of rubber plantations (Mg/ha) and is the stand age. The R 2 is 0.963, adjusted R 2 is 0.967, RMSE is 7.12 Mg/ha, and Sum of Square Error (SSE) is 911.12. A slight overestimation is observed in immature plantations for which the stand age is less than 5 years. The maximum biomass for this model is about 145.40 Mg/ha. When combined with the biomass model and latest stand age map, the biomass of rubber plantations on Hainan Island could be obtained. Validation by main ground reference data collected in 2016 shows that R 2 reaches 0.79 and that RMSE

Biomass Estimated by the Regression Model
A nonlinear symmetrical sigmoidal model, named the four parameter logistic (4PL) regression model, was used to establish a regression model between the biomass of rubber plantations and stand age (Equation (5), 95% confidence) (Figure 3):  (Figure 4a). The scatter plot of B RG against B G demonstrates that most points distributed along the 1-to-1 line, but the model exhibits a slight overestimation when rubber biomass is less than about 80 Mg/ha (Figure 4a).
Remote Sens. 2020, 12, x 10 of 20 is 14.00 Mg/ha (Figure 4a). The scatter plot of against demonstrates that most points distributed along the 1-to-1 line, but the model exhibits a slight overestimation when rubber biomass is less than about 80 Mg/ha (Figure 4a).

Biomass Estimated by RF Models
R 2 of the best RF models developed by variable combinations of B3, B4, B5, B7, PC1, PC2, PC3, NDVI, and EVI ranges from 0.38 to 0.81, and the RMSE is between 4.31 and 24.34 Mg/ha ( Figure 5). The RF model trained with samples of less than or equal to group 50 (specifically, all samples with a biomass < 55 Mg/ha) holds the highest R 2 (0.81) and lowest RMSE (4.31 Mg/ha). R 2 decreases to about 0.65 when sample groups less than 60, 70, and 80 are used for modeling and then gradually decreases to about 0.38 when the samples of a higher biomass (groups 110 and 140) are added to the models. However, the RMSE of the RF models generally increases from 4.31 to 24.34 Mg/ha with the addition samples of a higher biomass.
When the stand age is a fixed independent variable, the prediction accuracies of RF models are significantly improved. R 2 ranges from 0.81 to 0.96, and the RMSE is between 4.08 and 10.59 Mg/ha ( Figure 5). The accuracy of the RF model built with samples of less than and equal to the 50 group is the highest, where R 2 reaches 0.96 (0.15 higher than that without using the stand age) and RMSE  (Figure 4a). The scatter plot of against demonstrates that most points distributed along the 1-to-1 line, but the model exhibits a slight overestimation when rubber biomass is less than about 80 Mg/ha (Figure 4a).

Biomass Estimated by RF Models
R 2 of the best RF models developed by variable combinations of B3, B4, B5, B7, PC1, PC2, PC3, NDVI, and EVI ranges from 0.38 to 0.81, and the RMSE is between 4.31 and 24.34 Mg/ha ( Figure 5). The RF model trained with samples of less than or equal to group 50 (specifically, all samples with a biomass < 55 Mg/ha) holds the highest R 2 (0.81) and lowest RMSE (4.31 Mg/ha). R 2 decreases to about 0.65 when sample groups less than 60, 70, and 80 are used for modeling and then gradually decreases to about 0.38 when the samples of a higher biomass (groups 110 and 140) are added to the models. However, the RMSE of the RF models generally increases from 4.31 to 24.34 Mg/ha with the addition samples of a higher biomass.
When the stand age is a fixed independent variable, the prediction accuracies of RF models are significantly improved. R 2 ranges from 0.81 to 0.96, and the RMSE is between 4.08 and 10.59 Mg/ha ( Figure 5). The accuracy of the RF model built with samples of less than and equal to the 50 group is the highest, where R 2 reaches 0.96 (0.15 higher than that without using the stand age) and RMSE   The scatter plot of the best RF model developed by stand age and LS2-based variables with kfold cross-validation is shown in Figure 4b. The points are more densely distributed near the 1-to-1 line when the biomass of rubber plantations is less than about 65 Mg/ha, and most points are then more scattered afterwards. The R 2 , RMSE, and slope of the fit line are 0.70, 16.41 Mg/ha, and 0.78, respectively. A slight underestimation is observed for plantations which exhibit a higher biomass.

Biomass Estimated by RF Models
Since the RF models have a higher prediction accuracy for plantations with a lower biomass ( Figure 5) and the regression model presents a high accuracy for plantations with a higher biomass (Figure 4a), a hybrid model integrating the optimal results of these two models can better predict the accuracy. The scatter plot of B against results obtained from the hybrid model (BRG + RF) is shown in Figure 4c. The prediction results are more closely distributed around the 1-to-1 line. The slope of the fit line, R 2 , and RMSE are 0.92, 0.83, and 12.48 Mg/ha, respectively.

The Spatial Distribution of Rubber Plantations and Biomass
The establishment year map of rubber plantations on Hainan Island by the end of 2017 is shown in Figure 6a. Immature plantations with a stand age of less than four years were excluded from this map because of larger mapping uncertainties. Rubber plantations are densely distributed in northern and center-to-east regions. Danzhou city possesses the largest area of rubber plantations, whilst other counties, such as Baisha, Chengmai, Qiongzhong, and Qionghai, also have dense distributions. The establishment year map shows an interesting distribution: A considerable number of rubber plantations have been established in the western and northern regions since 2000, whereas the old rubber plantations are mainly distributed along the coastal strip from south to northeast.
The spatial distribution of the biomass of rubber plantations generated by the hybrid model is shown in Figure 6b. The spatial distribution pattern is quite consistent with the spatial distribution of the establishment year. The biomass of newly established rubber plantations in the western and northern regions is generally below 60 Mg/ha, while the biomass of aging plantations distributed along the coastal strip from south to northeast and some border regions such as Danzhou city and Baisha county usually exceeds 110 Mg/ha. The biomass of some rubber plantations discretely distributed in Danzhou city and counties of Baisha and Qiongzhong is greater than 90 Mg/ha. When the stand age is a fixed independent variable, the prediction accuracies of RF models are significantly improved. R 2 ranges from 0.81 to 0.96, and the RMSE is between 4.08 and 10.59 Mg/ha ( Figure 5). The accuracy of the RF model built with samples of less than and equal to the 50 group is the highest, where R 2 reaches 0.96 (0.15 higher than that without using the stand age) and RMSE drops to 4.08 Mg/ha. When the sample biomass is high, the increase in R 2 is more obvious. For example, R 2 of a model built with samples of less than or equal to the 140 group increases from 0.38 to 0.85 and the RMSE decreases from 24.34 to 10.59 Mg/ha. The RMSE of the RF models generally increases with the addition of higher biomass samples.
The scatter plot of the best RF model developed by stand age and LS2-based variables with k-fold cross-validation is shown in Figure 4b. The points are more densely distributed near the 1-to-1 line when the biomass of rubber plantations is less than about 65 Mg/ha, and most points are then more scattered afterwards. The R 2 , RMSE, and slope of the fit line are 0.70, 16.41 Mg/ha, and 0.78, respectively. A slight underestimation is observed for plantations which exhibit a higher biomass.
Since the RF models have a higher prediction accuracy for plantations with a lower biomass ( Figure 5) and the regression model presents a high accuracy for plantations with a higher biomass (Figure 4a), a hybrid model integrating the optimal results of these two models can better predict the accuracy. The scatter plot of B G against results obtained from the hybrid model (B RG + RF ) is shown in Figure 4c. The prediction results are more closely distributed around the 1-to-1 line. The slope of the fit line, R 2 , and RMSE are 0.92, 0.83, and 12.48 Mg/ha, respectively.

The Spatial Distribution of Rubber Plantations and Biomass
The establishment year map of rubber plantations on Hainan Island by the end of 2017 is shown in Figure 6a. Immature plantations with a stand age of less than four years were excluded from this map because of larger mapping uncertainties. Rubber plantations are densely distributed in northern and center-to-east regions. Danzhou city possesses the largest area of rubber plantations, whilst other counties, such as Baisha, Chengmai, Qiongzhong, and Qionghai, also have dense distributions. The establishment year map shows an interesting distribution: A considerable number of rubber plantations have been established in the western and northern regions since 2000, whereas the old rubber plantations are mainly distributed along the coastal strip from south to northeast.  The total biomass of rubber plantations varies dramatically among different counties or cities (Figure 7b). Danzhou city possesses a total biomass of 9.77 × 10 6 Mg, which is far higher than other counties or cities, followed by the counties of Baisha, Qiongzhong, Chengmai, and Qionghai city, all of which have a total biomass of rubber plantations greater than 4.50 × 10 6 Mg. The total biomasses of  The total biomass of rubber plantations varies dramatically among different counties or cities (Figure 7b). Danzhou city possesses a total biomass of 9.77 × 10 6 Mg, which is far higher than other counties or cities, followed by the counties of Baisha, Qiongzhong, Chengmai, and Qionghai city, all of which have a total biomass of rubber plantations greater than 4.50 × 10 6 Mg. The total biomasses of The total biomass of rubber plantations varies dramatically among different counties or cities (Figure 7b). Danzhou city possesses a total biomass of 9.77 × 10 6 Mg, which is far higher than other counties or cities, followed by the counties of Baisha, Qiongzhong, Chengmai, and Qionghai city, all of which have a total biomass of rubber plantations greater than 4.50 × 10 6 Mg. The total biomasses of rubber plantations in Tunchang county, Wanning city, Ledong county, Dinan city, and Lingao county are about 3.21 × 10 6 , 3.05 × 10 6 , 2.78 × 10 6 , 2.59 × 10 6 , and 2.45 × 10 6 , respectively, while the remaining counties or cities have values of less than 2.00 × 10 6 Mg. Dongfang city and Lingshui county have the smallest biomass of rubber plantations, possessing values of 6.88 × 10 5 and 6.37 × 10 5 Mg, respectively. The total biomass of rubber plantations on Hainan Island in 2017 was 5.40 × 10 7 Mg.
The average biomass of rubber plantations varies greatly at different elevations and slopes (Figure 8a,b). The highest average biomass is located at an elevation between 200 and 300 m, which is about 93 Mg/ha, followed by rubber plantations distributed in an elevation range from 300 to 400 m and from 400 to 600 m. The lowest average biomass is located at an elevation between 50 to 100 m, which is about 85 Mg/ha. In areas with a slope below 12 degrees, the average biomass of rubber plantations increases with an increase in the slope, gradually increasing from about 83 Mg/ha within five degrees to about 94 Mg/ha at regions of 8-12 degrees. After the slope exceeds 12 degrees, the average biomass of rubber plantations is maintained at about 96 Mg/ha. The average biomass of rubber plantations varies greatly at different elevations and slopes (Figure 8a,b). The highest average biomass is located at an elevation between 200 and 300 m, which is about 93 Mg/ha, followed by rubber plantations distributed in an elevation range from 300 to 400 m and from 400 to 600 m. The lowest average biomass is located at an elevation between 50 to 100 m, which is about 85 Mg/ha. In areas with a slope below 12 degrees, the average biomass of rubber plantations increases with an increase in the slope, gradually increasing from about 83 Mg/ha within five degrees to about 94 Mg/ha at regions of 8-12 degrees. After the slope exceeds 12 degrees, the average biomass of rubber plantations is maintained at about 96 Mg/ha. The total biomass of rubber plantations on Hainan Island is mainly distributed in low elevation and flat areas (Figure 8 c,d). At an elevation of 100 to 200 m, the total rubber biomass is the highest, exceeding 2.16 × 10 7 Mg, followed by areas of 50 to 100 and 200 to 300 m, with values of about 1.31 × 10 7 Mg and 8.95 × 10 6 Mg, respectively. The total biomass in areas above a 600-m elevation is extremely low. The total biomass of rubber plantations gradually decreases with an increasing slope. In flat areas of less than five degrees, the total biomass is close to 2.28 × 10 7 Mg, followed by areas of 5-8 degrees, for which the total biomass is 1.12 × 10 7 Mg.

Biomass Saturation with Different LS2-Based Variables
Many studies have reported that the saturation point between the biomass of rubber plantations and different variables from remote sensing was very low. In this study, the saturation point of rubber The total biomass of rubber plantations on Hainan Island is mainly distributed in low elevation and flat areas (Figure 8c,d). At an elevation of 100 to 200 m, the total rubber biomass is the highest, exceeding 2.16 × 10 7 Mg, followed by areas of 50 to 100 and 200 to 300 m, with values of about 1.31 × 10 7 Mg and 8.95 × 10 6 Mg, respectively. The total biomass in areas above a 600-m elevation is extremely low. The total biomass of rubber plantations gradually decreases with an increasing slope. In flat areas of less than five degrees, the total biomass is close to 2.28 × 10 7 Mg, followed by areas of 5-8 degrees, for which the total biomass is 1.12 × 10 7 Mg.

Biomass Saturation with Different LS2-Based Variables
Many studies have reported that the saturation point between the biomass of rubber plantations and different variables from remote sensing was very low. In this study, the saturation point of rubber plantations with B3, B4, B5, B7, PC1, PC2, PC3, NDVI, and EVI from LS2-based composite images was about 65 Mg/ha, which was higher than in studies conducted in Cambodia [36] and Thailand [2]. Avtar et al. [36] found that the PALSAR HV band positively correlated with the aboveground biomass of rubber plantations but was saturated at 50 Mg/ha. Yasen and Koedsin [2] found that the aboveground biomass of rubber plantations was saturated with WorldView-2 images at 40 Mg/ha in Phuket Province, Thailand. The main reason for the higher saturation point we monitored was that it contains belowground biomass. If the underground biomass was deducted at a rate of 20% [13], the saturation point of the aboveground biomass was also around 50 Mg/ha, which was consistent with these studies. Since the most LS2-based variables were saturated when the rubber biomass was very low, it was difficult to accurately estimate the biomass of rubber plantations from these variables. For example, only one model we developed using samples less than 55 Mg/ha had an R 2 of 0.81. The R 2 of other models built on samples with a biomass of less than 85 Mg/ha was about 0.65 and decreased to 0.38 when all samples were used for modeling (Figure 5a).

Biomass Estimation Using Stand Age and Remote Sensing Parameters
The regression biomass model fitted by stand age had a high accuracy (R 2 = 0.96 and RMSE = 7.12 Mg/ha), which was consistent with previous studies [8,10,18]. The high accuracy of the model could be explained by the fact that rubber plantations have a shorter economic life cycle (about 25-30 years). The growth of biomass only became slow after more than 20 years [38], so the biomass had a close relationship with stand age. The maximum biomass from this regression model (145.40 Mg/ha) was similar to studies such as [9,65], reporting 63.70 and 65.10 Mg C/ha of carbon stock for rubber plantations with rotation periods of 38 and 25 years, respectively. However, this value was much higher than the value from Trisasongko's model-the maximum biomass of a 30-year-old rubber plantation was about 70 Mg/ha [37]. In addition to stand age, the planting density also affects biomass accumulation greatly [10]. We have incorporated planting density in the regression model, according to 50 years of extensive surveys from Hainan State Farm Bureau.
Obtaining a high-precision stand age map was essential for estimating the biomass of rubber plantations over a large area. Currently, most studies use one or several satellite images to estimate stand age, obtaining a stand age map of different age groups (e.g., <7, 5-15, and >15) [66][67][68]. In this study, the stand age map of rubber plantations in 2017 was generated at annual steps using all LS2 images acquired from 1987 to 2017 and a novelty algorithm developed in a previous study [3]. With this reliable stand age map, the biomass map through the regression model of Hainan Island exhibited high accuracy (R 2 = 0.79 and RMSE = 14.00 Mg/ha), better than models without the stand age-for example, higher than the regression model (R 2 = 0.43) developed in Cambodia [36]; artificial network model (R 2 = 0.66 and RMSE = 11.97 Mg/ha) established in Phuket Province, Thailand [2]; and RF model (R 2 = 0.43 and RMSE = 34.05 Mg/ha) built in Xishuangbanna, Yunnan Province, China [34]. Using an accurate stand age map and models, the wall-to-wall biomass map could be obtained. Theoretically, this method was superior to the traditional BCEF approach since it was difficult to establish a yearly scale conversion factor [14].
In addition to the stand age, signatures from satellite images contributed to the improvement of biomass estimation. However, the model built based on all samples had a large estimation bias especially in plantations which have higher biomass (Figure 4b) because the biomass of rubber plantations was saturated at around 65 Mg/ha (Figure 2). Previous research also found a similar estimation bias. For example, Yasen and Koedsin [2] found that the artificial neural network model was seriously underestimated after the biomass of rubber plantations exceeded about 50 Mg/ha, and Wang et al. [34] found that the stepwise regression model showed severe underestimation after 200 Mg/ha. The estimation bias issue in the RF model could be solved by replacing the prediction results of aging plantations (biomass of over 65 Mg/ha) with corresponding results from the regression model (estimated solely by stand age). R 2 of the hybrid model was improved from 0.79 to 0.83, and the RMSE was decreased from 14.00 to 12.48 Mg/ha (Figure 4b).
RF was widely used and proven to have a higher prediction accuracy than other algorithms because of its many advantages, such as the fact that it is not sensitive to noise or overtraining and requires less computational resources [69]. In this study, RF also showed a high prediction accuracy. For example, the R 2 of RF models developed with samples of biomass less than 85 Mg/ha and LS2-based variables reached about 0.65, although the biomass of rubber plantations was saturated at about 65 Mg/ha. However, as reported in previous studies [64], the RF model was also very sensitive to the sample design. Initially, we randomly divided the samples into training and validation sets, but the estimation accuracy was quite low because sample sizes of different age groups varied greatly. After we adopted the stratified k-fold cross-validation method, the accuracy and stability of the RF models improved greatly since more samples could be used for modeling and cross-validation. In addition, cross-sensor calibration among LS2 images also promoted high accuracy of RF models since the differences in band values among MSI, ETM+, and OLI were significant for all cross-sensor band pairs [50]. In the future, more data sources, such as PALSAR-2 radar data, and more influence factors, such as the altitude and slope, can be included to further improve the prediction accuracy of the models. For example, terrain factors affect not only the light and temperature environment of the forest plantations but also the signal strength of remote sensing, which is used for modeling [22].

Biomass of Rubber Plantations on Hainan Island
Rubber plantations account for more than one fourth of the total forest area on Hainan Island and are the largest forest plantation ecosystem there. Therefore, it is of vital importance to evaluate their biomass at a province scale. The average rubber biomass for the island is 90.53 Mg/ha, which is higher than the 83.25 Mg/ha [33] and 75.90 ± 58.75 Mg/ha [32] values estimated in Xishuangbanna, Yunnan Province, China. The main reason for the higher average biomass is the introduction of stand age during model building, which greatly reduced the signal saturation problem commonly encountered in the remote sensing community. The average rubber biomass was smaller than 108.35 ± 11.48 Mg/ha in 14-year-old plantations at a low elevation but higher than 79.71 ± 14.48 Mg/ha in plantations of the same age at a medium elevation in Xishuangbanna, Yunnan Province, China [12].
Since many rubber plantations were established after 2003, the biomass density in the northern and western areas of Hainan Island was relatively low and was mostly less than 75 Mg/ha ( Figure 6). Danzhou city, which is the largest rubber production base on Hainan Island, is mixed with plenty of young and aging rubber plantations, leading to a higher average biomass (about 80 Mg/ha) than that of Lingao county (about 71 Mg/ha, representing the lowest average biomass), where most rubber plantations were established after 2003 (Figure 7). The highest biomass density along the coastal strip from south to northeast (e.g., the average biomass in the cities of Sanya, Wanning, Qionghai, and Wenchang is about 100 Mg/ha, Figure 7) can be explained by the large number of aging rubber plantations there. It was previously found that farmers were no longer willing to plant rubber trees in these regions, frequently hit by severe typhoons [46,70], even if the natural rubber price was rising rapidly. The expansion trend of rubber plantations in the western and northern areas was very obvious, leading to great changes in the biomass spatial pattern. However, rubber plantations have not expanded to high elevations and steep areas in the past few decades, so the average biomass in areas with a slope higher than 12 degrees has hardly changed, and the total biomass is concentrated in areas 400 m above sea level (Figure 8). Our previous study indicated that a large number of rubber plantations converted from cropland were concentrated in the northwestern part, which has relatively low elevations ( Figure 1) and a large amount of flat cropland [3], causing a very low average biomass in low elevation and flat areas (Figure 8). The total biomass of rubber plantations by the end of 2017 was 5.40 × 10 7 Mg, which is equivalent to 2.70 × 10 7 Mg carbon if 0.5 is taken as the conversion factor [9].

Uncertainties in Biomass Estimation on a Large Scale
There are several factors affecting the estimation accuracy of rubber plantation biomass, including the accuracy of the stand age map, landscape fragmentation, and the tree density.

The Accuracy of the Stand Age Map
Although we used all LS2 images and an algorithm based on changes of land use and the biophysical canopy of rubber plantations to generate the stand age map [3], there are still many uncertainties, especially for plantations established before 2000-a stage of acquiring a limited number of images from the Landsat 5 satellite. In addition, the establishment year of rubber plantations established earlier than 1987 was uniformly set to 1986, leading to an underestimation of the stand age and biomass (estimated by the regression model).

Landscape Fragmentation
Due to the strong interference from human activities, rubber plantation patches were quite small and often mixed with different crop patches. The severely fragmented landscape has brought many difficulties to the accurate mapping rubber plantations and their stand age [3]. This is why we used a single pixel instead of the average value of the 3 × 3 pixel window of the field data for modeling.

Tree Density
Due to being frequently affected by severe typhoons, the tree density of rubber plantations on Hainan Island gradually decreases as the stand age increases [46,70]. However, we could not establish a model of the tree density and stand age because of the lack of abundant field data. Instead, a fixed density of 381.45 trees/ha, according to field data collected by Hainan State Farm Bureau, was used to estimate the biomass of rubber plantations. This tree density was smaller than the 418.99 tree/ha value used in Thailand, which is rarely hit by typhoons [9]. Using a fixed tree density underestimates the biomass of young and middle-aged plantations but overestimates the biomass of aged plantations. However, the biomass of rubber plantations established before 1987 was underestimated due to the underestimation of stand age (uniformly set to 1986), which, to some extent, offsets the overestimated biomass of old rubber plantations.

Conclusions
Biomass is a crucial variable for estimating the carbon storage capacity and the potential carbon pool size of a forest ecosystem. It is of great importance to estimate the biomass of rubber plantations that are widely distributed in tropical regions. Our results show that the biomass of rubber plantations is closely related to stand age and that R 2 of the stand age-based biomass model (regression model) can reach 0.97. When the biomass of plantations is small, they are closely related to the LS2-based variables, especially B3, B7, PC2, and NDVI, and the absolute value of most correlation coefficients is around 0.55. However, the biomass of rubber plantations is saturated with all LS2-based variables at around 65 Mg/ha. Combined with the annual-scale high-precision map of the rubber stand age in 2017 and regression model, the biomass of rubber plantations on Hainan Island was estimated with a satisfactory accuracy (B RG , R 2 = 0.79 and RMSE = 14.00 Mg/ha). However, the hybrid model with joint results of regression and RF models has a higher accuracy and low estimation bias (B RG + RF , R 2 = 0.83 and RMSE = 12.48 Mg/ha). The introduction of stand age into the models significantly solved the problem of biomass saturation, which occurs at about 65 Mg/ha with satellite-based signatures.
The results showed that there were a large number of rubber plantations with a low biomass distributed in the western and northern areas of Hainan Island, usually below 75 Mg/ha, because of rapid expansion since 2003. High-biomass rubber plantations were mainly distributed in the inner region of the coastal strip from south to northeast, which have frequently suffered from severe typhoons in the past few decades. For example, the average biomass of rubber plantations in Wanning city, southeast of Hainan Island, was close to 105 Mg/ha, ranking top among all cities and counties. Danzhou city, representing the largest rubber production base on Hainan Island, possessed the largest total biomass of 9.77 × 10 6 Mg, followed by Baisha and Qiongzhong counties. The total biomass of rubber plantations on Hainan Island by the end of 2017 was 5.40 × 10 7 Mg, indicating a great carbon pool there. The maps of stand age and biomass are of great value for regional carbon budget evaluation and can assist in the development of sustainable management policies to maximize the potential benefits of rubber tree resources.