Pixel-Based Mapping of Rubber Plantation Age at Annual Resolution Using Supervised Learning for Forest Inventory and Monitoring

Wongsai, Sangdao; Sanpayao, Manatsawee; Jirakajohnkool, Supet; Wongsai, Noppachai

doi:10.3390/f16040672



¹ Department of Mathematics and Statistics, Faculty of Science and Technology, Thammasat University, Pathum Thani 12121, Thailand

² Thammasat University Research Unit in Data Learning, Thammasat University, Pathum Thani 12121, Thailand

³ College of Arts, Media and Technology, Chiang Mai University, Chiang Mai 50200, Thailand

⁴ Department of Sustainable Development Technology, Faculty of Science and Technology, Thammasat University, Pathum Thani 12121, Thailand

* Author to whom correspondence should be addressed.

Forests 2025, 16(4), 672; https://doi.org/10.3390/f16040672

Submission received: 12 March 2025 / Revised: 3 April 2025 / Accepted: 10 April 2025 / Published: 11 April 2025

(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)


Abstract

Accurate mapping of rubber plantation stand age is essential for forest inventory, land use monitoring, and carbon stock estimation. This study proposes a pixel-based approach that integrates the Bare Soil Index (BSI) with Normalized Difference Vegetation Index (NDVI) time series to detect land clearance events and predict stand age. The methodology involves feature engineering, feature selection, and the evaluation of three tree-based and one non-parametric supervised machine learning models. Predictive features were extracted from interannual spectral index profiles, with an optimal subset selected using Recursive Feature Elimination (RFE). The best-performing model, optimized using a grid search matrix, was trained and applied to stacked images for pixel-level land clearance prediction over 37 years of NDVI and BSI time series. By aggregating predictions and performing post-classification analysis, a spatially explicit stand-age map was generated. The result was validated using secondary rubber farmer registration data, achieving an overall prediction accuracy of 84.5% and a root mean squared error (RMSE) of 1.86 years. The findings highlight the effectiveness of machine learning with NDVI and BSI time series for stand-age estimation, advancing remote sensing methodologies for forest inventory and supporting future high-precision carbon stock assessments.

Keywords:

interannual spectral profile; decision tree; recursive partitioning; machine learning; Landsat time series; bare soil exposure

1. Introduction

Rubber (Hevea brasiliensis) plantations have become a vital part of agricultural economies worldwide, especially in tropical regions such as Southeast Asia, Africa, and South America. Rubber plantations have rapidly extended beyond their traditional growing areas, now covering regions such as southern China, eastern Myanmar, northern Laos, northeast Thailand, and Cambodia [1]. The growing demand for natural rubber has led to the large-scale expansion of rubber plantations, contributing to economic growth while raising concerns about deforestation, biodiversity loss, and carbon sequestration [2,3].

Estimating the age of rubber plantations is essential for monitoring their productivity, carbon sequestration potential, and land use dynamics. Remote sensing has proven to be an effective tool for this purpose, offering consistent, large-scale, and temporally accurate data for mapping rubber tree growth and stand age [4,5,6,7]. Furthermore, carbon stock dynamics in rubber plantations vary along environmental gradients, affecting their role in carbon sequestration and emissions [8]. Previous studies have employed various remote sensing-based approaches, including time-series analysis of vegetation indices, phenological metrics, and machine learning models to classify and estimate the age of rubber plantations [1,9,10,11].

Advancements in Geographic Object-Based Image Analysis (GEOBIA) have further improved the accuracy of remote sensing techniques in rubber plantation monitoring. By integrating pixel- and object-based image analysis with Landsat time series data, researchers have successfully mitigated spectral noise and enhanced the accuracy of rubber stand-age estimation [12,13,14]. Studies have demonstrated that rubber trees exhibit distinct foliation cycles, allowing researchers to distinguish their growth patterns from other vegetation types through remote sensing data [1,15]. Additionally, Sentinel-2 remote sensing imagery has been effectively utilized for biomass estimation in rubber plantations, optimizing variable selection to improve accuracy in assessing carbon storage [16]. The integration of textural features in remote sensing data has significantly improved the extraction and classification of rubber plantations, enhancing mapping accuracy [17].

Our previous study [18] developed a method for estimating rubber plantation establishment years using dense Landsat time series in a tropical monsoon region. The approach employed a tree-based machine learning algorithm trained on NDVI values extracted from multi-temporal Level-2 Landsat Surface Reflectance imagery. Specifically, the Recursive Partitioning (RP) algorithm was used to predict plantation establishment years, achieving an overall accuracy of 84% and a root mean squared error (RMSE) of 0.83 years. This method effectively addressed the limitations related to cloud cover and localized phenological variation, providing a robust framework for rubber age mapping [4,5,6,7].

A key limitation of our previous approach was the requirement to manually delineate individual rubber plantation polygons during data preprocessing. This vector-based method relied on prior identification of plantation boundaries for the entire study area, which involved time-consuming, labor-intensive steps, particularly at large or regional scales. Moreover, predictions were constrained to these manually digitized polygons rather than being applied directly at the pixel level, which limited both the scalability and spatial flexibility of the method.

To overcome these limitations, the present study aims to refine and automate the methodology for large-scale applications, contributing to the development of a high-resolution national spatial database of rubber plantations. Our approach introduces a pixel-level method for detecting land clearance and estimating plantation age. By integrating interannual NDVI and BSI time series with supervised learning techniques, this method enables scalable, annual-resolution age mapping without the need for manually digitized inputs. This advancement significantly enhances the efficiency, spatial detail, and practical applicability of rubber plantation monitoring across broad geographic areas.

Such a spatial database is crucial for supporting government decision-making and strategic policies related to sustainable agriculture, forest management, and economic planning. By integrating forest inventory techniques with quantitative remote sensing methods, this research contributes to the broader goals of balancing agricultural productivity, conserving natural resources, and promoting sustainable bioeconomy initiatives. The findings will also support the implementation of the Bio–Circular–Green (BCG) economic model, promoting resource efficiency, renewable energy development, and long-term environmental sustainability, particularly in the context of spatial carbon sequestration [16,19,20,21,22].

2. Materials and Methods

2.1. Study Area

The study area covers the mainland rubber cultivation region in Surat Thani province, southern Thailand, as illustrated in Figure 1. Located on the eastern side of Thailand’s southern peninsula, Surat Thani is approximately 651 km south of Bangkok. According to 2021 data from the Office of Agricultural Economics (OAE), Ministry of Agriculture and Cooperatives of Thailand, about 58% of Thailand’s rubber cultivation area (~23,300 km²) is concentrated in the southern region, with approximately 3800 km² in Surat Thani. Surat Thani ranks first in the country, contributing 16.30% of southern Thailand’s rubber growing area, equivalent to 9.65% of the national total. This study, however, mapped only those rubber cultivation areas within the extent of overlap in Landsat imagery, as shown in Figure 1.

Thailand’s climate is divided into three seasons: rainy, winter, and summer. Two monsoon seasons—the southwest and the northeast—affect the southern region, including Surat Thani. The southwest monsoon brings warm, moist air from the Indian Ocean, resulting in abundant rainfall from mid-May to mid-October on the western side of the peninsula, while the eastern side receives less precipitation due to the high mountain ranges that bisect the peninsula. The northeast monsoon occurs in winter (mid-October to mid-February), carrying cold, dry air from the Asian continent, which cools the southern peninsula and induces dry-season aridity. However, moisture-laden winds crossing the Gulf of Thailand bring ongoing rainfall to the eastern coast, especially from November to mid-December. Consequently, Surat Thani experiences year-round rainfall, with higher rainfall during the northeast monsoon season than during the southwest monsoon season.

2.2. Data

2.2.1. Landsat Time Series

A time series of Landsat Collection 2 (C2) Tier 1 Level-2 (L2) Surface Reflectance (SR) images covering the study area (spanning four scenes: paths 129–130 and rows 053–054) was utilized in this study. The image collections were acquired through the Google Earth Engine (GEE) platform. A total of 3406 Landsat L2 SR images were obtained from November 1987 to October 2024, comprising 1276 Landsat 5 (L5) Thematic Mapper (TM) images, 1189 Landsat 7 (L7) Enhanced Thematic Mapper Plus (ETM+) images, and 941 Landsat 8 (L8) Operational Land Imager (OLI) images. The Landsat 7 collection includes both fully intact and Scan Line Corrector (SLC)-off images, with 184 complete and 1005 SLC-off images.

All available images over the 37-year study period were used, not only those from the dry season, when cloud cover is typically reduced and rubber plantations are generally established. This inclusive approach was necessary because Surat Thani receives rainfall year-round, with only a brief period (January–March) of reduced cloud cover and precipitation.

2.2.2. Land Use and Land Cover (LULC)

The most recently updated LULC data for 2018, obtained from the Land Development Department (LDD), were used to mask rubber cultivation areas in the study region. The LULC data were classified into five primary land cover classes, each with three hierarchical levels. Only polygons labeled as rubber agricultural land (level-3 LULC code “A302”) were utilized for this study. According to LDD’s data creation process, the vector LULC data were digitized meticulously from high-resolution aerial and satellite images to delineate each land use type.

Since rubber cultivation constitutes most of the agricultural land use in the study area, an initial “A302” polygon was created, followed by the removal of sections over non-rubber land areas. Due to the predominance of smallholder farmers in Thailand’s rubber industry, “A302” polygons were typically large, multipart geometries encompassing numerous small to medium-sized, adjacent plantations [2,23]. Approximately 4.2% of rubber cultivation areas were represented as single-part polygons, indicating individually owned plantations that correspond to land title parcels.

2.2.3. Secondary Ground Reference Data

The registration database of rubber farmers in Surat Thani province, provided by the Rubber Authority of Thailand (RAOT), was utilized for region of interest (ROI) referencing and accuracy assessment. This spatial database contains essential information pertinent to this study, including the year of rubber plantation establishment, the georeferenced shape of each plantation based on land title shapefiles obtained from the Department of Land, and the coordinates of the plantation center. Initially, there were 55,886 plantation registry records available up to July 2023, with establishment years ranging from 1978 to 2022. For the purpose of output classification accuracy assessment, however, only 53,789 records of plantations established from 1988 onwards were used, since the obtained Landsat SR images start from November 1987.

2.3. Mapping Stand Age

The primary objective in predicting the age of rubber plantations is to determine the timing of land clearance (T₀) prior to the initiation of crop planting, as observed in the time series of spatial index imagery. To achieve this, four machine learning (ML) algorithms were trained and evaluated, and the best-performing model was used to identify land clearance areas for each year from 1988 to 2024. The output of this predictive modeling is an ML model that uses a set of highly informative independent variables (predictors) to accurately predict the target variable (outcome). The procedure for mapping the annual stand age of rubber trees comprises three main stages: (i) data preprocessing, (ii) machine learning modeling and prediction, and (iii) accuracy assessment. Figure 2 illustrates this workflow.

2.3.1. Landsat Images Pre-Processing and Interannual Spatial Indices Time Series

The change in land cover within plantations, transitioning from dense vegetation to exposed soil, serves as the central principle of our proposed method for identifying the timing of plantation establishment using Landsat time series imagery. A sudden decline in vegetation cover can be observed in the NDVI time series [18]. However, NDVI values often reach saturation during certain phenological stages and are susceptible to atmospheric interference and background reflectance [24], particularly in low-vegetation areas dominated by soil. Rubber tree defoliation typically occurs during the dry season, coinciding with land-clearing activities. This overlap can lead to inconsistencies and fluctuations in NDVI values, making the indication of land clearance ambiguous, even in the interannual profile. Consequently, we opted to use NDVI with the assistance of Bare Soil Index (BSI) in this study. The BSI can be calculated using the following equation [25]:

BSI = \frac{(R_{SWIR 2} + R_{red}) - (R_{NIR} + R_{blue})}{(R_{SWIR 2} + R_{red}) + (R_{NIR} + R_{blue})},

(1)

where R denotes the surface reflectance of the SWIR2, NIR, red, and blue spectral bands. The BSI ranges from −1 to 1, with higher values indicating greater bare-soil exposure. A positive BSI value generally indicates bare land, while land with sparse or dense vegetation cover has a negative BSI value [25]. This characteristic makes BSI more suitable than NDVI alone for identifying the exposed areas associated with land preparation for rubber plantation establishment.
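To make Eq. (1) concrete, the following NumPy sketch computes BSI alongside NDVI, assuming surface reflectance has already been converted to unitless values (roughly 0–1). It is an illustration only, not the authors' GEE code; the function names and the sample reflectances in the usage note are hypothetical.

```python
import numpy as np

def bsi(swir2, red, nir, blue):
    """Bare Soil Index, Eq. (1): ((SWIR2 + red) - (NIR + blue)) / ((SWIR2 + red) + (NIR + blue))."""
    swir2, red, nir, blue = (np.asarray(b, dtype=float) for b in (swir2, red, nir, blue))
    num = (swir2 + red) - (nir + blue)
    den = (swir2 + red) + (nir + blue)
    with np.errstate(divide="ignore", invalid="ignore"):
        # A zero denominator (e.g., all-zero fill pixels) yields NaN instead of inf
        return np.where(den != 0, num / den, np.nan)

def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - red) / (NIR + red)."""
    nir, red = np.asarray(nir, dtype=float), np.asarray(red, dtype=float)
    den = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den != 0, (nir - red) / den, np.nan)
```

For a hypothetical dense canopy (SWIR2 = 0.06, red = 0.03, NIR = 0.35, blue = 0.02), BSI is about −0.61 and NDVI about 0.84; for hypothetical bare soil (0.25, 0.15, 0.22, 0.08), BSI is about 0.14 and NDVI about 0.19, reproducing the opposite behavior of the two indices. Both functions work element-wise on whole band arrays.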

To generate the interannual NDVI and BSI time series, Landsat SR images were acquired and preprocessed using the GEE platform. All available SR images from Landsat sensors (L5, L7, and L8) covering each period from November to October of the following year (marking the end of the monsoon season) were first cloud-masked using the QA_PIXEL bitmask information and clipped to the boundary of Surat Thani province. NDVI and BSI images were then generated for each multispectral image acquired within a given year. Subsequently, five composite images (maximum, Q3, median, Q1, and minimum) were created for each spectral index from all of that year's index images. This preprocessing workflow was repeated to produce 37 years of interannual images for each of the five NDVI and BSI distribution summary values.
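The study performed this step in GEE; as a platform-independent illustration only, the NumPy sketch below applies a QA_PIXEL bitmask to a stack of per-image index values from one year and reduces it to the five distribution-summary composites. Which QA bits the authors masked is not stated, so masking fill (bit 0), dilated cloud (bit 1), cloud (bit 3), and cloud shadow (bit 4) is an assumption, as is the hypothetical helper that assigns November–December scenes to the following year's period.

```python
import warnings
import numpy as np

# Assumed Landsat Collection 2 QA_PIXEL bits to mask: 0 fill, 1 dilated cloud, 3 cloud, 4 cloud shadow
MASK_BITS = (0, 1, 3, 4)

def clear_mask(qa_pixel):
    """True where none of the assumed contamination bits is set."""
    qa = np.asarray(qa_pixel, dtype=np.uint16)
    flags = 0
    for bit in MASK_BITS:
        flags |= 1 << bit
    return (qa & flags) == 0

def annual_composites(index_stack, qa_stack):
    """Reduce an (n_images, rows, cols) index stack for one year to five summary composites."""
    idx = np.where(clear_mask(qa_stack), np.asarray(index_stack, dtype=float), np.nan)
    with warnings.catch_warnings():
        # Pixels masked in every image simply remain NaN
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return {
            "max": np.nanmax(idx, axis=0),
            "q3": np.nanpercentile(idx, 75, axis=0),
            "median": np.nanmedian(idx, axis=0),
            "q1": np.nanpercentile(idx, 25, axis=0),
            "min": np.nanmin(idx, axis=0),
        }

def period_year(year, month):
    """Hypothetical labelling: a November-October period is named by the year in which it ends."""
    return year + 1 if month >= 11 else year
```

Using percentiles rather than a single best-pixel mosaic means that one residual cloudy observation shifts the maximum or minimum composite but has much less effect on Q1, the median, or Q3.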

2.3.2. On-Screen Reference Sites and Year of Rubber Planting Identification

The regions of interest (ROIs) used for training and validation were sampled using QGIS software (version 3.34.14-Prizren). Grids of 10 × 10 km cells were established across the study area to serve as a spatial sampling framework. Rubber cultivation areas were first identified using the “A302” code polygon from the national LULC dataset. Historical high-resolution (HHR) aerial imagery from Google Earth was then used to assist in the visual selection of ROIs. Within each systematically generated 10 × 10 km grid cell, at least four ROIs were selected.

To ensure spatial and temporal representativeness, we used the RAOT rubber farmer registration database to identify and filter medium-sized plantations (5–15 hectares). These were cross-referenced with recent HHR imagery to confirm plantation boundaries. Rather than delineating plantations entirely from scratch, we re-digitized the boundaries based on existing vector data from the registration database and land use shapefiles. This step was carried out to ensure that the reference data accurately captured homogeneous land clearance footprints, such as exposed bare soil, as observed in historical imagery. The year of plantation establishment for each ROI was recorded from the RAOT database and cross-validated with the timestamp of the corresponding HHR image.

Each ROI was further classified by topography as either plainland (PL) or hillside (HS). It is important to emphasize that this manual re-digitization process was limited to a small set of ROIs used exclusively for training and validation. This on-screen referencing approach, which uses HHR aerial images from Google Earth for ROI interpretation and digitization, has been widely implemented in various studies [9,18,26] and has proven practical and cost-effective because it reduces the need for field surveys.

We aimed to reference plantations with their land clearance year (T₀) for every sample year from 1988 to 2024. However, the availability of images on the Google Earth platform posed a challenge: few images were available before 2000, and coverage of the study area was sometimes incomplete. Consequently, young rubber plantations (<6 years old) were also selected where they could be clearly distinguished from adjacent mature plantations by well-delineated boundaries. These ROIs were initially left with unidentified T₀ values at this step, to be determined through subsequent time series analysis.

A total of 447 ROIs were initially sampled for this study, comprising 387 (86.6%) ROIs in PL and 60 (13.4%) ROIs in HS areas, as shown in Figure 1. Each ROI polygon was buffered inward by 30 m to exclude mixed-signal pixels at plantation borders. Within each ROI, five points of interest (POIs) were randomly generated, rather than a single centroid point, to increase the sample size. A 30 m offset was applied to prevent POIs from falling within the same Landsat pixel. All POIs were used as sampling points to extract NDVI and BSI values from the time series of the five distribution summary NDVI and BSI images, enabling the generation of interannual NDVI and BSI profiles. Finally, these profiles were plotted to observe the patterns of spectral index dynamics corresponding to land clearance occurrences across different areas and time periods (years), and to identify the T₀ year.
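POI sampling can be sketched in plain Python with NumPy, assuming ROI polygons are given as vertex lists in a projected CRS in metres. A point lies inside the 30 m inward buffer exactly when it is inside the polygon and at least 30 m from its boundary, so the sketch uses that test instead of constructing the buffer geometry. Because two points 30 m apart can still share a 30 m pixel, it also rejects candidates whose pixel cell (on a hypothetical grid anchored at the origin) is already occupied. All function names and defaults are hypothetical, not the QGIS tools the authors used.

```python
import numpy as np

def point_in_polygon(x, y, poly):
    """Ray-casting test; poly is a list of (x, y) vertices in metres."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def distance_to_boundary(x, y, poly):
    """Shortest distance from (x, y) to any polygon edge."""
    best = np.inf
    n = len(poly)
    for i in range(n):
        ax, ay = poly[i]
        bx, by = poly[(i + 1) % n]
        dx, dy = bx - ax, by - ay
        seg_len2 = dx * dx + dy * dy
        # Projection parameter of the point onto the segment, clamped to [0, 1]
        t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((x - ax) * dx + (y - ay) * dy) / seg_len2))
        best = min(best, np.hypot(x - (ax + t * dx), y - (ay + t * dy)))
    return best

def sample_pois(poly, n_points=5, inward_buffer=30.0, min_spacing=30.0,
                pixel=30.0, seed=0, max_tries=10_000):
    """Rejection-sample POIs inside the inward buffer, spaced apart and in distinct pixels."""
    rng = np.random.default_rng(seed)
    xs, ys = zip(*poly)
    pois, cells = [], set()
    for _ in range(max_tries):
        if len(pois) == n_points:
            break
        x, y = rng.uniform(min(xs), max(xs)), rng.uniform(min(ys), max(ys))
        if not point_in_polygon(x, y, poly) or distance_to_boundary(x, y, poly) < inward_buffer:
            continue
        cell = (int(x // pixel), int(y // pixel))
        if cell in cells or any(np.hypot(x - px, y - py) < min_spacing for px, py in pois):
            continue
        pois.append((x, y))
        cells.add(cell)
    return pois
```

For very small or narrow ROIs the function may return fewer than five points once `max_tries` is exhausted, which is a simple signal that the plantation is too small for the buffer.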

2.3.3. Features Generation

Before initiating the supervised model training, a preliminary time series analysis was conducted to investigate patterns of land clearance events and the subsequent growth phases of rubber trees within the 37-year interannual NDVI and BSI profiles. The NDVI and BSI values were plotted as interannual time series to visualize data patterns, validate the year of rubber plantation establishment for ROIs with labeled T₀, and observe significant changes in NDVI and BSI values to identify T₀ for ROIs with initially unknown T₀.

The modeling dataset in this study was structured in the time domain, where each POI contributes a multi-year temporal profile rather than serving as a single spatial observation. For each POI, annual NDVI and BSI values were extracted from 1988 to 2024, forming a 37-year time series. Each year within this period was treated as a separate record, resulting in 37 records per POI.

To capture key characteristics of bare soil exposure and vegetation regrowth in rubber plantations, we developed a set of features based on NDVI and BSI profiles. BSI values are highly negative under dense vegetation, increase with decreasing vegetation cover, and may exceed zero in areas of exposed bare soil [25]. In PL areas, land preparation typically results in positive BSI values, while in HS areas, BSI remains slightly negative due to soil conservation practices such as intercropping or ground cover maintenance [18]. NDVI shows an inverse pattern, ranging from 0 to 1, and typically drops sharply during land clearance and increases gradually during regrowth.

We analyzed the interannual distribution of BSI and NDVI values around land clearance events using a 9-year window—two years before (T₋₂, T₋₁) and six years after (T₊₁ to T₊₆) the identified clearance year (T₀). This window was based on our previous study [18], which observed that NDVI values typically decline from ~0.8 to ~0.2 at T₀ and gradually return to ~0.8 within six years post-clearance, capturing the key dynamics of rubber plantation development.

Subsequently, predictive features were derived from the five key distribution metrics (maximum, Q3, median, Q1, and minimum) for both NDVI and BSI within this 9-year window. Additional features included differences and ratios between the value for a given year and the values of the other eight years in the window. The modeling dataset therefore consists of a nominal binary outcome variable and continuous features. Each POI was represented by 37 records, corresponding to its 37-year interannual bare soil and vegetation signature profile. The binary outcome is labeled “1: Yes” if land clearing occurred in that year and “0: No” otherwise, typically resulting in 36 “0: No” outcomes and only one “1: Yes” outcome per POI. Some POIs, however, may exhibit two land clearance events (double T₀): because the average economic life span of rubber trees is around 20 years, replanting can occur within the 37-year study period. As a result, the dataset was highly imbalanced. Moreover, feature values for the first two and last six records of each POI were incomplete because the NDVI and BSI values required to compute lag and lead values across the 9-year window were missing.
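The lag/lead feature construction can be sketched as follows for one POI and one index/statistic series. The difference names follow the convention used in Section 3.1 (e.g., diff.BSI.Q3.t0tn1 is the value at T₀ minus the value one year earlier); the ratio names and the exact feature set are assumptions, since the full feature list is not given here. NaN appears wherever a lag or lead falls outside 1988–2024, which reproduces the incomplete first two and last six records described above.

```python
import numpy as np

YEARS = np.arange(1988, 2025)          # 37 interannual records per POI
OFFSETS = (-2, -1, 1, 2, 3, 4, 5, 6)    # the other eight years of the 9-year window

def shifted(x, k):
    """Value at year t+k aligned to year t; NaN where t+k falls outside the series."""
    out = np.full(len(x), np.nan)
    if k > 0:
        out[:-k] = x[k:]
    elif k < 0:
        out[-k:] = x[:k]
    else:
        out[:] = x
    return out

def window_features(x, index="BSI", stat="Q3"):
    """Differences and ratios between each year and the other years of its window."""
    x = np.asarray(x, dtype=float)
    feats = {f"{index}.{stat}": x}
    for k in OFFSETS:
        tag = f"t0tn{-k}" if k < 0 else f"t0tp{k}"
        other = shifted(x, k)
        feats[f"diff.{index}.{stat}.{tag}"] = x - other
        with np.errstate(divide="ignore", invalid="ignore"):
            feats[f"ratio.{index}.{stat}.{tag}"] = x / other
    return feats

def clearance_labels(t0_years):
    """Binary outcome per year: 1 for each T0 year (one or two per POI), else 0."""
    return np.isin(YEARS, list(t0_years)).astype(int)
```

Applying `window_features` to each of the ten NDVI/BSI summary series of a POI and stacking the results column-wise gives that POI's 37-row block of the modeling dataset.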

2.3.4. Features Selection

During the feature generation process, over 200 predictors were created. To reduce computational costs during model training, a two-step dimensionality reduction was applied to the modeling dataset. First, simple filter methods were employed: Pearson’s correlation was used to measure the linear relationship between each pair of features, and the Chi-Square test was used to assess the independence between each feature and the categorical target variable. Where a pair of features had an absolute correlation coefficient exceeding 0.8, one feature of the pair was removed. Additionally, features with a p-value greater than 0.05 in the Chi-Square test were excluded. This first step not only reduced the dimensionality of the modeling dataset but also addressed multicollinearity among the predictive features. Second, the Recursive Feature Elimination (RFE) method was used to identify the most informative features for the subsequent model-training step. This wrapper method iteratively eliminates features based on their importance or coefficients and rebuilds models with the remaining features, and it can be used with a wide range of model types. While computationally more intensive than filter methods, RFE can provide superior results by accounting for feature interactions. RFE was implemented with four models using the rfe() function from the caret R package (version 7.0-1), with 10-fold cross-validation as the control function. From each estimator, the top 10 most important features were selected, and the final set of informative features was derived from the union of these selected features.
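Below is a sketch of the correlation filter and of the union of per-model top-10 features, under stated assumptions. The text does not say which feature of a correlated pair was dropped, so this sketch greedily keeps the earlier column; the Chi-Square step is omitted because the binning of continuous features is not described, and RFE itself (run with caret's rfe() in R) is not reproduced. The input matrix must already be free of missing values, and the function names are hypothetical.

```python
import numpy as np

def drop_correlated(X, names, threshold=0.8):
    """Greedy pass: keep a feature only if |r| <= threshold with every feature already kept.
    X must contain no NaN (rows with missing values removed beforehand)."""
    X = np.asarray(X, dtype=float)
    r = np.corrcoef(X, rowvar=False)
    kept = []
    for j in range(X.shape[1]):
        if all(abs(r[j, k]) <= threshold for k in kept):
            kept.append(j)
    return [names[j] for j in kept]

def union_of_top_k(importances, k=10):
    """importances: {model_name: {feature: score}} -> sorted union of each model's top-k features."""
    selected = set()
    for scores in importances.values():
        selected.update(sorted(scores, key=scores.get, reverse=True)[:k])
    return sorted(selected)
```

Taking the union rather than the intersection keeps any feature that at least one of the four models found useful, at the cost of a somewhat larger final feature set.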

2.3.5. Model Training and Evaluation

This study employed three tree-based supervised machine learning models—Recursive Partitioning (RP), Random Forests (RF), and Extreme Gradient Boosting (XGBoost)—along with one nonparametric model, the Support Vector Machine (SVM).

Four classifiers were trained using the identified set of significant features. To evaluate and determine the optimal machine learning model for predicting land clearance areas, a grid search procedure was employed for hyperparameter optimization in conjunction with cross-validation. To address the issue of class imbalance in the modeling dataset, a sampling strategy was implemented. Specifically, for each POI, rows corresponding to a “1:Yes” outcome, along with four preceding and four succeeding rows, were selected. This approach reduced the dataset from 37 rows to nine rows per POI while preserving the seasonal characteristics of bare soil and vegetation profiles around the T₀ year. Furthermore, rows containing missing values were excluded, as the RF and SVM algorithms are incapable of processing datasets with missing values without prior imputation. However, in this study, missing values were not imputed with the feature median, as the missing data pattern was determined to be non-random [18]. The records with missing values were found in a structured pattern when T₀ occurred in the years 1988, 1989, and from 2019 to 2024.
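The row-sampling strategy for one POI might look like this, assuming labels and features are aligned per year (37 rows). It keeps each “1: Yes” row with up to four rows before and after it, then drops incomplete rows rather than imputing them. The function name is hypothetical.

```python
import numpy as np

def sample_around_t0(labels, features, before=4, after=4):
    """Row indices of each '1: Yes' record plus its neighbours, with incomplete rows removed."""
    labels = np.asarray(labels)
    features = np.asarray(features, dtype=float)
    keep = set()
    for t in np.flatnonzero(labels == 1):
        # Clip the window at the ends of the 37-year series
        lo, hi = max(0, t - before), min(len(labels), t + after + 1)
        keep.update(range(lo, hi))
    rows = np.array(sorted(keep), dtype=int)
    complete = ~np.isnan(features[rows]).any(axis=1)
    return rows[complete]
```

A POI with a single T₀ in mid-series yields nine rows; a double-T₀ POI yields up to eighteen; and a T₀ near 1988–1989 or 2019–2024 yields fewer, because its window is clipped and its lag/lead features contain NaN.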

The sampled dataset was stratified by T₀ year and outcome class, then split into training and testing subsets using an 80:20 ratio. This approach ensured proportional representation of land clearance events across the spectral index time series, minimized bias due to unequal class distribution, and preserved the sequential structure of the data in both the training and testing sets. Subsequently, the hyperparameter grid for each classifier was defined, and a grid search was conducted using 10-fold cross-validation to identify the optimal set of hyperparameters.
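A stratified 80:20 split keyed on (T₀ year, outcome) can be sketched without external libraries, as below. The seed, the per-stratum rounding rule, and the function name are assumptions; scikit-learn's `train_test_split` with its `stratify` argument would serve the same purpose.

```python
import random
from collections import defaultdict

def stratified_split(strata, test_frac=0.2, seed=42):
    """strata: one hashable key per row, e.g. (t0_year, outcome). Returns (train_idx, test_idx)."""
    groups = defaultdict(list)
    for i, key in enumerate(strata):
        groups[key].append(i)
    rng = random.Random(seed)
    train, test = [], []
    for key in sorted(groups):
        idx = groups[key][:]
        rng.shuffle(idx)
        # Each stratum contributes ~test_frac of its rows to the test set
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)
```

Because every (T₀ year, outcome) stratum is split separately, rare clearance years near the ends of the record still appear in both subsets whenever they have at least a few rows.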

Following hyperparameter optimization, the performance of the best model from each classifier was evaluated on the testing dataset to find the most suitable predictive model for identifying land clearance areas in the spectral index time series imagery. Although the dataset was sampled by rows to mitigate the effects of class imbalance, it remained inherently imbalanced. Consequently, relying on traditional accuracy metrics could produce misleading results. To address this, the F1-Score, Matthews Correlation Coefficient (MCC), and Cohen’s Kappa Coefficient (K) were employed as evaluation metrics. The F1-Score, representing the harmonic mean of precision and recall, is defined as follows:

F1 = \frac{2 \times TP}{2 \times TP + FP + FN},

(2)

where TP represents true positives, FP false positives, TN true negatives, and FN false negatives, as derived from the 2 × 2 confusion matrix.

The Matthews Correlation Coefficient (MCC) is a robust metric that remains balanced even in scenarios with highly imbalanced class distributions. It offers significant advantages over other metrics, such as the F1-score and accuracy, particularly when evaluating binary classifications in imbalanced datasets [27]. It is defined as follows:

MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP) (TP + FN) (TN + FP) (TN + FN)}} .

(3)

The kappa coefficient accounts for agreement due to chance, making it a more suitable metric for evaluating binary classifiers with imbalanced classes, as it is symmetric and evaluates both directions of predictability [28]. Cohen’s kappa for binary classification is calculated as follows [29]:

K = \frac{2 \times (TP \times TN - FP \times FN)}{(TP + FP) \times (FP + TN) + (TP + FN) \times (FN + TN)} .

(4)
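Eqs. (2)–(4) can be checked with a short function on confusion-matrix counts. Returning 0 for MCC when its denominator is zero is an assumption (a common convention); the function name is hypothetical.

```python
import math

def binary_metrics(tp, fp, fn, tn):
    """F1 (Eq. 2), MCC (Eq. 3), and Cohen's kappa (Eq. 4) from a 2 x 2 confusion matrix."""
    f1 = 2 * tp / (2 * tp + fp + fn)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    kappa = 2 * (tp * tn - fp * fn) / ((tp + fp) * (fp + tn) + (tp + fn) * (fn + tn))
    return {"F1": f1, "MCC": mcc, "Kappa": kappa}
```

For example, with hypothetical counts TP = 40, FP = 10, FN = 5, and TN = 145, this gives F1 ≈ 0.842 (the harmonic mean of precision 0.80 and recall 0.89), MCC ≈ 0.795, and K ≈ 0.793. The kappa agrees with the general form (p_o − p_e)/(1 − p_e) = (0.925 − 0.6375)/(1 − 0.6375).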

2.3.6. Prediction of Land Clearance Area and Estimating Age of Rubber Plantation

After the best model was obtained, we generated stacked images of the chosen informative features resulting from the feature selection process for each year. At the pixel level, the trained model was applied to identify land clearance in the stacked image, with output pixels coded as “1” for detected clearance and “0” or “NA” for no clearance. In the final annual output images, each “1” pixel was labeled with the corresponding year, and the images were sequentially stacked by year. The timing of land clearance for each pixel was determined based on the maximum pixel value in the annual series, representing the latest occurrence.
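The per-pixel stacking and “latest year” rule can be sketched with NumPy, assuming each annual prediction is a 2-D array with 1 for clearance and 0 or NaN otherwise. The function name is hypothetical.

```python
import numpy as np

def latest_clearance_year(annual_predictions, years):
    """annual_predictions: (n_years, rows, cols); 1 = clearance, 0 or NaN = none.
    Returns a (rows, cols) array holding the latest clearance year, NaN where never cleared."""
    pred = np.asarray(annual_predictions, dtype=float)
    years = np.asarray(years, dtype=float).reshape(-1, 1, 1)
    labelled = np.where(pred == 1, years, np.nan)   # each '1' pixel takes its year label
    cleared = ~np.all(np.isnan(labelled), axis=0)
    out = np.full(pred.shape[1:], np.nan)
    # Maximum over the year axis = most recent clearance (e.g., the replanting T0)
    out[cleared] = np.nanmax(labelled[:, cleared], axis=0)
    return out
```

Taking the maximum means that a pixel cleared in both 1995 and 2014 is assigned 2014, so the map reflects the age of the current stand rather than the original establishment.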

Next, a post-classification smoothing step was applied using a 9 × 9-pixel majority filter, in which the year label for each pixel was replaced by the most frequently occurring year within the window, provided that this year occurred more than five times. This criterion helps to preserve small plantation areas and refine the boundaries of heterogeneous plantation regions. The resulting output was a smoothed image displaying the year of land clearance for each pixel. Finally, we overlaid the rubber cultivation polygons from the LULC (2018) data to mask the output image, producing a stand-age map for rubber plantations.
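A straightforward (unoptimized) sketch of the majority filter follows. Two details are assumptions because the text does not specify them: pixels whose window mode occurs five times or fewer keep their original value, and ties are broken toward the earliest year (NumPy's `argmax` returns the first maximum). NaN marks pixels without a detected clearance and is excluded from the vote, so in this sketch such pixels can be filled when one year dominates their window. The function name is hypothetical.

```python
import numpy as np

def majority_filter(years, size=9, min_count=6):
    """Replace each pixel with the most frequent year in a size x size window when that
    year occurs at least `min_count` times (i.e., more than five); otherwise keep it."""
    years = np.asarray(years, dtype=float)
    half = size // 2
    padded = np.pad(years, half, mode="constant", constant_values=np.nan)
    out = years.copy()
    rows, cols = years.shape
    for i in range(rows):
        for j in range(cols):
            window = padded[i:i + size, j:j + size]
            vals = window[~np.isnan(window)]
            if vals.size == 0:
                continue
            uniq, counts = np.unique(vals, return_counts=True)
            best = np.argmax(counts)
            if counts[best] >= min_count:
                out[i, j] = uniq[best]
    return out
```

The double loop is adequate for illustration; a production run over a province-sized raster would normally use a windowed or tiled implementation.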

2.4. Accuracy Assessment

To evaluate the accuracy of stand-age predictions, we used 53,789 records from the RAOT rubber farmer registration spatial database. The latitude and longitude at the center of each rubber plantation were used to generate POIs. Each validated POI included the planting year of the rubber trees, representing the actual year of land clearance, along with the type of cultivated area (PL or HS). In the final stand-age map, predicted land clearance years were sampled at each POI location. The Pearson correlation coefficient (r) and the Kappa coefficient between the actual and predicted T₀ years across all POIs were computed. Additionally, we calculated the mean and standard deviation of the predicted T₀ years, grouped by the actual year, and the overall Root Mean Squared Error (RMSE) to assess the error in the predicted output [6,18].
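The point-level statistics can be sketched as follows, assuming two aligned arrays of actual (RAOT) and predicted T₀ years per POI. Skipping POIs with no predicted year and using the sample standard deviation (ddof = 1) are assumptions; the multiclass kappa is not included. The function name is hypothetical.

```python
import numpy as np

def stand_age_accuracy(actual, predicted):
    """Pearson r, RMSE (years), and per-actual-year (mean, SD) of predicted T0 years."""
    a = np.asarray(actual, dtype=float)
    p = np.asarray(predicted, dtype=float)
    ok = ~(np.isnan(a) | np.isnan(p))       # skip POIs falling on unmapped pixels
    a, p = a[ok], p[ok]
    r = np.corrcoef(a, p)[0, 1]
    rmse = np.sqrt(np.mean((p - a) ** 2))
    by_year = {}
    for y in np.unique(a):
        vals = p[a == y]
        sd = vals.std(ddof=1) if vals.size > 1 else np.nan
        by_year[int(y)] = (vals.mean(), sd)
    return r, rmse, by_year
```

Because the RMSE is expressed in years, it can be read directly as the typical stand-age error of the map.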

To assess the spatial accuracy of individual rubber plantation areas, we measured the similarity between two polygons: one representing the individual plantation polygon obtained from the RAOT farmer registration database and the other extracted from a cluster of homogeneous pixels (i.e., pixels with the same predicted T₀ year) in the final stand-age map. The Intersection over Union (IoU), also known as the Jaccard Index [30], was used as a similarity metric to compare two polygons with different areas but overlapping regions. The IoU formula is defined as follows:

IoU = \frac{A_{I}}{A_{U}},

(5)

where A_I represents the overlapping area between the two polygons (area of intersection) and A_U represents the total area covered by both polygons without counting the overlap twice (area of union).
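Eq. (5) is easiest to illustrate on rasterized footprints, as in the NumPy sketch below, which assumes both polygons have been burned into boolean masks on the same grid. With vector geometries, the same ratio can be computed with Shapely as `a.intersection(b).area / a.union(b).area`. The function name is hypothetical.

```python
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over Union (Jaccard index) of two boolean masks, Eq. (5)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    # Two empty masks have no defined overlap ratio
    return np.logical_and(a, b).sum() / union if union else np.nan
```

For two hypothetical 4 × 4-pixel squares offset by two pixels in each direction, the intersection is 4 pixels and the union is 28, so IoU = 1/7 ≈ 0.14; identical footprints give IoU = 1.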

3. Results

3.1. Time Series Analysis of Distribution Summary BSI Values

Figure 3a,b illustrates the interannual distribution summary of NDVI and BSI values from a sample rubber plantation within the plainland (PL) cultivation area. This example represents a rubber replanting scenario. The interannual profile reveals two distinct patterns of land clearance events, occurring in 1995 and 2014. The first pattern, named Type 1, marks the start of an increasing trend in NDVI values, from around 0.2–0.4 in the years before 1995 to around 0.8–0.9 within six years after 1995. This NDVI behavior indicates a land use conversion from paddy fields (active or abandoned) or annual cropland to perennial tree cultivation, such as rubber or oil palm, as is common in the area. The Type 1 pattern is found mostly before 2000. The second pattern, named Type 2, shows a significant increase in BSI values from approximately −0.6 to around zero or above where dense vegetation cover was cleared, followed by a gradual recovery of greenness over about six years after T₀. This pattern occurs where rubber trees were replanted or where degraded forest land or an existing rubber plantation was cleared. NDVI and BSI values generally show opposite trends: the increase in BSI to positive values indicates bare-soil exposure due to land clearance, which is consistent with a fall in NDVI due to loss of vegetation cover. This pattern aligns with the typical 20-year productive lifespan of latex-yielding rubber plantations, as recommended by various agricultural agencies. Rubber trees beyond this age generally experience a decline in productivity. However, certain farmers cultivating rubberwood varieties may opt to harvest trees as early as 15 years post-planting. Consequently, replanting intervals may vary between 15 and 25 years, depending on the rubber tree variety (latex-producing or wood-only) and market demand for rubber timber.

The data pattern in Figure 3a,b shows that the year of land clearance can be determined by examining NDVI values below 0.4 and BSI values above zero. T₀ can also be identified from the differences between index values at T₀ and those in preceding and subsequent years. Figure 3c depicts the time series of differential Q3 BSI values at T₀ relative to one year prior (labeled diff.BSI.Q3.t0tn1) and differential Q1 NDVI values at T₀ relative to six years after T₀ (labeled diff.NDVI.Q1.t0tp6). Notably, diff.BSI.Q3.t0tn1 in 2015 was markedly elevated, reaching approximately 0.6. This suggests that diff.BSI.Q3.t0tn1 is a more reliable predictor for the Type2 pattern, as it peaks only at T₀ in rubber replanting events. The value of diff.NDVI.Q1.t0tp6 in 2015 also dropped to its minimum; however, this decrease was not substantially different from the values observed between 2016 and 2018.

Similarly, for the Type1 pattern, diff.NDVI.Q1.t0tp6 reached its minimum in 1995 among the values recorded from 1993 to 2001, but it remained low for several years after T₀. Identifying T₀ for the Type1 pattern may therefore require combining multiple generated features. In addition, diff.NDVI.Q1.t0tp6 has limited predictive capability for T₀ in plantations established after 2017, since it requires the index value six years after T₀. Conversely, features derived from values in years before T₀, such as diff.BSI.Q3.t0tn1 or diff.BSI.Q3.t0tn2, show predictive utility primarily within the first one or two years after planting. A combination of features derived from the distribution summaries of NDVI and BSI values, together with the differences between those values at different times, can therefore be used to identify the year of clearance in the time series.
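The two difference features discussed above can be illustrated with a minimal Python sketch, assuming each pixel's annual Q3 BSI (or Q1 NDVI) values are held in a year-ordered array. The helper name `lagged_difference`, the NaN convention for years whose partner year falls outside the series, and the sample values are hypothetical and are not taken from Table A1.

```python
import numpy as np

def lagged_difference(series, lag):
    """Return series[t] - series[t + lag] for every year t.

    lag = -1 gives a T0-minus-previous-year feature (like diff.BSI.Q3.t0tn1);
    lag = +6 gives a T0-minus-six-years-later feature (like diff.NDVI.Q1.t0tp6).
    Years whose partner year lies outside the series are left as NaN.
    """
    values = np.asarray(series, dtype=float)
    out = np.full(values.shape, np.nan)
    for t in range(len(values)):
        partner = t + lag
        if 0 <= partner < len(values):
            out[t] = values[t] - values[partner]
    return out

# Hypothetical annual Q3 BSI values for one pixel; clearance occurs at year index 3.
q3_bsi = [-0.60, -0.58, -0.61, 0.02, -0.20, -0.35]
diff_bsi = lagged_difference(q3_bsi, -1)
t0_index = int(np.nanargmax(diff_bsi))  # year with the largest BSI jump
```

The NaN padding makes explicit why a T₊₆ feature cannot be computed for the last six years of a series, which is the limitation noted for recently established plantations.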

Figure 4 presents the interannual Q3 BSI values from samples of 50 rubber plantations in both plainland (PL) and hillside (HS) cultivation areas. At year T₀, most Q3 BSI values from the PL cultivation area exceeded zero, while those from the HS area were predominantly sub-zero. This observation suggests that using a zero BSI threshold as a global cutoff may not be appropriate for this study, as it could lead to under-classification of land clearance areas in hillside plantations. The differences in median values highlight the magnitude of change over time, with statistically significant differences observed in years T₋₂, T₋₁, and T₊₆. Additionally, we found that utilizing the maximum or minimum annual NDVI and BSI composite values may be unsuitable due to the high variability introduced by cloud cover in the Landsat imagery, which results in heterogeneous spectral index values among neighboring pixels, as illustrated in Figure 5.

The maximum and minimum composite BSI images (Figure 5a,e) exhibit high-frequency spatial noise stemming from a limited number of pixel values in certain areas. In contrast, the Q3, median, and Q1 BSI images (Figure 5b–d) are largely unaffected by cloud masking and by invalid (NA) values from SLC errors in Landsat 7 images, and they display a more homogeneous texture than the maximum BSI image. As a result, the difference image of maximum BSI values between 2003 and 2004 (Figure 5f) retains a heterogeneous texture, and the difference image of minimum BSI values (Figure 5j) fails to capture numerous areas of land clearance in 2004. We also found that the year-to-year difference in Q1 NDVI values behaved similarly to the year-to-year difference in Q3 BSI values. Based on these observations, we used only Q3 BSI and Q1 NDVI values to derive features for model training. This approach initially yielded 42 predictive features, as detailed in Table A1.
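A minimal sketch of how annual quartile composites and per-pixel observation counts could be derived from a cloud-masked image stack is given below. It assumes masked or SLC-gap pixels are stored as NaN and uses the commonly cited Bare Soil Index formulation ((SWIR1 + Red) − (NIR + Blue)) / ((SWIR1 + Red) + (NIR + Blue)); whether this exact band combination matches the study's implementation is an assumption.

```python
import numpy as np

def ndvi(red, nir):
    return (nir - red) / (nir + red)

def bsi(blue, red, nir, swir1):
    # Commonly used Bare Soil Index form (assumed here, not quoted from the study).
    return ((swir1 + red) - (nir + blue)) / ((swir1 + red) + (nir + blue))

def annual_quartile_composites(stack):
    """stack: (n_scenes, rows, cols) index values for one annual period,
    with cloud-masked or SLC-gap pixels stored as NaN.
    Returns Q1, median and Q3 images plus the per-pixel count of valid values."""
    q1, median, q3 = np.nanpercentile(stack, [25, 50, 75], axis=0)
    n_valid = np.sum(~np.isnan(stack), axis=0)
    return q1, median, q3, n_valid

# Hypothetical 5-scene stack for a single pixel, with one cloud-masked scene.
stack = np.array([0.1, 0.2, np.nan, 0.3, 0.4]).reshape(5, 1, 1)
q1, median, q3, n_valid = annual_quartile_composites(stack)
```

Quartiles ignore the NaN scene rather than propagating it, which is one reason they yield smoother images than per-pixel maxima or minima.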

We therefore generated three training datasets: (1) an interannual profile capturing the spectral characteristics of land clearance events in agricultural land use conversion (referred to as Type1), (2) an interannual profile capturing rubber replanting activities in plainland areas (referred to as Type2_PL), and (3) an interannual profile capturing the transition of degraded forest land to rubber cultivation, or rubber replanting, in hillside areas (referred to as Type2_HS).

3.2. Feature Importance and Hyperparameter Tuning

After computing Pearson’s correlation among the 42 features and assessing the relationship between the target variable and each feature using the Chi-Square test, we observed high correlations (>0.8) among features, particularly those derived from index values of consecutive or closely spaced years. Furthermore, features computed from the ratio of index values at T₀ and T₊₃, as well as at T₀ and T₊₆, showed minimal association with the target variable, as the p-values of their Chi-Square tests exceeded the 0.05 significance level. We also found that features derived from Q1 NDVI and Q3 BSI were strongly correlated, as the two interannual profiles exhibited inversely symmetrical patterns. Consequently, we omitted all ratio-based features and those derived from years T₊₂, T₊₄, and T₊₅ to reduce feature correlation while preserving the structure of the interannual profile in the training dataset. Twenty-six features remained before the RFE process.
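For illustration only, a greedy correlation filter is sketched below. The study removed features by group (ratio-based features and those from T₊₂, T₊₄, and T₊₅) rather than greedily, so this pandas helper, `drop_highly_correlated`, is a hypothetical alternative showing how the 0.8 threshold could be applied automatically.

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(features: pd.DataFrame, threshold: float = 0.8) -> list:
    """Keep a column only if its absolute Pearson correlation with every
    previously kept column is at most `threshold` (greedy, in column order)."""
    corr = features.corr(method="pearson").abs()
    kept = []
    for col in features.columns:
        if all(corr.loc[col, k] <= threshold for k in kept):
            kept.append(col)
    return kept

# Hypothetical features: "b" nearly duplicates "a", "c" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df = pd.DataFrame({
    "a": a,
    "b": a + 0.01 * rng.normal(size=200),
    "c": rng.normal(size=200),
})
kept = drop_highly_correlated(df)
```

Because the result depends on column order, a domain-driven grouping such as the one used in the study can be preferable when some features carry more interpretive value than others.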

Since we generated three training datasets to capture different patterns of interannual variation, we conducted RFE separately for each dataset. For each dataset, the top 10 features were ranked by the feature importance scores of each of the four models, and only the unique features among the resulting 40 selections were used to define the final feature set, as summarized in Table A2. However, to maintain the predictive capability of the Type2 models, which are intended to identify land clearance events that occurred in the last six years of the study period, we excluded features derived from year T₊₆ from the final feature sets for the Type2_PL and Type2_HS datasets. This adjustment ensures that the Type2 models can predict land clearance occurrences up to 2021. Finally, the Type1 dataset contained 11 features with 18,317 observations, the Type2_HS dataset 12 features with 5274 observations, and the Type2_PL dataset 13 features with 36,769 observations. These datasets were used for subsequent hyperparameter tuning and optimal model selection.
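The per-dataset selection step (top 10 features per model, then their union) can be sketched with scikit-learn's `RFE`. This is an assumption-laden illustration on synthetic data: gradient boosting stands in for XGBoost, and the SVM is omitted because `RFE` needs `coef_` or `feature_importances_`, which an RBF-kernel `SVC` does not expose.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE
from sklearn.tree import DecisionTreeClassifier

def union_of_rfe_selections(X, y, feature_names, estimators, n_keep=10):
    """Run RFE once per estimator and return the union of the selected feature names."""
    selected = set()
    for est in estimators:
        rfe = RFE(estimator=est, n_features_to_select=n_keep).fit(X, y)
        selected.update(name for name, keep in zip(feature_names, rfe.support_) if keep)
    return sorted(selected)

# Synthetic stand-in for a training dataset with 26 candidate features.
X, y = make_classification(n_samples=500, n_features=26, n_informative=8, random_state=0)
names = [f"f{i}" for i in range(26)]
models = [
    DecisionTreeClassifier(random_state=0),                # CART-style stand-in for RP
    RandomForestClassifier(n_estimators=50, random_state=0),
    GradientBoostingClassifier(random_state=0),            # stand-in for XGBoost
]
final_features = union_of_rfe_selections(X, y, names, models)
```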

Figure 6 presents the evaluation results of the best model of each of the four types for each training dataset, where each model’s optimal hyperparameters were selected from the grid search results. The hyperparameter grid is defined in Table A3, and the optimal hyperparameter sets for the four models on each training dataset are detailed in Table A4.
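A grid search of the kind described can be sketched with scikit-learn's `GridSearchCV` and 10-fold stratified cross-validation. The grid values below are hypothetical placeholders rather than those in Table A3, and `DecisionTreeClassifier` serves as a CART-style stand-in for the RP model; its `ccp_alpha` cost-complexity parameter plays a role similar to, but is not numerically equivalent to, the complexity parameter of other recursive partitioning implementations.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=12, random_state=1)

# Hypothetical grid; the study's actual grid is given in Table A3.
param_grid = {
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 10, 20],
    "ccp_alpha": [0.0, 0.001, 0.01],  # cost-complexity pruning strength
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="f1",
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=0),
)
search.fit(X, y)
best_params, best_f1 = search.best_params_, search.best_score_
```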

In general, there was no significant difference in model performance across datasets when considering the F1-score. However, the XGBoost model exhibited overfitting on the Type1 dataset, where the Kappa and MCC values reached 1 on the training dataset but dropped to 0.69 on the testing dataset. Similarly, the SVM models showed severe overfitting on the Type2_HS and Type2_PL datasets, failing to make any correct prediction on the testing dataset. Moreover, the SVM model required significantly more training time than the other models, and its training time increased substantially with dataset size. Specifically, the SVM model took 270.83 s to train on the Type2_PL dataset, but only 0.86 s on the Type1 dataset, which contained approximately half the number of observations. Model training was conducted on a High-Performance Computing (HPC) system using 16 allocated cores of a 2.25 GHz AMD EPYC 7742 CPU and 32 GB of DDR4 LRDIMM memory.
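The train/test comparison used to diagnose overfitting can be expressed with scikit-learn's metric functions. The labels below are hypothetical and were chosen only to show how a perfect training score paired with a weaker test score produces a positive gap in F1, Kappa, and MCC.

```python
from sklearn.metrics import cohen_kappa_score, f1_score, matthews_corrcoef

def agreement_scores(y_true, y_pred):
    """F1, Cohen's Kappa and MCC for binary land clearance labels (1 = cleared)."""
    return {
        "F1": f1_score(y_true, y_pred),
        "Kappa": cohen_kappa_score(y_true, y_pred),
        "MCC": matthews_corrcoef(y_true, y_pred),
    }

# Hypothetical labels: a model that is perfect on training data but not on test data.
y_train = [1, 0, 1, 0, 1, 0]
train_scores = agreement_scores(y_train, y_train)
test_scores = agreement_scores([1, 1, 0, 0, 1, 0, 1, 0], [1, 0, 0, 0, 1, 1, 1, 0])

# A large positive gap between training and test scores signals overfitting.
gap = {k: train_scores[k] - test_scores[k] for k in train_scores}
```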

Based on the performance evaluation of the best models, the RP and RF models were identified as suitable candidates for all three datasets. Of these, the RP model was selected for further analysis using its optimal set of hyperparameters. A total of 100 predictive models were built to predict land clearance areas in the BSI and NDVI time series images. To ensure diversity among the decision trees, the training data for each model were randomly sampled. Each of the 100 trained models was then applied to the stacked images of the selected informative features for each year to identify land clearance. A simple majority-vote ensemble, requiring more than 50% agreement, was used to aggregate the predictions from the 100 models and improve the robustness of the results.
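The ensemble step can be sketched as follows, assuming a CART-style `DecisionTreeClassifier` as the RP model, a 70% random subsample drawn without replacement for each of the 100 models, and a strict greater-than-50% vote. The subsampling fraction and the function names are hypothetical; note also that older scikit-learn versions reject NaN inputs, so missing pixels would need handling before prediction.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def train_subsample_ensemble(X, y, n_models=100, frac=0.7, seed=0, **tree_params):
    """Fit n_models trees, each on a different random subsample (without replacement)."""
    X, y = np.asarray(X), np.asarray(y)
    rng = np.random.default_rng(seed)
    size = max(1, int(round(frac * len(y))))
    models = []
    for _ in range(n_models):
        idx = rng.choice(len(y), size=size, replace=False)
        models.append(DecisionTreeClassifier(**tree_params).fit(X[idx], y[idx]))
    return models

def majority_vote(models, pixels):
    """pixels: (n_pixels, n_features). Returns 1 where more than 50% of models predict 1."""
    votes = np.stack([m.predict(pixels) for m in models])  # (n_models, n_pixels)
    return (votes.mean(axis=0) > 0.5).astype(np.uint8)

# Hypothetical one-feature training data (e.g. a BSI difference) and two pixels to classify.
X = np.array([[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]] * 10)
y = np.array([0, 0, 0, 1, 1, 1] * 10)
models = train_subsample_ensemble(X, y, n_models=100, random_state=0)
cleared = majority_vote(models, np.array([[0.05], [0.95]]))
# For an image: reshape (rows, cols, n_features) to (-1, n_features), vote, reshape to (rows, cols).
```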

3.3. Map of Rubber Plantation Stand Age

Since three RP models with different sets of optimal hyperparameters were used to predict land clearance areas under different land use conversion types and rubber cultivation areas, the resulting images from each model’s predictions were aggregated using an OR logical operation. At the pixel level, a land clearance area was identified as a “1” pixel if a value of “1” was present in any of the predicted images and as a “0” pixel otherwise. Subsequently, each “1” pixel in the aggregated image was labeled with the corresponding year. Finally, the output images were sequentially stacked by year, and the latest occurrence of land clearance for each pixel was determined to generate the final map indicating the year of land clearance, representing the establishment of rubber plantations.
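The OR aggregation and latest-year rule can be illustrated with NumPy, assuming each model's annual prediction is a binary array on a common grid. `latest_clearance_year` is a hypothetical helper, and 0 is used here to mark pixels never predicted as cleared.

```python
import numpy as np

def latest_clearance_year(pred_by_year, years):
    """pred_by_year: dict mapping year -> list of binary (rows, cols) arrays,
    one per model (e.g. Type1, Type2_PL, Type2_HS).
    Returns an int array holding the latest year flagged as cleared, 0 where never."""
    result = None
    for year in sorted(years):
        # Pixel-wise OR across the models' predictions for this year.
        cleared = np.logical_or.reduce([np.asarray(p, dtype=bool) for p in pred_by_year[year]])
        if result is None:
            result = np.zeros(cleared.shape, dtype=np.int32)
        result[cleared] = year  # later years overwrite earlier ones
    return result

# Hypothetical 2 x 2 predictions from three models for two years.
preds = {
    2000: [np.array([[1, 0], [0, 0]]), np.array([[0, 0], [0, 1]]), np.zeros((2, 2))],
    2001: [np.zeros((2, 2)), np.array([[1, 0], [0, 0]]), np.zeros((2, 2))],
}
clearance_year_map = latest_clearance_year(preds, preds.keys())
```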

Figure 7a illustrates the annual age distribution of rubber plantations in Surat Thani province, based on predicted land clearance years through 2024. Plantation age is represented using a gradient of green tones—dark green for older plantations and light green for younger ones. The white areas within the provincial boundary indicate non-rubber cultivation land, including in the two enlarged regions shown in Figure 7b,c. The black areas denote pixels consistently predicted as “0” across all 37 years, comprising approximately 12.5% of rubber plantation areas where establishment year (or T₀) could not be determined. Of these, about 5.1% are in the irrigated plains and floodplains, often converted from former paddy fields, while 3.8% lie in mountainous or hilly terrain. Although hillside regions had the lowest proportion of underpredicted pixels, they represent approximately 29.8% of the total rubber cultivation area in the province.

Figure 7b highlights numerous individual plantation areas within the large ‘A302’ polygons generated and used by the LDD. The LULC data were digitized from satellite and aerial imagery to define land use and categorize land cover, without distinguishing individual rubber plantations. Our results not only map the annual stand age of plantations but also differentiate individual plantations established in different years. However, it may not be feasible to identify plantations belonging to individual farmers, or to distinguish them by land title, if adjacent lands began rubber cultivation simultaneously. Furthermore, our pixel-based approach effectively determines the age of small plantations with regular geometric shapes aligned vertically and horizontally, as shown in Figure 7c. Figure 7d displays the large, amorphous plantation areas in hillside regions, where it is generally challenging to identify the starting year.

3.4. Accuracy Assessment of Rubber Plantation Age and Area Predictions

Although there were over 50,000 registered rubber farmers, not every record could be used in the final product accuracy assessment. The plantation shapes in the registration data were extracted from digital shapefiles of the land titles owned by the farmers, so a single land title area could well contain multiple rubber cultivation patches. Most rubber farmers in Thailand are smallholders who typically replant rubber trees on half or a portion of their land while retaining productive trees to sustain their income during the first 5 to 7 years, until the newly planted trees begin latex production. In such cases, the area and year of new planting were not updated in the RAOT rubber farmer database. Therefore, only farmer registration records whose plantation shapes covered at least 90% of the pixels and contained a single T₀ value across all covered pixels were used. Moreover, only records with a planting start year between 1992 and 2022 were used. We also stratified the eligible records by year to ensure a balanced and complete representation of plantation establishment years. The centroids of the 10,200 selected plantation shapes were then generated as POIs, along with their corresponding establishment years, and used to extract the predicted planting year from the final map of the Age of Rubber Plantations.
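The record-screening rules can be sketched with pandas. The column names `coverage`, `n_distinct_t0`, and `plant_year` are hypothetical stand-ins for the RAOT registration attributes, and the optional per-year cap is one simple way (an assumption) to implement year-stratified sampling.

```python
import pandas as pd

def select_validation_records(records, min_coverage=0.9, years=(1992, 2022), per_year=None, seed=0):
    """Keep records meeting the coverage, single-T0 and year-range rules;
    optionally cap the number of records per establishment year."""
    eligible = records[
        (records["coverage"] >= min_coverage)
        & (records["n_distinct_t0"] == 1)
        & records["plant_year"].between(*years)  # inclusive on both ends
    ]
    if per_year is not None:
        eligible = eligible.sample(frac=1, random_state=seed).groupby("plant_year").head(per_year)
    return eligible

# Hypothetical registration records.
records = pd.DataFrame({
    "coverage": [0.95, 0.85, 0.99, 0.92, 0.97],
    "n_distinct_t0": [1, 1, 2, 1, 1],
    "plant_year": [1995, 2000, 2005, 1990, 2010],
})
selected = select_validation_records(records, per_year=1)
```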

Figure 8a depicts a bubble plot of the actual and predicted years obtained from each validation POI, showing the frequency and distribution of accurate and inaccurate predictions of plantation age. The color and size of the dots along the diagonal indicate the frequency of accurate predictions, whereas the remaining dots represent incorrect predictions; the farther a dot lies from the 1:1 line, the larger the prediction error. POIs sampled for land clearance years 1989 and 1990 were not included in the accuracy analysis because the features used to train the model were derived from the two preceding years (T₋₂ and T₋₁); hence, prediction started from the third year of the time series. The Pearson’s correlation coefficient was 0.978, indicating a strong linear correlation between actual and predicted T₀ years. Figure 8b depicts the means and error bars of the predicted T₀ years together with the fitted line and its linear regression equation. The adjusted R-squared of the fitted regression model was 0.968, indicating a strong fit, and the RMSE was 1.86 years. The overall accuracy of T₀ prediction was 84.5%, whereas the proportion of underestimated (“NA”) predictions was 4.0%, as detailed in Table 1.
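The headline accuracy figures can be computed as sketched below, assuming “NA” predictions are encoded as NaN, counted as errors in overall accuracy, and excluded from the RMSE and Pearson’s r. How the study treated NA values in the RMSE is not stated in this section, so that choice is an assumption.

```python
import numpy as np

def age_accuracy(actual, predicted):
    """actual, predicted: establishment years; predicted uses NaN for "NA" (no T0 found)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    has_pred = ~np.isnan(predicted)
    err = predicted[has_pred] - actual[has_pred]
    return {
        "overall_accuracy": float(np.mean(predicted == actual)),  # NaN never equals a year
        "na_rate": float(np.mean(~has_pred)),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
        "pearson_r": float(np.corrcoef(actual[has_pred], predicted[has_pred])[0, 1]),
    }

# Hypothetical validation POIs.
scores = age_accuracy([2000, 2001, 2002, 2003], [2000, 2002, 2002, np.nan])
```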

The prediction of plantation establishment year in this study did not outperform our previous method in terms of RMSE, as reported in [18]. This can be attributed to the increased complexity of applying the new pixel-based approach at a regional scale. Variations in localized phenological phases—such as earlier defoliation in northern plantations compared to southern ones—introduce temporal inconsistencies. Moreover, land clearance in Surat Thani occurs throughout the year due to continuous rainfall influenced by moist air from the Gulf of Thailand. This deviates from the typical dry-season clearance pattern observed elsewhere in Thailand, making it harder to detect establishment events from satellite imagery. Furthermore, methodological differences between the current and previous studies also contributed to the observed discrepancy. These include variations in how training data were collected and how prediction results were evaluated, which affect model performance and comparability.

Figure 8c illustrates the mean IoU values for plantations within one-hectare size intervals. The error bars show the standard deviation (SD), representing the dispersion of data points around the mean. In general, larger plantations tend to exhibit higher similarity coefficients between the predicted and actual plantation areas. Most plantations in the study area are relatively small, with an average size of 3.7 hectares, leading to greater variability in the similarity between individual plantation areas and their predicted counterparts. Overall, the mean IoU was 0.63 (SD = 0.18), and most plantations included in the spatial accuracy assessment achieved an IoU greater than 0.5, indicating substantial overlap between the actual and predicted polygons.
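IoU between a reference plantation polygon and its predicted counterpart can be computed on rasterized masks, as in the minimal sketch below; rasterizing both shapes onto the same 30 m grid beforehand is assumed.

```python
import numpy as np

def iou(actual_mask, predicted_mask):
    """Intersection over Union of two boolean rasters on the same grid."""
    a = np.asarray(actual_mask, dtype=bool)
    p = np.asarray(predicted_mask, dtype=bool)
    union = np.logical_or(a, p).sum()
    return np.logical_and(a, p).sum() / union if union else float("nan")

# Hypothetical 4 x 4 masks: the predicted block is shifted one pixel to the right.
actual = np.zeros((4, 4), dtype=bool)
actual[0:2, 0:2] = True
predicted = np.zeros((4, 4), dtype=bool)
predicted[0:2, 1:3] = True
score = iou(actual, predicted)
```

A one-pixel shift already reduces IoU to one third for this 2 × 2 block, which illustrates why small plantations show more variable IoU values than large ones.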

4. Discussion

4.1. Model Overfitting and Suitability

As demonstrated in the model performance comparison, the SVM model exhibited severe overfitting, achieving 100% accuracy on the training dataset while failing to make a single correct prediction on the testing dataset for the Type2_HS and Type2_PL data patterns. Even with grid search and 10-fold cross-validation, the SVM model remained prone to overfitting under certain conditions [31].

A key factor in this issue is the C parameter in SVM, which balances margin maximization against misclassification minimization. If grid search selects an excessively high C value, the model prioritizes minimizing training errors, making it overly sensitive to noise and variations and leading to poor generalization. The choice of kernel function and its parameters also significantly affects SVM performance. An inappropriate kernel setting, such as an RBF kernel with an extremely large gamma, may cause the model to overfit by capturing noise rather than underlying patterns, because each training sample then influences only a very small neighborhood. While grid search explores multiple kernel options, an ill-defined search space may still result in suboptimal kernel selection.
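The effect of gamma described above can be reproduced on synthetic data with scikit-learn's `SVC`. With a very large gamma, the RBF kernel value between distinct samples underflows to zero, so the model memorizes the training set while predicting a single class on unseen data. The data and parameter values are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, n_informative=5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
scaler = StandardScaler().fit(X_tr)
X_tr, X_te = scaler.transform(X_tr), scaler.transform(X_te)

results = {}
for gamma in ["scale", 1000.0]:
    clf = SVC(kernel="rbf", C=1000.0, gamma=gamma).fit(X_tr, y_tr)
    # (training accuracy, test accuracy); a large gap signals overfitting
    results[gamma] = (clf.score(X_tr, y_tr), clf.score(X_te, y_te))
```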

To mitigate this risk, refining the hyperparameter search space can improve regularization and help balance model complexity against generalization. Alternative approaches, such as Bayesian optimization or random search, may sometimes yield better results than grid search.
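Random search can be sketched with scikit-learn's `RandomizedSearchCV`, sampling C and gamma from log-uniform distributions (`scipy.stats.loguniform`). The ranges and iteration count are hypothetical; Bayesian optimization would require an additional library and is not shown.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
search = RandomizedSearchCV(
    SVC(kernel="rbf"),
    param_distributions={
        "C": loguniform(1e-2, 1e3),      # hypothetical range
        "gamma": loguniform(1e-4, 1e1),  # hypothetical range
    },
    n_iter=20,
    cv=5,
    random_state=0,
)
search.fit(X, y)
best = search.best_params_
```

Sampling on a log scale spreads the trials across orders of magnitude, which matters for C and gamma because their effects are roughly multiplicative.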

Feature scaling may also have contributed to SVM overfitting in this study: the differences in NDVI and BSI values, which can range from −2 to 2, were not rescaled, whereas NDVI and BSI themselves are normalized indices bounded between −1 and 1. Since SVM is sensitive to feature scaling, improper standardization may cause the optimization process to overemphasize certain features [32]. Furthermore, high correlations among features can exacerbate overfitting if not properly addressed.
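One common safeguard, sketched here on hypothetical data, is to place a `StandardScaler` inside a scikit-learn `Pipeline`, so that scaling parameters are re-estimated within each cross-validation fold and difference features with a wider numeric range do not dominate the RBF distance.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical features: two index-like columns in [-1, 1] and one
# difference-like column in [-2, 2].
rng = np.random.default_rng(0)
X = np.column_stack([
    rng.uniform(-1, 1, 300),
    rng.uniform(-1, 1, 300),
    rng.uniform(-2, 2, 300),
])
y = (X[:, 0] - 0.5 * X[:, 2] > 0).astype(int)

# The scaler is fitted only on each training fold, avoiding leakage into the test fold.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(model, X, y, cv=10)
```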

Although prediction accuracy and processing time for RP, RF, and XGBoost were not significantly different across datasets, XGBoost required substantially more computational time during grid search with 10-fold cross-validation, likely because of the large number of hyperparameters involved in model training. In contrast, RP was the fastest during hyperparameter optimization, faster than RF despite having 68 more hyperparameter settings to evaluate. This efficiency arises because RP constructs a single decision tree, whereas RF builds up to 300 trees in the ensemble.

For the final prediction, 100 RP models were chosen over RF models because RP can handle missing values in both the training and prediction datasets. Although missing values could be excluded from the training dataset, gaps in the interannual NDVI and BSI images caused by persistent cloud cover made missing-value handling necessary during prediction. Unlike RF, which requires explicit missing-value handling, RP accommodates missing data directly. Without such handling, RF predictions could be compromised, requiring spatial interpolation to fill missing pixels from neighboring values.
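The spatial gap-filling alternative mentioned for RF can be sketched with `scipy.ndimage.generic_filter`, replacing each missing pixel with the mean of its valid neighbors in a 3 × 3 window. The window size and function name are assumptions; pixels whose entire window is missing remain NaN (NumPy emits a warning in that case).

```python
import numpy as np
from scipy.ndimage import generic_filter

def fill_missing_with_neighbors(band, size=3):
    """Replace NaN pixels with the mean of valid values in a size x size window."""
    band = np.asarray(band, dtype=float)
    # Out-of-image cells are treated as NaN so they are ignored by nanmean.
    neighbor_mean = generic_filter(band, np.nanmean, size=size, mode="constant", cval=np.nan)
    filled = band.copy()
    missing = np.isnan(band)
    filled[missing] = neighbor_mean[missing]
    return filled

# Hypothetical 3 x 3 BSI patch with a cloud-masked centre pixel.
band = np.array([[0.1, 0.2, 0.3],
                 [0.4, np.nan, 0.6],
                 [0.7, 0.8, 0.9]])
filled = fill_missing_with_neighbors(band)
```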

4.2. Error Analysis of Prediction Uncertainty

Following the accuracy assessment, we conducted an error analysis on each case of inaccurate and underestimated predictions. In general, three primary factors may affect the accuracy of the prediction results, as discussed below.

4.2.1. Impact of Image Availability and Cloud Cover on Prediction Accuracy

Landsat image availability significantly influences the composition and quality of annual composites, which are essential for environmental monitoring, change detection, and land cover analysis. Over the 37-year study period, 3406 Landsat images from four scenes (path 129–130, row 053–054) were used, with each sensor recording reflected and emitted energy on a 16-day revisit cycle, i.e., roughly twice a month.

Annual image counts varied, ranging from 29 images in 1992 (L5 images only) to 167 in 2020 (81 L7 and 86 L8 images), with a mean of 92.3 images per year. Between 1988 and 1998, availability remained below 100 images per year due to reliance on L5. In later years, multiple Landsat platforms were combined; however, certain years still had fewer than 100 images, primarily due to the lack of L5 archival data (2002–2003) caused by orbit decay [33] and the gap between the L5 failure and the L8 launch (mid-2012 to February 2013). Limited temporal frequency and radiometric consistency may affect data quality in annual composites [34], particularly for biomass estimation, which relies on consistent imagery [35].

At the pixel level, the number of non-missing spectral values rarely reached the maximum due to cloud cover. Notably, 42.3% of images used had more than 50% cloud cover over the tropical study area. While a larger volume of images increases the likelihood of obtaining cloud-free pixels, areas with fewer such images may rely on pixels affected by partial cloud cover or shadows, reducing composite clarity and accuracy. Image availability is critical for minimizing cloud cover impacts, particularly in cloud-prone regions [36]. To mitigate cloud effects and scan line gap errors in L7 images, maximum BSI composite values were excluded. However, some Q3 composites after 2003 still exhibited continuous low-BSI value lines in certain areas. Variability in image availability, compounded by cloud cover, contributes to fluctuations in the interannual BSI profile, affecting the model’s prediction accuracy.

4.2.2. Impact of Land Clearance Timing on the Annual Distribution of Spectral Indices

From the accuracy assessment, 69.3% of mispredictions were attributed to one-year-lag errors. As shown in Figure 9, the BSI time series from two distinct plainland rubber plantations indicated that in the 25th season (May 2012–April 2013), only two BSI values were available from three composite images—acquired in October 2012 and April 2013. The highest BSI value (−0.183) corresponded to the defoliation period, when the ground was more exposed, and was subsequently assigned as Q3 BSI.

When calculating Q3 BSI differences, the values for 2013–2012 (0.311) and 2012–2011 (0.285) both fell into the “1:Yes” category, leading to incorrect T₀ predictions. This issue is not limited to years with insufficient imagery; it also arises where frequent cloud cover leaves BSI values missing at certain locations, further complicating prediction.

Another misprediction scenario, influenced by cloud cover and T₀ timing, is shown in Figure 9b. In this case, T₀ was predicted one year late because high BSI values (indicating soil exposure) appeared at the end of the annual period, while the first ten months had predominantly low BSI values. Since the highest BSI values were treated as upper distribution limits, the Q3 BSI for the observed T₀ year was lower than expected but increased in the following year. Land clearance often occurs mid-year, and excessive cloud cover in subsequent images can obscure the actual T₀ period, leading to erroneous predictions.

Cloud contamination in optical satellite imagery remains a major challenge for land cover classification at both regional and global scales [37,38,39]. A common mitigation strategy is composite image generation, integrating data from multiple time periods to fill gaps caused by cloud and shadow regions [39]. Our approach aggregated BSI images over a one-year period, training the decision-tree model using quartile values from the composite images. The RP algorithm consistently selected Q3 BSI as the most informative feature, maximizing classification accuracy. However, quartile values depend on data density and distribution, and a limited number of BSI values can introduce variability, distorting temporal patterns in the time series.
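The sensitivity of Q3 to the number of observations can be shown with NumPy's default linear-interpolation percentile, which matches R's default quantile type 7. The values are hypothetical: the same single high BSI value barely moves Q3 in a dense year but dominates it in a sparse one.

```python
import numpy as np

# Dense year: 20 low BSI observations plus one clearance-like spike.
dense = np.append(np.full(20, -0.40), 0.0)
# Sparse year: only the spike and one low observation survive cloud masking.
sparse = np.array([-0.40, 0.0])

q3_dense = np.percentile(dense, 75)    # spike has no effect on Q3
q3_sparse = np.percentile(sparse, 75)  # Q3 is pulled strongly toward the spike
```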

We included all available images, rather than limiting the data to the dry season as in our previous smaller-scale study [18], mainly because of the greater variation in land clearance timing across the larger study area. In smaller, localized areas, land preparation tends to occur during the dry season, when soil conditions are suitable for heavy machinery and replanting can be completed before the rainy season. In Surat Thani, however, land clearance can occur throughout the year due to its tropical climate and year-round rainfall.

To increase the likelihood of capturing the timing of BSI increases associated with land clearance, we opted to use all available imagery. Nonetheless, observing land clearance during the rainy season remains difficult due to frequent and severe cloud cover. Since we define our annual observation period from October (the start of the dry season) to September of the following year, land clearances that occur during the high-cloud-cover months (typically the last four months of the annual period) may be poorly represented in the imagery.

As a result, the few BSI increases observable during this period often appear as outliers in the annual BSI distribution, potentially leading to inaccurate T₀ predictions, as discussed above. One possible solution is to divide the annual BSI distribution into two or three sub-periods based on local or regional climatic patterns, which may reduce the impact on prediction accuracy of the small number of BSI values available during cloudy seasons.
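The proposed sub-period split can be sketched as follows, assuming the October-to-September annual period described above, periods labeled by the calendar year in which they start, and three equal four-month sub-periods. The labeling convention, the number of sub-periods, and the helper names are hypothetical.

```python
from datetime import date

import numpy as np

def annual_period(d: date) -> int:
    """Label a date by the year in which its October-September period starts."""
    return d.year if d.month >= 10 else d.year - 1

def sub_period(d: date, n_sub: int = 3) -> int:
    """0-based index of an equal-length sub-period within the Oct-Sep year."""
    months_since_october = (d.month - 10) % 12
    return months_since_october * n_sub // 12

def subperiod_q3(dates, values, n_sub=3):
    """Q3 of valid (non-NaN) index values for each (period, sub-period) group."""
    groups = {}
    for d, v in zip(dates, values):
        if not np.isnan(v):
            groups.setdefault((annual_period(d), sub_period(d, n_sub)), []).append(v)
    return {key: float(np.percentile(vals, 75)) for key, vals in groups.items()}

# Hypothetical BSI observations for one pixel; the last one is cloud-masked.
dates = [date(2012, 10, 5), date(2012, 11, 5), date(2013, 8, 1), date(2013, 8, 15)]
bsi_values = [-0.5, -0.3, 0.1, float("nan")]
q3_by_subperiod = subperiod_q3(dates, bsi_values)
```

Here the late-season value of 0.1 forms its own sub-period summary instead of appearing as an outlier within a single annual distribution.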

4.2.3. Pre-Conversion Land Use and Rubber Cultivation Area

The BSI time series reveals a distinct shift and subsequent trend in BSI values before and after T₀, as shown in Figure 4. This study analyzed annual BSI patterns, focusing on land clearance events and the growth trends of rubber plantations in plainland and hillside areas. The examined cases included both new plantings on degraded forest land and replanting on former rubber or perennial crop areas. However, error analysis indicated that pre-conversion land use significantly influenced BSI trends.

Most rubber plantations in Surat Thani were established on plainland, often converted from other agricultural land uses. According to LULC data (LDD 2007, 2018), 28.6% of cropland, particularly paddy fields, was converted into oil palm and rubber plantations, with oil palm being the predominant replacement due to its compatibility with the flooded conditions of irrigated rice fields. Although rubber trees can be cultivated in former paddy fields, additional drainage infrastructure is required during site preparation.

In rain-fed rice farming, land preparation typically begins in the early wet season, causing BSI to fluctuate with the rice cultivation stage. BSI values peak during the land preparation phase and about one month after harvest, while remaining lower during other periods. Consequently, prior rice farming activities altered the BSI time series, as illustrated in Figure 10a. The interquartile ranges (IQRs) of BSI values were wider before and at T₀ but progressively narrowed afterward. Additionally, the Q3 and median BSI values declined after T₀, reflecting rubber tree growth, while Q1 remained relatively stable, a trend consistent with other BSI time series.

Figure 10b shows an example from a plantation converted from an abandoned paddy field, where IQRs were highly variable up to four years before T₀, reflecting active rice production. The IQRs stabilized four years before T₀, indicating a prolonged abandonment period before rubber tree establishment. However, the duration of stable quartile values before T₀ varies depending on how long a field remained unused before conversion.

According to the accuracy assessment, pre-conversion land use was responsible for 66.6% of “NA” predictions, mainly because the training dataset contained too few samples with these distinct data patterns. The RP algorithm failed to recognize certain features, such as diff.BSI.Q3.t0tn1, which captures abrupt BSI changes during land clearance. Improving prediction accuracy may require more POIs from former paddy fields and quartile-based trend features covering up to six years after T₀. However, relying on post-T₀ quartile values limits the model’s ability to predict land clearance in the most recent six years. Therefore, separate models for different NDVI and BSI data patterns, as demonstrated in this study, should be considered to improve classification precision across diverse land use histories.

4.3. Underestimated Predictions

Most misclassified land clearance areas were associated with small-scale rubber plantations. In Thailand, rubber smallholders are defined as farmers owning less than 40 hectares of rubber land [2], though most hold less than 4 hectares [40]. In 2012, Thailand had 5.9 million small-scale rubber plantations, averaging 4.04 hectares in size, with 70% of farming households owning less than 4.8 hectares. Additionally, 26.5% of rubber farmers operated on less than 1.6 hectares. The decline in medium- to large-scale farms (>8 ha) is driven by industrialization, urbanization, land policies, property rights, labor market dynamics, and generational land inheritance divisions [41,42,43]. This trend extends to the natural rubber sector, where 2021 vector data from the Department of Lands and RAOT indicated an average landholding size of 2.3 hectares (median 1.5 hectares) across 55,886 land titles. Furthermore, many medium-to-large parcels were subdivided, resulting in rubber trees being planted at different times.

The heterogeneity of BSI time series is influenced by the size and geometric shape of plantations. BSI values in small plantations exhibit greater instability due to spectral mixing from neighboring pixels, particularly in irregularly shaped plantations. Figure 7b demonstrates that T₀ determination is often unreliable in narrow plantations (<60 m width, equivalent to two Landsat pixels) or small plantations near settlements. However, T₀ prediction is feasible for small plantations with symmetrical, well-aligned geometries.

Post-classification analysis revealed that many “NA” predictions were concentrated at the boundaries between adjacent plantations, likely due to POI selection bias favoring large plantations while excluding edge pixels. Consequently, the training dataset lacked samples from small plantations, preventing the model from learning mixed spectral features. However, applying a majority filter for pixel-based smoothing significantly reduced the number of “NA” pixels.
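A majority filter of the kind applied in post-classification can be sketched with `scipy.ndimage.generic_filter`. The 3 × 3 window, the encoding of “NA” as 0, and the tie rule (the smallest label wins, because `np.argmax` returns the first maximum) are assumptions; the bincount-based mode is simple but slow on large rasters.

```python
import numpy as np
from scipy.ndimage import generic_filter

def majority_filter(labels, size=3):
    """labels: 2-D int array (0 = "NA", otherwise establishment year).
    Replaces each pixel with the most frequent label in its size x size window."""
    def window_mode(values):
        # generic_filter passes window values as floats; labels must be non-negative ints.
        return np.bincount(values.astype(np.int64)).argmax()
    return generic_filter(labels, window_mode, size=size, mode="nearest")

# Hypothetical 3 x 3 patch: an isolated "NA" pixel inside a 2005 plantation.
labels = np.full((3, 3), 2005)
labels[1, 1] = 0
smoothed = majority_filter(labels)
```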

To improve the spatial accuracy of predicted plantation areas and address the rubber cultivation areas with unknown T₀, incorporating higher-resolution multispectral imagery could enhance classification accuracy. Potential datasets include (1) 15 m pan-sharpened Landsat imagery (available since April 1999) and (2) 10 m Sentinel-2 imagery (available since June 2015). However, using higher-resolution imagery requires extensive pre-processing and pixel-based prediction, increasing computational time. Additionally, plantations established before the availability of these datasets would be excluded from the analysis.

Additionally, the location of the cultivation area is a key factor contributing to underprediction. The results show that approximately 29.8% of rubber cultivation areas in hillside regions could not be assigned a T₀. This is primarily due to the unclear distinction between bare soil and vegetation signals in their interannual profiles. Good agricultural practices (GAP) for rubber plantations differ between steeply sloped and lowland areas: lowland plantations are typically cleared to bare land, whereas highland plantations are terraced along contours with 2–4 m wide tracks, depending on slope steepness [18]. Some native vegetation is retained between tracks to prevent soil erosion, so highland plantations present a mix of sparse vegetation and bare soil.

To address this limitation, incorporating the Modified Bare Soil Index (MBI), Normalized Difference Bare Soil Index (NDBSI), Soil-Adjusted Vegetation Index (SAVI), and Normalized Difference Soil Index (NDSI), along with proper feature engineering and selection, may help reduce underprediction. MBI accounts for vegetation influence by incorporating the blue band, which is more sensitive to vegetation cover, reducing its impact on soil detection [44]. NDBSI, on the other hand, uses a normalized difference calculation to enhance soil detection accuracy in urban and rural areas [45]. In areas with sparse vegetation, SAVI can compensate for soil background effects by adjusting NDVI to correct for soil brightness under low vegetation cover. Additionally, NDSI may improve soil detection by minimizing vegetation interference. Training a machine learning model to predict the year of land clearance in hillside plantations may require spectral indices that more effectively distinguish bare soil from vegetation and other land cover types.
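For reference, SAVI follows the standard form (1 + L)(NIR − Red)/(NIR + Red + L), with L = 0.5 commonly used for intermediate vegetation cover. The minimal sketch below, which assumes surface reflectance inputs, shows that setting L = 0 recovers NDVI.

```python
import numpy as np

def savi(red, nir, L=0.5):
    """Soil-Adjusted Vegetation Index; L = 0 reduces to NDVI."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (1.0 + L) * (nir - red) / (nir + red + L)

# Hypothetical reflectances for a sparsely vegetated hillside pixel.
soil_adjusted = savi(red=0.10, nir=0.50)
ndvi_equivalent = savi(red=0.10, nir=0.50, L=0.0)
```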

5. Conclusions

This study demonstrates the potential of remote sensing data and machine learning algorithms for accurately classifying land clearance and predicting the age of rubber plantations at an annual resolution. Using dense satellite image time series and supervised classification, we generated a spatially explicit age map of rubber plantations, offering a valuable tool for forest inventory and monitoring. Integrating remote sensing with machine learning not only improves the precision of age estimation but also provides a scalable approach to continuous plantation assessment.

In this study, the Recursive Partitioning (RP) model was selected over the other machine learning models because it balances simplicity and predictive performance. Although RF, XGBoost, and SVM achieved marginally higher F1-scores on all three training datasets (Table A4), RP provided reliable accuracy while remaining interpretable and computationally efficient, making it well suited to large-scale remote sensing applications. Future research could integrate higher-resolution remote sensing datasets, such as Sentinel-2, to further improve age classification accuracy and biomass estimation. In addition, extending this approach to incorporate spatially explicit growth models and ecosystem service valuation could provide a more comprehensive understanding of the role of rubber plantations in carbon sequestration and climate mitigation. The findings of this study highlight the value of remote sensing and machine learning for large-scale forest monitoring, particularly in managed plantation systems.

By enabling pixel-level, annual-resolution mapping of rubber plantation establishment years, the proposed methodology supports more accurate estimation of above-ground biomass and carbon stocks when combined with species-specific growth models and allometric equations. While tree age is not the sole determinant of carbon sequestration potential, it is a critical input for biomass estimation under known site conditions. The resulting age and area maps can also serve as inputs for site index models [46] and other forest productivity assessments [47], helping to improve national greenhouse gas inventories, climate change mitigation strategies, and sustainable forest management practices. Furthermore, the methodology can be adapted to other economic plantation forests, such as oil palm, teak, and eucalyptus, supporting broader applications in regional-scale forestry, carbon stock assessment, and land use planning.
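A minimal sketch of the step from predicted land-clearance year to stand age and area, assuming age = 2024 − T₀ (the convention used for the Figure 7 map), None for pixels with no predicted T₀ (“NA”), and 30 m × 30 m Landsat pixels of 0.09 ha each. The function names are illustrative, and the allometric and growth models mentioned above are not included.

```python
# Sketch: predicted land-clearance year (T0) -> stand age and area per age.
# Assumptions: age = reference_year - T0 (Figure 7 uses 2024); None marks pixels
# with no predicted T0 ("NA"); each pixel is a 30 m x 30 m Landsat cell.
from collections import Counter
from typing import Dict, Iterable, Optional, Union

REFERENCE_YEAR = 2024
PIXEL_AREA_HA = 30 * 30 / 10_000  # 0.09 ha per 30 m pixel


def stand_age(t0: Optional[int], reference_year: int = REFERENCE_YEAR) -> Optional[int]:
    """Stand age in years, or None when no clearance year was predicted."""
    if t0 is None:
        return None
    if t0 > reference_year:
        raise ValueError(f"T0 {t0} is later than the reference year {reference_year}")
    return reference_year - t0


def area_by_age(t0_pixels: Iterable[Optional[int]],
                reference_year: int = REFERENCE_YEAR) -> Dict[Union[int, str], float]:
    """Plantation area (ha) per stand age; pixels without a predicted T0 go under "NA"."""
    counts: Counter = Counter()
    for t0 in t0_pixels:
        age = stand_age(t0, reference_year)
        counts["NA" if age is None else age] += 1
    return {age: n * PIXEL_AREA_HA for age, n in counts.items()}


if __name__ == "__main__":
    predicted_t0 = [2000, 2000, 2015, None, 2024]  # hypothetical pixel predictions
    print([stand_age(t0) for t0 in predicted_t0])
    print(area_by_age(predicted_t0))
```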

Author Contributions

Conceptualization, S.W. and N.W.; methodology, N.W.; software, S.J.; validation, S.W., M.S. and S.J.; formal analysis, N.W.; investigation, S.W.; resources, S.W. and M.S.; data curation, M.S. and S.J.; writing—original draft preparation, M.S. and N.W.; writing—review and editing, S.W.; visualization, S.J.; supervision, S.W.; project administration, S.W.; funding acquisition, S.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the Thammasat University Research Fund (Contract No. TUFT 046/2563) and the Research Scholar Program of the Rubber Authority of Thailand (Contract No. 034/2564).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

This research was supported by the Erawan HPC Project at the Information Technology Service Center (ITSC), Chiang Mai University, Thailand. The authors would like to express their gratitude to the Rubber Authority of Thailand for providing data to support the final product validation.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study, in the collection, analysis, or interpretation of data, in the writing of the manuscript, or in the decision to publish the results.

Appendix A

Table A1. Definitions of the 42 Generated NDVI and BSI Features.

Index	Feature	Description
NDVI	NDVI.Q1.tn2, NDVI.Q1.tn1, NDVI.Q1.t0, NDVI.Q1.tp1, NDVI.Q1.tp2, NDVI.Q1.tp3, NDVI.Q1.tp4, NDVI.Q1.tp5, NDVI.Q1.tp6	Q1 NDVI in each year from T₋₂ to T₊₆
	diff.NDVI.Q1.t0tn2, diff.NDVI.Q1.t0tn1, diff.NDVI.Q1.t0tp1, diff.NDVI.Q1.t0tp2, diff.NDVI.Q1.t0tp3, diff.NDVI.Q1.t0tp4, diff.NDVI.Q1.t0tp5, diff.NDVI.Q1.t0tp6	Difference in Q1 NDVI between year T₀ and each of the two years before and six years after T₀
	diff.NDVI.Q1.tn1tp6	Difference in Q1 NDVI between years T₋₁ and T₊₆
	diff.NDVI.Q1.tp3tp6	Difference in Q1 NDVI between years T₊₃ and T₊₆
	ratio.NDVI.Q1.t0p3.p3p6	Ratio of diff.NDVI.Q1.t0p3 to diff.NDVI.Q1.p3p6
	ratio.NDVI.Q1.p3p6.t0p6	Ratio of diff.NDVI.Q1.p3p6 to diff.NDVI.Q1.t0p6
BSI	The same 21 features as for NDVI, with NDVI replaced by BSI and Q1 replaced by Q3
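The 21 features per index in Table A1 can be assembled from yearly distribution summaries as sketched below. This is an illustrative reconstruction, not the authors' code: the quartile method (Python's statistics.quantiles with method="inclusive"), the sign convention of the differences (value at the first-named year minus value at the second), and the NaN returned for ratios with a zero denominator are all assumptions.

```python
# Sketch: Table A1 window features from yearly Q1 NDVI (or Q3 BSI) values.
# Assumed: inclusive quartiles, "first-named minus second-named" differences,
# NaN for ratios whose denominator is zero.
import math
import statistics
from typing import Dict, Mapping, Sequence

OFFSETS = range(-2, 7)  # years T-2 ... T+6 relative to the clearance year T0


def tag(k: int) -> str:
    """Encode a year offset in the Table A1 style: -2 -> 'tn2', 0 -> 't0', 3 -> 'tp3'."""
    if k == 0:
        return "t0"
    return f"tn{-k}" if k < 0 else f"tp{k}"


def yearly_quartile(observations: Sequence[float], which: int) -> float:
    """Q1 (which=1, used for NDVI) or Q3 (which=3, used for BSI) of one year's observations."""
    if which not in (1, 3):
        raise ValueError("which must be 1 or 3")
    if len(observations) < 2:
        raise ValueError("need at least two observations in the year")
    return statistics.quantiles(observations, n=4, method="inclusive")[which - 1]


def _ratio(num: float, den: float) -> float:
    return num / den if den != 0 else math.nan


def window_features(series: Mapping[int, float], index: str = "NDVI",
                    q: str = "Q1") -> Dict[str, float]:
    """Build the 21 per-index features from yearly values keyed by offset (-2..6)."""
    missing = [k for k in OFFSETS if k not in series]
    if missing:
        raise ValueError(f"missing yearly values for offsets {missing}")
    p = f"{index}.{q}"

    def diff(a: int, b: int) -> float:
        # Assumption: value at the first-named year minus value at the second.
        return series[a] - series[b]

    feats = {f"{p}.{tag(k)}": series[k] for k in OFFSETS}  # 9 yearly values
    feats.update({f"diff.{p}.t0{tag(k)}": diff(0, k) for k in OFFSETS if k != 0})  # 8 diffs
    feats[f"diff.{p}.tn1tp6"] = diff(-1, 6)
    feats[f"diff.{p}.tp3tp6"] = diff(3, 6)
    feats[f"ratio.{p}.t0p3.p3p6"] = _ratio(diff(0, 3), diff(3, 6))
    feats[f"ratio.{p}.p3p6.t0p6"] = _ratio(diff(3, 6), diff(0, 6))
    return feats
```

Calling window_features once with NDVI Q1 values and once with index="BSI", q="Q3" yields the 42 features listed in the table.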

Table A2. Final feature sets for the three training datasets.

No.	Type1	Type2_HS	Type2_PL
1	BSI.Q3.t0	diff.BSI.Q3.t0n1	diff.BSI.Q3.t0n1
2	diff.BSI.Q3.t0n1	BSI.Q3.n1	diff.NDVI.Q1.t0n1
3	diff.NDVI.Q1.t0n1	diff.NDVI.Q1.t0n1	BSI.Q3.n1
4	diff.NDVI.Q1.t0n2	diff.BSI.Q3.t0n2	diff.BSI.Q3.t0p1
5	diff.BSI.Q3.t0p6	diff.NDVI.Q1.t0n2	diff.BSI.Q3.t0n2
6	diff.BSI.Q3.t0n2	NDVI.Q1.n1	NDVI.Q1.n1
7	diff.NDVI.Q1.t0p3	BSI.Q3.t0	BSI.Q3.t0
8	diff.BSI.Q3.t0p1	diff.BSI.Q3.n1.p6 *	BSI.Q3.n2
9	NDVI.Q1.t0	NDVI.Q1.n2	diff.BSI.Q3.n1.p6 *
10	diff.BSI.Q3.t0p3	BSI.Q3.n2	diff.NDVI.Q1.t0p1
11	diff.NDVI.Q1.t0p6	diff.NDVI.Q1.t0p6 *	diff.NDVI.Q1.t0n2
12		NDVI.Q1.t0	diff.BSI.Q3.t0p3
13		diff.BSI.Q3.t0p6 *	diff.NDVI.Q1.t0p6 *
14		diff.NDVI.Q1.t0p3	NDVI.Q1.t0
15		diff.BSI.Q3.t0p3	diff.BSI.Q3.t0p6 *
16			diff.NDVI.Q1.t0p3

* Indicates excluded features.
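According to the Abstract, the optimal feature subsets were selected with Recursive Feature Elimination (RFE). The generic backward-elimination loop is sketched below under stated assumptions: importance is a caller-supplied function (hypothetical here) that refits the model on the current subset and returns one score per feature, and the target subset size n_keep is given. Implementations such as caret's rfe() in R additionally compare resampled performance across candidate subset sizes to choose the final size.

```python
# Sketch: generic recursive feature elimination (backward elimination by importance).
# `importance` is a hypothetical callback: refit on the subset, return per-feature scores.
from typing import Callable, Dict, List, Sequence

ImportanceFn = Callable[[List[str]], Dict[str, float]]


def recursive_feature_elimination(features: Sequence[str], importance: ImportanceFn,
                                  n_keep: int, step: int = 1) -> List[str]:
    """Repeatedly refit, rank, and drop the least important feature(s) until n_keep remain."""
    if not 1 <= n_keep <= len(features):
        raise ValueError("n_keep must be between 1 and the number of features")
    if step < 1:
        raise ValueError("step must be positive")
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = importance(remaining)  # re-ranked on the current subset each round
        n_drop = min(step, len(remaining) - n_keep)
        weakest = set(sorted(remaining, key=lambda f: scores[f])[:n_drop])
        remaining = [f for f in remaining if f not in weakest]
    return remaining


if __name__ == "__main__":
    toy_weights = {"BSI.Q3.t0": 0.9, "NDVI.Q1.tp6": 0.1,   # hypothetical scores
                   "diff.BSI.Q3.t0tn1": 0.7, "diff.NDVI.Q1.tp3tp6": 0.2}

    def toy_importance(subset: List[str]) -> Dict[str, float]:
        return {f: toy_weights[f] for f in subset}

    print(recursive_feature_elimination(list(toy_weights), toy_importance, n_keep=2))
```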

Table A3. Definition of hyperparameter grids.

Model	Hyperparameter grid	Description
RP	cp = [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1]	Complexity Parameter: Controls pruning
	maxdepth = [2, 4, 6, 8, 10]	Maximum depth of the tree
	minsplit = [2, 5, 10, 20]	Minimum number of observations required to split
RF	mtry = [2, 3, 4, 5, 7, 10]	Number of variables randomly sampled per split
	ntree = [100, 200, 300]	Number of decision trees built in the forest model
	maxnodes = [5, 10, 20, 30]	Maximum number of terminal nodes
XGBoost	nrounds = [100, 300, 500]	Number of boosting iterations (trees)
	max_depth = [2, 3, 6, 9]	Maximum depth of trees
	eta = [0.01, 0.05, 0.1, 0.3, 0.5]	Learning rate: controls step size of updates
	gamma = [0, 1, 5]	Minimum loss reduction required to make a split
	colsample_bytree = [0.5, 0.7, 1]	Fraction of features randomly selected per tree
	subsample = [0.5, 0.8, 1]	Fraction of training samples per boosting round
	min_child_weight = [1, 5, 10]	Minimum sum of instance weights (Hessian) required in a child node
	max_delta_step = [0, 1, 3, 5]	Maximum delta step allowed for each leaf output; helps prevent large updates and is useful for imbalanced datasets
SVM	method = [“svmRadial”, “svmLinear”, “svmPoly”, “svmSigmoid”]	SVM kernel: radial basis function (RBF), linear, polynomial, and sigmoid
	C = [0.1, 0.5, 1, 5, 10]	Cost: regularization parameter (for radial, linear, polynomial, and sigmoid kernels)
	degree = [2, 3, 4]	Polynomial kernel degree (polynomial kernel only)
	scale = [0.01, 0.1, 1]	Scaling factor (polynomial kernel only; equivalent to gamma)
	sigma = [0.01, 0.1, 1]	Kernel coefficient that controls flexibility; lower values give a more generalized model (radial kernel only; equivalent to gamma)
	gamma = [0.01, 0.1, 1]	Kernel coefficient that controls the scaling of input features (for radial, sigmoid, and polynomial kernels)
	coef0 = [0.1, 0.3, 0.5, 0.7, 0.9]	Independent (constant) term of the polynomial and sigmoid kernels
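To gauge the search effort implied by Table A3, the sketch below expands the RP, RF, and XGBoost grids as full factorial designs: 140, 72, and 19,440 combinations, respectively, each of which would be refit once per resampling fold if every combination were evaluated. Whether the full XGBoost grid was evaluated is not stated in this section, and the SVM grid is omitted because its valid parameters depend on the kernel. The score callback in best_config is a hypothetical placeholder for a cross-validated metric such as the F1-score.

```python
# Sketch: full factorial expansion of the Table A3 grids and an exhaustive search helper.
import itertools
import math
from typing import Any, Callable, Dict, Iterator, List

Grid = Dict[str, List[Any]]

# Grids transcribed from Table A3 (SVM omitted: its grid is kernel-dependent).
GRIDS: Dict[str, Grid] = {
    "RP": {"cp": [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05, 0.1],
           "maxdepth": [2, 4, 6, 8, 10],
           "minsplit": [2, 5, 10, 20]},
    "RF": {"mtry": [2, 3, 4, 5, 7, 10],
           "ntree": [100, 200, 300],
           "maxnodes": [5, 10, 20, 30]},
    "XGBoost": {"nrounds": [100, 300, 500],
                "max_depth": [2, 3, 6, 9],
                "eta": [0.01, 0.05, 0.1, 0.3, 0.5],
                "gamma": [0, 1, 5],
                "colsample_bytree": [0.5, 0.7, 1],
                "subsample": [0.5, 0.8, 1],
                "min_child_weight": [1, 5, 10],
                "max_delta_step": [0, 1, 3, 5]},
}


def expand_grid(grid: Grid) -> Iterator[Dict[str, Any]]:
    """Yield every combination of a full factorial grid as a dict."""
    names = list(grid)
    for values in itertools.product(*(grid[name] for name in names)):
        yield dict(zip(names, values))


def grid_size(grid: Grid) -> int:
    """Number of combinations without enumerating them."""
    return math.prod(len(values) for values in grid.values())


def best_config(grid: Grid, score: Callable[[Dict[str, Any]], float]) -> Dict[str, Any]:
    """Exhaustive search: highest-scoring configuration; ties keep the first combination."""
    return max(expand_grid(grid), key=score)


if __name__ == "__main__":
    for model, grid in GRIDS.items():
        print(model, grid_size(grid))
```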

Table A4. Optimal set of hyperparameters for three training datasets.

Model	Hyperparameters	Type1	Type2_HS	Type2_PL
RP	cp	0.001	0.005	0.001
	maxdepth	8	6	8
	minsplit	5	10	5
	F1-score	0.954	0.985	0.986
RF	mtry	4	2	7
	ntree	100	200	100
	maxnodes	30	30	30
	F1-score	0.957	0.987	0.987
XGBoost	nrounds	500	300	300
	max_depth	9	6	6
	eta	0.1	0.01	0.1
	gamma	0	1	5
	colsample_bytree	1.0	0.5	0.7
	subsample	0.8	1.0	1.0
	min_child_weight	1	1	1
	max_delta_step	3	1	0
	F1-score	0.965	0.987	0.988
SVM	method	“svmRadial”	“svmRadial”	“svmRadial”
	C	5	5	10
	degree	-	-	-
	scale	-	-	-
	sigma	1	0.1	0.1
	gamma	-	-	-
	coef0	-	-	-
	F1-score	0.968	0.987	0.988

References

1. Li, Z.; Fox, J.M. Mapping Rubber Tree Growth in Mainland Southeast Asia Using Time-Series MODIS 250 m NDVI and Statistical Data. Appl. Geogr. 2012, 32, 420–432.
2. Fox, J.; Castella, J.C. Expansion of Rubber (Hevea brasiliensis) in Mainland Southeast Asia: What Are the Prospects for Smallholders? J. Peasant Stud. 2013, 40, 155–170.
3. Giambelluca, T.W.; Mudd, R.G.; Liu, W.; Ziegler, A.D.; Kobayashi, N.; Kumagai, T.O.; Miyazawa, Y.; Lim, T.K.; Huang, M.; Fox, J.; et al. Evapotranspiration of Rubber (Hevea brasiliensis) Cultivated at Two Plantation Sites in Southeast Asia. Water Resour. Res. 2016, 52, 660–679.
4. Beckschäfer, P. Obtaining Rubber Plantation Age Information from Very Dense Landsat TM & ETM+ Time Series Data and Pixel-Based Image Compositing. Remote Sens. Environ. 2017, 196, 89–100.
5. Chen, B.; Cao, J.; Wang, J.; Wu, Z.; Tao, Z.; Chen, J.; Yang, C.; Xie, G. Estimation of Rubber Stand Age in Typhoon and Chilling Injury Afflicted Area with Landsat TM Data: A Case Study in Hainan Island, China. For. Ecol. Manag. 2012, 274, 222–230.
6. Chen, B.; Xiao, X.; Wu, Z.; Yun, T.; Kou, W.; Ye, H.; Lin, Q.; Doughty, R.; Dong, J.; Ma, J.; et al. Identifying Establishment Year and Pre-Conversion Land Cover of Rubber Plantations on Hainan Island, China Using Landsat Data During 1987–2015. Remote Sens. 2018, 10, 1240.
7. Chen, G.; Thill, J.-C.; Anantsuksomsri, S.; Tontisirin, N.; Tao, R. Stand Age Estimation of Rubber (Hevea brasiliensis) Plantations Using an Integrated Pixel- and Object-Based Tree Growth Model and Annual Landsat Time Series. ISPRS J. Photogramm. Remote Sens. 2018, 144, 94–104.
8. Razaq, M.; Huang, Q.; Wang, F.; Liu, C.; Gnanamoorthy, P.; Liu, C.; Tang, J. Carbon Stock Dynamics in Rubber Plantations Along an Elevational Gradient in Tropical China. Forests 2024, 15, 1933.
9. Dong, J.; Xiao, X.; Chen, B.; Torbick, N.; Jin, C.; Zhang, G.; Biradar, C. Mapping Deciduous Rubber Plantations through Integration of PALSAR and Multi-Temporal Landsat Imagery. Remote Sens. Environ. 2013, 134, 392–402.
10. Grogan, K.; Pflugmacher, D.; Hostert, P.; Kennedy, R.; Fensholt, R. Cross-Border Forest Disturbance and the Role of Natural Rubber in Mainland Southeast Asia Using Annual Landsat Time Series. Remote Sens. Environ. 2015, 169, 438–453.
11. Trisasongko, B.H. Mapping Stand Age of Rubber Plantation Using ALOS-2 Polarimetric SAR Data. Eur. J. Remote Sens. 2017, 50, 64–76.
12. Blaschke, T.; Hay, G.J.; Kelly, M.; Lang, S.; Hofmann, P.; Addink, E.; Feitosa, R.Q.; van der Meer, F.; van der Werff, H.; van Coillie, F.; et al. Geographic Object-Based Image Analysis—Towards a New Paradigm. ISPRS J. Photogramm. Remote Sens. 2014, 87, 180–191.
13. Chen, G.; Weng, Q.; Hay, G.J.; He, Y. Geographic Object-Based Image Analysis (GEOBIA): Emerging Trends and Future Opportunities. GISci. Remote Sens. 2018, 55, 159–182.
14. Chen, G.; Weng, Q. Special Issue: Remote Sensing of Our Changing Landscapes with Geographic Object-Based Image Analysis (GEOBIA). GISci. Remote Sens. 2018, 55, 155–158.
15. Dong, J.; Xiao, X.; Sheldon, S.; Biradar, C.; Xie, G. Mapping Tropical Forests and Rubber Plantations in Complex Landscapes by Integrating PALSAR and MODIS Imagery. ISPRS J. Photogramm. Remote Sens. 2012, 74, 20–33.
16. Fu, Y.; Tan, H.; Kou, W.; Xu, W.; Wang, H.; Lu, N. Estimation of Rubber Plantation Biomass Based on Variable Optimization from Sentinel-2 Remote Sensing Imagery. Forests 2024, 15, 900.
17. Zhang, C.; Huang, C.; Li, H.; Liu, Q.; Li, J.; Bridhikitti, A.; Liu, G. Effect of Textural Features in Remote Sensed Data on Rubber Plantation Extraction at Different Levels of Spatial Resolution. Forests 2020, 11, 399.
18. Somching, N.; Wongsai, S.; Wongsai, N.; Koedsin, W. Using Machine Learning Algorithm and Landsat Time Series to Identify Establishment Year of Para Rubber Plantations: A Case Study in Thalang District, Phuket Island, Thailand. Int. J. Remote Sens. 2020, 41, 9075–9100.
19. Chen, B.; Yun, T.; Ma, J.; Kou, W.; Li, H.; Yang, C.; Xiao, X.; Zhang, X.; Sun, R.; Xie, G.; et al. High-Precision Stand Age Data Facilitate the Estimation of Rubber Plantation Biomass: A Case Study of Hainan Island, China. Remote Sens. 2020, 12, 3853.
20. Kou, W.; Xiao, X.; Dong, J.; Gan, S.; Zhai, D.; Zhang, G.; Qin, Y.; Li, L. Mapping Deciduous Rubber Plantation Areas and Stand Ages with PALSAR and Landsat Images. Remote Sens. 2015, 7, 1048–1073.
21. Pratama, L.D.Y.; Danoedoro, P. Above-Ground Carbon Stock Estimates of Rubber (Hevea brasiliensis) Using Sentinel-2A Imagery: A Case Study in Rubber Plantation of PTPN IX Kebun Getas and Kebun Ngobo, Semarang Regency. In IOP Conference Series: Earth and Environmental Science, Proceedings of the Fifth International Conferences of Indonesian Society for Remote Sensing, West Java, Indonesia, 17–20 September 2019; IOP Publishing Ltd.: Bristol, England, 2020; Volume 500, p. 012087.
22. Yasen, K.; Koedsin, W. Estimating Aboveground Biomass of Rubber Tree Using Remote Sensing in Phuket Province, Thailand. J. Med. Bioeng. 2015, 4, 451–456.
23. Simien, A.; Penot, E. Current Evolution of Smallholder Rubber-Based Farming Systems in Southern Thailand. J. Sustain. For. 2011, 30, 247–260.
24. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.P.; Gao, X.; Ferreira, L.G. Overview of the Radiometric and Biophysical Performance of the MODIS Vegetation Indices. Remote Sens. Environ. 2002, 83, 195–213.
25. Diek, S.; Fornallaz, F.; Schaepman, M.E.; de Jong, R. Barest Pixel Composite for Agricultural Areas Using Landsat Time Series. Remote Sens. 2017, 9, 1245.
26. Dong, J.; Xiao, X.; Sheldon, S.; Biradar, C.; Duong, N.D.; Hazarika, M. A Comparison of Forest Cover Maps in Mainland Southeast Asia from Multiple Sources: PALSAR, MERIS, MODIS and FRA. Remote Sens. Environ. 2012, 127, 60–73.
27. Chicco, D.; Jurman, G. The Advantages of the Matthews Correlation Coefficient (MCC) over F1 Score and Accuracy in Binary Classification Evaluation. BMC Genom. 2020, 21, 6.
28. Powers, D.M.W. What the F-Measure Doesn’t Measure: Features, Flaws, Fallacies and Fixes. arXiv 2015, arXiv:1503.06410.
29. Chicco, D.; Warrens, M.J.; Jurman, G. The Matthews Correlation Coefficient (MCC) is More Informative Than Cohen’s Kappa and Brier Score in Binary Classification Assessment. IEEE Access 2021, 9, 78368–78381.
30. Halkidi, M.; Batistakis, Y.; Vazirgiannis, M. Cluster Validity Methods: Part I. ACM SIGMOD Rec. 2002, 31, 40–45.
31. Wainer, J.; Fonseca, P. How to tune the RBF SVM hyperparameters? An empirical evaluation of 18 search algorithms. Artif. Intell. Rev. 2021, 54, 4771–4797.
32. Liang, Z.; Liu, N. Efficient Feature Scaling for Support Vector Machines with a Quadratic Kernel. Neural Process. Lett. 2014, 39, 235–246.
33. Roy, D.P.; Li, Z.; Zhang, H.K.; Huang, H. A Conterminous United States Analysis of the Impact of Landsat 5 Orbit Drift on the Temporal Consistency of Landsat 5 Thematic Mapper Data. Remote Sens. Environ. 2020, 240, 111701.
34. White, J.C.; Wulder, M.A.; Hobart, G.W.; Luther, J.E.; Hermosilla, T.; Griffiths, P.; Coops, N.C. Pixel-Based Image Compositing for Large-Area Dense Time Series Applications and the Normalization of Forest Change Products. Remote Sens. Environ. 2017, 194, 314–335.
35. Pflugmacher, D.; Cohen, W.B.; Kennedy, R.E. Using Landsat-Derived Disturbance and Recovery History and LiDAR to Map Forest Biomass Dynamics. Remote Sens. Environ. 2012, 122, 60–73.
36. Zhu, Z.; Woodcock, C.E. Object-Based Cloud and Cloud Shadow Detection in Landsat Imagery. Remote Sens. Environ. 2014, 152, 120–134.
37. Lopes, M.; Frison, P.; Crowson, M.; Warren-Thomas, E.; Hariyadi, B.; Kartika, W.D.; Agus, F.; Hamer, K.C.; Stringer, L.; Hill, J.K.; et al. Improving the Accuracy of Land Cover Classification in Cloud Persistent Areas Using Optical and Radar Satellite Image Time Series. Methods Ecol. Evol. 2020, 11, 532–541.
38. Man, C.D.; Nguyen, T.T.; Bui, H.Q.; Lasko, K.; Nguyen, T.N.T. Improvement of Land-Cover Classification over Frequently Cloud-Covered Areas Using Landsat 8 Time-Series Composites and an Ensemble of Supervised Classifiers. Int. J. Remote Sens. 2018, 39, 1243–1255.
39. Shrestha, D.P.; Saepuloh, A.; van der Meer, F. Land Cover Classification in the Tropics: Solving the Problem of Cloud-Covered Areas Using Topographic Parameters. Int. J. Appl. Earth Obs. Geoinf. 2019, 77, 84–93.
40. Promme, P.; Kuwornu, J.K.M.; Jourdain, D.; Shivakoti, G.P.; Soni, P. Factors Influencing Rubber Marketing by Smallholder Farmers in Thailand. Dev. Pract. 2017, 27, 865–879.
41. Fan, S.; Chan-Kang, C. Is Small Beautiful? Farm Size, Productivity, and Poverty in Asian Agriculture. Agric. Econ. 2005, 32, 135–146.
42. Kwanmuang, K. Succession Decisions and Inherited Land Size: An Evidence of Family Farms in Nakhon Si Thammarat Province, Thailand. Asian J. Appl. Econ. 2018, 25, 70–88.
43. Kwanmuang, K.; Pongputhinan, T.; Jabri, A.; Chitchumnung, P. Small-Scale Farmers Under Thailand’s Smart Farming System. FFTC Agric. Policy Platf. (FFTC-AP) 2020, 636, 2647.
44. Nguyen, C.T.; Chidthaisong, A.; Kieu Diem, P.; Huo, L.-Z. A Modified Bare Soil Index to Identify Bare Land Features during Agricultural Fallow-Period in Southeast Asia Using Landsat 8. Land 2021, 10, 231.
45. Liu, Y.; Meng, Q.; Zhang, L.; Wu, C. NDBSI: A normalized difference bare soil index for remote sensing to improve bare soil mapping accuracy in urban and rural areas. CATENA 2022, 214, 106265.
46. García, O. Dynamical implications of the variability representation in site-index modelling. Eur. J. For. Res. 2011, 130, 671–675.
47. Skovsgaard, J.P.; Vanclay, J.K. Forest site productivity: A review of the evolution of dendrometric concepts for even-aged stands. Forestry 2008, 81, 13–31.

Figure 1. (a) Map of Thailand and (b) rubber plantation area in Surat Thani province, including modeling region of interest (ROI) grouped by landscape characteristics.

Figure 2. Framework for mapping the yearly stand age of rubber plantations. The blue parallelograms indicate secondary data inputs, the green parallelograms indicate output data from intermediate processes, and the orange parallelogram represents the final output spatial database.

Figure 3. Interannual distribution summaries of NDVI and BSI values for an example rubber plantation in a plainland (PL) area. The lines connect yearly distribution summary values of (a) NDVI and (b) BSI. (c) Interannual differences in Q1 NDVI and Q3 BSI between year T₀ and the years from T₋₁ to T₊₆.

Figure 4. Interannual third-quartile BSI values from years T₋₂ to T₊₆ for 50 samples of rubber plantations in (left) plainland (PL) and (right) hillside (HS) cultivation areas. The red line connects the medians of the boxplots, representing changes in annual BSI over the 9-year window. Circular dots indicate outliers in the BSI value distributions.

Figure 5. Top row: examples of (a) maximum, (b) Q3, (c) median, (d) Q1, and (e) minimum BSI images in 2004. Bottom row: (f–j) differences in the five BSI distribution summary images between 2003 and 2004.

Figure 6. F1-score, Kappa coefficient, and MCC of the best-performing model on the Type1, Type2_HS, and Type2_PL datasets. Red columns represent performance on the training dataset, and blue columns represent performance on the testing dataset. Green lines represent the training duration of the model, and the dashed line indicates the maximum values of the evaluation metrics.
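For binary land-clearance labels, the three metrics in Figure 6 can be computed from the confusion-matrix counts (tp, fp, fn, tn) as sketched below. The formulas are the standard definitions; returning 0.0 for F1 or MCC when a denominator is zero is a common convention assumed here, not one stated in the article.

```python
# Sketch: F1-score, Cohen's kappa, and Matthews correlation coefficient (MCC)
# from binary confusion-matrix counts. Zero-denominator handling is an assumed convention.
import math


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("empty confusion matrix")
    p_o = (tp + tn) / n
    # chance agreement from the predicted and actual class marginals
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else math.nan


def mcc(tp: int, fp: int, fn: int, tn: int) -> float:
    """(TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    counts = dict(tp=90, fp=10, fn=5, tn=95)  # hypothetical test-set counts
    print(f"F1 = {f1_score(counts['tp'], counts['fp'], counts['fn']):.3f}")
    print(f"kappa = {cohen_kappa(**counts):.3f}, MCC = {mcc(**counts):.3f}")
```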

Figure 7. (a) Map of rubber plantation age in Surat Thani province as of 2024, where white areas indicate non-rubber cultivation zones and shades of green represent plantation age, calculated from the predicted year of land clearance (T₀) to 2024. In the enlarged sections of the stand-age map (b–d), white areas indicate non-rubber cultivation regions, black areas represent rubber plantations whose T₀ could not be predicted within the 37-year study period (labeled “NA”), and red lines denote the A302 polygons from the LULC dataset.

Figure 8. Accuracy assessment results of predicted T₀: (a) bubble plot of actual versus predicted T₀, (b) error bars showing the mean predicted T₀ for each actual T₀ year, and (c) average Intersection over Union (IoU) for each plantation, grouped by area interval (ha), with error bars representing the standard deviation. The red diamonds connected by a line indicate the number of plantations in each plantation area interval. The “NA” label indicates the number of underestimated prediction cases.
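The per-plantation IoU in Figure 8c compares predicted and reference plantation extents. A minimal sketch, assuming both extents are represented as sets of (row, column) pixel indices and that area intervals are given as sorted interval boundaries in hectares (hypothetical values in the example); the population standard deviation used for the spread is also an assumption.

```python
# Sketch: per-plantation IoU and its mean/standard deviation by area interval.
# Assumptions: extents are sets of (row, col) pixel indices; interval boundaries
# (in ha) are hypothetical; population standard deviation is used.
import bisect
import statistics
from typing import Dict, Iterable, List, Sequence, Set, Tuple

Pixel = Tuple[int, int]


def iou(predicted: Set[Pixel], reference: Set[Pixel]) -> float:
    """len(predicted & reference) / len(predicted | reference)."""
    union = predicted | reference
    if not union:
        raise ValueError("both extents are empty")
    return len(predicted & reference) / len(union)


def iou_by_area_class(records: Iterable[Tuple[float, float]],
                      edges: Sequence[float]) -> Dict[int, Tuple[float, float, int]]:
    """Group (area_ha, iou) records by the intervals defined by sorted `edges`.

    Interval k holds areas with edges[k-1] <= area < edges[k] (open-ended at both
    extremes); returns {k: (mean IoU, standard deviation, number of plantations)}.
    """
    groups: Dict[int, List[float]] = {}
    for area_ha, value in records:
        groups.setdefault(bisect.bisect_right(edges, area_ha), []).append(value)
    return {k: (statistics.fmean(v), statistics.pstdev(v), len(v))
            for k, v in sorted(groups.items())}


if __name__ == "__main__":
    pred = {(0, 0), (0, 1), (1, 0), (1, 1)}
    ref = {(0, 1), (1, 1), (1, 2)}
    print(iou(pred, ref))  # 2 shared pixels out of 5 in the union
```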

Figure 9. Examples of BSI time series illustrating the impact of pixel value availability and land clearance timing on prediction accuracy: (a) limited availability of pixel values within a year, and (b) land clearance occurring near the end of the annual period. Red, blue, and black dots indicate the distribution of BSI values extracted from Landsat 4–5, Landsat 7, and Landsat 8 imagery, respectively.

Figure 10. Interannual BSI variation and trends from six years before to six years after T₀ in rubber plantations established on different pre-conversion land uses: (a) active paddy fields and (b) abandoned paddy fields. Black dots indicate the distribution of BSI values within each annual period.

Table 1. Summary of prediction accuracy results.

Cultivation Area	Plainland	Hillside	Total
Num. of POI	9251	949	10,200
Correct	7837 (84.7%)	782 (82.4%)	8619 (84.5%)
Incorrect	1058 (11.4%)	118 (12.4%)	1176 (11.5%)
Underestimate (“NA”)	356 (3.8%)	49 (5.2%)	405 (4.0%)
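The totals and percentages in Table 1 follow directly from the plainland and hillside rows; the small sketch below recomputes them (for example, 1058 + 118 = 1176 incorrect predictions in total, or 11.5% of 10,200 POIs). Percentages are rounded to one decimal place, as in the table.

```python
# Sketch: recompute the Total column and the percentages of Table 1 from the
# plainland and hillside counts.
from typing import Dict

Row = Dict[str, int]

TABLE1: Dict[str, Row] = {
    "Plainland": {"Correct": 7837, "Incorrect": 1058, "Underestimate (NA)": 356},
    "Hillside": {"Correct": 782, "Incorrect": 118, "Underestimate (NA)": 49},
}


def with_total(table: Dict[str, Row]) -> Dict[str, Row]:
    """Add a 'Total' entry that sums each outcome across cultivation areas."""
    total: Row = {}
    for row in table.values():
        for outcome, n in row.items():
            total[outcome] = total.get(outcome, 0) + n
    return {**table, "Total": total}


def shares(row: Row) -> Dict[str, float]:
    """Percentage of each outcome among all POIs in the row, to one decimal place."""
    n = sum(row.values())
    return {outcome: round(100 * k / n, 1) for outcome, k in row.items()}


if __name__ == "__main__":
    for area, row in with_total(TABLE1).items():
        print(area, sum(row.values()), row, shares(row))
```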

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

