1. Introduction
Southeast Asia is one of the most rapidly deforesting regions in the world [1]. Deforestation in the region is well documented, as are its drivers, which are mostly various tree cash crops [2]. It is estimated that, by 2012, Indonesia lost up to 83% more primary forest than the Amazon region [3]. A major driver in the region is oil palm, which is still expanding at a high rate due to favorable climate and policy conditions [4]. While the majority of this expansion is large scale (i.e., industrial) and driven by companies, local farmers (i.e., smallholders) are also expanding their plantations [5]. The global extent of industrial-scale plantations was recently estimated at 18.7 million ha (Mha) [5], but the extent of smaller-scale plantations remains largely unknown. Mosnier et al. [6] report that, as large-scale oil palm plantation developers increasingly comply with sustainability standards, the area cultivated by small-scale producers will likely increase. Smallholders have limited financial means to clear and replant old plantations, which are generally replanted after 25 years of production. There is therefore a risk that smallholders abandon their plantations after one production cycle and eventually establish new plantations in other areas, thereby producing a new wave of deforestation, although this is not yet happening [7].
The extent of industrial-scale plantations in Southeast Asia, particularly in Indonesia and Malaysia, has been the focus of several studies that relied on medium-resolution (10–30 m/pixel) satellite images. A multitude of methods with various levels of automation have been used to map industrial plantations [8,9,10,11,12]. These studies use radar satellite data (L-band in ALOS PALSAR and C-band in Sentinel-1) as the main source for the classification of oil palm plantations. Closed oil palm stands present a characteristic radar backscatter, which entails a high separability from other tropical plantations [8]. Concretely, oil palm plantations show a higher backscatter than other vegetation types in the dual cross-polarization bands; in addition, the difference between the single co-polarization bands and the dual cross-polarization bands is large. More recently, such analyses have moved to cloud processing platforms that hold various freely available satellite datasets, in an attempt to fully automate mapping [9,10].
To date, several studies have attempted to map oil palm plantations without discriminating plantations based on their type (industrial versus smallholder) and age. One study has attempted to discriminate industrial and smallholder oil palm plantations [11], but its study area was small (<50 km²) and covered only oil palm plantations in peatlands, so its results cannot be generalized to a regional scale such as the entire island of Sumatra. Due to the difficulty in detecting young oil palm plantations, most of these studies have mapped mature (>3 years, but potentially >8 years in some studies [12]) oil palm stands, apart from a recent work that discriminated both classes [13]. As a result of these studies, the global extent of industrial plantations is relatively well known, although this knowledge rests on a patchwork of many studies with different methodologies. Consequently, there is no standardized global map of industrial-scale mature oil palm plantations [5]. The extent of smallholder plantations is, furthermore, poorly quantified, and estimates of the proportion of smallholder plantations vary extensively between countries, with Nigeria at the high end (~94%) and areas in Indonesia and Malaysia containing ~40% [5].
Hence, several challenges must be overcome before attaining a globally automated map of industrial and smallholder plantations on which young and mature stands are mapped at frequent intervals (e.g., one year). In this study, we aim to show the suitability of radar (Sentinel-1) and optical (Sentinel-2) satellite data for the automated detection of oil palms and for discriminating industrial and smallholder plantations. To this end, we collected a sample dataset in Riau province (Indonesia) and assessed the performance of Sentinel-1 and Sentinel-2 for the mapping of oil palm plantations. The classification model was implemented in a large-scale cloud processing system [14]. Such an automated method of detecting industrial and smallholder plantations could potentially lead to a global oil palm plantation map. It could be adopted by certifying bodies, such as the Roundtable on Sustainable Palm Oil (RSPO), and could additionally help to create a transparent and free mapping tool for all of the involved stakeholders. It would also inform the ongoing debate about how best to meet future global vegetable oil demand, and which crops to use, given their respective yields and social and environmental impacts [15].
3. Results
The performance of the models for the different classification setups is shown in Figure 5. Random Forest was the most accurate model in the three setups (kappa = 92.8%, 93.1%, and 95.8%), followed by Support Vector Machine in setups I and II (kappa = 89.8% and 89.5%) and k-NN in setup III (kappa = 95.6%). Except for Naive Bayes and Minimum Distance in setups I and II, the models show a similar performance, with a kappa above 80% in the three setups. The kappa coefficient saturated at around 15 features in setup I, 10 features in setup II, and 5 features in setup III.
Figure 6 shows the 26 most relevant features picked by the sequential feature selection, the permutation analysis, and the Gini coefficient. The three methods selected features derived from both Sentinel-1 and Sentinel-2 (Sentinel-1/Sentinel-2 features: 9/17 in sequential feature selection, 12/14 in Gini coefficient, and 10/16 in permutation analysis). The three methods tended to select only extracted features, although the median features Bi_smooth_ksized are highly correlated with the original spectral bands.
The set of features that showed the highest kappa with the lowest number of features was selected. The kappa coefficients obtained with the 26 most relevant features in each method are 87.3 ± 1.7% for the sequential feature selection, 84.5 ± 1.3% for the Gini coefficient, and 84.8 ± 2.0% for the permutation analysis. The sequential feature selection shows the highest kappa coefficient, which already saturates at about 15 features. We therefore selected the first 15 features of the sequential feature selection, which deliver a high performance (kappa = 87.2 ± 2.2%) with fewer features than the other methods.
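The selection logic described above can be illustrated with a minimal sketch of greedy forward selection followed by a saturation check. It is not the code used in the study: the feature names, the `toy_score` function, and the tolerance are hypothetical placeholders, and in practice the score would be a cross-validated kappa of the classifier.

```python
from typing import Callable, List, Sequence, Tuple


def forward_select(
    features: Sequence[str],
    score: Callable[[List[str]], float],
    max_features: int,
) -> Tuple[List[str], List[float]]:
    """Greedy sequential forward selection.

    At each step, add the candidate feature that maximizes `score`
    when combined with the features selected so far.
    Returns the selection order and the score after each addition.
    """
    selected: List[str] = []
    history: List[float] = []
    remaining = list(features)
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda f: score(selected + [f]))
        selected.append(best)
        remaining.remove(best)
        history.append(score(selected))
    return selected, history


def saturation_point(history: Sequence[float], tolerance: float) -> int:
    """Smallest number of features whose score is within `tolerance` of the best."""
    best = max(history)
    for k, value in enumerate(history, start=1):
        if best - value <= tolerance:
            return k
    return len(history)


# Hypothetical per-feature gains standing in for a cross-validated kappa.
TOY_GAINS = {"feat_a": 0.50, "feat_b": 0.30, "feat_c": 0.05, "feat_d": 0.01}


def toy_score(subset: List[str]) -> float:
    """Toy additive score (assumption: real scores come from model evaluation)."""
    return sum(TOY_GAINS[f] for f in subset)


if __name__ == "__main__":
    order, scores = forward_select(list(TOY_GAINS), toy_score, max_features=4)
    k = saturation_point(scores, tolerance=0.02)
    print("selection order:", order)
    print("features kept:", order[:k])
```

Keeping only the features up to the saturation point mirrors the choice of the smallest subset that already reaches the plateau of the kappa curve.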
The accuracy assessment shows the added value of combining Sentinel-1 and Sentinel-2 data for the classification of oil palm plantations. Table 2 shows the overall accuracy and kappa coefficient for the classification models trained with Sentinel-1 features, Sentinel-2 features, the 15 selected features of the sequential feature selection (Figure 6), and with Sentinel-1 and Sentinel-2 original bands. The classification models trained with the selected features show the best performance in all classification setups. The models trained with the selection of optimal features excel particularly for the 3-class (OA = 92.6% and kappa = 88.6%) and the 5-class models (OA = 90.2% and kappa = 87.2%), while the performance of the 2-class models is similar to the results obtained with the other configurations of input variables.
The confusion matrices of the best models (Figure 7), along with the user’s (UA) and producer’s (PA) accuracies, show a high thematic accuracy for all the classes (UA and PA > 85%) in the three classification setups, apart from class 4 (Smallholder young oil palm) in setup I, which shows a low producer’s accuracy (PA = 64.5%). Although the per-class accuracies are high, the industrial and smallholder mature plantations present higher confusion compared to the other classes in setups I and II. We also observe that the models could distinguish the oil palm classes, regardless of age or typology, from other land uses; the class Other land uses presents a UA and PA above 93% in the three setups.
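For readers less familiar with these measures, the sketch below shows how overall accuracy, Cohen's kappa, and the user's and producer's accuracies are derived from a confusion matrix. The matrix layout (rows as reference classes, columns as predicted classes) and the example counts are assumptions made for illustration; they are not the values of Figure 7.

```python
import math
from typing import Dict, List


def accuracy_metrics(matrix: List[List[int]]) -> Dict[str, object]:
    """Compute OA, kappa, UA, and PA from a square confusion matrix.

    Assumed layout: matrix[i][j] is the number of samples whose
    reference class is i and whose predicted class is j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]                              # reference totals
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]   # predicted totals
    diagonal = [matrix[i][i] for i in range(n)]

    overall = sum(diagonal) / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (overall - expected) / (1 - expected)

    # Producer's accuracy: correct samples / reference samples of the class.
    pa = [d / r if r else math.nan for d, r in zip(diagonal, row_totals)]
    # User's accuracy: correct samples / samples predicted as the class.
    ua = [d / c if c else math.nan for d, c in zip(diagonal, col_totals)]
    return {"OA": overall, "kappa": kappa, "PA": pa, "UA": ua}


if __name__ == "__main__":
    # Hypothetical 2-class example (oil palm vs. other land uses).
    example = [[40, 10], [5, 45]]
    for name, value in accuracy_metrics(example).items():
        print(name, value)
```

A low producer's accuracy for a class, as reported for smallholder young oil palm, means that many reference samples of that class were assigned to other classes (omission errors).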
McNemar’s test [24] was applied to assess whether the accuracy of the best model (the model trained with the selected features) is significantly higher than that of the other models in each classification setup, at a significance level of 1%. The null hypothesis was rejected in all cases except classification setup III trained with Sentinel-1 and Sentinel-2 without added features (p-value = 0.747). This means that, when detecting oil palm trees, the proportion of errors of the model trained with the original Sentinel-1 and Sentinel-2 bands is not significantly higher than that of the model trained with the selected features.
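As an illustration of the test, the following sketch computes McNemar's statistic from the per-sample correctness of two models evaluated on the same test points. It assumes the chi-square form with continuity correction; the text does not state which variant was used, and the function name and inputs are hypothetical.

```python
import math
from typing import Sequence, Tuple


def mcnemar_test(correct_a: Sequence[bool], correct_b: Sequence[bool]) -> Tuple[float, float]:
    """McNemar's test with continuity correction (assumed variant).

    correct_a[i] and correct_b[i] state whether models A and B classified
    test sample i correctly. Only discordant pairs enter the statistic.
    Returns (chi-square statistic, p-value) for 1 degree of freedom.
    """
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be evaluated on the same samples")
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    discordant = only_a + only_b
    if discordant == 0:
        return 0.0, 1.0  # the two models never disagree
    statistic = max(abs(only_a - only_b) - 1, 0) ** 2 / discordant
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical outcome: A alone is right 15 times, B alone 5 times.
    a = [True] * 15 + [False] * 5 + [True] * 80
    b = [False] * 15 + [True] * 5 + [True] * 80
    stat, p = mcnemar_test(a, b)
    print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```

A large p-value, such as the 0.747 reported for setup III, indicates that the two models disagree about equally often in each direction, so the null hypothesis of equal error proportions cannot be rejected.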
The post-classification step improved the appearance of the classification. Based on a visual comparison with Sentinel-1 and Sentinel-2 composites (Figure 8a,b), we confirmed that the step corrected the major issues of the classification. Figure 8 illustrates the classification before (Figure 8c) and after (Figure 8d) the post-classification step for the setup I model trained with the optimal features. Despite these visual improvements, the accuracy increased only marginally after the post-classification step, by 1.2%.
Figure 9 shows the map of oil palm plantations in Riau province for classification setup I, using the set of optimal features from Sentinel-1 and Sentinel-2. The total area of oil palm is 31,020 km², which represents 36.8% of the land surface of Riau on the Sumatran mainland. Of the total surface of oil palm plantations, 37.8% is industrial mature, 12.3% is industrial young, 42.0% is smallholder mature, and 7.9% is smallholder young. Thus, smallholders account for 49.9% of all oil palm plantations.
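The class shares above translate into approximate areas as follows. This is a simple arithmetic check using only the figures reported in the text; the variable names are illustrative.

```python
# Figures reported in the text for classification setup I.
TOTAL_OIL_PALM_KM2 = 31_020
CLASS_SHARES_PCT = {
    "industrial mature": 37.8,
    "industrial young": 12.3,
    "smallholder mature": 42.0,
    "smallholder young": 7.9,
}

# Area of each class in km2, derived from its share of the total.
class_areas_km2 = {
    name: TOTAL_OIL_PALM_KM2 * share / 100 for name, share in CLASS_SHARES_PCT.items()
}

# Smallholder share is the sum of the two smallholder classes.
smallholder_share_pct = sum(
    share for name, share in CLASS_SHARES_PCT.items() if name.startswith("smallholder")
)
smallholder_area_km2 = TOTAL_OIL_PALM_KM2 * smallholder_share_pct / 100

if __name__ == "__main__":
    for name, area in class_areas_km2.items():
        print(f"{name}: {area:,.0f} km2")
    print(f"smallholder share: {smallholder_share_pct:.1f}%")
    print(f"smallholder area: {smallholder_area_km2:,.0f} km2")
```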
4. Discussion
The study aimed to show the feasibility of distinguishing smallholder and industrial oil palm plantations with satellite images. The importance of optical and SAR data in the classification of oil palm plantations was analyzed. The results showed that a combined use of Sentinel-1 and Sentinel-2, with a set of optimal additional features, led to the highest accuracy when classifying smallholder and large plantations. The code that generates the results of this study is available at [25]. To our knowledge, this is the first study that aimed to distinguish smallholder and industrial oil palm plantations using state-of-the-art satellite remote sensing data. The fusion of Sentinel-1 and Sentinel-2 bands is also unprecedented for this topic and is a further step compared to previous studies that used ALOS PALSAR or only Sentinel-1 scenes.
The high accuracy obtained using only Sentinel-1 (OA = 93.5% and kappa = 86.9%) for oil palm classification, without distinguishing between smallholder and industrial plantations, is comparable to the user’s accuracy of 95.6% obtained in a recent study [26] for an oil palm class. The results of the Riau case study confirmed the usefulness of SAR data for mature oil palm mapping [8]. The detection of oil palms, without distinction of typology, can still be improved with optical imagery (Sentinel-2). The results also show that feature extraction is not necessary when detecting mature oil palm trees. This is particularly relevant for further studies that aim at a coarse detection of oil palm plantations at a regional scale, without distinction of typology and without computationally expensive algorithms.
The characteristic canopy of oil palm plantations might explain the high relevance of Sentinel-1 in the models. The shape of palm-like trees produces a characteristically high backscatter response in the cross-polarized VH band. Evidence of the importance of Sentinel-1 is the high relevance of the VH band and its derived features in the three feature selection methods. Despite its good results in mature oil palm mapping, Sentinel-1 alone cannot distinguish smallholder and industrial plantations. Such a distinction requires additional features derived from Sentinel-1 and Sentinel-2 images, which are effective in capturing the shape and density of the harvesting trail networks in industrial oil palm plantations.
The results do not corroborate the study of Oon and colleagues [11], which concluded that the sole use of Sentinel-1 dual bands can distinguish smallholder and industrial plantations in peatlands. The findings of the Riau case study suggest that Sentinel-1 bands alone yield a comparatively low accuracy (OA = 82.6% and kappa = 73.4%) for such classification problems. One possible explanation is that the conclusions of Oon et al. are based on a small number of training and testing points (98 points), so that their model may overfit to its small study area (<50 km²). In contrast, the present study uses a larger training dataset (3,448 points) that covers an extensive heterogeneous area across Riau province. The results obtained with this dataset indicate that the distinction between smallholder and industrial plantations is a challenging problem that requires additional features, based on textural analysis, that capture distinctive contextual information of the oil palm plantations.
The study also highlights the usefulness of cloud computing for regional and global environmental studies that make use of large satellite datasets and require high processing speed. In our study, the algorithm uses 217 daily Sentinel-1 images and 827 daily Sentinel-2 images. The 10-meter resolution oil palm map of Riau province, which covers an area of 84,360 km², was processed in the cloud platform in about 11 hours. This processing includes the image compositing of Sentinel-1 and Sentinel-2, the extraction of the selected features, the training of the Random Forest, and the image classification. The code and algorithms written in Google Earth Engine (GEE) can be easily shared and run among different users. For instance, a demo code is available for oil palm mapping [27] and for the visualization of the results of setup I [28] in GEE. The shareability of code can be useful not only for code development, but also for obtaining quick results for environmental and land use monitoring. For instance, in our case study, the classification in Riau province revealed an unexpectedly high ratio of smallholders, which represent 49.9% of all oil palm plantations in Riau province in 2018.
The model comparison emphasized the usefulness of Random Forest (RF) for fast modeling and image classification, even in cloud-based platforms such as GEE. Random Forest delivered higher accuracy than the other supervised classification models that are implemented in GEE. Although Support Vector Machine (SVM) and other kernel-based classification methods may excel at drawing the decision boundary between classes, their use requires expert knowledge of data pre-processing and hyperparameter tuning in the training stage. In contrast, RF is a fast and easy-to-use classification model that requires less parameterization to control overfitting and, thus, can be used more broadly by the scientific community or industry.
The algorithm we used showed good performance, particularly for distinguishing smallholder and industrial plantations in Riau province. However, the level of performance might differ in other parts of the world, particularly for a classification setup that considers the typology and age of the plantations. Industrial oil palm plantations in other regions might present a different layout and trail network, which would lead to a different set of relevant features for the classification model. Moreover, other palm-like plantations, such as sago plantations in Papua, might be misclassified as oil palm. The similar canopy of oil palm and other palm-like species entails a similar backscatter response, which in turn leads to a high confusion between oil palm and other palm-like plantations. Therefore, we recommend that the classification model be trained again with a sample dataset collected around the new study area. The last shortcoming we observed is the high confusion between young palm trees and bare soil, which is also reported in [29]. The classification of young (<3-year-old) plantations is very challenging due to the low canopy coverage of young palm trees. Young plantations appear unplanted when viewed from space, even with 10-meter optical imagery. Convolutional Neural Networks have recently proven useful in similar remote sensing studies [30,31], and the more extensive use of contextual information in deep learning may lead to a more accurate classification of young and mature oil palm plantations. Future studies should address these limitations and opportunities to accurately map oil palm plantations at a global scale.
5. Conclusions
The present study aimed to show the suitability of optical and SAR satellite data to classify oil palm plantations. The results showed that the combined use of Sentinel-1 co-polarization bands and Sentinel-2 spectral bands allows the detection of oil palm stands. The detection of mature oil palm stands is possible thanks to the characteristic backscatter response of the palm canopy. However, the raw Sentinel-1 and Sentinel-2 images are not sufficient for a classification problem that aims to distinguish between smallholder and industrial oil palm plantations, or between young and mature plantations. Such a classification setup requires a set of additional relevant features, such as texture and convolutional operations, that capture the spatial patterns that are characteristic of industrial plantations. The most significant pattern in industrial plantations is the dense trail network, which tends to be non-existent in smallholder plantations.
The study was carried out with Google Earth Engine, a cloud-based platform that allows the rapid classification of Sentinel-1 and Sentinel-2 data for a given study area. The shareability of our algorithm and the possibility of running a trained classification model anywhere in the world make GEE (or other cloud-based processing systems) a suitable tool for environmental monitoring. Researchers and environmentalists can easily reproduce the classification model for mapping oil palm trees over large regions and detect new plantations in sensitive and protected areas. It is our aim that this study will eventually lead to an automated and standardized global map of oil palm and its various categories that can be used by the various parties involved in palm oil, such as the Roundtable on Sustainable Palm Oil (RSPO), governments, NGOs, companies, and other stakeholders.