Mapping the Distribution of Co ﬀ ee Plantations from Multi-Resolution, Multi-Temporal, and Multi-Sensor Data Using a Random Forest Algorithm

: Indonesia is the world’s fourth largest co ﬀ ee producer. Co ﬀ ee plantations cover 1.2 million ha of the country with a production of 500 kg / ha. However, information regarding the distribution of co ﬀ ee plantations in Indonesia is limited. This study aimed to assess the accuracy of classiﬁcation model and determine its important variables for mapping co ﬀ ee plantations. The model obtained 29 variables which derived from the integration of multi-resolution, multi-temporal, and multi-sensor remote sensing data, namely, pan-sharpened GeoEye-1, multi-temporal Sentinel 2, and DEMNAS. Applying a random forest algorithm (tree = 1000, mtry = all variables, minimum node size: 6), this model achieved overall accuracy, kappa statistics, producer accuracy, and user accuracy of 79.333%, 0.774, 92.000%, and 90.790%, respectively. In addition, 12 most important variables achieved overall accuracy, kappa statistics, producer accuracy, and user accuracy 79.333%, 0.774, 91.333%, and 84.570%, respectively. Our results indicate that random forest algorithm is e ﬃ cient in mapping co ﬀ ee plantations in an agroforestry system.


Introduction
Coffee is one of the most important trade commodities in the world. Coffee trees were first found in the mountains of Ethiopia in the 9th century. They were planted in Indonesia in 1699 by the Dutch Vereenigde Oostindische Compagnie (VOC). The planting patterns that have been developed by coffee farmers can generally be classified into two categories: monoculture (mono-cropping) and agroforestry [1]. In Indonesia, coffee-based agroforestry is a well-developed model proficient in supporting social, ecological, and economic interests. According to the International Coffee Organization [2], Indonesia is the fourth largest coffee producer in the world, even though Indonesia stands out as having the second-largest amount of land devoted to coffee plantations, after Brazil [3]. Nevertheless, coffee production in Indonesia is relatively low: 500 kg/ha, with a total area of 1.2 million ha. This is in strong contrast to Vietnam which produces 2.7 million tons of coffee per ha, while having a total area of 630,000 ha [4]. Indonesia has a great potential for becoming the largest coffee producer in the world if the country were able to increase its production efficiency. This is supported by high market demand and the availability of land that is highly suited for coffee development. Therefore, the mapping of coffee plantations is essential for identifying the centers of coffee production. Remote sensing technology has been widely proven to be a cost-effective, fast, efficient method that is essential in monitoring and mapping vegetation [5,6].
Several techniques for mapping coffee plantations were previously used by researchers. These include: supervised pixel-based classification [7,8], object-based classification [9,10], hybrid classification [11], use of spectral variables [12], topographic variables [7,13,14], tasseled cap, and the correlation of NDVI and precipitation [14]. Generally, the aforementioned research uses the classification technique. Classification algorithms can be divided into two models: parametric and nonparametric [15]. A parametric algorithm requires statistical parameters derived from normally distributed training data; whereas, a nonparametric algorithm does not require statistical parameters. Most researchers have been using remote sensing data to map coffee plantations to improve accuracy; in general, they have used Maximum Likelihood (ML) as a parametric algorithm [7,12]. The ML algorithm groups pixels based on the probability of a class in normal distribution [16]. However, the normal distribution assumption is often ignored, especially in complex areas (high spectral heterogeneity). This parametric algorithm is also hindered by difficulties in integrating spectral variables with other variables [17].
In the classification process, nonparametric algorithms performed well [17]. The random forest algorithm is one of the most popular nonparametric algorithms and provides high accuracy results [18][19][20][21]. This algorithm is also used to rank variables [22]. Several recent studies have explored the application of random forest algorithms in mapping land cover [20,21,23], invasive plant species [24], landslide areas [25], bamboo patches [26] and larch trees [27]. Kelley et al. [14] used a random forest algorithm from integrating multi-temporal tasseled cap derived from Landsat 8 imagery, temperatures, topography, and precipitation to map coffee plantations in northern Nicaragua which resulted in user accuracy of 82.1% and producer accuracy of 80.0%. Previous studies indicate that the mapping of coffee plantations has mostly been done with medium resolution imagery [11,14]. However, very high-resolution (VHR) satellite imagery is needed for mapping coffee plantations due to 70% of the world's coffee is grown on small-holder plantations which are less than 10 ha [28].
Since 1999, the availability of VHR imagery such as IKONOS 2, Quickbird, OrbView-3, Rapid-Eye, WorldView-2, and Geo-Eye 1 has made vegetation mapping easier. Pan-sharpened GeoEye-1 is a VHR image that was launched in 2008 with a panchromatic geometric resolution (PAN) of 0.46 m. Aguliar et al. [29] state that pan-sharpened GeoEye-1 achieved higher accuracy than the pan-sharpened WorldView 2 for discriminating shadows, buildings, vacant land, and vegetation. On the other hand, research related to the use of texture in improving the accuracy of mapping coffee plantations has rarely been done. Whereas, it has been found that a combination of spectral images and texture produces a higher accuracy than classifications based only on the spectral image [30]. Other research results indicate that classification accuracy achieved by adding texture variables extracted from ALOS PRISM was higher than when only using a combination of ALOS AVNIR and ASTER GDEM topographical data for the delineation of a Shifting Cultivation Landscape [31]. Gao et al. [27] explain that adding textural variables extracted from Landsat imagery using the random forest algorithm produced an increase in the overall accuracy of 2.92%. Therefore, this research carried out an analysis with the addition of textural variables for mapping coffee plantations in an agroforestry system. It is hoped that the addition of these variables can improve the classification accuracy of coffee plantations.
Coffee trees exhibit different characteristics in different seasons. In rainy season, these trees thrive and have dense green leaves, while in dry season (the harvest period) leaves are reduced due to pruning by coffee farmers. Pruning needs to be done to maintain productivity as well as the sustainability of coffee plantation cultivation. However, this does cause a difference in the spectral value of coffee plantations. Loss of leaves due to pruning results in lower NIR channel values and higher red channel values [32]. Therefore, the use of multi-temporal data becomes an alternative solution to represent the Remote Sens. 2020, 12, 3933 3 of 23 condition of coffee plantations in different seasons through the vegetation index. Multi-temporal data were also very significant for monitoring changes in land cover and plant phenology [33].
Multi-temporal VHR satellite imagery could not be obtained in this research; therefore medium resolution data obtained from Sentinel 2, was used. Jia et al. [33] explain that the addition of multi-temporal data derived from medium-resolution images achieved a significant increase in the accuracy of high-resolution image classification, especially in terms of differentiating plantation types. Souza et al. [10] mapped coffee plantations by integrating multi-temporal variables extracted from Landsat (30 m) and Rapid-Eye imagery (5 m), achieving significant accuracy compared to when only using Rapid-Eye. Sentinel 2 is a new generation of medium-resolution imagery launched in 2015 and has significant potential for mapping coffee plantations [34]. It has 13 bands that consist of the spatial resolution of 10 m (for the visible and near-infrared light bands) and 20 m (for the near-infrared and shortwave infrared bands) from blue wavelengths (0.458-0.523 µm) to SWIR (2.100-2.280 µm) which are sufficient for calculating a range of indices to successfully map coffee plantations. In this research, multi-temporal Sentinel 2 was used as a variable on Landsat 8 for mapping coffee plantations, applying what was used by Kelley et al. [14]. These variables include the multi-temporal vegetation index of greenness, wetness, brightness, slope, aspect, elevation, and multi-temporal NDVI. The main development of this research is the use of VHR satellite imagery obtained from GeoEye-1 and the addition of textural variables; namely, entropy, standard deviation, correlation, mean, and contrast in mapping coffee plantations in the agroforestry system. Therefore, we pursue two primary objectives: (1) Assessing the accuracy of random forest classification derived from integrating multi-sensor, multi-temporal, and multi-resolution remote sensing data from pan-sharpened GeoEye-1, multi-temporal Sentinel 2, and DEM for mapping coffee plantations. (2) Determining the most important variables derived from random forest classifications for mapping coffee plantations.

Description of Research Area
The research was conducted in parts of Mount (Mt.) Puntang, in Banjaran district in West Java, Indonesia. The research area focused on approximately 433.7 ha where is located between 107. 595 • E to 107.613 • E and 7.091 • S to 7.111 • S. The area is mainly mountainous from the central portion to the southeast, while the northwestern part is predominantly plain. The elevation ranges between 1000 and 1200 m above sea level, and the terrain is flat to undulating. The annual precipitation is 2500 mm per year.
Coffee plantations have a biannual growing cycle [35]. There are some phases of growing coffee plantations including firstly, the phase correlates to vegetation production from September to March, the second phase is maturation and dormancy of the buds from April to August of the first year, the third phase is flowering of the buds from September to December, the fifth phase is the fruit ripens for three months, and the final phase occurs in July and August of the second year which is the senescence. On the dry season which is the leaves of coffee trees are reduced due to pruning by coffee farmers occurs from March to September, meanwhile on the rainy season which is trees thrive and have dense green leaves occurs from September to March. The research area was chosen due to Mt. Puntang being one of the best coffee production centers in West Java and won an award at the 2016 Specialty Coffee Association of America (SCAA) Contest in Atlanta. The research area can be seen in Figure 1.

Data and Preprocessing
The aim of this research was to map coffee plantations based on multi-resolution, multitemporal, and multi-sensor data using remote sensing; these data can be seen in Table 1.

Data and Preprocessing
The aim of this research was to map coffee plantations based on multi-resolution, multi-temporal, and multi-sensor data using remote sensing; these data can be seen in Table 1. In this research, we preprocessed images focusing on geometric, radiometric, and terrain corrections. All images were projected onto Universal Transverse Mercator (UTM) with WGS 84 Datum. Radiometric correction was also required for pan-sharpened GeoEye-1 and Sentinel 2. The dark object subtraction (DOS) atmospheric correction was applied for multi-temporal Sentinel 2 [36]. Next, the digital number (DN) of the pan-sharpened GeoEye-1 was converted to reflectance based on the GeoEye-1 corrections formula and calibration coefficients [37]. Another way of improving the classification of land cover in mountainous terrains consists of using terrain corrections [38]. In this research, the terrain correction method used was Minnaert correction [39]. Minnaert is a popular and widely used terrain correction method for mapping forest areas [40,41]. This equation requires information regarding the sun azimuth and elevation at the time of image acquisition, DEM, and the original image. The different spatial resolution of images was integrated by resampling data using the nearest neighbor to a grid cell-size equivalent to the highest resolution of data involved in this process. All images were resampled to 0.5 m.
We applied either NDVI or tasseled cap transformation to the multi-temporal Sentinel 2 images [42][43][44]. Multi-scale and multi-feature textures were chosen for analysis including entropy, mean, contrast, standard deviation, and correlation with window size 5 × 5, 15 × 15, and 25 × 25 derived from a green band of pan-sharpened GeoEye-1. In total, 29 variables were used in this research (Table 1). Normalization needs to be done due to the different range of images [45]. The dataset was Remote Sens. 2020, 12, 3933 6 of 23 transformed using the min-max normalization method by processing the minimum and maximum value of each attribute [46,47].
Before running the random forest algorithm, we had to set three parameters; which were: the number of trees, the number of variables predictor (mtry), and the number of minimum node sizes [48]. We varied these parameters across a wide range of values to achieve the best parameter combination (number of trees = 400 to 1000; the number of variables predictor: all variables and square root; minimum node size: 2 to 10). Next, we applied a random forest algorithm [18] using the best parameters to construct classification models by combining multi-sensor, multi-temporal, and multi-resolution remote sensing data from pan-sharpened GeoEye-1, Sentinel 2, and DEMfor mapping coffee plantations. Variable importance was also measured and models were created based on the most important variables. Each model was trained and validated with the reference data collected being based on aerial photographs and field surveys.

Field Surveys
Regarding the field surveys, we collected VHR aerial imagery data obtained from Unmanned Aerial Vehicle (UAV) platforms on 30 August 2019. These were used as references in making training and testing points. The VHR aerial imagery data were collected at five different locations ( Figure 2). The locations were chosen based on the variations in land cover and were expected to represent coffee plantations in the research area. Both DJI Phantom 4 pro and Mavic 4 pro have same specifications, including the sensor of 20 MP camera resolution, 1" CMOS, a spectral resolution of 5472 × 3648 pixels, a mean ground sample distance (GSD) of 3 cm/pixel for a flight height of 100 m above take-off level with 80% overlap, and have three visible bands (red, green, and blue). The VHR aerial imagery was carried out using Agisoft Metashape 1.6 [49] with a total of 120 photos in Area 1; 92 photos in Area 2; 546 photos in Area 3; 223 photos in Area 4, and 573 photos in Area 5. Several steps were taken in the process: (1) creating a flight plan, (2) aligning photos in order to equalize the same points between photos (matching points) which were then used as tie points to build a point cloud, (3) building a dense cloud to interpolate the point cloud from aligning photos to form an object, (4) employing building mesh to build a 3D model of the resulting VHR aerial imagery. We also performed distortion geometric correction by using a ground control point (GCP). The number of GCP for each UAV image is 15 GCP which is distributed evenly. The accuracy of the geometric corrections is indicated from the Root Mean Square Error (RMSE) value per pixel in the image. In general, these values are less than one pixel. If the value is greater than one, there is a possibility that the image is still distorted [50]. The RMSE of UAV image in each area can be seen in Table 2.

Land Cover Classification Scheme
Based on existing land cover in the research area, we determined ten land cover for classification due to the differences in land cover patterns and gradients in elevation (Table 3). We used a balanced sample of 400 pixels to each class as training samples [50] and 150 pixels to each class as testing samples in the classification model [51]. The number of testing samples is in keeping with recommendations from ISO 19157:2013 [52]. All samples obtained from field surveys and VHR aerial imagery (3 cm spatial resolution) were interpreted. A stratified random sampling approach was then applied. Stratified random sampling is a probability sampling technique wherein the sampling is carried out randomly by first dividing the population into strata since the elements of populations are not proportionally stratified [53,54].

Land Cover Classification Scheme
Based on existing land cover in the research area, we determined ten land cover for classification due to the differences in land cover patterns and gradients in elevation (Table 3). We used a balanced sample of 400 pixels to each class as training samples [50] and 150 pixels to each class as testing samples in the classification model [51]. The number of testing samples is in keeping with recommendations from ISO 19157:2013 [52]. All samples obtained from field surveys and VHR aerial imagery (3 cm spatial resolution) were interpreted. A stratified random sampling approach was then applied. Stratified random sampling is a probability sampling technique wherein the sampling is carried out randomly by first dividing the population into strata since the elements of populations are not proportionally stratified [53,54]. Table 3. Description of Land Cover Classifications Used.

Class (Id) Description
Water body (1

Textural Analysis
We applied textural features using the Gray Level Co-occurrence Matrix (GLCM) which is a statistical method used to extract the second order statistical textural features from the given image [55]. These textural features were extracted from a matrix containing gray level occurrence frequencies (digital numbers) of pairs of pixels at fixed relative positions in an image. GLCM has some advantages including easily done and has been shown to give very good results in a large field of applications [56]. Furthermore, Maillard shows that GLCM gives better results than semi-variogram and the Fourier spectra for simple situations where the textures are visually easily separable [57]. Therefore, GLCM was chosen because this method outperforms other methods in distinguishing textures [58,59]. Some studies show that the integration of spectral and textural features using GLCM improves classification accuracy [30,31,55,60,61].
In this research, five textural features were applied: contrast, correlation, standard deviation, entropy, and mean [25,29,62]. These features were chosen due to the strong correlation frequently reported between many of the features, can reduce the complexity of the algorithm, and are more efficient than using all the features [29]. The textures were derived from the green band of pan-sharpened GeoEye-1 image as variables in coffee plantation discrimination. In addition, green band is useful to determine vegetation types, vigor, and to differentiate between healthy and unhealthy plants [63]. According to Karjono [64], a classification based on a combination of several windows (multi-scale texture) provides higher accuracy than using only one window size (single-scale texture). Rodriguez-Galiano et al. [21] also show that the use of multi-scale texture in random forests increases the accuracy by 30% in some categories, which is better than when only using a single-texture window size. By referring to previous research Franklin et al. [65], three different window sizes of 5 × 5, 15 × 15, and 25 × 25 were obtained. These windows size were obtained based on semivariograms and then were applied to diagonal (1, 1) orientations.
Remote Sens. 2020, 12, 3933 where P i,j is the (i − j)th entry in GLCM, µ i is the mean vector of class i with the formula µ i = i j iP i,j , µ j is the mean vector of class j with the formula µ j = i j jP i,j , σ i is standard deviations

Derivation of Topography Data
The integration of topographical data and remote sensing imagery can improve the accuracy of classifications in mapping coffee plantations. Topographical indexes including elevation, slope, and aspect were widely used by researchers for mapping coffee plantations [7,13,14,66]. According to Kelley et al. [14], adding these topographical variables (elevation, slope, and aspect) in mapping coffee plantations increases classification accuracy by 7.8-20.1%.
In this study, all topographical data layers (elevation, aspect, and slope) were derived from DEMNAS data provided by the Geospatial Information Agency-Indonesia (BIG). DEMNAS is an elevation model with 0.27 arc-second (8.3 m) resolution (Tides.big.go.id), which is derived from various data sources: Interferometric Synthetic Aperture Radar (IFSAR) at 5-m resolution, TerraSAR-X at 5-m resolution, and Advanced Land Observing Satellite-Phased Arrayed L-band SAR (ALOS PALSAR) at 11.25 m resolution. It adds Masspoint data from stereo-plotting results. This dataset contains one band for elevation.

Derivation of Vegetation Index
The vegetation index is the transformation of two or more spectral bands from different electromagnetic waves to produce information about land cover change and green biomass [67]. Normalized Difference Vegetation Index (NDVI) is the most popular vegetation index [44]. We applied NDVI to the multi-temporal Sentinel 2 for mapping coffee plantations, given its success in improving the accuracy of coffee plantation mapping [7,14,66]. According to Cordero-Sancho and Sader [7], mapping coffee plantations using a combination of NDVI and DN produces better accuracy results than when using only DN. Hailu et al. [66] classified coffee in an agroforestry system using SPOT 5 images. The results of this research indicate that NDVI is the best variable for land cover classification as it produces an overall accuracy of 87.8%.
where NIR is reflection in the near-infrared spectrum, RED is reflection in the red range of the spectrum.

Tasseled Cap Transformation
The tasseled cap algorithm was invented by Kauth and Thomas [68] who extracted it from the Landsat MSS; the tasseled cap produces three functions consisting of brightness, greenness, and yellowness. In 1984, Crist and Cicone further developed this algorithm by changing the yellowness index to the wetness index. Tasseled cap has been used for monitoring agricultural land [69], plant health [70], and detecting changes in land cover [71]. Kelley et al. [14] used tasseled cap from Landsat 8 OLI data as additional data for the identification of coffee plants in agroforestry systems. The results of their research show that the use of tasseled cap improves accuracy in the mapping of coffee plantations.
Following previous research Kelley et al. [14], we applied tasseled cap as ancillary data to a multi-temporal Sentinel 2 image to obtain an index of brightness, greenness, and wetness. Tasseled cap was used in this research because this transformation compresses information from a number of bands and minimizes information loss; it is suitable for application in areas of high complexity (heterogeneous) [72]. We used coefficient values following the Sentinel 2 reflectance data provided by Shi and Xu [43] which is this specific coefficient derived from PCP method can effectively enhance brightness, greenness, and wetness characteristics of the Sentinel-2 multispectral at sensor reflectance. In addition, these coefficients can accurately interpret the corresponding biophysical characteristics of land cover better than coefficient derived by Nedkov [43]. These coefficients can be seen in Table 4.

Optimization Parameter of Random Forest Algorithm
Before constructing the random forest classification, we investigated the best random forest parameter to obtain an optimal classification result. The parameters tested in this research were: tree, mtry, and minimum node size [48]. Studies have used 100 trees [73], 500 trees [14,74,75], and 1000 trees [76]. Du et al. [19] indicate that trees between 10 and 200 had no influence on classification accuracy. Some researchers also maintain that 300 is the minimum number of trees needed to retain accuracy [74,77]. In this research, trees were tested ranged from 400 to 1000.
Mtry is the number of variables used in splitting each node. The variables used for splitting each tree were randomly selected to minimize the correlation between the trees [21,74]. According to Breiman [18], mtry can be calculated using the √ k, where k is the total number of variables. Some researchers have also followed this formula [78][79][80]. In this research, we tested both mtry = √ k [18] and mtry = k (all variables). The minimum node size is the minimum number of samples required for each node splitting. Breiman [18] suggest using 2 minimum node sizes for all the trees in the land cover classification to allow them to grow fully without pruning. In this research, the minimum node sizes tested ranged from 2 to 10. To obtain optimal parameters, a random forest classification model was constructed based on the results of a cross combination of all the parameters. These cross combinations parameters produce 126 classification models. Random forest classification was run using the Vigra tool in SAGA 7.6.2 software [81].

Mapping Coffee Plantations Based on a Full Model Approach
We constructed a random forest classification algorithm for mapping coffee plantations derived from pan-sharpened GeoEye-1, multi-temporal Sentinel 2, and DEM data (total of 29 variables) as the input image denoted as model 5 (M5) using the best random forest parameters. We then compared the classification accuracy produced by a model derived from using 29 variables with the other four-classification model to determine the effect of textures, spectral bands, topography, NDVI, and tasseled cap for mapping coffee plantations. These models can be seen in Table 5.

Variables of Importance
In total, 29 variables were obtained from pan-sharpened GeoEye-1, multi-temporal Sentinel 2, and DEM which were used for mapping coffee plantations (Table 1). It is important to know the influence of each variable on the classification result. The importance of the variables is measured based on the value of permutation or the mean decrease accuracy (MDA). In this process, random permutations of the variables at the out of the bag (OOB) value were measured. OOB error is a method for validating the random forest model. The accuracy change value was then measured to determine the variable of importance. Based on the value of the variable of importance, the most important variables were then selected to produce a more efficient classification model which was then used for mapping coffee plantations in the research area.

Accuracy Assessment of Classification Model
Accuracy assessment was used to compare the result of each classification model with the ground truth. Following previous research, the accuracy of each classification was evaluated using a confusion matrix [82]. This approach can assist in identifying misclassifications between different land cover and potential sources of error. The percentage of the classification accuracy results was seen from the following values: kappa statistics, overall accuracy, user accuracy, and producer accuracy [83]. Ideally, all non-diagonal elements in the matrix should be zero, which means that there are no deviations in the matrix [63].
Producer Accuracy = Xii X + i × 100% (10) where N = the number of all pixels used for observation, R = the number of rows in the error matrix (number of classes), Xii = Diagonal values of the contingency matrices of row i and column i, X + i = column pixel number i, and Xi + row pixel number i.

Mtry
The use of mtry = k (all variables) was significant in producing higher accuracy than when using mtry = √ k (square root) (Figure 3). The highest accuracy derived by the parameter combination of mtry = k, tree = 1000 and minimum node size = 6 produced overall accuracy and kappa statistics of 79.333% and 0.774, respectively; whereas, the lowest accuracy obtained by the parameter combinations of mtry = √ k, tree = 500 and minimum node size = 5 produced overall accuracy and kappa statistics of 77% and 0.744, respectively. The use of mtry = k (all variables) and mtry = √ k (square root) produced an average of the overall accuracy of 78.450% and 77.740%, respectively. mtry = √k (square root) (Figure 3). The highest accuracy derived by the parameter combination of mtry = k, tree = 1000 and minimum node size = 6 produced overall accuracy and kappa statistics of 79.333% and 0.774, respectively; whereas, the lowest accuracy obtained by the parameter combinations of mtry = √k, tree = 500 and minimum node size = 5 produced overall accuracy and kappa statistics of 77% and 0.744, respectively. The use of mtry = k (all variables) and mtry = √k (square root) produced an average of the overall accuracy of 78.450% and 77.740%, respectively.

Tree
The average of overall accuracy obtained by each tree can be seen in Table 6. The highest overall accuracy obtained by using tree = 1000 produced an overall accuracy of 78.644%. Nevertheless, the relative use of trees numbering from 400 to 1000 did not provide a significantly fluid increase in accuracy. The average overall accuracy of each tree is in the range of 78%. The uses of the tree did not significantly influence the classification accuracy compared to the uses of mtry.  Figure 4 shows an average of overall accuracy obtained from using minimum node sizes 2 to 10. It has been shown that using minimum sizes 2 to 10, produced a decreasing overall accuracy which is proportional to the increasing number of minimum node sizes used. The node size parameter specified the minimum number of samples in each node. The more samples of each node, the lower the accuracy resulted. The highest overall accuracy of 78.886% was produced by using two minimum samples in each node; whereas, using 10 minimum node sizes produced the lowest overall accuracy of 77.886%.
The determination of minimum node size is used as a control in constructing trees. Decreasing the tree size will be controlled by setting a higher node size value. Trees with the lowest minimum node size (set to 2) will be allowed to grow fully until the smallest sample size of each node is obtained; on the other hand, if a larger minimum node size is used, tree growth will be limited, indicating that the tree must be pruned. This minimizes the possibility of overfitting due to the complexity of the resulting model. The determination of minimum node size is used as a control in constructing trees. Decreasing the tree size will be controlled by setting a higher node size value. Trees with the lowest minimum node size (set to 2) will be allowed to grow fully until the smallest sample size of each node is obtained; on the other hand, if a larger minimum node size is used, tree growth will be limited, indicating that the tree must be pruned. This minimizes the possibility of overfitting due to the complexity of the resulting model.

Combination of All Parameters
A total of 126 classification models were applied to obtain best parameter combination using all variables (29 variables) and the same training points. The highest and lowest overall accuracy percentages obtained for each model can be seen in Table 7.

Combination of All Parameters
A total of 126 classification models were applied to obtain best parameter combination using all variables (29 variables) and the same training points. The highest and lowest overall accuracy percentages obtained for each model can be seen in Table 7. The highest accuracy obtained by combining the parameters of mtry = k, tree = 1000, and minimum node = 6 produced an overall accuracy of 79.333%; whereas, the lowest accuracy obtained by the combination of parameters mtry = √ k, tree = 700, and minimum node size = 10 produced an overall accuracy of 77.333%. Based on Figure 4, an average of the overall accuracy of minimum node size 2 provides the highest accuracy of 78.886%. Meanwhile, the combination parameters of tree = 1000, mtry = k, and minimum node size = 6 produces higher overall accuracy than the using of minimum node size 2 which is 79.333%. It indicated that producing the best accuracy requires a combination of parameters used as input classifications. Therefore, the parameter combination of mtry = k, tree = 1000, and minimum node = 6 used as random forest parameters for mapping coffee plantations.

Distribution of Coffee Plantations
The spatial distribution of coffee plantations which were derived from the full set of pan-sharpened VHR image, topography, texture, multi-temporal NDVI, and multi-temporal tasseled cap (M5) using the random forest algorithm (mtry = k, tree = 1000 and minimum node = 6) produced overall accuracy, kappa statistics, producer accuracy and user accuracy of 79.333%, 0.774, 92.000%, and 90.790%, respectively. The results of the confusion matrix of classification accuracy produced by this model can be seen in Table 8. The accuracy of the classification results (rows) of all variables against the reference data (columns) can be seen in Table 8. The confusion matrix denotes overall accuracy (OA), kappa statistics, user accuracy (UA), and producer accuracy (PA). This model shows good performance in mapping coffee plantations (class 4) as confirmed by high producer and user accuracy rates of 90.79% and 92%, respectively. In contrast, classes 1 (water body), 2 (built-up), and 3 (road) produce a relatively low value of producer accuracy and user accuracy. This is due to all variables used as input classification model in this study indeed the variables for mapping coffee plantations, not for the other class. Based on the classification that was done using all the variables (29 variables), we compared this with four other models (Table 9) to investigate the influence of textures, spectral bands, topography, NDVI, and tasseled cap for mapping coffee plantations. A total of five models were tested in this research. The results of the overall classification accuracy produced from the entire model can be seen in Table 9, while the classification results can be seen in Figure 5.  The classification accuracy obtained by all models can be seen in Table 9. These data show a trend of increasing the accuracy of land cover classification in line with the increasing number of variables. The use of the M5 model (RGB + Texture + Elevation + NDVI + Tasseled Cap) significantly provides the highest overall accuracy for mapping coffee plantations. In the M1, the classification model derived from the spectral data of the pan-sharpened GeoEye-1 (the red, green, and blue bands) produced the lowest accuracy compared to the other models. Applying this model produced overall accuracy, kappa statistics, producer accuracy, and user accuracy of 58.540%, 0.530, 65.888%, and 59.899%, respectively. The application of a textural variable (M2) in the random forest classification model showed a significant increase (14.6%) in overall accuracy. An overall improvement accuracy of 2.21% was also reported; while producer accuracy increased by 2.58% and user accuracy increased by 2.37% when topographical variables (elevation, aspect, and slope) were added to this research (M3). On the other hand, the M4 model shows that the use of the NDVI variable has the lowest The classification accuracy obtained by all models can be seen in Table 9. These data show a trend of increasing the accuracy of land cover classification in line with the increasing number of variables. The use of the M5 model (RGB + Texture + Elevation + NDVI + Tasseled Cap) significantly provides the highest overall accuracy for mapping coffee plantations. In the M1, the classification model derived from the spectral data of the pan-sharpened GeoEye-1 (the red, green, and blue bands) produced the lowest accuracy compared to the other models. Applying this model produced overall accuracy, kappa statistics, producer accuracy, and user accuracy of 58.540%, 0.530, 65.888%, and 59.899%, respectively. The application of a textural variable (M2) in the random forest classification model showed a significant increase (14.6%) in overall accuracy. An overall improvement accuracy of 2.21% was also reported; while producer accuracy increased by 2.58% and user accuracy increased by 2.37% when topographical variables (elevation, aspect, and slope) were added to this research (M3). On the other hand, the M4 model shows that the use of the NDVI variable has the lowest improvement in accuracy, only 0.51%. Producer accuracy also decreased by 1.59% while user accuracy increased by 2.56%. Nevertheless, the addition of multi-temporal data; namely, the NDVI vegetation index and the tasseled cap, increased the overall accuracy of mapping coffee plantations. The addition of multi-temporal tasseled cap showed an increase in accuracy of 3.47%. Producer accuracy increased by 1.33% and user accuracy increased by 3.61%.

Mapping Coffee Plantations Using Variables of Importance
As mentioned above, although random forests can be used for dimensional data, the classification results can improve accuracy only when using the most important variables. The variable importance obtained by the full model can be seen in Figure 6. accuracy increased by 1.33% and user accuracy increased by 3.61%.

Mapping Coffee Plantations Using Variables of Importance
As mentioned above, although random forests can be used for dimensional data, the classification results can improve accuracy only when using the most important variables. The variable importance obtained by the full model can be seen in Figure 6. The variables of importance derived by the full set of RGB + Texture + Elevation + NDVI + Tasseled cap model using random forest (mtry = k, tree = 1000, and minimum node size = 6) can be seen in Figure 6. Based on this data, the five most important variables are: standard deviation 25 × 25, elevation, aspect, wetness index (dry season), and mean 15 × 15. The brightness (dry season) is the variable with the lowest level of importance. Several textural variables with a size of 25 × 25 also have low importance. The results show that the topographical effects (aspect, elevation, and slope) had a relatively large influence. This is due to coffee plants growing on mountain slopes, which means that topographical influences greatly contribute to differentiating coffee trees from other land covers. Furthermore, we selected a model based on the median and the mean value of permutation importance, as well as ±5 variables from the median. Therefore, the 10 most important, 12 most important, 15 most important and 20 most important variables were selected. The classification results can be seen in Table 10.  The variables of importance derived by the full set of RGB + Texture + Elevation + NDVI + Tasseled cap model using random forest (mtry = k, tree = 1000, and minimum node size = 6) can be seen in Figure 6. Based on this data, the five most important variables are: standard deviation 25 × 25, elevation, aspect, wetness index (dry season), and mean 15 × 15. The brightness (dry season) is the variable with the lowest level of importance. Several textural variables with a size of 25 × 25 also have low importance. The results show that the topographical effects (aspect, elevation, and slope) had a relatively large influence. This is due to coffee plants growing on mountain slopes, which means that topographical influences greatly contribute to differentiating coffee trees from other land covers. Furthermore, we selected a model based on the median and the mean value of permutation importance, as well as ±5 variables from the median. Therefore, the 10 most important, 12 most important, 15 most important and 20 most important variables were selected. The classification results can be seen in Table 10. The classification accuracy obtained by variations of the most important variables can be seen in Table 10. This data indicates that the use of the 12 most important variables produced the highest overall accuracy, kappa statistics, producer accuracy, and user accuracy percentages of 79.333%, 0.774, 91.33%, and 84.57%, respectively. The 12 most important variables are the standard deviation of 25 × 25, elevation, aspect, wetness index (dry season), mean 15 × 15, slope, brightness (wet season), correlation 5 × 5, greenness (dry season), blue channel, entropy 5 × 5, and the red channel. This model provides the same overall and kappa statistics using all variables, but with the different producer and user accuracy.

Parameter of Random Forest Classifications
In total, 126 model classifications were used for selecting the best parameter in constructing random forest classifications. Mtry is the number of variables selected randomly when splitting nodes. Mtry is used as a control in the variable selection process, indicating that the smaller the mtry value, the stronger the randomization will be. On the other hand, there is no randomization taking place in the split selection when using all the variables, indicating that all variables are allowed in splitting. In this research, the use of all variables (29 variables) significantly produced higher accuracy than the method proposed by Breiman [18]; namely, √ k. Based on these results, it can be seen that the fewer variables used for splitting, the lower the accuracy; whereas, the more variables used for splitting, the better the produced accuracy will be. This is also in line with research conducted by Gao [27], who found that the higher the number of mtries, the higher the produced accuracy.
The use of trees from 400 to 1000 does not provide a significant and fluid increase in accuracy. This is following the previous researches [84,85] who stated that tree has no significant influence on classification accuracy compared to mtry. In this research, the best accuracy was obtained by using the tree = 1000 which produced an average of overall accuracy of 78.64%. Although Gao [27] report that the use of tree 500 obtained high accuracy; however, the use of tree 500 in this research obtained the lowest accuracy. This is due to the different morphological conditions in the research area [86].
The lower number of samples in each node will provide better accuracy, otherwise, the more samples tends to decrease the accuracy. This research follows the recommendation made by Breiman [18], who indicates that the use of a minimum node size 2 will give good classification results. This is due to the small minimum node size, causing the tree to continue splitting until the nodes are truly pure; however, this condition will result in overfitting. Therefore, it is necessary to analyze the parameter combination.
Although an average of overall accuracy of minimum node size 2 provides the highest accuracy, that of 78.886% (Figure 4), the combination of each tree for each minimum node size gives a higher percentage of accuracy; namely, 79.333% as a result of the parameter combination of tree = 1000, mtry = k, and minimum node size = 6 ( Table 7). Based on the results that were obtained, it can be concluded that to obtain the best accuracy it is necessary to cross-combine all of the mtry parameters, the minimum node size, and the number of trees; these are then used as input random forest parameters in classifying.

Distribution of Coffee Plantations
In this section, the effects of increasing the variables for a particular model are analyzed. The classification used only the spectral data from the pan-sharpened GeoEye-1 which produced the lowest accuracy compared to other models. The results show that this model cannot discriminate coffee plants from other crops, including pine needle trees; due to the research area being influenced by topographical effects and spectral similarities with other vegetation [87]. This is in line with research conducted by Mukashema [88] who found that using only spectral of high-resolution images is insufficient for mapping coffee plantations accurately. Therefore, additional variables are needed to differentiate coffee plants from other land cover classifications.
The addition of textural variable in the random forest classification model significantly increases overall accuracy by 14.6%. This is in line with previous research conducted by [65,89] that textural variables have high contribution to accuracy. On the other hand, it was difficult to distinguish coffee plantations from other vegetation in an area that has different elevations and slopes. Therefore, by including topographical variables (elevation, aspect, and slope) it is possible to distinguish coffee plants from mixed vegetation that grow at different elevations and slopes. In areas with low elevation, coffee plants can be distinguished from mixed vegetation. By adding topographical variables, an increase in the overall accuracy of 2.21% was seen, along with producer accuracy of 2.58%, and user accuracy of 2.37%. The improvement in accuracy obtained by adding topographical data in mapping coffee plantations is also noted by previous research [14].
This research also indicates that the use of NDVI variable data shows the lowest increase in overall accuracy, that of 0.51% (M4). This model is comparable with prior research [14]. Kelley [14] indicate that the use of NDVI variables increases overall accuracy by only 1%. However, the addition of multi-temporal data; namely, the NDVI vegetation index and the tasseled cap obtained by M4 and M5 models, increases the overall accuracy of mapping coffee plantations. The addition of tasseled cap multi-temporal data showed an increase in overall accuracy of 3.47%, producer accuracy of 1.33%, and a user accuracy increase of 3.61%.
The most significant finding of this research is that the highest accuracy was achieved by full models (integration of pan-sharpened GeoEye-1, multi-temporal Sentinel 2, and DEM), using the best random forest parameters (mtry = k, tree = 1000, and minimum node size = 6). This model achieved an overall accuracy of 79.333% and a kappa statistics of 0.774. The overall accuracy obtained by this research was significantly higher than that of Sancho [7] who reported 65% accuracy, and the 76.7% accuracy reported by Huerta [11]. The overall accuracy of land cover mapping by this model is lower than that achieved by Kelley [14], which is 90.5%. This is probably due to differences in the image resolution used (30 m), the scale of the area under research (regional), and the use of precipitation data. In this research, precipitation data were not used for two reasons. The first reason is because of the unavailability of this data for the research area. Precipitation data currently available is CHIRPS with a resolution of 0.05 • (5 km), while the research area is 4 km. Secondly, precipitation data contributes only slightly to the overall accuracy, which is 1%; otherwise, it reduces coffee user accuracy to 4.4% [14].
This research integrated pan-sharpened GeoEye-1, multi-temporal Sentinel 2, and DEM using random forest classification (mtry = k, tree = 1000, and minimum node size = 6) which show good results in mapping coffee plantations as evidenced by high producer and user accuracy rates of 92% and 90.79%, respectively. These high values indicate that classification techniques using the random forest algorithm can be used for mapping coffee plants and land cover in agroforestry areas. In this research, producer accuracy and user accuracy were significantly higher than that reported by Sancho [7] who reported a user accuracy of 79.3% and a producer accuracy of 75%; Kelley [14] reported 82.1% user accuracy and 80% producer accuracy. However, some misclassifications between coffee and mixed vegetation still occurred. This is due to the unavoidable spectral similarity of coffee plants to other vegetation.

Important Variables
Classification results achieved from 12 important variables indicate that the random forest algorithm is a reliable approach for improving efficiency in classification. The use of the 12 most important variables provides reliable overall accuracy, kappa statistics, producer accuracy, and user accuracy which are 79.333%, 0.774, 91.333%, and 84.570%, respectively. This model achieved the same overall accuracy by using all the above-mentioned variables. This research is also in line with that of Gao [27], who also found that the random forest algorithm can reduce the variables. The model was derived by applying the 12 most important variables: standard deviation 25 × 25, elevation, aspect, wetness (dry season), mean 15 × 15, slope, brightness (wet season), correlation 5 × 5, greenness (dry season), blue, entropy 5 × 5, and red. The most important variable in this research is the standard deviation 25 × 25. Previous studies also suggest that the use of textures can improve accuracy [27,65].
Topographical effects (aspect, elevation, and slope) show higher importance in this model due to the fact that coffee plants mostly grow on mountain slopes so that topographical effects greatly contribute to differentiating coffee from other land covers. Table 9 shows that adding topographical effects (aspect, slope, and elevation) significantly increases accuracy in discriminating coffee plantations. However, the multi-temporal NDVI variable is relatively weak. This is in accordance with the results described in Table 9; i.e., the addition of the NDVI variable in the classification model only obtained an increase in accuracy of 0.5%. In this research, we conclude that the NDVI variable provides a minor contribution. This is because NDVI is a vegetation index that is affected by canopy density, chlorophyll content, and useful for measuring neighborhood greenness [90]. Whereas, the coffee plantation has spectral similarities with other plantations and has trees different characteristics in different seasons [87]. This result is also in line with Kelley [14] who state that the use of NDVI variables increases overall accuracy by only 1%.

Conclusions
Mapping coffee plantations using satellite imagery is particularly challenging due to spectral similarities with other plantations. Hence, the classification scheme for mapping coffee plantations requires proper understanding of the data used on the field. This research focuses on the suitability of integrating pan-sharpened GeoEye-1, multi-temporal Sentinel 2, and DEM using random forest algorithm for coffee plantation mapping in the agroforestry system. The spectral band as well as textural variables extracted from pan-sharpened GeoEye-1, multi-temporal NDVI, and tasseled cap extracted from Sentinel 2, and topographical effects (slope, aspect, and elevation) extracted from DEM were used as input in a random forest classification model. In total, 29 variables were used for constructing a random forest classification using the best parameters combination (mtry = k, tree = 1000, and minimum node size = 6). This model indicates that the integration of pan-sharpened GeoEye-1, multi-temporal Sentinel 2, and DEM shows the best performance in mapping coffee plantations in the agroforestry system and provides overall accuracy of kappa statistics, producer accuracy, and user accuracy of 79.333%, 0.774, 92.000%, and 90.790%, respectively. High classification accuracy was also achieved by using the most 12 important variables: standard deviation 25 × 25, elevation, aspect, wetness (dry season), mean 15 × 15, slope, brightness (wet season), correlation 5 × 5, greenness (dry season), blue, entropy 5 × 5, and red. This combination of variables provides overall accuracy of: kappa statistics, producer accuracy, and user accuracy of 79.333%, 0.774, 91.333%, and 84.570%, respectively. Our results indicate that using a random forest algorithm can increase efficiency in mapping coffee plantations in the agroforestry system when using only the 12 most important variables. In future work, we hope to explore these random forest parameters and variables in various other coffee-producing terrain and further develop this research, using both spatial and temporal high resolution for increasing accuracy in mapping coffee plantations.