Self-Guided Segmentation and Classification of Multi-Temporal Landsat 8 Images for Crop Type Mapping in Southeastern Brazil

Only well-chosen segmentation parameters ensure optimum results in object-based image analysis (OBIA). Manually defining suitable parameter sets is time-consuming, does not necessarily lead to optimum results, and is inherently subjective. For this reason, supervised segmentation as proposed by Stefanski et al. (2013) integrates the segmentation and classification tasks: the segmentation is optimized directly with respect to the subsequent classification. In this contribution, we build on this work and develop a fully autonomous workflow for supervised object-based classification, combining image segmentation and random forest (RF) classification. Starting from a fixed set of randomly selected and manually interpreted training samples, suitable segmentation parameters are automatically identified. A sub-tropical study site located in São Paulo State (Brazil) was used to evaluate the proposed approach. Two multi-temporal Landsat 8 image mosaics were used as input (from August 2013 and January 2014) together with training samples from field visits and VHR (RapidEye) photo-interpretation. Using four test sites of 15 × 15 km² with manually interpreted crops as independent validation samples, we demonstrate that the approach leads to robust classification results. On these samples (pixel-wise, n ≈ 1 million), an overall accuracy (OA) of 80% was reached while classifying five classes: sugarcane, soybean, cassava, peanut and others. We found that the overall accuracy obtained from the four test sites was only marginally lower than the out-of-bag OA obtained from the training samples. Amongst the five classes, sugarcane and soybean were classified best, while cassava and peanut were often misclassified due to their similarity in the spatio-temporal feature space and high within-class variabilities.
Interestingly, misclassified pixels were in most cases correctly identified through the RF classification margin, which is produced as a by-product of the classification.


Introduction
Monitoring projects that yield agricultural information in near-real-time and cover a large number of crops are important [1,2]. This is particularly true for Brazil, one of the largest countries in the world and a main producer and exporter of agricultural commodities [3]. For this reason, the use of remote sensing (RS) data for automated crop classification has been widely explored in Brazil. Most studies used either moderate spatial resolution data from the Moderate Resolution Imaging Spectroradiometer (MODIS) aboard the Aqua and Terra satellites [4,5], or images from Landsat and Landsat-like satellites [6][7][8][9]. Field size and crop type are the most important determinants for sensor selection. Until 2013, the only crop operationally monitored by remote sensing in Brazil was sugarcane in the Middle-South region (e.g., the Canasat Project) [10].
For crop identification, MODIS offers the advantage of a daily revisit frequency. This increases the possibility of obtaining cloud-free data [11,12] and makes it possible to track crops along the season. The derived temporal profiles of vegetation indices such as the Normalized Difference Vegetation Index (NDVI) can be used for automatic classification and land cover mapping [5,13,14]. The multi-temporal data is also very useful for estimating crop production and yield [15]. However, smaller fields cannot be mapped with this data [16][17][18] as the spatial resolution is too coarse (e.g., 250 m). Luiz and Epiphanio [19], for example, have shown that fields smaller than 10 ha have a high percentage (≥1/3) of mixed pixels at the crop boundaries. Rudorff et al. [18] and Yi et al. [18] have demonstrated that the 6.25 ha MODIS pixel size is not suitable for mapping smallholdings in Rio Grande do Sul State. Lamparelli et al. [16] demonstrated that soybean fields in Paraná State should be at least 30 ha to be efficiently mapped. Unmixing approaches [20][21][22] and blending algorithms [23][24][25][26] are not yet accurate enough to counterbalance the relatively coarse spatial resolution of MODIS and other medium/coarse resolution sensors.
Compared to MODIS, Landsat offers a much better spatial resolution (≈30 m), which benefits crop classification. Cowen et al. [27], for example, stated that the spatial resolution of remotely sensed data needs to be at least one half the size of the smallest object to be identified in the image. Even in Brazil, fields are sometimes only a few hectares in size, demonstrating the need for decametric sensors like Landsat, ASTER, Sentinel-2, Resourcesat-2 and CBERS-3. The disadvantage of Landsat relates to its low revisit frequency of 16 days (eight days if Landsat 7 and 8 are combined) [28]. Together with cloudiness, this may drastically reduce the number of high quality observations [29].
When pixel sizes are relatively small compared to object sizes, traditional pixel-based classification approaches may not be the best choice for land cover mapping. With such data sets, object-based classification approaches have several advantages over pixel-based approaches [30][31][32][33][34]. For example, using object-based image analysis (OBIA), the impact of noise is minimized. At the same time, textural features may be integrated to improve classification results [9,35], and shape and context information can be used for the classification [30,32]. These features are not readily available within pixel-based approaches. Consequently, object-based approaches have been widely used for crop classification in Landsat-like images [9,33,[36][37][38][39][40][41][42].
Using OBIA, the main problem relates to the fine-tuning of segmentation parameters [9,31,41]. Only well-chosen segmentation parameters ensure good segmentation results. Manually defining suitable parameter sets can be time-consuming and does not necessarily lead to optimum results [34,48,49]. Obviously, the manual approach is also highly subjective and therefore not always reproducible. Moreover, a visually appealing segmentation does not necessarily lead to a good classification [56]. For this reason, supervised segmentation integrates the segmentation and classification tasks.
In the implementation of supervised segmentation by Peña et al. [42], a range of segmentation parameters is applied to the multi-spectral imagery, followed by an assessment of the geometrical accuracy of the resulting segments. The geometric quality of the segmentation outputs is estimated using the empirical discrepancy method, which compares the derived segment boundaries with a pre-existing ground-truth reference. An alternative approach was recently presented by Stefanski et al. [57]. In their supervised segmentation approach, the most suitable segmentation parameters are identified by running, for each segmentation, a supervised classification and keeping the parameter set that minimizes the (out-of-bag) classification errors.
In this contribution, we further develop the approach by Stefanski et al. [57] and present a fully automated, supervised segmentation and classification approach combining multiresolution segmentation and Random Forest (RF) classification. The fully autonomous workflow for supervised object-based classification also includes a robust feature selection procedure. Suitable segmentation and classification parameters are automatically identified starting from a fixed set of randomly selected and manually interpreted training samples. Using a large data set of independent validation samples, we demonstrate that the approach leads to near-optimum classification results when mapping five (crop) classes within an agricultural area located in South-East Brazil, using multi-temporal spectral information from the Landsat 8 sensor as input.

Study Site and Crop Calendar
The selected study site is located in the Assis microregion in South-East Brazil (south of São Paulo State), one of the major agricultural production areas of the country (mainly sugarcane and a number of summer crops). In Brazil, a microregion is a legally defined administrative unit, used by IBGE (the Brazilian Institute of Geography and Statistics), consisting of groups of municipalities bordering urban areas [58].
The study region occupies an area of roughly 715,000 ha. The climate in the region is sub-tropical with average annual temperatures of 21.3 °C and annual precipitation of 850 mm [59]. Four main soil types are found in the region (Figure 1a). The study site is fully covered by Landsat paths 220 and 221 (rows 75 and 76) (Figure 1b). The terrain is hilly, with elevations ranging from 275 to 645 m above sea level (a.s.l.) (Figure 1b). Several other studies were conducted in São Paulo State [6,9,20], but not yet in the Assis microregion. The Assis microregion was selected as a test site because it is typical for Brazilian agricultural production areas, with large sugarcane plantations but also many other (summer) crops, together occupying roughly ¾ of the land. According to the official agricultural production data (from IBGE in 2013 at municipal level), Assis' main crops are sugarcane (32%), soybean (20%), corn (20%), cassava (2%) and peanut (1%). The area thus has a very high proportion of agricultural land use. Within São Paulo State, the region is the largest producer of corn and soybean [58]. The typical crop calendar (double cropping) for the Assis microregion is shown in Figure 2.
The average crop cycle length is 100-120 days with multiple sowing and harvesting periods. Thus, the same crops may appear at different times during the year. Sugarcane and cassava are semi-perennial crops and have a much longer vegetative period compared to the other crops of the area. Sugarcane can remain green for 1 to 1.5 years before being harvested. Sugarcane fields are mostly reformed at the beginning of the year using forage plants such as soybean and peanut [10]. The harvest can start at the beginning of April and last until December [60]. Most sugarcane fields are harvested during the month of October [61]. Cassava is planted either in August/September or in May. It can be managed on a one-year or a two-year cycle; for two-year cassava, the plants are trimmed after one year and then kept on the field for one more year.
In the Assis microregion, corn is cropped mainly as a secondary crop (called "Safrinha"). In 2013, almost 74% of the corn area was planted in March and April after the soybean harvest [62]. No information about corn could be collected during the field work because the safrinha season only started at the end of April/beginning of May (Figure 2). Hence, in this study, corn is part of the class others.
Typical appearances of sugarcane, soybean, cassava and peanut fields in the Assis microregion are shown in Figure 3 at different times of the year using RapidEye and Landsat 8 imagery. Note that cassava and peanut fields have a spectral behavior similar to soybean (in bands B4, B5 and B6). Photos taken during the field campaign are also shown.

Satellite Imagery
For this study, Landsat 8 and very high resolution (VHR) RapidEye images were used. Landsat 8 (Operational Land Imager, OLI) records nine spectral bands with a ground sampling distance (GSD) of 30 m [63] (Table 1). RapidEye records the spectral reflectance in five spectral bands at a GSD of 6.5 m [50]:
• Two multi-spectral Landsat 8 image mosaics were used for image segmentation and classification. Seven additional mosaics from Landsat 7 (Enhanced Thematic Mapper Plus, ETM+) and Landsat 8 (OLI) images were used for the generation of reference information.
• One RapidEye mosaic from four images was used as ancillary data for assisting in the visual photo-interpretation and for (manual) field delineation, as suggested by Forster et al. [44].
Landsat 8 Imagery. To cover the study area, three Landsat 8 (OLI) scenes from paths 221 and 222 are necessary (paths/rows 222/75, 222/76 and 221/76) (Figure 1b). Images from the two adjacent paths differ by nine days in acquisition date. All images were level-one terrain-corrected (L1T) products (USGS, 2014) downloaded from the USGS website (http://earthexplorer.usgs.gov), and all had less than 25% cloud cover. For the image segmentation and classification, two Landsat mosaics were created using three Landsat 8 images each (Table 2):
• A late-season mosaic combining scenes from 21 August 2013 and 30 August 2013,
• A mid-season mosaic using scenes from 28 January 2014 and 6 February 2014.
The chosen images had the highest quality of all images from the 2013/2014 growing season; all other images had high cloud coverage.
To create the Landsat 8 mosaics, the two images from Path 222 (222/75 and 222/76) were directly combined (same-day imagery). This combined image is hereafter referred to as the "parent" image. The third image (221/76), hereafter called the "son" image, was acquired nine days later. Using the overlapping parts of the parent and son images, a linear transfer function was derived for each spectral band to radiometrically match the son image to the parent image [40,64].
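The per-band transfer function described above can be sketched in a few lines. This is a minimal illustration of the radiometric matching step, not the authors' actual processing code; the function names and the simulated radiometric mismatch are assumptions for illustration:

```python
import numpy as np

def fit_band_transfer(parent_band, son_band):
    """Fit a per-band linear transfer function (gain, offset) mapping
    son-image values onto the parent-image radiometry, using pixels
    from the overlap area of the adjacent-path scenes."""
    gain, offset = np.polyfit(son_band.ravel(), parent_band.ravel(), deg=1)
    return gain, offset

def normalize_band(son_band, gain, offset):
    # Apply the fitted transfer function to the whole son scene
    return gain * son_band + offset

# Toy overlap data: the son values are a shifted/scaled version of the
# parent values (simulated mismatch; real data would contain noise).
rng = np.random.default_rng(0)
parent = rng.uniform(0.05, 0.4, size=1000)   # surface reflectance, parent scene
son = (parent - 0.02) / 1.1                  # simulated radiometric mismatch
g, o = fit_band_transfer(parent, son)
matched = normalize_band(son, g, o)
```

With noise-free toy data the fitted gain and offset recover the simulated mismatch exactly; on real imagery the fit is a least-squares approximation over the overlap pixels.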
For the creation of reference samples (see next section), seven other Landsat (OLI/ETM+) mosaics were created in a similar way using the images listed in Table 2:
• Three mosaics between the late-season and mid-season (from October, November and December), and
• Four mosaics after mid-season (from February, March, April and May).
VHR RapidEye Imagery. RapidEye images were used as ancillary data to manually delineate the crop areas (top row of Figure 3). This data was not used for classification or segmentation. Images from paths 285-288 and rows 15-19 were used.

Reference Data Collection
A large set of reference samples (points and segments) was created for model training and validation (Figure 4 and Table 3): (A) reference samples from field visits (n = 177), (B) reference samples obtained through manual photo-interpretation (n = 500), and (C) manually delineated and interpreted fields (n = 2502). Samples (A) and (B) were pooled and used for model training. The calibrated model was validated using samples (C).
Point Samples. During a field campaign (7-12 March 2014), 177 sample locations were identified by GPS, visited and assigned to one of the five classes (samples A; black stars in Figure 4): peanut (24), sugarcane (30), cassava (38), soybean (35) and others (57). Amongst those 177 samples, 91 were pre-defined based on random sampling. The remaining 86 samples were added ad hoc during the field work, as three crops (cassava, soybean and peanut) revealed only small spectral differences in the mid-season image. A GPS was used to navigate from one random point to the next.

To limit the travel distance, the field study had to be limited to four municipalities of the Assis microregion (Figure 4): Campos, Novos Paulista, Quatá and Paraguaçú Paulista. As the field visit took place almost 1.5 months after the mid-season image acquisition, care was taken to assign the class label corresponding to the date of image acquisition. Straw and germinating plants resulting from the cropping/harvest process were useful in this respect, as was detailed agronomical knowledge of the crops and the study region [10,66]. Table 3 summarizes information about the field sampling points. After the field work, an additional 500 sample points (samples B) were randomly distributed over the Assis microregion and visually interpreted by experienced photo-interpreters using Landsat 8 imagery (Figure 4, red crosses), leveraging experience gained during the field work. Afterwards, samples (A) and (B) were merged; we hereafter refer to them as "reference points". In Figure 4, the two sample sets (A and B) can still be distinguished by their respective symbols.
Test Sites Photo-Interpretation. To permit assessment of segmentation quality and to create additional reference samples for independent validation, four 15 km × 15 km test sites were placed over the study area (Figure 4). In these four test sites, field boundaries were digitized manually and the resulting segments were visually assigned to one of the five classes (samples C; Table 3). The field boundaries were delineated on the computer screen using RapidEye imagery and Landsat images acquired during mid- and late-season. The fields were digitized and interpreted by experienced photo-interpreters at INPE following a pre-defined visual classification protocol based on false color composites (OLI 564 and ETM+ 453).
The four test sites occupy a total area of 900 km². A summary of the collected reference polygons (n = 2502) is given in Table 4. We will refer to samples (C) as "reference fields". Note that the reference fields did not enter the calibration stage (i.e., model development and training) but were solely used for independent model validation. Validation was done at pixel level, yielding about 1 million pixels (900 km²) for the 2502 fields.
The first (TS1) and fourth (TS4) test sites had the largest number of sugarcane and soybean polygons, respectively. These are the two most important crops in the area. The second site (TS2) is important because its parcels show the highest variability in size across the five individual (crop) classes. The particularity of test site TS3 is the high number of cassava and peanut fields, which are almost absent from the other three test sites. Table 4 and Figure 4 also reveal that the crop distribution amongst the four test sites is very different.

Methods
A general flowchart of the methodology applied in this work is shown in Figure 5.The various sub-tasks will be outlined in appropriate sub-sections.

Image Segmentation and Data Extraction
We used the multiresolution segmentation (MRS) implemented in the eCognition Developer 8 software for image segmentation. The software is extensively applied in OBIA studies [30]. MRS is a sequential region merging method starting with single pixels. It groups mutually similar adjacent pixel pairs to form initial segments. These segments are then iteratively merged with neighboring segments as long as the change in heterogeneity remains below a user-defined threshold.
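The merge criterion can be illustrated with a simplified, color-only sketch. The actual eCognition criterion additionally weights a shape/compactness term controlled by the shape and compactness parameters; the segment values and the scale² threshold form below are simplifying assumptions for illustration:

```python
import numpy as np

def color_heterogeneity_change(seg1, seg2):
    """Increase in spectral heterogeneity (segment size times standard
    deviation, summed over segments) caused by merging two segments --
    a simplified, color-only version of the MRS criterion."""
    merged = np.concatenate([seg1, seg2])
    before = len(seg1) * seg1.std() + len(seg2) * seg2.std()
    after = len(merged) * merged.std()
    return after - before

def should_merge(seg1, seg2, scale):
    # MRS keeps merging while the heterogeneity increase stays below scale^2
    return color_heterogeneity_change(seg1, seg2) < scale ** 2

# Two spectrally similar segments merge; a dissimilar pair does not.
seg_a = np.array([100.0, 101.0, 102.0])
seg_b = np.array([100.0, 102.0, 101.0])
seg_c = np.array([200.0, 210.0, 205.0])
```

Raising the scale parameter raises the merge threshold, which is why larger scale values produce larger (and eventually under-segmented) objects, as seen later in Figure 7.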
We used the two Landsat mosaics as input data by stacking the Red, NIR and SWIR 1 bands of the two dates. The three bands were selected following several studies [9,10,66]; they also show low inter-band correlation.
Instead of manually fine-tuning the software parameters, a wide range of MRS runs was performed by altering the parameter levels. We varied the scale parameter from 50 to 250 (step size 5) and both the shape and the compactness weights from 0.1 to 0.9 (step size 0.2). These parameter combinations led to 1025 different segmentations, which were exported in shapefile format as well as GeoTIFFs.
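The full factorial design can be enumerated in a few lines; a sketch assuming exactly the parameter levels stated above (the segmentation itself was run in eCognition, so this only generates the grid of 41 × 5 × 5 = 1025 combinations):

```python
from itertools import product

scales = list(range(50, 251, 5))                      # 41 scale levels
shapes = [round(0.1 + 0.2 * i, 1) for i in range(5)]  # 0.1, 0.3, 0.5, 0.7, 0.9
compactness = shapes                                  # same five levels

# Every combination is tested: 41 * 5 * 5 = 1025 segmentation runs
param_grid = list(product(scales, shapes, compactness))
```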
Afterwards, for each segment and each segmentation, different attributes were calculated and exported in table format. Due to the similar spectral behavior of some classes, texture and shape information were also used, as recommended by other studies (e.g., [9]):
(1) The mean value and standard deviation of each spectral band for each of the two dates (6 × 2 × 2 = 24 variables);
(2) Maximum difference and brightness (2 variables);
(3) The normalized difference vegetation index (NDVI) for each date (2 variables);
(4) The maximum, medium and minimum mode of the segments in each spectral band for each date (3 × 6 × 2 = 36 variables);
(5) The quartiles of the segments in each spectral band and each date (5 × 6 × 2 = 60 variables);
(6) The GLCM (Gray-Level Co-Occurrence Matrix) homogeneity, contrast, dissimilarity, entropy, angular 2nd moment, mean, standard deviation and correlation for all directions (8 variables); and
(7) Shape and size information of the segments (17 variables).
In total, 149 features were extracted for each segment and used in the subsequent classification analysis.
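As an illustration of the attribute export, per-segment statistics for one date might be computed as below. This is a sketch only: band names, array shapes and the feature subset are assumptions, and the actual extraction in the study was done within eCognition:

```python
import numpy as np

def segment_features(red, nir, labels, seg_id):
    """Per-segment statistics for one acquisition date (illustrative
    subset: band mean/std, mean NDVI, and the five quartile values)."""
    mask = labels == seg_id
    r, n = red[mask], nir[mask]
    ndvi = (n - r) / (n + r)
    feats = {
        "red_mean": r.mean(), "red_std": r.std(),
        "nir_mean": n.mean(), "nir_std": n.std(),
        "ndvi_mean": ndvi.mean(),
    }
    # Quartiles: min, Q1, median, Q3, max per band
    for name, band in (("red", r), ("nir", n)):
        q = np.percentile(band, [0, 25, 50, 75, 100])
        feats.update({f"{name}_q{p}": v
                      for p, v in zip((0, 25, 50, 75, 100), q)})
    return feats

# Tiny 2x2 example: segment 1 is the top row, segment 2 the bottom row
red = np.array([[0.1, 0.1], [0.2, 0.2]])
nir = np.array([[0.5, 0.5], [0.4, 0.4]])
labels = np.array([[1, 1], [2, 2]])
feats = segment_features(red, nir, labels, 1)
```

Looping such a function over all segments of all 1025 segmentations would yield the attribute tables used for classification.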

Classification and Feature Selection
Random Forests (RF) classification [67] was used to generate the supervised classification models. RF is a popular ensemble learning classification tree algorithm, which has become very common for remote sensing data classification in the past few years [35,49,[68][69][70][71][72].
One main advantage of RF is that the construction of each tree is based on a bootstrap sample of the same size as the input data set (sampling with replacement). Each tree is applied to the samples not drawn for its construction (the out-of-bag data, OOB), and the majority votes of the different trees are used to calculate the model accuracy. Additional benefits of RF are the random selection of split candidates, the robustness of the output with respect to the input variables (distribution of variables, number of variables, multi-collinearity of variables) and the provision of importance information for each variable [67,73,74]. The latter characteristic permits the user to rank and select features. Several studies have reported that a reduced/optimized feature set further improves mapping accuracy [35,70,74,75].
The two parameters of the RF classifier were set as follows: • For mtry (the number of variables tested at each split), the default setting was kept (i.e., the square root of the number of input variables).
• Ntree (the number of trees) was fixed to 1000.
For each RF model, we applied a backward feature selection based on the RF importance value "Mean Decrease in Accuracy" (MDA), following the recursive feature elimination procedure described in Guyon et al. [75]. We started with a model based on all available features, calculated the importance values and eliminated the least important feature. With the remaining variables, a new model was built and again the feature with the smallest importance score was eliminated. This procedure was repeated until only one feature remained. The model based on the smallest number of input variables reaching the 97.5th percentile of the classification accuracies of all models was chosen as the final model. Note that the feature selection was applied independently for each of the 1025 segmentations.
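The elimination loop can be sketched as follows. The study used the R randomForest package and its MDA importance; this Python sketch substitutes scikit-learn's RandomForestClassifier and permutation importance as a stand-in for MDA (a named substitution, not the paper's implementation), and the data set is synthetic:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

def backward_elimination(X, y, min_features=1, seed=0):
    """Fit an RF, record its OOB accuracy, drop the least important
    feature (permutation importance standing in for MDA), and repeat
    until only min_features remain."""
    features = list(range(X.shape[1]))
    history = []  # (feature subset, OOB accuracy) at each step
    while len(features) >= min_features:
        rf = RandomForestClassifier(n_estimators=200, oob_score=True,
                                    random_state=seed).fit(X[:, features], y)
        history.append((list(features), rf.oob_score_))
        if len(features) == min_features:
            break
        imp = permutation_importance(rf, X[:, features], y,
                                     n_repeats=5, random_state=seed)
        features.pop(int(np.argmin(imp.importances_mean)))
    return history

X, y = make_classification(n_samples=300, n_features=8, n_informative=3,
                           random_state=0)
hist = backward_elimination(X, y)
```

From such a history, the final model would be the one with the fewest features whose accuracy still reaches the 97.5th percentile of all recorded accuracies.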
In addition to the most common class (majority vote), the second most common class of the 1000 trees was also identified for each segment. For the first and second class, the number of votes was counted [76]. The reliability of the classification result was calculated as the difference between the two counts. Values near one (all decision trees point to the same class) imply high reliability; values approaching zero indicate an uncertain classification (the two most common classes received almost equally many votes).
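The reliability measure follows directly from the vote counts; a minimal sketch (the vote numbers below are invented for illustration):

```python
import numpy as np

def rf_margin(votes, n_trees):
    """Classification margin for one segment: difference between the
    vote fractions of the first and second most common class."""
    counts = np.sort(votes)[::-1]  # descending vote counts
    return (counts[0] - counts[1]) / n_trees

# e.g., 1000 trees voting over the five classes for one segment
votes = np.array([620, 240, 90, 30, 20])
margin = rf_margin(votes, 1000)  # (620 - 240) / 1000 = 0.38
```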
We developed our classification modeling procedure in the open-source statistical software R 3.1.3 [77] and used the additional R packages randomForest [78], raster [79] and caret [80].

Model Performance and Model Evaluation
The models were evaluated in two ways:
• Based on the out-of-bag (OOB) samples from the (point) reference data set (i.e., samples A and B; see Table 3), and
• Based on the manually digitized reference fields as an independent validation data set (i.e., samples C; see Table 3 and Figure 4).
The OOB error of the RF models was used for the first model evaluation. For each segmentation, a confusion matrix for the 677 reference points was calculated and the common statistics estimated (e.g., overall accuracy, user's and producer's accuracy, kappa).
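The statistics derived from each confusion matrix can be sketched generically (this is a standard implementation, not the paper's R code; the example matrix is invented):

```python
import numpy as np

def accuracy_stats(cm):
    """Overall accuracy, kappa, and per-class producer's/user's accuracy
    from a confusion matrix (rows = reference, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    oa = np.trace(cm) / total
    # Chance agreement from the marginal totals (for Cohen's kappa)
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (oa - pe) / (1 - pe)
    producers = np.diag(cm) / cm.sum(axis=1)  # per reference class
    users = np.diag(cm) / cm.sum(axis=0)      # per predicted class
    return oa, kappa, producers, users

# Two-class toy example
cm = [[50, 10], [5, 35]]
oa, kappa, prod, users = accuracy_stats(cm)
```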
The test sites were used afterwards as additional, independent validation. In this case, the classification results of the automatically generated segmentation were compared pixel by pixel to the manually delineated and visually classified reference data (Figure 4).

Model Accuracy OOB
The distribution of the overall accuracy (OA) of the 1025 different RF models after feature selection is left-skewed (negative skew) (Figure 6). The OA varied between 65.8% and 85.8% (median 81.8%). If segmentation parameters are not properly chosen, this variability can lead to an avoidable loss of classification accuracy. For example, almost half of the 1025 evaluations yielded an OA between 75% and 85%, already implying a 10% range in output accuracy. Overall accuracy and kappa statistics for the 1025 trials are summarized in Table 5. An overview of the corresponding variation of the class-specific classification results can be found in Table 6. The reported values again summarize the results of the 1025 different segmentations and classifications. The highest accuracy values were obtained for peanut (user's accuracy) and soybean (producer's accuracy). On average, cassava was classified worst. However, even for this (largely) underrepresented crop, some models yielded satisfactory results.
Amongst the 1025 segmentations/classifications, the model with the highest overall accuracy was identified. The best (OOB) result was obtained for scale = 70, shape = 50 and compactness = 90. This model includes 25 variables (after feature selection), mainly metrics from spectral bands and NDVI values of both images. No texture or shape information is included in the final model. The five most important variables are: 5th quantile of SWIR 2 from January, NDVI from August, 5th quantile of Green from January, 50th quantile of SWIR 1 from January, and Max. Difference. The confusion matrix in Table 7 presents the detailed results of the model with the highest OA of 85.8% and a kappa value of 0.786. Large differences can be noted between the class-specific accuracies: 87.2% of the sugarcane and 85.9% of the soybean samples were classified correctly, while the two other crops show significantly lower producer's accuracies of 74.3% for peanut and only 37.0% for cassava.
Due to the similar spectral behavior of some of the crops, two Landsat images are not sufficient to separate them. The inclusion of additional images often fails due to cloud cover. The availability of additional data from new satellite sensors (CBERS-3, Sentinel-2) may provide a remedy.
The various parameter combinations of the segmentation algorithm showed a distinct pattern with respect to the (OOB) classification accuracy (Figure 7). In particular, the scale and shape parameters modified the (subsequent) classification accuracy in a systematic way. Compared to scale and shape, the compactness parameter had a much smaller overall impact.
Best results (black dot in Figure 7) were obtained for scale = 70 and shape = 50; both higher and lower values decreased the OA. Note that, to some extent, a (too) high scale parameter can be compensated by a decreasing shape value. Worst results were obtained for largely under-segmented images (i.e., high scale) combined with high shape (and compactness) values. The somewhat rough surface in Figure 7 shows the remaining impact of the (random) sampling; nevertheless, the main trends are clearly visible.

Model Robustness and Transferability to Test Sites
The results reported so far refer to the (OOB) evaluations against (point) samples A and B (Table 3). To test the robustness and transferability of the selected (best) model, the final model was also evaluated against the manually digitized field polygons (see Figure 4 and samples C in Table 3). For this purpose, a wall-to-wall land cover (LC) map was produced using the (best) model parameters from the training stage. The resulting map (Figure 8) was subsequently compared (pixel-wise) against the reference classification. The areas with available reference classifications are indicated in Figure 8 by four (blue) rectangles.

Figure 8. Classification results obtained for the study area using the optimum segmentation and RF classification model identified using the (OOB) statistics (i.e., scale = 70, shape = 50 and compactness = 90). The blue rectangles indicate areas for which complete-coverage (manually digitized) reference information is available (see Figure 4).
Overall, the automatically identified classification model transferred well (Table 8). When using the parameters optimized on the (point) training samples, the OA in the four test sites decreased by only a few percent compared to the OOB results (0.793 instead of 0.858). The decrease in OA was more or less similar for all 1025 segmentation results (Figure 9). The main reason for the observed decrease is that the class others shows a significantly lower producer's accuracy in the four test sites. This may be the result of the high intra-class heterogeneity of this class, which is not fully covered by the reference data set.
Regarding the different (crop) classes, strong differences in the classification accuracies remained (Table 8). As in the OOB results, cassava and peanut were classified worst and soybean and sugarcane best. This resulted in a marked underestimation of the surfaces occupied by cassava and peanut. In Table 9 and Table 10, we report the results one would obtain when choosing the optimum segmentation parameters using the independent validation data from the test sites. Interestingly, the differences in OA are very small (OA of 0.801 compared to 0.793), and the "optimum" parameter values are quite similar (Table 9, top). Together, this clearly shows that the model selected over the (point) reference samples (A and B) represents the entire data set and study region well; otherwise, Table 9 (right side) would show increased accuracies. The results also indicate that the observed confusions can probably not be reduced with the set of features used in this study, and that the OOB error statistics from RF represent well the accuracies one obtains with completely independent samples.

Exploitation of the Classification Margin
The (per-pixel) classification margin between the first and second class amongst the 1000 voting decision trees is shown in Figure 10 for the four test sites (bottom right in each 2 × 2 block). A good correspondence with the (binary) misclassification maps (bottom left) can be observed.
The added value of the classification margin becomes even more evident when comparing classification accuracies within different ranges of this margin (Figure 11). There is a clear tendency for classification accuracy to increase with increasing classification margin. The findings in Figure 10 and Figure 11 demonstrate that the classification margin holds useful information that deserves further exploitation. As an indicator of classification reliability, the margin provides useful information about the quality of the map for the user, and it could also be used to improve the classification procedure. For example, the model developer could decide to acquire additional samples in areas of low classification margin, or introduce a rejection class. Excluding, for example, all pixels with a classification margin ≤40% increases the overall accuracy (of the remaining samples) from 0.79 to 0.85 (not shown).
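The rejection scheme described above can be expressed in a few lines. The toy correctness and margin arrays below are invented for illustration; in the study, the margin came from the RF vote counts:

```python
import numpy as np

def oa_above_margin(correct, margin, threshold):
    """Overall accuracy restricted to pixels whose classification margin
    exceeds the threshold, plus the fraction of pixels retained."""
    keep = margin > threshold
    return correct[keep].mean(), keep.mean()

# Toy example: misclassifications concentrate at low margins
correct = np.array([0, 1, 0, 1, 1, 1, 1, 1])   # 1 = correctly classified
margin  = np.array([0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.8, 0.9])
oa_kept, frac_kept = oa_above_margin(correct, margin, 0.4)
```

As in the study's ≤40% rejection example, raising the threshold trades map coverage for accuracy on the retained pixels.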

Conclusions
In this study, we presented and evaluated a novel approach for the combined use of image segmentation and classification, with the ultimate aim of improving object-based crop type mapping while minimizing operator input. The method was applied to bi-temporal Landsat 8 imagery to separate five classes: sugarcane, soybean, cassava, peanut and others. We found acceptable overall accuracies on independent validation samples as well as low internal (out-of-bag) errors within the applied random forest (RF) algorithm. One of the main advantages of the approach is that user input is minimized: both the choice of segmentation parameters and the feature selection/optimization were fully automated. This reduces the workload and provides reproducible results, which is not possible with traditional approaches relying heavily on manual fine-tuning and expert knowledge. Compared to classical pixel-based classifications, our object-based approach leverages information not present in single pixels, such as texture and form, while minimizing the negative impact of sensor noise and other random fluctuations. With the soon-available 10 m resolution data from the European Sentinel-2 sensor, it is reasonable to expect further incentives for using object-based (OBIA) approaches.
In summary, our study permits us to draw four main conclusions:

1. Classification results depend to a high degree on the identification of suitable segmentation parameters. In our study, the worst segmentation yielded an overall accuracy (OA) of only 0.66, compared to 0.86 for the best result (with kappa of 0.48 and 0.79, respectively). This large spread in classification accuracies underlines that procedures are needed for (automatically) choosing suitable parameter settings. Today, this is mostly done manually (trial-and-error), not necessarily leading to the best possible result.

2. It is possible to fully automate the identification of suitable segmentation parameters (as well as input features for the classifier) using a full factorial design framework as chosen for this study (i.e., testing a pre-defined set of parameter combinations within fixed intervals, each time optimizing input features using MDA information). The segmentation parameters derived from relatively few training samples were close to those one would have identified directly from another set of validation samples, showing the robustness of the approach. The proposed optimization framework can be applied to all types of segmentation algorithms. Obviously, the simple feature selection approach chosen for our study can be replaced by other techniques. In our case, the resulting error surface (i.e., classification accuracy as a function of the parameters) was smooth enough to permit a guided optimization procedure (Simplex or similar); it is not clear, however, whether this would really speed up the entire process.

3. For regions similar to the Assis study area with sub-tropical climate, soybean and sugarcane can be classified with acceptable accuracy using two Landsat scenes acquired in January/February and August. With such acquisition dates, however, it was not possible to correctly map cassava and (to a lesser extent) peanut. The spectral signatures of these two crops were too similar to permit successful discrimination and mapping. Additional information would be required to map peanut and cassava with acceptable accuracy. The added value of additional scenes, also from other satellite sensors (e.g., Sentinel-2), deserves to be tested in future work. In our study, additional scenes could not be processed due to persistent cloud coverage during the vegetative period.

4. The classification margin, produced as a by-product of the Random Forest (RF) classifier, adds valuable information to the land cover (LC) map and deserves more attention. We were able to demonstrate (at pixel level) that the margin between the first and second most often selected class (majority vote) closely reflects the accuracy of the final LC map. We found a close (linear) relation between pixel-level classification accuracy and classification margin, with only 48% of pixels correctly classified in the worst case (margin <10%), while >92% of pixels were correctly classified in the best case (margin >90%).
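The full factorial search over segmentation parameters, selecting the parameter set with the highest out-of-bag (OOB) accuracy, can be sketched roughly as follows. The segmentation/feature-extraction step is a hypothetical placeholder returning synthetic data, since the actual multiresolution segmentation runs in dedicated software; the parameter names and grid shape follow the study, while the per-model MDA feature selection is omitted for brevity.

```python
import itertools
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

def segment_and_extract_features(scale, shape, compactness):
    """Hypothetical placeholder: real code would segment the image mosaic
    with these parameters and extract per-segment spectral/texture features
    at the training-sample locations. Here we synthesize a feature matrix
    whose noise level varies with the parameters."""
    n, k = 300, 6
    noise = 0.5 + abs(scale - 70) / 100 + abs(shape - 50) / 200
    y = rng.integers(0, 5, n)                       # five crop classes
    X = np.eye(5)[y].repeat(2, axis=1)[:, :k] + rng.normal(0, noise, (n, k))
    return X, y

# Full factorial design over the three segmentation parameters.
grid = itertools.product(range(10, 101, 30),   # scale
                         range(10, 91, 40),    # shape
                         range(10, 91, 40))    # compactness

best = None
for scale, shape, compactness in grid:
    X, y = segment_and_extract_features(scale, shape, compactness)
    rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                                random_state=0).fit(X, y)
    if best is None or rf.oob_score_ > best[0]:
        best = (rf.oob_score_, scale, shape, compactness)

print("best OOB accuracy %.3f at scale=%d shape=%d compactness=%d" % best)
```

Because every parameter combination is evaluated, the resulting OOB-accuracy surface can also be inspected for smoothness, as discussed above for a possible guided (Simplex-style) optimization.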
As part of the next steps, we plan to speed up the processing, for example, by using alternative segmentation algorithms (e.g., mean shift). In the same spirit, we will also investigate the use of other texture features such as wavelets. Wavelets are known to considerably reduce processing time while providing information not present in GLCM. The highest time savings can probably be achieved by optimizing the feature extraction for the segments; for this, a separation of the feature extraction from the segmentation step is foreseen. Finally, we plan to compare the proposed method of supervised segmentation against unsupervised segmentation optimizations based on variance and Moran's I. If both methods yield similar results, considerable savings in processing time could be realized. A speedup of the processing would be highly welcome to take full advantage of upcoming satellite data such as Sentinel-2.

Figure 1 .
Figure 1. Soil types (A) and digital elevation model (DEM) (B) of the study area. Soil names are shown according to the World Reference Base (WRB-FAO) and the São Paulo Environmental System (2015), respectively. DEM from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) at 90 m resolution. (B) also shows the two Landsat paths covering the study area.

Figure 2 .
Figure 2. Crop calendar of the main crops of the Assis microregion: peanut, sugarcane, cassava, corn and soybean. PT (light grey): planting; HV (dark grey): harvesting; CL (light grey): clearing. In white: crops that remain green during the period. Note that field work was done in the first half of March.

Figure 3 .
Figure 3. Example images for the classes peanut, cassava, sugarcane, soybean and others (field boundaries in white). Note that the class "others" includes not only agricultural areas but also constructions, water, etc. The photos from the field work were taken around 40 days after the "Mid Season" images (i.e., in March). The RapidEye image is shown in band combination 453 and was taken in July 2011. The Landsat 8 images are shown in band combination 564 and were taken in August 2013 (Late Season) and January 2014 (Mid Season).

Figure 4 .
Figure 4. Reference information collected within the Assis microregion. Reference samples obtained through field visits are shown as black stars. Additional reference samples were obtained through photo-interpretation and are shown as magenta crosses. Four smaller test sites (named TS1 to TS4) were placed within the study area and field boundaries were manually delineated. The resulting segments were visually interpreted to yield reference polygons.

Figure 5 .
Figure 5. Flowchart of the methodology applied in this work.

Figure 6 .
Figure 6. Distribution of the OOB overall accuracies of the 1025 RF models based on different segmentations (combining samples A and B). The reported accuracies were obtained after model-specific feature selection in the RF modeling. The best OOB model yields an overall accuracy of 0.858 (kappa = 0.786).

Figure 7 .
Figure 7. Overall accuracies (OOB) for all combinations of the scale, shape and compactness parameters tested in this study, combining samples A and B. For plotting purposes, the individual results have been grouped into classes and color-coded. The black dot indicates the overall best (OOB) result at scale = 70, shape = 50 and compactness = 90.

Figure 9 .
Figure 9. Transferability of models to the four test sites with manually delineated reference fields. The OOB overall accuracies obtained for the (point) samples A and B (n = 677) are shown on the y-axis, those for the (polygon) samples C (n ≈ 1 million pixels) on the x-axis. Each point in the scatter plot corresponds to one combination of scale, shape and compactness (plus feature selection in the RF classifier). In total, results from 1025 different segmentations and classifications are displayed. The best model and the best mapping parameter set are highlighted as red and green points. In addition, the 1-to-1 line and the regression line (in grey) are shown.

Figure 10 .
Figure 10. Evaluation of the "best" model (identified through reference samples A and B) against the manually delineated and visually interpreted test sites. The selected model was optimized against the (OOB) reference samples, not the reference polygons. For each of the four test sites, the following are shown: (a) the reference map; (b) the classification result; (c) the agreement/disagreement between reference and classification; and (d) the classification margin. The higher the classification margin, the more confident the classifier is in its decision.

Figure 11 .
Figure 11. Evaluation of the information provided by the classification margin for the four test site samples (C). For each class of classification margin ("majority vote difference"), the frequency of correct and erroneous classifications is shown. The values inside the bars indicate the percentage of correct classifications.

(Table 2). Two different dates were necessary to cover the Assis microregion (3 July 2012 and 27 November 2012). The 2012 image collection was acquired from the National Institute for Space Research (INPE) catalog [65].

Table 2 .
Images (Landsat and RapidEye) used for the study.

Table 4 .
Descriptive summary of reference information collected within the four test sites (TS1 to TS4). Numbers to which we refer in the text are highlighted. The reported statistics relate to individual polygons. Polygons were manually digitized and visually interpreted. All areas are in ha.

Table 5 .
Descriptive statistics regarding the overall accuracy (OA) and kappa coefficient for the 1025 segmentations and classifications (OOB results combining samples A and B).

Table 6 .
Class-specific producer and user accuracies amongst the 1025 different segmentations and classifications (out-of-bag (OOB) results combining samples A and B).

Table 7 .
Confusion matrix (OOB) of the best classification amongst the 1025 segmentations and subsequent RF optimizations (incl. feature selection). The best OOB result was obtained for scale = 70, shape = 50 and compactness = 90. The values in the matrix refer to the point samples combining samples A and B.

Table 8 .
Confusion matrix applying the best model parameter set to the four test sites. The values in the matrix are given in km².

Table 9 .
Comparison of the best parameter sets (and corresponding classification statistics) obtained from reference samples A and B (left) and from the test sites (C; right). For a fair comparison, the reported classification statistics refer in both cases to the four test sites.

Table 10 .
Confusion matrix obtained by selecting the best mapping parameter set from the results obtained for the four test sites. The values reported in the matrix are given in km² and refer to the four test sites (sample C).