Mapping of Agricultural Crops from Single High-Resolution Multispectral Images — Data-Driven Smoothing vs . Parcel-Based Smoothing

Mapping agricultural crops is an important application of remote sensing. However, in many cases it is based either on hyperspectral imagery or on multitemporal coverage, both of which are difficult to scale up to large-scale deployment at high spatial resolution. In the present paper, we evaluate the possibility of crop classification based on single images from very high-resolution (VHR) satellite sensors. The main objective of this work is to expose performance difference between state-of-the-art parcel-based smoothing and purely data-driven conditional random field (CRF) smoothing, which is yet unknown. To fulfill this objective, we perform extensive tests with four different classification methods (Support Vector Machines, Random Forest, Gaussian Mixtures, and Maximum Likelihood) to compute the pixel-wise data term; and we also test two different definitions of the pairwise smoothness term. We have performed a detailed evaluation on different multispectral VHR images (Ikonos, QuickBird, Kompsat-2). The main finding of this study is that pairwise CRF smoothing comes close to the state-of-the-art parcel-based method that requires parcel boundaries (average difference ≈ 2.5%). Our results indicate that a single multispectral (R, G, B, NIR) image is enough to reach satisfactory classification accuracy for six crop classes (corn, pasture, rice, sugar beet, wheat, and tomato) in Mediterranean climate. Overall, it appears that crop mapping using only one-shot VHR imagery taken at the right time may be a viable alternative, especially since high-resolution OPEN ACCESS Remote Sens. 2015, 7 5612 multitemporal or hyperspectral coverage as well as parcel boundaries are in practice often not available.


Introduction
Monitoring agricultural lands and estimating crop production are crucial for countries whose economy heavily depends on agricultural commerce.This includes not only keeping track of the past production, but also short-term monitoring and yield estimation, to forecast agricultural production and inform marketing and trading decisions [1].Traditionally up-to-date information of crop production is acquired by farmer declarations and/or ground visits of the fields.This procedure is not only subject to some errors and inconsistencies but also time consuming and expensive [2].Hence, there is a demand for automated crop classification.However, the problem has not yet been solved completely; in particular for high resolutions at the level of individual fields (ground sampling distance in the range of 1-5 m), where coverage with dedicated sensors such as hyper-spectral cameras or polarimetric RADAR is prohibitively expensive.
A number of review papers have described the classification work carried out in the past and further elaborated on the advantages and disadvantages of the developed approaches, e.g., [3,4].Here, we limit ourselves to supervised classification.In agreement with much of the literature (e.g., [4][5][6][7][8]) we prefer to start from labeled training examples of each class, since different crops may in some cases not exhibit the clear spectral separability required by unsupervised approaches [9].In contrast to many other works, we prefer to use a single multispectral image for classification.Our motivation to use single-date imagery instead of multitemporal (or multisource) data is as follows: (i) single-date multispectral imagery is easier to obtain and cheaper; (ii) in productive crop fields, crop types can change on an almost monthly basis, which challenges multitemporal approaches and requires additional resources for frequent field work.In this study we therefore focus on the potential of (supervised) crop classification with single-date, high-resolution multispectral imagery.
The literature on crop classification is vast.In accordance with the topic of our investigation we only discuss supervised methods dealing with single-date optical images.Early efforts were dominated by pixel-based approaches, in which each pixel is separately categorized into one of the pre-determined classes, according to its spectral properties.In agricultural (and other) applications, the spectral properties are strongly influenced by changing imaging conditions, e.g., illumination, variations in soil moisture or nutrients, etc. Per-pixel classification methods often fail to capture the intra-class variability and therefore mislabel a rather large number of pixels.To increase robustness, it has been proposed to include additional grouping information rather than consider pixels in isolation.The basic idea of so-called parcel-based approaches is to divide the image into homogeneous regions using agricultural parcel boundaries, i.e., the location and extent of each field is assumed to be known a priori, e.g., [7,[10][11][12][13].Despite the success of the parcel-based approach, a drawback is that up-to-date parcel boundaries are not available for many agricultural areas, and collecting parcel or even field boundaries through interactive digitization and field observations is a time-consuming and expensive process.To bypass that effort object-based image analysis replaces the true field boundaries by the segment boundaries of an over-segmentation into homogeneous areas [14].The segments are then labeled instead of the parcels, e.g., [15][16][17].Such object-based image segmentation has gained quite some popularity, e.g., [2,[18][19][20][21][22][23][24][25][26][27][28][29].It often works well, but suffers from the conceptual problem that it aims to find segments made up of a single class during preprocessing, when by definition the class information is not yet available.As a consequence, the method tends to be sensitive to the (additional) parameters of the segmentation step, which directly affect the accuracy of the resulting thematic maps [25,30].
An alternative way to exploit information from larger pixel neighborhoods is to classify on a per-pixel level, but include a smoothness prior into the classification framework.The smoothness assumption corresponds to the basic fact that, as soon as the spatial sampling is dense enough, neighboring pixels tend to have the same class label.Schindler recently provided an overview and comparison of smooth labeling methods for land cover classification [31].Both local filtering methods and global labeling solutions were investigated in the study.The most striking conclusion was the magnitude of the performance boost (up to 33%) after imposing the smoothness prior.The best performance was achieved using global conditional random field (CRF) labeling with graph cuts [32].In spite of its popularity in image processing, to our knowledge, only a limited number of studies exist that use graph cuts in remote sensing image classification, predominantly in the context of hyperspectral images, e.g., [33][34][35][36][37][38].In line with the image processing literature, they conclude that the CRF formulation improved the classification accuracy and call for further experimental evaluations.
In this paper, we investigate a general framework for crop classification in agricultural lands from high-resolution, single-date optical satellite images, based on the CRF formulation.Specifically, the main objective of this work is to expose the performance difference between state-of-the-art parcel-based smoothing and purely data-driven CRF smoothing, which has to our knowledge not yet been investigated in the literature.Towards this objective, we perform extensive tests with four different classification methods (Support Vector Machines, Random Forest, Gaussian Mixtures, and Maximum Likelihood) to compute the unary potential (data term) of the CRF; and we also test two different definitions of the pairwise potential (smoothness term), namely the linear contrast sensitive [31,39] and exponential contrast sensitive [40] smoothing functions.To summarize, we make the following two specific contributions: • A detailed assessment between parcel-based smoothing with known parcel boundaries and data-driven CRF smoothing; as far as we know, such an evaluation is missing in the literature.• The first systematic study that assesses the effects of CRF smoothing, and of different smoothness functions, for high-resolution crop classification.
The experiments are carried out using four different high-resolution multispectral images taken from two different years, to yield results that hopefully generalize well over different very high-resolution (VHR) sensors (Ikonos, QuickBird and two Kompsat-2 images).We find that pairwise CRF smoothing comes close to the parcel-based method that requires parcel boundaries (average difference ≈ 2.5%).
Our results indicate that, on our test images, ≈90% classification accuracy over six crop classes can be reached with a single-date multispectral (R, G, B, NIR) image.

Method
In this section, we briefly summarize the standard CRF-based labeling process.Thereafter, we describe unary and pair-wise terms.Finally, a brief description of state-of-the-art parcel-based smoothing is provided.For further details please refer to the original publications or textbooks, as cited in the corresponding subsections.

Smooth Labeling with Conditional Random Fields (CRFs)
Image classification maps radiometric observations to class labels.Given a set of M pixels with observations x = (x1, x2, …, xM), we are looking for a function c(x) which assigns every pixel a label ci, one out of a set {c1, c2, …, cK} of possible class labels, to obtain a thematic map c = (c1, c2 , …, cM).To that end, every label configuration is assigned an "energy" E(c,x), which is the sum of two terms: where γ is a constant that controls the level of smoothing.The first term is related to the pixel-wise class probability conditioned on the data, and is referred to as "data cost" or "unary term" of a given pixel ( ( , )).It encodes the label preferences for each pixel i, based on the observed spectral values xi, with lower energy corresponding to higher likelihood of taking on label ci.The second term is referred to as "smoothness cost" or "pairwise term" for two neighboring pixels i and j ( , , , ), where n denotes the pixel neighborhood.This term assigns a cost to neighboring pixels if their labels differ, and should be smaller if the observations xi and xj are dissimilar.Note that, if this term is excluded from Equation (1), the classification reduces to the standard pixel-based strategy, which ignores the neighborhood relations between pixels.When smoothing is turned on, the energy depends not only on the attributes xi, but also on the labels cj of its neighboring pixels.Given the above notation, the maximum a posterior (MAP) labeling cmap of the random variables is Finding the exact minimum of energy E(c, x) for the multilabel case (K > 2) is NP-hard (non-deterministic polynomial-time hard, i.e., the computational complexity is exponential in the number of variables).However, for only pairwise cliques (first order random fields), efficient graph-based approximation methods exist [41].In a graph representation of an image, nodes correspond to pixels.The edges between the nodes of the graph are defined by neighborhood relations of pixels, and each edge carries an associated pairwise potential that encodes the smoothness cost between its two end nodes.To minimize the energy, different variants of belief propagation or graph cuts are available.We use the standard α-expansion algorithm [39], which iteratively visits different labels α and solves the two-class problem between α and all other labels.The algorithm is guaranteed to converge in finite number of iterations, and empirically is linear in the image size and quadratic in the number of labels.
We go on to briefly describe the different unary and pairwise terms tested in our evaluation.

Unary Terms
We test four popular classifiers, representative of the state of the art, to compute the data costs in Equation (1): Support Vector Machines, Random Forest, Gaussian Mixtures, and Maximum Likelihood.

Support Vector Machines (SVMs)
SVMs are based on the concept of structural risk minimization [42,43].In its basic form a SVM learns a separating hyper-plane between two classes which maximizes the margin between them [44].
The training points that define the hyper-plane are called support vectors, and completely define the classifier.To extend the concept to non-linear decision boundaries, the training samples are implicitly mapped to a higher-dimensional space with a kernel function.Details can be found in Vapnik [42].We use the radial basis function (RBF) kernel.Two parameters, the width of the kernel function and the strength of the regularization, control the behavior of the SVM (cf.Section 4.5).We prefer the one-against-one method to solve our multitask with the binary SVM algorithm, since it has been reported to offer better performance and shorter training time [45].SVMs have been shown to also work well for crop classification, e.g., [46][47][48].

Random Forests (RFs)
RFs are collections of discriminative decision tree classifiers.Different trees are generated by randomly picking the training samples and/or split functions of the individual trees [49,50] and averaged to improve the robustness of the prediction.Three parameters are required (cf.Section 4.5), namely the number of trees to grow, the number of variables used to determine the split at node and the minimum number of observations per tree leaf, respectively.In the implementation, Classification and Regression Trees (CART) are used as individual trees.For further details see [50].The RF classifier has been used in quite a few studies dealing with crop classification, e.g., [51][52][53].

Maximum Likelihood Classification (MLC)
MLC (in statistics texts referred to as "Quadratic Discriminant Analysis") is a classical generative model that has been widely used in remote sensing and is therefore included in our tests.It is based on fitting a single multivariate Gaussian to the training examples of each class.These Gaussians then determine the class-conditional likelihoods of a test sample [5].The normal distributions are completely determined by the training samples; hence the approach needs no user-defined parameters.

Gaussian Mixture Models (GMMs)
A Gaussian Mixture Model (GMM) is a parametric probability density function represented by a weighted sum of multiple Gaussian densities.The parameters of the Gaussian mixture model are the mean vectors, covariance matrices and mixture weights for all component densities.GMMs are frequently used in labeling tasks as a computationally efficient, generative model for multimodal distributions (in the limit they can in fact approximate arbitrary smooth densities).In general, a GMM is a sum of (few) Gaussian functions, each with their own mean and covariance matrix.There are variants on the GMM, e.g., the covariance matrices can be constrained to be diagonal or a single covariance can be shared amongst several components.The choice of model (cf.Section 4.5) is often determined by the amount of data available for estimating the GMM parameters.For further details see [54].In this work, we utilized the binary tree quantization algorithm [55] to initialize the class information and bootstrap GMM fitting.This avoids randomness in the GMM procedure and ensures consistent results across different runs.

Pair Wise Terms
In this work, we employ two variants of the smoothness prior , linear contrast-sensitive [31] and exponential contrast-sensitive [40].
In both versions, the penalty for a change of label depends on the intensity gradient between the two neighboring pixels.For the linear version, the gradients of image Ig are estimated with Gaussian derivative filters with standard deviation σ pixels (independently for each of four pixel directions).Thereafter, a truncated linear function maps the gradients to pairwise potentials (if ci ≠ cj): In Equation ( 3), ϕ denotes a constant value that defines the amount of dissimilarity beyond which no more penalties are applied.Because the images used in this work have four channels (R, G, B, NIR), the largest (absolute) gradient over all channels is used [31].It should be emphasized that two neighboring pixels pay no penalty if they are assigned the same label (if ci = cj).
The second variant of the smoothing instead uses an exponential contrast sensitive function [40]: Here, dij denotes the distance between the neighboring pixels computed in image space, || .|| is the Euclidean norm, and β is a normalization constant computed for each image independently using where the symbol mean( .) denotes the average operator.As in the linear contrast-sensitive smoothing, no penalty is given to two neighboring pixels if they are assigned to the same label (if ci = cj).Note that in contrast to the linear variant the gradient is estimated in (R, G, B, NIR)-space.

Parcel-Based Smoothing
Parcel-based classification is seen as the state-of-the-art for agricultural crops.It combines remotely sensed imagery with vector data of field boundaries [11,[56][57][58].With regard to crop classification, this means that the location and the extent of each field are known.The integration between the two datasets can be achieved (i) before classification, (ii) during classification and (iii) after classification (in the terminology of [59] "pre-classification stratification," "classification modification" and "post-classification sorting").Various studies indicate good performance by integrating images and boundaries after classification [7,10,[56][57][58][59][60][61][62], hence we prefer to integrate parcels after the classification.First the input image is classified with one of the base classifiers (cf.Section 2.2).Then, the cumulative class likelihoods over all pixels in a parcel are computed and all pixels are assigned to the class with the highest cumulative likelihood.By aggregating information over entire fields the method can overcome variability within fields (assuming correct boundaries).

Dataset
The study area is located in Karacabey, an agricultural area in Bursa, northwest Turkey (Figure 1a).It covers approximately 100 km 2 , centered at 28°14ʹ12ʺ and 40°11ʹ09ʺ.Rich and loamy soils with good weather conditions (mean annual temperature of 14.4 °C and mean annual precipitation of 706 mm) make the study area one of the most productive and valuable agricultural regions of Turkey.The area has a flat terrain and the mean elevation above sea level is 10 m.The major crop types cultivated in the test site are corn, tomato, wheat, rice, sugar beet.The area also contains several grassland fields (pasture) providing feed to livestock.Most of the fields in the study area have a rectangular shape based on a land consolidation project performed between 1988 and 1992.The sizes of the fields are between 0.0074 ha and 48 ha (Figure 1).The reference parcel maps used in this study comprise of two different field boundary maps produced from 1:5000 scaled digital cadastral maps of the study area, corresponding to updated field boundary information in 2004 and 2008.The parcel map includes 4130 agricultural fields for the Ikonos and QuickBird images (2004) while the Kompsat-2 (2008) images have 4685 agricultural fields, where each field encloses only one crop type, due to the changes of cultivation activities between years.Note that the data include two different years (2004 and 2008) with greatly changed parcels/crop patterns, which can be considered independent tests for the same crop types.
Reference data related to fields were collected during field works performed concurrently with the image acquisitions.In our sampling strategy, we expect to have our results comply with at least 95% confidence level and 5% margin of error.Therefore, amongst the fields available across the study area (4130 agricultural fields for the Ikonos and QuickBird images, and 4685 agricultural fields for Kompsat-2 images), evenly distributed subsets of 1012, 914, 631, and 382 parcels were selected and visited during the fieldworks to represent the variety of crop types for the Ikonos, QuickBird, and Kompsat-2 (June and July) images, respectively.Thereafter, four different pixel-based reference maps corresponding to four images are generated from the related parcel maps by assigning a crop label to the pixels enclosed by each parcel boundary.For analyses, the reference data were separated into two groups: one group (about 3%) to be used for training samples and the rest to be used for assessing the results of the classification.The training samples for each image are selected by an experienced operator.However, note that the samples are collected independently from each image to achieve the best performance for each dataset.We also utilized transformed divergence index to investigate the spectral separability of training samples (average separability values computed are 1.9756, 1.9191, 1.9237, and 1.9209 for the Ikonos, QuickBird, and Kompsat-2 (June and July) images, respectively).
During preprocessing, the images were orthorectified with the supplied Rational Polynomial Coefficients (RPC), using commercial remote sensing software [63].The digital elevation model (DEM) for orthorectification was generated from 1:25,000 scale digital maps obtained from the General Command of Mapping, which is the national mapping agency of Turkey.These maps are compiled to NATO level A standards.The Ikonos and QuickBird MS images were orthorectified using 20 evenly distributed GCPs.Root Mean Square (RMS) values of the orthorectification were below one pixel for these images (±0.40 pixel).The Kompsat-2 MS images were orthorectified using 16 (for June) and 19 (for July) homogeneously distributed GCPs.The RMS values of the Kompsat-2 images were computed as 0.34 and 0.47 pixels for the June and July data, respectively.We utilized nearest-neighbor resampling for all images during the orthorectification, because the corresponding errors of up to 0.5 pixels seemed acceptable in order to preserve the original spectral values provided by the image vendors.

Experiments
We now proceed to a detailed investigation of the classification strategies (without smoothing, CRF smoothing, and parcel-based smoothing).Two measures, overall accuracy (total number of correct pixels/total number of reference pixels) and kappa index (difference to chance), are calculated.We note that the kappa index closely correlates with the overall accuracy, and additionally suffers from conceptual problems [64], but nevertheless quote it for completeness, as it has been widely used in earlier studies of crop classification.We also provide confusion matrices in which detailed performances of each crop type as well as producer's and user's accuracies can be examined.To assess the significance of the observed differences between different methods, we employ the paired-sample Wilcoxon signed rank test (The Wilcoxon test is a rank-order based pairwise test for general distributions.The choice is motivated by the fact that the tested input samples-accuracies for different images-cannot be assumed to follow Gaussian normal distributions, which would be required for the paired t-test).The test is based on the null hypothesis that the classification performances of the classifiers are not systematically different from each other.It should be pointed out that four samples (images) are not sufficient to properly determine statistical significance.Still, whenever one method beats the other one on all four datasets then there is an 87.5% probability that the dominance is systematic and not due to chance (i.e., the p-value of the Wilcoxon test is 0.125).
The assessment is run with parameter configuration(s) that maximize the performance of each method.The implementation and processing was performed in MATLAB, except for the SVM, for which we utilized LIBSVM [65].All experiments were performed on a computer with an Intel i7 processor with 2.40 GHz and 16 GB RAM.In Section 4.1, we compare the base classifiers without any smoothing.Section 4.2 compares performance with and without smoothing, and Section 4.3 investigates the differences between data-driven and parcel-based smoothing.The classification performance with respect to individual crop types is discussed in Section 4.4.Finally, the sensitivity of each method to the main parameters is tested (Section 4.5).

Performance without Smoothing
Overall per-pixel accuracies for all base classifiers are given in Figure 2. SVM provides the highest performance for all images (Ikonos 80.6%, Kompsat-2 July 83.6%,QuickBird 81.5%, Kompsat-2 June 63.3%).Classification performance for Kompsat-2 June is noticeably lower (with all methods).This is mainly because of the imperfect acquisition period.In June, most of the crop types are in the early planting season, and bare soil dominates the visible surface, and hence the spectral response.SVM consistently works best, which suggests a systematic, albeit small, advantage.Note however that this comes at the price of significantly longer processing time (Figure 3  over different runs as a consequence of the randomness in the classifier (for the Kompsat-2 June image we observe a somewhat larger spread with standard deviation 1.6%).The reported results of RF in Figure 2 are means over 20 runs.Also, we stress that RF classifiers are not as "standardized" as others and results may vary depending on the implementation.MLC provided good results in our tests as a base classifier.This indicates that for our crop classes the assumption of a unimodal Gaussian distribution holds well; however, this rather restrictive assumption might not apply for other crop types.Note also that the MLC and GMM methods suffer from the curse of dimensionality, i.e., for larger feature sets that also include texture filters etc., it might not be possible to learn them reliably from a reasonable number of training samples.

Performance of CRF Smoothing
We observe significant performance improvements with CRF smoothing, both for linear contrast sensitive (LCS) or exponential contrast smoothing (ECS), see Figures 4 and 5.According to the signed rank test over all 16 trials (4 methods × 4 datasets) the differences are highly significant (p-value 0.0004).The best results overall are achieved with the combination SVM-ECS (Ikonos 86.4%, Kompsat-2 July 92.4%,QuickBird 90.8%, Kompsat-2 June 74.4%).In our tests the improvement in overall classification accuracy due to smoothing lies between 5% and 14%, with a mean of almost 10%.When comparing different smoothing functions, ECS dominates LCS in all runs except for two cases on the Ikonos dataset (Figure 5).Even though there is a systematic difference between LCS and ECS according to the signed rank test (p-value 0.002), the performance difference between the two is small (<2%), and perhaps not very relevant for our problem.Still, ECS also requires fewer user parameters (cf.Section 4.5).Overall, we may conclude, in line with other studies, that also for our task of crop classification it is vital to use smoothing, whereas the exact choice of the potential is less critical.

CRF vs. Parcel-Based Smoothing
The results in Figure 6 indicate that state-of-the-art parcel-based smoothing consistently gives the best results for all four data sets.This is not surprising given that strategy utilizes land parcel information in addition to image data, which induces ground-truth segmentation boundaries and greatly increases the redundancy per parcel.Again, the gains are consistent across different base classifiers and datasets and highly significant (p-value 0.0004).On average, SVM-ECS results are 2.5% lower in OA and 3.5% lower in Kappa.Although there clearly is a difference, it is relatively small compared to the amount of additional information used by the parcel-based method.Purely data-driven classification may therefore be a useful alternative for many practical applications: gathering parcel boundaries and keeping them up-to-date is rather time consuming, especially in operational applications, which typically cover much larger areas.It also requires experienced operator(s), and the digitized boundaries may (partially) require validation in the field.Furthermore, in valuable agricultural regions where multiple crops are cultivated in one season, parcel information must be updated several times per year.Considering these requirements, we believe that state-of-the-art methods without parcel boundaries, such as the proposed smoothing with graph cuts, are an excellent option for many tasks.In our experiments they achieve results almost on par with those of parcel-based classification, and absolute accuracies that should allow for their use in many practical applications (>85% accuracy for images acquired at the correct time of year).

Discussion of the Classification Results
In this section, we go through the confusion matrices, user's accuracies (UA) and producer's accuracies (PA) in more detail.Among the data-driven methods, we only discuss the most promising ones (cf.Section 4.2).Generally, among our crop types, confusions happen mostly between sugar beet, tomato, pasture, and corn.This can be explained by the similar spectral response characteristics of these crop types and the confusion of crop reflectance with bare soil during the early planting season of the crops (e.g., Kompsat-2 June image).Detailed comments for each image are provided in the following.
Figure 7 shows the outputs for the Ikonos image, and Table 1 shows the corresponding confusion matrix of SVM-ECS.The total number of reference pixels used for the classification is 1,840,093.The major source of error is confusion between pasture and corn.Of the total 257,703 pasture pixels, only 153,932 are correctly classified.A similar behavior is observed for sugar beet, in this case due to confusions with tomato, which is spectrally similar.Overall, the classifier seems to be biased towards large classes, leading to high PA (up to 97% and always above a remarkable 82%), but for some classes low UA (many false negatives).It may be possible to mitigate this behavior by additional prior knowledge about the expected (relative) frequency of different crop types.The classification outputs for the Kompsat-2 July image and the confusion matrix of SVM-ECS are shown in Figure 8 and Table 2, respectively.The total number of reference pixels is 901,107.The results are similar.Once again, a major confusion is observed between tomato and sugar beet due to their similar spectral responses.Also important is the confusion of corn with rice and wheat.PA is generally high (>84%), but rare classes tend to be swallowed and have low UA.The confusion matrix for the QuickBird test image is given in Table 3.The classification results are illustrated in Figure 9.The total number of reference pixels included is 4,811,904.Main sources of error are confusions between rice and wheat, and between sugar beet and tomato.In this case, the bias toward abundant classes is less severe (worst UA > 72%).PA nevertheless stays high (rice 74%, others > 88%).We note that for the corn class we get 95.7% PA and 95.5% UA, which is quite remarkable for a method using only four broad multispectral channels.Compared with the other three test images, the degree of confusion is higher for the Kompat-2 June image (Figure 10 and Table 4).As mentioned before, this is mainly due to the acquisition date shortly after the early planting period, causing distorted spectral responses of the different crop types.
As plants are at an early growth stage, they are harder to discriminate and they only partially cover the ground, such that bare soil is visible and disturbs the classification.Another more technical factor is the limited number of reference data for two classes, corn and tomato.Possibly the training set might be insufficient to learn these two classes.The total number of reference pixels for validation is 1,186,694.For the small classes (corn and tomato) classification essentially fails.Furthermore, there are important confusions between pasture and wheat.Still, even in this challenging setting the larger classes reach UA and PA of 85% or more, and a respectable 74% of all pixels are classified correctly.One can speculate that with an image taken only two weeks later, satisfactory results might be possible.Comparing the performances for Ikonos, QuickBird and Kompsat-2 July images, we achieved the best classification performance for the Kompsat-2 July dataset (OA: 93.32% and Kappa: 90.95%).Indeed, this is a result we expected, because the number of reference pixels for classes corn and tomato is lower in 2008 compared to 2004; therefore, confusions between these two classes have a smaller influence on the overall performance.

Sensitivity to Parameters
Like any classification pipeline ours requires a small number of parameters to be set by the user.To assess the robustness with respect to this user input we ran a parameter study.We limit ourselves to the main parameters, whereas values that would normally not be changed for different problems and datasets remain fixed, see Tables 5 and 6.For the GMM classifier, the main parameter is the number of Gaussians.Although a number of approaches exist for estimating the number of components, e.g., [66], our tests performed with different numbers of components reveal that this parameter can be set across a range of values (3)(4)(5) without large impact on the performance (Figure 11).We think that the reason is the binary tree quantization clustering [55] used to initialize the GMM parameters.For the remaining parameters, we use default values: independent full covariance matrices for each component, and a small regularization term (10−5) to ensure positive-definite covariance matrices.For the RF classifier, the number of trees is a user-defined parameter.Figure 12 shows that, except for an overly small number of trees (≤4), the performance is rather stable.Other parameters of the RF classifier are set to fixed defaults, since preliminary tests showed that they do not significantly affect the performance (<0.5%).

Overall Accuracy vs. Number of Trees
For the SVM, we prefer the radial basis function (RBF) kernel, which has only one additional parameter compared to the linear kernel, and is known to perform well across a variety of classification tasks (e.g., [47,67,68]).For the two parameters-the kernel width gamma and the cost for misclassified samples-we run a grid search (Figure 13).The results in Figure 13 confirm that different combinations of SVM parameters affect the overall accuracies at most 3% for Ikonos, Kompsat July and QuickBird test images, and the performance is rather consistent (within 2.5%) across a range of values.The largest differences in classification accuracy, up to 11%, are observed for the Kompsat June image.It is interesting to note that the trend is similar to the ones observed for Ikonos and Kompsat July, namely that the main failure mode is a too large kernel width.We conclude that it is easy to choose a parameter set that generalizes across different images, and set, Gamma = 0.5, Cost = 2100 for all our tests.Table 6 summarizes the parameters used for the CRF smoothing approaches.We fix the connectivity to the 8-neighborhood for both of the smoothing strategies; it has been repeatedly observed that it performs very similarly to the 4-neighborhood, but slightly reduces metrication artifacts along class boundaries.
We also fix the standard deviation of the gradient filter to the standard value of 0.5.The parameters that we vary are the truncation of the linear potential function (ϕ) and the smoothness weight (γ).
Our tests reveal that the best results are achieved when ϕ is set to 0, i.e., no penalty at intensity gradients above 50% of the observed maximum.Figure 14 shows how the overall accuracies of SVM-LCS change with respect to different truncation values.We observe that overall accuracies are all negatively affected by large ϕ values, especially with stronger smoothing.
We find that the performance across a range of smoothness weights γ is similar, both for LCS and ECS.However, we also observe that the performances drop slightly when over-smoothing occurs and this fact is mainly due to the relatively small-sized parcels.In our last experiment, we test the effect of training set size on classification accuracy.We randomly selected, at each run and for each class, a number of training samples (5, 10, 30, 50, 100, and 500 pixels) from the whole training set, and ran the classification without smoothing, with CRF smoothing, and with parcel-based smoothing.Figure 15 shows the classification performance as a function of training set size.While too-small training sets are obviously detrimental, all classifiers perform acceptably with 50 training samples per class, and with 100 training samples almost reach the performance achievable with the full training sets, which have >10,000 pixels.
An exception is the Kompsat-2 June image, where the classification performance does not seem to saturate, indicating that the class boundaries are indeed more complicated due to the early acquisition date.Also note that the results with smoothing follow the ones without smoothing-even when the performance of the unaries saturates, smoothing still helps, but weaker unaries learned from overly small training sets also benefit from smoothing.
Figure 15d shows how training set size affects the computation times during testing.As expected, runtimes of MLC and GMM are independent of the training set size, as the model complexity does not change with more samples; RF takes slightly longer with more training data, as the trees can grow deeper; and SVM is strongly affected, since the number of support vectors grows with the training set size.In the worst case (QuickBird) the processing time reaches ≈ 4 h.Note, all timings are indicative and depend on the specific implementation.In particular, multiclass SVMs as well as RF naturally lend themselves to parallelization.The time spent for smoothing is practically constant in each image and slightly longer for the QuickBird data, since with a fixed neighborhood definition it depends only on the number of pixels and the number of classes.

Conclusions
We have investigated the possibility of classifying agricultural crops at high spatial resolution, using only a single multispectral image as input.In an extensive evaluation, we find that (i) smoothing based on known parcel boundaries, as expected, delivers the result, (ii) purely data-driven smoothing comes close to the parcel-based approach and could in many applications be a valuable alternative, (iii) if the images are taken at the right time of year, i.e., after the crops have developed sufficient size and coverage, standard classifiers achieve satisfactory classification accuracy, using only the R, G, B and NIR bands.
The four different tested classifiers deliver rather similar results.SVM provides the best performance in all cases, but at the cost of longer processing times.Our tests also show that CRF smoothing increases the overall classification performance by about 10%.The results indicate that ECS smoothing slightly (≈2%) outperforms LCS smoothing.Notably, the difference between parcel-based smoothing and CRF smoothing is rather small (≈2.5%).Considering the effort to obtain up-to-date parcel boundary information, CRF smoothing seems to be an interesting alternative unless reliable parcel boundaries are already available.
A further finding is that moderate training sets sizes of about 100 pixels per class are sufficient, in spite of the rather coarse spectral resolution.Note however that it might in practice be easier to collect larger training sets, in order to make sure that the samples are representative of the full class distribution.
This study also highlights the importance of image acquisition dates based on crop phenology.For most crop types, it is better to be in their highest ripening period during image acquisition.Such a setup reduces the visibility of bare soil and guarantees a high coverage of the parcels with crops.In this study, we observe that in Mediterranean climate images taken in June provide poor classification results compared to those acquired in July and August.The spectral confusions between (early) crops and bare soil are most apparent in June, and are a major source of classification errors.However, we should also stress that there may be large variations of crop phenology among different crop types for different geographic locations and times.There might be crop combinations for which no single best acquisition period exists; or the ideal time window might be very short or fall into an unfavorable weather period, such that it becomes a matter of luck to obtain cloud-free VHR images.In such circumstances one might have to revert to multitemporal datasets [8,52] and/or other sensor types such as SAR.
Considering the individual crop types used in our study, confusions are principally observed between pasture/corn, rice/sugar beet, tomato/sugar beet, and corn/sugar beet.We explain this fact mainly by similar spectral characteristics of these classes under the limited spectral resolution (B, G, R, NIR) of the images.This, of course, can be mitigated to some extent by collecting more training samples or by integrating additional features and color spaces [69] into the classification framework.Currently, we have not tested the impact of additional information, but we plan to do so in the future.
On a more technical note, we believe that high-throughput classification is possible with the tested methods, but this would require parallel implementations, probably using GPUs.The obvious starting point for an efficient implementation is the pixel-wise computation of the unaries, but other multilabel energy minimization methods like FastPD [70] could potentially also bring down computation times.

Figure 1 .
Figure 1.(a) Study area, (b) Ikonos image (RGB) overlaid with the parcel boundaries (in yellow color).(c) and (d) Histogram showing the distribution of agricultural fields with respect to their sizes (ha) in years 2004 and 2008, respectively.

Figure 3 .
Figure 3. Processing time of each method.

Figure 11 .
Figure 11.Overall accuracy vs. number of components of GMM.(The results of Kompsat-2 June image are presented on the right-y axis to improve visibility).

Figure 12 .
Figure 12.Overall accuracy vs. number of trees of RF classifier.(The results of Kompsat-2 June image are presented on the right y axis to improve visibility).

Figure 13 .
Figure 13.Sensitivity of SVM parameters with respect to overall accuracy.Please note the different scales of the color bars.
bands lie between 0.45-0.52μm to blue, 0.51-0.60μm to green, 0.63-0.70μm to red, and 0.76-0.85μm to near-infrared channel.The processing level of the Ikonos image is standard, i.e., geometrical and radiometric distortions of the sensor have been corrected, and the images have been geo-coded without ground control points (GCPs).The QuickBird MS image has three visible bands (blue, green, and red) and one NIR band with 2.4-meter spatial resolution.Spectral intervals of the QuickBird MS image are 0.45-0.52μm,0.52-0.60μm,0.63-0.69μm,0.76-0.90μmforblue,green, red, and NIR band, respectively.The QuickBird standard imagery was radiometrically corrected; sensor and geometric corrections were also applied.Dataset II: The second dataset are composed of two Kompsat-2 images collected in 2008 (13June, (early-season) and 11 July (mid-season)).Akin to the Ikonos image, the Kompsat-2 MS images have three visible and one infrared image channel with 4-meter spatial resolution.The spectral characteristics of Kompsat-2 MS image are analogous to the characteristics of QuickBird image.The processing level of the Kompsat-2 image is Level-2A, which represents similar properties compared with the processing level of Ikonos data.
PercentageHectaresranges of these

Table 1 .
Confusion matrix of SVM-ECS classification for Ikonos test image.

Table 2 .
Confusion matrix of SVM-ECS classification for Kompsat-2 July test image.

Table 3 .
Confusion matrix of SVM-ECS classification for QuickBird test image.

Table 4 .
Confusion matrix of SVM-ECS classification for Kompsat-2 June test image.

Table 5 .
Parameter test settings of each classifier without smoothing.

Table 6 .
Parameter settings of each classification with smoothing.