Synergistic Use of Radar Sentinel-1 and Optical Sentinel-2 Imagery for Crop Mapping: A Case Study for Belgium

A timely inventory of agricultural areas and crop types is an essential requirement for ensuring global food security and allowing early crop monitoring practices. Satellite remote sensing has proven to be an increasingly reliable tool to identify crop types. With the Copernicus program and its Sentinel satellites, a growing source of satellite remote sensing data is publicly available at no charge. Here, we used joint Sentinel-1 radar and Sentinel-2 optical imagery to create a crop map for Belgium. To ensure homogenous radar and optical inputs across the country, Sentinel-1 12-day backscatter mosaics were created after incidence angle normalization, and Sentinel-2 normalized difference vegetation index (NDVI) images were smoothed to yield 10-day cloud-free mosaics. An optimized random forest classifier predicted the eight crop types with a maximum accuracy of 82% and a kappa coefficient of 0.77. We found that a combination of radar and optical imagery always outperformed a classification based on single-sensor inputs, and that classification performance increased throughout the season until July, when differences between crop types were largest. Furthermore, we showed that the concept of classification confidence derived from the random forest classifier provided insight into the reliability of the predicted class for each pixel, clearly showing that parcel borders have a lower classification confidence. We concluded that the synergistic use of radar and optical data for crop classification led to richer information, increasing classification accuracies compared to optical-only classification. Further work should focus on object-level classification and crop monitoring to exploit the rich potential of combined radar and optical observations.


Introduction
One of the requirements for ensuring food security is a timely inventory of agricultural areas and the regional proportion of different crop types [1]. Next to the public sector, the private sector, including the agro- and insurance industries, benefits as well from early season crop inventories as an important component of crop production estimation and agricultural statistics [2,3]. In addition to regional estimates, early crop type information at the parcel level is also an essential prerequisite for any crop monitoring activity that aims at early anomaly detection. Satellite remote sensing has proven to be an invaluable tool for accurate crop mapping in regions across the world [4][5][6][7].
The most conventional way of performing satellite-based crop classifications is the use of optical imagery. This began with the Landsat missions [8][9][10]. Gradually, the spatial, temporal, and spectral resolutions have increased, constantly improving classification results [11][12][13][14].
In June 2015, the first Sentinel-2 satellite was launched as part of the Copernicus program [15]. The mission consists of a constellation of two identical satellites, Sentinel-2A and Sentinel-2B. The revisit frequency of one satellite is 10 days, resulting in a revisit frequency of 5 days for the constellation. Considering overlap between adjacent orbits, the coverage frequency in Europe is often even below 5 days. Sentinel-2 carries a MultiSpectral Instrument (MSI) with 13 bands in the visible, near-infrared, and shortwave infrared parts of the spectrum. These state-of-the-art specifications have been used in recent research on crop classification [16][17][18].
However, clouds remain a major drawback for optical sensors, since they cannot be penetrated by optical radiation. Clouds and cloud shadows therefore lead to gaps in optical imagery and missing data in optical time series [19]. For classification and monitoring purposes, this drawback significantly affects performance [6,20].
Among possible strategies to overcome the disadvantages of clouds and shadows are multisensor combinations, especially strategies that exploit different parts of the electromagnetic spectrum [21]. Microwave radiation in the C-band, for example, has wavelengths well above the typical size of cloud particles and can therefore penetrate clouds mostly unaltered [22]. Satellite sensors exploiting this part of the spectrum need to send out their own energy pulse and subsequently measure the reflected energy. A synthetic aperture radar (SAR) is such a system, which uses the motion of the instrument to attain a satisfactory ground resolution [22].
Although the potential of SAR observations from space has been demonstrated extensively, their availability in the past was restricted to specific campaigns [23,24]. However, with the advent of the Sentinel-1 mission, SAR applications have become more widespread, with almost global data availability at no charge and at regular time intervals [25,26]. The revisit frequency is 6 days now that both the Sentinel-1A and Sentinel-1B satellites are operational. Considering overlap and combining ascending and descending orbits, this leads to a Sentinel-1 observation every ~2 days in Europe.
Contrary to optical imagery, which provides insight into biophysical processes of vegetation, SAR imagery reflects the structural components and moisture content of vegetation and the underlying surface, resulting in a complementary data source that can be used for classification purposes [27]. Figure 1 shows Sentinel-1 radar backscatter profiles in addition to Sentinel-2 normalized difference vegetation index (NDVI) observations for a winter wheat field in Belgium. The radar response to the structural development of the wheat plants is clear, with, for example, a pronounced decrease in VV backscatter during the vertical stem elongation phase.
The rich information on plant structure that is present in SAR backscatter amplitudes has been used for classification purposes in previous studies, in particular for rice and forest mapping [28][29][30], but more recently also for broader crop type mapping [31].
The complementarity of optical and radar information allows the development of multisensor classification procedures that exploit both sources simultaneously [21]. This has led to numerous successes in integrated SAR/optical land use classifications [32][33][34][35][36]. In addition to these broad land use classes, previous studies have also shown how radar and optical imagery can be jointly used for crop type identification. Blaes et al. [37] reported at least a 5% accuracy gain for crop type discrimination when adding ERS and RADARSAT radar imagery on top of optical SPOT and Landsat images. McNairn et al. [38] successfully used several RADARSAT-1, Envisat ASAR, SPOT-4/5, and Landsat-5 images for annual crop inventories, while Soria-Ruiz et al. [39] used RADARSAT-1 and Landsat and reported higher accuracies for land use classifications in cloudy regions in Mexico, considering a few major crop classes. Based on combined Sentinel-1 and Landsat-8 data, Inglada et al. [40] specifically quantified the added value of joint radar/optical imagery for early crop type identification, pointing towards the future potential of integrating Sentinel-2 imagery. Zhou et al. [41] also combined Sentinel-1 and Landsat-8 but focused on the detection of winter wheat in China. Recently, Sentinel-1 was also shown to improve the detection of irrigated crops in India, especially during the monsoon season [42]. For the agricultural landscape in the United States, recent work by Torbick et al. [43] demonstrated within-season crop inventories based on near real-time (NRT) Sentinel-1, Sentinel-2, and Landsat-8 imagery. However, despite these efforts to combine optical and radar imagery for crop type identification, few studies to date (i) combine dense time series of Sentinel-1 and Sentinel-2, (ii) classify a broad range of crop types at the country scale, and (iii) provide insight into classification confidence and performance along the growing season.
The aim of this study was therefore to address these shortcomings and to provide further evidence of the added value of joint radar and optical imagery for crop type identification. We used dense time series of Sentinel-1 backscatter intensity and Sentinel-2 NDVI during the growing season of 2017 over Belgium to address the following research questions:


(i) What is the pixel-level classification accuracy as a function of the input source (optical, radar, or a combination of both)?
(ii) What is the evolution of classification accuracy during the growing season, and how can this be framed in terms of crop phenology?
(iii) What is the individual importance of each input predictor for the classification task?
(iv) What is the classification confidence, and how can this information be used to assess the accuracy?
We first describe the relevant data sources used and the preprocessing routines for Sentinel-1 and Sentinel-2 imagery. Next, a hierarchical classification methodology is discussed in view of different classification schemes. Finally, the results are discussed in light of crop type differences across the growing season.

Field Data
The Integrated Administration and Control System (IACS) is the most important system for the management and control of payments to farmers made by the Member States in application of the European Common Agricultural Policy. The Land Parcel Information System (LPIS) is the spatial database that permits spatial and alphanumeric queries and data retrieval for the application of administrative cross-checks. In addition to information on the area of each field, the database contains information on the cultivated crops, which farmers have to report annually to the government; extracts from the database can be used for different applications [44]. In Belgium, this is arranged at the regional level, resulting in two similar datasets for the Flemish and Walloon regions. From these datasets, the crop cover per agricultural parcel was extracted.
Belgium's utilized agricultural land covers 1,341,487 ha, out of which 30% is grassland, 2% permanent crops, and 68% arable land.In order of importance, the main arable crops are winter wheat, maize, winter barley, potatoes, sugar beet, and rapeseed.

Sentinel-1 Data
The Sentinel-1 data originated from the Level-1 Ground Range Detected (GRD) Interferometric Wide Swath (IW) product as ingested in Google Earth Engine (GEE) [45]. This GRD product consists of Sentinel-1 radar observations projected onto a regular 10-m grid. The data were preprocessed by GEE using the Sentinel-1 Toolbox; preprocessing included thermal noise removal, radiometric calibration, and terrain correction. The primary features recorded by Sentinel-1 over Belgium were the VV and VH polarized backscatter values (in decibels, dB). Backscatter is strongly influenced by the orientation of the emitted radar beam. Due to the significantly different viewing orientation between the ascending and descending satellite overpasses, these were separated and considered as complementary observations, comparable to the approach by Inglada et al. [40]. We applied a refined Lee filter [46] to reduce radar speckle in the images, using a damping factor of 1 and a kernel size of 7 × 7; the number of looks was set to 25.
To reduce the effects of incidence angle variations on the backscatter values, we performed two procedures. First, all observations recorded under an incidence angle below 32° or above 42° were disregarded, since these viewing geometries differ too much from the mean incidence angle of 37° in our region. Next, a square cosine correction was applied, as outlined in [47]. All remaining backscatter values observed under an incidence angle θ were rescaled to the backscatter values that would be observed under a reference angle θ_ref, based on Equation (1):

σ⁰(θ_ref) = σ⁰(θ) · cos²(θ_ref) / cos²(θ)    (1)
In this equation, σ⁰(θ) is the backscatter intensity under the observed incidence angle θ, and σ⁰(θ_ref) is the estimated backscatter intensity under the reference angle θ_ref of 37°. This simplified correction is based on Lambert's law of optics and includes assumptions about the scattering mechanisms. Lambert's law states that the amount of diffuse light reflected from an ideal Lambertian reflector is equal in all directions and proportional to the cosine of the incidence angle of the radiation source. The earth's surface, however, is not a homogeneous ideal Lambertian reflector. For our classification purpose, this incidence angle correction was sufficient, as it removed the strongest dependencies of the backscatter magnitudes on the observed incidence angles.

Finally, we created 12-day backscatter mosaics from the combined Sentinel-1A and Sentinel-1B observations. Given the revisit frequency of 12 days for each satellite, at least two observations were recorded within this timeframe. Our analyses showed that, after masking extreme incidence angles, between two and five observations were recorded for each location within the 12-day window. For each pixel, all recorded backscatter values within the 12-day window were converted from dB to linear values, averaged, and converted back to dB to yield the 12-day mosaic backscatter values. This two-way conversion was necessary since a direct calculation of the mean backscatter signal on the dB values would be mathematically incorrect. The averaging procedure was carried out for two reasons: it served as a simple multitemporal filtering approach, removing any remaining artefacts resulting from speckle or varying viewing geometries, and it ensured an equal number of features for each location in the study area. An RGB composite of three processed 12-day mosaics is shown in Figure 2. The backscatter mosaics were projected to the Belgian EPSG:31370 grid with a spatial resolution of 10 m.
This processing workflow was conducted on all available Sentinel-1 imagery between 1 March 2017 and 16 August 2017, resulting in a total of 60 ascending and descending VV and VH 12-day mosaics used as input features in the classification procedure.
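The incidence angle normalization of Equation (1) and the dB-domain averaging can be sketched as follows (a minimal NumPy illustration, not the operational GEE workflow; the function names are ours):

```python
import numpy as np

def normalize_incidence(sigma0_db, theta_deg, theta_ref_deg=37.0):
    """Square cosine correction (Equation (1)): rescale backscatter observed
    under incidence angle theta to the reference angle of 37 degrees."""
    sigma0_lin = 10.0 ** (np.asarray(sigma0_db, float) / 10.0)  # dB -> linear power
    ratio = np.cos(np.radians(theta_ref_deg)) ** 2 / np.cos(np.radians(theta_deg)) ** 2
    return 10.0 * np.log10(sigma0_lin * ratio)                  # back to dB

def mosaic_mean_db(stack_db):
    """Average backscatter for a 12-day mosaic: the mean must be taken on
    linear power values, not directly on the dB values."""
    lin = 10.0 ** (np.asarray(stack_db, float) / 10.0)
    return 10.0 * np.log10(lin.mean(axis=0))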


Sentinel-2 Data
The Sentinel-2 Level-1C data as retrieved from the Copernicus Open Access Hub were first masked for clouds and cloud shadows by the Sen2Cor algorithm [48] and subsequently atmospherically corrected using the iCOR atmospheric correction scheme [49]. For poorly coregistered Sentinel-2 scenes (i.e., with a multitemporal coregistration error of >0.5 pixels), an additional geometric correction was applied based on manually identified ground control points.

We used the NDVI calculated from the red (B4) and near-infrared (B8) bands, both at a spatial resolution of 10 m. The choice of NDVI as a typical optical descriptor was based on its numerous successes in previous crop classification studies [4,50,51]. While NDVI is known to sometimes saturate during the most productive stages of the growing season [52], using this index in time series instead of single-date imagery has been shown to resolve this shortcoming, since the classifier is then able to consider the full growing curve instead of one potentially saturated value during peak development of the vegetation [50,51,53]. These NDVI values served as input to the classification procedure.

However, frequent cloud cover imposes a significant challenge for crop classification of a large area covering several image tiles. Cloud obstruction was removed using an updated version of a pixel-wise weighted least-squares smoothing of the NDVI values over time [54][55][56]. The smoothed NDVI images were produced at 10-day intervals between 1 March 2017 and 31 August 2017, resulting in a total of 18 input features for the classification. The final images were projected to the Belgian EPSG:31370 grid with a spatial resolution of 10 m, similar to the Sentinel-1 images. An RGB composite of three smoothed NDVI images is shown in Figure 3.
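The NDVI computation and the idea of filling cloud-induced gaps in the per-pixel temporal profile can be illustrated as follows (a simplified sketch: plain linear interpolation stands in here for the weighted least-squares smoother of [54][55][56]):

```python
import numpy as np

def ndvi(red, nir):
    """NDVI from the Sentinel-2 red (B4) and near-infrared (B8) reflectances."""
    red, nir = np.asarray(red, float), np.asarray(nir, float)
    return (nir - red) / np.maximum(nir + red, 1e-9)  # guard against division by zero

def fill_cloud_gaps(series, valid):
    """Fill cloud-masked dates of a per-pixel NDVI time series by linear
    interpolation in time (stand-in for the weighted least-squares smoother)."""
    series = np.asarray(series, float)
    t = np.arange(series.size)
    return np.interp(t, t[valid], series[valid])
```

In the actual processing chain, the smoother is applied per pixel across all acquisition dates before resampling to the 10-day composites.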


Hierarchical Random Forest Classification
The classification was performed by a random forest (RF) classifier, a nonparametric, decision-tree-based machine learning algorithm [57]. RF was chosen over other supervised classification models, such as neural networks [58], since it is known to perform particularly well and efficiently on large input datasets with many different features [59], and previous research has used this classifier for mapping purposes with great success [43,[60][61][62][63]. The RF algorithm belongs to the ensemble learning methods, in which multiple decision trees (forming a random forest) are built during training, after which the mode of the predicted classes of the individual trees forms the output class of the forest. The RF classifier usually outperforms single decision trees because it is less prone to overfitting [64]. The random forest is constructed using a bootstrapping technique in which each tree is fitted on a random subset of training samples drawn with replacement, while at each split in a tree, a random subset of the input features is also considered [59].
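As a minimal sketch (on synthetic data; the 78 features per pixel follow from the 60 Sentinel-1 mosaics plus 18 NDVI composites described above), fitting such a forest with scikit-learn looks like:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(42)
n_pixels, n_features = 500, 78                 # 60 S1 + 18 S2 features per pixel
X = rng.normal(size=(n_pixels, n_features))    # synthetic stacked time-series features
y = rng.integers(0, 8, size=n_pixels)          # eight crop-type labels

# Each tree is fitted on a bootstrap sample of the pixels; each split considers
# a random subset of features; the forest predicts by majority vote of the trees.
clf = RandomForestClassifier(n_estimators=100, bootstrap=True, random_state=0)
clf.fit(X, y)
labels = clf.predict(X)
```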
A two-step hierarchical classification procedure was followed, in which a first step distinguished between a broad crop class and forest, water, and built-up classes. In a second step, the crop class was further classified into the different crop types, including grassland. This hierarchical procedure is schematically outlined in Figure 4.
The optimal random forest parameters were determined using a grid-search approach [59]. These parameters included the number of trees, the impurity criterion, the minimum number of samples required to split a node, the minimum number of samples required at a leaf node, and the maximum number of features to consider when looking for the best split. Classification of a sample resulted from the majority vote of the class predictions across the trees in the forest.
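A hedged sketch of such a grid search with scikit-learn, covering the parameters named above (the actual grids used in the study are not reported, so the values here are illustrative, and the data are synthetic):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))      # toy data standing in for the pixel features
y = rng.integers(0, 3, size=200)

param_grid = {
    "n_estimators": [50, 100],          # number of trees
    "criterion": ["gini", "entropy"],   # impurity criterion
    "min_samples_split": [2, 10],       # min samples required to split a node
    "min_samples_leaf": [1, 5],         # min samples required at a leaf node
    "max_features": ["sqrt", None],     # features considered at each split
}
search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)
best_params = search.best_params_
```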

Calibration and Validation Data
The classification procedure was trained on the Flemish LPIS dataset, operated by the Flemish Authority of Agriculture (cf. Section 2.1.1). To ensure independent calibration and validation sets, the available parcels in the ground truth database were randomly subdivided into a 10% calibration set and a 90% validation set. This threshold was chosen such that the final number of training points was not too large to train the classifier efficiently. The training and validation sets for the forest, built-up, and water classes, which are not part of the LPIS dataset, had to be generated manually; these followed an 80-20 split in order to have sufficient representation of these classes in the training set.

The calibration parcels were subsequently restricted to parcels above 2.5 ha, which were expected to yield purer spectral signatures compared to smaller fields. This area threshold was not applied to the validation parcels, so that classification accuracy could be assessed on all parcel sizes. For a similar reason, the remaining calibration parcels were buffered 30 m inward to avoid field border effects that could contaminate the spectral signatures during training; this buffer was not applied to the validation set. Finally, both the calibration and validation parcels were rasterized to the 10-m input imagery grid, meaning that calibration, validation, and ultimately classification were pixel-based in this study. Due to the different field sizes and the exclusion of small fields in the training set, the proportion of calibration to validation parcels differed from the proportion in pixels, with a much larger number of independent validation samples (Table 1).
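The splitting and filtering logic can be sketched on synthetic parcel records (the areas are made up; the 30-m inward buffering of calibration parcels is a vector-geometry operation omitted here):

```python
import numpy as np

rng = np.random.default_rng(0)
n_parcels = 1000
areas_ha = rng.gamma(shape=2.0, scale=2.0, size=n_parcels)  # synthetic parcel areas

idx = rng.permutation(n_parcels)
n_cal = n_parcels // 10
cal_idx, val_idx = idx[:n_cal], idx[n_cal:]   # 10% calibration, 90% validation

# The 2.5 ha area filter is applied to the calibration set only, so that
# validation still covers all parcel sizes.
cal_idx = cal_idx[areas_ha[cal_idx] > 2.5]
```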


Classification Schemes
A major aim of this study was to quantify the contribution of individual and combined optical and SAR imagery to the classification accuracy. In addition, an objective insight into the evolution of the classification accuracy during the growing season would yield useful information with regard to the expected accuracy of a classification at a certain period in the growing season.
We specified 18 classification schemes (Table 2). In the first six schemes, only Sentinel-1 SAR imagery was used with increasing frequency from only one month (March 2017) to the entire period (March-August 2017). In schemes 7-12, a similar approach was followed for Sentinel-2 NDVI images, while the last six schemes followed the same procedure for combined Sentinel-1 and Sentinel-2 imagery.
For each classification scheme, the overall accuracy (OA) and Cohen's kappa coefficient of agreement (κ) [65] were calculated as the classifier performance estimators. The OA was defined as

OA = (number of correctly classified samples) / (total number of validation samples),

where predictions were calculated for all available validation samples and the predicted labels were compared to the true labels. Similarly, κ was defined as

κ = (p₀ − p_e) / (1 − p_e),

in which p₀ is equal to the OA, while p_e is related to the agreement expected by chance. A perfect agreement leads to κ = 1, while no agreement other than what would be expected by chance leads to κ = 0. For the calculation of OA and κ reported in this study, all validation samples in the study area were considered (including forest, water, and built-up) to avoid any uncertainty biases following the two-step hierarchical classification procedure. In practice, this means that all crop type accuracy estimates implicitly include the uncertainty of the first hierarchical classification step as well.
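Both metrics follow directly from the confusion matrix; a small sketch (the 2 × 2 matrix is made up for illustration):

```python
import numpy as np

def overall_accuracy(cm):
    """OA: fraction of validation samples whose predicted label matches the true label."""
    cm = np.asarray(cm, float)
    return np.trace(cm) / cm.sum()

def cohens_kappa(cm):
    """Cohen's kappa: observed agreement corrected for the agreement expected by chance."""
    cm = np.asarray(cm, float)
    n = cm.sum()
    p0 = np.trace(cm) / n                                  # observed agreement (= OA)
    pe = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2  # chance agreement
    return (p0 - pe) / (1.0 - pe)

cm = np.array([[50, 10],
               [5, 35]])   # rows: true labels, columns: predicted labels
oa, kappa = overall_accuracy(cm), cohens_kappa(cm)
```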

Classification Confidence
In addition to the classification itself, the random forest classifier was used to provide insight into the expected classification confidence. The confidence was estimated from the predicted class probabilities [66], which reflect the convergence of the classifier for each sample. In our setup of the random forest, the predicted class probabilities of an input sample were computed as the mean predicted class probabilities of the trees in the forest, which naturally sum to unity. We defined the classification confidence of a sample as the class probability of the winning class. A high confidence reflects strong agreement across the different trees and a more reliable classification, while a low confidence reflects disagreement between the trees and likely a less reliable classification. In addition to a general accuracy assessment of the produced classification, this analysis should allow a much more specific assessment of classification confidence at the sample or pixel level.
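With scikit-learn, this definition of confidence corresponds to the maximum of the mean per-tree class probabilities (a toy example with separable synthetic data):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 0] > 0).astype(int)   # two separable toy classes

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
proba = clf.predict_proba(X)    # mean class probabilities over the trees (rows sum to 1)
confidence = proba.max(axis=1)  # class probability of the winning class
```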

Classification Accuracy
Table 2 shows the evolution of OA and kappa for the different classification schemes. As expected, OA and kappa increased gradually with the number of input images during the season. This is explained by the increased discriminating power of the classifier as more information was added and differences between the crop types became more apparent [38].
The classification result using Sentinel-1 in March was significantly better than using Sentinel-2 in March (OA of 47% vs. 39%, kappa of 0.31 vs. 0.22), while optical-only classification across the whole season performed slightly better than SAR-only classification (OA of 78% vs. 76%, kappa of 0.70 vs. 0.68). This suggests that different crop types had comparable optical spectral features in their early stages, while their structure, reflected in the backscatter signatures, differed more. However, this was only the case for crop types such as winter cereals that were already on the field during this period. For crop types such as maize and potato that were planted in April-May, optical and radar signatures in March reflected the winter cover crop and management practices, and a direct link between these signatures and the classification performance for the main crop was not straightforward.
For each month, a classification based on the cumulative set of both Sentinel-1 and Sentinel-2 images performed better than one based on a single sensor (Table 2). By the end of July, a maximum accuracy of 82% (kappa of 0.77) was reached, which could not be improved by adding the August imagery.
Figure 5 shows the final classification result by the end of August 2017 based on the combination of Sentinel-1 and -2 inputs. The detailed inset demonstrates a surprisingly smooth result of the classified parcels in a highly diverse arable landscape, given that no prior segmentation or post-classification filtering was applied.
The results (Table 2) were also apparent in Figure 6, where for the center field, which is part of the validation dataset, certain within-field zones were wrongly classified as beet in the early mapping stage (taken here as June 2017), while these same zones were correctly identified as potato by the end of August.

Figure 6. The combined Sentinel-1 and -2 classification result of June (upper panel) shows more within-field variability than the classification result of August (lower panel). In this particular example, the zones in the center field were wrongly classified as beet in June but correctly classified as potato in August.

This is further illustrated by the confusion matrices (Tables 3 and 4). Early mapping in June led to an OA of 75% (corresponding to a kappa coefficient of 0.66), with typical confusion occurring between winter cereals and grassland on the one hand and potato, beet, and to a lesser extent maize on the other hand. Confusion between the "other crop" class, which contained the less frequent crop types, and the larger crop classes was also considerable. By the end of August, OA had increased to 82.5% (corresponding to a kappa coefficient of 0.76), and all confusion values had decreased considerably. The "other crop" class remained challenging due to the high intraclass variability of this mix of different crop types.

While training was limited to the Flanders region, classification was applied to the whole of Belgium. Since crop types and management practices can differ considerably between regions in Belgium, an independent validation was carried out using the Walloon LPIS dataset, which was not used during training, resulting in an OA of 82% for early mapping in June and 86% for mapping in August. These values were slightly higher than in Flanders, mostly because fields are larger and more homogeneous in the Walloon region.

Classification Confidence
The classification confidence map is shown in Figure 7, and it is clear from the detailed inset that the confidence was lowest near the edges of the parcels. Border pixels typically showed contaminated spectral features due to mixing with surrounding land cover signals. This is further illustrated in Figure 8, where the borders near the center parcel were classified differently, but the associated confidence map shows that these border results had a high uncertainty. The true crop of this parcel was sugar beet, as correctly identified in the middle of the parcel with high confidence. Some regions in the centers of parcels also had a somewhat lower confidence level, which could be due to field heterogeneity.

We investigated to which degree this classification confidence could be used to assess the classification performance by quantifying the mean classification accuracy as a function of the classification confidence (Figure 9). This was calculated by binning all validation samples according to their classification confidence as reported by the RF classifier and calculating the mean classification accuracy for all validation samples within a particular confidence bin. The strongly positive correlation between the two indicated that the produced confidence estimates could be used to assess the expected classification accuracy for new samples not part of the training and validation sets. However, this confidence is not equal to a statistical probability and should not be used as such.
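The binning procedure described above can be sketched as follows, assuming a scikit-learn-style random forest where the per-sample confidence is the maximum class probability; the synthetic data and feature layout are illustrative stand-ins for the real mosaics, not the study's dataset:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the per-pixel predictor stack (each column would be
# one Sentinel-1 backscatter or Sentinel-2 NDVI mosaic).
X, y = make_classification(n_samples=4000, n_features=18, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)

proba = rf.predict_proba(X_val)
confidence = proba.max(axis=1)              # per-sample classification confidence
pred = rf.classes_[proba.argmax(axis=1)]    # equals rf.predict(X_val)
correct = pred == y_val

# Bin validation samples by confidence; report mean accuracy per bin.
edges = np.arange(0.0, 1.01, 0.1)
bin_idx = np.clip(np.digitize(confidence, edges) - 1, 0, len(edges) - 2)
for b in range(len(edges) - 1):
    mask = bin_idx == b
    if mask.any():
        print(f"confidence {edges[b]:.1f}-{edges[b+1]:.1f}: "
              f"accuracy {correct[mask].mean():.2f} (n={mask.sum()})")
```

Note that the confidence of the winning class is bounded below by 1/(number of classes), so the lowest bins are empty by construction.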

Table 3. Confusion matrix for early mapping in June 2017 using Sentinel-1 and -2 inputs. Small differences in "total true" and "total estimated" between Tables 3 and 4 are due to rounding.

Table 4. Confusion matrix for final mapping in August 2017 using Sentinel-1 and -2 inputs. Small differences in "total true" and "total estimated" between Tables 3 and 4 are due to rounding.

Feature Importance
The large number of input predictors from both radar and optical sources required a detailed investigation of their individual importance for the eventual classification result. Figure 10 shows the Gini importance of all input features [67]. In the random forest implementation used in this study, the Gini importance of a feature is defined as the total decrease in node impurity (Gini impurity) contributed by that feature, averaged over all trees in the forest. A large Gini importance means that a specific predictor played an important role in the classification, while a low importance means only limited added value of that predictor.

From Figure 10a, it appeared that, with regard to Sentinel-1, very limited information was available in predictors before May, which is in agreement with the phenology of the considered crops, which show large similarities early in the season. Throughout subsequent months, varying degrees of importance occurred without any clear pattern with regard to timing or polarization channel (VV/VH). The importance of the NDVI predictors of Sentinel-2 behaved quite differently (Figure 10b), with two distinct periods of higher feature importance. The first period was situated around April and May, coinciding with the period of rapid stem elongation of winter cereals, in addition to the emergence of some of the summer crop types. As the start of the growing season generally leads to an exponential increase in productivity and hence NDVI values, differences between the phenology of the different crop types led to increased separability based on NDVI in this period of the year. The second period was situated around July and August, typically the months in which winter cereals are harvested. In addition, this is the period during which the phenological differences between summer crops are largest, again leading to increased separability based on NDVI. The 25 most important features are shown in Figure 10c. The five most important predictors were based on NDVI observations, demonstrating the importance of optical imagery during key periods of the growing season. Sentinel-1 backscatter predictors also appear frequently in this list of most important features, demonstrating again the added value of radar information in the classification procedure.
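The Gini importance defined above is, for example, what scikit-learn's RandomForestClassifier exposes as `feature_importances_` (mean impurity decrease per feature, normalized over the forest). A minimal sketch on synthetic data, with illustrative feature names standing in for the dated mosaic predictors:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: each column plays the role of one dated mosaic predictor.
X, y = make_classification(n_samples=2000, n_features=18, n_informative=8,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
feature_names = [f"predictor_{i:02d}" for i in range(18)]  # hypothetical names

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# feature_importances_ is the decrease in Gini impurity attributed to each
# feature, averaged over all trees and normalized to sum to 1.
importances = rf.feature_importances_
top5 = sorted(zip(feature_names, importances), key=lambda t: -t[1])[:5]
for name, imp in top5:
    print(f"{name}: {imp:.3f}")
```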

Discussion
Crop classification requires uniform inputs across the whole region. For Sentinel-1, this was accomplished by creating 12-day backscatter mosaics in which special care was taken to reduce incidence angle effects. An enhanced Lee filter was used to clean the radar images, although radar speckle unavoidably remained part of the signal. However, by considering 12-day backscatter mosaics and using time series as input features instead of single-date images, the impact of radar speckle could be minimized. The use of 12-day backscatter mosaics could somewhat limit the methodology for near-real-time (NRT) applications, but this is not an issue in classification tasks that are not time critical, such as in this study. For NRT applications, a multitemporal filtering approach such as that proposed by Quegan et al. [68] could be considered as well, although this was not part of the present study.
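For illustration, a basic (non-enhanced) Lee filter can be sketched as below; the study used the enhanced variant, so this is a simplified relative, not the exact filter applied, and the window size and number of looks are assumptions:

```python
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img, size=7, looks=4.9):
    """Basic Lee speckle filter (simplified sketch; the paper uses the
    enhanced Lee variant).

    Each pixel is pulled toward its local mean by an adaptive gain that is
    small where the local variance matches what the multiplicative speckle
    model predicts, and close to 1 where real scene texture dominates.
    """
    mean = uniform_filter(img, size)
    sq_mean = uniform_filter(img ** 2, size)
    var = np.maximum(sq_mean - mean ** 2, 0.0)

    noise_var = (mean ** 2) / looks                 # speckle: Cv^2 = 1/looks
    k = var / np.maximum(var + noise_var, 1e-12)    # adaptive gain in [0, 1)
    return mean + k * (img - mean)
```

On a homogeneous area the filter returns the local mean (maximum smoothing); on edges and textured fields it leaves the pixel values nearly untouched.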
For Sentinel-2, uniformity was realized by smoothing all cloud-free NDVI observations at the pixel level and creating 10-daily cloud-free NDVI mosaics. These time series inputs were used in a hierarchical random forest classification to yield a crop map for the whole of Belgium. The hierarchical two-step procedure resulted in a 1.5% OA gain over a nonhierarchical approach in which all classes were predicted in one step. This indicates that performing a first classification step to distinguish a broad crop class from the other classes was helpful in the classification procedure, albeit with a small accuracy gain. Indeed, the OA of this first classification step at the end of August was 98%, owing to the distinct radar and optical signatures of the three noncrop classes (forest, built-up, and water) against one major crop class with more similar radar and optical signatures. The second classification step then allowed a specific training of the random forest classifier to detect differences between the crop types.
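The two-step hierarchical procedure can be sketched as follows; the label scheme and classifier settings are illustrative assumptions, not the study's exact implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def hierarchical_predict(X_train, y_train, X, crop_labels):
    """Two-step hierarchical classification (sketch).

    Step 1: a broad classifier separates one merged 'crop' class from the
    remaining classes (e.g., forest, built-up, water).
    Step 2: a second forest, trained only on crop samples, splits the
    merged class into individual crop types.
    Assumes integer labels and that -1 is not an existing class label.
    """
    is_crop = np.isin(y_train, list(crop_labels))
    y_broad = np.where(is_crop, -1, y_train)        # -1 = merged crop class

    rf_broad = RandomForestClassifier(n_estimators=100, random_state=0)
    rf_broad.fit(X_train, y_broad)

    rf_crop = RandomForestClassifier(n_estimators=100, random_state=0)
    rf_crop.fit(X_train[is_crop], y_train[is_crop])

    pred = rf_broad.predict(X)
    crop_mask = pred == -1
    if crop_mask.any():                             # refine crop pixels only
        pred[crop_mask] = rf_crop.predict(X[crop_mask])
    return pred
```

Training the second forest only on crop samples lets it specialize on the subtler between-crop differences, which is the rationale given for the hierarchical design.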
OA and kappa clearly increased along the growing season. However, typical summer crops such as potato, maize, and sugar beet only emerged well by the end of May, whereas winter cereals were already well developed by then. This led to classification accuracies of particular crop types deviating considerably from the OA. For example, winter cereals were mapped against the other crops by the end of May with a user's accuracy of 96% and a producer's accuracy of 71%, while potato was mapped against the other crops with a user's accuracy of 86% and a producer's accuracy of only 40%. This demonstrates how the accuracy within the growing season depends strongly on the evolution of the crops' phenological stages. These insights agree with previous research showing that using optical time series that include phenological differences has a positive impact on classification performance [14]. The ability to discriminate winter cereals earlier in the season than summer crops could in principle be exploited in future work, as early winter cereal maps might be helpful for some applications. However, further identification among these crops (e.g., winter wheat vs. winter barley) is hardly possible at this moment in the growing season due to their strong similarities. A discrimination of winter cereals early in the season might also lead to error propagation of this early discrimination layer later in the season. It is therefore generally better to classify the winter cereals simultaneously with the other crops.
From the confusion matrices in Tables 3 and 4, it became apparent that specific confusion occurred between winter cereals and grassland on the one hand and among the different summer crops on the other hand. The former could be explained by the fact that both winter cereals and grassland appear green early in the season and both start growing in April-May, causing confusion between these classes. The latter could be explained by the fact that all summer crops (maize, potato, beet) appear very similar until the end of April due to their comparable bare soil conditions or the presence of indistinguishable seedlings. These are physical causes of confusion that cannot easily be solved in a satellite-based classification task.
The increasing OA and kappa during the season lead to a tradeoff: while the quality of the classification results improved with time, the timeliness of the results decreased. Our analysis provided insight into this tradeoff and may help choose the appropriate time in the growing season for a specific application where crop classification is needed. We found that early in the season, the OA of a classification based on Sentinel-1 was higher than that of a similar classification based on Sentinel-2 (at the end of March, 47% and 39%, respectively). However, after only one month in the season, the number of predictor variables differed considerably between Sentinel-1 and Sentinel-2, with 12 predictors for the former (both VV and VH 12-daily backscatter mosaics in ascending and descending passes) and only 3 for the latter (three 10-daily NDVI mosaics). The higher accuracy of the radar-only classification could therefore also be due to the number of input features. Indeed, when limiting this analysis to VH backscatter only for Sentinel-1, OA dropped to 36%, which is in fact even lower than the Sentinel-2 NDVI classification. Based on our analyses, no clear conclusion could therefore be drawn with regard to the performance differences of exclusively radar vs. exclusively optical classification early in the season.
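The OA and kappa values quoted throughout follow the standard definitions from a confusion matrix; a small worked example (toy numbers, not the study's matrices):

```python
import numpy as np

def overall_accuracy_and_kappa(cm):
    """OA and Cohen's kappa from a confusion matrix (rows = true, cols = predicted)."""
    cm = np.asarray(cm, dtype=float)
    n = cm.sum()
    po = np.trace(cm) / n                            # observed agreement = OA
    pe = (cm.sum(axis=1) @ cm.sum(axis=0)) / n ** 2  # agreement expected by chance
    return po, (po - pe) / (1 - pe)

cm = np.array([[50, 5],
               [10, 35]])           # toy 2-class example
oa, kappa = overall_accuracy_and_kappa(cm)
print(f"OA = {oa:.2f}, kappa = {kappa:.2f}")  # OA = 0.85, kappa = 0.69
```

Kappa discounts the agreement that would occur by chance, which is why it is consistently lower than OA in Table 2.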
The limited performance of the optical-only classifications could be attributed to the use of NDVI as the only predictor. NDVI is not a full descriptor of Sentinel-2 optical imagery, although it has been used successfully in the past [69]. It is therefore often beneficial to use more bands, especially when exclusively using optical imagery [70]. However, the least-squares smoothing procedure outlined in Section 2.1.3 was not designed to work on the individual spectral bands of optical sensors, but on vegetation indices such as NDVI. Hence, the 10-daily smoothed data set cannot be realized using individual spectral bands without introducing new challenges with regard to country-wide mosaicking of individual acquisitions, thereby jeopardizing the ability to closely monitor crops during the growing season. We therefore focused on 10 m resolution NDVI as a well-known vegetation index, while the addition of other optical metrics could be a subject of future research. This also holds for the radar-only classification, which we based solely on backscatter intensity mosaics. It has already been shown in the past that other derived radar products, such as stacks of interferometric coherences, can increase classification accuracies [71,72]. It should therefore be subject to future research whether an optimized radar-only classification (including more radar predictors) performs better early in the season than an optimized optical-only classification (including more optical predictors), which would test our hypothesis that radar images might be more helpful early in the season due to their sensitivity to early crop stages during a generally cloudy part of the growing season.
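For reference, NDVI is computed from the two 10 m Sentinel-2 bands, red (B4) and near-infrared (B8); the small epsilon guarding against division by zero is an implementation detail, not part of the definition:

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """NDVI from Sentinel-2 reflectances: B8 (NIR) and B4 (red), both 10 m."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)
```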
Considering the feature importance in the final classification task at the end of August (Figure 10), we found that the NDVI predictors show two distinct periods of increased importance. In April and May, the winter cereals showed rapid stem elongation and some of the summer crops started emerging, enhancing their separability compared to bare soil or seedling conditions. In addition, this period is in general less cloudy than the preceding months, leading to higher-quality optical images. July and August were found to be important months for optical acquisitions as well, since the period of winter cereal harvesting and peak productivity of the summer crops also led to increased separability of the different crop types. Regarding SAR predictors, we found no clear difference in importance between the two polarization channels (VV and VH), which is somewhat contradictory to Inglada et al. [40], who found VV to be more important for their classification than VH. Considering joint Sentinel-1 and Sentinel-2 predictors, it became apparent that the most important features were based on key NDVI acquisitions, while the other important features were a mixture of optical and radar predictors, stressing the added value of a combined optical/radar classification.
The classification confidence was shown to provide helpful insights at the pixel level with regard to the quality of the classification. Parcel borders were typically characterized by lower classification confidence than the center parts of the parcels. Figure 9 demonstrated the strong correlation between estimated classification confidence and final classification accuracy, suggesting that this information can be qualitatively used to assess specific classification performances. However, it was also noted that this confidence is not equal to a statistical probability and should therefore not be used to infer any statistical conclusion on classification accuracy.
Our results clearly showed the advantage of complementing optical Sentinel-2 observations with Sentinel-1 SAR observations, which give insight into the plant structural components [27]. An advantage of the approach implemented here is the use of dense image time series rather than single-date images, avoiding the need for manual date selection, which would hamper its use in an operational context [73][74][75]. Several recent studies that used combined radar and optical imagery reported higher accuracies compared to this work. One of the reasons is that, while some of these studies were segment- or field-level classifications [74,76] or implemented some kind of post-classification filtering routine [38], our approach was an unfiltered pixel-wise classification, which is generally characterized by lower OA values. In addition, classification of 11 different classes, of which 8 are crop types, is a challenging task for a classifier and might explain lower accuracies compared to other studies that focus on a few specific classes, such as very accurate winter wheat [41], paddy rice [77], and maize and grassland mapping [39]. Notwithstanding these differences with previous studies, the general conclusions of this work are largely in agreement with other research [21,32-34,38-42,78].
Further steps could include working at the parcel level instead of the pixel level, where combined Sentinel-1 and Sentinel-2 observations could serve as input to an automatic parcel detection tool. Classification can subsequently be performed at the parcel level, with the strong advantage that spectral signatures can be averaged per field, significantly increasing the signal-to-noise ratio [79]. This is especially beneficial for analyzing radar signatures, where reduced radar speckle at the parcel level will decrease intraclass variability and increase interclass variability, thereby improving classification accuracies. Future work could also focus on a comparison between radar-only and optical-only classification early in the season with a larger set of predictors, including for example interferometric coherence for radar acquisitions and SWIR bands for optical imagery, allowing a fairer comparison between both data sources. Lastly, it should be the subject of future research whether some crop types, such as winter wheat and winter barley, could be classified separately and earlier in the season than the other crop types based on their specific phenological differences.
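The per-parcel averaging step suggested above amounts to a zonal mean of the feature stack; a minimal sketch, assuming an integer parcel ID is already available for every pixel (parcel detection itself is out of scope here):

```python
import numpy as np

def parcel_mean_features(features, parcel_ids):
    """Average per-pixel feature vectors over parcels (zonal mean).

    features:   (n_pixels, n_features) array of predictor values.
    parcel_ids: integer parcel label per pixel.
    Averaging suppresses speckle and other per-pixel noise before
    classification at the parcel level.
    """
    ids = np.unique(parcel_ids)
    out = np.empty((ids.size, features.shape[1]))
    for j, pid in enumerate(ids):
        out[j] = features[parcel_ids == pid].mean(axis=0)
    return ids, out
```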

Conclusions
Growing food demands require timely estimates of crop acreages at a large scale. Identification of different crop types forms the basis for such tasks. Also, at the parcel level, early crop type information is essential for crop monitoring purposes that aim at early anomaly detection. State-of-the-art earth observation technologies allow such identification from space. With the unprecedented potential of sensor synergies in the Copernicus program, multisensor classification procedures that go beyond traditional optical-only procedures can now be developed operationally.
Previous research clearly demonstrated the complementary nature of optical and radar signals to improve classification accuracies, in addition to alleviating the cloud obstruction that challenges optical-only monitoring of vegetation.
We created 12-day Sentinel-1 backscatter mosaics and 10-daily Sentinel-2 smoothed NDVI mosaics over Belgium to develop a multisensor crop mapping approach, tested on the 2017 growing season. Based on an optimized random forest classifier, we predicted eight different crop types with a maximum accuracy of 82% and a kappa coefficient of 0.77. The combination of radar and optical data always outperformed a single-sensor classification procedure, with accuracy clearly increasing throughout the growing season until July, when the maximum mapping accuracy was reached. While the most important predictors in the final classification were found to be related to optical acquisitions during key periods of the growing season, other important predictors consisted of a combination of optical and radar predictors, stressing the added value of a multisensor perspective. The concept of classification confidence was shown to provide additional information going beyond a hard classification result, indicating regions where confidence was lower, such as near parcel boundaries.
These results highlight the need for multisensor crop classification and monitoring efforts to fully exploit the rich potential of existing and future complementary satellite sensors. (The final 2017 crop map produced in this study is available upon request.)

Figure 1. Example of profiles of (upper panel) Sentinel-2 normalized difference vegetation index (NDVI) and (lower panel) Sigma0 VV and VH backscatter intensities for a winter wheat field. The characteristic increase in NDVI during plant development, followed by a decrease during ripening and harvest, is well correlated with the decrease in mostly VV backscatter during the growth phase of the plant. During senescence and harvest, backscatter intensities start increasing again.
The classifier is then able to look at the full growing curve instead of considering one potentially saturated value during peak development of the vegetation [50,51,53]. These NDVI values served as input to the classification procedure. However, frequent cloud cover imposes a significant challenge for crop classification of a large area covering several image tiles. Cloud obstruction was removed using an updated version of a pixel-wise weighted least-squares smoothing of the NDVI values over time [54-56]. The smoothed NDVI images were produced at 10-daily intervals between 1 March 2017 and 31 August 2017, resulting in a total of 18 input features for the classification. The final images were projected to the Belgian EPSG:31370 grid with a spatial resolution of 10 m, similar to the Sentinel-1 images. An RGB composite of three smoothed NDVI images is shown in Figure 3.
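One common form of pixel-wise weighted least-squares smoothing is the Whittaker smoother; the updated variant used in the study [54-56] may differ, so this is an illustrative sketch in which cloudy observations simply receive zero weight:

```python
import numpy as np

def whittaker_smooth(y, weights, lmbda=10.0):
    """Weighted Whittaker smoother (illustrative sketch).

    Solves (W + lambda * D'D) z = W y, where D is the second-difference
    operator, W = diag(weights), and lambda controls smoothness.
    weights: 0 for cloud-contaminated observations, 1 for clear ones, so
    masked dates are interpolated from the smooth fit.
    """
    n = len(y)
    D = np.diff(np.eye(n), n=2, axis=0)   # (n-2, n) second-difference matrix
    W = np.diag(weights)
    return np.linalg.solve(W + lmbda * D.T @ D, W @ y)

# Toy 10-daily NDVI series with two cloudy gaps (weight 0):
t = np.arange(18.0)
y = np.exp(-((t - 9.0) ** 2) / 18.0)      # bell-shaped growing season
w = np.ones(18)
w[[4, 11]] = 0.0                           # two masked observations
z = whittaker_smooth(y, w)
```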

Figure 3. RGB composite of smoothed NDVI. First 10-day period of March 2017 (red), last 10-day period of June 2017 (green), and last 10-day period of August 2017 (blue). Inset shows a detailed view of this RGB, demonstrating the differences in temporal NDVI mosaics across different agricultural parcels.

Figure 4. Schematic overview of the two-step hierarchical classification procedure. In the first step, all crop types and grassland were combined in one general crop class.

Figure 5. Final classification result by the end of August 2017 based on Sentinel-1 and -2 inputs. Inset shows a detailed view of the classification results in a highly diverse arable landscape.

Figure 6. The combined Sentinel-1 and -2 classification result of June (upper panel) shows more within-field variability than the classification result of August (lower panel). In this particular example, the zones in the center field were wrongly classified as beet in June but correctly classified as potato in August.

Figure 7. Classification confidence for the end of August 2017, defined as the random forest predicted class probability of the majority class for each pixel. Inset shows that the confidence was typically lower at the field borders, which can be explained by spectral contamination of pixels near field edges.

Figure 8. Detail of the classification result at the end of August (left) and its classification confidence (right), demonstrating that the classification confidence near the field borders is significantly lower. In this specific example, the crop class is sugar beet, as correctly classified in the center of the field. The color legends are similar to Figures 4 and 6.

Figure 9. Mean classification accuracy as a function of classification confidence, as well as the relative occurrence of each confidence level. These numbers were calculated by binning all validation samples according to their classification confidence as reported by the RF classifier and calculating the mean classification accuracy for all validation points within a particular confidence bin. This analysis clearly demonstrates the strong correlation between accuracy and confidence, meaning that the confidence level can be qualitatively used to assess the expected accuracy of the classification in a specific region.

Figure 10. Feature importance defined as the reduction in Gini impurity by that feature. The presented values were extracted from the combined Sentinel-1/Sentinel-2 classification step of the hierarchical procedure, in which the different crop types were distinguished. (a) Chronologically ordered Gini importance for the Sentinel-1 predictors (asc = ascending pass, des = descending pass). Each date indicates the start of the 12-day period for which the mosaic was created. (b) Chronologically ordered Gini importance for the Sentinel-2 NDVI predictors. Each date indicates the start of the 10-day period for which the mosaic was created. (c) Twenty-five most important features for the combined Sentinel-1/Sentinel-2 classification.

Table 1. Total number of calibration and validation parcels and pixels per class.

Table 2. Overall Accuracy (OA) and kappa coefficient (κ) of the different classification schemes.