Crop Classification Based on Temporal Signatures of Sentinel-1 Observations over Navarre Province, Spain

Crop classification provides relevant information for crop management, food security assurance and agricultural policy design. The availability of Sentinel-1 image time series, with a very short revisit time and high spatial resolution, has great potential for crop classification in regions with pervasive cloud cover. Dense image time series enable the implementation of supervised crop classification schemes based on the comparison of the time series of the element to classify with the temporal signatures of the considered crops. The main objective of this study is to investigate the performance of a supervised crop classification approach based on crop temporal signatures obtained from Sentinel-1 time series in a challenging case study with a large number of crops and a high heterogeneity in terms of agro-climatic conditions and field sizes. The case study considered a large dataset on the Spanish province of Navarre in the framework of the verification of Common Agricultural Policy (CAP) subsidies. Navarre presents a large agro-climatic diversity with persistent cloud cover areas, and therefore, the technique was implemented both at the provincial and regional scale. In total, 14 crop classes were considered, including different winter crops, summer crops, permanent crops and fallow. Classification results varied depending on the set of input features considered, obtaining Overall Accuracies higher than 70% when the three (VH, VV and VH/VV) channels were used as the input. Crops exhibiting singularities in their temporal signatures were more easily identified, with barley, rice, corn and wheat achieving F1-scores above 75%. The size of fields severely affected classification performance, with ~14% better classification performance for larger fields (>1 ha) in comparison to smaller fields (<0.5 ha). Results improved when agro-climatic diversity was taken into account through regional stratification. It was observed that regions with a higher diversity of crop types, management techniques and a larger proportion of fallow fields obtained lower accuracies. The approach is simple and can be easily implemented operationally to aid CAP inspection procedures or for other purposes.


Introduction
Crop classification is one of the most important agricultural applications of remote sensing [1]. Knowing what crops are present in the fields is very useful both at a local and global scale [2]. For instance, this information is valuable for the design and implementation of agricultural policies [3], as well as for crop management and food security assurance [4]. Crop classification is also a pre-requisite for implementing other remote-sensing-based applications at the field scale (e.g., biomass and yield estimation, anomaly detection, etc.). Satellite Earth observations provide information about biophysical properties of the Earth's surface and their spatial variability with a given revisit time. This constitutes a very rich information source that can be used for identifying the crop types being cultivated and for monitoring them along their growing cycle [5].
European farmers benefit from the Common Agricultural Policy (CAP) support of the European Union, which started back in 1962. CAP establishes different financing instruments including area-based payments and cross-compliance requirements (Regulation (EU) No 1306/2013). Competent authorities might check whether farmers' declarations are correct (i.e., they are actually growing the crops they declare on each field and follow cross-compliance requirements). Traditionally, this has been done through field inspections on a sample of fields, constituting an expensive, inefficient and incomplete check system. On June 2018, the European Commission drafted the new modification of the CAP (Regulation (EU) 2018/746) regarding applications, payment claims and checks to be fully implemented in 2020. These modifications recommend the establishment of procedures to check and track all eligibility criteria using Copernicus Sentinels data or similar data. Therefore, several initiatives and research efforts are being conducted at present to fulfill this aim, including EU-funded projects like Sen2-Agri [6], RECAP [7] or SEN4CAP [8].
Indeed, the Copernicus program has been a game changer in this field of research, with frequent observations potentially useful for crop classification (both in the microwave (Sentinel-1, S1) and optical (Sentinel-2, S2) domain), which are acquired systematically over the world and freely distributed [9]. Their excellent compromise between spatial and spectral resolution, and above all, their enhanced revisit time, through the use of constellations of twin satellites (S1A-S1B and S2A-S2B), make them a unique solution for monitoring dynamic processes like vegetation growth at the scale of agricultural fields.
Until the Sentinels were launched, typical approaches for crop classification were mostly based on the evaluation of the spectral signature of crops obtained from multispectral imagery on key moments of the phenological cycle and their statistical (di)similarity [10]. However, early studies [11] already recalled that this approach might have limitations after observing that different fields of the same crop might have different spectral responses due to slight differences in the agricultural management calendar from field to field. In addition, different crops might expose a similar spectral response on a given date, thus, an examination of the multi-temporal evolution of the spectral response of a field was hypothesized as a potentially relevant information for crop recognition [11]. Yet, the unavailability of multi-temporal datasets with the sufficient revisit time (i.e., 16 days for Landsat) precluded the further development of this type of method until sensors such as MODIS became available [12,13]. With the availability of S2 data, similar approaches but with a higher spatial resolution have been developed [7,14,15].
In large parts of the world, remote sensing in the optical domain has been difficult to implement operationally due to persistent cloud cover limiting the number of viable observations along the agricultural season [16]. Synthetic Aperture Radar (SAR) sensors can be an interesting alternative for those regions since they can operate in cloud-covered and poorly illuminated conditions. In addition, SAR data provide complementary information to optical sensors, reflecting geometric and dielectric properties of vegetation and soil, which have been proven relevant for crop classification [17,18]. The first SAR-based crop classification experiences used a few single polarization scenes and generally achieved limited results [5], but soon after, several studies demonstrated that adding dimensionality through multi-frequency, multi-polarization or multi-temporal SAR observations increased the accuracy of crop maps [19,20]. Observation frequency determines the size of the scattering elements that contribute to backscatter [21], with short wavelengths (X-band) interacting with smaller canopy constituents (e.g., leaves and fruits) than medium (C-band, e.g., small branches and stems) or longer wavelengths (L-band, e.g., larger branches and trunks). The latter might also be more sensitive to soil surface roughness and moisture conditions below the canopy. Regarding polarization, quad-polarized data provided polarimetric features which are potentially interesting for classification and other agricultural applications [21,22]; however, dual-polarized configurations might be more efficient for multi-temporal large-area mapping and still provide interesting information for crop monitoring [23] and classification [24,25]. On the other hand, multi-temporal observations of single or dual-polarized backscatter have demonstrated a clear benefit on the accuracy of the resulting classifications [24], since morphological differences in some particular stages of the phenological development might be key for differentiating pairs of crops that are otherwise quite similar the rest of the year [18].
Given the actual availability of SAR constellation missions with an extremely short revisit time and high spatial resolutions, classification methods that explicitly exploit time series information have recently been developed with promising results. In effect, supervised classification schemes were developed based solely on time series information, where training data were used to obtain model time series for each class (i.e., temporal signatures) that were later compared with the elements to classify using some type of error metric. This way, Whelen and Siqueira [26] classified alfalfa, corn and wheat fields over a 10 km × 23 km area using L-band UAVSAR observations, and obtained accuracy values higher than 75% when HV backscatter, entropy or alpha time series were used. Later, using C-band Sentinel-1 data, this approach was refined with a case study over a portion of central North Dakota considering four crops (corn, soybean, wheat and grass/pasture) and obtaining an accuracy of 85% [27]. In a similar fashion, Xu et al. [28] classified two corn-dominated sites using Sentinel-1 time series, and obtained accuracies~90%. In these studies, the error metrics used to compare the temporal signatures with the time series of the elements to classify differed, but only slight differences were observed in the obtained accuracies [28]. The performance of this classification approach was comparable to results obtained with machine learning algorithms. Yet, the simplicity and robustness of the approach makes it transparent, portable from season to season and resilient to crop growth anomalies. However, further studies were recommended to explore the performance of this approach on regions with a higher crop diversity [27] and more heterogeneous conditions. Indeed, the number of crop classes considered strongly impacts the accuracy of the results obtained [24]. Typically, published crop classification studies based on SAR data considered six different crop classes or less [17,26,28], although some studies had a higher number-however, this was normally below 10 [25,29] or 15 [24,30]. Furthermore, some studies did not solely address crop classification, as they considered both crop categories and other land covers such as forests, water surfaces or urban areas [29]. Regarding the number of fields used for training and validation, the published studies differ greatly, although in many cases, they were below 100 [26,28] or 1000 [24,30]. Few published works considered near-operational applications with 10,000-100,000 fields or more and these used optical imagery [3,6]. In addition, the heterogeneity in terms of agro-climatic conditions of the area of interest and field sizes might impact the performance of the approach, as it has mostly been applied to rather homogeneous areas with large fields [26][27][28].
Therefore, the main objective of this study is to investigate the performance of a supervised crop classification approach based on crop temporal signatures obtained from Sentinel-1 time series on a challenging case study with a large number of classes and a high heterogeneity in terms of agro-climatic conditions and field sizes. Hence, the temporal signatures of all crop categories were interpreted and the confusion with other classes was evaluated. The study is framed on a real case study of crop identification for the aid of CAP inspections focused on the Spanish province of Navarre, a region of strong agro-climatic and geographic diversity with pervasive cloud cover. The region was stratified into different agro-climatic regions and different classification models were trained to evaluate the benefits of stratification. Thus, a specific objective of this study was to test whether the differences in crop growing conditions by agricultural region had any influence on the crop classification accuracy. Furthermore, the relation between field size and classification results was investigated. Special attention was also paid to the analysis of results obtained with classification models based on different sets of input data, comparing VH and VV polarizations, but also the VH/VV ratio and implementing an ensemble approach that jointly considered VH, VV and VH/VV temporal signatures. Different types of crops are grown in each region. Grasslands prevail in northern regions (1 and 2). Winter cereals are found throughout the whole province, while other winter crops, such as legumes and rapeseed, are common only in central regions (3 and 4). The diversity of crops in southern regions (6 and 7) is larger, where alfalfa, corn, asparagus, horticultural crops and permanent crops are also widespread. Corn, the most common summer crop, is mainly grown under irrigation systems, except in Region 1, where the high amount of precipitation allows rainfed cultivation. Other summer crops cultivated in Navarre are sunflower, found in every region, and rice, only grown in southern regions. It is important to mention that the proportion of fallow fields varies strongly depending on the region, being particularly high in Southern Navarre, especially in region 7.

Sentinel-1 Data
The SAR data used in this study were Sentinel-1 C-band Interferometric Wide (IW) swath mode images downloaded as level-1 Ground Range Detected (GRD) products. All available Sentinel-1 images covering the study area from 1 September 2015 to 30 December 2016 were acquired ( Figure 1). The images belonged to three relative orbits: one ascending node and two descending nodes (103ASC, 8DESC and 81DESC). Yet, only the 103ASC relative orbit covered the entire area of Navarre, whereas the descending nodes excluded a small area of the province from the image, specifically the Western part in 8DESC and the Eastern part in 81DESC ( Figure 2).
The images were processed using SNAP Sentinel-1 Toolbox following these steps: (1) thermal noise removal; (2) slice assembly; (3) apply orbit file operator; (4) calibration to β 0 ; (5) speckle filtering; (6) terrain-flattening; (7) range-doppler terrain correction and (8) subset to the extent of Navarre. The speckle reduction was performed using a 3 × 3 window Gamma-MAP filter. The SRTM 1sec HGT DEM was used for terrain-flattening and terrain correction. After these corrections, terrain-flattened γ 0 backscatter coefficient values were obtained. The results of this process were 110 images with three channels each: two backscatter coefficients (VH and VV) and their ratio (VH/VV) in dB units. The pixel size of the final products was set to 20 m. All the images corresponding to the three relative orbits were stacked according to their time.  The images were processed using SNAP Sentinel-1 Toolbox following these steps: (1) thermal noise removal; (2) slice assembly; (3) apply orbit file operator; (4) calibration to β 0 ; (5) speckle filtering; (6) terrain-flattening; (7) range-doppler terrain correction and (8) subset to the extent of Navarre. The speckle reduction was performed using a 3 × 3 window Gamma-MAP filter. The SRTM 1sec HGT DEM was used for terrain-flattening and terrain correction. After these corrections, terrain-flattened γ 0 backscatter coefficient values were obtained. The results of this process were 110 images with three channels each: two backscatter coefficients (VH and VV) and their ratio (VH/VV) in dB units. The pixel size of the final products was set to 20 m. All the images corresponding to the three relative orbits were stacked according to their time.

Ground Truth and Data Extraction
Ground truth information for training and validation was extracted from a geographical database containing field boundary polygons for EU Common Agricultural Policy (CAP) declarations (335,094 polygons) and inspections (21,850 polygons) for the agricultural year 2016. This database was provided by the Agriculture Department of the Government of Navarre (information   The images were processed using SNAP Sentinel-1 Toolbox following these steps: (1) thermal noise removal; (2) slice assembly; (3) apply orbit file operator; (4) calibration to β 0 ; (5) speckle filtering; (6) terrain-flattening; (7) range-doppler terrain correction and (8) subset to the extent of Navarre. The speckle reduction was performed using a 3 × 3 window Gamma-MAP filter. The SRTM 1sec HGT DEM was used for terrain-flattening and terrain correction. After these corrections, terrain-flattened γ 0 backscatter coefficient values were obtained. The results of this process were 110 images with three channels each: two backscatter coefficients (VH and VV) and their ratio (VH/VV) in dB units. The pixel size of the final products was set to 20 m. All the images corresponding to the three relative orbits were stacked according to their time.

Ground Truth and Data Extraction
Ground truth information for training and validation was extracted from a geographical database containing field boundary polygons for EU Common Agricultural Policy (CAP) declarations (335,094 polygons) and inspections (21,850 polygons) for the agricultural year 2016. This database was provided by the Agriculture Department of the Government of Navarre (information

Ground Truth and Data Extraction
Ground truth information for training and validation was extracted from a geographical database containing field boundary polygons for EU Common Agricultural Policy (CAP) declarations (335,094 polygons) and inspections (21,850 polygons) for the agricultural year 2016. This database was provided by the Agriculture Department of the Government of Navarre (information not publicly available). Field inspections were carried out from June to August 2016, as part of the verifications of CAP subsidy declarations.
The geographical database was refined as follows. A 5 m inside buffer was applied to the agricultural field boundaries in order to avoid mixed pixels that could contain information from different crops. Fields smaller than 0.5 ha were masked out in the declarations dataset. In the inspections dataset, the mask was less restrictive, leaving out fields that contained less than 6 pixels. Then, field statistics were computed, obtaining the median, mean, minimum, maximum and standard deviation backscatters values for the S-1 channels. For each field, a temporal signature with the information of the three relative orbits was obtained. Fields not covered in one of the descending orbits had missing values for those specific dates. Finally, the crops were grouped into 14 classes: alfalfa, asparagus, barley, corn, fallow, grasslands, legumes, oats, other crops, permanent crops, rapeseed, rice, sunflower and wheat. At most, 10% of farmers' declarations are expected to be wrong due to errors, fraud etc. [32]. These wrongly declared fields could affect the training phase by introducing erroneous information in the learning process of supervised classification algorithms. In order to minimize the impact of wrongly declared parcels in the declarations dataset, the 10% of fields whose median backscatter temporal signature differed the most with the median signature of its declared class in each region was discarded. The median was chosen to further reduce the effect of speckle inside the field. The comparison between the temporal signature of the crop and that of each field was based on the R 2 metric. The final number of fields in Navarre is shown in Table 2 and Figure 3 (see Supplementary Materials for the number of fields per region: Table S1).

Separability Analysis
As a preliminary step, a feature importance analysis was conducted comparing the statistical features calculated for the different polarizations. For this analysis, the Jeffries-Matusita (JM) distance was evaluated [33], which provides a measure of separability between pairs of classes and

Separability Analysis
As a preliminary step, a feature importance analysis was conducted comparing the statistical features calculated for the different polarizations. For this analysis, the Jeffries-Matusita (JM) distance was evaluated [33], which provides a measure of separability between pairs of classes and has been identified as a good indicator of crop separability [34]. The JM distance between a pair of probability distributions is given by the following: where JM ij is the JM distance between a pair of classes ω i and ω j , x is the feature observed (field backscatter statistics) and p (x|ω i ) and p (x|ω j ) are the conditional probability functions for x given ω i and ω j , respectively. JM distance values range between 0 and 2, with higher values indicating higher separabilities between pairs of crops. The JM distances were calculated per date for each pair of crops, for the different polarizations and for the different statistical features. A mean JM distance value was calculated for the whole period of study, averaging the distances obtained for each date. This mean JM-distance was used to compare the importance of the different polarizations and statistical features calculated. Two statistical features obtained the highest separabilities: the mean and the median. The median was chosen for the classification because of its higher robustness to outliers.

Classification
A supervised classification algorithm based on the temporal signatures of crops was applied. The algorithm is similar in concept to the one used in Whelen and Siqueira [26,27] and Xu et al. [28]. However, there are some significant differences in the way similarity is measured (see below). The basic concept behind this classification algorithm is that crops exhibit physical differences during the growing season due to their phenological development, which leads to different backscatter time series. Thus, the characteristic time series of each crop, as observed by Sentinel-1, is referred to as its temporal signature. This concept of temporal signature somehow mimics the idea of the spectral signature of a crop, i.e., the typical reflectance response of a crop with respect to wavelength. Therefore, we could define the temporal signature as the typical backscatter response of a crop with respect to time. The basic hypothesis is that these temporal signatures are representative of each crop and can be used to identify whether an unknown field belongs to one class or another. In the training phase, the temporal signatures of each crop class were obtained following a three-fold cross-validation scheme using the declarations dataset, where 2/3 of the ground truth samples were used for training and the remaining 1/3 for validation, in three successive random folds. Backscatter time series were formed by adding the information obtained at the three relative orbits used. After this training phase, each field was classified by comparing its temporal signature one by one with the different crop temporal signatures. In fact, each field was assigned to the class providing the best fit between its temporal signature and the crop temporal signature. To quantify the best fit, two variants were explored, the first variant assumed the best fit as the one with the lowest root mean squared error (RMSE), and the second as the one with the highest determination coefficient (R 2 ). This differed from related studies; Whelen and Siqueira [26,27] used a normalized error metric and Xu et al. [28] used a spectral similarity measure. Here, an error metric (RMSE) and a correlation metric (R 2 ) were evaluated, as they provide a complementary way to quantitatively compare two curves. Since the crops can have different calendars, the duration of the curves to be compared was adjusted to each crop type (Table 3). In addition, each crop could be characterized by three different temporal signatures depending on the channel used: VH, VV and VH/VV. Thus, three different classification schemes were applied by considering these channels individually. One additional scheme was explored by considering the similarity to the three Remote Sens. 2020, 12, 278 8 of 29 polarizations conjointly and named as the ensemble scheme (Ens). Therefore, in total, eight variants were explored (four input features × two goodness-of-fit measures). In order to test whether the differences in crop growing conditions by agricultural region had any influence on the crop classification accuracy, the supervised classification algorithm was fitted at different spatial extents: first, for the whole province of Navarre, and second, stratified for each of the seven agricultural regions. In the first case, the temporal signatures were obtained for the whole province, whereas in the second, specific temporal signatures per region were obtained.
For each classification scheme, the accuracy was assessed through a confusion matrix whereby the Overall Accuracy (OA) was calculated, as well as each crop's Producer's Accuracy (PA), User's Accuracy (UA) and F1-score (the harmonic mean between PA and UA). In fact, a two-level accuracy assessment was implemented. The first level corresponded to the three-fold cross-validation and used the 1/3 ground truth data set left out for validation on each of the three-folds. Additionally, a more rigorous second level validation was performed based on the field inspections dataset available, called external validation. This second dataset was obtained through field visits, thus, its reliability is believed to be higher. However, the sampling did not follow the standard recommendations for accuracy evaluation [35], and had a higher proportion of fallow, permanent crops and other crop classes ( Table 2), categories that can be more difficult to classify than others such as grasslands with a much lower number of samples. This might negatively impact the OA values obtained. In addition, the size of fields was smaller compared to the declarations dataset (Table 2). Therefore, the results should be interpreted bearing this point in mind. The accuracy metrics for these two validation levels were reported in the results. Due to the small size of several fields in the external validation dataset, accuracy results were also reported and analyzed by field size considering three size groups: (1) <0.5 ha; (2) 0.5 ha-1 ha; (3) ≥1 ha. Table 4 shows the results for the statistical features importance analysis, where the mean JM distances for all pairs of crops was calculated. The standard deviation time series obtained the lowest values, while the mean and median achieved the highest JM distances, and were very similar. As indicated in Section 2.4, the median time series was selected for computing the temporal signatures of each crop class due to its higher robustness to outliers. The separability values between each pair of crops for the median time series are represented in Figure 4. Although the values are, in general, low (<1), it must be pointed out that this is an average value for the whole period of study, thus, they should be used for comparison between crop pairs and polarizations. There were pairs of crops that obtained similar values (in most cases low) for the three polarizations. In VH polarization, rapeseed and rice had relatively high separabilities with most of the crops. In VV, alfalfa was the crop with the highest values, followed by oats and barley. In the case of VH/VV, the highest separabilities were achieved when comparing winter versus summer crops.

Separability Analysis
JM distances reported correspond to the average values for the whole period of study. The separability values between each pair of crops for the median time series are represented in Figure 4. Although the values are, in general, low (<1), it must be pointed out that this is an average value for the whole period of study, thus, they should be used for comparison between crop pairs and polarizations. There were pairs of crops that obtained similar values (in most cases low) for the three polarizations. In VH polarization, rapeseed and rice had relatively high separabilities with most of the crops. In VV, alfalfa was the crop with the highest values, followed by oats and barley. In the case of VH/VV, the highest separabilities were achieved when comparing winter versus summer crops. The temporal evolution of crop separability provides interesting information on the time periods that contribute the most to crop identification ( Figure 5). Monthly boxplots of JM distances for all crops show that in general the period between March and August provided the highest separabilities for the three polarizations (Figure 5a-c). When evaluating the separability between winter crops (barley, legumes, oats, rapeseed and wheat), both VH and VV polarizations had rather high values in spring months, while VH/VV distances were considerably lower (Figure 5d-f). In the case of summer crops (corn, rice and sunflower), separabilities were lower than for winter crops, yet they increased between May and August for VH and VV polarizations, and between July and November for VH/VV (Figure 5g  The temporal evolution of crop separability provides interesting information on the time periods that contribute the most to crop identification ( Figure 5). Monthly boxplots of JM distances for all crops show that in general the period between March and August provided the highest separabilities for the three polarizations (Figure 5a-c). When evaluating the separability between winter crops (barley, legumes, oats, rapeseed and wheat), both VH and VV polarizations had rather high values in spring months, while VH/VV distances were considerably lower (Figure 5d-f). In the case of summer crops (corn, rice and sunflower), separabilities were lower than for winter crops, yet they increased between May and August for VH and VV polarizations, and between July and November for VH/VV (Figure 5g-i). Permanent crops had in general low separability values throughout the whole period, taking slightly higher values in winter months for VH and VV polarizations, and in spring for VH/VV (Figure 5j-l).  Figure S1). For each polarization, the temporal curves of the three relative orbits had similar backscatter values and the trend was the same throughout the whole period of study. In the following sections, the temporal signatures are commented on by crop type.

Winter Crops (Barley, Legumes, Oats, Rapeseed and Wheat)
Winter cereals (barley, oats and wheat) are normally sown between October and November, with the exception of Region 7, where short cycle varieties of barley and wheat are also sown in January and February. Harvest takes place normally during June and July, starting earlier in the most southern regions. VH/VV signatures of winter cereals responded positively to the growth of vegetation along the season until the ripening stage ( Figure 6), which occurred in May (~250 day) for barley and in June (~275 day) for wheat and oats. Then, during senescence, VH/VV values decreased until the crop was harvested. It is possible to identify in the signature of barley that harvest occurred earlier than in wheat or oats ( Figure 6).
VH and VV temporal signatures were very similar for the three cereal crops during autumn and winter, with no clear trends and a strong variability between successive dates. In April (~200 day), a consistent decrease in backscatter was initiated in both polarizations, being sharper in VV. After 250 days, backscatter values increased again. In this phase, barley increased more than wheat and oats. During senescence, VH values in the three cereal crops and VV in barley decreased steadily until the harvest was finalized. The temporal signatures of cereals showed the same trends in all agricultural regions, except for Region 7, where VH and VV backscatter signatures were noisier (Supplementary Materials: Figure S1).
The legume crops cultivated in Navarre are mostly beans and peas. These crops, which are intended for fodder in the majority of the region, are grown from October to July. In this case too, VH/VV responded positively to crop growth ( Figure 6). Regarding VH, an increase in backscatter values was detected from the end of autumn due to the development of the crop. The onward increase of backscatter finished in June (~275 day), and was followed by a decrease due to crop senescence. In the case of VV, there was not a clear trend until March (~200 day), when backscatter started growing smoothly until June. VH backscatter values grew slowly in winter and early spring. There was a stronger increase from April (~215 day) to June (~275 day). After this peak, the values dropped, matching the senescence period. The temporal signatures of legumes were shorter in Region 7 (see Supplementary Materials: Figure S1). In this region, the increase of the curves came later-at the end of March-and the slope of the decreasing curves was steeper.
Rapeseed is mostly grown in the central area of Navarre. Its agricultural calendar is similar to that of winter cereals, with sowing in September and harvest in July. VH/VV signatures followed the growth of the crop along the cycle, reaching their highest peak in May (~250 day) ( Figure 6). VH backscatter values of rapeseed increased since the beginning of the cycle, but there was a steeper increase between the end of April (~240 day) and the peak at the end of May, coinciding with the inflorescence emergence. After this peak, VH backscatter dropped significantly until mid-July, when the crop was finally harvested. In the case of the VV signature, no clear trend was apparent until mid-April (~230 day), when the values first decayed and then suddenly increased, depicting a peak similar to the VH curve.

Summer Crops (Corn, Rice and Sunflower)
These three crops are usually sown in April; thus, the variability of the time series before this date was high in all cases (see the interquartile ranges in Figure 7). Harvesting of sunflower occurs around September before cereal is sown, as it is a common crop for cereal rotations. Rice and corn harvest dates can range from October to December.  Looking at VH/VV time series (Figure 7), the temporal signature responded positively to the growth of corn and rice, but for sunflower, it was not as evident, particularly in the descending orbit curves. For this crop, backscatter values increased both in VH and VV from the beginning of June until August. In this case, VH/VV did not adequately resemble crop growth. In corn, there was an increase in VH backscatter from plant emergence until July. After that moment, there were no clear trends. Corn VV series did not show any clear trend during the growing cycle. Rice is flooded before sowing, and this caused an evident effect in VH and VV time series. There was an increase in backscatter at the beginning of April, coinciding with soil tillage, followed by a strong decrease due to the specular reflection caused by standing water in the fields. Once the crop started to grow, backscatter also increased steadily until August (~336 day). After that point, VH backscatter continued increasing until the end of the period of study, whereas VV remained steady till November (~400 day), when it started to grow. It is remarkable that the VH/VV time series of corn were all similar throughout the different regions, except for the northern regions, which were shorter than the rest (see Supplementary Materials: Figure S1).  Alfalfa is a crop that can be sown in different periods of the year (September or spring months), whose management normally includes irrigation and several cuts throughout the cycle. These cutting events complicate the interpretation of the crop signature since different fields might be cut on different dates, resulting in a poorly informative average curve. Yet, although VH and VV temporal signatures showed different peaks during the period of study, the values remained relatively Alfalfa is a crop that can be sown in different periods of the year (September or spring months), whose management normally includes irrigation and several cuts throughout the cycle. These cutting events complicate the interpretation of the crop signature since different fields might be cut on different dates, resulting in a poorly informative average curve. Yet, although VH and VV temporal signatures showed different peaks during the period of study, the values remained relatively constant. A small increase in VH/VV was observed from January to May.
Asparagus follows a cycle of two years. This crop is planted between February and April of the first year, and harvested between March and May of the second year. VH and VV temporal signatures showed many backscatter peaks along the period of study. It is possible to detect an increase of VH from May to November, the period of vegetative growth (Figure 8). VH/VV also augmented in this period, although it increased very smoothly.
Backscatter signatures of grasslands were heterogeneous and did not follow any defined pattern ( Figure 8). In this case (as in alfalfa), the influence of individual cuttings complicated the interpretation of the temporal signature.
Permanent crops are another heterogeneous group including different woody crops: vineyards, olive trees, almond trees and different types of fruit trees. In VH and VV curves, there were backscatter peaks that were smoothed during spring and summer. VH/VV slightly increased from mid-spring until mid-autumn. Backscatter signatures of grasslands were heterogeneous and did not follow any defined pattern (Figure 8). In this case (as in alfalfa), the influence of individual cuttings complicated the interpretation of the temporal signature.
Permanent crops are another heterogeneous group including different woody crops: vineyards, olive trees, almond trees and different types of fruit trees. In VH and VV curves, there were backscatter peaks that were smoothed during spring and summer. VH/VV slightly increased from mid-spring until mid-autumn.
The great variability of crops included in the "other crops" class was clearly visible in the dispersion of the class' temporal signature (Figure 8). This class included herbaceous and horticultural crops with both summer and winter cycles. There were no clear trends and the interquartile range was large throughout the period of study.
Fallow temporal signatures also had high interquartile ranges. For all the three polarizations, there were no clear trends, with different peaks along the period of study. (a)

General Results
Overall Accuracy (OA) results showed differences among regions, polarization channels used as input, best fit metric and type of validation (cross-validation or external validation) (Figure 9). The best classification results were achieved in Region 3 in cross-validation, using Ens as the input and R 2 as the best fit metric, with OA higher than 85%. On the other hand, using only VH/VV as input and R 2 resulted in the poorest results, with an OA of around 50% in specific regions. In Figure 10, the classification map obtained with the best configuration (Ens scheme using R 2 as the best fit metric) is represented.
External validation provided systematically lower OA values than cross-validation, with the exception of Region 3, where the results were very similar. Although the accuracies were lower in the external validation, it can be observed that the results followed a similar distribution to the cross- The great variability of crops included in the "other crops" class was clearly visible in the dispersion of the class' temporal signature (Figure 8). This class included herbaceous and horticultural crops with both summer and winter cycles. There were no clear trends and the interquartile range was large throughout the period of study.
Fallow temporal signatures also had high interquartile ranges. For all the three polarizations, there were no clear trends, with different peaks along the period of study.

General Results
Overall Accuracy (OA) results showed differences among regions, polarization channels used as input, best fit metric and type of validation (cross-validation or external validation) (Figure 9). The best classification results were achieved in Region 3 in cross-validation, using Ens as the input and R 2 as the best fit metric, with OA higher than 85%. On the other hand, using only VH/VV as input and R 2 resulted in the poorest results, with an OA of around 50% in specific regions. In Figure 10, the classification map obtained with the best configuration (Ens scheme using R 2 as the best fit metric) is represented. validation output (Figure 9). In the following sections, the cross-validation results will be analyzed (see Supplementary Materials for the detailed external validation results).   validation output (Figure 9). In the following sections, the cross-validation results will be analyzed (see Supplementary Materials for the detailed external validation results).   External validation provided systematically lower OA values than cross-validation, with the exception of Region 3, where the results were very similar. Although the accuracies were lower in the external validation, it can be observed that the results followed a similar distribution to the cross-validation output (Figure 9). In the following sections, the cross-validation results will be analyzed (see Supplementary Materials for the detailed external validation results).

Input Features and Goodness-of-Fit Metrics
OA results showed that classifications based on single polarization channels did not reach satisfactory results (Figure 9). VH and VV channels achieved similar accuracies in each region individually (at best 80% OA), while classifications using the VH/VV ratio as the input obtained lower values (at best, 75% OA). On the other hand, considering all the available input features (Ens in Figure 9) yielded the best results for all the regions, reaching 87% of OA in Region 3.
Regarding the goodness-of-fit metric used for classification, R 2 obtained the best results for VH, VV and Ens, while RMSE was the best for VH/VV. Although R 2 achieved the best global results for the most favorable schemes (Ens), there were differences among crop classes. As is shown in Table 5, legumes, oats, rapeseed and rice achieved an improvement of the F1-score between 6% and 18% when RMSE was considered. On the other hand, alfalfa, fallow, grasslands and permanent crops obtained better results when R 2 was used. Asparagus, barley, corn, sunflowers, other crops and wheat did not present large differences between one metric and the other.

Results per Region
Regarding the classification results for the different regions, the OA values varied strongly. Focusing on the Ens approach with R 2 as the goodness-of-fit metric (Figure 9), Region 3 obtained the highest accuracies (OA = 87%). The accuracies in Region 2 were also good (i.e., 84%). Region 4 and 5 also had good OA results-around 80%. In Region 1 and Region 6, the OA results yielded 78% and 74%, respectively, whereas the worst OA results were achieved in Region 7 (64%). Table 6 shows the increment in crop accuracy (PA, UA and F1-sore) when the regional results were compared to Navarre's results. The regional accuracies were extracted from a confusion matrix built by adding the individual confusion matrices of each region (Supplementary Materials: Table S2). This matrix allowed for an easy comparison of whether the stratification of the province in agricultural regions improved the general classification results. PA, UA and F1-score results were higher for the stratified case, with the exception of PA-and F1-score of fallow, which decreased by 2% and 1%, respectively, UA of grasslands decreased by 3% and PA of rapeseed decreased by 1%. In total, OA increased from 72% to 77%. Table 6. Increment (∆) in crop accuracy (PA, UA and F1-sore) when the stratified regional results are compared to Navarre's results for Ens scheme (using R 2 as the best fit metric).

Results per Crop
In the following section, the results per crop are provided, reporting the Ens and R 2 classification scheme for Navarre (Table 7) (for the rest of schemes, see the Supplementary Materials: Table S2 and Figure S2). Interesting results related to polarization channels and fit metrics are also reported. Table 7. Confusion matrix of Ens scheme in Navarre (using R 2 as the best fit metric). Values are given as a percentage of the total number of fields classified. Barley is the crop that achieved the best classification results, with F1-score of 87% and balanced PA and UA results ( Table 6). The F1-score of wheat was 71%, with higher UA than PA (84-69%, respectively) ( Table 6). On the other hand, oats obtained poorer classification results, with an F1-score of 42%. Although PA was relatively high (70%), UA was low (30%), due to many wheat fields being incorrectly assigned to oats (Table 7). In general, the results obtained per region were similar (Supplementary Materials: Table S2 and Figure S2). Region 2 and Region 3 achieved the highest PA, UA and F1-score for the three cereal crops compared with other regions. In Region 1, although most barley fields were correctly classified, the F1-score was worse due to the incorrect assignment of grasslands fields to the barley class. The lowest classification accuracies of winter cereals occurred in Region 7, where other crops were also misclassified as barley and wheat, and many fallow fields were incorrectly classified as winter cereals.
Legumes obtained an F1-score of 47%, with a higher accuracy in PA than in UA (70% and 35%, respectively) ( Table 6). Incorrectly classified legumes fields were assigned mainly to the rapeseed class (Table 7), and in a lower proportion, to winter cereals, grasslands and fallow. On the other hand, some grasslands, winter cereals (barley and wheat), fallow and other crops fields were wrongly assigned to legumes causing low UA values (Tables 6 and 7). In the case of the different regions, the values of PA were also higher than UA for legumes (Supplementary Materials: Table S2 and Figure S2). The low values of UA were mainly due to incorrectly classified grasslands and in some regions (6 and 7), fallow fields. The exception was Region 3, where both PA and UA scores were high, achieving an F1-score of 89%.
Rapeseed fields reached high PA (99%), while UA was not as good (59%), leading to an F1-score of 74% (Table 6). The fields wrongly assigned to rapeseed belonged principally to legumes, other crops, barley and fallow classes (Table 7). Per region, UA values were higher than in Navarre (Supplementary Materials: Table S2 and Figure S2), reaching high values of F1-score, especially in Region 3 (97%). The exception was Region 6 (UA = 20%), due to the lower number of rapeseed fields and the incorrect assignment as barley, legumes and fallow fields.

Summer Crops (Corn, Rice and Sunflower)
The accuracy values of summer crops in Navarre were variable (Table 6). Rice fields were mostly correctly classified, reaching a PA of 99%, UA of 74%, and an F1-score of 84% (Table 6). Corn had an F1-score of 78%, with balanced PA and UA values (75% and 81%, respectively), and some confusion with asparagus and other crops fields. Wrongly assigned fields to corn were mainly the other crops class. Sunflower was the summer crop with the worst results (F1-score of 55%, PA of 87% and UA of 41%), with confusion with other crops and fallow fields, causing these low UA values (Table 7).
Rice, which is only present in Regions 6 and 7, achieved slightly higher accuracies when the classification per regions was considered, with F1-scores of 90% and 93% for Regions 6 and 7, respectively (Supplementary Materials: Figure S2). In the case of corn, the classification was highly accurate in Regions 3 and 5 (F1-scores of 96% and 93%). In Regions 6 and 7, where the majority of corn fields were found, the F1-scores were also good (around 83%). Yet, in Regions 1 and 2, the UA was much lower, mostly due to confusion with grasslands. In Region 4, UA values were not high (43%) due to the small number of corn fields. Sunflower repeated the patterns of Navarre in the different regions, with lower UA values compared to PA values, except in Region 3, where, again, both PA and UA were high (100% and 91%) (Supplementary Materials: Table S2 and Figure S2).
Rest of Crops (Alfalfa, Asparagus, Grasslands, Permanent Crops, Other Crops and Fallow) The classification of alfalfa fields achieved an F1-score of 48%, with higher PA (74%) than UA (35%) ( Table 6). PA's errors were mostly due to being wrongly assigned to legumes, while UA values were impacted by confusion with grasslands, permanent crops and fallow (Table 7). In Regions 6 and 7, where alfalfa is most frequent, the results were similar (Supplementary Materials: Table S2 and Figure S2).
Asparagus was one of the crops with the poorest classification results, with an F1-score = 25%, PA = 60% and UA = 16%. Many fields belonging to the class permanent crops were classified as asparagus and, in a smaller proportion, corn and other crops (Table 7). These incorrect assignments were even higher than the actual number of asparagus fields, which explained the low UA. Similar results were obtained per region (Supplementary Materials: Table S2 and Figure S2).
Grasslands was the largest class in the declarations dataset. The classification had higher UA values (92%) than PA (71%), achieving an F1-score of 80% (Table 6). Although PA was not low, wrongly classified grassland fields impacted the UA of many different crops due to their higher proportion ( Table 7). The main errors in PA were due to the confusion with other crops, legumes, alfalfa and permanent crops. Per region, similar results were obtained, except for Regions 6 and 7, where results were worse (F1-scores below 70%) (Supplementary Materials: Table S2 and Figure S2).
The class permanent crops obtained intermediate accuracies (PA = 65%, UA = 72% and F1-score = 68%). The main error in UA was related to incorrectly classified grasslands and fallow fields, while PA was affected by the confusion with asparagus class. Similar results were achieved per region (Supplementary Materials: Table S2 and Figure S2), with the exception being Region 3, where UA dropped to 29%.
Other crops was the class that was worst classified, with F1-score = 12%, PA = 13% and UA = 12% (Table 6). In all the regions, the results were also poor, and confusion occurred with many different classes, i.e., grasslands in northern regions, and permanent crops and fallow in southern regions.
Finally, fallow classification achieved better UA than PA (88% and 58%, respectively) and had an F1-score of 70%. The wrongly classified fallow fields were assigned to many different crop classes, but particularly to legumes, barley, other crops and grasslands. In contrast, in Region 1, the classification results were poor (F1-score = 2%), while in the rest of the regions, results were similar to those of the whole province.

Influence of Field Size
In most of the classification schemes and regions, the OA values obtained were higher for large fields in comparison with small ones. In some cases, the difference between small (<0.5 ha) and large fields (>1 ha) was higher than 14% (Table 8). Regarding the results per crop, it can also be observed that, in most cases, F1-score values were higher for large-and medium-sized fields, in contrast to smaller fields ( Figure 11). This was particularly true for fallow and rice classes, where the accuracies were considerably higher for large fields than for small ones. Remote Sens. 2019, 11, x FOR PEER REVIEW 22 of 29 Figure 11. F1-score, Producer's Accuracy (PA) and User's Accuracy (UA) of Ens scheme (using R 2 as the best fit metric) based on field size for external validation in Navarre.

Crop Classification Results with Regard to Its Temporal Signatures
This study implemented a supervised classification technique based on the temporal signature of crops to a CAP inspection case study on a region with strong agro-climatic diversity. The results obtained varied significantly from crop to crop, with the crops that exhibited any singularities in their time series being the Figure 11. F1-score, Producer's Accuracy (PA) and User's Accuracy (UA) of Ens scheme (using R 2 as the best fit metric) based on field size for external validation in Navarre.

Crop Classification Results with Regard to Its Temporal Signatures
This study implemented a supervised classification technique based on the temporal signature of crops to a CAP inspection case study on a region with strong agro-climatic diversity. The results obtained varied significantly from crop to crop, with the crops that exhibited any singularities in their time series being the best classified. Cereals, and in particular, wheat and barley, are the most common crops in the region, and showed quite a distinctive temporal signature with a strong decay in backscatter in the stem elongation phase, where surface scattering from the soil was progressively attenuated by growing cereal stems [36]. This decay was particularly strong in VV, due to the vertical arrangement of stems through the so-called differential attenuation mechanism [37,38]. This characteristic decay was followed by an increase in backscatter due to the structural changes experienced by the canopy during the heading stage [23,37,39,40]. At this moment, a strong and well-defined peak occurred in VH and VV barley time series, probably due to the bending of barley spikes at this phase [38,39], which did not occur in wheat and oats that maintained a vertical geometry. From ripening to senescence and harvest, cereal canopies dried-out and a greater penetration [41] and a higher influence of the soil surface [37] was observed. The unique backscatter pattern of barley allowed a successful identification with F1-scores of~85%. Wheat was also adequately identified (F1-score of~76%), although was sometimes confused with oats, which was the cereal crop with the poorest results (F1-score of~50%).
The architecture of legumes and rapeseed canopies is completely different from cereals, with heterogeneous shrub-like structures that produce volume backscatter [39]. This is the reason why VH and VV time series were so different from cereals, although the period of cultivation was the same. Unlike cereals, rapeseed and legumes' curves in VH started increasing right after the germination of the crop, due to the volume scattering produced by plants [38,[42][43][44]. This behavior changed suddenly for rapeseed at the period of pod formation after flowering, producing a strong peak in VH (and to a lesser extent in VV). In VV, just before this peak, a small backscatter decrease was observed coinciding with the start of flowering, as also reported by Wiseman et al. [45] and Veloso et al. [23]. Ripening and senescence produced a strong decrease in VH backscatter (and to a lesser extent in VV) due to canopy drying and a reduction of the volume scattering component. This decrease occurred for both rapeseed and legumes, although it was much greater for the former. Accordingly, classification results were successful for rapeseed (F1-score of~75%) but worse for legumes (F1-score of~50%), whose temporal signature was confounded with other crop classes.
Summer crops also achieved good classification results. Corn, with an F1-score of~75%, was characterized by a rather insensitive VV curve, and on the contrary, a clear increase in VH due to volume scattering during the vegetative growth phase (250-300 days) until the plants reached their maximum height [46], after which it remained rather constant. Rice achieved an F1-score of~85%, mainly due to its characteristic time signature due to flooding at the time of sowing, which led to very low backscatter due to specular reflection [47]. After rice emergence, VV increased rapidly due to double-bounce scattering, followed by a subsequent increase in VH polarization caused by volume scattering from the rice canopy that went on until the end of the season. Sunflower was the summer crop with the poorest F1-score (~60%), but this was not expected due to its rather unique temporal signatures exhibiting both VH and VV sensitivity to crop growth in the stem elongation phase where most vegetative growth occurred (250-400 days). The structure of sunflower, with broad leaves and thick stems with large open spaces between plants, caused both volume and double-bounce backscatter [48]. Sunflower fields were mostly correctly classified (PA > 85%), but a significant number of grasslands, fallow and other crops fields were incorrectly assigned to sunflower, leading to low UA values.
Asparagus, although having a biannual cycle, had a backscatter behavior similar to summer crops, with an increase in VH in summer (250-400 days) coinciding with the period of vegetative growth. Bargiel et al. [49] observed an increase in VV backscatter at X-band for this crop during summer, but our data at C-band showed a rather constant VV response. Classification results for asparagus were poor (F1-score of~25%), mainly due to the incorrect assignment of corn and grasslands fields to asparagus (UA~20%).
Some categories have a large variability in terms of management or species compositions. For instance, grasslands, alfalfa and fallow are classes where agricultural practices can vary significantly. Grasslands can range from an intensified management (in terms of soil preparation, sowing, fertilizing and mowing) to extensive rangelands that behave more like permanent covers and are only grazed in summer season. Fallow fields can also vary significantly depending on the management applied to spontaneous vegetation, i.e., it might be left to grow or it might be removed either chemically or mechanically. Alfalfa is a forage crop that is mowed several times throughout the season. However, the exact mowing dates might vary significantly from field to field, thus, the mean temporal signature might not be informative for this crop, showing a high variability during the season (the same applies for grasslands and fallow). Surprisingly, classification results were not as poor as expected given the rather unspecific temporal signatures of these crops. The three categories achieved quite good PA values (~70% for alfalfa and grasslands and~60% for fallow), and UA values were high for grasslands and fallow but low for alfalfa. In fact, a significant amount of grasslands and fallow fields were incorrectly assigned to alfalfa causing its UA to fall. Altogether, such high variability categories should be better addressed by designing field-specific strategies (not based on median class signatures), such as those based on the multitemporal interferometric coherence [50].
Permanent crops was another heterogeneous class including vineyards, olives and other fruit trees, yet its classification was not too bad (F1-score~70%). In this case, soil backscatter was predominant due to the open spaces between trees. Thus, during the period of study, the backscatter values presented peaks without a clear increasing or decreasing pattern, probably responding to soil moisture dynamics. Li et al. [51] also found a stable VH signal throughout the year in almond and walnut trees. During spring and summer, backscatter variations in VH and VV were smoothed, probably due to slower soil moisture dynamics at this time of the year, and because the growth of leaves during this period attenuated soil backscatter.
However, the worst results were obtained by the other crops class, a very heterogeneous category composed of a range of different minor crops, with both summer and winter cycles. Again, in this case, temporal signatures did not show any clear pattern, and in turn, its variability was very large, particularly in summer months. As a result, F1-scores dropped down to~10%, and other crops fields were confounded both with summer (i.e., corn and sunflower) and winter crops (i.e., wheat and legumes). A redefinition of this class (e.g., splitting it into two more specific categories), could lead to an improved classification of this class, but also to higher UA values for other classes that were mistaken with it.

General Results
Previous research recommended multi-temporal optical data as the primary source for crop classification [52], reporting overall accuracies~12% higher when optical data were used instead of SAR data. Indeed, crop classification approaches based on optical data provided typically accuracies above 75%, as long as imagery acquired in key phenological stages were available [3,6,15,53]. Yet, optical and SAR data provide complementary information that when combined might result in enhanced classification results [54]. However, when cloud conditions limit the viability of optical data, SAR observations still provide useful information [4,26]. The classification results obtained here were considered successful given the number of crop classes considered, the large number of fields classified and the agro-climatic diversity of the territory.
In this study, the configuration using the three S-1 features as the input (Ens) obtained the best results. For most crops, VH/VV responded positively to crop growth from emergence to senescence. Previous studies already identified VH/VV as a notable indicator of crop development correlated with fresh biomass [23] and plant height [55]. Yet, this information was very similar for crops of the same season and did not enable their separation. Therefore, VH/VV might differentiate when a field is vegetated or not, and even the start and end of the growing season, but crops with different morphological structures might produce very similar VH/VV temporal signatures. On the contrary, VH and VV polarizations provided diverse time series depending on different backscattering mechanisms and phenological events of the crops, and thus were more useful for crop classification. In each phenological stage, the structural characteristics of vegetation elements (size, shape, density, orientation) can vary [51], but also their water content and dielectric properties [41], in particular in the senescence phase [23,26]. VH and VV were useful to identify cereal heading [56] and rapeseed pod filling. Regarding backscattering mechanisms, VH was sensitive to volume scattering (e.g., in legumes, rapeseed, corn and sunflower) [26,51], and can be used as a vegetative growth indicator for these crops [57]. On the other hand, VV polarization was sensitive to surface scattering and its attenuation by cereals in the stem elongation phase [17,37], and also to double-bounce effects on sunflower [48] and rice [58]. Rice flooding was also easily detected in both VH and VV [58]. During the rest of the year, the main drivers of backscatter dynamics were meteorological events, soil preparation practices and harvesting procedures [26].

Influence of Field Size
Classification results were clearly influenced by the size of the fields, with fields >1 ha showing an OA improvement of~14% when compared to small fields (<0.5 ha). Similar results were already reported in the recent literature [6], showing that the operational applicability of crop-type mapping based on imagery with a spatial resolution of~20 m in regions with very fragmented landscapes requires further advances to address the small field issue.

Regional Stratification
In general, our results improved when the classification algorithm was trained and applied separately to the different agricultural regions, with increments of~5% in the OA and stronger improvements (10%) in the F1-score for particular crops such as alfalfa, legumes and asparagus. In our case, small and homogenous regions with a rather reduced crop legend and a small proportion of "difficult" categories (other crops, fallow, etc.) obtained the best results (i.e., Region 3). In turn, regions with a larger crop diversity, a higher proportion of fallow fields and more variability in crop management techniques obtained poorer results. In particular, southern regions (Region 6 and Region 7), due to their dryer climate, had a mix of rainfed and irrigated agriculture leading to a larger crop variety (including a higher proportion of other crops and fallow fields), more variable temporal signatures and poorer classification results. This occurred for categories such as winter cereals or legumes, where both long and short cycle varieties coexisted, some of them were intended for forage and some for human consumption. Corn was another category that improved after regional stratification, due to its diversity, with a shorter cycle rainfed corn for fodder production in Region 1 and a longer cycle irrigated corn for grain production in the rest of the province. Altogether, the results of this study showed that in geographic areas with great agro-climatic diversity, crop temporal signatures might change significantly; therefore, operational classification approaches should be applied to stratified regions with less variability.

Methodological Aspects
Some other methodological details were found to have an influence on the results obtained. First, the use of either R 2 or RMSE as the metric for finding the best matching category did not provide consistent results in all cases. In general, the best classification strategies were those using R 2 . However, we found that RMSE provided the best results when VH/VV was the only input feature used, whereas R 2 was better for VH, VV and Ens. The different shape of their temporal signatures (with smoother dynamics for VH/VV) could be behind these differences. Analyzing the results per crop, it was found that oats, rapeseed, legumes and rice obtained significantly better results with RMSE (6-18% improvement in F1-score) when the Ens configuration was used. In contrast, alfalfa, permanent crops, grasslands and fallow had higher F1-scores with R 2 , and for some cases, differences were minor (e.g., asparagus, barley, corn, sunflower or wheat). R 2 might be more insensitive to class variability, with fields with slightly delayed phenology still having high R 2 values, as long as they show a similar trend. Furthermore, RMSE might be more sensitive to the actual backscatter values and to abrupt changes in the time series due to particular phenological events or soil moisture variations.

Future Research
In this study, the duration of the time series was adjusted to the seasonality of the crops to be compared (winter, summer and permanent crops). This aspect was crucial due to the negative influence of the periods without vegetation cover on the variability of the time series. During autumn and winter months, differences in soil preparation dates and techniques (particularly important in furrowed crops such as asparagus) and soil moisture dynamics responding to wetting and drying events [23] are the main backscatter drivers because vegetation is still short. Furthermore, in the case of summer crops, during this period, spontaneous vegetation might show false dynamics that confound crop recognition. After harvest, some fields might exhibit regrowth of natural vegetation, while some others might be quickly tilled and prepared for the next rotation, thus, this period also introduces uncertainty into the classification. Future improvements of this classification approach might try to further adjust the duration of the growth season, attending to the periods were largest differences were observed between crop classes. Additionally, more attention should be paid in the future to the establishment of the crop legend to classify, since mixed classes such as legumes, other crops or grasslands have shown to be too heterogeneous, resulting not only in their poor identification but also in the confusion with other categories, affecting the overall results of the classification. Finally, further research is required to compare the results obtained using radar backscatter data versus optical reflectance on crop classification strategies like this based on image time series data. Furthermore, a comparison with state-of-the-art machine learning classification algorithms (e.g., Random Forests or Support Vector Machines) will provide interesting information for the selection of the best classification approach in each case study.

Conclusions
This paper presented a supervised crop classification technique based on the temporal signatures extracted from Sentinel-1 (VH, VV and VH/VV) time series, and applied it to a large dataset, framed in the CAP inspection process in a region with high agro-climatic diversity.
The results showed that crops whose temporal signatures depicted singularities along the growing season achieved accurate classification results (F1-score~75%), e.g., winter cereals or rice. VH and VV temporal signatures proved to be sensitive to various phenological events where the structural characteristics or the water content of the canopy varied. In contrast, VH/VV were sensitive to vegetation growth, allowing the determination of whether a field was vegetated or not but providing poor information for crop identification. The combination of VH, VV and VH/VV time series as input features provided accurate results (OA > 70%).
A detailed analysis of the results indicated that field size strongly influenced the results with large fields (>1 ha) achieving~14% higher accuracy than small ones (<0.5 ha) for the same class. Agro-climatic diversity was also crucial, with results improving when classifications were stratified for local agricultural regions, in particular, for legumes, alfalfa or asparagus. Additionally, crop diversity, variability in terms of management techniques and a high proportion of fallow fields negatively affected the obtained results per region. The definition of the crop legend should avoid heterogeneous crop classes (e.g., other crops or legumes) that were found to be difficult to identify, and affected the UA of some classes with good PA. Whenever possible, these classes should be divided into more homogeneous ones.
Altogether, the obtained results suggest that similar approaches based on Sentinel-1 time series could be implemented operationally in regions with frequent cloud cover, in the framework of CAP inspection, or with any other purposes like crop acreage estimation or ensuring food security. Further studies comparing these results with those obtained using optical image time series, or using state-of-the-art machine learning classification algorithms might confirm this conclusion.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2072-4292/12/2/278/s1, Table S1: Ground truth data. Number of agricultural fields and their area in the declarations and inspections (Regions 1-7). Figure S1. Median temporal signatures of crops for VH, VV and VH/VV (Regions 1-7), Table  S2: Confusion matrix of g and R 2 classification for cross-validation (Regions 1-7), Figure S2: F1-score, PA and UA for the different polarization channels in Cross-validation and External validation (Navarre and Regions 1-7), Figure S3: Overall Accuracy of External validation based on field size: <0.5 ha (left), 0.5-1 ha (middle) and >1 ha (right), Figure S4: F1-score, PA and UA of g and R 2 classification based on field size for external validation (Regions 1-7). Funding: This work was supported by the Spanish Ministry of Economy and Competitiveness and the European Regional Development Fund (MINECO/FEDER-UE) through a project (CGL2016-75217-R) and a grant (BES-2017-080560). It was also partly founded by project PyrenEOS EFA 048/15, that has been 65% cofinanced by the European Regional Development Fund (ERDF) through the Interreg V-A Spain-France-Andorra programme (POCTEFA 2014-2020).