4.2. Quantitative Classification Evaluation
The total test area covers around 254 thousand hectares including the background class. Of this area, background, spring barley, and winter wheat cover 84,917, 39,712, and 35,219 hectares, respectively (Table 1). Together, these three classes account for around 62.76% of the total test area. In addition, the areas covered by all 15 classes except green grain spring barley were larger than 1000 hectares. The area covered in this study is, therefore, more than 96 times larger than that used in [21], which demonstrates the generalizability of our approach. Figure 6 shows a normalized confusion matrix of the predictions on the test set. The average pixel-based accuracy is 86%, demonstrating the success of the proposed method in recognizing 15 different classes using multi-temporal C-band SAR Sentinel 1 images.
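As a point of reference, the row-normalized confusion matrix and the average pixel-based accuracy can be computed from flattened per-pixel label maps along the lines of the minimal sketch below; the array names, the 15-class integer encoding, and the use of scikit-learn are illustrative assumptions rather than a description of our actual pipeline.

```python
# Minimal sketch: row-normalized confusion matrix and pixel-based accuracy.
# `y_true` and `y_pred` are assumed to be flattened per-pixel label maps
# with integer classes 0..14 (14 crop types + background); these names and
# the use of scikit-learn are illustrative, not the actual pipeline.
import numpy as np
from sklearn.metrics import confusion_matrix

NUM_CLASSES = 15

def normalized_confusion(y_true: np.ndarray, y_pred: np.ndarray):
    cm = confusion_matrix(y_true, y_pred, labels=np.arange(NUM_CLASSES))
    # Normalize each row so that entry (i, j) is the fraction of reference
    # pixels of class i that were predicted as class j (per-class recall).
    cm_norm = cm / np.maximum(cm.sum(axis=1, keepdims=True), 1)
    pixel_accuracy = np.trace(cm) / cm.sum()
    return cm_norm, pixel_accuracy
```

The diagonal of the normalized matrix then corresponds to the per-class accuracies discussed below, and the trace-based ratio to the average pixel-based accuracy.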
The highest accuracies were achieved for winter rapeseed, winter barley, winter wheat, spring barley, and sugar beet, with 95%, 94%, 93%, 90%, and 90%, respectively. The network, however, had trouble classifying green grain spring barley and permanent grass correctly. Green grain spring barley was mostly confused with spring barley, which is expected since the only difference between the two is the time of harvest; the two classes therefore differ only in the last images of the season. Permanent grass was mostly confused with the background class, presumably because land areas not registered as farmland may also be covered by permanent grass. The background class thus also contains permanent grass, which confuses the classifier.
One of the important and interesting classes is the background class, which contains all regions that are not registered as farmland, such as forest, buildings, roads, lakes, and sea. The pixel-based accuracy for the background class is 88%. One reason for the high accuracy of this class, despite its high intra-class variability, could be the difference in temporal development of the background compared to the other classes: while the pixel intensities of the different crops undergo substantial changes during the four-month main growth period, no such changes were found for background-class pixels in the same period of time. The positive impact of temporal information can be seen in Figure 7, where the performance of our approach in July and August was significantly better than that in May.
The better performance at the end of the growing season is probably due to the additional temporal information. From the error column in Figure 7, it appears that the Sentinel 1 data provided limited information to the model before June, which is in line with the phenology of the various crop types being approximately similar in the early growing season. The information available from Sentinel 1 in the early months is therefore not sufficient for the model to separate all the different classes from each other with high accuracy. However, winter wheat, spring barley, and winter rapeseed can already be distinguished in May. Furthermore, by adding the multi-temporal data from June, additional crops such as winter barley, maize, and sugar beet were predicted with more than 86% pixel-based accuracy, in contrast to the remaining classes (Table 2 and Figure 8). In Table 2, it is shown that the IoU values for winter barley and potato were 0.42 and 0.09, respectively, in May. These figures improved significantly in June to 0.73 and 0.68, respectively. The phenological profiles of winter barley and winter wheat were close in May, but in the following month the phenological changes of these two crops started to differ. Thus, the model could separate winter barley from winter wheat well when the data from June were included (Figure 8 and Figure 9). Another interesting class is spring oats, as the accuracy for this crop was only 1% by the end of May but reached 71% in July (Table 2). Overall, the accuracy and IoU indices for all classes did not change significantly from the end of July to the end of August (Figure 8 and Figure 9). Therefore, by using only the multi-temporal data from May, June, and July, the proposed model could recognize 15 different classes with the accuracies presented in this study, and the multi-temporal images from August could be omitted from the training procedure.
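The monthly comparison above can be reproduced by a cumulative evaluation of the kind sketched below, in which predictions are recomputed after truncating the Sentinel 1 time series at the end of each month; `model`, `image_stack`, `acquisition_dates`, and `evaluate` are hypothetical placeholders, not our actual code.

```python
# Sketch: re-evaluating the classifier with the Sentinel 1 time series
# truncated at the end of each month (May through August, as in Figure 7).
# `model`, `image_stack`, `acquisition_dates`, and `evaluate` are
# hypothetical placeholders.
def monthly_scores(model, image_stack, acquisition_dates, y_true, evaluate):
    """image_stack: array of shape (T, H, W, C); acquisition_dates: list of
    T datetime.date objects; evaluate: metric such as accuracy or IoU."""
    scores = {}
    for month in (5, 6, 7, 8):  # May through August
        # Keep only the acquisitions up to and including this month.
        keep = [i for i, d in enumerate(acquisition_dates) if d.month <= month]
        y_pred = model.predict(image_stack[keep])
        scores[month] = evaluate(y_true, y_pred)
    return scores
```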
Table 2 shows the IoU for all classes. This index is extensively used in semantic segmentation and indicates the degree of overlap between the reference image and the output of the network; thus, the IoU alongside the pixel-based accuracy gives a clear picture of the performance of the proposed network. The mean IoU was found to be 0.64; however, as can be seen in Figure 10, the IoU varied between classes depending on the complexity or the degree of dissimilarity within each class. Winter rapeseed had the highest IoU at 0.90; green grain spring barley had the lowest at 0.24. Only a few fields of green grain spring barley were predicted correctly, making this crop the most complicated case in the current research (Figure 10). This could be because green grain spring barley has a growth pattern similar to spring barley; furthermore, this crop was harvested only two weeks before the spring barley crop.
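For completeness, the IoU used here is the standard per-class intersection over union, which can be computed from flattened label maps as in the following sketch; the array names and class encoding are illustrative assumptions.

```python
# Sketch: standard per-class IoU (intersection over union) from flattened
# per-pixel label maps; names and the 0..14 class encoding are illustrative.
import numpy as np

def per_class_iou(y_true: np.ndarray, y_pred: np.ndarray, num_classes: int = 15):
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        intersection = np.logical_and(y_true == c, y_pred == c).sum()
        union = np.logical_or(y_true == c, y_pred == c).sum()
        if union > 0:
            ious[c] = intersection / union
    return ious  # averaging over the classes gives the mean IoU reported above
```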
Winter triticale is a hybrid of winter wheat and winter rye, and this is reflected in the confusion matrix: the largest numbers of misclassified winter triticale pixels fall in these two classes. There are, therefore, reasonable causes for the network's rather poor ability to separate winter triticale from the other crops.
From Figure 11 and Figure 12, it appears that the confidence is lowest in the boundary areas of the different fields, and there are fewer misclassified pixels inside the fields compared with the borders. The main errors of our approach were thus located in the boundary regions of the fields (Figure 11 and Figure 12), and only a few whole fields were classified incorrectly, which could be due to within-field dissimilarity. Similar results have been reported in the literature [33]. The network's errors at the field boundaries are to be expected, as the reference images might not be fully aligned with the fields. Moreover, the spatial resolution of 10 m/px also introduces noise from shelter-belts or neighboring fields in the boundary areas. In addition, different types of errors appeared during the preprocessing steps of the Sentinel 1 data, so the field borders cannot be expected to be accurately extracted from the reference images.
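One possible way to quantify this boundary effect, assuming per-field instance masks are available from the reference data, is to erode each field mask and compare the accuracy of field interiors with that of the boundary rings; the sketch below illustrates this idea, with the erosion radius and all names being illustrative choices rather than part of the evaluation reported above.

```python
# Sketch: separating boundary errors from interior errors by eroding each
# reference field mask. The 3-pixel erosion (about 30 m at 10 m/px) and all
# names are illustrative assumptions, not part of the reported evaluation.
import numpy as np
from scipy import ndimage

def boundary_interior_accuracy(y_true, y_pred, field_ids, erosion_px=3):
    """field_ids: integer map of field instances (0 = background)."""
    interior = np.zeros(field_ids.shape, dtype=bool)
    for fid in np.unique(field_ids):
        if fid == 0:
            continue
        mask = field_ids == fid
        interior |= ndimage.binary_erosion(mask, iterations=erosion_px)
    boundary = (field_ids > 0) & ~interior
    correct = y_true == y_pred
    # Fraction of correctly classified pixels inside fields vs. at borders.
    return correct[interior].mean(), correct[boundary].mean()
```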
One of the main advantages of the proposed network is the identification of fields that are likely annotated incorrectly in the reference images. As an example, in Figure 13, three different fields in the reference image were annotated as winter rapeseed, spring oats, and permanent grass, while our approach classified them as maize, spring barley, and background, respectively, with high confidence and sharp field boundaries. Thus, the performance of the network may improve further if all the reference data are annotated correctly (Figure 13). Overall, by evaluating the results obtained from our approach, we conclude that this method is able to distinguish 15 different classes (14 crop types and background) from each other with 86% pixel-based accuracy. In future work, the goal is to utilize both Sentinel 1 and Sentinel 2 images to study the capability of combining SAR and multi-spectral data to recognize the 14 crop types and background using our approach.