Fusion of Sentinel-1 with Official Topographic and Cadastral Geodata for Crop-Type Enriched LULC Mapping Using FOSS and Open Data

Abstract: Accurate crop-type maps are urgently needed as input data for various applications, leading to improved planning and more sustainable use of resources. Satellite remote sensing is the optimal tool to provide such data. Images from Synthetic Aperture Radar (SAR) satellite sensors are preferred, as they can be acquired regardless of cloud coverage. However, the processing of SAR data is more complicated, and the sensors still have development potential. Given this complexity, current studies should aim to be reproducible, open, and built upon free and open-source software (FOSS). The data can then be reused to develop and validate new algorithms or to improve the ones already in use. This paper presents a case study of crop classification from microwave remote sensing, relying on open data and open software only. We used 70 multitemporal microwave remote sensing images from the Sentinel-1 satellite. A high-resolution, high-precision digital elevation model (DEM) assisted the preprocessing. The multi-data approach (MDA) was used as a framework, enabling us to demonstrate the benefits of including external cadastral data. It was used to identify the agricultural area prior to the classification and to create land use/land cover (LULC) maps that also include the annually changing crop types usually missing in official geodata. All the software used in this study is open-source, such as the Sentinel Application Platform (SNAP), Orfeo Toolbox, R, and QGIS. The produced geodata, all input data, and several intermediate datasets are openly shared in a research database. Validation with an independent validation dataset showed a high overall accuracy of 96.7% with differentiation into 11 different crop classes.


Introduction
Global food insecurity is on the rise again [1]. Current and future challenges arise from a growing world population with increasing nutritional demands under climate change conditions [2]. Therefore, ref. [3] demands higher crop yields from agricultural production. To achieve this increase in efficiency, decision makers in this domain can use information from agricultural monitoring systems based on satellite remote sensing data [4].
However, ref. [5] identified crop-type maps as one missing yet essential part of the current global systems. In addition, spatial crop-type data are critical for modeling matter fluxes in soil-vegetation-atmosphere systems [6]. While on a local scale, crop-type information is needed and available for agricultural management decisions (e.g., [7]), on regional, national, or continental scales, such crop-type data are missing [8], especially on an annual basis. This data gap lowers the capabilities of agroecosystem models [9] and results in less information about the current state of the agricultural production for decision makers.
Figure 1. LULC maps including crop types were produced with the MDA [14] from optical satellite data and external data and have been available via the TR32 project database (TR32DB) since 2008 [31]. Screenshot from the publicly available WebGIS of the TR32 project database (TR32DB).
The present study used open remote sensing data from the SAR satellite Sentinel-1 and external data from open.NRW to perform a LULC crop-type classification. We designed the whole workflow using FOSS to follow the demands of the Transparency and Openness Promotion (TOP) guidelines. The data models used, all input data, and the output data, including the labeled reference dataset from our mapping campaign, are shared openly in a scientific data repository, the TR32DB. This combination allows others to access, use, change, evaluate, reproduce, and even refine or improve the present study's outcomes.

Rur Catchment
This study was performed within the collaborative research project TR32. The project has a defined study area situated in Germany at the border with Belgium and the Netherlands (compare Figure 1). For the present study, only those parts that lie within the German borders were considered. The extent of the area is about 2500 km². The area is characterized by fertile loess soils and a humid, temperate climate. It is intensively used for agricultural production. Ref. [14] describe the study area in detail.

Sentinel-1 Open SAR Data
The remote sensing community has seen the positive effects of open data: the opening of the optical Landsat archive by the United States Geological Survey (USGS) in 2008 [32] had a positive impact on how satellite images are used for scientific purposes [33]. Consequently, the prominent statement from [34] was to "make earth observations open access." The European Space Agency (ESA) followed the USGS example by distributing all satellite observations of its current satellite program, Copernicus Sentinel, as open data [35]. Hence, the Sentinel-1 radar satellite, which was used in the present study, is the first operational radar satellite with an open data policy.
The two Sentinel-1 SAR satellites work in a constellation to provide a repeat cycle of six days for the same imaging properties. The revisit time for different image properties is shorter and varies depending on the geographic location. Over land, the satellites monitor continuously with a spatial resolution of 5 m × 20 m [36].
For this study, we acquired 70 Sentinel-1 images covering the 2017 growing period between January and August. As can be seen in Table 1, the images form two time series from the relative orbits 88 and 37. Only images covering the entire AOI of approx. 2500 km² were considered. Table 2 shows the individual acquisition dates. Notably, the two chosen relative orbits from the two satellites offer at least one image acquisition per week. Additional images that only partly cover the AOI would have been available.
The Sentinel-1 SAR images were downloaded from ESA's Scihub in preprocessed ground range detected (GRD) form. The advantages of this server-side preprocessing are the smaller download sizes and reduced speckle. The disadvantages are a decreased spatial resolution and the loss of the phase information, which is used for SAR interferometry and polarimetry [36]. As the Sentinel-1 images are provided with high geometric accuracy [37], a multitemporal image classification is possible without further coregistration. All used Sentinel-1 scenes can be downloaded from the TR32DB [38][39][40][41]. Table 2. Sentinel-1A (S1A) and Sentinel-1B (S1B) acquisitions of the study period; each acquisition covers the whole AOI. As can be seen, there is at least one acquisition for each week.

Crop Distribution Mapping of 2017
Over 1200 agricultural fields were visited and mapped in a ground survey campaign [42]. After transferring the mapping results to the geographical information system (GIS), the areas were checked for plausibility using the Sentinel-1 remote sensing datasets described above. To exclude the field edges from the analysis, an inner buffer of 20 m was applied to all mapped fields. Furthermore, only the fields within the AOI were used. Detailed information on the area statistics of the final 775 fields used for the present study can be found in Table 3. In addition to the typical crops of the region, such as maize, sugar beet, rapeseed, potato, wheat, and pasture, we found 19 pea and 8 carrot fields. Consequently, we additionally included those crops in our classification scheme.
The ground data were divided into independent training and validation fields. To split the pixels as well as the number of fields equally into training and validation, the fields were first sorted by crop type and field size. Then they were alternately assigned to either validation or training, starting by assigning the largest field to validation. As can be seen in Table 3, this procedure results in a homogeneous composition of training and validation with only slightly higher area statistics for validation. All data from the ground campaign [42] and the preprocessed independent training and validation datasets [43] are distributed under an open data policy via the TR32DB.
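The split procedure described above can be sketched as follows. This is an illustrative sketch, not the study's actual script; the record structure and the keys "crop" and "area_ha" are our assumptions:

```python
def split_fields(fields):
    """Alternately assign fields to validation and training.

    `fields` is a list of dicts with the illustrative keys "crop" and
    "area_ha"; the real campaign data uses its own attribute names.
    """
    # sort by crop type, then by decreasing field size
    ordered = sorted(fields, key=lambda f: (f["crop"], -f["area_ha"]))
    validation, training = [], []
    for i, field in enumerate(ordered):
        # the largest field goes to validation first, then alternate
        (validation if i % 2 == 0 else training).append(field)
    return training, validation
```

Because the fields are sorted before the alternating assignment, both halves end up with a near-identical crop composition and total area, with validation receiving the slightly larger fields.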

Authoritative Official Data from Open.NRW
For the preprocessing of remote sensing data, and SAR data in particular, the use of a digital elevation model (DEM) is advised [12]. In this study, we used the high-resolution, high-precision, openly available elevation data from open.NRW. The DEM is produced from LIDAR data with a point density of at least 4 points per m² and is updated every six years. The final DEM, with a spatial resolution of 1 m, has an absolute height error of less than 40 cm in most areas [44]. The newest version of the DEM can be found online [44], and a preprocessed version of the DEM over the AOI can be acquired via the TR32DB [45]. For compatibility with the radar processing software Sentinel Application Platform (SNAP), the DEM was projected to WGS84 and the spatial resolution was reduced to 5 m [46].
For the delineation of the arable land, we exploited the real-estate register Amtliches Liegenschaftskataster-Informationssystem (ALKIS), which is freely available for the state of North Rhine-Westphalia (NRW) from Open.NRW [47]. Among other information, ALKIS contains the usage of each of the 9 million property parcels in NRW. To identify the area of annually changing crops, the agricultural parcels with the attribute "arable land" were selected. Based on this selection, a crop/non-crop mask was calculated [48].
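Conceptually, the mask construction reduces to an attribute selection followed by a per-pixel test. The sketch below assumes a raster in which every pixel already carries the usage code of its parcel; the code values and names are hypothetical, not actual ALKIS keys:

```python
import numpy as np

# hypothetical usage codes; the real ALKIS register uses its own keys
ARABLE_LAND = 1
PASTURE = 2
BUILT_UP = 3

def crop_mask(usage_raster, crop_codes=(ARABLE_LAND,)):
    """Return a boolean crop/non-crop mask from a parcel-usage raster.

    Pixels whose usage code is in `crop_codes` are marked True (crop).
    """
    return np.isin(usage_raster, crop_codes)
```

In practice the selection is done on the vector parcels first (e.g., an attribute query in QGIS) and the result is rasterized to the classification grid.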

Preprocessing of the Sentinel-1 Radar Data Using the SNAP Toolbox
The preprocessed GRD images were individually processed using the SNAP Toolbox [49]. The following tools were executed on each acquisition:
1. As a first step, a subset of the images was calculated by cropping the images to the extent of the AOI.
2. To enhance the geometric accuracy, the precise orbit files were auto-downloaded from the ESA server and applied to the images. The precise orbit files are calculated within two weeks after the image acquisition and significantly enhance the geometric accuracy of the Sentinel-1 images.
3. Next, the images were calibrated to beta0, which is the measured radar brightness [50] and a prerequisite for the next step.
4. The highly accurate DEM from Open.NRW was used to perform a Radiometric Terrain Correction to gamma0. Thereby, based on the DEM, the terrain-induced radiometric effects are eliminated, and the signal is normalized for the local illuminated area [50].
5. All SAR images exhibit a salt-and-pepper-like noise [51]. A Gamma Map Speckle Filter with a 3 × 3 moving window was applied to reduce it.
6. To project the images from slant range to ground range, a Range Doppler Terrain Correction was performed using the DEM from Open.NRW [51]. Notably, a higher accuracy of the DEM translates into a higher horizontal accuracy of the projected image. The resampling of the preprocessed DEM to the image system was performed using bicubic interpolation. The calculation of the new image pixels in the final grid was done using nearest-neighbour resampling to avoid unnecessary mixing with neighbouring pixels. The final pixel spacing was set to 10 m, and the reference system is UTM 32N with WGS 84 as the reference ellipsoid.
7. For better data handling, the raster values were converted from the linear scale to the decibel (dB) scale backscatter coefficient.
8. To reduce the disc space used for the images and to accelerate the classification, the pixel depth was reduced to unsigned integer with a linear scaling using the slope and intercept derived from the histogram.
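Steps 7 and 8 can be illustrated with a small numpy sketch. The dB conversion is standard; the scaling bounds `lo` and `hi` below are placeholder values, whereas SNAP derives the slope and intercept from the actual image histogram:

```python
import numpy as np

def to_db(linear):
    """Step 7: convert linear backscatter to the decibel (dB) scale."""
    return 10.0 * np.log10(np.maximum(linear, 1e-10))  # guard against log(0)

def scale_to_uint16(db, lo=-30.0, hi=5.0):
    """Step 8: linear slope/intercept scaling to an unsigned integer.

    `lo` and `hi` stand in for the histogram-derived bounds; values are
    clipped to the target range before quantization.
    """
    slope = 65535.0 / (hi - lo)
    intercept = -lo * slope
    scaled = np.clip(db * slope + intercept, 0, 65535)
    return np.rint(scaled).astype(np.uint16)
```

The quantization trades radiometric precision for a halved file size compared to 32-bit floats, which is acceptable for classification input.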
The graph to apply those steps in the SNAP software [52] and the final stacked image composite [53] can be downloaded via the TR32DB.

Supervised Random Forest Classification
The 70 individual Sentinel-1 images were stacked, and a supervised pixel-based classification was performed using the independent training data from the mapping campaign. The Random Forest (RF) algorithm was used as the classifier, as it had already proved beneficial in other SAR-based crop classification scenarios [15,20,21]. The advantages of the RF classifier are its capability to handle high-dimensional data and the fact that it does not require normally distributed data. While there are more advanced algorithms, such as the one developed by [22], previous studies have found the RF classifier to be highly accurate [54].
In the implementation of the RF classifier, 2000 pixel samples were randomly selected per class from all training fields. Next, those samples were randomly split for training and validating each tree: two-thirds of the samples were used for training and one-third for validation. The tuning parameters of the RF classifier were left unchanged at the defaults of the R package. This means that 500 trees with an unlimited node size were built, and the number of variables tried at each split was set to the number of classes (eleven).
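The balanced per-class sampling step can be sketched in numpy; the function name and seed are ours, and the study's R script draws from the training fields in its own way:

```python
import numpy as np

def sample_per_class(pixels, labels, n_per_class=2000, seed=42):
    """Draw a fixed number of pixel samples per class without replacement.

    pixels: (n, bands) array of stacked backscatter values
    labels: (n,) array of crop-class ids from the training fields
    """
    rng = np.random.default_rng(seed)
    chosen = []
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # fall back to all available pixels if a class has fewer than requested
        take = min(n_per_class, members.size)
        chosen.append(rng.choice(members, size=take, replace=False))
    idx = np.concatenate(chosen)
    return pixels[idx], labels[idx]
```

Sampling the same number of pixels per class prevents large fields of common crops such as wheat from dominating the training of the forest.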
Validation of the resulting classifications was conducted using the fields from the mapping campaign that had not been used for training the classifier. The resulting error matrix is the basis for the class-specific accuracy measures: user's accuracy, producer's accuracy [55], and F1-score [22]. For assessing the general accuracy of the classification, the overall accuracy [55] was calculated.
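These measures follow directly from the error matrix. The sketch below assumes rows hold the classified map and columns the reference (validation) labels; swap the axis sums if a matrix follows the opposite convention:

```python
import numpy as np

def accuracy_measures(error_matrix):
    """User's/producer's accuracy and F1-score per class, plus overall accuracy.

    Assumes rows = classified map, columns = reference labels.
    """
    cm = np.asarray(error_matrix, dtype=float)
    correct = np.diag(cm)
    users = correct / cm.sum(axis=1)      # per classified class (row totals)
    producers = correct / cm.sum(axis=0)  # per reference class (column totals)
    f1 = 2.0 * users * producers / (users + producers)
    overall = correct.sum() / cm.sum()
    return users, producers, f1, overall
```

The F1-score is the harmonic mean of user's and producer's accuracy, so it penalizes classes where the two measures diverge strongly.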

Real Estate Cadastre and Post-Classification Filtering
As can be seen in Figure 2, the raw ALKIS cadastre data downloaded from Open.NRW were imported into a PostGIS database using the NorGIS software. Thereby, all the different thematic geodata contained in the cadastre become available in QGIS. Selecting all parcels with agricultural usage from the ALKIS cadastre data made it possible to derive a crop/non-crop mask.
Only after that step is a post-classification filter reasonable. Otherwise, non-crop pixels would be considered in the filtering process, possibly degrading the classification quality. We used a majority filter with a diamond-shaped (von Neumann) neighbourhood, setting the center pixel to the majority value of the pixel values within the neighbourhood [56]. The filtering was conducted twice: the first pass with a neighbourhood radius of three pixels, the second with two pixels.
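A minimal sketch of such a majority filter, built on scipy's generic_filter rather than the Orfeo Toolbox implementation actually used in the study (ties are resolved by the smallest class id here, a detail that may differ in OTB):

```python
import numpy as np
from scipy.ndimage import generic_filter

def diamond_footprint(radius):
    """Von Neumann (diamond-shaped) neighbourhood of the given radius."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return (np.abs(y) + np.abs(x)) <= radius

def majority_filter(class_map, radius):
    """Set each pixel to the majority class within its neighbourhood."""
    def majority(values):
        classes, counts = np.unique(values.astype(int), return_counts=True)
        return classes[np.argmax(counts)]  # ties go to the smallest class id
    return generic_filter(class_map, majority,
                          footprint=diamond_footprint(radius), mode="nearest")

def twofold_filter(class_map):
    """Apply the filter twice, as in the study: radius 3, then radius 2."""
    return majority_filter(majority_filter(class_map, 3), 2)
```

A single misclassified pixel inside a homogeneous field is replaced by the surrounding class, which is why the masking step matters: without it, field-border pixels would be voted over by non-crop neighbours.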

Open Source Software Used in this Study
One of the principles of the present study was to rely solely on open-source software. The preprocessing of the radar images was conducted using the Sentinel Application Platform (SNAP) [49], which is developed by ESA and is therefore particularly suited to processing data from ESA sensors, such as the Sentinel-1 data used in this study. The actual multitemporal random forest classification was performed in R [57] (Version 3.4.3) using a freely available R script [57] from [58] that uses the following R packages: randomForest [59], Geospatial Data Abstraction Library (GDAL) [60], raster [61], maptools [62], and sp [63]. For the postprocessing, including the error matrix generation and the post-classification filter, we used the Orfeo Toolbox [64], which provides a number of state-of-the-art remote sensing tools and has an active community. Map making, integration of the ALKIS, cropping of the raster data, and preprocessing of the crop distribution maps were conducted in QGIS [65], one of the leading open-source GIS applications. The ALKIS data were imported into a PostGIS [66] geospatial database, which is based on PostgreSQL [67], using the free software ALKISimport [68]. The preprocessing of the DEM was achieved with GDAL [69].

Results
Using the proposed approach made it possible to classify 11 different crops with an accuracy of around 95%. The final crop classification map is presented in Figure 3. It covers the entire 2500 km² of the AOI at a spatial resolution of 10 m. It is available for download in the TR32DB [70].
As can be seen in Tables 4 and 5, the accuracy of all crop classes was in the acceptable range, as all user's and producer's accuracy measures were above 80%, with one exception: the 72% producer's accuracy of the class potato, which was confused with sugar beet.
Table 5. Accuracy measures of the proposed classification, shown in Figure 3.
Integrating the external ALKIS data allowed the crop areas to be focused on, as all non-crop areas were masked out. Thereby, applying the majority filter twice became feasible, which resulted in a 1.7% accuracy gain (overall accuracy: 96.69%). A map of the final classification is shown in Figure 3. Although 1.7% might not seem impressive, the advantages of this procedure go beyond the pure number. Most importantly, pixels classified as a crop type but not within the feature class "agricultural land" of ALKIS are deleted, and the correct ALKIS land use class is assigned. Consequently, no crop class is present outside the agricultural area in the final LULC map. However, if a map including the attributes of the ALKIS together with the crop type is needed, the creation of enhanced LULC maps, as demonstrated in Figure 4, is feasible. To follow the principles of TOP, the workflow of the current study was designed and implemented with FOSS. All of the necessary steps to perform the final crop classification could be successfully conducted in the following software environments:

• Transferring the ALKIS into a PostGIS database was performed with ALKISimport [68].

Discussion
This paper presents an open-data and open-source remote sensing workflow to derive crop types for a region in western Germany, the area of the Rur catchment. The all-weather capability of the SAR sensor Sentinel-1 makes the results independent of the cloud coverage that is typical for the study region [14]. External data in the form of a height model and cadastre data [30] assisted the classification process. The final classification of 11 different crops shows a high overall accuracy of approx. 97% at a spatial resolution of 10 m.
A comparison with the LULC analysis based on optical data, shown in Figure 1, revealed only 56% agreement between the two classifications within the agricultural area. As that dataset is available for download project-internally [71], the differences could be further analyzed:
• 11% of the differences originate from incomplete disaggregation to the crop level in the optical classification, where only superordinate classes such as agricultural field or summer crop are given.
• Another 10% stems from the class rye, which is merged into the winter wheat class in the optical classification but correctly differentiated in the classification of this study.
• About 9% of the difference is due to roads and tracks that are modeled into the optical MDA classification [14]. It is debatable whether that area is representative of the fields in the study area.
In addition to those shortcomings of the optical classification, its error matrix, shown in Table 6, reveals more confusion than the one from the current study shown in Table 4. Consequently, the overall accuracy is about 5% lower than that of the present study, although fewer classes were considered. Finally, the spatial resolution was increased from 15 m to 10 m, providing more details of the crop distribution. In summary, the present study's classification can be considered superior in almost all aspects. Table 6. Error matrix of the multi-data approach (MDA) land use/land cover (LULC) classification with optical data shown in Figure 1 [71]. The classification was performed using the MDA described by [14]; overall accuracy: 91.44%.

Another comparison was performed with the results of a recent study by [22], who also used multitemporal Sentinel-1 images to distinguish similar crop types in another study region, also situated in Germany. In general, the results of this study are consistent with the study by [22], who concluded that dense time series of SAR images provide a high crop-separation potential. The final crop classifications of that study are not publicly available. Hence, the comparison had to be conducted with the accuracy numbers given in the publication. Table 7 shows the direct comparison of the user's and producer's accuracies of both studies. The accuracies from [22] are taken from his most sophisticated crop classification, which uses information about the crop's phenology. As can be seen, compared to the present study, there is consistency in the high accuracy of pasture, maize, sugar beet, and wheat. Both studies revealed challenges in correctly classifying potatoes, which is probably due to the alignment of the potato hills and varying phenology caused by varying planting dates [21]. The classes rye, and especially spring barley, were classified significantly better in the present study. The confusion in [22] could stem from fewer mapped fields and fewer Sentinel-1 images. Table 7. Comparison of the producer's accuracy (PA) and user's accuracy (UA) (in %) of crop classes of the study carried out by [22] and of the present study. Unsatisfactory results below 80% are marked in red (80%-70%, 70%-60%, below 60%). The classes oat, pea, and carrot appeared in only one of the classifications and were left out. Although the current study's results show less confusion, the algorithm of [22] seems more sophisticated, as it includes crop phenology information. However, it is not possible to compare the algorithms, the input data, or the obtained results, as neither the source code nor the data is publicly shared.

That last aspect highlights the innovation of this study, which lies in the unique implementation of the workflow: All datasets used in the process, provided by ESA and open.NRW, are distributed as open data by the data providers, as well as in the TR32DB. Also, the ground reference of the study, about 1200 labeled agricultural fields, is shared. Furthermore, since the whole workflow is designed with FOSS, there are no additional costs for software and the source code is open. The combination of open data and FOSS allows reproducibility of the study, which enables other scientists to build upon this study's results and evaluate their approaches with our data.
Next, crop-type classifications on larger scales are to be pursued and can be integrated into global agricultural systems [5]. In doing so, such systems can provide better outputs, enabling the principles of agricultural intensification to be followed, resulting in lower environmental impacts and higher food security.
However, upscaling the approach brings additional challenges. One is the availability and quality of external data. Geodata are often not available at the high precision of the geodata provided by open.NRW. For DEMs, that problem could be solved by relying on global datasets, such as the TanDEM-X derived DEM [72], which has recently been made freely available for scientific purposes at 90 m resolution. However, releasing the full resolution as open data would be favorable.
In the case of including external cadastre data in the classification process [30] or for post-classification fusion (compare Figure 4), a high spatial accuracy cannot be expected in many areas of the world. In such cases, ref. [73] present a smart way to improve the accuracy of external data, using a composite of multitemporal TerraSAR-X images as a spatial reference. As Sentinel-1 has a similarly high spatial accuracy [37], the approach could be adapted to areas where only external geodata of lower spatial accuracy are available.
As shown above, the workflow's implementation was performed in six different FOSS environments. Each environment has its own characteristics, which entails a high demand for technical skills to execute the whole workflow. One way of coping with that issue is to create comprehensive documentation, user forums, and user mailing lists. It would also be possible to develop new software based on the environments used, or to extend existing environments so that the requirements of SAR-based crop classification are met within one environment.

Conclusions
This study demonstrates the feasibility of multitemporal microwave C-band SAR data from Sentinel-1 for distinguishing crop types in our study site in western Germany. The final classification was evaluated with high accuracy, which was reached through the innovative integration of publicly available open data from Open.NRW. One of these datasets was the high-resolution and high-precision DEM, which assisted the SAR preprocessing. The other was the spatially highly accurate real-estate register, enabling the exclusion of the non-crop and special-crop areas using the MDA. To overcome the problem of limited radar applications due to the complexity of radar data, all data used and produced in this study are openly available in the TR32DB. Additionally, the processing was done solely with FOSS. Consequently, all the results are reproducible without any additional data or software costs. Hence, the current study makes a substantial contribution to science in the context of microwave-based crop classification.