2.1. Cloud Infrastructure
We deployed our Sentinel-1 datacube on a collaborative, multi-owner infrastructure (Figure 1) managed by the Earth Observation Data Centre for Water Resources Monitoring (EODC). The EODC was founded to foster cooperation between public and private partners and to develop the scientific and technical capabilities needed to take full advantage of the wealth of Earth observation data brought by Copernicus and other space programmes [10]. Since its foundation in 2014, the EODC and its partners have built a federated IT infrastructure capable of hosting, accessing, and processing worldwide satellite data collections. Besides Sentinel-1, the EODC also hosts collections of Sentinel-2, Sentinel-3, and climate data.
The central component of the infrastructure is a petabyte-scale storage system that holds all public and private data. It is a two-tiered system, currently comprising about 10 PB of hard disk drive (HDD) storage and about 10 PB of tape storage. The data are backed up on a second tape library situated in a separate building. Depending on the use case, the data can be accessed with the same file logic from three computing environments: a cloud platform for data exploration and scientific analysis, a supercomputing facility for large-scale processing, and a cluster dedicated to operational near-real-time (NRT) processing. Each of these computing environments plays a role in the creation, maintenance, and use of the Sentinel-1 backscatter datacube.
2.2. Sentinel-1 Data
Sentinel-1 belongs to the space component of Copernicus, the European Union’s Earth Observation Programme, and primarily serves environmental monitoring applications. The mission has been developed and is being operated by the European Space Agency (ESA). To meet the needs of operational users, Sentinel-1 acquires C-band (5.4 GHz) SAR imagery in a systematic fashion, and all data are sequentially processed and distributed within 24 h [1]. Each of the (currently) two satellites is flown in a near-polar sun-synchronous orbit with a 12-day repeat cycle and local crossing times at ∼6 a.m. (descending orbit) and ∼6 p.m. (ascending orbit). Over land, the Sentinel-1 sensors are by default operated in Interferometric Wide-swath (IW) mode, which covers a 250 km wide swath at two polarisations (usually VV and VH) with a spatial resolution of 20 m × 5 m. The IW images are distributed as either Single Look Complex (SLC) data (with typical file sizes of 8 GB/product) for SAR interferometric applications or Ground Range Detected (GRD) images (∼1 GB/product) for backscatter intensity applications. Both SLC and GRD data are useful inputs for creating Sentinel-1 datacubes containing, e.g., backscatter intensity and interferometric coherence data [14]. Yet, given the much greater processing and storage demands of the SLC data, we have so far used only IW GRD data.
Sentinel-1 IW GRD data can be obtained from several Copernicus data hubs, including the Copernicus Open Access Hub that serves the general user community and hubs dedicated to the Copernicus services [15]. These hubs provide access to recent Sentinel-1 data through their rolling archives (12 months in the case of the Copernicus Open Access Hub), but do not allow the download of older data. Therefore, a user who would like to access Sentinel-1 data from the complete mission record needs to resort to cloud platform services that host the required data. Worldwide Sentinel-1 data archives are available from, for instance, the Copernicus Data and Information Access Service (DIAS) cloud platforms, GEE, and Amazon Web Services. As we need the historic data for reprocessing activities, and repeated transfer of such huge data volumes over the internet is not feasible, we also keep a worldwide Sentinel-1 IW GRD data record on the EODC storage, which users can access along with the backscatter datacube.
2.3. Data Preparation
Given that the Sentinel-1 IW GRD images are provided in swath geometry and are not referenced to a fixed Earth grid, directly comparing two or more Sentinel-1 images is not possible. Therefore, to work with time series, the data first have to be co-aligned in an Earth-fixed reference system. Given our requirement to read and process multi-year Sentinel-1 backscatter time series quickly and efficiently, our approach to creating the Sentinel-1 backscatter datacube system has been to preprocess the Sentinel-1 images and store the generated image tiles as GeoTIFF files. The upfront costs for creating such a preprocessed file-based datacube system are large, but there are important practical advantages for the users: (i) there are essentially no constraints on the type and complexity of algorithms to be applied on the Sentinel-1 datacube, and (ii) accessing the complete and consistent time series of Sentinel-1A and -1B acquisitions is fast, irrespective of whether one works with, e.g., individual data points or km²-sized tiles.
The preprocessing workflow for creating the Sentinel-1 backscatter datacube from the Sentinel-1 IW GRD image collection was written in Python and makes use of the open-source Sentinel Application Platform (SNAP) toolbox [16] (Figure 2). To optimise performance, we integrated open libraries such as gdal [17] and numba [18] in our workflow. Its overall setup is similar to ARD workflows for creating SAR backscatter datacubes as described by [14,19], with the important exception that, so far, we have only computed the commonly used backscattering coefficient Sigma Nought (σ⁰) and not the terrain-flattened Gamma Nought coefficient (γ⁰) as introduced by Small [20]. The latter representation of backscatter is superior to the former in mountainous regions because it accounts for terrain-related changes in the radar illumination area. This minimises radiometric terrain effects, which is beneficial to SAR applications, particularly over undulating and rugged terrain [21,22]. Therefore, the Committee on Earth Observation Satellites (CEOS) has selected γ⁰ over σ⁰ in its ARD standard for normalised radar backscatter. However, the computation of γ⁰ takes much longer than that of σ⁰ (in SNAP, by a factor of 2–3), which has so far prevented us from carrying out this additional preprocessing step in our workflow.
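To illustrate the two conventions: σ⁰ normalises the backscattered power by the ground area, while γ⁰ normalises it by the area perpendicular to the line of sight. On flat terrain the two are related through the incidence angle θ via γ⁰ = σ⁰/cos θ; the terrain-flattened variant of [20] replaces this simple projection with the actual illuminated area derived from a DEM, which is what makes it more expensive to compute. A minimal NumPy sketch of the flat-terrain relation (in dB):

```python
import numpy as np

def sigma0_to_gamma0_db(sigma0_db, incidence_deg):
    """Convert Sigma Nought to (ellipsoid-based) Gamma Nought in dB.

    Flat-terrain approximation: gamma0 = sigma0 / cos(theta_i).
    In dB this is an additive correction of -10*log10(cos(theta_i)).
    Note: this is NOT the terrain-flattened gamma0 of Small [20],
    which requires a DEM-derived illuminated area.
    """
    theta = np.deg2rad(np.asarray(incidence_deg, dtype=float))
    return np.asarray(sigma0_db, dtype=float) - 10.0 * np.log10(np.cos(theta))

# At 0 deg incidence the two coefficients coincide.
print(sigma0_to_gamma0_db(-10.0, 0.0))   # -10.0
```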
Besides the Sentinel-1 IW GRD images, our preprocessing workflow requires Sentinel-1 orbit files and a Digital Elevation Model (DEM) as inputs. The orbit files can be downloaded from the Copernicus Precise Orbit Determination (POD) service; for the near-real-time updating of the datacube we use the so-called restituted orbits (RESORB), and in our reprocessing campaigns the precise orbit files (POEORB). The restituted orbits are already very accurate (RMS ∼10 cm), making it possible to mix data processed with these two types of orbit files in one datacube. As for the DEM, we initially used the 90 m Shuttle Radar Topography Mission (SRTM) terrain model, which SNAP calls by default. For the latest reprocessing reported in this paper, we used the 30 m Copernicus DEM that was released to the public in late 2020 [23]. To use this new DEM in SNAP, we had to mosaic it into one global DEM file. Because the DEM's sampling varies with latitude, we first merged individual files with the same pixel spacing into homogeneous latitude bands. After resampling each band to the finest sampling, all bands were concatenated into one single file with global extent. Before using it as an external DEM in SNAP, we transformed its orthometric height values, which are given in the Earth Gravitational Model 2008 (EGM2008), to ellipsoid heights.
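The geoid-to-ellipsoid step amounts to adding the EGM2008 geoid undulation N to each orthometric height H, i.e., h_ell = H + N. A minimal sketch (in practice N would be sampled from the EGM2008 model at each pixel; the scalar values here are only illustrative):

```python
import numpy as np

def orthometric_to_ellipsoidal(height_orthometric, geoid_undulation):
    """Convert orthometric heights (relative to the EGM2008 geoid)
    to ellipsoidal heights: h_ell = H_ortho + N."""
    return (np.asarray(height_orthometric, dtype=float)
            + np.asarray(geoid_undulation, dtype=float))

# Illustrative: a 500 m orthometric height where the geoid sits
# 46 m above the ellipsoid yields a 546 m ellipsoidal height.
print(orthometric_to_ellipsoidal(500.0, 46.0))   # 546.0
```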
Figure 2 shows the complete workflow for the processing and ingestion of one Sentinel-1 IW GRD scene. The first step performs a cutout of the Copernicus DEM to the relevant Sentinel-1 scene extent. Then, elevation, orbit, and backscatter data are fed into our own Sentinel-1 preprocessing chain embedding several operators of SNAP’s Graph Processing Framework (GPF) [24]:
(a) Apply-Orbit-File (orbit correction)
(b) ThermalNoiseRemoval (thermal noise removal)
(c) Calibration (radiometric calibration)
(d) Subset
(e) Slice-Assembly
(f) Terrain-Correction (Range-Doppler terrain correction)
In between, border noise effects, which are not removed fully by SNAP, are eliminated with the bidirectional all-samples approach (border noise removal) described by Ali et al. [25]. Such noise removal operations affect the border of the Sentinel-1 scene, which results in narrow data gaps (a few pixels wide) between adjacent scenes after geocoding. As an effective workaround, we add a buffer to the scene border by utilising SNAP’s Subset and Slice-Assembly operators with respect to the neighbouring scene in flight direction (slice gap filling).
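Chains of GPF operators like the one above are typically executed with SNAP's gpt command-line tool against a graph XML file. The snippet below only assembles a hypothetical gpt invocation; the graph file name and the -P parameter names are assumptions for illustration, not our actual configuration:

```python
import shlex

def build_gpt_command(graph_xml, input_grd, output_path):
    """Assemble a SNAP gpt call for a preprocessing graph.

    The graph XML is expected to wire up the GPF operators named in
    the text (Apply-Orbit-File, ThermalNoiseRemoval, Calibration,
    Subset, Slice-Assembly, Terrain-Correction). The parameter names
    'input' and 'output' are hypothetical graph parameters.
    """
    return [
        "gpt", graph_xml,
        f"-Pinput={input_grd}",
        f"-Poutput={output_path}",
    ]

cmd = build_gpt_command("s1_grd_preprocessing.xml",
                        "S1A_IW_GRDH_example.zip",
                        "/tmp/s1_sig0.tif")
print(shlex.join(cmd))
```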
Finally, the geocoded images are projected with gdalwarp’s bilinear resampling onto the Equi7Grid as introduced by Bauer-Marschallinger et al. [26] and cut into tiles to create manageable image stacks (Figure 3). The advantage of such a tiling system is that any pixel block or pixel location can be addressed by a simple equation that defines the name and location of the file, and the array indices within [7]. While all internal processing steps run on 10 m grids, the output of the workflow is twofold: km²-large backscatter images with 10 m sampling and km²-large images with 20 m sampling. The 20 m images are resampled representations of the 10 m images using gdalwarp’s cubic spline resampling and feature significantly less noise and speckle.
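The noise reduction gained at the coarser sampling can be illustrated with a plain 2 × 2 block average. Note that our workflow uses gdalwarp's cubic spline resampling, not block averaging; the sketch below is only a minimal stand-in to show the aggregation effect on synthetic data:

```python
import numpy as np

def block_average_2x2(img):
    """Downsample a 10 m image to 20 m by averaging 2x2 pixel blocks.

    Averaging n independent samples reduces the noise variance by
    roughly a factor of n (here n = 4).
    """
    h, w = img.shape
    trimmed = img[:h - h % 2, :w - w % 2]                 # drop odd edges
    return trimmed.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

rng = np.random.default_rng(0)
sigma0 = rng.normal(-10.0, 1.0, size=(512, 512))          # synthetic backscatter [dB]
coarse = block_average_2x2(sigma0)
print(coarse.shape)                                       # (256, 256)
```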
The Equi7Grid was specifically designed to host large-scale land monitoring applications based on high-resolution satellite datasets [26]. It is based on azimuthal equidistant projections for seven continental areas, which, unlike undivided global and continental-scale equal-area projections, avoid large pixel deformations in the border regions. In this respect, the Equi7Grid was recently found to preserve the accuracy of geometric-analytical measures around the globe, which is most beneficial for terrain analysis [27]. The Equi7Grid’s geometric fidelity has another important consequence: its oversampling over land is minimal (only 2% on average), a significant advantage compared to, for example, global latitude-longitude grids (35% on average). Compared to the Universal Transverse Mercator (UTM) system used for Sentinel-2, the Equi7Grid reduces 62 zones to 7 continental areas, which eases the handling and processing of larger areas and avoids significant duplication of data for images and products covering more than one UTM zone. Depending on the geographic location, the actual data duplication stemming from the overlaps reaches 30–50% for the Sentinel-2 Level-1C data shipped as UTM tiles [28]. Moreover, considering that even a small country like Austria is covered by 3 UTM zones, a continental zoning approach eases day-to-day operations and reduces the processing overhead. The Equi7Grid and its tiling system are open-source and can be accessed via GitHub [29].
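The file-addressing logic of such a tiling system, i.e., mapping a projected coordinate to a tile name plus the array indices within that tile's raster, can be sketched as follows. The tile extent, sampling, and naming pattern below are illustrative assumptions, not the actual Equi7Grid convention defined in [29]:

```python
def locate_pixel(x, y, tile_extent_m=100_000, sampling_m=10):
    """Map projected coordinates (metres) to a tile name and the
    row/column array indices inside that tile's raster.

    Tile extent (100 km), 10 m sampling, and the 'ExxxNyyyT1' naming
    pattern are illustrative assumptions only.
    """
    tile_x = int(x // tile_extent_m)            # tile index along easting
    tile_y = int(y // tile_extent_m)            # tile index along northing
    tile_name = f"E{tile_x:03d}N{tile_y:03d}T1"
    col = int((x - tile_x * tile_extent_m) // sampling_m)
    # rows count downward from the tile's upper edge
    row = int((tile_y + 1) * tile_extent_m - y) // sampling_m
    return tile_name, row, col

print(locate_pixel(1_234_560.0, 2_345_670.0))  # ('E012N023T1', 5433, 3456)
```

The point of this construction is that no spatial index or database lookup is needed: the file name and the pixel offset follow directly from the coordinates.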