A Sentinel-1 Backscatter Datacube for Global Land Monitoring Applications

Abstract: The Sentinel-1 Synthetic Aperture Radar (SAR) satellites allow global monitoring of the Earth's land surface with unprecedented spatio-temporal coverage. Yet, implementing large-scale monitoring capabilities is a challenging task given the large volume of data from Sentinel-1 and the complex algorithms needed to convert the SAR intensity data into higher-level geophysical data products. While on-demand processing solutions have been proposed to cope with the petabyte-scale data volumes, in practice many applications require preprocessed datacubes that permit fast access to multi-year time series and image stacks. To serve near-real-time as well as offline land monitoring applications, we have created a Sentinel-1 backscatter datacube for all continents (except Antarctica) that is constantly being updated and maintained to ensure consistency and completeness of the data record over time. In this technical note, we present the technical specifications of the datacube, means of access and analysis capabilities, and its use in scientific and operational applications.


Introduction
Sentinel-1 is the first multi-satellite Synthetic Aperture Radar (SAR) mission that has been providing global coverage with up to 9 local observations per 12-day repeat cycle (depending on region) at high spatial resolution (20 m) [1]. Like the more recently launched Radarsat Constellation Mission [2], Sentinel-1 acquires backscatter imagery at C-band in multiple polarisations, undisturbed by cloud cover and lighting conditions. With their dedicated observation scenario, the Sentinel-1 satellites achieve unprecedented ground coverage and allow dynamic land surface changes and processes to be captured over large areas, without missing important events as was the case with past SAR missions. This is not only of utmost importance for practical applications, it is also key to realising better scientific algorithms that are trained using dense and long time series.
C-band backscatter as measured by Sentinel-1 is highly sensitive to the dielectric and geometric properties of the land surface [3]. In particular, changes in the distribution of water or its phase transitions may quickly modify the local dielectric properties, and consequently, the Sentinel-1 backscatter may exhibit a high variability in both space and time. This high natural variability is both a hurdle to and a chance for using Sentinel-1 data in land cover classification and biogeophysical retrievals. As exemplified by many recent Sentinel-1 studies (e.g., [4][5][6]), the most promising approach is to use dense and long backscatter time series as the basis for the scientific analysis and the development
• The datacube represents a complete collection of Sentinel-1 data over land surfaces and covers all continents except Antarctica;
• The system enables both offline analyses of multi-year time series and near-real-time image-based applications;
• There should be maximum flexibility regarding the type of scientific algorithms to be deployed on the data;
• It shall be accessible and usable for a large number of users with different backgrounds and interests;
• Reprocessing of the complete petabyte-scale data collection must be possible to ensure that the data are consistent and comply with the latest processing standards.
This technical note is a revised and extended version of our contribution to the 2021 Conference on Big Data from Space [9], providing a more in-depth presentation of the technical specifications of our datacube and its scientific and operational use cases.

Cloud Infrastructure
We deployed our Sentinel-1 datacube on a collaborative, multi-owner infrastructure (Figure 1) managed by the Earth Observation Data Centre for Water Resources Monitoring (EODC). The EODC is an organisation that was founded to foster the cooperation between public and private partners in order to develop the scientific and technical capabilities needed to take full advantage of the wealth of Earth observation data brought by Copernicus and other space programmes [10]. Since its foundation in 2014, the EODC and its partners have built a federated IT infrastructure capable of hosting, accessing, and processing worldwide satellite data collections. Besides Sentinel-1, the EODC also hosts collections of Sentinel-2, Sentinel-3, and climate data. The central component of the infrastructure is a petabyte-scale storage system that holds all public and private data. It is a two-tiered storage system with currently about 10 PB of hard disk drive (HDD) storage and about 10 PB of tape storage. The data are backed up on a second tape library situated in a separate building. Depending on the use case, the data can be accessed with the same file logic from three computing environments: a cloud platform for data exploration and scientific analysis, a supercomputing facility for large-scale processing, and a cluster dedicated to operational near-real-time (NRT) processing. Each of these computing environments plays a role in the creation, maintenance, and use of the Sentinel-1 backscatter datacube:
• Cloud Platform: This environment is particularly suited for scientific analysis, code development, and testing. Larger processing jobs, involving e.g., the analysis of the entire Sentinel-1 period for a few tiles, are possible. Nonetheless, for very large processing activities, e.g., covering bigger countries or whole continents, moving to supercomputers may be necessary.
• High-Performance Computing (HPC) [13]: Thanks to dedicated high-throughput I/O connections (InfiniBand and OmniPath), it is possible to process the Sentinel-1 data on one of the HPC clusters of the Vienna Scientific Cluster (VSC) facility. Normally, two supercomputers are operational at the same time. So far, Sentinel-1 data processing has taken place using the oil-cooled VSC-3 cluster and its air-cooled VSC-3+ extension. At present, Sentinel-1 processing is being moved to the VSC-4, the current flagship that reaches a performance of 2.7 PFlop/s with its 790 water-cooled nodes. The EODC storage can be accessed from the VSC with the same file logic, but with fewer visualisation and development functions than on the cloud platform. The benefit is that Sentinel-1 images can be processed on hundreds of compute nodes in parallel. Nonetheless, the processing routines usually have to be tailored to balance I/O, storage, and compute resources.
• Operational Processing Cluster: This dedicated cluster serves operational near-real-time (NRT) applications and is used for fully automatic updating of the global datacube as soon as new Sentinel-1 images become available.

Sentinel-1 Data
Sentinel-1 belongs to the space component of Copernicus, the European Union's Earth Observation Programme, and primarily serves environmental monitoring applications. The mission has been developed and is being operated by the European Space Agency (ESA). To meet the needs of operational users, Sentinel-1 acquires C-band (5.4 GHz) SAR imagery in a systematic fashion, and all data are sequentially processed and distributed within 24 h [1]. Each of the (currently) two satellites is flown in a near-polar sun-synchronous orbit with a 12-day repeat cycle and local crossing times at ∼6 a.m. (descending orbit) and ∼6 p.m. (ascending orbit). Over land, the Sentinel-1 sensors are by default operated in Interferometric Wide-swath (IW) mode, which covers a 250 km wide swath at two polarisations (usually VV and VH) with a spatial resolution of 20 × 5 m². The IW images are distributed as either Single Look Complex (SLC) data (with typical file sizes of 8 GB/product) for SAR interferometric applications or Ground Range Detected (GRD) images (∼1 GB/product) for backscatter intensity applications. Both SLC and GRD data are useful inputs to create Sentinel-1 datacubes containing e.g., backscatter intensity and interferometric coherence data [14]. Yet, given the much greater processing and storage demand of the SLC data, we have so far used only IW GRD data.
Sentinel-1 IW GRD data can be obtained from several Copernicus data hubs, including the Copernicus Open Access Hub that serves the general user community and hubs dedicated to the Copernicus services [15]. These hubs provide access to recent Sentinel-1 data through their rolling archives (12 months in the case of the Copernicus Open Access Hub), but do not allow the download of older data. Therefore, users who would like to access Sentinel-1 data from the complete mission record need to resort to cloud platform services that host the required data. Worldwide Sentinel-1 data archives are available from, for instance, the Copernicus Data and Information Access Service (DIAS) cloud platforms, GEE, and Amazon Web Services. As we need the historic data for reprocessing activities, and repeated transfer of such huge data volumes over the internet is not feasible, we also keep a worldwide Sentinel-1 IW GRD data record on the EODC storage, which users can access along with the backscatter datacube.

Data Preparation
Given that the Sentinel-1 IW GRD images are provided in swath geometry and are not referenced to a fixed Earth grid, directly comparing two or more Sentinel-1 images is not possible. Therefore, to work with time series, the data first have to be co-aligned in an Earth-fixed reference system. Given our requirement to be able to read and process multi-year Sentinel-1 backscatter time series fast and efficiently, our approach to creating the Sentinel-1 backscatter datacube system has been to preprocess the Sentinel-1 images and store the generated image tiles as GeoTIFF files. The upfront costs for creating such a preprocessed file-based datacube system are large, but there are important practical advantages for the users: (i) there are essentially no constraints on the type and complexity of algorithms to be applied on the Sentinel-1 datacube, and (ii) accessing the complete and consistent time series of Sentinel-1A and -1B acquisitions is fast, irrespective of whether one works with individual data points or with 300 × 300 km² large tiles.
The preprocessing workflow for creating the Sentinel-1 backscatter datacube from the Sentinel-1 IW GRD image collection was written in Python and makes use of the open-source Sentinel Application Platform (SNAP) toolbox [16] (Figure 2). To optimise performance, we integrated open libraries such as gdal [17] and numba [18] in our workflow. Its overall setup is similar to ARD workflows for creating SAR backscatter datacubes as described by [14,19], with the big exception that, so far, we have only computed the commonly used backscattering coefficient Sigma Nought (σ⁰) and not the terrain-flattened Gamma Nought coefficient (γ⁰_rtf) as introduced by Small [20]. The latter representation of backscatter is superior to the former in mountainous regions because it accounts for terrain-related changes in the radar illumination area. This minimises radiometric terrain effects, which is beneficial to SAR applications particularly over undulating and rugged terrain [21,22]. Therefore, the Committee on Earth Observation Satellites (CEOS) has selected γ⁰_rtf over σ⁰ in its ARD standard for normalised radar backscatter. However, the computation of γ⁰_rtf takes much longer than that of σ⁰ (in SNAP, a factor of 2-3 longer), which has so far prevented us from carrying out this additional preprocessing step in our workflow. Besides the Sentinel-1 IW GRD images, our preprocessing workflow requires as inputs Sentinel-1 orbit files and a Digital Elevation Model (DEM). The orbit files can be downloaded from the Copernicus Precise Orbit Determination (POD) service. For the near-real-time updating of the datacube we use the so-called restituted orbits (RESORB), and in our reprocessing campaigns the precise orbit files (POEORB). Fortunately, even the restituted orbits are already very accurate (RMS ∼10 cm), making it possible to mix data processed with these two types of orbit files in one datacube.
As for the DEM, we initially used the 90 m Shuttle Radar Topography Mission (SRTM) terrain model, which SNAP uses by default. In our latest reprocessing, as reported in this paper, we used the 30 m Copernicus DEM that was released to the public in late 2020 [23]. To use this new DEM in SNAP, we had to mosaic it into one global DEM file. Due to the DEM's varying sampling along latitude, we first merged single files with the same pixel spacing into homogeneous latitude bands. After resampling each band to the highest sampling, all bands were finally concatenated into one single file with global extent. Before using it as an external DEM in SNAP, we transformed its orthometric height values, which are referenced to the Earth Gravitational Model 2008 (EGM2008), to ellipsoid heights.
Figure 2 shows the complete workflow for the processing and ingestion of one Sentinel-1 IW GRD scene. The first step performs a cutout of the Copernicus DEM to the relevant Sentinel-1 scene extent. Then, elevation, orbit, and backscatter data are fed into our own Sentinel-1 preprocessing chain embedding several operators of SNAP's Graph Processing Framework (GPF) [24]: (a) Apply-Orbit-File (orbit correction)
In between, border noise effects, which are not removed fully by SNAP, are eliminated with the bidirectional all-samples approach (border noise removal) described by Ali et al. [25]. Such noise removal operations affect the border of the Sentinel-1 scene, which results in narrow data gaps (a few pixels wide) between adjacent scenes after geocoding. As an effective workaround, we add a buffer to the scene border by utilising SNAP's Subset and Slice-Assembly operators with respect to the neighbouring scene in flight direction (slice gap filling).
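The height transformation mentioned above follows the standard geodetic relation h = H + N, where H is the orthometric height above the geoid, N is the geoid undulation, and h is the height above the ellipsoid. The following minimal sketch illustrates this relation for a single pixel; the function name and the numerical values are hypothetical and are not taken from the workflow described here.

```python
def orthometric_to_ellipsoidal(orthometric_height_m, geoid_undulation_m):
    """Return the height above the ellipsoid, h = H + N.

    H is the orthometric height (above the geoid, here EGM2008) and
    N is the geoid undulation (height of the geoid above the ellipsoid).
    """
    return orthometric_height_m + geoid_undulation_m


# Hypothetical values for a single DEM pixel (not taken from the paper).
dem_height_m = 350.0   # metres above the EGM2008 geoid
undulation_m = 44.5    # metres, geoid above the ellipsoid at this location
print(orthometric_to_ellipsoidal(dem_height_m, undulation_m))
```

In practice the undulation would be read from a geoid grid for every DEM pixel rather than supplied as a single number.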
Finally, the geocoded images are projected with gdalwarp's bilinear resampling onto the Equi7Grid as introduced by Bauer-Marschallinger et al. [26] and cut into tiles to create manageable image stacks (Figure 3). The advantage of such a tiling system is that any pixel block or pixel location can be addressed by a simple equation that defines the name and location of the file, and the array indices within it [7]. While all internal processing steps run on 10 m grids, the output of the workflow is twofold: 100 × 100 km² large backscatter images with 10 m sampling and 300 × 300 km² large images with 20 m sampling. The 20 m images are resampled representations of the 10 m images using gdalwarp's cubic spline resampling and feature significantly less noise and speckle. The Equi7Grid was specifically designed to host large-scale land monitoring applications based on high-resolution satellite datasets [26]. It is based on azimuthal equidistant projections for seven continental areas which, unlike undivided global and continental-scale equal-area projections, avoid large pixel deformations in the border regions. In this respect, the Equi7Grid was recently found to preserve the accuracy of geometric-analytical measures around the globe, being most beneficial for terrain analysis [27]. The Equi7Grid's geometric fidelity has another important consequence: its oversampling over land is minimal (only 2% on average), which is a significant advantage when compared, for example, to global latitude-longitude grids (cf. 35% on average). With respect to the Universal Transverse Mercator (UTM) system that is used for Sentinel-2, the advantage of the Equi7Grid is the reduction from 62 zones to 7 continental areas, which eases the handling and processing of larger areas, and avoids significant duplication of data for images and products covering more than one UTM zone.
Depending on the geographic location, the actual data duplication stemming from the overlaps reaches 30-50% for the Sentinel-2 Level-1C data shipped as UTM tiles [28]. Moreover, considering that even a small country like Austria is covered by 3 UTM zones, a continental zoning approach instead eases day-to-day operations and reduces the processing overhead. The Equi7Grid and its tiling system are open-source and can be accessed via GitHub [29].
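The tile addressing described above can be sketched as follows. This is a minimal illustration, assuming that tile names such as E048N015T3 encode the lower-left tile corner in units of 100 km followed by the tile size class, and that tiles are aligned to multiples of the tile size starting at the projection origin. The function name locate_pixel is hypothetical and not part of the Equi7Grid package, and negative coordinates are not handled.

```python
import math


def locate_pixel(x, y, tile_size_m, sampling_m):
    """Map projected coordinates (metres) to a tile name and the
    (row, col) array indices of the pixel inside that tile.

    Assumes non-negative coordinates and an integer tile size in metres.
    """
    # Lower-left corner of the tile that contains the point.
    x0 = int(x // tile_size_m) * tile_size_m
    y0 = int(y // tile_size_m) * tile_size_m
    # Tile code: corner in units of 100 km plus the tile size class (T1, T3).
    tile = "E{:03d}N{:03d}T{}".format(
        x0 // 100_000, y0 // 100_000, tile_size_m // 100_000
    )
    # Columns count eastwards from the left edge of the tile.
    col = int((x - x0) // sampling_m)
    # Rows count southwards from the top edge (image convention).
    top = y0 + tile_size_m
    row = math.ceil((top - y) / sampling_m) - 1
    return tile, row, col


# Hypothetical point inside the 300 km tile used as an example in this note.
print(locate_pixel(4_912_345.0, 1_623_456.0, 300_000, 20))
```

Because the tile name and the array indices follow from integer arithmetic alone, no spatial index or database lookup is needed to find the file that holds a given location.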

Production
The production of the Sentinel-1 backscatter datacube has so far been taking place on the VSC-3 and VSC-3+ clusters. We have carried out numerous computing experiments to find suitable setups for massively parallel processing that balance, amongst other technical aspects, file sizes, I/O, Random-Access Memory (RAM), and the number of processing cores. In our first experiments several years ago, we used SAR image collections acquired by the Sentinel-1 predecessor instrument, the Advanced Synthetic Aperture Radar (ASAR) flown on board ENVISAT. One challenge with the ASAR Global Monitoring (GM) and Wide Swath (WS) image collections turned out to be the large number of relatively small files, which could quickly overstretch the read/write capacities of the system [30]. After packaging of the data, it became possible to reprocess the complete ASAR GM and WS mission archives in a matter of days. This experience has taught us to avoid small files for massively parallel processing on the EODC infrastructure, which is why we use relatively large tiles for the Sentinel-1 backscatter datacube, i.e., 10,000 × 10,000 pixels at 10 m sampling and 15,000 × 15,000 pixels at 20 m sampling.
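The tile dimensions quoted above follow directly from the tile extent and the pixel sampling. The short sketch below reproduces them and estimates the uncompressed size of one single-band tile, assuming 2 bytes per pixel (16-bit integers) and ignoring GeoTIFF compression and metadata; the function name is hypothetical.

```python
def tile_shape_and_size(tile_extent_km, sampling_m, bytes_per_pixel=2):
    """Return (pixels per side, uncompressed size in MB) of a square tile."""
    pixels = tile_extent_km * 1000 // sampling_m
    size_mb = pixels * pixels * bytes_per_pixel / 1e6
    return pixels, size_mb


for extent_km, sampling_m in [(100, 10), (300, 20)]:
    pixels, size_mb = tile_shape_and_size(extent_km, sampling_m)
    print(f"{extent_km} km tile at {sampling_m} m: "
          f"{pixels} x {pixels} pixels, {size_mb:.0f} MB uncompressed")
```

Under these assumptions a tile is a few hundred megabytes before compression, which is in line with the stated goal of avoiding large numbers of small files.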
In our first Sentinel-1 experiments on the VSC-3, the average processing time of Sentinel-1 IW GRD image collections was on the order of 2.5-3 s per MB [30], meaning that the average processing time of one IW GRD scene was about 45-50 min. With successive versions of our workflow and SNAP, this number improved steadily to ∼2.0 s/MB [31], followed by an increase to ∼2.3 s/MB for the latest release of our workflow, which, unlike previous versions, also performs resampling to 20 m and several quality checks. However, we could improve the throughput to ∼1.1 s/MB by processing two scenes in parallel on the VSC-3 compute nodes. A speed-up by a factor of two to three may not sound like much, but one needs to consider that some improvements in our processing line, in particular the replacement of the 3 arc-seconds (90 m) SRTM with the 30 m Copernicus DEM, enhanced the output quality but counteracted the speed-up. In our most recent reprocessing campaign, which took place from April to September 2021, we processed the complete Sentinel-1 IW GRD mission archive for the period January 2015 to June 2021, comprising more than 1,800,000 Sentinel-1 acquisitions with a total storage volume of about 1.4 PB. For the geocoding of the IW GRD images on a 10 m grid, we first used SNAP 7.0 and later SNAP 8.0 (after checking for identical backscatter results).
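The relation between the processing rates in s/MB and the per-scene processing times can be checked with a few lines of arithmetic. The sketch below assumes a scene size of 1000 MB, based on the approximate size of ∼1 GB per IW GRD product stated earlier; the function names are hypothetical.

```python
def scene_minutes(rate_s_per_mb, scene_size_mb=1000):
    """Processing time of one scene in minutes for a given rate in s/MB."""
    return rate_s_per_mb * scene_size_mb / 60


def speed_up(old_rate_s_per_mb, new_rate_s_per_mb):
    """Ratio between two processing rates (values above 1 mean faster)."""
    return old_rate_s_per_mb / new_rate_s_per_mb


# Rates quoted in the text: first experiments, intermediate, latest, parallel.
for rate in (3.0, 2.5, 2.0, 2.3, 1.1):
    print(f"{rate:.1f} s/MB -> {scene_minutes(rate):.1f} min per scene")

print(f"speed-up from 2.5 s/MB to 1.1 s/MB: {speed_up(2.5, 1.1):.2f}")
print(f"speed-up from 3.0 s/MB to 1.1 s/MB: {speed_up(3.0, 1.1):.2f}")
```

With the assumed scene size, the initial rates correspond to roughly 42-50 min per scene, close to the 45-50 min quoted above.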
Updating this latest version of the datacube is now done fully automatically using EODC's operational processing cluster and a so-called "Hubwatcher". The latter tool monitors and cross-checks different data resources and hubs, most importantly the Copernicus Services Data Hub [32] and the Collaborative Data Hub [33]. New Sentinel-1 images are fetched as soon as they become available on one of the Copernicus hubs. The average time for fetching the latest IW GRD images is about 2 h, and processing (including potential queuing times) takes about 1 h. Thus, all in all, newly released Sentinel-1 images are added to our datacube within a time frame of only 3 h.

Technical Specifications
The outputs of our Sentinel-1 data preparation workflow are tiled Sentinel-1 backscatter images stored as GeoTIFF files with 10 m and 20 m pixel sampling. For our global datacube, we only keep the 20 m images, while the 10 m images are used for creating monthly composites and are then deleted unless they are needed for specific applications such as regional datacubes. This setup with a base pixel sampling of 20 m reduces costs (less disk space) and makes working with the data easier (shorter processing times, less stringent IT requirements as regards I/O, RAM, etc.). A further benefit is that the radiometric quality of the 20 m images is better than that of their 10 m counterparts due to reduced speckle and noise. Nonetheless, the size of the worldwide 20 m datacube is still significant (Table 1), with more than 300 TB for the 2015-2020 period and an increasing yearly volume (in particular from 2017 onwards, with both Sentinel-1A and Sentinel-1B in operation). The image stacks defining our datacube system are subject to the hierarchical structure of the Equi7Grid, which orders them by continent and tile name. Each Equi7Grid tile folder contains the final data files at the lowest level. Each filename follows a naming convention enabling an intuitive interaction on the file system and with GIS software. This naming convention prescribes the specification of several spatiotemporal, semantic, and traceability attributes of our specific (and also generic) EO datasets: (a) variable name; (b) start timestamp; (c) stop timestamp; (d) band/polarisation; (e) orbit pass direction and relative orbit number; (f) tile name; (g) continent and pixel spacing; (h) data version; (i) sensor/product name identifier. A specific example is given below:

SIG0_20190113T051705__VH_D095_E048N015T3_EU020M_V1M1R1_S1BIWGRDH.tif
For the Sentinel-1 datacube, the physical unit of the pixel values is the backscattering coefficient σ⁰ (SIG0) in dB, which we encoded by scaling the values by a factor of 10 and then converting them to Int16. This encoding has proven to be a good trade-off between disk usage and radiometric accuracy. The timestamp attached to an image relates to the exact start time of the corresponding Sentinel-1 data acquisition. A stop timestamp is only provided for temporally aggregated data products, e.g., for our monthly aggregated ("MAG") mean images, as listed below:

SIG0-MAG-MEAN_20200201T010000_20200301T010000_VH__E048N015T1_EU010M_V1M1R1_S1IWGRDH.tif
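The Int16 encoding of the backscatter values described above can be undone with a single division. The following minimal sketch decodes stored integers back to dB and converts dB to linear units. The no-data value of -9999 is an assumption made for illustration only and is not specified in this note, and the function names are hypothetical.

```python
def decode_sig0(raw_values, scale=10, nodata=-9999):
    """Convert stored Int16 values to backscatter in dB.

    The scale factor of 10 follows the text; the no-data value is an
    assumption. No-data pixels are returned as None.
    """
    return [None if v == nodata else v / scale for v in raw_values]


def db_to_linear(value_db):
    """Convert a backscatter value from dB to linear units."""
    return 10 ** (value_db / 10)


raw = [-123, -87, -9999]      # hypothetical stored pixel values
decoded = decode_sig0(raw)    # backscatter in dB, None where no data
print(decoded)
print([None if v is None else db_to_linear(v) for v in decoded])
```

With a scale factor of 10, the stored values resolve backscatter in steps of 0.1 dB, which is the trade-off between disk usage and radiometric accuracy referred to above.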
Similar to the Level-1 GRD input data, we do not group different polarisations in one file and keep them separated as single-band GeoTIFF files. To assure traceability and consistency, a data versioning system is a prerequisite. The workflow described in Section 2.3 is tagged with a certain software version, which is incorporated into the data version, where "V" stands for the major version and "M" for the minor version. "R" is the run number, which is incremented by the data producer in case the software stays the same, but the input data (e.g., DEM) or the output data encoding (e.g., data compression) changes. The tail of the file name is a placeholder for the name of the sensor, the acquisition mode, and the input product type.
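To illustrate the naming convention, the two example file names given above can be taken apart by splitting at the underscores. This is a simplified sketch and not the implementation used in geopathfinder's YeodaFilename class; the field names and the function parse_filename are hypothetical labels for the attributes (a) to (i).

```python
import os
from datetime import datetime

# Hypothetical field labels for the attributes (a) to (i).
FIELDS = ("var_name", "start_time", "stop_time", "band", "orbit",
          "tile_name", "grid_name", "data_version", "sensor_field")


def parse_filename(path):
    """Split a datacube file name into its underscore-separated fields."""
    stem = os.path.splitext(os.path.basename(path))[0]
    parts = stem.split("_")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    info = dict(zip(FIELDS, parts))
    # Empty fields (e.g., a missing stop timestamp) become None.
    for key, value in info.items():
        if value == "":
            info[key] = None
    # Timestamps are converted to datetime objects where present.
    for key in ("start_time", "stop_time"):
        if info[key] is not None:
            info[key] = datetime.strptime(info[key], "%Y%m%dT%H%M%S")
    return info


single = parse_filename(
    "SIG0_20190113T051705__VH_D095_E048N015T3_EU020M_V1M1R1_S1BIWGRDH.tif")
monthly = parse_filename(
    "SIG0-MAG-MEAN_20200201T010000_20200301T010000_VH__E048N015T1_EU010M_V1M1R1_S1IWGRDH.tif")
print(single["start_time"], single["orbit"], single["data_version"])
print(monthly["start_time"], monthly["stop_time"], monthly["orbit"])
```

In the first example the stop timestamp is empty, and in the monthly mean example the orbit field is empty, which is why empty fields are mapped to None.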

Access
Access to our Sentinel-1 datacube is designed to be available at many levels and can be chosen depending on a user's needs. For access to the actual data, EODC offers connections via its OpenStack cloud platform and via the VSC (cf. Section 2.1). Both platforms expose the datacube in its plain form on the file system (cf. Section 3.2) and do not hide anything from the user. Compared to complex and abstract data structures, like those utilised in GEE for instance, this approach yields full flexibility, especially when analysing and processing data with standard open-source software (Python, QGIS, etc.). The primary starting point for interacting on a meta-level with the global datacube will be a catalogue service for the web (CSW) application programming interface (API) [34], which is currently in development and will enable a performant (pre-)filtering of a huge number of files.
To facilitate efficient work with the tiling and ordering of our Sentinel-1 datacube, we have been developing a dedicated open-source software package called yeoda [35]. yeoda stands for "your earth observation data access" and has been designed to support straightforward and generic access to manifold EO datasets available as GeoTIFF or NetCDF files. In addition, it provides lower- and higher-level datacube classes that allow data to be filtered, split, and loaded independently of the way they are structured on the hard disk.
In contrast to other datacube architectures, e.g., Open Data Cube [36,37], yeoda does not rely on a database containing stringently defined datasets in the background. Instead, it stays closer to the data by interpreting the file names and metadata of each file. The great advantage is that users can derive higher-level products from the basic σ⁰ backscatter data and can immediately access them with yeoda, since they do not need to be ingested into a database beforehand. Figure 4 shows a hands-on example tailored to our Sentinel-1 datacube, providing a brief glimpse of what one can do with yeoda. First, one needs to prepare the input data, i.e., a list of file paths, which can be retrieved either by crawling through the file system or by making use of the upcoming CSW API. The file naming scheme presented in Section 3.2 is implemented in the YeodaFilename class, which decodes each part of the file name and assigns a certain identifier. A selection of these identifiers (dimensions), the file paths, and the file naming class serve as input to the subsequent initialisation of the SIG0DataCube class, which provides the necessary wrapper functions to properly encode and decode σ⁰ backscatter data. The instance generated this way is then used to show the file-based content of the datacube, to select data only from the year 2020, and to create two datacubes, one for each of the polarisations VV and VH. Finally, we load the decoded σ⁰ backscatter time series for a certain location and visualise it.
    import os
    import glob
    from datetime import datetime

    import matplotlib.pyplot as plt
    from osgeo import osr  # GDAL's spatial reference module

    # import TUW packages
    from yeoda.products.preprocessed import SIG0DataCube
    from geopathfinder.naming_conventions.yeoda_naming import YeodaFilename

    # full directory path to a tile folder, where all files are contained
    tile_dirpath = r"/eodc/products/eodc.eu/S1_CSAR_IWGRDH/SIG0/V1M1R1/EQUI7_EU020M/E048N015T3"
    # our dimensions of interest in compliance with the file naming convention
    dimensions = ['time', 'band', 'extra_field', 'sensor_field']

    # collecting all GeoTIFF images
    filepaths = glob.glob(os.path.join(tile_dirpath, "*.tif"))

This small use case exemplifies the straightforward and user-friendly access to our Sentinel-1 datacube using yeoda, but naturally does not cover all of its functionality. For more details and examples, we would like to point the interested reader to yeoda's documentation [38]. Future releases will contain several additional features:
• datacube instances may persist in memory for consecutive access
• it will be possible to jointly load data beyond the tile boundaries of the Equi7Grid
• more efficient data management in the background
• coupling of datacubes to a database for more performant queries, mimicking Open Data Cube's software architecture
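The filtering and splitting steps described for Figure 4 rest on interpreting file names rather than querying a database. The following sketch imitates these two steps with plain Python on a list of file names. It deliberately does not use the yeoda API; the function names are hypothetical, and the file names are invented examples that follow the naming convention.

```python
import os
from collections import defaultdict


def select_year(filepaths, year):
    """Keep files whose start timestamp (second name field) lies in `year`."""
    return [p for p in filepaths
            if os.path.basename(p).split("_")[1].startswith(str(year))]


def split_by_band(filepaths):
    """Group files by the band/polarisation field (fourth name field)."""
    groups = defaultdict(list)
    for p in filepaths:
        groups[os.path.basename(p).split("_")[3]].append(p)
    return dict(groups)


# Invented file names following the naming convention (not real products).
files = [
    "SIG0_20191230T051705__VV_D095_E048N015T3_EU020M_V1M1R1_S1BIWGRDH.tif",
    "SIG0_20200105T051705__VV_D095_E048N015T3_EU020M_V1M1R1_S1AIWGRDH.tif",
    "SIG0_20200105T051705__VH_D095_E048N015T3_EU020M_V1M1R1_S1AIWGRDH.tif",
    "SIG0_20200111T051705__VH_D095_E048N015T3_EU020M_V1M1R1_S1BIWGRDH.tif",
]

files_2020 = select_year(files, 2020)
by_band = split_by_band(files_2020)
for band, names in sorted(by_band.items()):
    print(band, len(names))
```

Because all selection criteria are encoded in the file names, newly derived products become selectable as soon as they are written to disk with a conforming name.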

Applications
Our Sentinel-1 datacube system has already served many scientific investigations covering topics such as rice mapping [39], vegetation monitoring [40], soil moisture retrieval [41], forest type classification [6], and building height estimation [42]. For an ESA funded study, we used the first version of our worldwide Sentinel-1 datacube system to perform global-scale data aggregation and image mosaicking. The goal was to create a normalised 10 m global backscatter model in support of the design, testing, and verification of future C-band radar missions (Sentinel-1C/D, Sentinel-1 Next Generation, Harmony, etc.), related SAR-processor performance simulations, raw data compression optimisation, and for visualisation purposes [43]. Furthermore, it has been the basis for generating the Sentinel-1 soil moisture data products [41] provided by the Copernicus Global Land Service (CGLS) [44] and the upcoming Sentinel-1 Global Flood Monitoring (GFM) service to be provided by the Copernicus Emergency Management Service [45].
A key feature of our Sentinel-1 datacube system is that it enables offline training of complex retrieval algorithms and usage of calibrated models for NRT processing. This is illustrated by Figure 5 which shows the potential layout of operational processing lines, which transform incoming Sentinel-1 IW GRD scenes (Level-1) into biogeophysical datasets (Level-2) such as soil moisture, snow depth, vegetation optical depth, or flood extent. All these Level-2 datasets have in common that their retrieval is normally impossible when just based on single Sentinel-1 images. Instead, state-of-the-art biogeophysical retrieval algorithms for Sentinel-1 are based on some form of change detection algorithm or require the training of physical models or machine learning methods on the basis of Sentinel-1 backscatter time series [46][47][48][49]. Therefore, in their assessment of the system requirements for a fully automatic flood monitoring system based on Sentinel-1, an expert group organised by the Joint Research Centre of the European Commission recommended using a datacube architecture for the implementation of such a potential monitoring system [50]. While such an architecture is demanding in terms of storage and compute capacities, it allows training offline advanced change detection or machine-learning methods and using them directly thereafter for seamless online processing ( Figure 5). Such offline and online workflows are both important for achieving high accuracy, transferability, and data quality characterisation of the Level-2 data products [51].

Discussion
Our solution for a Sentinel-1 backscatter datacube system has been strongly shaped by our scientific interest in improving the understanding of the interaction of the Sentinel-1 co- and cross-polarised (VV and VH) pulses with the land surface. We aim to develop and operate scientific algorithms (change detection, radiative transfer models, neural networks, etc.) that are fit for retrieving biogeophysical variables on continental to global scales. It is important to us to work directly on the data with the algorithms of our choice, and to retain a complete understanding of how the Sentinel-1 data are preprocessed or otherwise manipulated.
Our file-based datacube system meets these requirements. Nonetheless, to reach a larger and more diverse user community, additional mechanisms to access and work with the Sentinel-1 datacube are needed. There are many routes to achieve this, including the use of advanced datacube software such as the Open Data Cube or rasdaman [52,53], or on a more basic level, through APIs. While we have successfully deployed the Open Data Cube in combination with JupyterHub [54] and GeoServer [55] for serving Sentinel-1 applications over Austria (Austrian Data Cube) [51,56], we have focused on the use of the openEO API [57] as an additional access mechanism to our worldwide datacube. This API standardises contracts between local clients (R, Python, and JavaScript) and cloud service providers regarding data access and processing, mimicking the functionalities of a virtual EO raster datacube independent of the providers' data storage system. In other words, the openEO API grants users simplified access to the datacube, using the local clients' Python, R, or JavaScript, hiding the complexity of accessing and processing the Sentinel-1 data in the cloud.
One of the first openEO use cases that we tested was the compositing of Sentinel-1 backscatter images extracted from GEE and from our 10 m Sentinel-1 datacube for Austria. Despite the very different nature of the two back-ends, openEO allowed us to use identical code at both back-ends for generating Sentinel-1 RGB (red-green-blue) composite images. As illustrated in Figure 6, the composites provide visually intuitive information on local land use and vegetation status, depending on the ratio between the two SAR acquisition channels for VV and VH polarisation. The two individually extracted RGB composites are almost identical, with the only notable discrepancies appearing on closer inspection over the hills west of Vienna, most likely stemming from the different DEMs used during Sentinel-1 preprocessing. GEE's Sentinel-1 backscatter datacube used the worldwide 90 m SRTM DEM, whereas for the Austrian Data Cube available in the EODC back-end we could draw on a 10 m sampled DEM based on a national airborne laser scanning campaign.
In general, GEE follows the strategy of storing the ingested satellite data in their original format in order to preserve their original information [8]. Re-projected or otherwise transformed data products are normally calculated only on demand, without storing intermediate or final data layers. As described by Gorelick [8], when satellite images are ingested into GEE, the "... images are cut into tiles in the image's original projection and resolution and stored in an efficient and replicated tile database. A tile size of 256 × 256 was chosen as a practical trade-off between loading unneeded data vs. the overhead of issuing additional reads. In contrast to conventional "datacube" systems, this data ingestion process is information-preserving: the data are always maintained in their original projection, resolution and bit depth, avoiding the data degradation that would be inherent in resampling all data to a fixed grid that may or may not be appropriate for any particular application".
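The tile-size trade-off mentioned in the quotation can be illustrated with a small calculation. The sketch below is not GEE's implementation; it only counts, for a hypothetical pixel window, how many square tiles must be read and how many pixels are loaded relative to those requested. The window position and size and the alternative tile sizes are arbitrary assumptions chosen for illustration.

```python
def tiles_touched(row0, col0, height, width, tile=256):
    """Number of tile x tile blocks intersecting a pixel window."""
    first_row, last_row = row0 // tile, (row0 + height - 1) // tile
    first_col, last_col = col0 // tile, (col0 + width - 1) // tile
    return (last_row - first_row + 1) * (last_col - first_col + 1)


def read_overhead(row0, col0, height, width, tile=256):
    """Return (number of reads, loaded pixels / requested pixels)."""
    n_reads = tiles_touched(row0, col0, height, width, tile)
    loaded = n_reads * tile * tile
    return n_reads, loaded / (height * width)


# Hypothetical 300 x 300 pixel request starting at row 1000, column 2000.
for tile in (64, 256, 1024):
    n_reads, ratio = read_overhead(1000, 2000, 300, 300, tile)
    print(f"tile={tile:5d}  reads={n_reads:3d}  loaded/requested={ratio:.2f}")
```

Small tiles keep the amount of unneeded data low but require many separate reads, whereas large tiles need few reads but load far more pixels than requested; an intermediate size sits between these two extremes.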
However, contrary to what we assumed in our BiDS contribution [9], for Sentinel-1 GEE deviates from this strategy by first preprocessing and only then tiling the Sentinel-1 images (Gorelick, personal communication, 28 August 2021). The preprocessing is carried out using the Sentinel-1 Toolbox (which is part of SNAP) and involves thermal noise removal, data calibration, multi-looking, and Range-Doppler terrain correction [58]. Hence, at the very basic level, Google's approach for offering Sentinel-1 via its Earth Engine platform is very similar to the Sentinel-1 datacube system presented in this paper, and access to Sentinel-1 time series is also quite fast on GEE [59,60]. Nonetheless, GEE presumably faces similar issues, such as large upfront costs caused by the need to preprocess the data upon ingestion and the need for re-processing campaigns in case of algorithmic updates or changes in the data.

Conclusions
In this technical note, we have presented a Sentinel-1 backscatter datacube system designed to enable global land monitoring applications. Like the Google Earth Engine, it solves the problem of providing fast and efficient access to Sentinel-1 backscatter time series by projecting the Sentinel-1 IW GRD images onto a fixed Earth grid before tiling. This is a costly but nonetheless necessary step, given that the Range-Doppler terrain correction of IW GRD images simply takes too long to be carried out on demand when covering larger regions and/or longer time periods. This problem could be avoided if the Sentinel-1 SAR images, like the optical Sentinel-2 images, were provided in a fixed Earth grid. That said, we would not recommend the UTM system used for Sentinel-2, given that this grid system leads to a significant duplication of the data. By comparison, the Equi7Grid that we use for the Sentinel-1 datacube minimises the data volume, works with only 7 continental regions (instead of 62 UTM zones), and, unlike equal-area projections, minimises signal degradation from pixel deformations and thus allows shape-sensitive operations.
An important improvement to our Sentinel-1 backscatter datacube would be to use the $\gamma^0_{rtf}$ backscattering coefficient as proposed by Small [20] and recommended by CEOS as an ARD standard for normalised radar backscatter [61]. As the processing of $\gamma^0_{rtf}$ with SNAP takes about 2-3 times as long as for $\sigma^0$, we have not yet implemented this. However, investigations are ongoing to reduce the processing times by taking advantage of the high stability of the repeating Sentinel-1 orbits.
Our Sentinel-1 backscatter datacube is open to all interested users (but not free of charge). At present, accessing and working with the data is only possible within the EODC cloud environment, but efforts are ongoing to make the Sentinel-1 backscatter datacube accessible to other users as well by implementing additional access mechanisms and analysis capabilities. Our present work focuses on further developing the openEO API within the ESA-funded openEO Platform project [62]. The longer-term vision is to formalise data and service access for a larger user community, e.g., by making the datacube available through the European Open Science Cloud or through other platforms such as the WEkEO DIAS [63].

Acknowledgments:
The authors thank their colleagues Alena Dostalova, Florian Roth, and Mark Edwin Tupas at the TU Wien GEO Department, and Stefan Reimond at EODC, for maintaining and inspecting the global datacube. Vahid Freeman advanced our concepts for global processing and structuring of satellite data. Special thanks go to Noel Gorelick for giving us insights into GEE. Open Access Funding by TU Wien Bibliothek.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following specific abbreviations are used in this manuscript: