A Sentinel-2 Dataset for Uganda

Abstract: Earth observation data provide useful information for the monitoring and management of vegetation- and land-related resources. The Framework for Operational Radiometric Correction for Environmental monitoring (FORCE) was used to download, process and composite Sentinel-2 data from 2018–2020 for Uganda. Over 16,500 Sentinel-2 data granules were downloaded and processed from top-of-atmosphere reflectance to bottom-of-atmosphere reflectance and higher-level products, totalling >9 TB of input data. The output data include the number of clear sky observations per year, the best available pixel composite per year and vegetation indices (mean EVI and NDVI) per quarter. The aim of the study was to provide analysis-ready data for all of Uganda from Sentinel-2 at 10 m spatial resolution, allowing users to bypass some basic processing and, hence, facilitating environmental monitoring. DOI: 10.5878/bc12-w579.


Uganda
Uganda covers about 241,500 km², of which ∼197,000 km² is land and the rest is water and permanent wetlands (Figures 1 and 2). The current (2021) population of about 44 million is expected to reach about 90 million in the year 2050 and about 137 million in the year 2100 [1], indicating an increasing pressure on resources based on ecosystem services such as food, feed, fuel, and fibre [2]. This stresses the importance of land resource monitoring, supporting efficient and sustainable land use related to forestry and agriculture [3,4], nature conservation, and the preservation of biodiversity and ecosystem services.
A range of environmental changes in Uganda have recently been reported, including land use/land cover changes [5,6], vegetation changes [7,8], drought [9], and negative soil changes [10]. Flooding, landslides, mud flows, flash floods [11–13] and related disasters are also significant risks in Uganda's mountainous regions. These phenomena can partly be affected by vegetation changes due to forest fires [12], land cover changes and deforestation [14].

Earth Observation
Earth observation (EO) is an efficient tool for the monitoring of land use, land cover and vegetation, as well as related changes (e.g., drought, flooding, deforestation), as it is the only tool with full global cover at high temporal (daily to weekly) and spatial (10–30 m) resolution. The Sentinel-2 mission comprises two polar-orbiting satellites, Sentinel-2A and Sentinel-2B, launched in 2015 and 2017, respectively. Each satellite has a revisit time of about 10 days at the equator, which gives about 5 days for the two satellites combined. Each satellite carries a multispectral instrument (MSI) configured as a push-broom scanner. The MSI includes 13 spectral bands in three spatial resolutions (four bands at 10 m spatial resolution, six bands at 20 m, and three bands at 60 m, Table 1, [15]). Sentinel-2 is suitable for mapping vegetation, land cover, and phenology, as well as associated changes over time [16].
All Sentinel-2 data are freely available from ESA [17] or from other data repositories [18,19] and about 4 TB of Sentinel-2 data are published daily [20]. The handling and processing of such large datasets can be facilitated by access to high-performance computing resources or cloud-based computing resources. Such handling and processing may be slow and inconvenient using standard desktop PCs. An intention here is to provide analysis-ready data (ARD) that can help to avoid some of this basic processing.

Aim
This study aims to present, describe and provide access to analysis-ready Sentinel-2 data covering Uganda for the period from 2018 to 2020.
Sentinel-2 data (Level 1C, top of the atmosphere reflectance) for 2018 (4900 granules), 2019 (4944 granules) and 2020 (4929 granules) for 45 tiles over Uganda (Figure 1) were downloaded from Google Cloud Platform (https://console.cloud.google.com/storage/browser/gcp-public-data-sentinel-2 (accessed on 1 November 2020)). Each tile covers approximately 100 × 100 km and comprises about 600 MB of data in the original UTM/WGS84 projection. The tiling definition follows the US-MGRS (Military Grid Reference System) approach [22]. It is based on the standard 6° longitude by 8° latitude UTM zones, divided into 100 × 100 km tiles. The first two characters of the tile name (e.g., 36NTK) indicate the UTM zone (35 and 36 for Uganda); the third character indicates the 8° latitude band (M and N for Uganda). Characters 4 and 5 denote the 100 × 100 km tile within the grid zone [23].
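The tile naming described above can be illustrated with a minimal Python sketch that splits a five-character tile name into its three parts. The function and class names (parse_mgrs_tile, MgrsTile, central_meridian) are hypothetical and chosen only for illustration; the central-meridian formula is the standard one for 6° UTM zones.

```python
from typing import NamedTuple


class MgrsTile(NamedTuple):
    utm_zone: int        # UTM zone number (35 or 36 for Uganda)
    latitude_band: str   # 8-degree latitude band letter (M or N for Uganda)
    square_id: str       # two letters naming the 100 x 100 km square


def parse_mgrs_tile(name: str) -> MgrsTile:
    """Split a five-character Sentinel-2 tile name such as '36NTK'."""
    name = name.strip().upper()
    if len(name) != 5 or not name[:2].isdigit() or not name[2:].isalpha():
        raise ValueError(f"not a five-character tile name: {name!r}")
    zone = int(name[:2])
    if not 1 <= zone <= 60:
        raise ValueError(f"UTM zone out of range: {zone}")
    return MgrsTile(zone, name[2], name[3:])


def central_meridian(utm_zone: int) -> int:
    """Central meridian of a 6-degree UTM zone, in degrees east."""
    return utm_zone * 6 - 183


tile = parse_mgrs_tile("36NTK")
print(tile, central_meridian(tile.utm_zone))
```

Such a helper could be used, for example, to group downloaded granules by UTM zone before reprojection.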

Land Cover
As an additional dataset (not used during the Sentinel-2 processing presented here) the 20 m spatial resolution S2 prototype land cover map for Africa was used, with reference year 2016 [21] (Figure 2). The land cover map was resampled to 10 m spatial resolution using nearest neighbour, and with identical tiling and map projection parameters to the Sentinel-2 data (Figure 3). This additional dataset may be useful in further analysis of the Sentinel-2 data for Uganda presented here.
Figure 3. The SRTM 30 m spatial resolution elevation data used in the topographic calibration of the Sentinel-2 data. Overlaid is the non-overlapping output grid, which is part of the GLANCE grid for Africa [24]. Water areas in blue.

Digital Elevation Data
Digital elevation data are required for the topographic correction of EO data in the Framework for Operational Radiometric Correction for Environmental monitoring (FORCE). The 1-arc-second (about 30 m spatial resolution) digital elevation model (DEM) derived from the Shuttle Radar Topography Mission (SRTM) was downloaded from http://e4ftl01.cr.usgs.gov/ (accessed on 3 November 2020) using the 30-Meter SRTM Tile Downloader [25] and combined into a virtual mosaic using the Geospatial Data Abstraction Library (GDAL) [26] (Figure 3).

Processing
All processing of Sentinel-2 data was performed using FORCE (v 3.4-3.6) [27,28]. FORCE "is an all-in-one solution for the mass-processing and analysis of medium-resolution satellite image archives for large area + time series applications" and is open source software under the terms of the GNU General Public License [29]. FORCE can be downloaded from GitHub [30]. Figure 4 gives an overview of the processing flow. The outputs of this study (grey shaded boxes in Figure 4) include a set of measures of NDVI and EVI (min., max., average, standard deviation, 5%, 50% and 95% quantiles), calculated annually, and average NDVI and EVI, calculated quarterly. Clear sky observations (CSO) and best available pixel (BAP) composites are produced annually. The 20 m resolution bands are resampled to 10 m spatial resolution with the ImproPhe method. Additionally available (but unused in this study) FORCE process sub-modules utilizing the FORCE data cube are labelled FORCE. Other software tools suitable for time series analysis (e.g., TIMESAT [31]), trend analysis (e.g., Poly_Trend [32]) and change detection in time series (e.g., DBEST [33] and BFAST [34]) can be utilized, but these normally require a time series of level 2 data.
The hardware used was a Dell PowerEdge R730 with two sockets carrying Intel Xeon (2.6 GHz) CPUs, allowing up to 72 processes to be executed in parallel. Two Redundant Array of Independent Disks (RAID-6) systems, of 73 TB each, were available for data storage. In the CentOS 7-based multiuser system, resources were shared with other users, and the number of parallel processes was limited to about 50–70% of the available capacity.

Radiometric, Atmospheric and Topographic Corrections
All downloaded Sentinel-2 data were level 1C, i.e., top of the atmosphere (TOA) reflectance [35], and were processed to level 2, bottom of atmosphere (BOA) reflectance, using FORCE. The basic processing scheme is described by Frantz et al. [36]. Cloud masking is based on a modified version of Fmask, as detailed in [36]. The radiometric correction includes a radiative transfer atmospheric correction, and the aerosol optical depth is estimated over dark objects. Topographic normalization in FORCE is performed by a modified C-correction [36,37]. The spatial resolution of the Sentinel-2 20 m bands was improved using the original 10 m bands as targets.
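To make the idea of topographic normalization concrete, the following minimal Python sketch implements the standard (unmodified) C-correction. It is not the modified variant used in FORCE, whose details are given in [36,37]. The function names are hypothetical, and the empirical parameter c is passed in directly here, whereas in practice it is estimated from a regression of reflectance against the illumination cosine.

```python
import math


def cos_illumination(sun_zenith_deg, sun_azimuth_deg, slope_deg, aspect_deg):
    """Cosine of the local solar incidence angle on a tilted surface."""
    sz = math.radians(sun_zenith_deg)
    sl = math.radians(slope_deg)
    rel_az = math.radians(sun_azimuth_deg - aspect_deg)
    return (math.cos(sz) * math.cos(sl)
            + math.sin(sz) * math.sin(sl) * math.cos(rel_az))


def c_correction(reflectance, cos_i, sun_zenith_deg, c):
    """Standard C-correction: rho_flat = rho * (cos(sz) + c) / (cos(i) + c).

    c is an empirical, band-specific parameter (an input assumption here).
    """
    cos_sz = math.cos(math.radians(sun_zenith_deg))
    return reflectance * (cos_sz + c) / (cos_i + c)


# A slope facing away from the sun is darker than flat terrain,
# so the correction brightens it.
cos_i = cos_illumination(30.0, 90.0, 20.0, 270.0)
print(c_correction(0.20, cos_i, 30.0, c=0.5))
```

On flat terrain the illumination cosine equals the cosine of the sun zenith angle, so the correction factor is 1 and the reflectance is left unchanged.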
The atmospheric correction performance of FORCE was evaluated and compared to other processors in a recent inter-comparison exercise [38], which stated, 'The results of the APU (accuracy, precision, and uncertainty) analysis . . . indicating that LaSRC, FORCE, and MACCS provided accurate and robust surface reflectance estimates for all the cases', (LaSRC = Landsat 8 Collection 1 Land Surface Reflectance Code, MACCS = multi-sensor atmospheric correction and cloud screening spectro-temporal processor). Further details are available in [28] and references therein. In total, 16,773 level 1C (TOA) Sentinel-2 granules were downloaded and processed to level 2 (BOA).

Geometric Correction
FORCE uses a data cube concept, where the level 2 and higher levels of output are reprojected into a common coordinate system and organized in non-overlapping tiles (Figure 3). Each output tile is defined as a square of 15,000 rows (lines) by 15,000 columns (pixels), corresponding to 150,000 by 150,000 m. Tiles (X0042_Y0025, X0042_Y0025, X0046_Y0028 and X0046_Y0029) outside the Uganda national border were not processed. All output data are created in a Lambert Azimuthal Equal Area projection corresponding to the Global LANd Cover mapping and Estimation (GLANCE) grid for Africa [24]. The spatial resolution of the 20 m Sentinel-2 bands (Table 2) was improved to 10 m using the native 10 m bands and the ImproPhe method [39]. Hence, all output bands were produced with a 10 × 10 m spatial resolution.
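The tile arithmetic above can be checked with a short Python sketch. The helper names (parse_force_tile, offset_m) are hypothetical. The offsets are expressed along the grid X and Y axes only; the geographic direction of increasing tile indices depends on the data cube definition and is not assumed here.

```python
import re

TILE_PIXELS = 15_000                       # rows and columns per tile (from the text)
PIXEL_SIZE_M = 10                          # output pixel size in metres (from the text)
TILE_SIZE_M = TILE_PIXELS * PIXEL_SIZE_M   # 150,000 m per tile side


def parse_force_tile(tile_id):
    """Return the (x, y) indices of a tile id such as 'X0042_Y0027'."""
    match = re.fullmatch(r"X(\d{4})_Y(\d{4})", tile_id)
    if match is None:
        raise ValueError(f"unexpected tile id: {tile_id!r}")
    return int(match.group(1)), int(match.group(2))


def offset_m(from_tile, to_tile):
    """Distance between two tile origins along the grid X and Y axes, in metres."""
    x0, y0 = parse_force_tile(from_tile)
    x1, y1 = parse_force_tile(to_tile)
    return (x1 - x0) * TILE_SIZE_M, (y1 - y0) * TILE_SIZE_M


print(TILE_SIZE_M, offset_m("X0042_Y0027", "X0044_Y0028"))
```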

Clear Sky Observations
For each year, the number of clear sky observations (CSO) was calculated [40]. The number of CSOs may, for example, influence the possibility of detecting the time of harvest and similar events, and can hence be used when planning EO studies. CSO may also be useful in planning time series analysis for the assessment of phenology or other phenomena, as well as in deciding whether a multi-sensor dataset is needed, for example, Sentinel-2 in combination with Landsat [41].
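Conceptually, a CSO layer is a per-pixel count of cloud-free acquisitions. The following minimal Python sketch shows this counting on small nested lists; the function name and the boolean mask representation are assumptions for illustration and do not reflect how FORCE stores quality information internally.

```python
def count_clear_sky(clear_masks):
    """Count clear observations per pixel.

    clear_masks: list of equally sized 2-D grids (lists of rows), one grid
    per acquisition, where True marks a clear-sky pixel.
    """
    if not clear_masks:
        return []
    rows, cols = len(clear_masks[0]), len(clear_masks[0][0])
    counts = [[0] * cols for _ in range(rows)]
    for mask in clear_masks:
        for r in range(rows):
            for c in range(cols):
                if mask[r][c]:
                    counts[r][c] += 1
    return counts


# Three acquisitions over a 2 x 2 pixel window.
masks = [
    [[True, False], [True, True]],
    [[True, False], [False, True]],
    [[False, False], [True, True]],
]
print(count_clear_sky(masks))
```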

Best Available Pixel Composite
FORCE Level 3 compositing was used to generate seamless and gap-free composites of reflectance from temporal aggregations of level 2 data using the best available pixel (BAP) compositing [42]. BAP composites were produced using static target dates [43] for each year.
For each composite, two metadata files were produced in addition to the reflectance data in the BAP composites: one file with compositing information (INF) and one file with a compositing score (SRC). The INF product contains information about the selected observation in the BAP product, as specified in Table 2. It is a multi-band image including pixel-wise information on data quality, the day and year of the included observations, and the sensor (Sentinel-2A/B) of the best observation.
The Quality Assurance Information (QAI) is a score of the quality indicators used to create the BAP by selecting the pixel with the highest score for each location. All available observations for a specific pixel are assessed in terms of their suitability for the composite [42,44] using seven quality indicators. These indicators concern missing data, clouds (several indicators), cloud shadows, snow, and where the spectral reflectance is estimated to be <0 or >1.
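The selection step described above, keeping the observation with the highest score at each location, can be sketched in Python as follows. The Observation class, its fields, and the use of None for unusable observations are hypothetical conventions for this illustration; the actual scoring in FORCE is described in [42,44].

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Observation:
    day_of_year: int
    score: Optional[float]   # None marks an unusable observation (assumed convention)
    reflectance: int         # scaled reflectance for one band


def best_available_pixel(observations: Sequence[Observation]) -> Optional[Observation]:
    """Return the usable observation with the highest score, or None."""
    usable = [obs for obs in observations if obs.score is not None]
    if not usable:
        return None
    return max(usable, key=lambda obs: obs.score)


series = [
    Observation(day_of_year=170, score=0.62, reflectance=2400),
    Observation(day_of_year=175, score=None, reflectance=-9999),  # e.g. cloud
    Observation(day_of_year=180, score=0.91, reflectance=2550),
]
print(best_available_pixel(series))
```

A pixel for which every observation is unusable remains a gap in the composite, which corresponds to the None return value here.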

Output Format and Naming Convention
All level 2 data were produced in the ENVI format, including a binary data file (*.dat) and a separate file with header information/metadata (*.hdr) [47]. Due to the size of the level-2 data (>50 TB), these were not made available in any repository.
All level-3 or higher output data are stored as 16-bit GeoTIFF files, each covering one tile of 15,000 rows × 15,000 columns, with 10 m spatial resolution and 10 bands (Figure 3, Table 1). All Sentinel-2 output data are scaled by a factor of 10,000 (a reflectance of 0.5 is represented by 5000). Missing values are represented by −9999. Output files were organized as described in Appendix A and named according to the FORCE naming conventions described in Appendix B.
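When reading the files, stored integers need to be rescaled and the missing-value code handled explicitly. A minimal Python sketch of this conversion is given below; the function names are hypothetical, while the scale factor and missing-value code are those stated above.

```python
SCALE_FACTOR = 10_000   # a reflectance of 0.5 is stored as 5000
NODATA = -9999          # code for missing values


def to_reflectance(stored):
    """Convert a stored 16-bit integer to reflectance; None if missing."""
    if stored == NODATA:
        return None
    return stored / SCALE_FACTOR


def to_stored(reflectance):
    """Convert reflectance back to the scaled integer representation."""
    if reflectance is None:
        return NODATA
    return round(reflectance * SCALE_FACTOR)


print(to_reflectance(5000), to_reflectance(-9999), to_stored(0.5))
```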

Dataset Structure and Reproduction
The dataset is spatially structured as a data cube with non-overlapping tiles, allowing one or a set of tiles to be processed when a sub-national study area is desired. Temporally the dataset is structured into quarterly (average NDVI and EVI) and annually integrated variables (CSO, BAP and descriptive measures of NDVI and EVI; see Section 2.2.5), allowing flexible temporal sub-setting independently from, or integrated with, the spatial sub-setting.
The completed level 2 dataset can be reproduced from the original level 1C data using the FORCE parameter files that determine processing details (available on request from the author).

Results
The output of the level-2 and level-3 processing is tiled (Figure 3), where each tile is 150 × 150 km, corresponding to 15,000 rows by 15,000 columns for the 10 × 10 m Sentinel-2 pixels.

Level-2 Data
Level-2 data contain BOA reflectance for all ingested level 1C data. Each level-2 output file includes 10 wavelength bands (Table 1). Further specifications for the Sentinel-2 bands and radiometric properties can be found in [15] and in Table 1.

Vegetation Indices
Average EVI and NDVI were created for each quarter of each year, as well as average EVI and NDVI for each year, as exemplified in Figure 5.
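For readers who wish to recompute the indices from reflectance, the following Python sketch uses the common definitions of NDVI and EVI. The EVI coefficients (gain 2.5, 6, 7.5 and 1) are the widely used standard values and are assumed here rather than taken from the FORCE configuration; the function names are hypothetical. Inputs are reflectances in the range 0–1, not scaled integers.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index."""
    denom = nir + red
    if denom == 0:
        return None
    return (nir - red) / denom


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, soil=1.0):
    """Enhanced vegetation index with the commonly used coefficients."""
    denom = nir + c1 * red - c2 * blue + soil
    if denom == 0:
        return None
    return gain * (nir - red) / denom


def mean_ignoring_missing(values):
    """Average of the valid values, e.g. all observations in one quarter."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


print(ndvi(0.5, 0.1), evi(0.5, 0.1, 0.05))
print(mean_ignoring_missing([0.2, None, 0.4]))
```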

Clear Sky Observations (CSO)
The number of available clear-sky observations (CSO) varies strongly in space but shows a similar pattern from year to year (Figure 7).

Data Availability and Download
Data can be downloaded from the URLs in Table 3.

Discussion
Information derived from EO data can support scientific and applied investigations, as well as the monitoring and planning of land-use/land-cover-related activities, and hence contribute to the sound management of ecosystem services [2,48]. The provision of analysis-ready data can be a suitable starting point for deriving such information.
FORCE, the free software used here, is highly suitable for the mass processing of EO data such as Sentinel-2 or Landsat. As the Sentinel-2 data and the other data used are also free, basic processing, in principle, can be carried out by users. However, the Sentinel-2 data presented here comprise several TBs, and the processing can be demanding and time-consuming when using standard computing. FORCE also includes a range of tools for higher-level processing [49], including time series analysis, machine learning, and landscape metrics, allowing for further analysis of the analysis-ready data.
The aim of this study was to provide analysis-ready data from Sentinel-2 for Uganda, allowing users to avoid some basic data handling and processing. This aim has been fulfilled. These data provide a potential basis for further national and regional studies based on EO in Uganda, and can potentially increase the use and applications of Sentinel-2 data. The data provided here can be extended with, for example, older Landsat data, allowing retrospective studies of land cover and land cover changes. They can also serve as a basis for future assessment efforts or near real-time monitoring [50].
Figure 8 shows the difference (increase or decrease >0.1) in mean EVI from 2018 to 2020 for a part of Uganda (Mount Elgon) as an example of how the output data can be used to identify environmental changes. EVI is strongly related to carbon assimilation [51,52] and provides an estimate of the productivity of the target vegetation studied. Therefore, regions of increasing and decreasing vegetation productivity can be detected either using time-integrated measures, as in Figure 8, or using trend analysis [32,53] and detection of changes [33]. A recent review of land-use change and land-cover change in Uganda provides an overview of the leading drivers of such changes [54]. EO-based monitoring of biomass [55] and terrestrial carbon stocks [5] further shows potential applications of the dataset for the quantification of resources related to crucial ecosystem services [2,56]. The datasets presented are suitable for the classification of land use, land cover and related vegetation properties, especially properties associated with NDVI and EVI (as quarterly data are provided), but any vegetation index or band combination can be computed from the annual BAP composites, as they include all ten wavelength bands (Table 1).
Change detection on an annual basis can be performed for any spectral band combination covered by the data, and on a quarterly basis using NDVI and EVI. A further outline of applications of this dataset is outside the scope of the paper, but the author is open to collaboration and suggestions of case studies.
The uncertainties of this dataset are related to ESA's processing up to level 1C and to processing within FORCE. ESA reports monthly on geometric and radiometric performance [57], stating absolute geolocation without ground control points to be <11 m (95% confidence) and absolute radiometric uncertainty for bands 1 to 12 (excluding band 10) to be <3% ± 2% [58]. Uncertainties originating from FORCE related to radiometry are comparable to other similar methods [38]. Hence, the quality of the Sentinel-2 dataset is good, and ESA will further increase the geometric quality when the planned processing baseline 03.00 is introduced [58], covering both Europe and Africa.
The CSO (Figure 7) indicates the number of clear-sky observations at various locations, hence supporting the planning of what type of analysis could be suitable in various areas. In mountain regions of Uganda, it can be difficult to obtain cloud-free observations annually, even with satellites passing as often as every fifth day. Hence, there can still be missing data due to cloud and cloud shadows in the BAP composites (Figure 6). The vegetation indices are functions (average, quantiles, etc.) of several observations and, hence, less directly influenced by clouds and missing data.
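A simple form of such change detection is to threshold the difference between two annual mean EVI values, as in the Mount Elgon example with a threshold of 0.1. The Python sketch below illustrates this for a single pixel; the function name and the class labels are hypothetical, and missing values are represented by None.

```python
def classify_change(evi_start, evi_end, threshold=0.1):
    """Label the change in mean EVI between two years for one pixel."""
    if evi_start is None or evi_end is None:
        return "nodata"
    diff = evi_end - evi_start
    if diff > threshold:
        return "increase"
    if diff < -threshold:
        return "decrease"
    return "stable"


# (mean EVI 2018, mean EVI 2020) for four example pixels.
pairs = [(0.30, 0.45), (0.45, 0.30), (0.30, 0.32), (None, 0.40)]
print([classify_change(a, b) for a, b in pairs])
```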
There are currently no direct plans to continue to maintain the level 2 and level 3 analysis-ready data presented beyond 2020.
Whether the future handling and analysis of EO data belong to cloud-based computing and storage services or local services is still uncertain [59,60]. The answer is probably that both strategies will be valid, for the near future at least, even if the produced datasets grow faster than the corresponding ability to analyse the data [61], especially on local platforms. Platforms like Digital Earth Africa (https://www.digitalearthafrica.org/, accessed on 1 March 2021) may provide support for the transition from local to cloud-based processing, assisting in handling the ever-growing volume of data produced. Using FORCE and similar tools in cloud-based computing services could further increase data-handling efficiency, avoiding the shuffling of TBs of data, and support environmental monitoring.

Acknowledgments: Thanks to David Frantz and co-workers, Humboldt University, Berlin, for making the FORCE software freely available. The SRTM elevation data provided by NASA and the ESA Climate Change Initiative S2 prototype LC 20 m map of Africa 2016 provided by ESA are both acknowledged as useful contributions.

Conflicts of Interest:
The author declares no conflict of interest.

Appendix A. File Organization
All level 3 or higher-level products (CSO, BAP, and vegetation indices) are organized in one directory per FORCE tile (Figure 3) and one directory with virtual mosaics [26]. Hence, each directory includes files with the same file names, distinguished by the directory they are stored in and the corresponding geo-location. Each directory is compressed with tar -zcvf archive-name.tar.gz directory-name where the archive-name is the same as the directory-name. This means that all files belonging to directory ./X0042_Y0027 are stored in the file X0042_Y0027.tar.gz. Extract the files with tar -zxvf archive-name.tar.gz or equivalent functionality. A list of directories (and compressed files) is given below. In order to access the virtual mosaics in the ./mosaic folder, all directories must be downloaded and uncompressed, or a new virtual mosaic created [26]. A definition of the data cube projection and settings used is available in a plain-text file called datacube-definition.prj in the standard format used by FORCE.
The BAP files are named according to the FORCE 29-digit naming convention [62]. For example, 20190629_LEVEL3_SEN2L_BAP.tif denotes a level 3 BAP product from Sentinel-2 with 29 June 2019 as the target date (Table A1).
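As an illustration of the BAP naming, the following minimal Python sketch splits a file name into its underscore-separated fields and derives the archive that holds a given tile directory. The function names and dictionary keys are hypothetical; only the example file name and the archive naming rule are taken from the text.

```python
from datetime import datetime


def parse_bap_name(filename):
    """Split a BAP file name such as '20190629_LEVEL3_SEN2L_BAP.tif'."""
    stem = filename.rsplit(".", 1)[0]
    date_str, level, sensor, product = stem.split("_")
    return {
        "target_date": datetime.strptime(date_str, "%Y%m%d").date(),
        "level": level,
        "sensor": sensor,
        "product": product,
    }


def archive_for_tile(tile_id):
    """Name of the compressed archive that holds one tile directory."""
    return f"{tile_id}.tar.gz"


info = parse_bap_name("20190629_LEVEL3_SEN2L_BAP.tif")
print(info["target_date"].isoformat(), info["product"])
print(archive_for_tile("X0042_Y0027"))
```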

Elevation and Land Cover Data
All digital elevation data files (one file for each FORCE tile) are named Uganda_DEM.tif and the virtual mosaic file is named Uganda_DEM.vrt. All land cover files (one file for each FORCE tile) are named ESACCI-LC_uganda.tif and the virtual mosaic file is named ESACCI-LC_uganda.vrt.

Appendix B.3. CSO
The CSO files are named according to the FORCE 37-digit naming convention for CSO [63]. For example, 2018-2018_001-365-12_HL_CSO_SEN2L_NUM.tif is the number of CSO for the period DOY 1 to DOY 365 in the year 2018 in GeoTIFF format (Table A2).
The vegetation index files are named according to the FORCE 42 to 46 digit naming convention [64]. For example, 2020-2020_001-365_HL_TSA_SEN2L_EVI_FBQ_QUARTER-4.tif is the average EVI for year 2020, quarter 4 (October–December), in GeoTIFF format. HL stands for higher level and TSA for time series analysis, the FORCE sub-module used (Table A3).
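The two example file names share a common leading structure (year range, day-of-year range, level, sub-module, sensor), which the following minimal Python sketch parses generically. The function names and dictionary keys are hypothetical, and trailing fields are returned as uninterpreted tags rather than given a meaning that the text does not state.

```python
def parse_higher_level_name(filename):
    """Parse CSO or vegetation index file names from the examples above."""
    stem = filename.rsplit(".", 1)[0]
    parts = stem.split("_")
    if len(parts) < 6:
        raise ValueError(f"unexpected file name: {filename!r}")
    start_year, end_year = (int(y) for y in parts[0].split("-"))
    doy = parts[1].split("-")   # any third element is left uninterpreted
    return {
        "start_year": start_year,
        "end_year": end_year,
        "start_doy": int(doy[0]),
        "end_doy": int(doy[1]),
        "level": parts[2],       # HL = higher level
        "module": parts[3],      # CSO or TSA
        "sensor": parts[4],
        "tags": parts[5:],       # e.g. ['NUM'] or ['EVI', 'FBQ', 'QUARTER-4']
    }


def quarter_months(quarter):
    """First and last calendar month of a quarter (1-4)."""
    first = 3 * (quarter - 1) + 1
    return first, first + 2


cso = parse_higher_level_name("2018-2018_001-365-12_HL_CSO_SEN2L_NUM.tif")
tsa = parse_higher_level_name("2020-2020_001-365_HL_TSA_SEN2L_EVI_FBQ_QUARTER-4.tif")
print(cso["module"], cso["tags"], tsa["tags"], quarter_months(4))
```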