10 m Sentinel-2 A Cloud-Free Composite – Southern Africa 2016

The processing of cloud free geo-referenced imagery is one of the preliminary processing step of any land application. This letter describe the methodology developed to obtain a seamless cloud free composite of Africa for 2016 using Sentinel-2A data at 10 meters resolution freely available from the European Space Agency. The method is based on an hybrid method resulting from the merging of the two most robust time series methods namely the "darkest pixel" and the "maximum NDVI" previously developed with AVHRR time series.


Introduction
The study described in this article intends to demonstrate and promote the Sentinel-2A (S2A) capacity to produce a Top Of Atmosphere (TOA) yearly cloud-free composite over Southern Africa up to 8° S at 10 meters resolution, including Blue (490 nm), Green (560 nm), Red (665 nm) and NIR (842 nm) spectral bands.
The S2A mission offers an unprecedented combination of systematic global coverage of land coastal areas, a high revisit under the same viewing conditions, high spatial resolution, and a wide field-of-view (FOV) for multispectral observations from 13 bands in the visible, near infrared and short wave infrared range of the electromagnetic spectrum [1].
As reported in the conclusion of the WorldCover 2017 conference [2], the main objective of Land Cover Community is to have a global high resolution land cover map, and the first step to obtain this is to generate HR cloud-free composites.
The S2A data used in this activity have been acquired from 1st January 2016 to 5th December 2016, the end date does not coincide to the end of the year due to the change in the S2A data format.The input data have been processed with a simple 'two-steps' algorithm based on three-monthly darkest pixel selection to remove clouds and successively yearly composite based on max-NDVI method to further remove cloud shadows.The area of interest (AOI) covers a surface of about 7430000 km² and has been processed from 740 S2A L1C granules (110km x 110km, at 10m spatial resolution).
The time series of S2A images has been processed using CalESA system, described in the next paragraph.The yearly composites are based on S2A L1C tiling grid (110km x 110km) and then mosaicked using GDAL utilities and Python routines based on the geo-location information associated to the pixels included in the metadata.

System description
Due to the quantity of S2A data over the AOI, 15 TB, a system for semi-automatic generation of cloud-free composite has been developed.This approach allows us to use the same resources for data storage and processing.Processing of the S2A data was done on CalESA system, a cluster composed of 21 server PC, 2 Tower server PC, 1GB network switch, that summarise 120 CPU Core, 580 GB shared Memory and 386 TB storage space.CalESA system comprises the cluster powered by Hadoop, the EO data processing system and the Distributed File System, that stores S2A data.The processing system comprises a processor adapter allowing to integrate Unix or python scripts and SeNtinel Application Platform (SNAP) processor.
The system is based on Hadoop 2.7 and Calvalus with SNAP-core.Logical system hierarchy is as follow: 1 Master PC for process management and HDFS supervision, 20 nodes (slaves) PC for processing and HDFS storage and 2 additional PC for downloading and products ingestion (and products translation for viewer).
Basic platform components: The S2A products acquired before the 5th October 2016 contain more than one S2A L1C granule (110km x 110km), used as basis for the cloud-free composite, therefore the first action was to extract each of them in order to create a stack of data based on S2A L1C tiling grid.
Due to the abundance of S2A data, in order to reduce the processing time, we decided to apply a preliminary screening based on cloud coverage information reported in the metadata (less than 5%), except for the areas with high persistency of clouds where all the available pixels are needed.
The original S2A L1C products are in Universal Transverse Mercator (UTM) coordinate system (CS) based on WGS-84 ellipsoid, this system divides the Earth between 80°S and 84°N latitude into 60 zones, each 6° of longitude in width but it cannot be applied as CRS to map entire Africa.For this reason we re-projected all S2A L1C products in Geographic Latitude and Longitude CS.

Darkest pixel method
Usually, to generate cloud-free maps complex algorithms for clouds detection and removal are applied, such as Automated Cloud-Cover Assessment Algorithm (ACCA) [3] or F-mask [4].
To simplify the processing, we decided to use a simpler method that chooses the darkest value for the 10 meters bands (VIS and NIR) for each pixel of the three-monthly stack of S2A images, as shown in Figure 1.As described by Saunders [5], in the histogram of visible reflectance the land and sea peaks are well defined at the 'dark' end with cloudy reflectance producing a broad higher reflectance tail.
The three-monthly intermediate composites replace cloudy pixels with cloudless ones but also cloud shadows pixels are included.

Maximum NDVI method
The Normalised Difference Vegetation Index (NDVI) measures the spectral response of the vegetation and it is defined as difference between near-infrared and red reflectance divided by the sum of the two (NIR-Red)/(NIR+Red).As reported by Holben in the Maximum-Value Composite (MVC) procedure [6], any attenuation of the spectral signals will reduce the NDVI value from that of an unattenuated signal.Therefore, the best possible pixel value for a particular location is achieved by choosing the highest pixel value from multi-temporal data (Figure 2).Peer-reviewed version available at Remote Sens. 2017, 9, 652; doi:10.3390/rs9070652

Combining method
Comparing the output of the two methods described above, we observed that each method taken individually is insufficient, and for this reason we combined them in the most efficient manner considering the processing time and final results.
STEP-1 -To remove the clouds, for each quarter (January-March, April-June, July-September, October-December) darkest pixel selection from the stack of S2A L1C TOA images identified as input.
STEP-2 -From the quarterly composites choose the best pixel applying the "max-NDVI" method replacing the pixels affected by cloud shadow with clear ones, if present.

Post-processing
To visualise the final product we developed a series of python scripts that combine different GDAL utilities described by R. Simmons [7], applying histogram stretching to the RGB bands, input data compression and generating an overview at different zoom levels of each S2A L1C TOA tile.
The output of the post-processing is the RGB mosaic at 10m spatial resolution available at http://2016africacomposite10m.esrin.esa.int/.
To display and interact with the map on the web page we used Leaflet an open-source JavaScript library.

Results
The final result shown in Figure 3, obtained applying our simple processing chain, is comparable to the ones generated using more complicated algorithms.

Discussion
The HR cloud-free composites are of crucial importance for the generation of HR land cover maps that represent key point to support monitoring of changes to vegetation within the growing season, to improve the efficiency of management of agricultural resources and to avoid emerging threats to the future of agriculture (e.g.land degradation, scarcity of land and water resources, biodiversity loss and climate change).
The near-future activities in the frame of the study reported in this article will be focused to generate the cloud-free composite over entire Africa for 2016 at 10 meters spatial.

Conclusions
The strong points of the 'two steps' method described in this study are the easy implementation, the quick computation time and the possibility to export and to adapt it easily on cloud processing system expanding the AOI and the time series.