Automated Mosaicking of Sentinel-2 Satellite Imagery

: Repeat frequencies of optical remote sensing satellites have been increasing over the last 40 years, but there is still dependence on clear skies to acquire usable imagery. To increase the quality of data, composited mosaics of satellite imagery can be used. In this paper, we develop an automated method for clearing clouds and producing different types of composited mosaics suitable for use in cloud-affected countries, such as New Zealand. We improve the Tmask algorithm for cloud detection by using a parallax method to produce an initial cloud layer and by using an object-based cloud and shadow approach to remove false cloud detections. We develop several parametric scoring approaches for choosing best-pixel composites with minimal remaining cloud. The automated mosaicking approach produced Sentinel-2 mosaics of New Zealand for ﬁve successive summers, 2015/16 through 2019/20, with remaining cloud being less than 0.1%. Contributing satellite overpasses were typically of the order of 100. In comparison, manual methods for cloud clearing produced mosaics with 5% remaining cloud and from satellite overpasses typically of the order of 20. The improvements to cloud clearing enable the use of all possible Sentinel-2 imagery to produce automatic mosaics capable of regular land monitoring, at a reasonable cost.


Introduction
Optical remote-sensing satellites have been observing the Earth for over 40 years [1], providing a consistent, long-term, large-area data record [2]. These data have been used for many applications, including monitoring crop production [3], land cover and land-use change [4], deforestation [5], ecosystem health [6], and water quality [7]. Recently, the commissioning of the Sentinel-2 satellites by the European Space Agency has significantly increased the frequency of low-cost optical data acquisition of the Earth [8]. Yet despite increased repeat frequencies, optical sensors are still dependent on clear skies to acquire usable imagery, with clouds potentially masking out changes, or developments, at crucial times. To increase the quality of data and to make it more user-ready, composited mosaics of satellite imagery can be used [3].
Temporal compositing of medium-resolution satellite imagery has only recently become viable [9,10]. Most compositing approaches follow a strategy for selecting the best pixel. Simple strategies include choosing pixels with maximum NDVI [11] or median band values [12] but are conceptually problematic when compositing over narrow temporal intervals [3]. Parametric scoring approaches choose pixels with the highest sum of scores of a range of parameters [9,13]. These depend strongly on the quality of the cloud masking and atmospheric correction to avoid high levels of artifacts. An alternative to the best-pixel approach is the synthesis of new values, such as weighted averaging [14], mean value compositing [15], time-series fitted harmonic models [16], or time-series trajectory templates [17]. These approaches can produce mosaics that are homogeneous in appearance, but they do not represent physical observations [3].
In cloudy countries, such as New Zealand, many images are required to create cloud-free composites. Typically, a period of 3 months is necessary to create a satellite composite for national mapping [18], and much cloud masking is required. Automated cloud masking has not generally been accurate enough, so manual drawing of clouds has been used. This is labour-intensive and only suitable for producing a small number of composites. With the advent of Sentinel-2 satellites giving coverage every 5 days, at no cost to users, it is now practical to perform temporal analysis of imagery for producing more accurate land cover maps. This requires automated and accurate cloud masking to minimise the artefacts that compromise temporal statistics. The Tmask algorithm [19] is an automated and accurate cloud-masking algorithm that uses deviation from a trend of reflectance to identify clouds. However, we have found in New Zealand that many land cover changes also have deviations from the trend and are erroneously detected as clouds. Here, we further improve the Tmask algorithm by using the parallax method [20] to produce an initial cloud layer, and we use an object-based cloud and shadow approach to remove false cloud detections. We develop several parametric scoring approaches for choosing best-pixel composites with minimal remaining cloud and discuss their use in mapping applications.

Methods
Sentinel-2A and -2B satellite imagery is readily available on a 5-day repeat cycle; however, it remains difficult to use this imagery in cloudy areas as accurate cloud and cloud-shadow classification is challenging. Cloud objects are wide and varied in their shape and size. They can be confused with other bright targets or can be difficult to detect when they are thin or hazy in nature. The problem is then compounded when these cloud features are cast onto the Earth's surface as shadows over often complex terrain. Classification of cloud and cloud shadow sufficiently accurate to generate automatic cloud-free mosaics has required a novel multi-stage combination of algorithms ( Figure 1).  Figure 1. Conceptual diagram of cloud and cloud-shadow classification. The algorithm improves the initial cloud and cloud-shadow mask through the parallax method and object-based classification, and improves the final cloud and cloud-shadow masks with an object-based morphology check. Novel additions to current algorithms are highlighted in yellow boxes.

Parallax-Based Cloud Identification
The first phase of our cloud classification process novelly combines current algorithms to establish an initial single-date cloud and cloud-shadow mask. We use the parallax method of Frantz et al. [20] to provide a more accurate initial map of cloud features. The parallax method involves the calculation and thresholding of the Cloud Displacement Index (CDI). This code was adapted from the Framework for Operational Radiometric Correction for Environmental monitoring (FORCE) open-source code [21]. We simplify the cloud routine from FORCE to create a binary mask of cloud features confirmed by the parallax method and then use that layer as an improved input into the Python version [22] of the Fmask code [19,23]. We found that the Python implementation [22] is simple to use, easy to adapt, and gives an accurate estimation of cloud-shadow features, provided an appropriate cloud search distance for New Zealand conditions is applied (2 km). If the Fmask shadow detection is allowed to search too far (such as the default search distance of 12 km), many cloud-shadow false positives are found over water (a dark target) as much of the New Zealand land mass is near the coast or lakes. Subsequent to our implementation, the Python Fmask code [22] has been adapted to incorporate the Frantz et al. [20] parallax extension as an optional flag.
The Fmask process was run without additional buffering as this can accentuate false positives. Instead, we use the efficient segmentation algorithm of Shepherd et al. [24] to segment the top-of-atmosphere sentinel imagery to features of at least 0.75 ha (75 pixels) in size. These features are then classified as either clear, cloud, or cloud shadow by majority of the initial parallax/Fmask result. Sensitivity to cloud and cloud shadow is then increased by also selecting any remaining clear features that contain 20% or more cloud or cloud shadow. The 0.75 ha size threshold was chosen to be large enough to provide a practical number of features, but less than the commonly used minimal mapping unit of land-cover mapping within New Zealand [25,26]. This novel segmentation approach has the impact of infilling features only partly classified by the parallax/Fmask methods but still ensures crisp feature edges, unlike processes based on buffering.

Tmask Implementation
The cloud and cloud-shadow masks outlined in Section 2.1 are a useful starting point but do not meet the standard able to be achieved by manual digitising. In particular, cloud shadow proves difficult to accurately classify from methods based on a single image. To improve detection, we look to a temporal process and use the single-date cloud and cloud-shadow masks as input into the Tmask algorithm of Zhu and Woodcock [27]. The Tmask algorithm attempts to further improve the results of initial masking by using temporal information. It is based on the idea that most of the cloud and cloud shadow has already been identified in the first step, and by using an empirically estimated time series models for the rest of the "clear" observations, it is possible to model or "predict" top-of-atmosphere (TOA) reflectance with observations to better detect clouds and cloud shadows. Zhu and Woodcock [27] also consider snow detection in some detail, but we focus solely on cloud and cloud shadow as significant snow cover is relatively rare over most of New Zealand. The Tmask time series function fitted to "clear" observations is given in Equation (1): where d is the julian date, T is the number of days per year (365), N is the number of years (integer, rounded up), a 0,i is the mean value for Sentinel band i TOA reflectance, a 1,i , b 1,i are coefficients that describe the intra-annual change of Sentinel band i TOA reflectance, a 2,i , b 2,i are coefficients that describe the inter-annual change of Sentinel band i TOA reflectance.
To aid with interpretation of the time series coefficients, we can transform Equation (1) into a relationship of amplitude (R) and phase (θ) where In contrast to the coefficients in Equation (1), the amplitude and phase coefficients in Equation (2) are independent of each other and therefore would be better suited if required in a classification involving a temporal sequence of imagery.
Zhu and Woodcock [27] use the green, near-infrared (NIR) and short-wave infrared (SWIR) bands when fitting Landsat TOA reflectance data. We apply the fitting to the equivalent bands of Sentinel-2 TOA reflectance data, namely bands 3, 8, and 11 (see [28]). Zhu and Woodcock [27] also recommend a Robust Iteratively Reweighted Least Squares (RIRLS) method be used to fit Equation (1) to the temporal TOA reflectance sequences to minimise the influence of the inevitable outliers of unidentified cloud and cloud shadow. We use the GNU Scientific Library (GSL) [29] function gsl_multifit_robust_bisquare to implement the regression as suggested. The Tmask process involves fitting the model to every pixel over a large enough time period to ensure enough observations: 15 clear observations is considered to be a minimum to obtain sensible values for the the 5 coefficients a 0 , a 1 , a 2 , b 1 , b 2 . We also adopted the minimum 3-year time period used by Zhu and Woodcock [27] for our New Zealand Sentinel-2 datasets and generated a suite of coefficient rasters fitted over the period of 2016-2018. The resulting coefficient rasters can then be used to describe the expected clear-sky temporal reflectance of any pixel and therefore provide a basis to test for cloud or cloud shadow as significant departures from this time series model. The continuing stream of data from temporal imagery makes it tempting to add more data to the coefficient fitting process as soon as it becomes available. However, per-pixel fitting is computationally intensive when dealing with many observations from multiple years. We found it more practical to perform scheduled updates of the coefficient rasters and use the previously fitted model to predict in between. This provides an interim temporal cloud classification in real-time that can be further improved at a later date. Despite performing scheduled updates, computer processing is still large. So, to reduce processing time we implemented the fitting in parallel using the Message Parsing Interface (MPI) on a high performance computer, typically using 200 cores and taking less than 12 h for all of New Zealand.
Zhu and Woodcock [27] found that a difference threshold of 0.04 (TOA reflectance) was a good compromise to balance the omission and commission errors for detecting cloud and cloud shadow. Pixels were classified as cloud when the green TOA reflectance was brighter than the time series model by 0.04 or more, or cloud shadow was classified if both the NIR and SWIR reflectances were lower than the modelled value by at least 0.04. We found this threshold worked well for cloud detection but lacked sensitivity in cloud-shadow areas, particularly over darker targets such as mature forest and water bodies. These targets already have low reflectance, and the 0.04 threshold represents a large proportion of the signal, so it is unlikely that a shadow cast from a cloud would cause enough additional darkening to meet the change threshold. We changed the shadow classification to a proportional reduction of 20% of the signal in both the NIR and SWIR. This test better matches the physics involved as the signal of an area in cloud shadow would be proportional to the reflectance expected under a clear sky. In addition, it seems reasonable that a cloud blocking the direct light of the sun would darken an area by at least 20% as direct irradiance typically dominates that of the diffuse sky. We did find that the 20% reduction was too sensitive over very dark targets, such as water, so we retained a minimum reduction threshold of 0.02 to limit this problem. To minimise per-pixel noise in the final result, we employed the same segment-based majority approach outlined in Section 2.1.

Object-Based Morphology Checking
The Tmask cloud and cloud-shadow classification process outlined in Section 2.2 uses a temporal imagery sequence but is solely pixel-based. This means that the local spatial context is not used in the detection process, whereas it is a critical part of manual interpretation. Therefore, specific areas of the image that change brightness for valid reasons other than cloud effects often become classified as cloud or cloud shadow. Examples of this include pastoral management of paddocks, often brighter than expected and erroneously classified as cloud, and variations in river course or water level, which are darker than expected and can be classified as cloud shadow. To deal with this over-classification, we novelly apply a set of object-based checks to ensure a consistent logic between cloud and cloud-shadow features. To minimise the chance of missing a cloud or its corresponding shadow due to granule processing boundaries, we combine all of the granules from a given pass over New Zealand and apply the following steps to the entire pass.
We first retain any Tmask-classified cloud feature that has pixels confirmed by the parallax method (Section 2.1). Then we check cloud features for a corresponding cloud shadow by searching within the mask in the reverse solar azimuth direction (away from the sun) up to a defined maximum distance; if a shadow is present, the cloud feature is retained. As outlined in Section 2.1, we found 2 km to be an appropriate search distance for New Zealand conditions. For cloud-shadow features, we perform a similar search in the solar azimuth direction (towards the sun) for a corresponding cloud, retaining the shadow feature if a cloud is present in the search area. While we would expect to find a corresponding cloud for each observed cloud shadow, the reverse is not always true: cloud shadows are not observed if there is additional cloud obscuring the search area. If we find cloud in the search area while searching for shadow, we reserve the decision until the cloud in the search window has been confirmed. If the search window cloud is confirmed, then we also retain the earlier cloud in a subsequent iteration. We apply these checks to potential cloud and cloud-shadow features under 200 ha in size as these form the bulk of features incorrectly classified due to land-cover change at normal management or paddock scales (typically a few hectares). Large features such as these are computationally time-consuming to match and much less likely to be real land-cover change.

Priority-Based Mosaicking
The process for cloud and cloud-shadow detection outlined in Sections 2.1-2.3 produces a single raster mask for each Sentinel-2 satellite pass and maps the cloud-free observation areas available for a mosaic. To automatically generate mosaics, we need to prioritise these valid observations from a candidate stack of these single-date masks. As imagery mosaics are often used for land-cover classification and associated land management policy, it is important that observations are from a known date and are as spatially consistent as possible, while still producing as full a cloud-free coverage as possible of a desired management area. For typical use-cases, we find two scenarios for prioritisation of inputs into an imagery mosaic.

Best Mosaic over a Date Range (Prioritise for Quality)
This process is used when you want the highest-quality imagery within a range of useful dates, such as the best mosaic of a given summer season. We take imagery over New Zealand from consecutive months from November to February inclusive to make a summer mosaic. We choose the areas of interest for the desired mosaic, in our case various island areas of New Zealand, as raster masks, and we input the cloud mask layers mentioned earlier into an iterative prioritisation algorithm. The algorithm calculates an image score by weighting the cloud-free area contributed to the mosaic by the solar elevation, to represent quality. This is particularly important in a mountainous country such as New Zealand to minimise the impact of terrain shadowing. The highest-ranked image is then selected as the top priority, its contributing pixels are set as filled within the raster mask, and the weightings are then iteratively recalculated with the remaining area to be filled and images available. This algorithm is designed to prioritise the large and spectrally homogeneous objects. In this situation, not all images will necessarily contribute as their cloud-free area may well have been provided from another image, and some particularly cloudy areas of the mosaic may remain without content even after all available imagery has been tried.

Closest Mosaic to a Given Date (Prioritise for Date)
When a particular date is important, the image priority calculation is simpler: we take the closest cloud-free image to the desired date within a maximum allowable range. If there are multiple images close to the desired date, we take the highest solar elevation to minimise the effects of shadowing in steep terrain. The high likelihood of cloudy conditions over New Zealand can mean it is difficult to achieve a cloud-free mosaic with a tight date range.
Once a priority list of candidate Sentinel-2 images is produced by either prioritisation method, a raster mask is created showing which image contributes information at which pixel. This "control" mask then controls the mosaicking process and provides valuable metadata, showing actual acquisition date at each location and any remaining cloudy or unimaged area in the final mosaic.

Mosaic Types
There are two distinct use cases for imagery mosaics: a mosaic suitable for manual interpretation (human readable), and a mosaic suited to automatic classification (machine readable).

Manual Interpretation
To produce a mosaic suitable for manual interpretation, we perform atmospheric correction to surface reflectance and correction of the angles of acquisition to a standard viewing geometry [30]. This processing has the effect of providing a consistent spectral reflectance to the user across the mosaic. We choose to not apply topographic correction at this stage as the topographic features provide useful context for manual interpretation. In our experience, we find users prefer a cloud-minimised mosaic. We therefore insert the most appropriate cloud information in areas of the mosaic that would otherwise appear black, having no available cloud-free image content. In some cases, it is also possible to interpret additional information within these cloudy areas as the clouds may be thin or the area may be partially cloud-free and was generalised as part of the cloud or cloud-shadow classification process. To automatically insert appropriate cloud information requires further processing of the "control" mask to replace cloud areas with the image information. The algorithm we use takes the remaining cloudy areas and produces unique raster clumps for each remaining cloud location with a consistent set of possible candidate images. Then we perform a neighbourhood search around these sites to see which of the possible candidates at the current location have the greatest length of perimeter contact. This neighbour is then the best "cloud" to assign to the current location as it will fit in well with the surrounding imagery. We then apply this algorithm iteratively until completion as some combinations only have "cloudy" perimeters until the infill process progresses. The new "control" mask then has cloudy areas "eliminated" and is then used to provide the basis for mosaicking the cloud-minimised mosaic.

Automatic Classification
Automatic classification will be most accurate when satellite spectral signatures relate directly to land cover without the confusing influence of atmosphere, view and illumination geometry, and associated topographic effects [30,31]. We therefore perform topographic correction as an additional step to the workflow mentioned earlier for manual interpretation. This presents a consistent spectral reflectance to the computer independent of topographic influence (whereas users often prefer to see topography as part of an interpretation process). In the case of automated classification, any cloudy areas remain set as "no data" as a land-cover classification should not be attempted in these locations.

Results
We have used this automated workflow to generate regular national mosaics of New Zealand for land-cover mapping and land-use monitoring. Of key importance to the success of these mosaics is being able to automatically generate cloud and cloud-shadow masks that are equal to or better than those achievable by manual digitising methods. The automated cloud clearing enables us to use all available imagery at a reasonable cost, which is important in a country like New Zealand that is often cloudy due to its mountainous terrain and maritime location. Figure 2 shows a typical Sentinel-2 overpass of New Zealand, which has to be used to produce mosaics sufficiently clear of clouds for land monitoring. The left image is false-colour with bands B08 (near-infrared, NIR), B11 (short-wave infrared, SWIR), and B04 (Red) mapped to red, green, and blue. The right image is the classified cloud and cloud-shadow mask, with clouds in magenta and shadows in yellow (inset area shown more clearly in Figure 3). Panel (a) of Figure 3 shows a subset of the Sentinel-2 image in Figure 2. It is displayed with near-infrared, short wave infrared, and red bands assigned to red, green, and blue. There is much cloud (appears white) and associated shadow. Panel (b) shows the result of the parallax method for cloud detection, with the clouds shown in magenta and the shadows in yellow. The parallax method is successful but does not classify the cloud shadow. It has an overall classification accuracy of cloud and cloud-shadow of 75%. Panel (c) is the result of inputting the parallax detection of clouds (i.e., panel (b) into the Fmask method)and then summarised by objects. Note that the clouds and shadows are now well-defined. Overall the classification accuracy of cloud and cloud-shadow is 91%. Panel (d) shows the final result after the Tmask method, which uses temporal processing informed by the full record of single-date results. The classification accuracy of cloud and cloud-shadow is 98%. Figure 4 presents automated Sentinel-2 mosaics of all New Zealand for every summer since the launch of Sentinel-2A. Table 1 shows that the mosaics are consistently better than 99.9% cloud-free. This compares favourably with the two mosaics produced from manual cloud clearing (these are labour-intensive and costly to produce), which achieve between 94 and 97% cloud-free mosaics. In the manual method, the number of contributing overpasses needs to kept low in order to minimise the manual effort involved, and for that reason the full image sequence is not used. Figure 5 shows control masks of which images are used to make the 2015/2016 summer mosaic-panels (a) and (b) for the manual method and panels (c) and (d) for the automated method. Figure 6 is an inset from Figure 5, which demonstrates how the mosaic improves by the automated method being able to make use of gaps between small clouds by using 49 contributing overpasses.

Discussion
We have developed an automatic mosaicking system that is able to reliably provide 99.9% cloud-free summer mosaics for national monitoring. It was not possible with manual cloud digitising, as it becomes less and less cost-effective to cloud-mask increasingly cloudy imagery, to obtain the benefit of all imagery. To achieve this required a workflow for cloud and cloud-shadow masking that was as effective (or more so) than manual operators, but automated. The use of the parallax method led the way to improving cloud masking [20] as it enabled discrimination between clouds and other bright objects. This is similar to how the thermal band was used in Landsat cloud-masking applications, but the method is not sensitive to low or thin cloud.
Cloud-shadow masking is often more challenging than masking the clouds themselves as they are projected features at surface level over often complicated topography, and it is for this reason and a need for more sensitivity to low and thin cloud that we incorporated the Tmask algorithm [27] to classify departure from modelled reflectance expectations. Most areas of New Zealand have many more cloudy than clear observations in their temporal record. Therefore, effective modelling of their reflectance expectation requires a good first approximation for cloud and cloud shadow to minimise the outliers in the temporal record of that pixel. We found the Fmask algorithm with parallax-based cloud initialisation met this need, but a key development was summarising by object using the efficient segmentation algorithm of Shepherd et al. [24]. The object-based summarising was also applied to the the final results of the Tmask algorithm, which can have a tendency to produce some noisy individual pixel results. Finally, we noticed that classification of features based on departure of surface reflectance from that expected from fitted models is sensitive to actual land-cover change. This true change is often the very reason that remote sensing is employed, so it is not helpful to mask these features as "cloud" or "cloud shadow" depending on whether the land-cover change has a brightening (e.g., deforestation, cultivation) or a darkening effect (e.g., afforestation, flooding). To ensure we were only masking true cloud and cloud shadow, we incorporated object-based checking of features to ensure there was a sensible pattern of cloud and associated shadows.
To produce mosaics, we used different methods for prioritisation. For applications of wall-to-wall mapping, it is important to produce near-complete coverage with as few dates as possible so that objects of change are homogeneous in spectral information. For this reason, we prioritise for quality in an acceptable date range. The algorithm calculates an image score by weighting the cloud-free area contributing to the mosaic by the solar elevation to represent quality. This is done iteratively on an image-by-image basis. The method is totally automated and produces high-quality mosaics with minimum complexity of the control mask. This is required for national land-cover and land-use mapping where annual updates are produced. For other applications where the timing of phenological events is important, such as beech masting [32,33] and weed mapping [34][35][36][37], it is important to capture a particular date where flowering or leaf growth is occurring. Here, we prioritise for date. We take the closest cloud-free image to the desired date within a maximum allowable date range. If there are multiple images close to the desired date, we take the highest solar elevation to minimise the effect of shadowing in steep terrain. This may include images from more than one year to ensure a mosaic with minimal cloud cover.
The combination of various existing cloud detection techniques with specific improvements and object-based implementation has produced cloud-clearing results that enable us to use all possible Sentinel-2 imagery to produce automatic mosaics, at a reasonable cost, in support of regular land monitoring. We produced two mosaic types: (i) for manual interpretation that has no topographic correction, so that topographic information is readily interpreted with reflectance information, and it is also cloud-minimised so that pixels under thin cloud can be viewed to maximise possible information; (ii) for automated analysis, topographic correction is applied and cloud is set to no-data. These automated methods are now able to be used to produce cloud-minimal mosaics appropriate for a greater range of mapping applications in cloud-affected countries.

Conclusions
This paper has outlined a new technique for the automated mosaicking of remotely-sensed imagery. The method relies heavily on high-quality automated cloud and cloud-shadow detection using a combination of spectral, temporal, and object-based techniques. It is being used operationally at national scales across New Zealand, providing imagery for land-cover mapping [25], land management [38], and policy implementation [39]. The technique can be consistently applied across a large range of geographic areas and data types. The ability to create the most cloud-free Sentinel-2 mosaics for any desired date range opens up many possibilities for monitoring and management. This remote sensing potential has previously not be realised due to the cost of available imagery and the time taken to manually perform cloud-clearing.
Author Contributions: J.D.S., J.S., and J.R.D. were responsible for algorithm development. J.D.S. and J.S. were responsible for algorithm application. All authors have read and agreed to the published version of the manuscript.