Harmonization of Landsat and Sentinel 2 for Crop Monitoring in Drought Prone Areas: Case Studies of Ninh Thuan (Vietnam) and Bekaa (Lebanon)

Proper satellite-based crop monitoring applications at the farm-level often require near-daily imagery at medium to high spatial resolution. The combination of data from different ongoing satellite missions Sentinel 2 (ESA) and Landsat 7/8 (NASA) provides this unprecedented opportunity at a global scale; however, this is rarely implemented because these procedures are data demanding and computationally intensive. This study developed a robust stream processing for the harmonization of Landsat 7, Landsat 8 and Sentinel 2 in the Google Earth Engine cloud platform, connecting the benefit of coherent data structure, built-in functions and computational power in the Google Cloud. The harmonized surface reflectance images were generated for two agricultural schemes in Bekaa (Lebanon) and Ninh Thuan (Vietnam) during 2018–2019. We evaluated the performance of several pre-processing steps needed for the harmonization including the image co-registration, Bidirectional Reflectance Distribution Functions correction, topographic correction, and band adjustment. We found that the misregistration between Landsat 8 and Sentinel 2 images varied from 10 m in Ninh Thuan (Vietnam) to 32 m in Bekaa (Lebanon), and posed a great impact on the quality of the final harmonized data set if not treated. Analysis of a pair of overlapped L8-S2 images over the Bekaa region showed that, after the harmonization, all band-to-band spatial correlations were greatly improved. Finally, we demonstrated an application of the dense harmonized data set for crop mapping and monitoring. An harmonic (Fourier) analysis was applied to fit the detected unimodal, bimodal and trimodal shapes in the temporal NDVI patterns during one crop year in Ninh Thuan province. The derived phase and amplitude values of the crop cycles were combined with max-NDVI as an R-G-B false composite image. The final image was able to highlight croplands in bright colors (high phase and amplitude), while the non-crop areas were shown with grey/dark (low phase and amplitude). The harmonized data sets (with 30 m spatial resolution) along with the Google Earth Engine scripts used are provided for public use.


Introduction
In recent decades, advances in technology and algorithms have allowed remote sensing to play an important role in crop monitoring [1,2], which relies on continuous and dense time series. However, the cloud detection and optimization are reported as the main issues in the NASA's harmonized product [11].
As a consequence, the exciting and unprecedented opportunity to improve the spatio-temporal resolution of EO imagery at a global scale by harmonizing L7, L8 and S2 images, is rarely implemented because these procedures are data demanding and computationally intensive.
Emerging cloud computing platforms such as the Google Earth Engine (GEE) (which has the planetary-scale archives of remote sensing data including L7, L8 and S2), reduces substantially the work for data management and speed up the analyzing process [26]. Built-in functions and algorithms within the GEE platform help simplify many pre-processing steps allowing to focus more on the interpretation of the core algorithms [27,28].
The objective of this study is to develop a robust stream processing for the harmonization of L7, L8 and S2 in GEE; therefore, harnessing the benefit of coherent data structure, built-in functions and computation power in the Google Cloud. The procedure generates consistent surface reflectance images that have the same pixel resolution and map projection and are suitable for crop condition monitoring. In this study, we adapt the BRDF (MODIS-based fixed coefficients c-factor) [29,30] and the topographic correction model (modified Sun-Canopy-Sensor Topographic Correction) [31]. These models were implemented in GEE by Poortinga et al., (2019) [32]. We adjust the Landsat Top of Atmosphere (TOA) bands (blue, green, red, nir, swir1, swir2) according to S2 level 1C (TOA) values using cross-sensor transformation coefficients derived from Chastain et al., (2019) [33] (Table 1). In Section 3.1 we describe several tests to assess and evaluate the performance of each followed step. Finally, we show an application of the harmonized dataset by mapping the dynamics of seasonal crops in Ninh Thuan (Vietnam).

Study Regions and Input Data
To test this framework over different settings, we chose Ninh Thuan province (3366 km 2 ), located in Vietnam (South East Asia) and the Bekaa agricultural scheme (898 km 2 ) located in Lebanon (Middle East) (Figure 1). For Ninh Thuan province, we processed a total of 97 TOA satellite images gathered from 18 images of L7 (PATH 123, ROW 52), 18 images of L8 (PATH 123, ROW 52), and 61 images of S2 (TILE 49PBN and 49PBP). For the Bekaa agricultural scheme, we processed in total 120 TOA images gathered from 34 images of L7 (PATH 174, ROW 36 and ROW 37), 19 images of L8 (PATH 174, ROW 37), and 67 images of S2 (TILE 36SYC). Ninh Thuan is the most drought-prone province in Vietnam [36]. To cope with water shortage throughout the dry season (from Jan to Aug), rainwater is collected during the rainy season (from Aug to Nov) with more than 20 small to medium-size reservoirs. The extension of the cropland area is largest during the winter-spring season, then it reduces during the summer because of water shortage. Moreover, the crop during rainy season can be vulnerable to floods [37].

Workflow Overview
We summarized the workflow into three main steps: (1) pre-processing, (2) sensors harmonization, and (3) post-processing as illustrated in Figure 2. In the pre-processing step (red dashed line), we converted the TOA images to surface reflectances (SR) also known as bottom-of-atmosphere (BOA) using one AC model, exclude images that have greater than 30% cloud coverage over the studied area, then mask out all identified cloudy pixels. We applied the AC model via Python API, all the other tasks were Code Editor based. Because the BRDF and topographic correction models require digital elevation model (DEM) data, they are applied only when the images had been coregistered. The harmonization step (green dashed line) refers to re-projection, re-scaling, and re-alignment (co-registration) of the L7, L8 and S2 images. Finally, the post-processing step (blue dashed line) stacks all the harmonized images into a database of the GEE assets. This step also calculates composite images (mean) of every 10-day period (dekadal) and exports them to Google Drive, making it a shareable geospatial dataset for non-GEE users. The GEE scripts used in the study and links to the generated harmonized datasets that contain the SR images (bands blue, green, red, nir, swir1, swir2, and ndvi at 30 m) over both areas can be found in the Appendix A.

Atmospheric Correction
To reduce residual errors from using different AC methods [24,25], the same AC model called Py6S was applied to all L7, L8 and S2 TOA images. Py6S, was developed by Wilson (2012) [38], is a Python interface of the Second Simulation of the Satellite Signal in the Solar Spectrum or 6S [39] radiative transfer model to reduce the time of setting up numerous input and output data. Results produced from Py6S are the same as the results obtained with 6S [38]. Kotchenova and Vermote, (2007) [40] tested the performance of 6S and found the overall relative error was less than 0.8 percent.
This study implemented Py6S in GEE based on the code shared by Sam Murphy (2019) [41] which was executed via Python API and Docker container. In the model, the view zenith angle was hardcoded to "0".

Cloud Mask for Landsat Images
For L7 and L8, cloud and cloud shadow are masked using the Quality Assessment Band (BQA) (30 m resolution) [10], which was generated using the CFMask algorithm. This algorithm has shown the best overall accuracy among many state-of-the-art cloud detection algorithms [42].

Cloud Mask for Sentinel 2 Images
According to the assessment by Coluzzi et al., (2018) [43], the S2 L1C cloud mask band (QA60, available at 60 m resolution), which is generated based on the blue band (B2) and short-wave infrared bands (B11, B12) [44], generally underestimates the presence of clouds. On the contrary, Clerc et al., (2015), [45] reported that QA60 cloud masks are adjusted to minimize under-detections, which leads to over-detection. In either case, the performance of the L1C cloud mask is low, especially under critical conditions.
In the GEE environment, the authors in Poortinga et al., (2019) [32] applied a cloud scoring algorithm (ee.Algorithms.Landsat.simpleCloudScore [46]) to mask clouds in L8 and S2 images. The algorithm exploits the spectral and thermal properties of bright and cold of clouds that excluding the snow [47]. However, through an exhaustive visual analysis we observed that this Landsat based algorithm did not yield satisfactory results for S2 images over Ninh Thuan, Vietnam. This is likely due to the complex atmospheric condition (e.g., high water vapor content) [43] in Ninh Thuan region and the lack of the thermal band in S2 images.
Following the conclusions of Zupanc (2017) [48], who demonstrated a machine learning approach can outperform current state-of-the-art threshold-based cloud detection algorithms such as Fmask, Sen2Cor, or even MAJA (a multi-temporal approach for cloud detection [49]), this study applied a supervised classification using the QA60 band as covariate to identify cloud contaminated pixels. For every S2 scene, we trained a random forest classifier (ee.Classifier.randomForest()) using stratified random sampling (ee.Image.stratifiedSample()) and the QA60 band was used as the 'classBand' argument [50,51]. The stratified sampling function extracted 10 random points from each class contained in the QA60 band which has three classes including 'opaque clouds' class (pixel value = 1024), 'cirrus clouds' class (pixel value = 2048), and non-cloud-cirrus class (pixel value = 0). In addition, we used a threshold of below than 0.5 for the Normalized Difference Snow Index (NDSI) to prevent snow from being masked [32]. We used a thorough visual inspection to check the performance of this procedure, which showed promising results in such complicated atmospheric conditions like the observed in Ninh Thuan, Vietnam. Figure A1 demonstrated how the cloud was masked in a cloudy S2 scene over Ninh Thuan, Vietnam.

Cloud Shadow Detection
Cloud shadows can be predicted using the cloud's shape, height and sun position at that time [52]. However, this method depends on the cloud identification ability and poses large uncertainty while projecting the cloud's shadow on the earth's surface. This study used a Temporal Dark Outlier Mask (TDOM) method which greatly improves the detection of cloud shadow by obtaining dark pixel anomaly [47]. The TDOM method is based on the idea that cloud shadows appear and disappear quickly as the cloud moves. The implementation of TDOM in GEE was adapted from Poortinga et al., (2019) [32].

Co-Registration between Landsat and Sentinel 2 Images
The misalignment (or misregistration) between L8 and S2 images varied geographically and can exceed 38 m [19] which is mainly due to the residual geolocation errors in the L8 framework which were based upon the Global Land Survey images. In GEE, we used displacement() function to measure the displacement between two overlapped S2 and L8 images which were captured at the same time over the studied regions. Then, the displace() function is used to align ("rubber-sheet" technique) the L8 image with the S2 image [53]. Because the L8-S2 misalignment is reported stable for a given area and additionally, the S2 absolute geodetic accuracy is better than L8 [19], this study aligned all Landsat images (same PATH, ROW) using a common base S2 [54]. We also assumed that the misalignment among the same satellite images is neglectable. The co-alignment step described here is purely an image processing technique. It differs from geo-referencing or geo-correcting which involves aligning images to the correct geographic location through ground control points. At the moment, GEE documentation does not explain clearly the underlying of displacement() and displace() algorithms; however, the authors in Gao et al., (2009) [55] described in great detail a similar tool called AROP which is an open-source package designed specifically for registration and orthorectification of Landsat and Landsat-like data.

Re-Projection and Scaling
Because each band can have a different scale and projection [56], band's projection was therefore transformed according to the red band of S2 (WGS84) and band's resolution was rescaled to 30 m using 'bicubic' interpolation [57,58]. In fact, unlike other GIS and image processing platforms, the scale of analysis in GEE is quite flexible; the scale is determined from the output, rather than the input via a pyramiding policy [59].

BRDF Correction
The Bidirectional Reflectance Distribution Functions (BRDF) model is applied to reduce the directional effects due to the differences in solar and view angles between L7, L8 and S2 [11]. The implementation of BRDF correction in GEE was developed by Poortinga et al., (2019) [32] based on results from different studies [29,30]. This BRDF correction is MODIS-based fixed coefficients c-factor, originally developed for Landsat but proven that it works for S2 as well [21,29,30]. The view angle is set to nadir and the illumination is set based on the center latitude of the tile [11].

Topographic Correction
Topographic correction accounts for variations in reflectance due to slope, aspect, and elevation. It is not always required but can be essential in mountainous or rugged terrain [60,61]. The implementation of topographic correction in GEE was developed by Poortinga et al., (2019) [32]. The method is based on the modified Sun-Canopy-Sensor Topographic Correction as described in [31]. The DEM used is the SRTM V3 product (30 m SRTM Plus) which includes a void-filling process using open-source data (ASTER GDEM2, GMTED2010, and NED) provided by NASA JPL [62].

Band Adjustment
Although efforts have been made in the radiometric and geometric calibration of the Landsat and S2 missions so that their bands could be compatible [20], small spectral differences still exist [11,20,33]. We adjusted the six Landsat bands (blue, green, red, nir, swir1, and swir2) using cross-sensor transformation coefficients (Table 1) [33] used absolute difference metrics and major axis linear regression analysis over 10,000 image pairs across the conterminous United States to obtain these transformation coefficients.

Cropland Detection Using Harmonic Analysis of Time Series
The extended cropland land cover is valuable for the Ninh Thuan province's Irrigation Management Company (IMC) to calculate water distribution volume and predict water demand for the next season. However, because of seasonal variation, mixed crop rotation and data scarcity, it is difficult for the province to obtain the up-to-date and accurate seasonal extended croplands extension. To improve the accuracy of cropland detection, it is desirable to apply a temporal signature approach that can utilize data acquired at different growth stages; and therefore, increasing the dimensionality of information [63]. As the harmonic (or Fourier) analysis has proven to be useful in characterizing seasonal cycles and variation in land used/land cover types [9,[64][65][66][67], this study applied an harmonic analysis on dense NDVI time series, obtained from the harmonized dataset (L7, L8, and S2), to mapping seasonal croplands in Ninh Thuan during 2018. As illustrated in Figure 1 in [65], harmonic (Fourier) analysis permits a complex curve to be expressed as the sum of a series of cosine waves and an additive term. Each wave is defined by a unique amplitude and a phase angle where the amplitude value is half the height of a wave, and the phase angle (or simply, phase) defines the offset between the origin and the peak of the wave over the range 0-2 π. Therefore, high seasonal variation in NDVI of crop pixels will be characterized by high amplitude values and phase angles.

Design of the Evaluation Experiments
This study used several transformation models from other studies, for example, BRDF and topographic correction models from Poortinga et al., (2019) [32]; band adjustment coefficients from Chastain et al., (2019) [33]; and image co-registration built-in function in the GEE [53].
However, some studies suggested that site-specific models may be required for specific areas of study due to inconsistent regression coefficient values obtained across different study areas [21,68,69].
In addition, the sequences of processing steps as described in Figure 2 could be evaluated to generate the most optimized results. For example, we could compute band adjustment at the beginning of the workflow because the cross-sensor transformation coefficients that we used (Table 1) were calculated for TOA images. This adjustment might also improve the co-registration step later on since it is based on the comparison of the values of the pixels in two images. Therefore, it is desirable to understand and evaluate the effects of each processing/transformation step and compare the initial images with the final results. In this study, we applied several tests on two overlapped S2 and L8 images which were captured at the same time over the studied regions, before and after each processing step. Rectangular areas without cloud, cirrus, or saturated pixels were selected for analysis.
Although 30 m is the spatial resolution for every band in the harmonized data set, we evaluated the effect of each processing/transformation at 10 m resolution, we believe that at 30 m, the evaluation could have yielded better results.
The image IDs, date, and the acquisition time of the tested images are presented in Table 2. In Section 3.2, we estimated the reduction in sensor misregistration, while in Section 3.3, we calculated the spatial Pearson's correlation for each band, and finally, in Section 3.4, we assessed the temporal correlation of NDVI time series.  Figure 3 shows the offset differences in the tested areas, measured by the magnitude of the vector formed by dX and dY [53] before and after the overlapped pair of L8-S2 images were co-registered using the method described in Section 2.7. For Bekaa, the offset differences were reduced substantially from 22-32 m meters to less than 8 m (less than 2 m on average). For Ninh Thuan, Vietnam, the offset differences were reduced from 12 m (maximum) to less than 2 m in most of the pixels. These results are in agreement with Storey et al., (2016) [19] who also found misalignments that geographically varied between L8-S2. Further analysis (see Table 3) showed that the co-registration step contributed the most to the improvement in band-to-band spatial correlation.

Improving Band-to-Band Spatial Correlation
We analyzed the band-to-band correlation over two separated domains, a flat agricultural area (Figure 4a), and a mountainous area (Figure 4b). Each domain has an extent of 0.3 km 2 , without cloud, cirrus, or saturated pixels.  Table 3 compares the Pearson correlation values (r) of the red, nir, and ndvi bands when each processing/transformation step was applied (P0 to P5), over both the flat and the mountainous area. From P1, we rescaled L8 to the same resolution with S2 (10 m) using bicubic interpolation function in GEE. After each processing step, NDVI was calculated from the processed band Red and Nir. For the flat area, correlation values increased substantially from 0.67, 0.75, 0.79 to 0.93, 0.95, 0.96 in the red, nir, and ndvi bands, respectively. For the mountainous area, r increased from 0.56, 0.45, 0.63 to 0.77, 0.72, 0.80 in the red, nir, and ndvi bands, respectively. Table 3 also indicates that the co-registration step (P3) contributed the most in the band-to-band spatial correlation improvement. Table 3. L8-S2 cross-comparison of the Pearson correlation values (r) in the red, nir, and ndvi bands for each processing/transformation step when the process was applied over the flat area (a) and the mountainous area (b).  The correlation was higher in the flat area compared to the mountainous area, likely due to the impacts of untreated hill shadow or hill's slope. This result is in agreement with Colby, (1991) [60] and Vanonckelen et al., (2013) [61] who emphasized the importance of properly topographic correction in mountainous or rugged terrain. In addition, in Table 3, while both Red and nir correlations improved with topo correction (step P4) over the mountainous region, the NDVI correlation decreased. Based on this observation, we suggeste that further analysis should be made to better optimize results for the mountainous area. We will discuss some possible solutions in Section 3.4.
The comparison of the differences before and after processing in all bands are indicated in Figure 5. This is an analysis over the flat domain in Bekaa (Figure 4a) using the first pair of L8-S2 images provided in Table 2. Figure 5 presented per-pixel scatter plots of all seven bands (blue, green, red, nir, swir1, swir2, and ndvi), compared (using the r, bias, and RMSE) before and after the overlapped L8-S2 images were harmonized. These plots showed all bands are in good agreement. Band SWIR1 reached the highest correlation (r = 0.97) and band Blue has the lowest (r = 0.87). Figure 5. Scatters plots of all seven bands per pixel (blue, green, red, nir, swir1, swir2, and ndvi) for the flat domain in Bekaa, provided N (total number of the pixels), r, bias, and RMSE (Root Mean Square Error). The black and red dots represent the images before and after the overlapped L8-S2 images were harmonized. Figure 6 showes NDVI time series at a typical crop pixel (lat = 36.01, long = 33.83) before and after the application of the band adjustment. We observe that before the band adjustment, the NDVI values of L8 were systematically lower than those of S2; however, after the band adjustment, the two datasets matched chronologically. The third one shows the final harmonized ndvi time series which used data from all sensors. There are gaps in the time series because cloudy covered images were automatically eliminated from the process. As previously reported in Section 3.3 and Table 3, the Pearson cross-correlation of the different bands is low in the mountainous region due to the impacts of the hill's slope or remaining of untreated hill shadow or unknown reasons. This problem is further visualized in Figure 7, which showed the NDVI time series of a pixel located in a mountainous area (lat = 36.04, long = 33.81). After the processing, Landsat's NDVI values were seen systematically lower than those of S2. These results suggestes that we may need to apply different treatment for the mountainous or rugged terrain areas than in the flat agricultural areas. The domain for mountainous regions could be defined using slope or aspect threshold derived from DEM data. Modifying the band adjustment coefficients could play an important role. One can re-calculate these coefficients so that they are more adaptable to these particular studied areas.

Assessing the Dynamic Cropland Variation in Ninh Thuan, Vietnam
As described in Section 2.12 and following the methodology described in Ghazaryan et al., (2018) [9], combined with implementation of the harmonic model in GEE by Nicholas Clinton, (2017) [70], we fitted the time series of NDVI data in every pixel. Figure 8 shows the NDVI time series and the fitted values of regions that have one crop (unimodal greeness pattern), two crops (bimodal greeness pattern), and three crops (trimodal greeness pattern) per year in Ninh Thuan region. The exact locations of the plotted pixels are labeled as (a), (b), (c) in Figure 9. The phase (angle from the starting point to the cycle's peak) and amplitude (half of the cycle's height) values, which were derived from the harmonic models, were used to express the temporal signature of NDVI. Since the first harmonic term represents the annual cycle [64], the cropland's variation was identified using a composite image of phase, amplitude (of the first harmonic term), and the max NDVI ( Figure 9). Because NDVI at cropland pixels were characterized with high temporal variation, high angle or sharp turn at the peak of crop growth, and high max NDVI values, were highlighted in Figure 9 as bright colored pixels, while black or gray pixels represent non-cropland classes.
Although the derived amplitude and phase values are sufficient to differentiate croplands versus non-cropland pixels, Max-NDVI was added as a third band to give us an additional option to highlight the color of the cropland pixels by scale up their band's values. Furthermore, Max-NDVI is useful in a decision rule set to further separate crop types [9]. However, in this study, we only have interest in detecting the main croplands.

Conclusions
In the presented study, we demonstrated a robust workflow in GEE to generate harmonized L7, L8, and S2 images for two agricultural schemes in Bekaa, Lebanon, and Ninh Thuan, Vietnam. We also evaluated the performance of several pre-processing steps needed for the harmonization including image co-registration, BRDF correction, topographic correction, and band adjustment. Although, the band adjustment showed little impact on the Pearson cross-correlation results; it is an important process to match temporal spectral time series. The offset difference between L8 and S2 images in the Bekaa region was as large as 32 m, and if not treated, it poses a great impact on the quality of the harmonized dataset. Although a topographic correction model was applied, the lowest performance was observed in mountainous areas.
The use of a high amount of observations also omits the need for data smoothing techniques. We also demonstrated an application of the harmonized dataset by mapping the extended cropland via harmonic analysis for Ninh Thuan province in 2018.