Fusion of MODIS and Landsat-Like Images for Daily High Spatial Resolution NDVI

One of the obstacles in monitoring agricultural crops is the difficulty in understanding and mapping rapid changes of these crops. With the purpose of addressing this issue, this study aimed to model and fuse the Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI) using Landsat-like images to achieve daily high spatial resolution NDVI. The study was performed for the period of 2017 on a commercial farm of irrigated maize-soybean rotation in the western region of the state of Bahia, Brazil. To achieve the objective, the following procedures were performed: (i) Landsat-like images were upscaled to match the Landsat-8 spatial resolution (30 m); (ii) the reflectance of Landsat-like images was intercalibrated using the Landsat-8 as a reference; (iii) Landsat-like reflectance images were upscaled to match the MODIS sensor spatial resolution (250 m); (iv) regression models were trained daily to model MODIS NDVI using the upscaled Landsat-like reflectance images (250 m) of the closest day as the input; and (v) the intercalibrated version of the Landsat-like images (30 m) used in the previous step was used as the input for the trained model, resulting in a downscaled MODIS NDVI (30 m). To determine the best fitting model, we used the following statistical metrics: coefficient of determination (r2), root mean square error (RMSE), Nash–Sutcliffe efficiency index (NSE), mean bias error (MBE), and mean absolute error (MAE). Among the assessed regression models, the Cubist algorithm was sensitive to changes in agriculture and performed best in modeling of the Landsat-like MODIS NDVI. The results obtained in the present research are promising and can enable the monitoring of dynamic phenomena with images available free of charge, changing the way in which decisions are made using satellite images.


Introduction
The techniques and technologies available through the science of remote sensing have become indispensable for monitoring changes in the terrestrial surface [1][2][3][4][5][6][7]. Such techniques are necessary when the objective is to follow agricultural development and make decisions about crop-related issues [8,9]. Sensors aboard satellites capture images from the Earth's surface at various spatiotemporal resolutions. They are often divided into sensors that capture images at low, medium, and high spatial resolutions [10]. Generally, high spatial resolution images have smaller dimensions, covering smaller portions of the Earth's surface and resulting in a lower temporal resolution [4,[11][12][13]. On the other hand, high temporal resolution images tend to cover larger portions of the Earth's surface, resulting in a poor spatial resolution. This fact is considered a technical limitation and known as a trade-off between the temporal and spatial resolution [14,15], which is due to the relationship between the scanning swath and image pixel size [11]. Based on these assumptions, there is currently no single orbital constellation that captures images with a high/medium temporal resolution and high spatial resolution, at least for free [16].
The lower temporal resolution hampers the monitoring of phenomena such as the phenological cycle of crops and crop-growth, which requires frequent observations [12,17]. These situations are commonly faced by researchers who use Landsat platforms, which have a 30 m spatial resolution and 16 day temporal resolution, when the sky is free of clouds [18,19]. On the other hand, the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor, which is aboard the AQUA and TERRA satellites, is available with a temporal resolution of one to two days, but with a lower spatial resolution (250, 500, and 1000 m) [20,21]. This challenges the use of this sensor for precision agriculture, since such agriculture requires detailed information on the farm's processes at a high spatial resolution [10].
To overcome the obstacles of the absence of a single sensor that provides images free of charge contemplating both characteristics (high frequency and detailed monitoring) [10,13], it is necessary to develop techniques that combine images with different resolution characteristics and generate hybrid products at a high/medium spatial resolution and high temporal resolution [22]. Hybrid products would be able to meet the expectations of monitoring phenomena with rapid transitions and aid decision-making in a timely manner.
The integration of data with complementary characteristics has been frequently addressed in research. Researchers have developed methodologies seeking to solve the difficulties of acquiring images with high frequency at detailed resolutions [3,4,11,12,16,[23][24][25][26][27]. Generating such information can be performed by techniques that use multiple sensors and downscale images [10,11,23,24,26,28]. In addition, data fusion techniques have also been proposed to combine images from multiple sensors with different spatial and temporal characteristics [29][30][31]. In most studies, images captured by MODIS have been used as a source of data at a course spatial and high temporal resolution and Landsat-like images have been employed as a source of data at a high/medium spatial and low temporal resolution [31][32][33][34].
Data fusion methods present a range of approaches and bases, which make some methods more difficult to apply or even more efficient than others [22]. According to Zhu et al. [22], those methods can be stratified into five groups: weight function-based, unmixing-based, Bayesian-based, learning-based, and hybrid methods. They also reported that all of these methods require at least one pair of coarse and medium/high spatial resolution images on temporally close days and coarse images on prediction days. For the weight function-based group, Gao et al. [17] proposed a method called the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM), fusing the MODIS daily 500 m surface reflectance product (MOD09GHK) and Landsat images with the aim of producing a synthetic daily Landsat image of surface reflectance. This method is basically a pair-based approach, and represents one of the first models developed for this context [22,30]. In this same line, Hilker et al. [35] also presented a data fusion model to produce hybrid images and detect changes in the surface. This methodology was denominated as the Spatial Temporal Adaptive Algorithm for mapping Reflectance Change (STAARCH). Other important research in this sense is the work of Zhu et al. [15], who developed an enhanced spatial and temporal adaptive reflectance fusion model (ESTARFM) based on the work of Gao et al. [17].
In terms of the unmixing-based data fusion methodology, the important work of Atzberger et al. [36] should be noted, which proposed unmixing approaches for combining datasets with different spatial resolutions. This work made it possible to use the historical (long-term average) crop profiles derived from the Satellite Pour l'Observation de la Terre-VEGETATION (SPOT-VGT) with endmembers derived from Project for On-Board Autonomy-Vegetation (PROBA-V).
Huang et al. [37], representing the Bayesian-based group, proposed unified fusion generating simultaneously high-resolution synthetic spatial-temporal-spectral Earth observations. With respect to learning-based methods, Zhu et al. [22] used machine learning algorithms to understand the relationship between observed coarse-fine image pairs and then predicted the non-observed fine images. Another example of a learning-based method can be seen in the work of Liu et al. [38], who proposed a spatiotemporal fusion method using a powerful learning technique-the extreme learning machine (ELM). The aim of this research was to improve the quality of data fusion with a fast computational model.
In terms of the hybrid group, Xie et al. [39] developed an improved STARFM with the help of an unmixing-based method, which they called unmixing-based STARFM (USTARFM). This proposed model's main objective is to improve the performance of STARFM in heterogeneous areas. Another hybrid method was presented in the work of Xu et al. [40], which proposed a method based on the traditional spatial unmixing technique, but with a regularized term approach, in order to ensure that the spectrum classes were not very different from the previously predefined classes, ensuring an accurate unmixing result. Gevaert and García-Haro [41], also in the line of hybrid modes, proposed a unique Bayesian formulation to restrain the unmixing process.
Previous work has shown that data fusion techniques are an important alternative for crop monitoring with spatial and temporal detailing; more specifically, for crop phenology monitoring [42,43] and biophysical parameters [44,45]. Data fusion models have been tested for various agricultural crops, i.e., rice [46], soybean [29], maize [29], and cotton [31], among others. Indeed, many studies have been published with the purpose of making the monitoring of surface more detailed and more information available. In addition, also seeking a greater availability of surface information, the research of Vuolo et al. [47] dealt with the generation of high revisit Landsat and a high spatial resolution without the requirement for coarse data. We can also consider the papers of Immitzer et al. [48] and Atzberger and Rembold [49], where the use of data fusion was completely avoided and the fractional coverage of crops was obtained using high-resolution training data fed with low-resolution predictor variables.
Multi-sensor coupling and data fusion are essential in monitoring the condition of vegetation and the dynamics of agricultural crops. The vegetation index is an important indicator for obtaining crop status information and it is widely used in monitoring the growth of crops [13]. The vegetation index estimated with multi-sensor coupling and data fusion provides a closer look at the variability of vegetation vigor, thus facilitating several decision-making processes and leading to more efficient and precise management for farmers.
In spite of the fact that a lot of research has made use of multisensory coupling and data fusion approaches with the purpose of increasing the spatiotemporal resolution of monitoring, both approaches have rarely been dealt with in the same research. This could deeply increase the chances of success in cases where high spatiotemporal monitoring is needed. Therefore, the possibility of generating hybrid products becomes paramount in the acquisition of ideal monitoring characteristics, especially in countries where agriculture is the main economic activity.
Due to the relevance of improving the spatial resolution and temporal frequency of images for agricultural monitoring, the objective of the present study was to model and fuse the MODIS Normalized Difference Vegetation Index (NDVI) using images from the Sentinel-2A, Sentinel-2B, Landsat-8, and Landsat-7 satellites (denominated here as Landsat-like) to achieve daily high spatial resolution NDVI. To achieve this, we tested seven regression algorithms on a commercial farm of irrigated maize-soybean rotation, using the intercalibrated Landsat-like images as the input for the regression approach and the MODIS NDVI as the target variable.

Study Area
The study area is located in the western region of the state of Bahia, Brazil ( Figure 1). The area has the following coordinate pairs as boundaries: X1-411,958; Y1-8,630,320 (upper left corner); X2-441,487; The entire area is 49,123.09 ha, and consist of different land uses: native forest, pastures, rainfed crops, and irrigated crops with the last being represented by the central pivots of Figure 2. We chose this area for the study because of its land use and because it is part of a large grain production region (Western Bahia), which could possibly benefit from the results of our research. In the survey of the 2017/2018 crop, this region was responsible for planting 2.415 million hectares, with the main crops being soybean, cotton, maize, and coffee [50]. The present work considered the whole area in the process of training and validation of the models; however, we gave prominence to the results for irrigated crops.
All of the irrigated area (2842.45 ha) is irrigated using center pivot systems ( Figure 2). The area has 17 center pivots, of which 16 are towable; however, the equipment is only moved from one crop season to another, for example, the pivot is positioned in the pivot area denominated A in the main crop season and, in the following season, the pivot is positioned in the area denominated B, according to Figure 2. The crops cultivated during the monitoring of this research were maize (seed) and soybean.

Orbital Images
The images used in the research correspond to five orbital platforms, including one with a lower spatial resolution and a higher temporal resolution (1 to 2 days) (TERRA/MODIS), and others with a lower temporal and better spatial resolution (Landsat-8/OLI, Landsat-7/ETM+, Sentinel-2A/MSI, and Sentinel-2B/MSI), which were denominated as Landsat-like. The platforms used in this study, which The entire area is 49,123.09 ha, and consist of different land uses: native forest, pastures, rainfed crops, and irrigated crops with the last being represented by the central pivots of Figure 2. We chose this area for the study because of its land use and because it is part of a large grain production region (Western Bahia), which could possibly benefit from the results of our research. In the survey of the 2017/2018 crop, this region was responsible for planting 2.415 million hectares, with the main crops being soybean, cotton, maize, and coffee [50]. The present work considered the whole area in the process of training and validation of the models; however, we gave prominence to the results for irrigated crops.
All of the irrigated area (2842.45 ha) is irrigated using center pivot systems ( Figure 2). The area has 17 center pivots, of which 16 are towable; however, the equipment is only moved from one crop season to another, for example, the pivot is positioned in the pivot area denominated A in the main crop season and, in the following season, the pivot is positioned in the area denominated B, according to Figure 2. The crops cultivated during the monitoring of this research were maize (seed) and soybean. The entire area is 49,123.09 ha, and consist of different land uses: native forest, pastures, rainfed crops, and irrigated crops with the last being represented by the central pivots of Figure 2. We chose this area for the study because of its land use and because it is part of a large grain production region (Western Bahia), which could possibly benefit from the results of our research. In the survey of the 2017/2018 crop, this region was responsible for planting 2.415 million hectares, with the main crops being soybean, cotton, maize, and coffee [50]. The present work considered the whole area in the process of training and validation of the models; however, we gave prominence to the results for irrigated crops.
All of the irrigated area (2842.45 ha) is irrigated using center pivot systems ( Figure 2). The area has 17 center pivots, of which 16 are towable; however, the equipment is only moved from one crop season to another, for example, the pivot is positioned in the pivot area denominated A in the main crop season and, in the following season, the pivot is positioned in the area denominated B, according to Figure 2. The crops cultivated during the monitoring of this research were maize (seed) and soybean.

Orbital Images
The images used in the research correspond to five orbital platforms, including one with a lower spatial resolution and a higher temporal resolution (1 to 2 days) (TERRA/MODIS), and others with a lower temporal and better spatial resolution (Landsat-8/OLI, Landsat-7/ETM+, Sentinel-2A/MSI, and Sentinel-2B/MSI), which were denominated as Landsat-like. The platforms used in this study, which

Orbital Images
The images used in the research correspond to five orbital platforms, including one with a lower spatial resolution and a higher temporal resolution (1 to 2 days) (TERRA/MODIS), and others with a lower temporal and better spatial resolution (Landsat-8/OLI, Landsat-7/ETM+, Sentinel-2A/MSI, and Remote Sens. 2020, 12, 1297

of 18
Sentinel-2B/MSI), which were denominated as Landsat-like. The platforms used in this study, which come from the Landsat constellation, have a temporal resolution of 16 days, and those belonging to the Sentinel constellation have a temporal resolution of 10 days [19,21,[51][52][53]].

Preprocessing Landsat-Like Images
The conversion of digital numbers into reflectance is required to compare instruments from a physical reference [54]. In order to use the images from Landsat-8 and Landsat-7, we converted the digital numbers into physical values (reflectance) and, concomitantly, we corrected the atmosphere's influence on these images. The images of Sentinel-2A and 2B were already in reflectance at the top of the atmosphere, so only atmospheric correction was necessary. After these processes, the spectral bands from different sensors were intercalibrated using linear regression to make them compatible.
The purpose of intercalibrating the Enhanced Thematic Mapper Plus (ETM+), MultiSpectral Instrument (MSI), and Operational Land Imager (OLI) sensors is to obtain images matching the Landsat-8 spectral resolution (Table 1) and that are compatible, thus resulting in an equivalent input for the model [55][56][57][58][59]. First, we resampled the MSI sensor data to match the 30 m spatial resolution of the OLI sensor (Table 1). Second, two pairs of images were used for the intercalibration of this procedure, with the first pair referring to the date (19 August 2017) when the passage of the MSI sensor coincided with that of the OLI sensor, and the other pair referring to the dates 10 August 2017 (ETM+) and 18 August 2017 (OLI), since these sensors do not overlap temporally. The pair of images dated 18 August 2017 were only used for intercalibration, while the ETM+ image of the day 10 August 2017 was also used in the data fusion procedure (Section Daily data fusion modeling). The Landsat-8/OLI images were used as the standard. With the purpose of calibrating the Landsat-like images, simple linear regression models were developed for both the ETM+ and MSI (independent variables) as a function of the OLI (dependent variable), as shown in Equations (1) and (2): where ρ ETM and ρ MSI are the reflectance of the ETM+ and MSI bands, respectively, and β o and β i are the intercept and angular coefficient of the equation, respectively.

Processing MODIS Images
The MODIS images used were obtained from the MOD09GQ product, which has a spatial resolution of 250 m and temporal resolution of one to two days, depending on the latitude of the image. Of the 36 bands available for this sensor, the bands used referred to the red (Band 1) and near infrared (Band 2) wavelengths-the only two bands related to the surface reflectance found in the MOD09GQ product [54,60,61].
Subsequently, we proceeded to remove the images that could not be used because of cloudy days and gaps in the study region. After the selection of images, the NDVI was calculated according to Equation (3) [60]. Then, the daily modeling was processed to perform the downscaling procedure.
where ρ nir is the reflectance of the near infrared band and ρ r is the reflectance of the red band. The NDVI has a response range from −1.0 to 1.0. For further information, consult [58,62,63].

Landsat-Like Data Inputs
Landsat-like bands referring to the blue, green, red, and near infrared electromagnetic spectrum were used to model the NDVI of MODIS (NDVI MODIS-250 ) ( Table 1). These bands were also used to generate six covariates using the Normalized Ratio Procedure between Bands (NRPB) as suggested by Filgueiras et al. [62]. This procedure calculated all the possibilities of normalized ratios between the spectral bands of the intercalibrated images, as shown in Equation (4): where ρ x and ρ y are the surface reflectances, relative to the different bands of Landsat-like images.

Regression Algorithms
In order to find a model with a greater ability to fit and estimate NDVI MODIS-250 , seven regression algorithms were tested: support vector machine with linear kernel (SVMLinear) regression, linear regression (LM), Ridge regression, Cubist regression, partial least squares (PLS) regression, principal components regression (PCR), and Generalized Boosted Regression (GBM). Methodological information on these models can be found in the publications of Kuhn and Johnson [63]. All models tested were implemented using the R programming language and environment [64].

Daily Data Fusion Modeling
The data fusion procedure was performed for images from 29 July 2017 to 13 August 2017. For this, 16 images of the tile H-13 V-10 of the MODIS sensor were downloaded, as well as three Landsat-like images, the Landsat-like image of the day 10 August 2017 (ETM+) was used in the intercalibration. To perform daily data fusion, each of the four Landsat-like images used referred to a different platform (Landsat-7, Landsat-8, and Sentinel-2A and 2B). Therefore, 16 MODIS images and four Landsat-like images were separated for the daily data fusion modeling.
When analyzing the availability of information for the agricultural property, four MODIS images could not be used because of the presence of clouds (1 August 2017 and 7 August 2017) and because of the absence of information resulting from gaps in the MODIS orbit (2 August 2017 and 9 August 2017). Therefore, 12 images of NDVI from the MODIS sensor, the MOD09GQ product, and three Landsat-like images were used in the procedure of daily data fusion modeling (Figure 3), which corresponded to the days 29 August 2017 (MSI), 3 August 2017 (MSI), and 10 August 2017 (ETM+). As it was not possible to acquire the MODIS image of the study area on 2 August 2017, it was not possible to run data fusion on that day, so the OLI image was not used.  For each modeling procedure, 5000 points were extracted from the data set and separated into two groups: a training set (70%) and validation set (30%). The modeling was performed for all cloudfree days when MODIS images were available. We started the modeling on a day (29 July 2017) when both sources of information were available, that is, MODIS images and Landsat-like images. Firstly, we upscaled the Landsat-like images to make them compatible with the MODIS images (250 m) and trained the models to predict NDVIMODIS-250 with the regression algorithms. The best-performing model during validation was chosen to predict the NDVIMODIS-30 using the Landsat-like image (30 m) as the input.
On the following day (30 July 2017), the MODIS image of the day and the Landsat-like image of the previous day (29 July 2017) were used, assuming that the vegetation in the study area was known and the land use had not changed. We used the closest Landsat-like image to predict the NDVIMODIS-30, i.e., for 30 July 2017, the models were trained with a Landsat-like image from 29 July 2017. This approach was performed experimentally over 16 days (12 days with images), as exemplified in Figure 4. We also highlight two situations which need attention in the data fusion procedure. First, preference should always be given to the Landsat-like image closest to the modeling date. Second, if For each modeling procedure, 5000 points were extracted from the data set and separated into two groups: a training set (70%) and validation set (30%). The modeling was performed for all cloud-free days when MODIS images were available. We started the modeling on a day (29 July 2017) when both sources of information were available, that is, MODIS images and Landsat-like images. Firstly, we upscaled the Landsat-like images to make them compatible with the MODIS images (250 m) and trained the models to predict NDVI MODIS-250 with the regression algorithms. The best-performing model during validation was chosen to predict the NDVI MODIS-30 using the Landsat-like image (30 m) as the input.
On the following day (30 July 2017), the MODIS image of the day and the Landsat-like image of the previous day (29 July 2017) were used, assuming that the vegetation in the study area was known and the land use had not changed. We used the closest Landsat-like image to predict the NDVI MODIS-30 , i.e., for 30 July 2017, the models were trained with a Landsat-like image from 29 July 2017. This approach was performed experimentally over 16 days (12 days with images), as exemplified in Figure 4. For each modeling procedure, 5000 points were extracted from the data set and separated into two groups: a training set (70%) and validation set (30%). The modeling was performed for all cloudfree days when MODIS images were available. We started the modeling on a day (29 July 2017) when both sources of information were available, that is, MODIS images and Landsat-like images. Firstly, we upscaled the Landsat-like images to make them compatible with the MODIS images (250 m) and trained the models to predict NDVIMODIS-250 with the regression algorithms. The best-performing model during validation was chosen to predict the NDVIMODIS-30 using the Landsat-like image (30 m) as the input.
On the following day (30 July 2017), the MODIS image of the day and the Landsat-like image of the previous day (29 July 2017) were used, assuming that the vegetation in the study area was known and the land use had not changed. We used the closest Landsat-like image to predict the NDVIMODIS-30, i.e., for 30 July 2017, the models were trained with a Landsat-like image from 29 July 2017. This approach was performed experimentally over 16 days (12 days with images), as exemplified in Figure 4. We also highlight two situations which need attention in the data fusion procedure. First, preference should always be given to the Landsat-like image closest to the modeling date. Second, if We also highlight two situations which need attention in the data fusion procedure. First, preference should always be given to the Landsat-like image closest to the modeling date. Second, if two Landsat-like images are available on the same date, preference should be given to the OLI sensor. If there is no image from the OLI sensor available, preference should be given to the alternative sensor which presented higher r 2 during intercalibration.

Statistical Analyses
After obtaining the fitted models, they were compared with the values observed by the NDVI from the MODIS sensor in a region not trained by the models (sample validation set). From the predicted (NDVI MODIS-250 estimated with Landsat-like images) and observed (NDVI MODIS) values, the following statistical metrics were calculated: determination coefficient (r 2 ), root mean square error (RMSE), Nash-Sutcliffe efficiency index (NSE) [65], mean bias error (MBE), and mean absolute error (MAE), as shown in Equations (5)-(9): where P i is the value estimated by the model, O i represents the observed values, O represents the observed mean values, and n represents the pair numbers of observations. The r 2 value indicates the extent to which the independent variables explain the variance of the dependent variable. The RMSE is an accuracy metric of the model obtained through the quadratic difference between the estimated and observed data, while the MAE provides a mean value of the absolute errors, differing from the RMSE, which gives greater importance to larger errors. This means that RMSE should be most useful when large errors are particularly undesirable; the MBE indicates possible underestimation or overestimation trends. The NSE is used to evaluate the predictive power of the model, varying from -∞ to 1; the value 1 corresponds to a perfect fit between the data estimated by the model and the data observed. Values between 0 and 1 are generally considered to indicate an acceptable level of performance, whereas values below 0 indicate that the observed mean value predicts better than the model, indicating an unacceptable performance [66][67][68][69][70][71][72][73].

Results
The intercalibration of the spectral bands for the ETM+ and MSI based on the OLI sensor obtained the following r 2 : ETM+ blue band (r 2 = 0.97); MSI blue band (r 2 = 0.96); ETM+ green band (r 2 = 0.98); MSI green band (r 2 = 0.96); ETM+ red band (r 2 = 0.99); MSI red band (r 2 = 0.97); ETM+ infrared band (r 2 = 0.93); and MSI infrared band (r 2 = 0.94). It should be noted that all p-values were significant at 0.1%. Figure 5 shows the results of the statistical metrics obtained for the validation set by the prediction models. A set of metrics was applied taking into account the findings highlighted in Chai and Draxler [74], who concluded that a composite of statistical indices is usually required to evaluate the performance of the models. It can be observed from a general analysis that the regression models applied to the prediction of the NDVI MODIS-250 showed satisfactory performances. The results of the statistical metrics show that it is feasible to estimate the NDVI MODIS-30 using the covariates, that is, information derived from the Landsat-like images.
Remote Sens. 2020, 12, 1297 9 of 18 The mean values found for the MAE are shown in Figure 5A. When considering all the dates, Cubist regression presented the lowest mean of MAE. It also presented the lowest MAE value for the entire study period, which makes this model the one that had the greatest capacity to reproduce the reality of the estimated data. By analyzing the MBE ( Figure 5B), it could be seen that the models, in general, are near the tenuous line (value ≈0) between negative and positive values. The MBE was not analyzed for predictive accuracy, and it is used in this case to know whether the models tend to underestimate (negative values) or overestimate (positive values).
Remote Sens. 2020, 12, x FOR PEER REVIEW 9 of 18 The mean values found for the MAE are shown in Figure 5A. When considering all the dates, Cubist regression presented the lowest mean of MAE. It also presented the lowest MAE value for the entire study period, which makes this model the one that had the greatest capacity to reproduce the reality of the estimated data. By analyzing the MBE (Figure 5B), it could be seen that the models, in general, are near the tenuous line (value ≈0) between negative and positive values. The MBE was not analyzed for predictive accuracy, and it is used in this case to know whether the models tend to underestimate (negative values) or overestimate (positive values). The values found for r 2 ( Figure 5C) were high, which indicates that the independent variables have the capacity to explain the variance in the dependent variable. The highest mean of r 2 was that for the Cubist model (0.9093), indicating that it was the model that best explained the variability of the data observed by the MODIS sensor [75]. Figure 5D shows that the RMSE values were low, which indicates that the values fit the data well and, consequently, all the models have the capacity to explain the observed values. The RMSE was mainly used to give higher weights to less favorable conditions, being an interesting metric for revealing differences in performances [74]. The highest value of NSE was also found for the Cubist model, which was the only one tested that had an average NSE above 0.90 (0.9008). This index indicates how much of the dependent variable is explained by the model [65], and is an important metric for decisions on which model will be adopted in the current prediction of MODIS NDVI30 meters. After the analysis presented in Figure 5, we could conclude that the Cubist model was superior to the others for the prediction of MODIS NDVI30 meters, since it showed The values found for r 2 ( Figure 5C) were high, which indicates that the independent variables have the capacity to explain the variance in the dependent variable. The highest mean of r 2 was that for the Cubist model (0.9093), indicating that it was the model that best explained the variability of the data observed by the MODIS sensor [75]. Figure 5D shows that the RMSE values were low, which indicates that the values fit the data well and, consequently, all the models have the capacity to explain the observed values. The RMSE was mainly used to give higher weights to less favorable conditions, being an interesting metric for revealing differences in performances [74]. The highest value of NSE was also found for the Cubist model, which was the only one tested that had an average NSE above 0.90 (0.9008). This index indicates how much of the dependent variable is explained by the model [65], and is an important metric for decisions on which model will be adopted in the current prediction of MODIS NDVI 30 meters . After the analysis presented in Figure 5, we could conclude that the Cubist model was superior to the others for the prediction of MODIS NDVI 30 meters , since it showed the best result in all metrics evaluating the fit of the models to the observed data. Due to these results, Cubist regression was the model adopted to generate NDVI images with a 30 m spatial resolution (MODIS NDVI 30 meters ).
In Figure 6, we also highlight the spatial resolution of the MODIS images (250 m). This means that each pixel of an image of these bands has an area of 62,500 m 2 and does not allow the nuances of the surface to be observed in detail, especially when the purpose is to observe agricultural areas such as center pivots.
Remote Sens. 2020, 12, x FOR PEER REVIEW 10 of 18 the best result in all metrics evaluating the fit of the models to the observed data. Due to these results, Cubist regression was the model adopted to generate NDVI images with a 30 m spatial resolution (MODIS NDVI30 meters). In Figure 6, we also highlight the spatial resolution of the MODIS images (250 m). This means that each pixel of an image of these bands has an area of 62,500 m 2 and does not allow the nuances of the surface to be observed in detail, especially when the purpose is to observe agricultural areas such as center pivots.   Figure 7 shows the relationship between the predicted (NDVI MODIS-250 predicted with upscaled Landsat-like images) value and the observed value of MODIS NDVI-250 for each monitoring day using the Cubist model. It can be observed that the data fit the values observed well for all the days in which NDVI MODIS-250 was modeled with the parameters derived from Landsat-like images, making it possible to perceive the high density of dots (light green color) near the line, which serves as a reference for the ideal fit. The bars of errors show how small the deviation of the estimated data was, on the whole, against the observed data. Figure 7 shows the relationship between the predicted (NDVIMODIS-250 predicted with upscaled Landsat-like images) value and the observed value of MODISNDVI-250 for each monitoring day using the Cubist model. It can be observed that the data fit the values observed well for all the days in which NDVIMODIS-250 was modeled with the parameters derived from Landsat-like images, making it possible to perceive the high density of dots (light green color) near the line, which serves as a reference for the ideal fit. The bars of errors show how small the deviation of the estimated data was, on the whole, against the observed data.  Figure 8 shows the results of NDVIMODIS-30 estimated by bands and products derived from Landsat-like images with a 30 m spatial resolution. This technique, which enables monitoring of the surface to be continuous in time with spatial detailing, is only possible due to certain factors, which can be considered as the premises of this modeling.
The first of these assumptions is the coupling of sensors with a detailed resolution (Landsatlike), since this procedure substantially increases the frequency of images with a more detailed spatial resolution, as done in the current study. The mean frequency of one image every four days was achieved by coupling multi-sensors (ETM+, MSI, and OLI). This premise allows for a greater frequency of detailed images (Landsat-like images), which are the independent variables of the model. In order to perform the image coupling procedure, it is necessary to take into account factors that may hamper the intercalibration of these sensors, as reported in D'Odorico et al. [55], Fan and Liu [56], Mao et al. [57], Teillet et al. [76], and Yan et al. [77].
A second (premise) condition for performing this image data fusion procedure is the consideration that abrupt land-use changes from one day to another are known by the user at the monitored site. If, for example, harvesting of a crop takes place one day after the sensor with a better resolution passes (Landsat-like), the next day's downscaled image will display a drop in the NDVIMODIS-30 values, caused by the response of the images of NDVIMODIS-250 m. However, the result of the NDVIMODIS-30m (downscaling procedure) image will exhibit a more sensitive drop due to the fact  Figure 8 shows the results of NDVI MODIS-30 estimated by bands and products derived from Landsat-like images with a 30 m spatial resolution. This technique, which enables monitoring of the surface to be continuous in time with spatial detailing, is only possible due to certain factors, which can be considered as the premises of this modeling.
The first of these assumptions is the coupling of sensors with a detailed resolution (Landsat-like), since this procedure substantially increases the frequency of images with a more detailed spatial resolution, as done in the current study. The mean frequency of one image every four days was achieved by coupling multi-sensors (ETM+, MSI, and OLI). This premise allows for a greater frequency of detailed images (Landsat-like images), which are the independent variables of the model. In order to perform the image coupling procedure, it is necessary to take into account factors that may hamper the intercalibration of these sensors, as reported in D'Odorico et al. [55], Fan and Liu [56], Mao et al. [57], Teillet et al. [76], and Yan et al. [77].
A second (premise) condition for performing this image data fusion procedure is the consideration that abrupt land-use changes from one day to another are known by the user at the monitored site. If, for example, harvesting of a crop takes place one day after the sensor with a better resolution passes (Landsat-like), the next day's downscaled image will display a drop in the NDVI MODIS-30 values, caused by the response of the images of NDVI MODIS-250 m . However, the result of the NDVI MODIS-30m (downscaling procedure) image will exhibit a more sensitive drop due to the fact that the Landsat image is from the day on which the crop had not yet been harvested. Therefore, it is essential that the user of this methodology knows the monitored surface and has updated information about the management and land-use of that region. Ke et al. [45] pointed out a similar premise.
that the Landsat image is from the day on which the crop had not yet been harvested. Therefore, it is essential that the user of this methodology knows the monitored surface and has updated information about the management and land-use of that region. Ke et al. [45] pointed out a similar premise.  It could be observed in the monitoring of the downscaled images that the crop of the pivot denominated 1A (Figure 2) had been entirely harvested on the day 8 August 2017 ( Figure 7H). By observing Figure 7, it can be noted that the area of this center pivot showed a subtle decrease in the NDVI value between 29 July 2017 ( Figure 7A) and 8 August 2017 ( Figure 7H) and, after that date, there was a more pronounced fall in NDVI values, indicating that harvesting had occurred. This fact can also be observed in the images that underwent the downscaling process ( Figure 8).
The NDVI value of center pivot 1A between the dates 4 August 2017 and 8 August 2017 ( Figure 8E-H) showed a noticeable drop, which underscores the robustness of the data fusion methodology applied in the present study. The model fitted using previous Landsat-like images can satisfactorily model the information on the day of the MODIS sensor, thus offering a product with high-frequency images and spatial detail. The average of the NDVI MODIS for the central pivot 1A on 29 July 2017 was 0.56, while the average NDVI for 8 August 2017 (already harvested) was 0.32. By comparing the drop of 0.24 that occurred on those days with the average value found for MAE (0.03) for the Cubist model over the monitoring period, the relationship of this error with the temporal variability of NDVI can be highlighted, allowing us to understand the potentiality of the suggested model.

Discussion
We can observe that the validation metrics of the Cubist and GBM models stood out against the others for the 10th image (11 August 2017) ( Figure 5), with the Cubist model being superior to that of the GBM. On that day, the metrics obtained by all the models were farther from the study period average. We relate this to the distortion presented by the MODIS image for this date (11 August 2017), which can be seen in Figure 6J. Figure 6 shows the NDVI calculated with bands one and two of the MODIS sensor, which are related to the MOD09GQ product. We believe that the deterioration in performance on the 10th image (11 August 2017) was mainly due to the viewing angle with which the sensor captured the image and the proximity of the gap in the study area, since this image is more distorted than the others ( Figure 6J). Although the Cubist model performed better, the merit of the other models cannot be disregarded, since all of them were able to predict this parameter under the circumstances of the study with similar performance metrics. The performance of the LM model can also be highlighted; it was the simplest model but had the highest degree of interpretability. This model, like the others, was able to model the NDVI standards on all of the dates analyzed. However, in situations where abrupt changes in vegetation occur, it is expected that this model, as well as other linear models (PCR, RIDGE, PLS, and SVMLinear), will present a lower performance compared to models such as Cubist and GBM models. As we analyzed the area as a whole, we did not notice any change in the performance of the validation set for changes in crops in central pivots.
Ke et al. [78] proposed a method for downscaling the evapotranspiration product of the MODIS 8-day 1km, based on Landsat-8 images and machine learning techniques. The methodology used by these researchers is similar to that of the current research, that is, upscaling was performed for the information derived from the Landsat-8 satellite and from this, the evapotranspiration of the MODIS product was modeled using machine learning. Subsequently, they applied the fitted model to the covariates with a 30 m spatial resolution. The aforementioned paper tested three models of machine learning: Random Forest (RF), Cubist, and SVM. Among these, for the studied variable, RF was the one that obtained the smallest errors. However, these authors point out that RF showed results that were similar to, but only slightly better than, those of the Cubist model.
As found by Ke et al. [78], RF results are generally similar to those of the Cubist model and because of this and the time demanded by the RF to process the data, this algorithm was excluded from the analysis of the current research. The approach of the present study aimed to generate daily models throughout the monitoring period, requiring, besides acceptable metrics, rapidity in the analysis. This is an important point and it must be considered in the creation of methodologies. Regarding this point, the Cubist model stands out against RF.
Muhling et al. [79] used the Cubist model to predict the surface temperature and salinity in the Chesapeake basin in the United States of America. According to these authors, the Cubist model is a regressor similar to other regression tree models in the way that it divides training data into increasingly homogeneous subsets. However, what makes this model different from others is the way in which the value is predicted in the final node; instead of being a final fixed value, like those of other models based on decision trees, this algorithm promotes multiple linear regression to reach the final results of the nodes. According to these authors, the final regression in the nodes causes the Cubist model to have a greater capacity to generalize, having the possibility of extrapolating the predictions beyond the amplitude of the training set.
The research conducted by Ke et al. [78] did not include a methodology of downscaling that was temporal and spatial, but these authors highlighted the importance of applying this methodology in order to combine the spatial resolution of Landsat and the high temporal frequency of MODIS. In the research conducted by Ke et al. [45], using machine learning techniques integrated with the space-time downscaling approach were proposed to generate an evapotranspiration product every 8 days with a resolution of 30 m. This product was generated based on MODIS information and the Landsat-8 orbital platform, using the methodology of Ke et al. [78] as the basis for achieving the results. After the generation of the evapotranspiration results obtained through the downscaling process, it was verified that the results of this product show a greater precision than the product itself, demonstrating the potential of these techniques in the generation of information with greater spatial details over time, reinforcing the importance of studies of this nature.
Gu and Wylie [3] developed a methodology to integrate the growing season NDVI (GSN) product calculated with the NDVI of the MODIS sensor (spatial resolution of 250 m) with multiple dates of the Landsat constellation (spatial resolution of 30 m), with the proposal to generate a GSN product integrating these sources of information. These researchers found a strong correlation between the predicted data and the observed data (r = 0.97), and the MAE was 0.026, which is in agreement with the present study, since Gu and Wylie [3] used the Cubist model to obtain these results.
Boyte et al. [28] integrated weekly NDVI data from the MODIS sensor and Landsat-8 platform using four regression tree models, with the purpose of downscaling the MODIS images and applying them to the monitoring. The results show that the correlation coefficient was strong for all predictions (r ≥ 0.89) and, in addition to these results, the researchers emphasized the visual quality of the synthetic product MODIS NDVI of 30 m compared with the original product of 250 m. Boyte et al. [28] stated that these products with a 30 m spatial resolution contribute to understanding seasonal processes and provide important resources for land managers.

Conclusions
In this study, an NDVI downscaling methodology was developed with images from the Sentinel-2A, Sentinel-2B, Landsat-7, and Landsat-8 orbital platforms, and an NDVI product with a 30 m spatial resolution and the temporal frequency of the MODIS sensor (1 to 2 days) was generated.
Among the regression models used in the modeling of the daily NDVI at a 30 m resolution, the Cubist model was the one that showed the best fit, being the model recommended to proceed with this methodology in the study area. However, the other models also showed satisfactory results and could be used to generate the final product.
Due the performance of the methodology developed in this study, agricultural monitoring in higher detail (30 m) with a daily temporal frequency (temporal resolution of the MOD09QG) could be feasible.
The images with a daily 30 m resolution were sensitive to changes in land use, even on dates that had only information on NDVI at a 250 m spatial resolution, as occurred during the harvest of pivot 1A, or even when the MODIS images presented distortion, as shown in Figure 7J.
The methodology applied in this study can be used to generate other variables, such as evapotranspiration, the surface temperature, and soil moisture, which can be employed to make irrigation decisions at the farm level.