Sentinel-1 to NDVI for Agricultural Fields Using Hyperlocal Dynamic Machine Learning Approach

Abstract: The normalized difference vegetation index (NDVI) is a key parameter in precision agriculture. It has been used globally since the 1970s as a proxy to monitor crop growth and correlates to the crop coefficient (Kc), leaf area index (LAI), crop cover, and more. Yet, it is susceptible to clouds and other atmospheric conditions that might alter the crop's real NDVI value. Synthetic Aperture Radar (SAR), on the other hand, can penetrate clouds and is hardly affected by atmospheric conditions, but it is sensitive to the physical structure of the crop and therefore does not give a direct indication of the NDVI. Several SAR indices and methods have been suggested to estimate NDVIs via SAR; however, they tend to work for local spatial and temporal conditions and do not work well globally. This is because they are not flexible enough to capture the changing NDVI-SAR relationship throughout the crop-growing season. This study suggests a new method for converting Sentinel-1 to NDVIs for Agricultural Fields (SNAF) by utilizing a hyperlocal machine learning approach. This method generates multiple on-the-fly, disposable, field- and time-specific models for every available Sentinel-1 image across 2021. Each model learns the field-specific NDVI (from Sentinel-2 and Landsat-8) to SAR (Sentinel-1) relationship based on recent NDVI and SAR time series and consequently estimates the optimal NDVI value from the current SAR image. The SNAF was tested on 548 commercial fields from 18 countries with 28 crop types and, based on 6880 paired NDVI-SAR images, achieved an RMSE, bias, and R² of 0.06, 0.00, and 0.92, respectively. This study aims to provide a persistent, seamless stream of NDVI values, regardless of the atmospheric conditions, illumination, or local conditions, which can assist in agricultural decision making.


Introduction
The normalized difference vegetation index (NDVI), which was introduced in the mid-1970s [1,2], is still, to date, the most common index used to monitor vegetation in general and specifically vegetation in agriculture using satellite imagery [3]. The United States Geological Survey (USGS) termed the NDVI as "the foundation for remote sensing phenology" [4] because it is sensitive to the biochemical and physiological properties of vegetation. Consequently, the NDVI can reveal where vegetation is thriving and where it is under stress as well as changes in vegetation due to human activities, natural disturbances, or changes in plants' phenological stage [5]. Its relative simplicity, utilizing the normalized difference between the red (~650 nm) and the near-infrared (NIR) light (~850 nm), makes the NDVI accessible, since many sensors carried aboard satellites measure the reflected light in these wavelengths. So, many researchers found the NDVI useful as a proxy to monitor crop growth [6][7][8] and correlated it to the crop coefficient (Kc) [9][10][11], leaf area index (LAI) [12][13][14], and crop cover [15][16][17]. Consequently, the NDVI (either by utilizing it directly or indirectly) is an important information source in agriculture decision-making processes such as harvest planning, irrigation scheduling, fertilization inputs, and other agrotechnical actions [18][19][20][21][22][23][24].
strengths with different crops, phenological stages, soil types, or NDVI values. Furthermore, the previously suggested SAR indices and even the models can be considered static and inflexible, not considering different local conditions and thus achieving inferior results when tested on other crops or environments. Therefore, the need for a more robust method to estimate the NDVI using SAR remains.
From a practical point of view, this desired method should be like the NDVI in terms of global applicability, meaning that it can be utilized in any given local field conditions (e.g., crop or soil type, growth stage, etc.).
Building on the previous studies' findings, such a method should be non-linear, dynamic (as opposed to having fixed parameters, coefficients, or formulas), crop-agnostic, flexible, and hyperlocal to account for the inter- and intra-season changes in the SAR-NDVI relationship. In addition, the desired method should not focus on one SAR index, as none of the indices showed consistent superiority over the others. The desired method should also take advantage of the ample past Sentinel-1 data by incorporating it into the method.
Hence, the objective of this study is to develop a method to estimate the NDVI from Sentinel-1 data that can be applied globally to agricultural fields and will be robust enough to work on a variety of fields, crops, growth stages, and soil types. The method is termed SNAF (Sentinel-1 to NDVI for Agricultural Fields), and it uses a hyperlocal dynamic machine learning approach. This means that the SNAF method will generate a new model per field for any new Sentinel-1 image based on the past field-specific Sentinel-1 and NDVI time series. By generating a field- and time-specific model, the SNAF accounts for the crop changes during the growing season and eliminates the need to incorporate field settings into the model because they are constant (or hardly change) for a specific field (e.g., the soil type does not change).
Further, the SNAF's underlying assumption is that many previously developed indices have merit in estimating the NDVI, but that merit might change depending on the crop type, growth stages (or time in the season), soil type, etc., as previously found. Therefore, various indices will serve as the input for the SNAF and not as the final model.
This raises the question of how the SNAF will know which index or combination of indices should be used to estimate the NDVI for a specific field, crop, soil type, and time of the season. To answer this question, a machine learning model will find the best mix of indices based on field-specific past SAR and NDVI time series and will decide, for each field and at each point in time, which combination of SAR indices is best for estimating the NDVI, thus outputting the optimal NDVI estimation.
To prove that the SNAF is robust enough to work globally, it must be tested globally. To that end, 548 commercial fields from 18 countries including 28 crop types will be used here as the case study. It is important to note that the goal is not to replace the NDVI or optical data but rather to fill eventual gaps in the optical data time series, particularly during cloudy periods, to ensure a constant flow of NDVI values for various agricultural applications and decision-making processes.

Materials and Methods
The SNAF concept (which will be further explained in Section 2.5) is to use a machine learning model to learn the hyperlocal relationship between the time series of multiple SAR indices and the NDVI for a specific field and point in time for any available SAR image. Then, when a new SAR image exists (and only in the absence of the NDVI), it is to output an NDVI estimation based on the learned relationship. Consequently, the SNAF can be illustrated as a dynamic, "breathing" system that is always up to date, because each time a new SAR image is available, a new model is built based on the most recent relationship, which can be similar to or different from what it was before, as opposed to a static model such as an index or a predefined formula.

Study Sites
To test the SNAF, 548 commercial plots with 28 different crops from 18 countries were selected. Figure 1 and Table 1 illustrate the assortment of the fields used here, providing the distribution of the countries, field areas, soil types, irrigation systems, and crop types. The (not publicly available) source of these fields is the Manna Irrigation platform (Gvat, Israel, https://manna-irrigation.com, accessed on 25 March 2022), which is the developer of a sensor-free, software-only irrigation solution that delivers plot-specific irrigation recommendations. The soil type was determined according to the U.S. Department of Agriculture Textural Classification triangle method (see, for example, https://www.nrcs.usda.gov/wps/portal/nrcs/detail/soils/survey/?cid=nrcs142p2_054167, accessed on 26 August 2021).
Table 1 (partial). Crops, crop groups, and number of fields per crop.
No.  Crop                    Crop group         Fields
     Sugarcane               Tall field crops   20
16   Sunflower               Tall field crops   20
17   Corn Grains             Tall field crops   20
18   Corn Seed Production    Tall field crops   20
19   Cotton                  Tall field crops   20
20   Sweet Pepper            Tall field crops   20
21   Corn Silage             Tall field crops   19
22   Processing Tomatoes     Short field crops  20
23   Potatoes                Short field crops  20
24   Fresh Tomatoes          Short field crops  20
25   Watermelon              Short field crops  20
26   Ground Nuts             Short field crops  20
27   Alfalfa                 Short field crops  20
28   Dry Onion               Short field crops  20

NDVI Dataset
To obtain an NDVI time series per field, the Google Earth Engine (GEE) [38] Python API was used. Using this tool, two-year (2020 and 2021) time series of remote sensing imagery from Sentinel-2 level-2A (ground sampling distance, GSD, of 10 m) and Landsat-8 level-2 (GSD of 30 m) were obtained for each field. These imagery sets are already processed to bottom-of-atmosphere reflectance. Images with clouds, haze, cirrus, cloud shadows, snow, or ice, according to the relevant QA bands (SCL for Sentinel-2, and pixel_qa for Landsat-8), were removed from further analysis. Then, the NDVI was calculated per pixel, and all per-pixel NDVI values were averaged for each image of the 548 fields, using:
$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{RED}}{\mathrm{NIR} + \mathrm{RED}} \qquad (1)$$
where NIR and RED are the surface reflectance near-infrared and red spectral bands of Sentinel-2 (bands 8 and 4, respectively) and Landsat-8 (bands 5 and 4, respectively). The harmonization process between the Sentinel-2 and Landsat-8 NDVIs will be explained in Section 2.4.
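The per-field averaging described above can be illustrated with a minimal sketch. It assumes that the NIR and red reflectance of one field image are already available as arrays and that a boolean mask marks the pixels that passed the QA filtering; the function name `field_mean_ndvi` and the mask argument are hypothetical and are not part of the original workflow.

```python
import numpy as np


def field_mean_ndvi(nir, red, valid_mask=None):
    """Average of the per-pixel NDVI over one field image.

    nir, red   : surface reflectance arrays of identical shape
    valid_mask : optional boolean array; False marks pixels to ignore
                 (hypothetical stand-in for the QA-band filtering)
    """
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    ok = denom != 0  # avoid division by zero
    if valid_mask is not None:
        ok &= np.asarray(valid_mask, dtype=bool)
    ndvi = np.full(nir.shape, np.nan)
    ndvi[ok] = (nir[ok] - red[ok]) / denom[ok]  # per-pixel NDVI
    return float(np.nanmean(ndvi))


nir = [[0.5, 0.4], [0.6, 0.3]]
red = [[0.1, 0.1], [0.2, 0.3]]
mean_ndvi = field_mean_ndvi(nir, red)
```

The same function applies to both sensors once the appropriate bands are selected (bands 8 and 4 for Sentinel-2, bands 5 and 4 for Landsat-8).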

SAR Dataset
The SAR (i.e., Sentinel-1) dataset was also obtained using GEE. The images used were all in the form of Interferometric Wide Swath Mode (IW) with dual polarization (VV + VH) and were acquired under level-1 processing as ground range detected (GRD). This IW mode is the main acquisition mode over land. Level-1 GRD products consist of focused SAR data projected to ground range using the Earth ellipsoid model WGS84 and have a GSD of 10 m. GEE preprocessed each scene with the Sentinel-1 Toolbox (https://sentinel.esa.int/web/sentinel/toolboxes/sentinel-1, accessed on 15 February 2022) using the following steps:
Terrain correction using the SRTM 30 DEM, or the ASTER DEM for areas at latitudes greater than 60 degrees, where the SRTM is not available.

Selecting SAR Indices for the SNAF
As mentioned, the SNAF utilizes several SAR indices, through a process that will be explained in the next section, to estimate the average NDVI of a field when the optical NDVI is not available. To choose which SAR indices will be used in the SNAF, seventeen Sentinel-1 indices were initially selected (Table 2). The correlation between the indices was calculated (Figure 2) based on 169,192 values of each index. According to Figure 2, some indices are highly correlated with others, suggesting redundancy. Based on this analysis, six indices with low collinearity were selected to be utilized in the SNAF method (Figure 3).
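One possible way to screen indices for redundancy from a correlation matrix is sketched below. This is an illustration only: the text does not state the exact selection rule, so the greedy procedure, the 0.8 threshold, and the index names `idx_a`, `idx_b`, and `idx_c` are all assumptions.

```python
import numpy as np


def select_low_collinearity(values, names, threshold=0.8):
    """Greedily keep indices whose absolute Pearson correlation with
    every already-kept index is below `threshold` (assumed rule)."""
    corr = np.abs(np.corrcoef(np.asarray(values, dtype=float), rowvar=False))
    kept = []
    for j in range(corr.shape[0]):
        if all(corr[j, i] < threshold for i in kept):
            kept.append(j)
    return [names[j] for j in kept]


# Synthetic demonstration: idx_c is almost a rescaled copy of idx_a.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = 2.0 * a + 0.01 * rng.normal(size=200)
selected = select_low_collinearity(np.column_stack([a, b, c]),
                                   ["idx_a", "idx_b", "idx_c"])
```

In this synthetic case the near-duplicate column is dropped, which mirrors the idea of discarding indices that carry redundant information.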

Estimating the NDVI from SAR Using the SNAF Method
The following steps describe the SNAF method and were executed for each date with a SAR image for each field. The process begins when a SAR image (SAR_last_date) is available and an NDVI is not.
1. The most recent NDVI date (NDVI_last_date) is obtained.
2. The SNAF searches for all available NDVI and SAR data 365 days prior to the NDVI_last_date. Only these data are considered for further analysis.
3. The SNAF generates a time series of the average NDVI value of the field from Sentinel-2 (NDVI_SN2) and Landsat-8 (NDVI_LS8).
4. To harmonize between NDVI_SN2 and NDVI_LS8, their corresponding NDVI values are smoothed using a locally weighted regression (LWR) algorithm [42] (Figure 4). The LWR approximates the regression parameters for each point separately by iterating over them using the entire set of points, where a weight is assigned to each point as a function of its distance from the current point. LWR starts by defining a weight function (Equation (2)), where X is a vector containing scalars ($x_1$, $x_2$, ..., $x_n$) representing the dates of the images as the difference in days from the first image date (i.e., from $x_1$). For example, if the dates of the first two images in X are 20 April 2021 and 27 April 2021, respectively, then $x_1$ = 0 and $x_2$ = 7. $x_i$ is the value of X at point i (corresponding to the ith iteration), and k essentially determines how smooth the curve will be, where a higher k corresponds to a smoother curve. The value for k was set to 21 for the SAR-derived time series and to 8 for the other time series. These values were chosen based on trial and error, where the guideline was to smooth and create seamless time series but not to the extent that the smoothed curve no longer represents the general pattern of the data. Equation (2) results in a weight matrix $W_i$ (for the ith iteration) where the weights decrease with distance, i.e., the number of days. Using $W_i$, we can find the model parameters (for the ith iteration) as follows:
$$\beta_i = \left(X^{T} W_i X\right)^{-1} X^{T} W_i \, y$$
where $\beta_i$ denotes the model parameters (for the ith iteration), and y denotes the index values (here, it is the NDVI). Then, to obtain the smoothed values, we multiply the parameters with the $x_i$:
$$\hat{y}_i = x_i \beta_i$$
Consequently, a sensor-agnostic seamless NDVI time series is achieved (NDVI_harmonized). This NDVI_harmonized is later used as a reference for the model accuracy metric calculations.

5. After the LWR, a daily interpolation is applied to the NDVI_harmonized (Figure 4) with the assumption that changes in crop growth are gradual during short periods [43]. This was done in order to achieve daily NDVI values, thus increasing the volume of the data for the machine learning model.
6. Five SAR time series indices (SAR_5TS) (Figure 3, excluding sar_median) are calculated using the VV and VH bands of Sentinel-1. They are based on the SAR images from the last 365 days prior to the NDVI_last_date.
7. Steps #4 and #5 are applied to each of the SAR_5TS. By doing that, a higher alignment between the SAR and the NDVI time series in terms of the number of values is reached, which enables more data for the model training (step #9).
8. The median of the five SAR indices (from step #6) is calculated, resulting in a total of six SAR time series indices (SAR_6TS).
9. The random forest (RF) model [44] (with default settings) from the Python Scikit-Learn package [45] was utilized for the model training. The RF is a supervised learning algorithm that fits a number of decision trees on various sub-samples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting.
Figure 5 summarizes these steps in a flowchart. As mentioned, these steps were executed for each of the 548 fields for each available SAR image through 2021, thus creating real-world real-time scenarios when a SAR image is available but an optical NDVI image (i.e., one calculated from Sentinel-2 and Landsat-8) is not. It is important to mention that an agricultural growing season is less than one year long, meaning that the NDVI estimations by the SNAF were also applied for dates before and after the growing season months of 2021.
During the SNAF testing (i.e., while executing steps #1-15), some dates had both a SAR and an NDVI image. In these cases, the NDVI of that date was ignored in the analysis, and the preceding NDVI date was considered as the NDVI_last_date (step #1). This was done because the goal of this method is not to replace the NDVI but rather to provide a solution when the NDVI is not available. In other words, when a clear optical image is available (i.e., an NDVI is available), there is no need for any SAR data or model.
Moreover, from a statistical point of view, estimating the NDVI from SAR on a date when both are available will give optimistic results that will not reflect the robustness of the method.
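Steps #4 and #5 above (LWR smoothing followed by daily interpolation) can be illustrated with the following minimal sketch. The weight function itself is not reproduced in the text, so the Gaussian kernel used here is an assumption, as are the linear form of the daily interpolation, the inclusion of an intercept term in the local model, and the function names `lwr_smooth` and `daily_interpolate`.

```python
import numpy as np


def lwr_smooth(days, values, k):
    """Locally weighted linear regression evaluated at every input date.

    days   : day offsets from the first image (x_1 = 0), increasing
    values : index values y (e.g., the field-average NDVI)
    k      : bandwidth in days; a larger k gives a smoother curve
    """
    x = np.asarray(days, dtype=float)
    y = np.asarray(values, dtype=float)
    X = np.column_stack([np.ones_like(x), x])  # rows are [1, x_j]
    y_hat = np.empty_like(y)
    for i, xi in enumerate(x):
        # ASSUMPTION: Gaussian kernel; weights decay with distance in days.
        w = np.exp(-((x - xi) ** 2) / (2.0 * k ** 2))
        XtW = X.T * w  # same as X.T @ diag(w)
        beta_i = np.linalg.pinv(XtW @ X) @ (XtW @ y)  # weighted least squares
        y_hat[i] = X[i] @ beta_i
    return y_hat


def daily_interpolate(days, smoothed):
    """Linearly interpolate the smoothed series to one value per day."""
    x = np.asarray(days, dtype=float)
    daily_days = np.arange(int(x[0]), int(x[-1]) + 1)
    return daily_days, np.interp(daily_days, x, np.asarray(smoothed, dtype=float))


# Synthetic example: six optical acquisitions over 40 days, k = 8 as for NDVI.
days = [0, 7, 12, 22, 27, 40]
ndvi = [0.21, 0.26, 0.24, 0.41, 0.47, 0.62]
smoothed = lwr_smooth(days, ndvi, k=8)
daily_days, daily_ndvi = daily_interpolate(days, smoothed)
```

Applying the same two functions to each SAR index (with k = 21, as stated in step #4) would yield daily SAR series aligned with the daily NDVI series, as required by step #7.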
Figure 5. Flowchart of the SNAF method, with the number of the corresponding steps in parentheses.
In addition, the RF variable (i.e., SAR indices) importance (also termed the feature importance) for every SNAF scenario was recorded. This was done by utilizing the Scikit-Learn Permutation feature importance model (with its default settings). This model measures the importance of each variable (i.e., SAR index) by calculating the increase in the model's prediction error after the variable values are randomly permuted. A variable is important if permuting its values increases the model error, because, in this case, the model relied on the feature for the prediction, and vice versa.
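A single SNAF run (steps #8 and #9, followed by the permutation importance) might look like the sketch below. The data are synthetic, and several details are assumptions rather than the authors' implementation: the importance is computed on the training data, negative importances are clipped to zero, the values are rescaled to sum to 1, and a fixed random seed is used only to make the example reproducible. The function name `snaf_estimate` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance


def snaf_estimate(sar_train, ndvi_train, sar_new, seed=0):
    """Train a disposable RF on past daily SAR/NDVI pairs and estimate the
    NDVI for the newest SAR observation; also return normalized importances."""
    model = RandomForestRegressor(random_state=seed)  # otherwise default settings
    model.fit(sar_train, ndvi_train)
    ndvi_est = float(model.predict(np.asarray(sar_new, dtype=float).reshape(1, -1))[0])
    result = permutation_importance(model, sar_train, ndvi_train, random_state=seed)
    raw = np.clip(result.importances_mean, 0.0, None)  # ASSUMPTION: clip negatives
    total = raw.sum()
    importance = raw / total if total > 0 else raw  # ASSUMPTION: rescale to sum to 1
    return ndvi_est, importance


# Synthetic stand-in for the five smoothed daily SAR indices of one field.
rng = np.random.default_rng(0)
sar_five = rng.normal(size=(300, 5))
ndvi_daily = 0.5 + 0.1 * np.tanh(sar_five[:, 0])
# Step 8: add the median of the five indices as a sixth feature.
sar_six = np.column_stack([sar_five, np.median(sar_five, axis=1)])
# The last row plays the role of the new SAR image without an optical NDVI.
ndvi_snaf, importance = snaf_estimate(sar_six[:-1], ndvi_daily[:-1], sar_six[-1])
```

Because the model is rebuilt for every new SAR date, nothing is carried over between runs; only the estimated NDVI and the recorded importances are kept.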

Accuracy Metrics
The accuracy metrics were calculated considering only the dates with both the optical NDVI and SAR, resulting in 6880 pairs of images. The measured NDVI (the ground truth for that matter) was the harmonized NDVI of the entire dataset (per field), and the estimated NDVI was the NDVI_SNAF. This harmonized NDVI of the entire dataset should not be confused with the NDVI_harmonized (mentioned in the steps in the previous section), which was part of the model training and was harmonized only on part of the data, i.e., 365 days from the NDVI_last_date for each iteration.
During the SNAF testing procedure, the optical NDVI values were ignored on the dates with both an SAR and optical NDVI. In other words, these optical NDVI values were not part of any model training and did not affect the SNAF method's estimation of the NDVI.
Three accuracy metrics were chosen for the evaluation of the SNAF performance, namely, the bias, the root-mean-squared error (RMSE), and the coefficient of determination (R²):
$$\mathrm{bias} = \frac{1}{n}\sum_{i=1}^{n}\left(S_i - M_i\right)$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(S_i - M_i\right)^{2}}$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(S_i - M_i\right)^{2}}{\sum_{i=1}^{n}\left(M_i - \bar{M}\right)^{2}}$$
where $S_i$ and $M_i$ are the estimated (NDVI_SNAF) and the measured (harmonized NDVI of the entire dataset, per field) value of the ith observation, respectively, $\bar{M}$ is the average of M, and n is the number of observations. The accuracy metrics considered only the dates with both an NDVI and SAR (no interpolated observations were included); however, most SAR images did not have a matching NDVI date and thus are not expressed in these metrics. Because it is important to observe how the NDVI_SNAF (as a time series) aligns with the harmonized NDVI of the entire dataset, per field (for all dates), a visual inspection was conducted by plotting both time series and observing the agreement.
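The three metrics can be computed as in the short sketch below. It follows the common definitions (estimated minus measured for the bias, and one minus the ratio of the residual to the total sum of squares for R²); the sign convention of the bias and the function name `accuracy_metrics` are assumptions, and the numbers are synthetic.

```python
import numpy as np


def accuracy_metrics(estimated, measured):
    """Return (bias, RMSE, R^2) for paired estimated and measured NDVI values."""
    s = np.asarray(estimated, dtype=float)
    m = np.asarray(measured, dtype=float)
    err = s - m  # ASSUMPTION: bias defined as estimated minus measured
    bias = float(err.mean())
    rmse = float(np.sqrt(np.mean(err ** 2)))
    r2 = float(1.0 - np.sum(err ** 2) / np.sum((m - m.mean()) ** 2))
    return bias, rmse, r2


measured = [0.2, 0.4, 0.6, 0.8]
estimated = [0.25, 0.35, 0.65, 0.75]
bias, rmse, r2 = accuracy_metrics(estimated, measured)
```

In this toy case the positive and negative errors cancel, so the bias is zero while the RMSE is 0.05, which shows why the two metrics are reported together.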

SNAF Performance for All Fields
A total of 6880 dates had both SAR and NDVI images. Figure 6 shows the NDVI_SNAF vs. NDVI_harmonized results for all fields. The overall performance of the SNAF method is high, with an RMSE of 0.06, an R² of 0.92, and a bias of 0.0, and the linear fit between the NDVI_SNAF and NDVI_harmonized is very close to the 1:1 line (the grey diagonal line in Figure 6), as expressed by the slope and intercept values.
Table 3 presents how many fields had an absolute error greater than 0.1 and how many times this was the case. Most of the fields (76%) never had an absolute error > 0.1 or had one only once, while only 8.6% had this error four or more times.
Figure 8 exhibits the importance of each SAR index in a boxplot. For each SAR date (for each field), the SNAF estimated the NDVI, and each SAR index had an importance value ranging between 0 and 1 for that estimation. The closer the value was to 1, the more important this index was for that specific NDVI estimation. For each NDVI estimation, the importance sum of all the indices was equal to 1. It can be seen that each of the 17 indices was important in a specific field at a specific point in time (Table 4 shows an example), meaning they were all useful. Generally, the VV_median, PRVI, VH_minus_VV, and RVI4S1 were more important than the others.
Table 5 summarizes the SNAF performance per crop. The RMSE ranges from 0.02 to 0.1, which is considered a low and reasonable NDVI error. The bias for all crops is 0.0, and most of the crops have an R² higher than 0.9. As expected, the crops with the lowest errors are orchards, either evergreen or deciduous. This is because the NDVI for these crops does not change dramatically during the year, as opposed to that for field crops. The crop with the highest errors is alfalfa, as expected, as it is a short-cycle crop. Because of its short cycles (~28 days per cycle), the NDVIs from Sentinel-2 and Landsat-8 are probably not frequent enough to capture the rapid changes in the NDVI, thus hampering the accuracy.
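The per-field error counts summarized in Table 3 can be derived as in the sketch below, which counts, for each field, how many paired dates had an absolute error above 0.1 and then tallies the fields by that count. The field names and error values are synthetic, and the function name `exceedance_distribution` is hypothetical.

```python
from collections import Counter


def exceedance_distribution(errors_by_field, threshold=0.1):
    """Map 'number of dates with |error| > threshold' to 'number of fields'."""
    per_field = {
        field: sum(abs(e) > threshold for e in errors)
        for field, errors in errors_by_field.items()
    }
    return Counter(per_field.values())


errors_by_field = {
    "field_a": [0.02, -0.05],             # never above the threshold
    "field_b": [0.12, 0.03],              # once
    "field_c": [0.20, -0.15, 0.11, 0.30], # four times
}
distribution = exceedance_distribution(errors_by_field)
```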

SNAF Performance as a Time Series
The results so far focused on comparing the NDVI_SNAF on the same date with the available NDVI_harmonized, which demonstrated the robustness of the SNAF method. However, during this study, the SNAF method produced many more NDVI_SNAF values (~35,000) that have not been included in the evaluation so far simply because there was not a matching NDVI_harmonized date. These other NDVI_SNAF values are important to visualize how the NDVI_SNAF aligns with the NDVI_harmonized to create a seamless NDVI time series for the fields. To that end, eight fields representing cases of many or few NDVI values per crop group are presented in Figure 9.
Figure 9. Eight examples visualizing the SNAF performance to estimate the NDVI across an entire year.
Figure 9 is a result of multiple SNAF runs (a run per SAR date). Each point in Figure 9 was generated based only on the past NDVI and SAR time series and did not consider any future data that are not available in real time.
The NDVI_SNAF shows a good agreement with the NDVI from optical sensors (NDVI_harmonized), leading to a seamless NDVI time series. The SNAF was able to generate accurate SAR-estimated NDVI values even for fields with a relatively low number of optically based NDVI values. For example, the SNAF was able to capture the development phase of the cotton in India (15 July 2021 to 15 September 2021) and the potatoes in Australia (the end of March to May 2021) when the NDVI was not available for a long period (>1 month).
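How such a seamless series could be assembled is sketched below: optical NDVI values are kept wherever they exist, and SAR-based estimates only fill the remaining dates, in line with the stated goal of complementing rather than replacing the optical NDVI. The ISO date keys, the values, and the function name `seamless_ndvi` are illustrative assumptions.

```python
def seamless_ndvi(optical, snaf):
    """Merge two {ISO date: NDVI} mappings, preferring optical values."""
    merged = dict(snaf)
    merged.update(optical)  # optical NDVI overrides any SAR-based estimate
    return dict(sorted(merged.items()))  # ISO dates sort chronologically


optical = {"2021-07-01": 0.30, "2021-09-20": 0.80}
snaf = {"2021-07-01": 0.35, "2021-08-10": 0.60}
series = seamless_ndvi(optical, snaf)
```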

Discussion
The need for a frequent and consistent provision of NDVI values increases with time as more and more companies, institutions, and end-users utilize Earth Observation (EO) datasets for their applications, research, workflows, and decision making. Commercial imaging companies such as Planet Labs (San Francisco, CA, United States) alleviate this need by generating massive and frequent EO image datasets. Nevertheless, some regions are susceptible to clouds, which is the main factor hampering the frequent and consistent provision of NDVI values. Moreover, regions that are not usually cloudy can still have short periods with a high cloud cover at crucial times, for example, when field crops emerge. To ensure frequent NDVI values for these cases, having more frequent optical images will not help, but utilizing SAR, which penetrates through clouds, will.
A method to estimate the NDVI from SAR should work globally (as does the NDVI), regardless of the field's local conditions (e.g., crop and soil type). Achieving this is a great challenge because the SAR and the NDVI are sensitive to different vegetation properties. Consequently, the context (i.e., the local conditions) is an important factor that needs to be addressed when trying to overcome this challenge.
There are three approaches to addressing local conditions in this context. The first approach is to completely ignore the local conditions and assume similarity across all fields. This is essentially the case when using an index based on SAR polarimetric channels or a model with fixed coefficients, which leads to unsatisfying NDVI results when applied to different fields and thus reflects limited applicability. The second is to obtain the local field conditions and possible auxiliary data, as suggested by [32], and incorporate them into the model. However, incorporating the local conditions will increase the model's complexity and reduce its applicability, mainly because it is not possible to obtain the local field conditions for every field at every location, and these data are not always accurate or reliable. The third approach, which was adopted in this study, is to develop field- and time-specific models, thus making the task of incorporating the local conditions (e.g., soil and irrigation types, field topography, etc.) redundant because they hardly ever change with respect to one specific field.
It is true that, for field crops, crop rotation is a relatively common practice, and this was probably the case in some of the fields used here. The exact information regarding which crop was cultivated in 2020 was not available. Consequently, for some fields, the SNAF was trained on the NDVI-SAR data for the crop that was cultivated in 2020 and made the NDVI estimation on a different crop in 2021. Nevertheless, the SNAF accuracy indicates that the crop identity is less relevant.
The SNAF method suggested in this study generates multiple field- and time-specific models by training a new machine learning model on the fly, per field, whenever a new SAR image is available. The outcome is a new estimated NDVI for a specific field for a specific time. Once a model generates this value, the model is disposable, meaning it will not be used again for NDVI estimations. In other words, once a new SAR image is available, the SNAF trains a new model and generates a new NDVI estimation. By doing that, the SNAF keeps the process simple and up to date and holds the local field conditions constant, eliminating the need to incorporate them into the model and enabling its global applicability, as it is not limited to a specific crop or to local conditions.
The SNAF method was tested on 548 commercial fields in 18 countries with 28 different crops through 2021, and the high performance is outlined in this study. The machine learning model used by the SNAF is the random forest, which was also found to be more useful than other models used in previous studies [29,36,46,47].
The overall RMSE of the SNAF method, based on 6880 paired SAR-NDVI images, was 0.06, which is better than the 0.08-0.11 achieved by [36] (for rice, cotton, turmeric, and banana in India) and the 0.07 achieved by [29] (for soybean and maize in Brazil), neither of which used a large dataset for testing. Specifically, Ref. [36] reached an RMSE of 0.09 and an R² of 0.76 for cotton, whereas the SNAF method, for cotton, had an RMSE of 0.05 and an R² of 0.95 (Table 5). Ref. [37] used a self-developed SAR index and reported an R² of 0.73 (different land uses in India) between the model estimation and the NDVI.
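For readers who wish to compare their own results against these figures, the three metrics can be computed as in the sketch below. Two conventions are assumed here because they are not spelled out in this section: bias is taken as the mean of estimated minus observed NDVI, and R² is taken as the coefficient of determination (a squared Pearson correlation would be an alternative definition). The function name `evaluate` is illustrative.

```python
import math
from typing import Dict, Sequence


def evaluate(observed: Sequence[float], estimated: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, bias, and R² for paired observed and estimated NDVI values."""
    if len(observed) != len(estimated) or len(observed) == 0:
        raise ValueError("observed and estimated must be non-empty and of equal length")
    n = len(observed)
    # Assumed sign convention: positive error means the estimate is too high.
    errors = [e - o for o, e in zip(observed, estimated)]
    ss_res = sum(err ** 2 for err in errors)
    rmse = math.sqrt(ss_res / n)
    bias = sum(errors) / n
    mean_obs = sum(observed) / n
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    # Coefficient of determination; undefined when the observations are constant.
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "bias": bias, "r2": r2}
```

Errors that alternate in sign cancel in the bias but not in the RMSE, which is why a bias close to zero can coexist with a non-zero RMSE.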
The potential of using SAR data to estimate the NDVI, as reported in previous studies, was confirmed and validated by the SNAF method. The high accuracy metrics across the assortment of crops and countries show the method's robustness in generating high-quality NDVI values globally.
Notwithstanding, SAR information cannot fully replace optical sensors in retrieving the NDVI; rather, it can only complement existing optical sensors. This is also specifically true for the SNAF method, as it depends on past NDVI time series for on-the-fly model training.
The SNAF method demonstrated here used the NDVI from two sources, namely, Sentinel-2 and Landsat-8, but more NDVI sources can be incorporated, which will probably lead to a higher accuracy. The source for the SAR data was the Sentinel-1 mission, which comprises a constellation of two polar-orbiting satellites (Sentinel-1A and Sentinel-1B) performing C-band imaging with VV and VH polarizations. Incorporating other SAR sources is more complicated than incorporating optical sources, since they would need to have similar bands and similar preprocessing steps.
The SNAF was tested on dates in 2021 when Sentinel-1A and Sentinel-1B were both functioning as normal. However, since 23 December 2021, no data are being generated by Sentinel-1B due to a power unit malfunction. The assumption is that this problem will continue for several months [48]. This means that Sentinel-1 data will be reduced by half for most of 2022. The implication for the SNAF is that less SAR data will be available in 2022 for model training. This decrease in data availability will not increase the SNAF accuracy, but it might not cause a decrease either (or at least not a major one). Perhaps the available data for 2022 will be sufficient to generate good field- and time-specific models. Further investigation into this needs to be performed in the future.
Nevertheless, Sentinel-1 should not be considered as having permanently low availability, as ESA is considering moving up the already-planned Sentinel-1C and Sentinel-1D satellite launch, which will potentially triple the data availability and thus likely increase the SNAF accuracy.
A limitation of the SNAF method (and of any other method based on SAR data) is that the SAR backscatter can differ and create artifacts when sudden (in terms of days) geometric or dielectric changes, as well as changes in the number of scatterers, are introduced. This might happen due to rain events or strong winds that can bend the crops (i.e., change the crop geometry), the rapid growth of weeds, litter due to pruning or trimming, or perhaps even irrigation events. In such cases, the SAR backscatter can be misleading, resulting in inaccurate NDVI estimations.
Future work should try to identify cases of strong wind or rain events via SAR backscatter and remove these values from the analysis. Further, more crop types should be tested with the SNAF method, predominantly rice, as it is a major grain crop that is usually irrigated by flood irrigation and can be challenging for an SAR-based method. In addition, more NDVI and perhaps even more SAR sources ought to be incorporated to account for cases where the current sources are not able to produce quality data.
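One simple way such a screening step could look is sketched below. It is only an illustration of the idea raised above and was not part of this study: the function name, the 3 dB jump threshold, and the 12-day maximum gap are hypothetical values chosen for the example and would need to be calibrated per crop and region.

```python
from typing import List, Sequence


def flag_sudden_changes(days: Sequence[int], backscatter_db: Sequence[float],
                        max_gap_days: int = 12, jump_db: float = 3.0) -> List[bool]:
    """Flag observations that jump abruptly relative to the last accepted observation."""
    if len(days) != len(backscatter_db):
        raise ValueError("days and backscatter_db must have the same length")
    flags = [False] * len(days)
    ref = 0  # index of the last accepted (unflagged) observation
    for i in range(1, len(days)):
        gap = days[i] - days[ref]
        jump = abs(backscatter_db[i] - backscatter_db[ref])
        if gap <= max_gap_days and jump >= jump_db:
            # Large change within a few days: treat as a possible rain or wind artifact.
            flags[i] = True
        else:
            # Gradual change, or the reference is too old to judge: accept the value.
            ref = i
    return flags
```

Comparing against the last accepted value, rather than the immediately preceding one, avoids flagging the return to normal after a single-image spike. A persistent change (for example, after harvest) is accepted again once the reference is older than `max_gap_days`.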

Conclusions
This study introduced the SNAF (Sentinel-1 to NDVI for Agricultural Fields) method. The goal of the SNAF method is to estimate the NDVI from Sentinel-1 data for any field and crop anywhere and anytime. To do that, the SNAF calculates six SAR time series indices and uses them as independent variables in a random forest model, where the dependent variable is the NDVI time series. Because the relationship between SAR and the NDVI is not consistent, the SNAF is deployed every time a new SAR image is available, thus enabling a field- and time-specific NDVI estimation that is flexible and up to date. The SNAF is based solely on the SAR and NDVI time series per field and per point in time, therefore reducing the complexity and eliminating the need to incorporate additional auxiliary and local field data that are not always obtainable or accurate.
The SNAF method was tested on a large dataset, something that is not found in previous studies, comprising 548 commercial fields with 28 crops across 18 countries throughout 2021. The results show the model's high performance, with RMSE, bias, and R² values of 0.06, 0.00, and 0.92, respectively, expressing the model's robustness and global applicability.