Retrieving Crop Albedo Based on Radar Sentinel-1 and Random Forest Approach

: Monitoring agricultural crops is of paramount importance for preserving water resources and increasing water efﬁciency over semi-arid areas. This can be achieved by modelling the water resources all along the growing season through the coupled water–surface energy balance. Surface albedo is a key land surface variable to constrain the surface radiation budget and hence the coupled water–surface energy balance. In order to capture the hydric status changes over the growing season, optical remote sensing becomes impractical due to cloud cover in some periods, especially over irrigated winter crops in semi-arid regions. To ﬁll the gap, this paper aims to generate cloudless surface albedo product from Sentinel-1 data that offers a source of high spatio-temporal resolution images. This can help to better capture the vegetation development along the growth season through the surface radiation budget. Random Forest (RF) algorithm was implemented using Sentinel-1 backscatters as input. The approach was tested over an irrigated semi-arid zone in Morocco, which is known by its heterogeneity in term of soil conditions and crop types. The obtained results are evaluated against Landsat-derived albedo with quasi-concurrent Landsat/Sentinel-1 overpasses (up to one day offset), while a further validation was investigated using in situ ﬁeld scale albedo data. The best model-hyperparameters selection was dependent on two validation approaches (K-fold cross-validation ‘k = 10’, and holdout). The more robust and accurate model parameters are those that represent the best statistical metrics (root mean square error ‘RMSE’, bias and correlation coefﬁcient ‘R’). Coefﬁcient values ranging from 0.70 to 0.79 and a RMSE value between 0.0002 and 0.00048 were obtained comparing Landsat and predicted albedo by RF method. The relative error ratio equals 4.5, which is acceptable to predict surface albedo. and the results were quite accurate. The methodology is based on the correlation of VV and VH polarization backscatter coefﬁcients with the albedo of agricultural crops. The method described in this research demonstrates the utility of radar Sentinel-1 data for continuous agriculture monitoring in areas with frequent cloud cover. The random forest approach was applied over an irrigated perimeter in central Morocco and evaluated against Landsat albedo estimates. The Landsat-derived albedo was also compared to in situ albedo in two experimental ﬁelds sited in the 2015–2016 crop growing season. The comparison results based on statistical measures show that the random forest approach achieves very good albedo modeling performance. For the in situ evaluation, the determination coefﬁcient (R 2 ) between Landsat-derived and in situ albedo achieved 0.81 and 0.92 for drip- and ﬂood-irrigated wheat, respectively. The founded results illustrate that the RF algorithm presented a very good performance for different growing stage periods. For the spatial evaluation, a root mean square error (RMSE) between 0.00021 and 0.00048 was achieved between predicted and Landsat-derived albedo with a R value that lies between 0.7 and 0.79 (R 2 between 0.5 and 0.62). Those results are very encouraging as they open the path for monitoring of agricultural crops continuously every 3–6 days during any weather situation, taking the advantage of the high frequency of Sentinel-1 images. The study was done over an irrigated perimeter to retrieve crop albedo. Future research should be carried out over an extended area. This study opens a path for spa-tialization of surface albedo over various crop types in order to provide albedo products all over the world. Additionally, mountain areas are the most affected places by clouds; extending the application of random forest into those regions would be very interesting and of a beneﬁt to retrieve albedo in mountains. The estimate of surface albedo using radar data allows for improving the estimates of evapotranspiration (by an energy balance model, for example), instead of deriving the albedo from optical reﬂectance using empirical relationships. In this work, a random method to split data was used; nonrandom approaches such as (e.g., Kennard-stone, SELECT, principal component decomposition, etc.) could be used to improve our model by taking into account the maximum amount of spectral variability in the calibration dataset to develop a more robust and stable model.


Introduction
Surface albedo (α), described as the ratio of reflected to incoming (diffuse and direct fractions) solar irradiance at the surface, is one of the main variables affecting environmental, biophysical, and plant physiological processes (respiration, photosynthesis, etc.) as well as Earth's climate [1,2]. The vegetated land surfaces α is one of the crucial surface variables that play an important role included in the main processes that drive the surface energy balance exchanges and biomass growth, counting rain and light interception and evapotranspiration [3,4].
Surface albedo variation is related to many surface parameters and variables, including soil moisture, roughness, soil color, and the fraction of vegetation [5][6][7][8]. To help improve the ability to model the effects of land surface variation and the exchange between soil-vegetation-atmosphere, a better understanding of α and how it changes with variations in these parameters is crucial. Matthias et al. [5] and Potter et al. [9] stated that in a vegetated area, soil albedo affects soil temperature variation, which influences soil biophysical processes, like plant development, seed germination, and root growth. Over bare soil, α is influenced by soil surface optical properties, solar zenith angle, and soil surface hydric status (degree of dryness or wetness) [5,10,11]. However, Cresswell et al. [12] stated that the α variation is generally affected by soil moisture more than surface roughness, which impacts relatively less on soil albedo variation. In practice, α can be measured based on ground-based, aircraft, or satellite sensors depending on the application level [13][14][15][16][17]. However, these numerous measurements have shown that α shows large seasonal and spatial variations [18,19]. On a local scale, α can be measured directly from in situ observations by albedometers, or using two back-to-back identical pyranometers [16,20,21] or through a goniometric system providing many directional surface reflectance observations [22]. Despite the direct measurements of α, a limited scale can be observed and only a few surfaces can be characterized. Other limitations are related to economic and technological constraints as well as direct measurements are time consuming [18]. On a spatial scale, α derived from ground-based observations is not valid for other sites due to the spatial and temporal variations associated with surface properties of other external factors affecting α such as snow cover [23], altitude (relief) [24], soil moisture [21,25,26], and seasonal variation of vegetation characteristics. Therefore, accurate estimates of α over each field are known as becoming progressively significant in many respects. In this regard, remote sensing as a promising technic to retrieve α in different spatial scales proved its applicability and accuracy for a varied range of surface land cover types. Several previous studies have focused on estimating α from satellite observations based on radiative transfer model inversion and have been devoted to deriving regional α from satellite sensed information, like the Moderate Resolution Imaging Spectroradiometer (MODIS) [27][28][29][30][31], Advanced Very High Resolution Radiometer (AVHRR) [32], POLarization and Directionality of the Earth's Reflectances (POLDER) [33], Multi-angle Imaging Spectroradiometer (MISR) [30], Meteosat-EUMETSAT [34], and the Spinning Enhanced Visible and InfraRed Imager (SEVIRI) [35]. Wang et al. [27] have proved that MODIS-based albedo and in situ observations are in accordance and agree well over different land surface types. Moreover, other works have also validated global albedo derived from satellites against ground-based measurements over several land covers [27,[36][37][38][39][40]. However, α products are available at a kilometric and sub-kilometric resolution (500m) every 16 and even 8 days' time scale, which makes their application over agriculture areas and individual small land areas limited [41]. In this context, high spatio-temporal resolution offered by many sensors, such as Sentinel 2, Landsat, and FORMOSAT-2 sensors, could fill the gap [42]. Bsaibes et al. [43] have used FORMOSAT-2 near infrared and red reflectances to retrieve solar albedo based on empirical approaches using continuous in situ observations over some different fields. In the same vein, other research used reflectance bands in green, red, and near infrared [42,44,45].
The above techniques are based on unidirectional surface reflectance, where they consider the surface as a Lambertian, therefore considering α equivalent to surface reflectance [46][47][48][49][50]. Other works were focused on driven α based on surface reflectance derived in diverse angular observations acquired from airborne data [51]. These works are based on the fact that surface reflectance is highly anisotropic, where a lot of numbers of angular measurements are necessary to derive α overall views and illumination angles, incorporating the bidirectional reflectance distribution function [31,[51][52][53][54][55]. However, retrieving surface albedo at a daily scale remains a challenging issue, due to its complexity and difficulty of implementation. In addition, optical data requires many cloudless images of the surface during the investigated time every day [56][57][58][59]. Promising tools like machine learning approaches have demonstrated their ability to retrieve many environmental variables and solve complex non-linear problems. Machine learning approaches were used for monitoring a normalized difference vegetation index using Sentinel-1 data [60], while other works used machine leaning approaches for spatial downscaling of land surface tempera-ture [61,62], mapping soil moisture [63,64], gap-filling of high-resolution soil moisture [65], evapotranspiration [66], and crop yield forecasting [67].
Even though many authors widely used the above-mentioned remote sensing algorithms and methodologies to retrieve surface albedo, cloud cover remains the main limitation and challenge to monitor environmental variables, where the optical remote sensing-based methods are promising in regions with infrequent cloud conditions. This issue affects the temporal resolution and makes monitoring α continuously unsuitable for an operational standpoint. To fill the gap, radar sensors give the opportunity to monitor surface variables in any conditions since they are not affected by clouds [68]. Thus, the aim of this work is to retrieve α independently of cloud cover (cloudless albedo product) for agricultural applications using cloud-free Sentinel-1 data. The idea is to model the α based on the Sentinel-1 backscatter coefficient with random forest regression technique. The presented research paper differs from the past studies, where the used approach here will allow for retrieving α independently from optical data. Nevertheless, retrieving α from radar and optical data such as Landsat could also be coupled, providing a continuous albedo product with a high frequency. In addition, this study uses radar images (Sentinel-1) to derive the crop albedo with high resolution although the presence of clouds.

Study Site
The study was investigated in a semi-arid zone in Morocco over the Tensift region. This area is located in North Africa especially in the center of Morocco ( Figure 1). It occupies about 24,000 km 2 , which corresponds to 4.4% of the Moroccan domain. This zone is known for a different topography (landform) with a wide altitude range. This area received an amount of 250 mm/year of rainfall against 1600 mm/year of reference evapotranspiration (ET0). This pushed farmers to an overexploitation of water resources through irrigation to overcome the crop water needs. The investigated area is known by different altitudes, from 575 m to 927 m ( Figure 1). The area is known for its heterogeneity in terms of land covers like wheat, olive, alfalfa, and horticulture, including an irrigated perimeter called R3 where 50% of this area is occupied by wheat crops. The selected area is considered as a well-known area and it has been extensively utilized as a typical agricultural area since 2002 [69][70][71][72][73][74][75][76][77][78][79][80]. Two experimental wheat sites were made in the R3 perimeter during the 2015/2016 agricultural season: a 4-ha flood-irrigated field (P1) and a 2-ha drip-irrigated field (P2). The two irrigated wheat sites were permanently monitored over the season (Figure 1).

In Situ Albedo Measurements
Both monitored sites were equipped with two eddy covariance (EC) systems providing energy flux measurements. Net radiometer Kipp and Zonen CNR4 sensors were installed in the tower, providing net radiation (Rn) measurements and incoming and outgoing shortwave components. The incoming and the outgoing short-wave components were used to calculate the daily ground-based albedo at the Landsat overpass satellite (~11 a.m.) due to the fact that Landsat data were used as reference data to validate the predicted albedo.

In Situ Albedo Measurements
Both monitored sites were equipped with two eddy covariance (EC) systems providing energy flux measurements. Net radiometer Kipp and Zonen CNR4 sensors were installed in the tower, providing net radiation (Rn) measurements and incoming and outgoing short-wave components. The incoming and the outgoing short-wave components were used to calculate the daily ground-based albedo at the Landsat overpass satellite (~11 a.m.) due to the fact that Landsat data were used as reference data to validate the predicted albedo. L-8 (L-7) images the land surface at eleven (eight) spectral bands in the optical and thermal infrared domains. The spatial resolution ranged from 30 m to 100 m with a revisit time of 16 days. By using both sensors, this allows acquiring data with a frequency of 8 days. Six L-7 and eight L-8 cloud-free level-1 images were available in the 2015-2016 wheat growing season (14 images in total), which were acquired at 11:30 a.m. Note that Landsat data were used as reference data to validate the random forest albedo predictions.

Landsat
In this study, for the optical data, the top-of-atmosphere (TOA) reflectance values were corrected for atmospheric effects, including environmental effects to get the surface reflectance. Then, corrected reflectances were projected into WGS84 UTM Zones 29 North. Reflectance at the red (RED) and near infrared (NIR) bands were used to compute the surface albedo (α) by weighting the RED and NIR reflectances with the coefficients given by Weiss et al. [42] and this was validated in Bsaibes et al. [43]. This equation was used and validated over several sites [81,82]. L-8 (L-7) images the land surface at eleven (eight) spectral bands in the optical and thermal infrared domains. The spatial resolution ranged from 30 m to 100 m with a revisit time of 16 days. By using both sensors, this allows acquiring data with a frequency of 8 days. Six L-7 and eight L-8 cloud-free level-1 images were available in the 2015-2016 wheat growing season (14 images in total), which were acquired at 11:30 a.m. Note that Landsat data were used as reference data to validate the random forest albedo predictions.
In this study, for the optical data, the top-of-atmosphere (TOA) reflectance values were corrected for atmospheric effects, including environmental effects to get the surface reflectance. Then, corrected reflectances were projected into WGS84 UTM Zones 29 North. Reflectance at the red (RED) and near infrared (NIR) bands were used to compute the surface albedo (α) by weighting the RED and NIR reflectances with the coefficients given by Weiss et al. [42] and this was validated in Bsaibes et al. [43]. This equation was used and validated over several sites [81,82].
To assess the reliability of α derived from L7/L8, the corresponding pixels of Landsat α were compared against in situ α for each site. The comparison is presented in Figure 2 for both monitored sites in the R3 irrigated perimeter for all of the year 2016 (January to December).
From Figure 2, a very good agreement is noticed between ground-based and satellitederived α with a determination coefficient (R 2 ) of 0.73 and 0.81 with a root mean square error (RMSE) equal to 0.021 and 0.023 for the flood-irrigated site (P1) using L-7 and L-8, respectively. For the drip-irrigated site, an R 2 reach 0.92 and 0.89 with an RMSE equal to 0.031 and 0.039 using L-7 and L-8, respectively. The shortage of spatial representativeness of in situ data at the L-7 and L-8 pixel scales explains the small errors and slight overestimation. In addition, the in situ measurements of α are performed over a limited and tiny portion of the target. The covered in situ area corresponds to a small area within Landsat pixel. Finally, each crop field might include a combination of wet and dry Landsat pixels. At the field scale, a linear average of all pixel values was calculated. To assess the reliability of α derived from L7/L8, the corresponding pixels of Land α were compared against in situ α for each site. The comparison is presented in Figur for both monitored sites in the R3 irrigated perimeter for all of the year 2016 (January December). From Figure 2, a very good agreement is noticed between ground-based and satell derived α with a determination coefficient (R 2 ) of 0.73 and 0.81 with a root mean squ error (RMSE) equal to 0.021 and 0.023 for the flood-irrigated site (P1) using L-7 and L respectively. For the drip-irrigated site, an R 2 reach 0.92 and 0.89 with an RMSE equa 0.031 and 0.039 using L-7 and L-8, respectively. The shortage of spatial representativen of in situ data at the L-7 and L-8 pixel scales explains the small errors and slight overe mation. In addition, the in situ measurements of α are performed over a limited and t portion of the target. The covered in situ area corresponds to a small area within Land pixel. Finally, each crop field might include a combination of wet and dry Landsat pix At the field scale, a linear average of all pixel values was calculated.

Sentinel-1
ESA Sentinel-1 (S-1) mission is composed of a constellation of two identical satelli the S-1A, which was launched in April 2014, and the S-1B, which was launched in A 2016, in the frame of the Copernicus program. S-1 data were acquired for free from th 1 Data Hub website (https://scihub.copernicus.eu/) (accessed on 7 August 2021). Both s sors orbit at an altitude of ~700 km in the same orbital plane with a phase shift of 180° 1 twins operate at a frequency of 5.33 GHz, which corresponds to wavelength of 5.6 This frequency lies down in C-band synthetic aperture radar. Four operational mo (Wave, Extra Wide Swath, Interferometric Wide Swath, and Strip Map) with different larizations are provided by S-1 sensors. The data were acquired with a high temporal olution that could go up to 3-6 days by using S-1A and S-1B. In the interferometric W Swath mode, S-1 provides data at cross-polarization VH and co-polarization VV m where V and H stand for vertical-horizontal, respectively. The native resolution of data is 10 m with an incidence angle around 40° for the ascending and descending ov passes over the investigated area.
The S-1 signal underwent several corrections to convert raw intensity to a backs tering coefficient using the Sentinel Application Platform (SNAP). The first step is to move thermal noise from the power-detected images by removing additive noise. A that, radiometric calibration was applied to obtain a backscattered signal called backscattering coefficient which expresses in general in decibel (dB). Then applying g metric corrections to correct for geometric distortions and deformations related to

Sentinel-1
ESA Sentinel-1 (S-1) mission is composed of a constellation of two identical satellites: the S-1A, which was launched in April 2014, and the S-1B, which was launched in April 2016, in the frame of the Copernicus program. S-1 data were acquired for free from the S-1 Data Hub website (https://scihub.copernicus.eu/) (accessed on 7 August 2021). Both sensors orbit at an altitude of~700 km in the same orbital plane with a phase shift of 180 • . S-1 twins operate at a frequency of 5.33 GHz, which corresponds to wavelength of 5.6 cm. This frequency lies down in C-band synthetic aperture radar. Four operational modes (Wave, Extra Wide Swath, Interferometric Wide Swath, and Strip Map) with different polarizations are provided by S-1 sensors. The data were acquired with a high temporal resolution that could go up to 3-6 days by using S-1A and S-1B. In the interferometric Wide Swath mode, S-1 provides data at cross-polarization VH and co-polarization VV mode where V and H stand for vertical-horizontal, respectively. The native resolution of S-1 data is 10 m with an incidence angle around 40 • for the ascending and descending overpasses over the investigated area.
The S-1 signal underwent several corrections to convert raw intensity to a backscattering coefficient using the Sentinel Application Platform (SNAP). The first step is to remove thermal noise from the power-detected images by removing additive noise. After that, radiometric calibration was applied to obtain a backscattered signal called the backscattering coefficient which expresses in general in decibel (dB). Then applying geometric corrections to correct for geometric distortions and deformations related to the satellite view mode, using a Digital Elevation Model (DEM-SRTM). Finally, a refined Lee speckle filter was adopted to reduce speckle effects in the images.

Method: Regression Algorithms and Approaches Used in Albedo Modeling
The random forest (RF) model is a machine learning model that has been widely utilized for solving classification and regression problems [83]. RF model is a non-linear regressor that functions by constructing an assembly of decision trees (multiple weak classifiers) at the training step and providing in output mean prediction (regression) of individual trees or classes for regression or classification problems, respectively [83]. The RF algorithm has many advantages in the regression tasks such as simplicity and flexibility where the affection is lower than other models by hyper-parameters. The decorrelated, adaptive, and randomized features make RF appropriate for non-linear relationships and complex algorithms.
Because the RF algorithm is non-parametric, more related input variables can be included. Further, it is not sensitive to outliers and performs well in the training stage when it comes to randomizing samples and variables. In this paper, the central idea of where the affection is lower than other models by hyper-parameters. The decorrelat adaptive, and randomized features make RF appropriate for non-linear relationships a complex algorithms. Because the RF algorithm is non-parametric, more related input variables can be cluded. Further, it is not sensitive to outliers and performs well in the training stage wh it comes to randomizing samples and variables. In this paper, the central idea of the model is to create a nonlinear function to estimate α based on regression trees as follow where Ӻ is a nonlinear function that relates input variables to output by a nonlin relationship.
and are the backscatter VV and VH polarization, respectively. D is the depolarization ratio which is calculated as / . DEM (m) is the digital elevat model. The hillshade and sky view (the percentage of the sky visible from each pixel) computed based on DEM. According to Ayoubi et al. [84], the obtained images were vided randomly into two group of pixels: training (75%) and validation (25%). The models were evaluated at 100 m resolution in this study using one agriculture seas (2015-2016) of available overlapping Landsat and Sentinel-1 data throughout the wh study area. In order to provide one image for each step of wheat growth, three dates w chosen based on the wheat growth stages including the initial, the development-mid-s son, the late season, and one day after harvest (by August). In the training procedu 45,750 points were used and a 10-fold cross-validation method was set up to adjust parameters in order to ensure the generalizability and robustness of the RF models. T pixels samples were divided into ten portions at random, with nine serving as the train dataset and the remaining serving as the testing dataset. In this work, the k value was as 10. The developed approach was applied over an irrigated perimeter R3.
A grid-search optimization method [63,85] was adopted to optimize the most re vant hyper-parameters in RF model. The most important parameters in RF that have be optimized in order to achieve better prediction results are the maximum characteri number that determines node splitting among different decision trees (max features), maximum depth of a decision tree (max depth), and the number of decision trees (m). T optimized max_features, max_depth, and m values were equal to 2, 500, and 100, resp tively. Figure 3 presents the flowchart of the investigated approach.
where algorithm has many advantages in the regression tasks such as simplicity and flexibility where the affection is lower than other models by hyper-parameters. The decorrelated, adaptive, and randomized features make RF appropriate for non-linear relationships and complex algorithms. Because the RF algorithm is non-parametric, more related input variables can be included. Further, it is not sensitive to outliers and performs well in the training stage when it comes to randomizing samples and variables. In this paper, the central idea of the RF model is to create a nonlinear function to estimate α based on regression trees as follows: where Ӻ is a nonlinear function that relates input variables to output by a nonlinear relationship.
and are the backscatter VV and VH polarization, respectively. DEP is the depolarization ratio which is calculated as / . DEM (m) is the digital elevation model. The hillshade and sky view (the percentage of the sky visible from each pixel) are computed based on DEM. According to Ayoubi et al. [84], the obtained images were divided randomly into two group of pixels: training (75%) and validation (25%). The RF models were evaluated at 100 m resolution in this study using one agriculture season (2015-2016) of available overlapping Landsat and Sentinel-1 data throughout the whole study area. In order to provide one image for each step of wheat growth, three dates were chosen based on the wheat growth stages including the initial, the development-mid-season, the late season, and one day after harvest (by August). In the training procedure, 45,750 points were used and a 10-fold cross-validation method was set up to adjust RF parameters in order to ensure the generalizability and robustness of the RF models. The pixels samples were divided into ten portions at random, with nine serving as the training dataset and the remaining serving as the testing dataset. In this work, the k value was set as 10. The developed approach was applied over an irrigated perimeter R3.
A grid-search optimization method [63,85] was adopted to optimize the most relevant hyper-parameters in RF model. The most important parameters in RF that have been optimized in order to achieve better prediction results are the maximum characteristic number that determines node splitting among different decision trees (max features), the maximum depth of a decision tree (max depth), and the number of decision trees (m). The optimized max_features, max_depth, and m values were equal to 2, 500, and 100, respectively. Figure 3 presents the flowchart of the investigated approach.
is a nonlinear function that relates input variables to output α by a nonlinear relationship. σ VV and σ V H are the backscatter VV and VH polarization, respectively. DEP is the depolarization ratio which is calculated as σ VV /σ V H . DEM (m) is the digital elevation model. The hillshade and sky view (the percentage of the sky visible from each pixel) are computed based on DEM. According to Ayoubi et al. [84], the obtained images were divided randomly into two group of pixels: training (75%) and validation (25%). The RF models were evaluated at 100 m resolution in this study using one agriculture season (2015-2016) of available overlapping Landsat and Sentinel-1 data throughout the whole study area. In order to provide one image for each step of wheat growth, three dates were chosen based on the wheat growth stages including the initial, the development-midseason, the late season, and one day after harvest (by August). In the training procedure, 45,750 points were used and a 10-fold cross-validation method was set up to adjust RF parameters in order to ensure the generalizability and robustness of the RF models. The pixels samples were divided into ten portions at random, with nine serving as the training dataset and the remaining serving as the testing dataset. In this work, the k value was set as 10. The developed approach was applied over an irrigated perimeter R3.
A grid-search optimization method [63,85] was adopted to optimize the most relevant hyper-parameters in RF model. The most important parameters in RF that have been optimized in order to achieve better prediction results are the maximum characteristic number that determines node splitting among different decision trees (max features), the maximum depth of a decision tree (max depth), and the number of decision trees (m). The optimized max_features, max_depth, and m values were equal to 2, 500, and 100, respectively. Figure 3 presents the flowchart of the investigated approach.  The approach to estimate α from radar data was compared against the α estimated from Landsat data using the correlation coefficient (R), bias (MBE), and root mean square error (RMSE). The calculation formulas for R, bias, and RMSE are as follows: The approach to estimate α from radar data was compared against the α estimated from Landsat data using the correlation coefficient (R), bias (MBE), and root mean square error (RMSE). The calculation formulas for R, bias, and RMSE are as follows: where E i is the estimated value by random forest algorithm; I i is the reference value of albedo (obtained from Landsat); I and E are the means of observed and reference albedo; n denotes the image's pixel number.

Results and Discussions
The albedo prediction results are not only influenced by the model parameters but also by the different combinations of input data, so it is crucial to further analyze the effect of each variable on the α prediction results. Each possible combination is composed of σ VV , σ V H , DEM, sky view, and hillshade to retrieve α. It is required to limit the number of input features to both decrease the computing cost of modeling and, in some situations, increase the model's performance. For this reason, a feature selection was performed to limit the number of input variables when developing the albedo predictive model. This statistical-based feature selection method uses statistics parameters to evaluate the relationship between surface albedo and input variables. This allows selecting the input variables that show a strong relationship with the output variable. Figure 4 presents a correlation matrix with the relationship between albedo and other input variables utilizing Pearson correlation using feature selection method. Figure 4 also shows the correlations of the independent variables with albedo, with the goal of displaying the quantitative links between these variables.
From the feature selection method, it is clear that σ VV , σ V H , and DEP are the most relevant variables affecting the α with a correlation coefficient of −0.66, −0.59, and 0.49, respectively. The fact that the correlation is higher between albedo and backscatter coefficients is due to the fact that the variability of α is due to the changeability of the land surface type (bare soil, green vegetation, and senescent vegetation); surface color which depends on soil moisture (surface soil color and crop); and soil proprieties. The only means to assess soil moisture variation in bare soil is to observe and interpret changes in soil color, with dry soil appearing brighter than wet soil. In the case of vegetated areas, changes in vegetation water content or color are reflected in albedo observations since the C-band radar signal is sensible to vegetation water content. The reason for the C-sensitivity band's response to variations in albedo can be explained in this way.
In order to visualize the correlation between albedo and other independent variables, Figure 5 shows a visualization of the images selected using feature selection to train the model. This set consists of the backscatter coefficients in the different polarizations of Sentinel-1 (σ VV , σ V H ) and the depolarization ratio σ VV / σ V H . By analyzing Figure 5, it seems visually that the relationship between the backscatter coefficients and albedo is remarkable. The albedo variability can be observed solely by looking at the dynamics of the backscatter coefficients' values, σ VV , σ V H , and σ VV / σ V H , and this is confirmed when looking to the correlation coefficient values reported in Figure 4. Pearson correlation using feature selection method. Figure 4 also shows the correlations of the independent variables with albedo, with the goal of displaying the quantitative links between these variables. From the feature selection method, it is clear that , , and DEP are the most relevant variables affecting the α with a correlation coefficient of −0.66, −0.59, and 0.49, respectively. The fact that the correlation is higher between albedo and backscatter coefficients is due to the fact that the variability of α is due to the changeability of the land surface type (bare soil, green vegetation, and senescent vegetation); surface color which depends on soil moisture (surface soil color and crop); and soil proprieties. The only means to assess soil moisture variation in bare soil is to observe and interpret changes in soil color, with dry soil appearing brighter than wet soil. In the case of vegetated areas, changes in vegetation water content or color are reflected in albedo observations since the C-band radar signal is sensible to vegetation water content. The reason for the C-sensitivity band's response to variations in albedo can be explained in this way.
In order to visualize the correlation between albedo and other independent variables, Figure 5 shows a visualization of the images selected using feature selection to train the model. This set consists of the backscatter coefficients in the different polarizations of Sentinel-1 ( , ) and the depolarization ratio / . By analyzing Figure 5, it seems visually that the relationship between the backscatter coefficients and albedo is remarkable. The albedo variability can be observed solely by looking at the dynamics of the backscatter coefficients' values, , , and / , and this is confirmed when looking to the correlation coefficient values reported in Figure 4.  Figure 7 presents albedo estimates captured all along the investigated period from RF-based Sentinel-1 retrievals. The radar-based albedo images are obtained across the season for four clear-sky selected dates over the irrigated perimeter R3. The k-fold crossvalidation results using the training set with a set number (k) equal to 10 were used to predict the surface albedo while the cross-validation is necessary to verify the modeling's resilience, which is evident in all of the RF approach's results due to their low variability, as seen in Figure 7.
From Figure 7, the spatial heterogeneity in soil properties (type, dry, or humid) and land cover over the investigated area generates strong variations in surface albedo at high resolution. Here, we test if the RF model can capture surface albedo variability over the complex area. The highest value of albedo (0.4) was obtained on the 30th of June, which corresponds to senescent vegetation albedo [86]. To better interpret the results and define the variable driven by the albedo variability over the irrigated perimeter R3 and along the season, Figure 8 presents the RF input variable importance to predict surface albedo for each date. From Figure 8, at the initial stage of wheat crop growth (1 June 2016), the spatial variation of surface albedo is driven by VV polarization. This is due to the small portion of the surface covered by the crops, where VV polarization is sensitive to soil moisture at the surface changes as reported by Amazirh et al. [78]. In the development stage (7 February 2016), some fields have achieved effective complete cover, while others have a low fraction of green vegetation, depending on the planting date and vegetation development. This variation in VH polarization has a direct impact on the variation in surface albedo, which is highly influenced by the geometrical arrangement and shape of the vegetation [78,80,[87][88][89][90]. In the vegetated area (especially at the center of the image), when the VH value is high, the surface albedo becomes smaller and vice versa. Note that the harvest date for winter wheat crop in the studied area is at the end of June. On 17 August 2016, the fields are in bare soil conditions, which are characterized by a low vegetation. Moreover, the surface albedo is driven by soil moisture variability, where VV polarization is sensible to soil moisture variability. When soil moisture is important on the surface, surface albedo decreases, which is similar with the findings of Cresswell et al. [12] and Idso et al. [10], who found a drop in surface albedo as surface soil moisture increased. Light scattering in the direction of the incident radiation and absorption of more incident photons are caused by a reduction in the relative refractive index of wet soil, resulting in a low albedo value [7,91]. This is due both to water content at the surface and surface roughness. Albedo is largely dependent on surface roughness over bare soil, especially in dry conditions, where surface roughness causes a drop in surface albedo, as illustrated in [92]. Using the 100 m Landsat albedo images as a reference, the retrieval results are visually and quantitatively analyzed in Figures 6 and 7. Results statistics are also reported and listed in Table 1, showing that the RF algorithm performs well for all of the dates and delivers accurate and stable results over the R3 agriculture site. The R value ranges from 0.70 to 0.79 for the validation step with a RMSE between 0.00021 and 0.00048. The highest R of 0.79 was obtained on 7 February 2016. . 2021, 13, x FOR PEER REVIEW gure 6. Comparison between predicted and Landsat estimates' surface albedo for training and validation datas  Figure 7 presents albedo estimates captured all along the investigated pe RF-based Sentinel-1 retrievals. The radar-based albedo images are obtained acro son for four clear-sky selected dates over the irrigated perimeter R3. The k-fold idation results using the training set with a set number (k) equal to 10 were used the surface albedo while the cross-validation is necessary to verify the modelin ence, which is evident in all of the RF approach's results due to their low vari seen in Figure 7. Using the 100 m Landsat albedo images as a reference, the retrieval results are visually and quantitatively analyzed in Figures 6 and 7. Results statistics are also reported and listed in Table 1, showing that the RF algorithm performs well for all of the dates and delivers accurate and stable results over the R3 agriculture site. The R value ranges from 0.70 to 0.79 for the validation step with a RMSE between 0.00021 and 0.00048. The highest R of 0.79 was obtained on 7 February 2016.
A further evaluation of the predicted α by RF technique was performed by comparing in situ and predicted albedo at the field level. In situ data were selected at Landsat overpass satellite over P1 and P2 fields (Figure 9). Since our objective is to asses wheat albedo, four dates were used for each field which correspond to the wheat phenological growth stages. From the plots, it can be observed that the RF technique allows the prediction of surface albedo correctly with very good accuracy and correlation. The RF method permits the retrieval of the temporal dynamics of α all over the season over the two fields with an R 2 of 0.97 and 0.98 over P2 and P1 sites, respectively. The reached RMSE for both sites is equal to 0.04 with a linear regression slope of 0.83 and 1.12 for P2 and P1 sites, respectively. By combing both sets of data from the two fields, a very good correlation was also obtained with a RMSE equal to 0.04 with an R 2 reaches 0.96. Remote Sens. 2021, 13, x FOR PEER REVIEW 11 of 17 Figure 7. Maps of the surface albedo by random forest algorithm compared to the Landsat reference albedo imagery on the four cloudless dates over R3 area. NDVI maps were also presented.
From Figure 7, the spatial heterogeneity in soil properties (type, dry, or humid) and land cover over the investigated area generates strong variations in surface albedo at high resolution. Here, we test if the RF model can capture surface albedo variability over the complex area. The highest value of albedo (0.4) was obtained on the 30th of June, which corresponds to senescent vegetation albedo [86]. To better interpret the results and define the variable driven by the albedo variability over the irrigated perimeter R3 and along the season, Figure 8 presents the RF input variable importance to predict surface albedo for each date. From Figure 8, at the initial stage of wheat crop growth (01/06/2016), the spatial variation of surface albedo is driven by VV polarization. This is due to the small portion of the surface covered by the crops, where VV polarization is sensitive to soil moisture at the surface changes as reported by Amazirh et al. [78]. In the development stage (7 February 2016), some fields have achieved effective complete cover, while others have a low fraction of green vegetation, depending on the planting date and vegetation development. This variation in VH polarization has a direct impact on the variation in surface albedo, which is highly influenced by the geometrical arrangement and shape of the vegetation [78,80,[87][88][89][90]. In the vegetated area (especially at the center of the image), when the VH value is high, the surface albedo becomes smaller and vice versa. Note that the harvest date for winter wheat crop in the studied area is at the end of June. On 17 August 2016, Figure 7. Maps of the surface albedo by random forest algorithm compared to the Landsat reference albedo imagery on the four cloudless dates over R3 area. NDVI maps were also presented.
ote Sens. 2021, 13, x FOR PEER REVIEW 12 o the fields are in bare soil conditions, which are characterized by a low vegetation. Mor ver, the surface albedo is driven by soil moisture variability, where VV polarizatio sensible to soil moisture variability. When soil moisture is important on the surface, s face albedo decreases, which is similar with the findings of Cresswell et al. [12] and I et al. [10], who found a drop in surface albedo as surface soil moisture increased. Li scattering in the direction of the incident radiation and absorption of more incident p tons are caused by a reduction in the relative refractive index of wet soil, resulting low albedo value [7,91]. This is due both to water content at the surface and surface rou ness. Albedo is largely dependent on surface roughness over bare soil, especially in conditions, where surface roughness causes a drop in surface albedo, as illustrated in [ Figure 8. Feature importance for albedo estimation using the random forest algorithm for each date.
A further evaluation of the predicted α by RF technique was performed by comp ing in situ and predicted albedo at the field level. In situ data were selected at Land overpass satellite over P1 and P2 fields (Figure 9). Since our objective is to asses wh albedo, four dates were used for each field which correspond to the wheat phenolog growth stages. From the plots, it can be observed that the RF technique allows the pre tion of surface albedo correctly with very good accuracy and correlation. The RF meth permits the retrieval of the temporal dynamics of α all over the season over the two fie   Figure 9. Comparison between predicted and in situ estimates of surface albedo for drip-(P flood-(P1) irrigated sites.

Conclusions
The investigated work confirmed that it is possible to accurately retrieve surfa bedo from radar C-band data, along with the regression approach. Sentinel-1 rada were used to simulate the surface albedo of agricultural crops during their growth and the results were quite accurate. The methodology is based on the correlation and VH polarization backscatter coefficients with the albedo of agricultural crops method described in this research demonstrates the utility of radar Sentinel-1 da continuous agriculture monitoring in areas with frequent cloud cover. The random approach was applied over an irrigated perimeter in central Morocco and eval against Landsat albedo estimates. The Landsat-derived albedo was also compared situ albedo in two experimental fields sited in the 2015-2016 crop growing season.
The comparison results based on statistical measures show that the random approach achieves very good albedo modeling performance. For the in situ evalu the determination coefficient (R 2 ) between Landsat-derived and in situ albedo ach 0.81 and 0.92 for drip-and flood-irrigated wheat, respectively. The founded results trate that the RF algorithm presented a very good performance for different growing periods. For the spatial evaluation, a root mean square error (RMSE) between 0.0002 The relative error ratio (RER, ratio between range and RMSE) was also calculated to show how the RF predictions are good independently of the albedo range and to test the applicability and practical utility of the RF model. In Figure 9, the RMSE equals 0.004 within a range of 0.18 (maximum value minus minimum value); therefore, RER is equal to 4.5 which shows an acceptable performance, especially in remote sensing applications.

Conclusions
The investigated work confirmed that it is possible to accurately retrieve surface albedo from radar C-band data, along with the regression approach. Sentinel-1 radar data were used to simulate the surface albedo of agricultural crops during their growth stage, and the results were quite accurate. The methodology is based on the correlation of VV and VH polarization backscatter coefficients with the albedo of agricultural crops. The method described in this research demonstrates the utility of radar Sentinel-1 data for continuous agriculture monitoring in areas with frequent cloud cover. The random forest approach was applied over an irrigated perimeter in central Morocco and evaluated against Landsat albedo estimates. The Landsat-derived albedo was also compared to in situ albedo in two experimental fields sited in the 2015-2016 crop growing season.
The comparison results based on statistical measures show that the random forest approach achieves very good albedo modeling performance. For the in situ evaluation, the determination coefficient (R 2 ) between Landsat-derived and in situ albedo achieved 0.81 and 0.92 for drip-and flood-irrigated wheat, respectively. The founded results illustrate that the RF algorithm presented a very good performance for different growing stage periods. For the spatial evaluation, a root mean square error (RMSE) between 0.00021 and 0.00048 was achieved between predicted and Landsat-derived albedo with a R value that lies between 0.7 and 0.79 (R 2 between 0.5 and 0.62). Those results are very encouraging as they open the path for monitoring of agricultural crops continuously every 3-6 days during any weather situation, taking the advantage of the high frequency of Sentinel-1 images.
The study was done over an irrigated perimeter to retrieve crop albedo. Future research should be carried out over an extended area. This study opens a path for spatialization of surface albedo over various crop types in order to provide albedo products all over the world. Additionally, mountain areas are the most affected places by clouds; extending the application of random forest into those regions would be very interesting and of a benefit to retrieve albedo in mountains. The estimate of surface albedo using radar data allows for improving the estimates of evapotranspiration (by an energy balance model, for example), instead of deriving the albedo from optical reflectance using empirical relationships. In this work, a random method to split data was used; nonrandom approaches such as (e.g., Kennard-stone, SELECT, principal component decomposition, etc.) could be used to improve our model by taking into account the maximum amount of spectral variability in the calibration dataset to develop a more robust and stable model.