Monitoring Agricultural Fields Using Sentinel-1 and Temperature Data in Peru: Case Study of Asparagus (Asparagus officinalis L.)

This paper presents the analysis and a methodology for monitoring asparagus crops from remote sensing observations in a tropical region, where the local climatological conditions allow farmers to grow two production cycles per year. We used the freely available dual-polarisation GRD data provided by the Sentinel-1 satellite, temperature from a ground station and ground truth from January to August of 2019 to perform the analysis. We showed how particularly the VH polarisation can be used for monitoring the canopy formation, density and the growth rate, revealing connections with temperature. We also present a multi-output machine learning regression algorithm trained on a rich spatio-temporal dataset in which each output estimates the number of asparagus stems that are present in each of the pre-defined crop phenological stages. We tested several scenarios that evaluated the importance of each input data source and feature, with results that showed that the methodology was able to retrieve the number of asparagus stems in each crop stage when using information about starting date and temperature as predictors with coefficients of determination (R2) between 0.84 and 0.86 and root mean squared error (RMSE) between 2.9 and 2.7. For the multitemporal SAR scenario, results showed a maximum R2 of 0.87 when using up to 5 images as input and an RMSE that maintains approximately the same values as the number of images increased. This suggests that for the conditions evaluated in this paper, the use of multitemporal SAR data only improved mildly the retrieval when the season start date and accumulated temperature are used to complement the backscatter.


Introduction
Due to the recent and future growth of freely available satellite remote sensing data, there is an opportunity to implement near real time agricultural monitoring systems to increase yield and crop management efficiency. This is based on informed decision making with information derived fully or partially from satellite sensors.
Such a system is particularly important in tropical regions which highly contribute to the global food production but for many crops with considerably lower yields per hectare compared with temperate regions [1]. It is also essential given the necessity in the tropics to preserve natural ecosystems by increasing yield in existing crop areas rather than transforming tropical forests to low yield croplands [1]. A distinctive operational characteristic in tropical and subtropical regions for several crop types is the uninterrupted production cycles, with cultivation of more than one cycle per year. Each of these production cycles or campaigns may be under slightly different meteorological conditions due to a "soft seasonality", e.g., mild winters, thus modifying to a certain extent the crop growth rate and structure (as will be shown later in this paper). This leads to the different responses captured by the satellites at each campaign. This paper considers the above-mentioned conditions for agricultural monitoring, particularly analysing the case of asparagus crops in Peru. Asparagus officinalis L. is a key crop for the country's agricultural exports, being the largest exporter in the world, the second largest producer after China [2] and an important source of job [3].
In this context, crop phenology has been used as a tool to measure crop status at any given time during the cultivation period and to measure the development rate relative to previous campaigns or relative to neighbour plots [4]. Accordingly, monitoring phenological evolution and accurately knowing crop status and development rate, the farmers can strategically plan the management.
Given the importance of monitoring phenology during the campaign without interruptions, synthetic aperture radar (SAR) emerges as a potential technology for this task. The capabilities to acquire images at day and night and under nearly all-weather conditions of SAR satellites offer significant opportunities for systematic monitoring regardless of cloud coverage [5]. On the other hand, to consider the impact of different climatological conditions on each campaign of a year, temperature records can be analysed to support the crop development monitoring.

Related Work
An initial step for crop monitoring from SAR data is to understand the time series evolution of the features derived from sensor observations. To this end, ground truth surveys are used as a reference and for validation, correlating measurements in the field with the SAR response.
In the case of Quad-polarimetric data, multi-temporal polarimetric SAR (PolSAR) analysis is used to characterise a crop signature in terms of evolution of scattering mechanisms along the season identifying key moments [6][7][8].
Recently, more attention is being given to dual polarimetric systems given the free access to these data (Sentinel-1). Research to understand the interaction of Sentinel-1 signal response to crop evolution has been presented for several crop types. The authors found sensitivity of the VH, VV backscatter and the ratio between the two polarisations with different crop biophysical parameters [9][10][11][12].
For the case of crop phenology retrieval, once the temporal evolution of the SAR indices has been analysed, an initial approach is to use the polarimetric features as inputs to a statistical or a machine learning model. These models are trained with SAR and ground truth from past campaigns [13,14].
Other authors have proposed the use of distance measures to compare the covariance matrix of a given SAR resolution cell inside a parcel with a set of previously characterised covariance matrices that are associated with a phenological stage [15]. The aim is to find the most similar predefined covariance matrix and assign the pixel under analysis, the crop stage with the most similar covariance matrix.
However, these approaches consider the phenology retrieval as a classification application, aiming at classifying the current parcel state as one of the previously defined pool of possible states (e.g., emergence, vegetative stage, maturation, etc.). This generates inconveniences selecting the appropriate boundaries for each stage in a process that may be subjective, often selecting ones (biasing) that the algorithm can actually identify. On the other hand, if standard phenological scales are used, such as the BBCH scale [16], the algorithm may not be able to disentangle every stage since the SAR response may not be sensitive to all these agronomic processes. Likewise, previous approaches ignore the fact that it is possible to have simultaneously more than a single stage in a parcel due to different plant growth rates and the fact that in the real evolution adjacent stages overlap (e.g., a parcel with some plants in flowering and some in fructification simultaneously). These approaches predict current phenology based on a single SAR image without considering the multi-temporal information. This leads for instance to stages at the beginning of the season being miss-classified by the model with final crop stages as they may have similar SAR response, as occurred in [13,17].
To overcome this, a hidden Markov model technique was proposed in [18,19] so that a prediction of the current stage is dependant on the previous stage, following a Markov property. However, the problem of subjectively selecting the boundaries for the crop stages still remains. Other authors have proposed a different approach whereby they consider the crop evolution as a time dependant dynamical process that follows a trajectory governed by the crop underlying dynamics [20][21][22]. The aim of these methodologies is to retrieve the crop state in that trajectory when a new SAR image is analysed. However, these last methods have only been proven using polarimetric SAR data, which provides a much richer amount of information to characterise a target compared to the freely available data from Sentinel-1. Studies associated with the potential of space borne radar remote sensing concerning asparagus fields have been presented in [23][24][25][26][27], although all of them focus on the crop type classification problem rather than in the analysis of individual crop stages as we present in this paper. In [23][24][25], C-band ENVISAR ASAR satellite data in VH and VV polarisations are used to identify land use of two agricultural regions, with asparagus being one of several crop types to identify. However, neither description of the crop stages nor the backscatter response over time is presented. In [26], time series of HH and VV polarisations of X-band data from TerraSAR-X satellite are reported. This study aims to evaluate the potential for classification of agricultural areas by analysing the crop signature of several crop types. Among them, 12 parcels of asparagus are studied and both the HH and the VV polarisations using X-band data are presented. The backscatter seems to increase during the period of vegetation growth, with a widespread distribution among the 12 parcels. This increase happens during the summer similar to several other crop types which was identified as an inconvenient to classify asparagus. This same effect was reported in [23,24], where asparagus response is particularly similar to sugar beet. In [27], among other 13 crop types, the multitemporal response of asparagus was evaluated to use the crop signature for agricultural fields classification purposes using Sentinel-1 data. The authors report an increase of the VH backscatter during the periods of vegetation growth. Interestingly, the authors also report a more constant backscatter response during the whole cultivation period using the VV polarisation. This is aligned to what is presented in Section 2.6.1 of this paper and in contrast to [26] although it has to be highlighted that in [26], X-band data was used. The authors, also report low accuracies for classifying asparagus due to similarities to summer crop types as reported in [23,24].
Please note that none of the reported work in asparagus focuses on monitoring growth development or stages classification. Also note that since the climatological conditions in Peru are different to [23][24][25][26][27], direct comparisons are not straight forward. For instance, in [27], authors report that the agricultural season length is more than a year, which is not the case for our test site. Similarly, the crop signature is inherently different since the senescence periods for crops located in temperate regions are not present in our test site.

Objectives of the Study
The main objectives of this paper are the following:

1.
To analyse the SAR response to the asparagus crop evolution.

2.
To present examples of how the seasonal climatological conditions influence the crop development in the test site (tropical conditions).

3.
To present the implementation of a data-driven methodology that captures the recurrent patterns in the SAR response and the temperature to provide an approximation of the crop development at every new SAR acquisition. It consists of a Multi-output machine learning regression algorithm in which each output estimates the number of asparagus stems that are present in each of the predefined phenological stages at a given date.

Asparagus Crop Development and Production Cycles
In this section, we briefly introduce the asparagus crop and the main phases of development. For a more detailed description of the asparagus growth and physiology the reader is directed to [28,29]. Asparagus is a vegetable perennial crop which once is in a productive phase, re-emerges after harvest without the need to re-plant it. The cultivation process begins by transplanting to the fields the seedlings grown in a nursery. The roots system below ground, also known as the crown, and the fern above ground begin to grow and after approximately 2 years of development and establishment, the ferns are cut, the asparagus spears emerge and the crop is lightly harvested for the first time [30,31].
After the first harvest, at the emergence crop stage, the next stems that emerge from the buds of the crown develop into a fern as shown in Figure 1. The asparagus stems grow vertically and will start producing the horizontal branches in a crop stage known as ramification. From this point, the cladophylls (leaf-like structures in the branches) will develop during the aperture stage. The aggregation of several consecutive individual asparagus stems that emerged from the root system, with their respective side branches and cladophylls compose the fern.
Subsequently, the fern thickens and covers the sandy soil intercepting light and beginning the production of carbohydrates which are sent down via translocation to replenish the roots system and be stored [29]. The fern development is followed by the short appearance of small yellow flowers and a maturation period, which corresponds to the longest crop stage. In total, each production cycle takes between four to five months before the crop is ready to be harvested.
During harvest, the spears that emerge from the buds are harvested after reaching approximately 20 cm until a minimum level of carbohydrates depletion in the root system. At this point, the new asparagus stems are left to develop and grow again to begin a new production cycle, in a life cycle that can last up to 15 years [30,31].
The crop stages in Figure 1 are the same stages that are recorded during the ground surveys and that are estimated in Section 2.7.

Test Site
The asparagus fields ( Figure 2) are located in the north of Peru in a dry coastal zone with sandy soil, divided in plots of an average of two hectares. We considered approximately 442 parcels in total, where timing and management practices such as starting and harvest dates as well as application of nutrients and pesticides among other activities, are performed simultaneously in groups of around four to six neighbouring parcels.

Climatological Conditions
The temperature and solar radiation present the maximum variability in the local test site, following the seasons of the southern hemisphere with a maximum peak of temperature around mid-February (summer) reaching up to 26 degrees Celsius and lowest values in August (winter) with temperatures of around 15 degrees ( Figure 3).
Since the winter never reaches extremely low temperatures, the asparagus crop does not reach a dormant stage which permits growers to have two productive cycles per year. However, since the conditions along the year are not exactly the same, there may be a difference in the crop evolution of the same plot in two consecutive productive cycles, in response to these changing conditions. This is an important factor that will be analysed in Section 2.6.2.
The rainfall levels are extremely low given the desert conditions where the parcels are located with an average of less than 1 mm per month.

Ground Truth
A survey campaign to collect asparagus phenological information was carried over a period of 8 months, from January to August of 2019, for 442 asparagus parcels. For each parcel evaluated, field surveyors randomly selected two transects to assess, with each transect being a metre in length. Within each transect the surveyors identified the phenological stage of each stem and counted the total number of stems in each stage present. Results for the two transects were averaged and recorded. In this way, the proportion of stems per parcel in any of the phenological stages identified in Figure 1 can be recorded, and a proxy for the evolution of crop stage over time can be established. The average temporal evolution of asparagus stems in each crop stage is presented in Figure 4 for all the production cycles covered in the ground truth. Please note that after the asparagus stems have emerged from the ground, the total number of stems remains approximately the same throughout the season while the number of stems present at each stage changes as the crop develops.  An average of 36 surveys were performed per parcel during the eight months of the ground data collection, i.e, approximately one survey per week per parcel. However, given the complexity of the operation not all surveys were carried on systematically, e.g., exactly every week, but rather with irregular sampling. Also, note that given the climatological conditions as explained in the previous section, it is possible to grow up to two production cycles per year independently of the starting month and consequently the resulting surveys contain information to characterise multiple times an entire asparagus campaign or season. This enable us to create a rich spatio-temporal dataset to characterise the crop development.

Sar Datasets
The Sentinel-1 dataset used for the analysis was built using the Level-1 Ground Range Detected (GRD) and the Interferometric Wide swath (IW) acquisition mode, with VV and VH polarisation channels. The data was collected using the Google Earth Engine (GEE) platform [32] in which the data pre-processing steps include applying orbit file, GRD border noise removal, thermal noise removal, Radiometric calibration and terrain correction. After obtaining the data from the GEE, a 3 × 3 boxcar averaging window was used as speckle filter. Table 1 shows the three acquisition geometries available for the test site with the corresponding average incidence angles and acquisition times, including the orbit 142 in descending pass direction and the orbits 18 and 91 in ascending pass. Using the three available orbits the revisit frequency corresponds to between 3 and 5 days while it is 12 days using a single incidence angle. In order to analyse the long-term behaviour of the backscatter signal (e.g., seasonality), a time series of nearly two years was built from Sentinel-1 data over a typical parcel of asparagus where the ground truth is known and which includes four consecutive agricultural seasons. For the methodology to monitor asparagus development presented in Section 2.7 data from January to August of 2019 was used for the 442 plots when ground truth is available. Figure 5 shows the average temporal evolution of the backscatter for all the production cycles covered in the ground truth. The crop characteristics that cause the SAR observations in Figure 5 will be further explained in Section 2.6.1. Please note that the VH polarisation presents more significant changes through time and with less statistical variance than the VV polarisation as described in Section 2.6.1. Also note that after the day 125 there is a significant increase in the variance of the response. This is due to different cultivation period lengths as will be explained in Section 2.6.2. Figure 7 shows the VH backscatter profile with the corresponding ground truth of a typical asparagus parcel over time, covering 4 consecutive production cycles.

Methodology for Estimating Asparagus Stems Per Stage
The methodology developed in the present analysis consists of three main parts: (1) Understanding of Sentinel-1 signal interaction and sensitivity with the asparagus temporal evolution, (2) Analysis of the impact that the local climatological conditions, particularly the temperature, have over the canopy development and (3) the multi-output machine learning regression model training and use to estimate the number of asparagus stems in the phenological stages of Figure 1.

Sar Sensitivity to Crop Evolution
Due to the mild winters with minimum temperatures of around 16 degrees Celsius, the asparagus crop does not reach a dormant stage naturally and there is not a harvest time forced by the climatological conditions. This allows growers to plan the starting and ending dates of the season (occasionally for individual parcels) so that is possible to have more than one production cycles per year (normally two) and if required, adjust to the market needs and contractual planning. Given this, at a single SAR acquisition there are plots at almost all possible crop stages.
The photographs presented in Figure 6 were taken the same date as a SAR acquisition (28/05/2019). Based on this, it is possible to locate in the time series of each parcel where the picture was taken, what the SAR response to the crop is and compare it with the crop status recorded in the footage. Please note that all the images were taken the same date to parcels at different growth stages, a possibility enabled by the local climatological conditions. Since the parcels are at different crop stages, the backscatter is also different as shown in the VH polarisation time series of Figure 6, in which the lowest backscatter of the season is present during the harvest periods given that the fern has been mechanically removed and the new emerging spears are being harvested ( Figure 6, images 1 and 2). At this point, the SAR signal interacts only with the soil presumably with a predominant surface scattering mechanism.
The SAR response to these conditions has been shown to depend on the dielectric and geometric properties of the soil, e.g., surface moisture and roughness [33,34]. In the test site, the moisture is mainly dependant on the irrigation since the rainfall levels are extremely low given the desert conditions (average of less than 1 mm per month). On the other hand, the roughness is determined by the ridges and furrows created by the rows where the plants were sowed. In this case, the height of these rows, the plot age (the younger the crop, the more sand present), together with the row orientation and the incidence angle define the soil-SAR signal interaction. This effect is particularly evident with the VV polarisation (although not shown here). Both polarisations, but particularly the VH polarisation increase significantly as the asparagus stems start emerging and vertically elongating up to two meters height. This increase may be a consequence of the double bounce created with the SAR signal reaching the soil and bouncing off the vertical spears back to the satellite ( Figure 6, image 3), although a more detailed polarimetric analysis would be required to confirm it. Please note that as show in Figure 4, the VV backscatter has less increment in time than the VH backscatter possibly due to the VH being more sensitive to the quasi-horizontal branches that grow sideways from the main vertical asparagus stems. A similar result was reported in [27] where authors present an almost constant VV response during the periods of vegetation growth.
However, when the crop reaches approximately the peak of the aperture stage (see Figure 7), the backscatter also reaches the peak in the entire cultivation period. At this point, the fern already has developed branches and the leaf-like structures in the stems are developing. From this moment, the contact of the SAR signal with the soil decreases thus also reducing the backscatter measured.
Subsequently, at the flowering stage (image 5 of Figure 6) the fern is fully developed and denser covering the soil and presumably creating a volume scattering response. The latter is less intense than the previous double bounce at the aperture stage, causing a decrease in the overall backscatter. Once the crop has reached the mature stage (images 6 to 8 of Figure 6), no significant changes happen in the biomass of the canopy hence the SAR signal remains at approximately the same level until the end of the season. An additional aspect to highlight is that as presented in Figure 4 and mentioned in Section 2.5, during the maturation stage the VV channel presents more statistical variation presumably since the VV backscatter still has an important ground component present as opposed to the VH backscatter which after the fern develops and stabilises, seems to have a strong contribution from the canopy and less from the ground. As a consequence, the VV captures features related to the soil, such as changes in moisture or roughness.

Impact of Temperature on the Crop and the Sar Response
In this section, we analyse the impact of the seasonality and the variable climatological conditions on the crop evolution. This analysis is relevant to build a phenology retrieval algorithm since these factors change the crop behaviour and/or the SAR response in time thus modifying the inputs for an algorithm and affecting the accuracy results. Please note that in this analysis of effect of temperature, we have used the VH channel given that it shows a more dynamic and cleaner signal than the VV polarisation.
Based on empirical observations, growers have noticed that the crop evolution during a "winter" and a "summer" campaign in the same year are different, in terms of canopy volume and development rate, possibly due to the different climatological conditions. Similarly, previous research showed that temperature influences asparagus plant growth rates and may cause growth depression [35,36]. On the other hand, mechanistic models of asparagus shoots height have been developed as a function of the temperature [36].
To investigate this effect in our test site using remotely sensed observations, time series of meteorological information, ground truth and SAR backscatter were analysed. Figure 7 shows the SAR and ground truth data for a typical parcel during four consecutive campaigns. The bottom plot of Figure 7 shows that the number of asparagus stems in maturation recorded during surveys are lower in the first semesters of 2018 and 2019 compared to the corresponding second semesters of the same years. Looking closely at the backscatter level for the same periods in the plot of the same figure, it is possible to see that the same pattern is followed in the time series of the VH polarisation once the crop has reached the maturation stage. This provides initial evidence of sensitivity of the SAR signal to the changes in the canopy volume (measured as number of stems in maturation, which would represent the total number of stems in the parcel for this period).
On the other hand, the left side of Figure 8 shows the VH backscatter response for the same parcel, during the winter campaign (second semester of 2018) and the summer campaign (first semester of 2019) as a function of the number of days after the cultivation started (DaS). It is possible to see the difference in the growth rate at the beginning of the cultivation, where in the summer campaign (red line) the crop reaches the peak of the VH time series faster relative to this same point in the cold season (blue line). In order to confirm that the temperature impacts the crop growth rate, a test was done using the temperature as independent variable instead of the number of days after the campaign start. A measure of daily accumulated heat has been previously used in the literature for this purpose. It considers an averaged measure of the daily maximum and minimum temperature to determine how much heat the crop receive in a day (Growing Degree Day) and how much it accumulates day after day during a period of time [37]. For the present study, 10 degrees Celsius was considered to be base temperature [30]. By accumulating the Growing degree days (GDD) and using it as independent variable, the plot on the right side of Figure 8 shows that the VH backscatter observed for the winter and summer campaigns are approximately aligned. This suggests that the temperature drives the rate of canopy formation as suggested in other studies [35,36], and that it is observable with the VH polarisation measurements. It also provides insights about the potential usefulness of the temperature as input feature for a remote sensing algorithm to retrieve crop stage information.
The boxplot of Figure 9 presents the median accumulated temperature (GDD) in all the campaigns registered in the ground truth, from the campaign start to harvest. It can be seen that depending on the month when the cultivation started, there is a seasonal trend in the accumulated temperature. This information is key to for example estimate the harvest date given the campaign starting month, based on the required accumulated temperature.
To summarise, in principle there are three visible effects of the temperature on the crop. The first one, corresponds to the canopy volume developed, being less biomass during hotter temperatures. The SAR backscatter signal is sensitive to this by measuring lower backscatter intensity during the maturation period (Figure 7). The second effect is associated with the growth rate, since as shown in the left side plot of Figure 8, it causes the stages at the beginning of the season to develop faster in a warmer campaign (red line). This effect is also visible from the backscatter response.
The third effect is related to the season length depending on the accumulated temperature during the cultivation period ( Figure 9). This accumulated temperature in turn depends on the month of the year when the campaign started. Figure 9. Campaign length measured in degrees Celsius (accumulated temperature) as a function of the production cycle starting month. As an example, if a campaign starts in January it normally accumulates around 600 degrees more than a campaign that starts in July. A total 442 campaigns were considered to generate this plot.

Estimation of Number of Asparagus Stems in Each Crop Stage
This section presents the methodology used for monitoring asparagus development as described by the number of asparagus stems in each of the stages of Figure 1. We use a data driven model to estimate ground measurements measurements from SAR observations. Please note that a deterministic physical model inversion is an ill posed problem since the number of unknowns are greater than the number of independent SAR measurements [38,39]. However, we exploit the patterns and correlations found in the SAR observations and temperature together with the ground truth to build empirical models.
We consider the retrieving the number of asparagus stems as a regression problem since it is possible to avoid selecting crop stage boundaries and allow soft transition between adjacent phenological stages. Please note that the ground truth used for training corresponding to phenological information is given by multiple and correlated variables (Figure 4). These variables do not evolve in time independently but rather they have temporal co-variation since they are produced by the same underlying process, i.e., the crop growth.
To exploit this structure in the output data, we use a multi-output regression algorithm that considers this interdependence of the individual outputs before making predictions. In this context, multitask learning (MTL) has been used in several applications precisely with this objective [40,41].
It is expected that not only the accuracy of a single multi-task learner increases compared to individual single-task learners (i.e., fitting an individual model for each output), but also since the model captures the structure of the data, it is able to generalize or interpolate better when the model is presented with unseen data [40,41].
In the remote sensing community multitask learning has been previously implemented using different machine learning algorithms [42,43]. Specifically for the case of SAR, in [42] the authors show how a multitask learner is able to make more accurately predictions of soil moisture and plant water content than individual learners.

Model Development
We chose initially a multi-task Random Forest Regressor [44] from the available MTL algorithms due to its power for capturing both the non-linear relationships and the correlation between multiple outputs.
A Random Forest Regressor [45] is known as an ensemble algorithm in which several individual decision regression trees (n-estimator trees) are built from individual bootstrapped datasets (datasets in which the samples are randomly selected from the original training dataset) [46]. Each regression tree uses a random subset of the input feature variables (m-features) from the original number of input features, and from this subset an optimal feature with an associated threshold is selected for each node [47]. In the single output case of a regression decision tree, both the optimal feature and the node threshold (i.e., the threshold that decides whether to go to the left or right child node) are found by minimizing a split function (also known as node cost) based on a euclidean distance error measure [47]. In the case of a multiple-output regression decision tree, an additional term is added to the node cost to account for the correlations in the output data. Specifically, in the framework proposed by [44], a Mahalanobis distance [48] is added to the split function to consider the multiple dimensions in the output data in the minimization cost function to select the corresponding thresholds. Please note that other split functions have been developed [49] but the in this paper the framework proposed in [44] is used. A final estimation is obtained from the multi-task Random Forest Regressor by averaging the estimations provided by the leaf nodes in each individual tree in the forest.
In this paper, the objective is to estimate the number asparagus stems in each of five possible stages. Each of these five estimations corresponds to an output predicted by the multi-output random forest regression. On the other hand, the algorithm uses historical SAR, temperature and ground truth data to learn the corresponding mapping functions. For this purpose, the scenarios in Table 2 have been considered in this paper. Table 2. Scenarios considered for asparagus growth estimation. Please note that each of the scenarios in B and C categories is tested using one image as well as sequence of multiple images. An initial category identified as category A, does not use remote sensing as input for the multi-task regression but only uses ground data. In the scenario A1 the number of days after the cultivation started (DaS) is used to estimate the number of asparagus stems present in each phenological stage at every image (Table 2), similar to how farmers traditionally execute their planning in the test site and in general for farms with low adoption of technology. Considering that multiple production cycles per year occur and grow under different climatological conditions, as shown in Figures 8 and 9, the day of the year (DoY) when the cultivation season starts impacts the canopy development. We tested the value of using this information as input for the algorithm, given that for instance, 20 days of cultivation in summer may differ from 20 days of cultivation in winter. This corresponds to the scenario (A2).

Category Scenario Input Description
A more robust scenario, the scenario A3, uses additionally the accumulated temperature during the cultivation period (from cultivation start to the SAR acquisition date) or accumulated growing degree-days (AGDD), since as it was shown in Section 2.6.2, the temperature drives the growth rate and canopy volume. Several other methodologies have used AGDD to account for the impact of climatic conditions in the crop growth [22,50,51]. Please note that using these input data sources the model learns the mapping function to give a theoretical estimation of the number of asparagus stems in each growth stage. This estimation may be accurate only if no external abnormal conditions affect the crop, such as extreme weather events including droughts, hail, etc., plant diseases, pests or changes in the management practices. Similarly, it does not provide any spatial information of crop status but a single prediction for the entire field. Although this information may be valuable for planning, it is not sufficient for operational crop monitoring.
A remote sensing-based approach that uses SAR images was considered to be the second category (B), where the Sentinel-1 backscatter including VH and VV polarisation channels with their corresponding ratio was used. This was tested using a single image and a sequence of images, from two to five. In this case, the near-real time data acquired by the satellite provides the capabilities for operational monitoring. In this scenario the VH polarisation was tested individually as well as together with the VH and VV ratio (scenarios B1 and B2). This scenario only uses SAR data as input for the multi-output regression.
A third category (C), includes each of the previous data sources SAR, DaS, DoY, and AGDD as input for the algorithm hoping to integrate their individual advantages (scenarios C1 to C3). Please note that the scenario C1 is included with the aim of quantifying the usefulness of the DaS feature when using multi-temporal SAR data and the scenario C2 quantifies the usefulness of the DoY feature.

Outputs
In all the scenarios, the aim is to produce estimates of the number of asparagus stems present in each of the following phenological stages: Emergence, ramification, aperture, flowering and maturation at any given SAR acquisition (same stages shown in Figure 1). Because in this case we require to simulatenoulsy predict the multiple outputs, we chose the model described in Section 2.7.1. On the other hand, as described in Section 2.4, these values would correspond to measurements taken in a randomly selected linear meter within the parcel. Here we assume that the ground data collected in this way is representative of the entire parcel.

Training and Testing Data
The ground truth described in Section 2.4 collected between January and August 2019, Sentinel-1 data and temperature measurements from the same periods are used to create the datasets. The dataset D takes the form D = (X, y) where X is a matrix of dimension mxn, m is the number of ground truth measurements available, n is defined by the features being used according to each scenario of Table 2 and y is the matrix of ground truth data with dimension mx5, where the number 5 corresponds to the number of asparagus stems in each stage of Figure 1 recorded in the ground truth surveys.
In order to separate training and testing data, we randomly select plots based on the ID to complete approximately 70% of them in a training dataset and 30% in a testing dataset so that we guarantee the unseen data required for the testing phase. In total, from the 442 plots (of 2 Ha on average) with available ground truth, approximately 309 plots randomly selected are used for training and the remaining 133 plots for testing.
Given that several ground truth measurement dates do not coincide with the Sentinel-1. acquisition dates, a three order spline interpolation was used to interpolate daily the ground truth so that an associated ground truth measurement can be obtained for every SAR observation. In total, there exist 4023 training data-points and 1739 testing data-points.

Model Hyper Parameters
Tuning of the optimal model hyper-parameters was done using 5-fold cross-validation with grid search. Table 3 presents the selected hyperparameters for the scenario C3.

Accuracy Metrics
The coefficient of determination R 2 computed with Equation (1) was used to measure the model performance, both for the individual outputs and for the model as a whole, by averaging the scores of the five outputs.
where y i corresponds to the i − th ground truth test sample, y i a prediction made with the model for this sample after training, and y the mean value of the n-ground truth test samples. Similarly, the root mean squared error RMSE calculated with Equation (2), was computed between predicted and testing values.
where y i corresponds to the i − th ground truth test sample, y i a prediction made with the model for this sample after training.

Single-Sar Image Results
The obtained results when using a single SAR image as input for the model are reported in Tables 4 and 5. When using one SAR image (see Table 4), particularly the scenarios A2, A3, C2 and C3 achieve satisfactory predictive capabilities with overall coefficients of determination R 2 between 0.84 and 0.89. This is confirmed with the RMSE's that are also the lowest for these scenarios.
Regarding individual outputs of the multi-task regression for these same scenarios, the maturation phase has the best performance achieving an R 2 of more than 0.9 in almost all of them and flowering the lowest accuracy between 0.70 and 0.8. Note from Figure 4 that comparing flowering to any other stage the temporal shape described in the ground truth by this measurement is more irregular and reaches on average fewer stems than the other stages. This could be due to an agronomic reason that requires further analysis or due to a systematic error affecting flowering when surveying the fields. This is also possibly the reason causing the predictions of asparagus stems in flowering less accurate than in the other stages.
On the other hand, the results of scenario C1 are substantially higher than the scenarios in category B. By providing to the regressor the number of days after the season started (DaS) as in scenario C1, the algorithm improves the retrieval with respect to category B potentially since it would be possible to disentangle similar backscatters at different dates.
An additional increase in R 2 (from 0.72 to 0.84) and reduction of RMSE (from 3.73 to 2.9) is achieved in the scenario C2 only by specifying the day of the year when the agricultural season starts. This feature indirectly provides information about the seasonality present in the test site and shown in Figures 7 and 8. The R 2 and RMSE are further improved and decreased respectively in the scenario C3 after the addition of the AGDD feature, although not significantly. This low increase may be explained by considering that providing DoY (as in scenario C2) we already provide information about seasonality and given that as mentioned in Section 2.6.2, the impact of higher and lower accumulated temperature in the canopy is perceived by the VH backscatter as shown in Figures 7 and 8. Consequently, the algorithm indirectly receives information about the accumulated temperature through the use of VH and DoY. This is an important result given that it implies that not using the AGDD feature (as in scenario C2), the temperature data from a ground station is not needed, without sacrificing substantially the model performance. Figure 10 summarises the model performance for the scenario C3. For this same scenario, Figure 11 shows the test and predicted data-points as a function of the days after the season started (using the cultivation days associated with the test data-points as x-axis). It can be seen that in general the predicted values (in red) follow the timing and the expected number of stems of the testing data-points (in blue). It is also possible to see however, that they are not exactly the same, indicating that although the model is accurately making predictions, it is not over-fitting to produce identical values as the testing points nor predicting extreme values that may correspond to outliers.

Multi-Temporal Sar Results
In order to quantify the performance of all the scenarios considered when using multiple Sentinel-1 images, we tested increasing the number of images to create the training time series, e.g., from only using the latest SAR image to using the 5 latest available images. In this case, the dataset takes the form D = ([X t , X t−1 , X t−2 , X t−3 , X t−4 ], y t ) where t represents the index of the date when the ground truth was collected or in a prediction setting, when the prediction is desired. Figure 12 show the corresponding overall, R 2 scores and RMSE for each scenario of Table 2 when using from one to five Sentinel-1 images to estimate the number of asparagus stems in each phenological stage. Please note that since the scenarios of the category A do not use Sentinel-1 data, we only consider the categories B and C for this part of the analysis. In this case, the scenarios that use the VH, VV and ratio as features (B1 and B2), increase their performance when more images are used, with the scenario B2 achieving an R 2 of 0.66. This is not surprising since by using the sequence of the backscatter evolution provides additional information for the algorithm to disentangle similar backscatter present at different time of the season. An additional significant result obtained here is that no substantial improvement is achieved when more than 4 images are used.
However, all the other scenarios considered in this paper do not increase the performance when increasing the number of images used as it would be expected, but rather maintain the same accuracy achieved when using a single SAR image, temperature and the start of the season information as input features. This may be due to the fact that the information provided by DaS, DoY and AGDD is sufficient for helping to disentangle the backscatter of a single image and thus no further images are required for this purpose. Figure 13 shows for all the parcels in the test site, the estimation of asparagus stems present in each phenological stage, obtained using the trained multi-task random forest of the scenario C3, for the Sentinel-1 image acquired the 2018/10/12. This is the same acquisition date as in Figure 2 and the intermediate subplot of Figure 14, which in turn shows an RGB composite of the same information using the predicted asparagus stems in emergence in the blue channel, the predicted asparagus stems in maturation in the green channel and the sum of the predicted asparagus stems in ramification, aperture and flowering in the red channel ( given their short duration). Figure 14 shows additionally the RGB composites of 4 other Sentinel-1 acquisitions in order to visualise the change in time of the predicted crop stages due to crop development. This composite reveals the crop stage of each parcel in an intuitive and fast way while the number of asparagus stems predictions map of Figure 13 shows more detailed information for every individual crop stage.

Discussion
We have provided the analysis of the SAR backscatter response to asparagus growth development and canopy formation as shown in Figure 4. Similarly, Figure 7 presents the seasonality effect both in the sensor response and the ground truth due to consecutive production cycles that grow under different meteorological conditions. Figure 8 shows how the VH polarisation is used for crop monitoring in order to visualise the canopy growth rate, revealing that it is faster in summer than in winter, but with less canopy volume (biomass). It also shows that although the production cycles in winter are longer, those cycles accumulate less temperature measured in GDD compared to the summer campaigns. Based on this information, the season length varies depending on the cycle starting month as shown in Figure 9. The backscatter response is sensitive to all these events as shown in Section 2.6.2.
With respect to the algorithm to retrieve the crop stage algorithm, several scenarios were considered in the analysis to understand the relevance of each data source and input feature and to determine the best way to combine the available information.
The scenarios of category A, which do not include remote sensing data, show that using accumulated growing degree days improves the predicting capabilities of an algorithm given that temperature is an important factor driving the crop evolution. In fact, the scenario A3 provides the highest R 2 -scores as well as the lowest RMSE's of the scenarios tested (0.89 and 2.3 respectively). This is aligned with the well known techniques to estimate the timing of phenological events using thermal calendars [52,53]. However, this estimation may be accurate only if no external anomalous conditions affect the crop, such as extreme weather events, diseases or changes in the management practices. Similarly, it does not provide spatial information of crop status but a single prediction for the entire field. In consequence, although this information may be valuable for planning, it is not sufficient for operational crop monitoring.
With regard to the scenarios of category B (SAR only), although using multiple images as input for a model the predictive capabilities improve significantly, it achieves poorer results than the categories A and C. The scenarios in category C which use all available input sources, achieve similar accuracy retrieving the number of asparagus stems in each phenological stage than the scenarios of category A, according to the results of Tables 4 and 5, but additionally providing spatial resolution and the ability to determine growth anomalies. Focusing particularly in scenarios C2 and C3, the difference in their performance is not significant thus the scenario C2 which do not use temperature may be preferred.
To summarise the scenarios considered and based on the results obtained, we grouped four users cases that may benefit differently for each scenario depending on their specific needs and data availability as shown in Table 6. Depending on the farm size a scenario evaluated in this paper can be adopted, considering additionally the availability of temperature data and whether a larger scale monitoring system is required.
With regards to the current literature available, although previous studies have considered the possibility of identifying asparagus from other crops using radar remote sensing data [23][24][25][26][27], they did not focus on analysing the backscatter response relative the crop evolution due to the transition between phenological stages as was presented in Sections 2.6.1 and 2.6.2. Similarly, no previous study evaluated the possibility to retrieve the crop stage. In this context, the present study contributes to the literature with a more detailed analysis of asparagus providing an interpretation of backscatter evolution that may offer tools for better crop classification (since so far low accuracy have been reported [23,24,27]), through a better understanding of the temporal crop signature.
The VV backscatter throughout the agricultural season does not have significant changes relative to the VH polarisation channel. This has been previously reported in [27] and it differs from what was presented in [26] using TerraSAR-X. This suggest that X-band is able to capture events in the crop development not visible in C-band as happens for instance in rice fields [6].
It is important to highlight that the current methodology is limited in part by the availability of the season starting date information as input, being used for calculating the AGDD as well as input for the multitemporal regressor (DaS). A potential solution to this is the estimation of starting dates from remote sensing as it has been previously investigated with satisfactory results in [54][55][56][57][58][59]. A further limitation in the current analysis is the lower accuracy of retrieving the number of asparagus stems in flowering, with respect to other stages such as emergence and maturation (Tables 4 and 5), which as mentioned in Section 3.1 might be caused by the unclear ground truth samples used to train the model. Statistical tests to better understand the characteristics of the training data and variations between measurements in each stage may be used to make decisions about better strategies to use the same data.
Future research will focus on the automatic detection of starting date from SAR and the use of quad-polarimetric data to better understand the scattering mechanisms throughout the season. Additionally, the use of better ways to deal with the sequential nature of data generated from agricultural fields and multitemporal remote sensing data will be considered including dynamical modelling [20,22] and more advanced machine learning models . Large scale monitoring (regional or national level) and satellite measurements of land surface temperature Automatically detected known from satellite measurements of land surface temperature C3 **** * It is assumed that the spatial resolution is not critical since anomalies can be identified by inexpensive field surveys. ** Spatial resolution is important since growth anomalies cannot be identified by simple visual inspection or the field surveys are expensive. *** The scenario C2 is the same as B2 by adding information about season start date. The lost of accuracy for not using temperature is not signicant based on the findings of Section 3. **** The accuracy results may decrease since automatically detecting start date and using low spatial resolution LST measurements introduce uncertainty.

Conclusions
In this paper, we provided an interpretation of the SAR backscatter response to asparagus crop growth and analysed the impact that temperature has on the canopy volume, its development rate, and the cultivation length. It was shown how the VH backscatter is sensitive to all these effects. We also presented a multi-output machine learning regression algorithm trained to retrieve the number of asparagus stems present in each of five possible phenological stages. We tested several operational scenarios finding that using the VH, VV, VH/VV, and information about season start date (scenario C2), the model is able to retrieve the number of asparagus stems with an overall R 2 of 0.84 and RMSE of 2.9. Adding the accumulated temperature (AGDD) as in the scenario C3, improved slightly the accuracy resulting in overall R 2 of 0.86 and RMSE of 2.7. However, given that this increase is not substantial, the scenario C2 might be preferred since the temperature feature is not required and therefore this additional data source could be removed without losing significant model performance.
Additionally, as shown in Figure 12, for the conditions evaluated in this paper, the use of multitemporal SAR data is not critical when using information about the season start date and temperature as crop stage predictors to complement the backscatter, since these features provide similar information for the algorithm to disentangle events in the temporal dimension.