Time Series of Quad-Pol C-Band Synthetic Aperture Radar for the Forecasting of Crop Biophysical Variables of Barley Fields Using Statistical Techniques

: This paper aims to both ﬁt and predict crop biophysical variables with a SAR image series by performing a factorial experiment and estimating time series models using a combination of forecasts. Two plots of barley grown under rainfed conditions in Spain were monitored during the growing cycle of 2015 (February to June). The dataset included nine ﬁeld estimations of agronomic parameters, 20 RADARSAT-2 images, and daily weather records. Ten polarimetric observables were retrieved and integrated to derive the six agronomic and monitoring variables, including the height, biomass, fraction of vegetation cover, leaf area index, water content, and soil moisture. The statistical methods applied, namely double smoothing, ARIMAX, and robust regression, allowed the adjustment and modelling of these ﬁeld variables. The model equations showed a positive contribution of meteorological variables and a strong temporal component in the crop’s development, as occurs in natural conditions. After combining different models, the results showed the best efﬁciency in terms of forecasting and the inﬂuence of several weather variables. The existence of a cointegration relationship between the data series of the same crop in different ﬁelds allows for adjusting and predicting the results in other ﬁelds with similar crops without re-modelling. resources, N.S.; curation, M.T.S.-M., R.V.-D. C.S.d.B.; writing—original A.E.S.; writing— review and editing,


Introduction
Assessing and quantifying the biophysical variables of vegetation cover such as biomass, leaf area index (LAI), height, and the fraction of vegetation cover (FVC), among others, have been identified as a paramount challenge in agriculture due to their role in modelling the growth and development of plants [1], improving prediction of crop yields [2][3][4], and in hydrological processes or droughts monitoring [5][6][7][8][9]. Furthermore, monitoring such parameters through ground sensors or field measurements is expensive and highly time-consuming; thus, remote sensing represents an appealing alternative.
Remote sensing data acquired from space are a good source of information to regularly derive biophysical variables. Although optical remote sensing has been successfully used [10][11][12][13][14], acquiring data using these types of sensors is limited to nearly cloud-free growth models over time, considering the data obtained in situ versus satellite measurements, establishing the dependence and relationship between them. Finally, analyzing the results obtained in one plot using cointegration techniques proved that the model can successfully make accurate predictions for the other plot when used as a reference.
The paper is organized as follows: Section 2 presents the study area and data used. Section 3 describes the statistical methodology. The results from the statistical methods are analyzed and discussed in Sections 4 and 5. Finally, the conclusions are drawn in Section 6.

Study Area and Field Data
The Iberian Peninsula is extremely sensitive to future climate changes, especially the inner areas, where rainfed agriculture depends solely on rainfall. The study area ( Figure  1) belongs to the central part of the Duero basin (41.1° to 41.5°N and 5.1° to 5.7°W), a fragile semi-arid area where a soil moisture station network (REMEDHUS) has been working since 1999, which is the property of the University of Salamanca in Spain. This network monitors the climatic and soil conditions of the area and provides ground data for agricultural and hydrological applications. A detailed description of the network, together with its agrometeorological equipment, is available in [44][45][46]. Two stations of REMEDHUS, namely F11 and M9 (Figure 1), were chosen in 2015 for a field campaign in their corresponding plots (both planted with barley) devoted to estimating field crop vegetation throughout their growing cycle. Soil moisture probes (Hydra Probes, Stevens Water Monitoring System, Inc.) provided a soil moisture dataset for each area, and the automatic weather stations of the network (Figure 1) supplied the meteorological data (precipitation, P, and evapotranspiration, ET0). The study spanned from February to June 2015, and the crops measurements were taken fortnightly (nine measurements in total). The objective of the campaign was to seek consistent relationships between ground measurements and remote-sensed SAR data acquired simultaneously by the RADARSAT-2 mission, as explained in the next section. Previous experiments successfully proved the relationships between both datasets using simple correlations [47]. The crop measurements (taken either in situ or in the laboratory) aimed to estimate the following parameters: height, FVC, LAI, fresh biomass, and percentage of water content (PWC). The FVC was estimated using supervised image classification of photographs The crop measurements (taken either in situ or in the laboratory) aimed to estimate the following parameters: height, FVC, LAI, fresh biomass, and percentage of water content (PWC). The FVC was estimated using supervised image classification of photographs taken zenithally over the canopy. For the LAI estimation, a binary-threshold segmentation was applied to the scanned leaf samples. Biomass and PWC were obtained by weighting the fresh/dry samples. Finally, the soil moisture (SM) records (in volumetric units) were  [47,48] for a more detailed explanation of the measurements.
The field data were measured fortnightly over the study period (excepting the soil moisture, with daily records), producing a total of 9 observations. As the dates of the field observations and those of the satellite images did not coincide (because the satellite's passage over the area is not known in advance), the image series was interpolated to fit in with the field observations, as explained in the next section.

Satellite Data and Pre-Processing
RADARSAT-2 is a satellite launched by MacDonald, Dettwiler and Associates Ltd. (MDA) and the Canadian Space Agency (CSA) in December 2007. It has a C-band synthetic aperture radar (SAR) operating at 5.4 GHz, with multiple beam modes and an orbit repeat cycle of 24 days. For this study, a time series of 20 RADARSAT-2 images from February to July 2015 was acquired in Fine Quad-Pol Single Look Complex (SLC) from three different beam modes (FQ16W, FQ11W, and FQ6W) at different incidence angles (36 • , 31 • , and 25 • ) ( Table 1). As shown in Table 1, there is a mismatch between field measurements and RADARSAT-2 acquisition dates. This can be a source of uncertainty in the statistical analysis, especially when field measurements show trend changes or rapid changes; thus, a spline interpolation was applied to avoid this problem [47].
RADARSAT-2 data were processed using the Sentinel-1 Toolbox of the Sentinel Application Platform (SNAP, https://step.esa.int/main/toolboxes/snap/, accessed on 21 July 2020) provided by the European Space Agency (ESA) as follows: (1) radiometrically converting the images into backscattering coefficients using the look-up table from the images; (2) polarimetric coherency matrix generation [49]; (3) speckle filtering (9 × 9 boxcar); (4) terrain correction and geocoding using the Range Doppler orthorectification method and the digital elevation model from the Shuttle Radar Topographic Mission (SRTM); and (5) retrieving the polarimetric parameters using the PolSARpro software provided by the ESA (https://earth.esa.int/eogateway/tools/polsarpro, accessed on 21 July 2020). A detailed description of the procedure and parameters may be found in [47]. Table 2 summarizes both the polarimetric parameters and field observations, together with the weather records. Table 2. List of polarimetric parameters derived from RADARSAT-2, field/laboratory estimations, and weather records used in this study.

Polarimetric Parameters Symbol/Acronym
Backscattering coefficient at HH, HV, and VV channels HH, HV, VV Ratio of backscattering coefficients at HH, HV, and VV channels HH/VV, HV/VV Normalized correlation between the copular channels (HH and VV) γ HHVV Polarization phase difference between the copular channels (PPD) PPD Entropy and dominant alpha angle H, α 1 Normalized between the 1st two channels in the Pauli basis (

Methods
In order to adjust and predict the field measurements, different statistical techniques based on time series analysis were used, such as exponential smoothing on the in situ variables, ARIMAX, and regression by robust least squares. These methods were enriched with the contribution of potential meteorological explicative variables and linear combinations obtained from principal component analysis (PCA) of the satellite variables for each type of crop.
Once the adjustments and predictions of the aforementioned models were refined, the forecast combination methodology was used to obtain the final adjustment and prediction. Further, the cointegration method allows taking advantage of previous training to transfer the model to a new plot, provided the plot demonstrates a similar behaviour.

Interpolation
As previously mentioned, the dates of the field observations and satellite images do not coincide, so it seemed necessary to interpolate the data to compare the time series. Four data were taken per month starting in February 2015 and ending in June 2015. Since the meteorological records have a daily rate, weekly averages were also calculated to fit the temporal frequency of the other data. Twenty observations were used for the estimation models, and four were used for the prediction model.
Interpolation of missing data was carried out using cubic spline and least squares. Interpolation fits a cubic spline curve to the input values, which is a segmented function consisting of polynomial (cubic) functions joined so that the total curve and its first and second derivatives are continuous [50].

Dimension Reduction
Owing to the high number of parameters led by the images, the dimensionality of the information collected by these 10 satellite parameters was reduced by a principal highest relevance, and so on. These components can be considered new axes representing the cloud of points that form the original variables. The projection of the points on the components serves to interpret the relationship between the different variables. If this interpretation is complex, the components (axes) can be rotated to redistribute the variance of the original variables in the factors to achieve a better interpretation of the results. For the rotation, the equamax rotation method is used, which minimizes the number of variables that contribute to a factor and the number of factors necessary to explain a variable [51][52][53].

Fitting Models
Several methodologies were used to adjust the time series of crop measurements, namely height, FVC, LAI, biomass, PWC, FVC, and SM. Exponential smoothing, which uses the information from the dynamics of the series, considers only the raw values of the field variables. The ARIMAX models also incorporate the information from the correlation of the white noise together with external variables such as the satellite variables (previously summarised in the factors obtained by the PCA) and meteorological variables. For field variables where the ARIMAX methodology did not provide a good fit, a robust regression was used to model them instead.

Exponential Smoothing
The simple exponential smoothing method is an evolution of the weighted moving average method. In this case, the average value of the time series is calculated using a self-correcting mechanism that aims to adjust the forecast in the opposite direction of the past deviation through the correction affected by the smoothing coefficient, α ∈ [0, 1]. This parameter controls the influence of the observations, i.e., large values indicate that the most recent past observations influence the forecasts the most, while smaller values mean that most of the historical data is considered when making a prediction. The single exponential smoothing method is appropriate for series that move randomly above and below a constant mean without trends or seasonal patterns [54]. A double smoothing adjustment was applied for each of the previously interpolated variables since they have a trend but no seasonality. This method repeats the single smoothing adjustment twice, appropriate for a series with a linear trend.
For a time series y t at instant t, we define: where S t is the single smoothed series, D t is the double smoothed series, and α is the smoothing coefficient. The forecast of a new observation at time t is computed as:

ARIMAX Models
ARIMAX models are ARIMA that incorporate exogenous variables into the adjustment. ARIMA are statistical models for time series that take into account the existing dependence between the data through its three components: autoregressive (AR), integrated (I), and moving averages (MA), which include cyclical or seasonal components and allow the description of the temporal variable as a linear function of previous data and errors due to randomness. The general ARIMAX model is described by Equation (3): where By t = y t−1 , d is the number of differences that must be applied to the series to make it stationary, α is the prerequisite for fitting an ARMA model using the Box Jenkins methodology, p is the number of autoregressive components in the equation and Φ i the associated parameter, X j is the model's exogeneous variables and γ j the associated parameter, and ε t is the error at time t [55,56]. The Akaike information criterion (AIC) is used to determine which possible ARIMAX model is the best. This method serves to identify and select ARIMAX models as a measure of goodness-of-fit, which describes the relationship between the bias and variance in the model construction. It provides a means for comparison between models and a tool for model selection, with the best model being the one with the lowest AIC value [57].
The Ljung-Box test [58] was used to validate the assumptions of the model, i.e., that the residuals follow a time series where values in different time instants are not statistically correlated (white noise) and, therefore, have a constant power spectral density.

Robust Regression
The most used method to estimate the coefficients of linear regression is the least squares. This method, while optimal in the case of normally distributed errors, is very sensitive to the presence of outliers. To solve this problem, other, more robust estimation methods have been developed, which are less affected by the presence of atypical data, such as M-estimation [59], S-estimation [60], and MM-estimation [61], referred to hereafter as robust least squares methods.
To evaluate the accuracy of the forecasts from both the ARIMAX and robust regression models, measures of accuracy (mean absolute error (MAE) and Theil's inequality coefficient (TIC)) were used. A Theil's inequality coefficient of less than one confirms the model's suitability for forecasting [62].

Forecast Combination Techniques
A combination of predictions made through different processes can improve the precision of a single prediction [63]: the more diverse the prediction model, the more accurate the prediction result. The combination of predictions aggregates the different forecasts obtained from each model into a single one as a weighted average of all the predictions by considering the information collected by each model individually, see [64][65][66].
Different prediction combination techniques were used [67]: • Weighted least squares (WLS) requires knowledge of the true values of the forecasted variable for some of the forecast period. It compares real values with predictions using the regression coefficients as weights. This method does not require that the underlying individual forecasts are unbiased, and the resulting average can lie outside the range of the underlying forecasts. • Weighted mean squared error (MSE) consists of making a weighted average giving greater weight to the predictions of the models with less mean squared error. Stock and Watson [68] propose MSE weighting, which compares individual forecasts with real values over a forecast period.
The mean squared error of each forecast is computed and used to form individual forecast weights:

Cointegration Analysis
Engle and Granger [69] developed the concept of cointegration, i.e., two series are said to be cointegrated if they move together over time, and the differences between them are stable. Hence, cointegration reflects the presence of a long-term equilibrium toward which the system converges over time. If more than two series are cointegrated, at least one must cause the other. The differences (or term error) in the cointegration equation are interpreted as the disequilibrium error for each particular point in time. Cointegration implies that there is a trend common to both series, so a properly weighted combination cancels this component. There is also a long-term equilibrium between both series, so deviations from equilibrium tend to disappear in relatively short terms.
A simple procedure to estimate the cointegration relationship consists of fitting a linear regression for the potentially cointegrated variables and then evaluating the stationarity of the residuals (constant mean and variance) that occurs in these data series. Different tests were used to analyze the feasibility of the cointegration:

•
The existence of cointegration between time series requires that each one is nonstationary, but the linear combination is stationary [70]. The augmented Dicky-Fuller test contrasts the null hypothesis for the existence of a unit root, equivalent to nonstationarity. To assess that both series are non-stationary, the augmented Dicky-Fuller test is used [56,71].

•
Granger's causality test [72] checks whether the results of one variable serve to predict another variable analyzing the causal relationships between time series. Once this test is carried out, the cointegration equation is constructed, permitting one variable to be predicted using the results from another.

•
Johansen's cointegration test [73,74] is another method to test the cointegration of several series. There are two types of Johansen tests, either with trace or with eigenvalue (maximum eigenvalue test). The inferences obtained with both alternatives may derive quite similar results; in this case, the results have been obtained with the trace.

Dimension Reduction
After the series interpolation, PCA was performed to reduce the size of the satellite variables. The constructed factors are a linear combination of the original satellite variables: HH, HV, VV, HH/VV, HV/VV, γ HHVV , PPD, H, α 1 , and γ P1P2 . An equamax rotation has been performed to improve the scoring of the satellite variables' contributions for the factors. The results showed that the first four factors preserve 98% of the information in terms of cumulative variance, while the first two factors explain about 76% of the variability. Table 3 shows the coefficients of the linear combination of the satellite variable in each factor. The constructed factors with the parameters associated with each variable according to the values are presented in Figure 2, indicating which variables have the greatest weight for each of the factors.
The incidence angle of the satellite observations was not used to subdivide the sample due to the scarcity of data. Of the 20 satellite data, 6 were observed with an angle of 25 • , 7 with 31 • , and 7 with 36 • . However, the influence of the angle on the four new factors was analyzed employing an ANOVA test, using the angle as the factor. Table 4 shows the p-values of the tests for independence between the new variables and the angle and the associated Levene's test for homogeneity of variances. The constructed factors with the parameters associated with each variable according to the values are presented in Figure 2, indicating which variables have the greatest weight for each of the factors. The incidence angle of the satellite observations was not used to subdivide the sample due to the scarcity of data. Of the 20 satellite data, 6 were observed with an angle of 25°, 7 with 31°, and 7 with 36°. However, the influence of the angle on the four new factors was analyzed employing an ANOVA test, using the angle as the factor. Table 4 shows the p-values of the tests for independence between the new variables and the angle and the associated Levene's test for homogeneity of variances. The results showed that the null hypothesis of homogeneity of variances was assumed in all cases (p-value > alpha = 0.05) for the Levene's test, and there were no significant differences between Factors 1, 2, and 4 and the angle (p-value > alpha = 0.05) for the independence test, but significant differences were found for Factor 3.
Based on these results, of the four factors that explain 98% of the variability, the two that provide the greatest explanatory power while not being influenced by the angle (Factor 1 and Factor 2) were used as explanatory variables in the following adjustment models.  The results showed that the null hypothesis of homogeneity of variances was assumed in all cases (p-value > alpha = 0.05) for the Levene's test, and there were no significant differences between Factors 1, 2, and 4 and the angle (p-value > alpha = 0.05) for the independence test, but significant differences were found for Factor 3.
Based on these results, of the four factors that explain 98% of the variability, the two that provide the greatest explanatory power while not being influenced by the angle (Factor 1 and Factor 2) were used as explanatory variables in the following adjustment models.

Analysis of Fitting Models over the Barley Crop on Field F11
In this section, the selected adjustment models described in Section 3 were evaluated to forecast the field variables (height, biomass, FVC, PWC, LAI, and SM) of the F11 crop field. When applied to the fittings, the meteorological variables (precipitation, evapotranspiration, and temperature) and the factors that summarize the satellite variables obtained from the principal component analysis were considered as explanatory variables. Once the adjustment models were obtained, the combination of predictions was carried out to obtain the final prediction.

Double Smoothing Adjustment
The double smoothing adjustment (Equation (1)) was performed for the field variables. Table 5 shows the fit coefficient (α), the residual sum of squares (RSS, which measures the amount of variance that is not explained by the model), and the mean squared difference between the estimated values and the true value: the mean squared error (MSE). The α values near one highlighted the importance of the last measurements in the fitting model. This result agreed with the growing cycle behaviour of this crop, in which the last dates, corresponding to the fruit developing and ripening (around May), are vital for predicting the yield and final performance of the crop.

ARIMAX Adjustment and Robust Regression
The adjustment was carried out using ARIMAX models for the field variables. Table 6 presents the results of the models (parameters significant at 5%) with the lowest AIC, mean absolute error (MAE), MSE, and Theil's inequality coefficient (TIC). The adequacy of the models for forecast purposes was confirmed by Theil's inequality coefficient (<1), whereas the Ljung-Box assumed the hypothesis that the residuals are white noise at 5% and not autocorrelated. The results in Table 6 made clear that the influence of satellite variables (summarized in the constructed factors) was significant for all the considered measures to be estimated. Moreover, the adjustments are all dependent on the previous time instant, which is supported by the natural behaviour of barley plants, i.e., their state in a given moment is a consequence of the previous conditions. As can be seen in Table 6, the dynamic structure of the time series for height was an ARIMAX (0,1,0) (Equation (5)). In this case, there was a positive impact of the two factors (F1 t and F2 t ), dependence on the previous past observation (H t−1 ), and a dynamic dependence with its lagged variable (ε t ). For the biomass variable, the fitting model was an ARIMAX (0,1,1) (Equation (6)), where there was a positive impact of the first factor and the ET0 variable, dependence on the previous instant (B t−1 ), and a dynamic dependence of their errors. The ARIMAX highlighted the dependence of the biomass on the ET0. Equation (7) presents the estimation model for FVC, an ARIMAX (0,1,1), where there was a positive impact of F1 over FVC, dependence on the earlier time (FVC t−1 ), and a dynamic dependence of their errors. An ARIMAX (1,0,0) resulted for PWC (Equation (8)), where there was a positive impact of the first factor and temperature and a dynamic dependence on its own variable at the previous stage. Finally, the fit for LAI (Equation (9)) was an ARIMAX (0,1,1), and there was a positive impact of F1, dependence on the time instant past (LAI t−1 ), and a dynamic dependence on the errors.
In the case of the SM variable, the adjustment using ARIMAX models was not appropriate (TIC > 1 and adjustment coefficients not significant at 5%), so it was modelled using robust regression. The robust least squares estimation method allows better adjustments when there are outliers, as it is less sensitive to them. In this case, the S-estimation was used. After the estimation of this model for SM (Equation (10)), all resulting parameters were significant at the 0.05 level. A positive impact of F1 and the meteorological variables, ET0 and precipitation, took place for the SM. This is not surprising since the SM is the residual of the water inputs (rain) and outputs (ET0) of the soil-plant-atmosphere system [75]. The goodness-of-fit had an R-squared of 80%, the MAE was 0.02, the MSE was 0.83, and the TIC was 0.26. SM t = 0.16 + 0.02 F1 t + 0.03 ET0 t + 0.96 P t + ε t (10)

Combination of Forecasts
In the previous subsections, two models have been obtained for the adjustment and prediction of the field variables, double smoothing, and ARIMAX/robust regression. Thus, two different predictions for each field variable were available. To obtain the final prediction, a combination of the two predictions was made using the weighted least squares (LSW) and mean squared (MS) methods. Table 7 shows the mean squared error (MSE) of both methods for each field variable. As demonstrated, these results have a lower MSE than that obtained for the predictions of the individual models, double smoothing, and ARIMAX/robust regression ( Table 6). The LSW is the method with the lowest MSE.

Prediction of Biophysical Variables of Barley on M9
Since the time evolution of barley in plots F11 and M9 is similar for all field vari ( Figure 9), a cointegration may be expected. Thus, the results obtained for F11 m used to adjust and predict those of plot M9.

Prediction of Biophysical Variables of Barley on M9
Since the time evolution of barley in plots F11 and M9 is similar for all field variables ( Figure 9), a cointegration may be expected. Thus, the results obtained for F11 may be used to adjust and predict those of plot M9. To ensure that cointegration is possible, the three tests described in Section 3.5 applied to both series. The results of the augmented Dicky-Fuller test showed that a series were non-stationary. In all cases, the null hypothesis of non-stationarity at 5% To ensure that cointegration is possible, the three tests described in Section 3.5 were applied to both series. The results of the augmented Dicky-Fuller test showed that all the series were non-stationary. In all cases, the null hypothesis of non-stationarity at 5% was accepted, and the p-values were height (0.99), biomass (0.39), FVC (0.18), PWC (0.95), LAI (0.16), and SM (0.16). Granger's causality test found a bidirectional relationship between the same measures of the two fields at a significance level of 5%. Granger's causality test assumes the null hypothesis that the second series does not cause the first series with a certain time lag. Thus, if p-values below the significance level are obtained (p-value < alpha = 0.05), the hypothesis can be rejected, concluding that causality exists: height (p-value = 0.014), biomass (p-value = 0.04), FVC (p-value = 0.005), PWC (p-value = 0.003), LAI (p-value = 0.01), and SM (p-value = 0.02). Finally, the Johansen cointegration test reflects at 5% the existence of cointegration relationships between the two fields. The Johansen cointegration test shows that the null hypothesis of cointegration between the series is assumed in all cases (p-value > alpha = 0. Therefore, it seemed unnecessary to repeat the adjustments in plot M9 for each field variable since the predictions of the modelling of plot F11 would be used to predict plot M9 with the cointegration equation. Table 8 shows the fit of the field measurements of the two plots by the cointegration equations, indicating a stable linear relationship. Furthermore, the residuals of the regression were stationary, which indicated the fit robustness. Figure 10 shows the results of the biophysical variables of M9 adjusted by cointegration through the data of the F11 plot.

Discussion
To develop efficient adjustment and prediction models for the biophysical variables of crops, it is necessary to have a good dataset, including meteorological data, to fulfil the remote sensing series and the in situ measurements. Prior to analyzing the datasets, an interpolation had to be carried out to be able to compare the data series over time. The

Discussion
To develop efficient adjustment and prediction models for the biophysical variables of crops, it is necessary to have a good dataset, including meteorological data, to fulfil the remote sensing series and the in situ measurements. Prior to analyzing the datasets, an interpolation had to be carried out to be able to compare the data series over time. The satellite data provide many variables, so they are clustered using PCA to obtain factors that explain a high percentage of the variability of these variables. These factors were used as exogenous variables in the adjustment and the rest of the meteorological variables. The weather data was shown to be especially helpful in the case of the soil moisture adjustment, although this variable is one of the most typically retrieved from radar data [76]. Moreover, ARIMAX highlighted the dependence of both the biomass on the ET0, as it is recognized and applied in many crop models [77,78], and the temperature as an indicator of the water status of plants and soils jointly with the FVC [79][80][81].
To choose the most appropriate method for modelling and predicting each of the field variables (heigh, biomass, FVC, PWC, LAI, and SM), comparing different statistical methodologies is necessary to finally choose the two models with the best results for each field variable (double smoothing and ARIMAX/robust regression). Once the adjustment and forecast was performed with the chosen models, the accuracy was improved using the combined predictions. In this study, different methodologies were tested and compared to obtain the best fit for LSW. To date, this method has been scarcely applied to SAR data. Statistical methods have usually been applied for crop classification [82] or to characterize polarimetric parameters [83,84]; however, statistical modelling is rare. Not surprisingly, Yuzugullu et al. [85] stated that "most of the current statistical methods are not capable of explaining the growth cycle of rice crops in different areas while avoiding high training costs". We tried to show effective statistical modelling with limited field data and without training to fit the growing cycle of several parameters of barley.
The statistical analysis carried out for one of the barley plots can be extrapolated to the other. Considering the specific characteristics derived from the same crop, the analysis of the cointegration existing between the data series of both plots (F11 and M9) allows adjusting the field variables of one through the adjustments made in the other.
The results obtained show the suitable adjustments made with the different statistical methods used, despite having a small sample of field data and many satellite variables. Therefore, the use of satellite images makes it possible to adjust the field variables, avoiding costly in situ measurements. For this reason, the field measurements are usually scarce in comparison with SAR observations and, therefore, limit other fitting methods such as simple regression, as done in Maity et al. [86] (three measurements over cotton), Lim et al. [87] (six measurements over rice) and our previous application with the same dataset [47].
In comparison with the results of the reviewed literature where statistical time series methodologies are used, the novelty of this paper is that these were performed with data obtained from remotely sensed observations, together with field data and meteorological variables. In addition, the best fitting models are combined, and, using cointegration techniques, the results are extrapolated to other fields.
As ideas for further research, different machine learning methodologies could be used if a large amount of satellite data is available and to compare fields with different crop types.

Conclusions
This research showed a statistical procedure for modelling and predicting crop biophysical variables with time series of SAR images. Although the method is well-known in the statistical community, it was never applied to temporal series retrieved from remote sensing observations. The strategy is specifically convenient when the temporal series of field data, usually accompanying the image series, are short or scarce, a situation rather frequent owing to the difficulty of gathering field observations. The real dimensionality of image variable parameters was reduced by the PCA analysis, obtaining four factors that could explain 98% of the variability of the satellite variables, among which only the first two factors were relevant in the model.
The field data were modelled and adjusted by several statistical procedures (double smoothing, ARIMAX, and robust regression), generating two models with their corresponding coefficients, influence variables, and predictions for each field variable (height, biomass, FVC, PWC, LAI, and SM). The model equations showed a positive contribution of the meteorological variables, in particular, ET0 and precipitation for SM, ET0 for the biomass, and temperature for PWC. Further, the modelled series noted the importance of the crop status from one measurement to the next, highlighting the strong temporal component in crop development. The contributions found in the modelling reflect the dependence between the crop parameters and the effect of time, mirroring the real behaviour of this crop. The combination of the predictions obtained for the field variables by the two models improves the accuracy of the predictions. To make the combination of predictions, the weighted least squares (LSW) and the mean squared (MS) methods were tested and compared, concluding that the LSW provides a better fit.
Cointegration is an interesting tool to avoid replicating the model for different crops. In this case, the M9 field variables were fitted with the data from field F11, both producing barley.
The methodology employed is very popular with economic data, where different model adjustments are made to subsequently combine them to obtain the best possible fit. Moreover, cointegration between time series is commonly performed in econometrics. In this case, despite the small sample of filed data, the procedures have provided good results.
Alternative methods of statistical fitting applied between fields and remote sensing images pave the way to use the imagery as a subrogate of the crop behaviour, avoiding time-consuming field measurements.