2. Study Area
The study area in this research is Ukraine, one of the major producers and exporters of agricultural commodities. In Ukraine, 30 percent of nominal and 11.7 percent of real gross domestic product (GDP) are related to the agricultural sector. Since 1991 (the breakup of the Soviet Union), land use has changed substantially because of economic and policy changes as well as the impact of climate change. The past few years were especially dynamic in terms of land use change and policy making due to (a) the military conflict in Eastern Ukraine and the occupation of 7 percent of the territory by Russian forces, (b) the land market opening reform in Ukraine, and (c) the appearance of double-cropping fields due to climate warming. In terms of land productivity research, Ukraine is especially interesting because it is a large world exporter of agricultural products. In 2020, Ukraine was listed as one of the five biggest agri-food exporters to the EU, with total exports to the EU of about EUR 1 billion [24]. In addition, Ukraine is the world’s largest producer and exporter of sunflower oil, the world’s third-largest exporter of corn, the fourth-largest of barley, and the sixth-largest of soybeans. The main crops grown in Ukraine are cereals, sunflowers, corn, soybeans, and sugar beets. Most cereal fields are winter wheat, with barley grown as a minor crop. Although Ukraine fully ensures its food security, it realizes only a third of its potential [25]. The reason for this is the insufficient use of modern agricultural practices and irrigation. The biggest agro-climatic zone in Ukraine is the steppe zone, which occupies 40 percent of the country’s territory (240,000 km²). In this zone efficient agriculture is hardly possible without irrigation, but the total irrigated area in Ukraine in 2017 was 5000 km² [25].
Ukraine is actively struggling with problems of sustainable agriculture. These problems are compounded by global climate change, which has significantly affected food production in the world and in Ukraine. According to Ukraine’s 2021 Common Country Analysis prepared by the United Nations [26], climate change significantly influences the effectiveness of the Ukrainian agricultural sector. The crop growth periods and crop calendars of most crops are affected by an increase in the annual average temperature of 1.45 °C over 50 years, which is double the global increase [27,28]. Winter crops are particularly sensitive to these changes and have become more likely to sustain losses due to weather conditions. For example, in 2020, droughts in the Odessa region caused a 37 percent loss of winter crops; losses of winter crops were observed on a smaller scale throughout Ukraine. Farmers had to plow over the affected land, and Ukraine suffered great economic and food losses. Although no significant changes in cereal planting areas (50.2 percent in 2000 and 55.4 percent in 2012) were observed by [29], there is a clear trend toward increasing the area of industrial crops. For example, from 2000 to 2014 the total area of sunflowers, soybeans, and rapeseed in Ukraine grew from 8.4 percent to 28.4 percent. In 2016, the area of sunflowers peaked, and it has remained stable for the past several years. The main reason for that stability is the use of sunflower seed as a sort of “insurance policy” against a high loss rate for winter crops [30], helping farmers to salvage income in such conditions. However, this option also leads to an increase in crop rotation violations [31]. Similar patterns of agricultural practices are common in other Eastern European countries, such as Belarus and the Russian Federation. The planting of industrial crops leads to a decrease in the productivity of land and to its degradation. Therefore, the observed trend toward increasing the sown area of sunflowers contradicts the principles of sustainable agriculture in Ukraine.
To solve this problem, the government of Ukraine has introduced crop rotation rules that restrict the planting of industrial crops and oblige farmers to follow certain crop rotation schemes. For example, sunflowers can be planted on the same field only once in seven years, according to the Resolution of the Cabinet of Ministers of Ukraine of 11 February 2010, No. 164, “Approval of standards for optimal balance of crop types in crop rotation in different natural and agricultural regions” [32]. However, the introduction of such norms without a mechanism for detecting violations and controlling crop rotations has not improved the situation. The lack of a state instrument for crop rotation control still leads to frequent crop rotation violations. In Ukraine, industrial crops such as sunflowers are sometimes planted not just twice in a row, but four or five times in a row. Additionally, farmers compensate for the reduction in soil productivity by increasing the use of fertilizers. According to the State Statistics Service of Ukraine, from 2000 to 2014 the share of crop area treated with fertilizers increased from 22 percent to 84 percent, while the use of organic fertilizer remained almost unchanged (2 percent to 3 percent). Furthermore, the continued industrialization of the agricultural sector leads to an increase in field sizes (Figure 1). As a result, more than 91% of fields in Ukraine have an area larger than 5 ha. Even 65% of minor crops, which occupy 6.8% of cropland, are grown in fields of 5 ha or more, while most small fields (29% of total area for other classes) are usually owned by local villagers and located close to village housing.
4. Methodology
The main objective of this study is to investigate how sunflower planting in different crop rotation schemes during previous years affects the biophysical parameters of crops in the current year. This can be done by analyzing the relationship between the areas of various crop rotation schemes and the averaged vegetation indices at the village level. For this purpose, we use a common technique for the analysis of multivariate dependences, regression analysis [35]. The dependent and independent variables should be determined first. The dependent variable should reflect the biophysical characteristics of agricultural land for some territory. The independent variables should reflect the impact of each crop rotation scheme on the value of the dependent variable. The linear regression model describes the relation between the dependent variable $dv$ and the independent variables $x_i$:
$$dv = a_0 + \sum_{i=1}^{n} a_i x_i, \tag{2}$$
where $i$ is the number of the crop rotation scheme, which takes values from 1 to $n$ (the total number of crop rotation schemes), and $a_0$ and $a_i$ are the coefficients of regression. Once the values of $a_i$ are estimated for each independent variable, they can be interpreted as estimates of the impact of $x_i$ on $dv$. If the sign of $a_i$ is negative, then the impact is negative; otherwise, the impact is positive. To evaluate the significance of the coefficient $a_i$ for $x_i$, we use the F-test of overall significance [36]. This test allows us to assess multiple coefficients simultaneously by checking the null hypothesis that the fits of the model with and without a specific coefficient are equal. For the F-test, two models should be trained for each independent variable in turn. If the independent variable is insignificant, the difference between the models’ outputs with and without this variable will be small. As a result, the F-test provides a p-value for each regression coefficient: the probability that we would get the same or a larger effect if $a_i$ were equal to 0 (zero). So, the smaller the p-value for a coefficient, the more likely it is that $a_i$ is significant.
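The comparison of models with and without a given regressor can be sketched as follows. This is only a minimal illustration on synthetic data, not the processing chain used in the study: the function names `ols_rss` and `partial_f_statistics` and the random data are assumptions made for the example. It computes, for each independent variable, the F statistic that compares the residual sum of squares of the full model with that of the model without this variable.

```python
import numpy as np

def ols_rss(X, y):
    """Fit OLS with an intercept; return (coefficients, residual sum of squares)."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    return coef, float(residuals @ residuals)

def partial_f_statistics(X, y):
    """F statistic for dropping each column of X in turn (full vs. reduced model)."""
    n_samples, n_features = X.shape
    _, rss_full = ols_rss(X, y)
    dof = n_samples - n_features - 1  # residual degrees of freedom of the full model
    stats = []
    for i in range(n_features):
        reduced = np.delete(X, i, axis=1)
        _, rss_reduced = ols_rss(reduced, y)
        stats.append((rss_reduced - rss_full) / (rss_full / dof))
    return np.array(stats)

# Synthetic example (assumed data): the third feature has no effect on y.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = 0.5 - 2.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(0.0, 0.1, 200)

coef, _ = ols_rss(X, y)          # coef[0] is the intercept
f_stats = partial_f_statistics(X, y)
```

A large F statistic means that removing the variable noticeably worsens the fit. The corresponding p-value is the upper tail of the F distribution with (1, dof) degrees of freedom, which could be obtained, for example, with `scipy.stats.f.sf`.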
First, the space of features $S$ used for the regression analysis and the dependent variables should be clarified. A separate regression model was created for each vegetation index: NDVI, EVI, LAI, FAPAR, and LSWI. The cumulative vegetation index $VI$, calculated by formula (1) and averaged within the boundaries of each village council (index $ID$), is proposed as the dependent variable. The independent variables (or features) should reflect the impact of each considered crop rotation on the vegetation index $VI$. To this end, the ratio of the area of each crop rotation to the area of the fixed crop in the reference (target) year, aggregated at the village council level, was used as a feature. In this way, the contribution of different crop rotations to the value of the vegetation index was considered. Thus, $VI_{y,j}^{ID}$ is the cumulative value of the vegetation index VI for year $y$ and crop type $j$, averaged for village $ID$. The respective vector of features $S_{y,j}^{ID}$ contains the proportions of the area of each considered crop rotation to the total area of fields with crop type $j$ in year $y$ in village $ID$. To calculate it, the reference year $y$ and the crop type $j$ for year $y$ were fixed. Then the $n$ available crop rotations, which are sequences of crop types for $m$ years starting with crop type $j$ in the reference year, were constructed. As a result, the vector $S_{y,j}^{ID}$ consists of the following elements, calculated for each crop rotation $i$:
$$s_i = \frac{A_i^{ID}}{A_{y,j}^{ID}}, \quad i = 1, \dots, n, \tag{3}$$
where $A_i^{ID}$ is the area of the crop rotation with number $i$ in village $ID$ and $A_{y,j}^{ID}$ is the area of crop type $j$ in village $ID$ and year $y$. If the crop rotations $i$ include all possible crop rotations for five years, the number of features, most of which would be present in the training set in only a few examples, will be enormous. This could lead to overfitting of the model. In particular, analysis of the available crop rotation combinations shows that 625 possible variations of crop rotations cover 80 percent of the country’s area. In Ukraine, there are only 7834 villages, and none of them includes all possible crop rotations. In this situation, any regression model would suffer from overfitting, and the results would not be adequate. To address the issue, we propose to consider two different feature representation schemes.
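The construction of the feature vector for one village can be illustrated with a short sketch. The field records, the village identifiers, and the function name `rotation_features` are hypothetical and serve only to show how the area shares are formed; real inputs would come from the crop classification maps.

```python
from collections import defaultdict

def rotation_features(fields, crop, rotations):
    """Share of each rotation in the area of `crop` in the reference year, per village.

    `fields` is an iterable of (village_id, rotation, area_ha), where `rotation`
    is a tuple of crop types ordered from the reference year backwards.
    """
    rotation_area = defaultdict(float)  # (village, rotation) -> area
    crop_area = defaultdict(float)      # village -> total area of `crop`
    for village_id, rotation, area in fields:
        if rotation[0] != crop:
            continue  # keep only fields with the fixed crop in the reference year
        rotation_area[(village_id, rotation)] += area
        crop_area[village_id] += area
    return {
        village_id: [rotation_area[(village_id, r)] / total for r in rotations]
        for village_id, total in crop_area.items()
    }

# Hypothetical field records: (village, (reference year, previous year), area in ha).
fields = [
    ("V1", ("sf", "sf"), 30.0),
    ("V1", ("sf", "nsf"), 70.0),
    ("V1", ("nsf", "sf"), 50.0),  # ignored: no sunflower in the reference year
    ("V2", ("sf", "nsf"), 10.0),
]
rotations = [("sf", "sf"), ("sf", "nsf")]
features = rotation_features(fields, "sf", rotations)
```

In this toy case village V1 has 30 percent of its sunflower area under the sf-sf rotation and 70 percent under sf-nsf, so the shares within each village sum to one whenever the list of rotations is exhaustive.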
The first is a binary representation of crop rotation over five years, in which the crop is $j$ for the target year (2020) and each pixel in every previous year can take one of two values: $j$ (or 1) or $not\_j$ (or 0). This representation is named model A1. In this research, $j$ corresponds to sunflowers (sf) and $not\_j$ to not sunflowers (nsf). So, there are 16 possible crop rotations (Table 3). Note that this representation is not exhaustive, because it does not take into account all possible combinations of crop rotations. For instance, the crop rotation sf-nsf-nsf-nsf-nsf is not a violation of sunflower rotation rules, but the rotation of other crops in the years without sunflowers is not considered, so crop rotation violations for other crop types could still occur in those years. Thus, this model reflects the impact of sunflower monocropping, but it does not measure the impact of different crop rotations.
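The 16 binary crop rotations can be enumerated directly. The following sketch assumes the coding described above (1 for sunflowers, 0 for not sunflowers, with the target year first); the helper names are illustrative only.

```python
from itertools import product

def binary_rotations(n_years=5):
    """All schemes with sunflower ('1') in the target year and a binary
    sunflower / not-sunflower flag for each of the n_years - 1 preceding years."""
    return ["1" + "".join(bits) for bits in product("01", repeat=n_years - 1)]

def to_labels(code):
    """Render a binary code such as '10110' as 'sf-nsf-sf-sf-nsf'."""
    return "-".join("sf" if bit == "1" else "nsf" for bit in code)

schemes = binary_rotations()
labels = [to_labels(code) for code in schemes]
```

With five years the list contains 2^4 = 16 schemes, from `10000` (sf-nsf-nsf-nsf-nsf) to `11111` (sunflowers in all five years).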
To account for different crop rotations while avoiding overfitting, we consider a smaller number of features. To reduce the number of crop rotation combinations and at the same time increase the number of training points, the crop rotation analysis is restricted to major crops and to three years, with several different reference years. This representation is named model B. Model B uses 25 crop rotations (25 features) for each value of the cumulative VI for sunflowers per village council. As a result, it is possible to evaluate 25 possible crop rotations for each reference year and to select the best and the worst crop rotations. Table 4 illustrates the crop rotation schemes investigated in model B. This model combines the data for the years 2020-2019-2018, 2019-2018-2017, and 2018-2017-2016 in one training data set and uses it to fit the regression function. This approach, on the one hand, increases the number of training points and, on the other hand, reduces the effect of classification errors and of extreme weather conditions in a specific year.
4.1. Regression Analysis of Crop Rotation
To analyze the impact of sunflower crop rotation, we propose a model that represents the cumulative vegetation index for sunflowers, averaged for village ID, as a linear regression function (2) of the independent values (3).
The calculation of the regression coefficients $a_i$ could be treated as a traditional regression problem and solved using an ordinary least squares approach. However, the large number of independent variables in the model leads to a collinearity problem and to potential overfitting of the model. To avoid this, the ridge regression technique was used [37]. The technique uses L2 regularization, which causes the regression coefficients to be more balanced and representative in the regression. Thus, the regression model used in our study is this:
$$\min_{a_0, \dots, a_n} \sum_{ID} \left( dv^{ID} - a_0 - \sum_{i=1}^{n} a_i x_i^{ID} \right)^2 + \alpha \sum_{i=1}^{n} a_i^2, \tag{4}$$
where $dv^{ID}$ and $x_i^{ID}$ are the dependent and independent variables for village council $ID$, $a_0$ and $a_i$ are the regression coefficients, and $\alpha$ is the L2 regularization penalty. To determine the best value of $\alpha$, it is possible to use a cross-validation technique that fits the regression function with different values of $\alpha$ and selects the value of $\alpha$ that gives the best coefficient of determination ($R^2$ score). The $a_i$ coefficients obtained through the model fitting determine the influence of crop rotation $i$ on the biophysical characteristics of crops. The largest values of $a_i$ correspond to the best crop rotations, with a positive impact on VI, while the lowest values of $a_i$ correspond to the worst crop rotations. As a result, it is possible to combine all crop rotations into three groups: those (a) with a negative effect on the vegetation, (b) with a positive effect on the vegetation, and (c) with a low effect on the vegetation. The use of different vegetation indices allows us to evaluate this impact for different biophysical parameters of vegetation, such as NDVI, EVI, LAI, FAPAR, and LSWI.
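The selection of the regularization penalty by cross-validation can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the implementation used in the study: the closed-form solver `ridge_fit`, the five-fold split, the grid of penalty values, and the synthetic area shares are all chosen for the example. The features are drawn so that each row sums to one, which mimics the collinearity of area proportions.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form ridge fit; the intercept is not penalized."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return y_mean - x_mean @ coef, coef

def r2_score(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def select_alpha(X, y, alphas, n_folds=5, seed=0):
    """Return the penalty with the best mean out-of-fold R^2 and all mean scores."""
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, n_folds)
    scores = []
    for alpha in alphas:
        fold_scores = []
        for k in range(n_folds):
            test = folds[k]
            train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
            intercept, coef = ridge_fit(X[train], y[train], alpha)
            fold_scores.append(r2_score(y[test], intercept + X[test] @ coef))
        scores.append(float(np.mean(fold_scores)))
    return alphas[int(np.argmax(scores))], scores

# Synthetic area shares (rows sum to 1, hence collinear) and an assumed true effect.
rng = np.random.default_rng(1)
X = rng.dirichlet(np.ones(6), size=300)
true_effect = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 2.0])
y = X @ true_effect + rng.normal(0.0, 0.05, 300)

alphas = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
best_alpha, scores = select_alpha(X, y, alphas)
intercept, coef = ridge_fit(X, y, best_alpha)
```

Because the shares sum to one, ordinary least squares has no unique solution here, while any positive penalty makes the system solvable. The fitted coefficients can then be ranked to find the rotations with the most negative and the most positive estimated effect.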
The reliability of the results can be assessed with the F-test of overall significance. In this case, three levels of the p-value for the regression coefficients were defined. The first is * p > 0.05, in which the variable is insignificant and the regression coefficient does not express the impact of the independent variable on the dependent variable. The second is ** 0.01 < p ≤ 0.05, in which the regression coefficient expresses the impact of the independent variable on the dependent variable; however, the significance is low. The third is *** p ≤ 0.01, in which the variable is significant and the regression coefficient expresses the impact of the independent variable on the dependent variable.
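The three significance levels can be written as a small helper. The function name is illustrative, and the sketch assumes that the middle level covers p-values above 0.01 and up to 0.05.

```python
def significance_level(p_value):
    """Map a p-value to the three star levels used for the regression coefficients."""
    if p_value > 0.05:
        return "*"    # insignificant
    if p_value > 0.01:
        return "**"   # low significance
    return "***"      # significant

levels = [significance_level(p) for p in (0.2, 0.03, 0.001)]
```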
4.2. Model A1 with Binary Crop Rotation Features and Analysis for Five Years
The first model, model A1, evaluates the impact of monocropping on the different vegetation indices. In this case, each crop rotation is described by a sequence of $n$ binary values $b_y$ (we consider $n - 1$ preceding years), where, for each year $y$, $b_y = 1$ if sunflowers had been grown in the territory in year $y$ and $b_y = 0$ otherwise. Model A1 describes the dependence of the cumulative VI at the village level on the 16 possible monocropping combinations during the previous four years. The main drawback of this approach is that it ignores the influence of other crop rotations on the vegetation indices.
4.3. Model A2 and A3 with Derived Regressors
Binary representation of monocropping schemes allows for the design of a regression model (model A2) that predicts the crop rotation impact for the $i$-th monocropping scheme ($a_i$) based on the number of sunflower plantings ($N_i$) over five years and the number of years without sunflowers before the reference year ($P_i$):
$$a_i = c_0 + c_1 N_i + c_2 P_i, \tag{5}$$
where $c_0$, $c_1$, and $c_2$ are regression coefficients. Model A2 uses the coefficients $a_i$ obtained from model A1 as the dependent variable. So, while model A1 evaluates a crop rotation scheme represented as a sequence of sunflower and not sunflower plantings, model A2 is fitted to reproduce the result of model A1 using only statistical characteristics of sunflower appearance in the crop rotation scheme. The main advantage of this model is the small number of regressors, which allows us to avoid overfitting and to provide a more accurate regression analysis. In addition, based on this model, it is possible to evaluate crop rotation schemes that cannot be estimated from the available time series of crop maps. Having a statistically significant model (5), we can mathematically estimate the optimal interval between subsequent plantings of sunflowers and can provide predictions (extrapolation) for a longer time period. The main problem with this model is that $N_i$ and $P_i$ are not unique characteristics of a concrete crop rotation. For example, model A2 considers a crop rotation with $N_i$ = 3 and $P_i$ = 1 as a crop rotation with sunflower planting every two years, so two different crop rotations such as sf-nsf-sf-nsf-sf (10101) and sf-nsf-sf-sf-nsf (10110) can have the same $N_i$ and $P_i$ values. To be sure of the reliability of the result, another model, model A3, was fitted. This model represents the relationship between $a_i$ and the time interval between sunflower plantings, in the case when all sunflower plantings in the crop rotation scheme have equal intervals. So, the A3 model can be considered a simplification of the A2 model that predicts $a_i$ using only one variable, the interval between two sunflower plantings in the crop rotation scheme. The fitting strategy is the same as for model A2: the dependent values $a_i$ are obtained from model A1, and the independent values are the planting intervals taken from the crop rotation schemes. In this set of points, there are only five schemes that can be used for fitting such a model.
We assumed that $a_i$ for crop rotations with 0-, 1-, 2-, 3-, and 4-year planting intervals over 5 years will be the same as over 11 years. So, the crop rotation sf-nsf-nsf-nsf-nsf (10000) will be considered as the planting of sunflower once per 5 years in 11 years of observation. This assumption can be made because the impact of crop rotations weakens over the years: plantings made more than 5 years before have a much lower impact than recent plantings do. However, if the p-value of $a_i$ for some of the useful crop rotation schemes is too large, these crop rotation schemes cannot be used for fitting the A3 model.
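The derived regressors can be computed from the binary code of a scheme, as in the sketch below. It assumes one reading of the text: the first regressor is the count of sunflower years in the five-year code, and the second is the run of years without sunflowers immediately preceding the reference year. The function names are illustrative.

```python
def a2_regressors(code):
    """Return (number of sunflower plantings, years without sunflowers
    immediately before the reference year) for a scheme such as '10101'."""
    n_sunflower = code.count("1")
    gap = 0
    for bit in code[1:]:  # code[0] is the reference year
        if bit == "1":
            break
        gap += 1
    return n_sunflower, gap

def equal_interval_schemes(n_years=5):
    """Map interval -> scheme in which sunflowers recur every (interval + 1) years."""
    return {
        interval: "".join("1" if k % (interval + 1) == 0 else "0" for k in range(n_years))
        for interval in range(n_years)
    }

same_regressors = a2_regressors("10101") == a2_regressors("10110")
a3_schemes = equal_interval_schemes()
```

The two schemes 10101 and 10110 both map to three plantings and a one-year gap, which reproduces the ambiguity of model A2 noted above. The second helper lists the five equal-interval schemes available for fitting model A3: 11111, 10101, 10010, 10001, and 10000.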
4.4. Model B for Three-Year Crop Rotation Analysis out of Five Years
The second regression model, model B, takes into account all possible crop rotations during three years for the VIs of sunflower. Here, it is possible to increase the number of training samples by considering different three-year intervals of observations. So, as the output (dependent variable) of the model we consider the cumulative VIs for sunflower for the years 2020, 2019, and 2018: $VI_{2020,sf}^{ID}$, $VI_{2019,sf}^{ID}$, and $VI_{2018,sf}^{ID}$. Model B includes 25 different regressors; each of them corresponds to the sequence of two crop types grown in the preceding two years out of the five major crop types (cereals, sunflowers, maize, soybeans, and other crops). In this case, an example of a crop rotation violation is sunflower-sunflower-sunflower, and an example without a violation is sunflower-soybeans-other crops.
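The 25 regressors of model B and the pooled three-year windows can be enumerated as follows. The crop labels and function names are illustrative choices for this sketch.

```python
from itertools import product

CROPS = ["cereals", "sunflowers", "maize", "soybeans", "other"]

def model_b_rotations(target="sunflowers"):
    """Three-year rotations: target crop in the reference year, then the crops
    of the two preceding years (5 x 5 = 25 combinations)."""
    return [(target, prev1, prev2) for prev1, prev2 in product(CROPS, repeat=2)]

def pooled_windows(reference_years=(2020, 2019, 2018)):
    """Three-year windows whose samples are merged into one training set."""
    return [(year, year - 1, year - 2) for year in reference_years]

rotations = model_b_rotations()
windows = pooled_windows()
```

Each village council then contributes up to three training samples, one per window, each described by the same 25 area shares.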
4.5. Potential of Sentinel and Landsat Data Usage
MODIS data were chosen because of the need for high temporal resolution data that make it possible to create a uniform collection of vegetation index time series for the accurate estimation of accumulated vegetation indices for each year of interest. In our experiment, the moderate spatial resolution of 250 and 500 m is more than enough for crop monitoring at the village council level. However, in future studies it is possible to improve the accuracy of the experiment by using higher spatial resolution data. The growing number of available data sources, as well as the development of new methods for harmonizing satellite data collections, can provide opportunities to improve the described experiment in the coming years.
There are two more satellite missions that can be used for such an experiment in the future. The first is the Landsat mission, which was launched in 1972. The two most recent satellites of this mission have similar characteristics: Landsat-8 was launched in February 2013 and Landsat-9 in September 2021. These satellites carry two instruments: an Operational Land Imager (OLI), which provides 8-band images with 30 m spatial resolution and 1 panchromatic band with 15 m spatial resolution, and a Thermal Infrared Sensor (TIRS), which provides 2 bands in the thermal infrared spectrum with 100 m spatial resolution. The temporal resolution of each satellite is 16 days. The second is the Sentinel mission. The Sentinel-3A satellite was launched in February 2016. Images obtained from the Ocean and Land Colour Instrument (OLCI) installed on the Sentinel-3 satellites have 21 multispectral bands with 300 m spatial resolution and 1-day temporal resolution, and can be used as an alternative to MODIS data. The Sentinel-2 mission was launched in June 2015 and since March 2017 has included two satellites with multispectral instruments that provide 12-band images with a spatial resolution from 10 m to 60 m and a temporal resolution of 5 days. Sentinel-2 optical data are an essential source of information in agricultural monitoring applications, making it possible to estimate the vegetation indices and biophysical characteristics of crops used in crop state analysis, such as NDVI, EVI, LAI [38], and others.
Many applications in remote sensing require dense time series of measurements with high temporal resolution. Thus, harmonization of multi-satellite high spatial resolution data collections is the cornerstone for the improvement of land monitoring as well as agricultural monitoring systems. The study by Claverie et al. [39] presents a workflow that can be used to produce a harmonized collection of Landsat-8 and Sentinel-2 data. The harmonization process requires atmospheric correction, geometric resampling, geographic registration, BRDF normalization, and band pass adjustment. A good example of the use of this harmonization method is the research by Skakun et al. [40] on yield forecasting in Ukraine. This method has a high potential for the use of 30 m optical data for crop rotation analysis in future research, despite the need for substantial computational resources for data processing. However, the combined and harmonized archive of Landsat-8 and Sentinel-2 data still has a lower temporal resolution than MODIS or Sentinel-3 once cloud conditions are taken into account. Kirovohradska oblast, located in central Ukraine, has on average 6 cloud-free observations for the winter crop vegetation period (from March until the end of June) in 2016, 8 in 2017, and 11 in 2018 [41]. The absence of uniformity in data coverage at the regional or country level complicates the use of the historical harmonized collection of Landsat-8 and Sentinel-2 data in a multi-year crop rotation analysis. Such non-uniformity for these years can be explained not only by atmospheric conditions, but also by the fact that before the launch of the Sentinel-2B satellite in March 2017, the temporal resolution of Sentinel-2 data was 10 days. This is why, in our experiment on the historical data from 2016 to 2020, we use MODIS data. However, in future experiments covering the years from 2018 with 3 satellites (Landsat-8, Sentinel-2A, and Sentinel-2B) or from 2022 with 4 satellites (adding Landsat-9), the temporal resolution of harmonized high spatial resolution image collections can improve the results obtained by the methods described in the Methodology section.
As an additional data source in the crop rotation analysis, we can use the Sentinel-1 mission, which was launched in April 2014. The spatial resolution of Sentinel-1 is 20 m for the Ground Range Detected (GRD) data used for land monitoring applications. The use of Synthetic Aperture Radar (SAR) data usually requires complex data processing workflows that include calibration, geometric correction, terrain correction, and resampling to 10 m spatial resolution. Because of the presence of noise that seriously affects data quality, an additional filtering step with a refined Lee algorithm is required [42] to obtain high-quality data. As a result, it is possible to obtain SAR data with VV and VH polarization, 10 m spatial resolution, and 6-day temporal resolution that can be used for crop phenology estimation [42] and crop classification [35]. In addition, the study by Filgueiras et al. [43] showed a strong dependence between Sentinel-1 SAR indices and NDVI. This study shows that it is possible to model synthetic Sentinel-2 NDVI data using regression functions based on the VV and VH characteristics of Sentinel-1, thereby extending the time series of Sentinel-2 vegetation indices. Because Sentinel-1 is an active sensor that is not affected by clouds, these synthetic data can be used for missing value recovery or as a substitute for Sentinel-2 images that are unavailable due to atmospheric conditions.