Cropland Mapping Using Sentinel-1 Data in the Southern Part of the Russian Far East

Crop identification is one of the most important tasks in digital farming. The use of remote sensing data makes it possible to clarify the boundaries of fields and identify fallow land. This study considered the possibility of using the seasonal variation in the Dual-polarization Radar Vegetation Index (DpRVI), which was calculated based on data acquired by the Sentinel-1B satellite between May and October 2021, as the main characteristic. Radar images of the Khabarovskiy District of the Khabarovsk Territory, as well as those of the Arkharinskiy, Ivanovskiy, and Oktyabrskiy districts in the Amur Region (Russian Far East), were obtained and processed. The identifiable classes were soybean and oat crops, as well as fallow land. Classification was carried out using the Support Vector Machines, Quadratic Discriminant Analysis (QDA), and Random Forest (RF) algorithms. The training (848 ha) and test (364 ha) samples were located in Khabarovskiy District. The best overall accuracy on the test set (82.0%) was achieved using RF. Classification accuracy at the field level was 79%. When using the QDA classifier on cropland in the Amur Region (2324 ha), the overall classification accuracy was 83.1% (F1 was 0.86 for soybean, 0.84 for fallow, and 0.79 for oat). Application of the Radar Vegetation Index (RVI) and VV/VH ratio enabled an overall classification accuracy in the Amur region of 74.9% and 74.6%, respectively. Thus, using DpRVI allowed us to achieve greater performance compared to other SAR data, and it can be used to identify crops in the south of the Far East and serve as the basis for the automatic classification of cropland.


Introduction
The transition to digital agriculture can be achieved by addressing the challenges associated with the automated recognition of crops growing in a given area [1].In recent years, research at both the national and global scale has focused on creating regional cropland masks of different regions, as well as masks of individual crops [2,3].Machine learning (ML) methods are actively used to solve these problems.Typically, input data for these ML models are seasonal time series of various indices (especially optical) that are obtained after processing images.This is because time series usually provide higher accuracy in mapping than an index calculated from a single image [4][5][6].
Typically, to identify agricultural crops, time series of optical vegetation indices (VI) and traditional ML classifiers, such as decision trees, Support Vector Machines (SVM), and Random Forest (RF) algorithms are used.An example of this is a study conducted in the Tashkent province, Uzbekistan, in which researchers classified several crops (cotton, wheat, rice, other crops) using a series of monthly Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Normalized Difference Water Index (NDWI1 and NDWI2) composites, based on Landsat-8 and Sentinel-2 imagery [7].The highest accuracy was 90% (Kappa = 0.88) when using EVI composites derived from Landsat-8 data.Hu Sensors 2023, 23, 7902 2 of 22 et al. performed retrospective restoration of crop rotation using the RF algorithm [8].They trained the classifier on percentile monthly Sentinel-2 composites in 2020 to map cropland in 2019.The possibility of successful classification 2 months before the end of the growing season was shown.The overall accuracy of retrospective forecasting was 92%.In this study, monthly bands and VI time series were used.The use of a large number of correlating parameters (in the model [8], 90 parameters were included) in the classification model requires a large number of calculations and may be excessive.
A serious disadvantage of optical data used in the discussed works is the strong dependence of satellite data quality on weather conditions.Clouds, shadows from clouds, and the presence of aerosols in the atmosphere may significantly reduce the potential of optical sensors.A large number of cloudy images leads to substantial gaps in VI time series built using optical imagery.
Some researchers have used the approximation of the seasonal course of VI to bridge gaps in remote sensing data.Polynomials, Gauss function, and the double logistic function [9][10][11] are most often used as approximate functions.In Uruguay, scientists developed a soybean growth model and predicted the NDVI maximum using polynomials and the double logistic function for fields with an area of more than 250 hectares [12].In the Samara region of Russia, a group of researchers [13] approximated the seasonal course of NDVI using a piecewise function, Fourier series, and cubic splines.
SAR (Synthetic Aperture Radar) data, independent of lighting and weather conditions, is a reliable alternative for long-term monitoring of crop rotation.Sentinel-1A/B satellites sensors allow the observation of a significant part of the Earth and receive double polarization data every 12 days (every 6 days when using both satellites).This is sufficient to build continuous time series with a sufficiently large number of observations for the season.Therefore, radar data obtained from Sentinel-1 has recently been actively used for crop mapping and the identification of abandoned cropland.For instance, Qiu et al. [14] used time series of VV polarization to detect woody abandoned cropland in China.However, to determine grass fallows, single polarization was not enough, and the authors additionally used optical data.In the majority of studies, scientists have used either the VH or VV polarizations, or even their ratio (VH/VV).For example, Song et al. [15] determined the proportion of pixels of soybean and corn (three classes per crop: small, middle, and large) in experimental polygons throughout the United States.Sentinel-1 VV, VH, and VV/VH time series for one agricultural season were used.The use of three predictors in a decision tree classifier achieved an accuracy greater than 94% for both crops.Arias et al. [16] attempted to identify 14 crops based on Sentinel-1 radar data (three sets of polarizations) at the level of the province of Navarra in Spain, as well as that of individual municipalities.The accuracy of the classification using Sentinel-1 did not exceed 70%.Attempts have also been made to merge radar and optical data for crop mapping [17,18], where Landsat-8 (or Sentinel-2) and Sentinel-1 imagery were used as initial data and the RF algorithm was used as a classification method.In both of these studies, significant sparsity in the data was observed, and linear interpolation greatly decreased the accuracy of the classification.
Time series for several years make it possible to observe crop rotation.For example, a group of scientists [19] used the Dynamic Time Warping method and linear interpolation to equalize data for two years when classifying nine crops in China based on the combination of polarizations.
Despite the frequent and successful use of individual Sentinel-1 polarizations for crop mapping, the use of radar VIs (by analogy with well-known optical VIs) appears to be more promising because they are constructed for monitoring vegetation.The Radar Vegetation Index (RVI) is the most common radar index [20].Evidence suggests that this index is highly sensitive to the dynamics of plant growth; therefore, it can likely be used to assess crop phenology and the characteristics of vegetation and perform classification [17,21].Bickensdorfer et al. [22] classified croplands in Germany using RVI, VV, VH and the ratio of VV/VH monthly composites.Classification accuracy while using four time series (12 values) was 65%.To increase the accuracy to 80%, the authors had to additionally The Dual-polarization Radar Vegetation Index (DpRVI) is based on the transformations of complex polarimetric radar Level-1 Single Look Complex (SLC) data [24].Processing is carried out by applying one of the polarimetric decomposition methods [25][26][27].At the same time, a comprehensive polarimetric response of the signal from the object to the components, which characterizes the contribution of a particular radar signal (single, double, or volumetric), is obtained.Information about scattering is characterized by the degree of polarization and the measure of the dominant scattering mechanism.Based on these calculated indicators, as shown by [24], DpRVI becomes more sensitive to crop growth and is used as a relatively simple and physically interpretable vegetation descriptor.
Croplands in different regions have significant differences in their spectral characteristics, which leads to inaccurate forecasts even if the classifier is well-trained [28].In particular, a number of researchers, including those from Russia, have faced the problem of incorrect results from the model when it was transferred to regions where crops had a different phenology and the temporal spectral profile of the crops was changing [29,30].
In recent years, the challenges associated with creating a cropland mask have been addressed at the Space Research Institute in Russia.Land classification is carried out using NDVI composites on cropland (spring and winter crops separately), forest, grassy vegetation, swamps, and tundra.Miklashevich et al. [30] achieved high accuracy in land mapping for the European regions of Russia.The disadvantage of large-scale maps is the significantly uneven quality of the classification between regions (in particular, Miklashevich et al. [30] noted an increase in error in the Russian Far East).It can also be noted that although most of these maps are aimed at determining the land types and the creation of a cropland mask, they do not provide any information about crop rotation on the allocated lands.Moreover, since different regions are characterized by different sets of agricultural crops and the variety of climatic conditions leads to variation in phenology, the task of separating crops based on VI can be extremely difficult even if the classifier was well performed in one region.Therefore, the most productive approach is the development of accurate local maps indicating the growing crops within the region or in a few regions characterized by similar climatic conditions and agricultural practices.
The main crop in the Russian Far East is soybean.Soybean occupies 60-90% of the cropland area in various municipalities [31].It should be noted that the increase in soybean acreage is not as much due to the commissioning of new lands to crop rotation but due to repeated sowing.This is unacceptable in terms of the recommended crop rotations, where its specific share should not exceed 50%.The placement of soybean in crop rotation patterns contributes to an increase in yield by an average of 36.4% compared to that in the case of permanent cultivation.However, repeated sowing of this crop in crop rotations reduces its yield by 27.2% [32].To ensure the optimal use of the cropland, soybean is rotated with grain crops, including oat [33,34].
However, there are a lot of abandoned croplands in the Russian Far East.Currently, the total fallow area in Russia is up to 40 million ha.According to the data obtained from the agricultural census conducted in 2016, nearly 17.7 million ha of fallow land are owned by agricultural enterprises yet remain unclaimed by them.According to the Ministry of Agriculture of the Russian Federation, this area was approximately 15.3 million ha in 2019 [35].Such lands include both bushy and forested land, as well as swampy fallow land.Defining fallow lands and introducing them into the crop rotation system are among the most important tasks for the agricultural sector in the country.
Thus, the main objective of this study is to identify the main crop of the region (soybean) used in crop rotation, as well as oat and fallow lands, by developing cropland Further, it aims to assess the possibility of using best classifiers for other municipalities in the Russian Far East, and compare the differences in the accuracy of determining these classes across these regions.

Study Areas
The municipal districts of the southern part of the Russian Far East in the Amur River basin were chosen as the study areas.Study area 1 (SA1) is the cropland of the Khabarovskiy District, situated on the right bank of the Amur River in the vicinity of the city of Khabarovsk.It is located between 48.31 • N and 48.64 • N and between 134.81 • E and 135.57• E of the Greenwich meridian (Figure 1).It is characterized by favorable climate and soil conditions, which makes it possible to grow grain and leguminous crops.
Defining fallow lands and introducing them into the crop rotation system are among the most important tasks for the agricultural sector in the country.
Thus, the main objective of this study is to identify the main crop of the region (soybean) used in crop rotation, as well as oat and fallow lands, by developing cropland classifiers for the Khabarovsk District based on different DpRVI time series and ML methods.Further, it aims to assess the possibility of using best classifiers for other municipalities in the Russian Far East, and compare the differences in the accuracy of determining these classes across these regions.

Study Areas
The municipal districts of the southern part of the Russian Far East in the Amur River basin were chosen as the study areas.Study area 1 (SA1) is the cropland of the Khabarovskiy District, situated on the right bank of the Amur River in the vicinity of the city of Khabarovsk.It is located between 48.31° N and 48.64° N and between 134.81° E and 135.57°E of the Greenwich meridian (Figure 1).It is characterized by favorable climate and soil conditions, which makes it possible to grow grain and leguminous crops.Three districts (Arkharinskiy, Ivanovskiy, and Oktyabrskiy) are located on the territory of the Zeya-Bureinskaya Plain (Amur Region).About 75% of the territory of the Zeya-Bureya Plain of the Amur Region has been converted into agricultural landscapes.The total area of cropland in the territory of the Zeya-Bureya plain is 1.32 million ha.Cropland Three districts (Arkharinskiy, Ivanovskiy, and Oktyabrskiy) are located on the territory of the Zeya-Bureinskaya Plain (Amur Region).About 75% of the territory of the Zeya-Bureya Plain of the Amur Region has been converted into agricultural landscapes.The total area of cropland in the territory of the Zeya-Bureya plain is 1.32 million ha.Cropland is the most important type of agricultural land and the main wealth of the region.Study area 2 (SA2) is located between 49.28 • N and 50.61 • N, 127.76 • E and 130.05 • E of the Greenwich meridian (Figure 2).Relatively high temperatures and the length of the vegetation season make it possible to grow good and stable yields of cereals, industrial, and other crops [36].
is the most important type of agricultural land and the main wealth of the region.Study area 2 (SA2) is located between 49.28° N and 50.61°N, 127.76°E and 130.05° E of the Greenwich meridian (Figure 2).Relatively high temperatures and the length of the vegetation season make it possible to grow good and stable yields of cereals, industrial, and other crops [36].The study areas are characterized by a monsoonal climate with cold snowy winters (the average January temperature varies from −25 °C in study area [SA] 2 to −20 °C in SA1) and humid warm summers (the average July temperature in the study areas exceeds 21 °C).The average annual rainfall is at the level of 600-700 mm per year.

DpRVI Calculation
To calculate DpRVI, Sentinel-1 Level-1 SLC images were acquired for SA1 from one track (one scene with relative orbit number 90), but for SA2 from two tracks (two scenes with relative orbit number 105 and one scene with relative orbit number 134), as shown in Table 1.The data were obtained from the Alaska Satellite Facility Distributed Active The main agricultural crop in the southern part of the Russian Far East is soybean; in 2021, soybean occupied 55% (9381 ha) in the Khabarovskiy District, 79% (28,596 hectares) in the Arkharinskiy District, 76% (82,425 ha) in the Ivanovskiy District, and 79% in the Oktyabrsky district (78,899 ha) of the total area of arable land.The land occupied by oats is 7% (1232 ha), 5% (1851 ha), less than 1% (865 ha), and just over 1% (1380 ha) in the Khabarovskiy District, Arkharinsky District, Ivanovsky District, and Oktyabrskiy District, respectively.

Data 2.2.1. DpRVI Calculation
To calculate DpRVI, Sentinel-1 Level-1 SLC images were acquired for SA1 from one track (one scene with relative orbit number 90), but for SA2 from two tracks (two scenes with relative orbit number 105 and one scene with relative orbit number 134), as shown in Table 1.The data were obtained from the Alaska Satellite Facility Distributed Active Archive Center (https://search.asf.alaska.edu/(accessed on 14 January 2023), which contains modified Copernicus Sentinel data from 2015, processed by the European Space Agency.The Geopandas package was used for raster data handling.Two scenes with track number 105 between 2 May and 29 October 2021 were selected.The scene with track number 134 was acquired between 4 May and 31 October 2021, and the scene with track number 90 was acquired between 1 May and 28 October 2021.These data were used in the processing Terrain Observation with Progressive Scans SAR (TOPSAR) mode Interferometric Wide (IW) swath products.The swath length is 250 km and the spatial resolution is 5 × 20 m in this configuration.Swath in the IW mode is divided into three sub-swaths (IW1, IW2, and IW3).Each sub-swath consisted of 9 bursts in the azimuth direction.SA1 is within one sub-swath of scene 90.SA2 is located in three zones, marked with red rectangles with a dash-dotted line in Figure 3. Fields that are in the western part are included in the zone of intersection of two scenes from the Sentinel-1B satellite, while in the eastern part they are concentrated in one scene.Thus, pre-processing of the radar image time series data was carried out using two different workflows, with standard correction steps as follows [37]   All workflows were processed using the Graph Processing Framework (GPF) on the ESA Sentinel Application Platform (SNAP) v.8.0 (http://step.esa.int/main/(accessed on 14 January 2023)).This allows the user to create graphs and set up chains for batch processing, as shown in Figure 4. Pairs of SAR images for the same date were fed as input.All workflows were processed using the Graph Processing Framework (GPF) on the ESA Sentinel Application Platform (SNAP) v.8.0 (http://step.esa.int/main/(accessed on 14 January 2023)).This allows the user to create graphs and set up chains for batch processing, as shown in Figure 4. Pairs of SAR images for the same date were fed as input.All workflows were processed using the Graph Processing Framework (GPF) on the ESA Sentinel Application Platform (SNAP) v.8.0 (http://step.esa.int/main/(accessed on 14 January 2023)).This allows the user to create graphs and set up chains for batch processing, as shown in Figure 4. Pairs of SAR images for the same date were fed as input.In Module 1, the operator S1 TOPS Split was used to select sub-swaths and bursts depending on the location of the study area.Slice Assembly combined parts of different scenes into a single product.Further, the precise orbit state vector files, which contained accurate information about the satellite position and speed, were downloaded and applied.Then, radiometric calibration was performed, during which it was important to keep the data in a complex-valued format because complex values were needed to calculate the covariance matrix for dual-polarization C2 in the subsequent steps.Next, in Module 2, the S-1 TOPS Deburst operator was applied to produce a single SLC image.These image subsets were multi-looked with factors of 3 and 1 in the range and azimuth directions, respectively, to generate a square pixel.The spatial resolution of the output was 10 m.Then, for each date for the entire data stack, a dual-polarization 2 × 2 covariance matrix C2 was formed, as shown in Equation (1): where * indicates complex conjugation and indicates the spatial average over a moving window.
All elements of the matrix C 2 were additionally subjected to the noise mitigation procedure using the refined Lee adaptive filter [38].The elements of the matrix were complex values containing all of the information about the polarimetric scattering properties.The polarimetric decomposition technique was then used to obtain the parameters of the state of polarization of an electromagnetic wave, which was characterized by the degree of polarization m [24,39] and the measure of the dominant scattering mechanism β of the reflecting target as follows: where Tr represents the sum of the diagonal elements of the matrix, || represents the determinant of a matrix, m is the degree of polarization (0 ≤ m ≤ 1), which is defined as the ratio of the average intensity of the polarized portion of the wave to the average total intensity of the wave, and β refers to the measure of the dominant scattering mechanism, which is determined based on the decomposition of the matrix C 2 into two non-negative eigenvalues (λ 1 ≥ λ 2 ≥ 0), as follows: Further, the DpRVI index was calculated for each date using Equation ( 4): In Module 3, all Sentinel-1 images were co-registered as a stack using the Sentinel-1 Back Geocoding operator and subsetted based on the area of interest.Following this, SAR stack data were geocoded to a WGS84/UTM zone 52N projected coordinate system using the Range Doppler Terrain correction operator to yield a DpRVI time-series product in GeoTiff format.
For single sub-swath processing (e.g., IW3), Module 1 was modified as shown in Figure 5. Since all studied fields were in one IW3 for track 134, Modules 2 and 3 remained the same as for the first workflow.It must be noted that when several sub-swaths (e.g., IW1 and IW2) are processed, Module 2 can be modified, as shown in Figure 6.The S-1 Back Geocoding and S-1 TOPS Merge operators can be added to merge separated products of different sub-swaths into a single image.It must be noted that when several sub-swaths (e.g., IW1 and IW2) are processed, Module 2 can be modified, as shown in Figure 6.The S-1 Back Geocoding and S-1 TOPS Merge operators can be added to merge separated products of different sub-swaths into a single image.It must be noted that when several sub-swaths (e.g., IW1 and IW2) are processed, Module 2 can be modified, as shown in Figure 6.The S-1 Back Geocoding and S-1 TOPS Merge operators can be added to merge separated products of different sub-swaths into a single image.the quality of the model.Information on crop rotation, obtained from ground-based observations, was additionally verified by expert visual interpretation of satellite images.Overall, pixels from 80 fields were divided into three classes: soybean, oat, and fallow.Pixels of 40 fields were used to build a model, pixels of another 40 sites were used to evaluate model accuracy.
Table 2 provides information on the number of fields and areas occupied by the different crops.It can be seen that half of the fields (20) from the training sample in 2021 were occupied by oat; however, soybean was planted in more large fields, and thus the total area covered by soybean exceeded the area covered by oat.The share of fallow land in the studied fields of the Khabarovskiy District exceeded 35%.Labeled DpRVI time series for pixels of agricultural fields in the districts of the Amur Region were used to evaluate the quality of the model when transferred to a neighboring region.To form a test sample, we selected abandoned fields, as well as fields with soybean and oat, in each of the three districts of the Amur Region: Arkharinskiy, Ivanovskiy, and Oktyabrskiy.To mark up the test data, we used information on crop rotation for the fields from the Unified Federal Information System of Agricultural Lands (https://efis.mcx.ru/landing (accessed on 26 December 2022)).During the preliminary visual interpretation of satellite images, we discovered that some of the fields were in the flood zone, while for other fields, information about the growing crop was deemed unreliable.
Overall, 29 fields were used to form an Amur region sample.Table 3 provides information on the number of fields and areas of land included in this sample.

Datasets
As shown in Table 1, there was no Sentinel-1 image for track 105 on 18 August 2021 and for track 134 on 3 July 2021.DpRVI time series were approximated using a cubic polynomial to restore gaps.The dates of observations converted to DOY (day of the year) format were considered an independent variable.DpRVI data values were used as the dependent variable.The polynomial coefficients were determined for each time series and then the function values were calculated at the points of interest: 230 (18 August) for track 105 and 184 (3 July) for track 134.Thus, after data recovery, the time series included 16 observations.
Despite the fact that DpRVI data are practically independent of atmospheric phenomena, anomalous values of this index can be observed in some parts of the fields.To minimize the influence of such noises, "2σ-filtering" was performed.For each field and for each date, the mean value and standard deviation (σ) were calculated for DpRVI.Pixels with more than half of the values of the time series that did not lie in the 2σ-interval were considered anomalous and were not included in the training set.After removing anomalous pixels from the sample, a training dataset was formed, which consisted of 71,153 DpRVI time series.The test dataset for Khabarovsk District was not filtered, and its size was 30,550 time series for DpRVI.Then, DpRVI time series for Amur region fields were calculated and the Amur region dataset was formed.The Amur region set was also not filtered and included 203,120 time series.The distribution of time series between classes in both samples is presented in Table 4.

ML Methods
Classification was performed using the Python scikit-learn package.This package contains different machine learning approaches for supervised classification, such as tree algorithms, discriminant analysis, Bayes classifiers, nearest-neighbor techniques, ensemble classifiers, and neural networks.This package contains out-of-the-box implementations and does not require thorough tuning.We selected three machine learning algorithms for cropland classification: SVM, RF, and Quadratic Discriminant Analysis (QDA).The GridSearchCV method (scikit-learn) was applied to optimize SVM and RF hyperparameters during cross-validation.Random state controlled the pseudo random number generation for shuffling the data and reproducible output across multiple function calls.

Support Vector Machines
SVM is a set of supervised learning methods used for classification, regression, and outlier detection.The advantages of SVMs include effectiveness in high dimensional spaces, memory efficiency (they use a subset of training points in the decision function, which are called support vectors), and versatility (different kernel functions can be specified for the decision function).The objective of the SVM is to find a hyperplane with the maximum margin (the maximum distance between data points of classes) in an N-dimensional space (N represents the number of features).LinearSVC is a fast implementation of Support Vector Classification in scikit-learn for the case of a linear kernel.This classification method is widely used in remote sensing for crop type mapping [40-42].

Random Forest
RF [43] is a meta estimator that fits a number of decision tree classifiers on various sub-samples of the dataset and makes final decisions using majority votes to improve the predictive accuracy and control over-fitting.The features are always randomly permuted at each split.Therefore, the best split may vary even with the same training data.To obtain deterministic behavior during fitting, a random state has been fixed.Evidence suggests that RF is effective for crop mapping [18,22,44].Moreover, it is known for performing particularly well and efficiently on large input datasets with many different features.Another advantage of RF is its high accuracy and robustness to outliers and noise [40].

Quadratic Discriminant Analysis
QDA is the classic classifier, with a quadratic decision surface.It is attractive because it has closed-form solutions that can be computed easily, is inherently multiclass, has been proven to work well in practice [45], and has no hyperparameters to tune.QDA can be derived from simple probabilistic models that model the conditional distribution of the data for each class [46].More specifically, QDA models data distribution as a multivariate Gaussian distribution.To train a classifier, the fitting function estimates the parameters of a Gaussian distribution for each class.

Classification Performance Evaluation
The accuracy metrics for each class, namely, overall accuracy (OA) and F1 metrics, were calculated based on a confusion matrix.A confusion matrix is a square matrix of dimensions n × n (where n is the number of classes).In the confusion matrix, each row represents an actual class, and each column represents a predicted class.Each element of the matrix X ij denotes the number of time series of class i that fall into class j during classification.The diagonal elements of the matrix reflect the number of correctly classified time series.The OA is the ratio of the sum of diagonal elements to the sum of all elements of the error matrix N (expressed as a percentage): The confusion matrix presents both the number of time series in class i (i ∈ [1,n]) erroneously assigned to other classes (false negative time series, FN) and the number of pixels erroneously assigned to class i (false positive time series, FP) by the classifier.The F-measure for class i is a harmonic estimate of the FN and FP quantities: To assess the model's performance, OA and F1 metrics were calculated for different ML algorithms (RF, SVM, QDA).The accuracy metrics OA test , F1 soy test , F1 oat test , F1 f allow test were evaluated.The best-performing method was selected based on the analysis of the accuracy metrics.This method was then used in the classification of land in the Amur Region.To assess the performance of the classifier in the neighboring region, the metrics OA Amur , F1 soy Amur , F1 oat Amur , F1 f allow Amur were calculated.To assess the efficiency of the classifier, it is also important to determine the number and proportion of fields in which crops were identified correctly.The number of time series assigned by the classifier to a particular class was counted to identify the crops in each field.Each field was assigned the label of the class with the greatest number of time series.We calculated the number and proportion of fields with correctly identified crops to assess classification accuracy at the field level.

DpRVI Seasonal Course
Figure 7 shows the seasonal course of DpRVI for soybean, oat, and fallow fields in the Khabarovsk District and the averaged course for the three districts of the Amur region that were studied.Table 5 shows the mean DpRVI maxima values (with variation among fields) and maxima dates (DOY) for the studied districts by crops.
Figure 7a,b show that soybean sowing in the Far East was carried out at the end of May and in the first half of June.However, it can be noted that in the Khabarovskiy District, a grass cover was formed on the fields by the beginning of June.In areas of the Amur Region, as a result of soil treatment with herbicides, the cover was clearly not formed.As soybean grows in the fields, the DpRVI value was also increased.The values of the DpRVI maximum varied insubstantially, from 0.58 for the Oktyabrskiy District to 0.61 for the Ivanovskiy District.The maximum was reached in the second half of September (DOY 259 (16 September) for the Khabarovskiy District and DOY 262-266 (19)(20)(21)(22)(23) for the districts of the Amur Region).From the end of September onwards, the DpRVI values began to decrease rapidly due to the wilting of plants.At this time, the soybean was ready for harvest, but the exact date of harvest often depends on weather conditions (in rainy seasons, harvest was only done at the beginning of winter).

DpRVI Seasonal Course
Figure 7 shows the seasonal course of DpRVI for soybean, oat, and fallow fields in the Khabarovsk District and the averaged course for the three districts of the Amur region that were studied.Table 5 shows the mean DpRVI maxima values (with variation among fields) and maxima dates (DOY) for the studied districts by crops.Oat sowing was carried out in May; however, in the Khabarovskiy District, there was an early increase in DpRVI values before sowing the crop (Figure 7c,d).This growth can be caused not only by the rapid growth of weeds in May but also by incomplete harvesting in the previous agricultural season-the seeds left on the field yield early shoots.The DpRVI maximum for oat was reached in July and DpRVI reached a plateau during this period.Therefore, DOY of the maximum can vary significantly among fields-from the beginning of July in the Ivanovskiy District to the end of July in the Khabarovskiy District.The values of the maximum also varied considerably, from 0.55 in the Arkharinskiy District to 0.7 in the Ivanovskiy District.In the entire study area, oat harvesting was carried out in late July-August.Further growth of DpRVI in some fields may be caused by the growth of grasses after harvesting oat, which is especially true for fields where oats grow together with perennial grasses.DpRVI course in fallow land did not show distinct peaks due to a lack of planting and harvesting during the entire study period (Figure 7e,f).Nevertheless, various natural and anthropogenic phenomena can lead to some anomalies in the growth of grass cover on abandoned cropland.Due to the heterogeneity of the fallows and the minor variation in the DpRVI during the season, the DpRVI maximum in different areas varied slightly (from 0.63 to 0.65); the maximum DOY can occur on any date during the season.

ML Hyperparameter Optimization
Based on the grid search results, RF hyperparameters were chosen during crossvalidation.A Gini function was applied to measure the quality of a split.The minimum number of samples required to split an internal node was 2. The most important parameter influencing cross-validation accuracy was the number of trees.Figure 8 shows the crossvalidation accuracy dynamics with the increasing number of trees.Accuracy increased slightly after 20 trees, peaked at 93.5% at 120 trees, and then overfitting was observed.Thus, 120 trees were chosen as the optimal value.Increasing this parameter further does not improve accuracy.The SVM method also has a set of hyperparameters.A grid search revealed that the balanced class weight insubstantially increased the accuracy.Balanced weight uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data.L2 norm was used in the penalization, and squared hinge was the loss function.The maximum number of iterations was increased to 10,000.Figure 9 shows the tuning of regularization parameter C. It is a loss term that makes sure that that the weights classify the training data points correctly.The highest cross-validation accuracy (93.5%) was achieved with a standard value (C = 1), which means that changes in this hyperparameter decreased the accuracy.The SVM method also has a set of hyperparameters.A grid search revealed that the balanced class weight insubstantially increased the accuracy.Balanced weight uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data.L2 norm was used in the penalization, and squared hinge was the loss function.The maximum number of iterations was increased to 10,000.Figure 9 shows the tuning of regularization parameter C. It is a loss term that makes sure that that the weights classify the training data points correctly.The highest cross-validation accuracy (93.5%) was achieved with a standard value (C = 1), which means that changes in this hyperparameter decreased the accuracy.

Khabarovsk District Classification Results
Table 6 presents the values of the accuracy metrics on the Khabarovsk District test set for different ML algorithms with optimized hyperparameters.The best performance was shown by the RF algorithm.RF OA test was 82.0%.F1 soy test exceeded 0.9, which indicates that the number of classifier errors was small.The value of F1 oat test and F1 f allow test was 0.78.From the point of view of the classifier, the oat class was most similar to fallow; 31% of oat time series were assigned to fallow.SVM metric values were similar to RF. OA test for the SVM with a linear kernel was 81.9%, QDA OA test was 81.1%, and the value of F1 oat test and F1 f allow test for QDA were 0.76 and 0.77, respectively.
The SVM method also has a set of hyperparameters.A grid search revealed that the balanced class weight insubstantially increased the accuracy.Balanced weight uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data.L2 norm was used in the penalization, and squared hinge was the loss function.The maximum number of iterations was increased to 10,000.Figure 9 shows the tuning of regularization parameter C. It is a loss term that makes sure that that the weights classify the training data points correctly.The highest cross-validation accuracy (93.5%) was achieved with a standard value (C = 1), which means that changes in this hyperparameter decreased the accuracy.Figure 10 shows the confusion matrix for the RF.The proportion of correctly identified time series was: 94% for fallow, 97% for soybean and 68% for oat.Further, 31% of oat time series were attributed by RF to the fallow class, and among the time series assigned by the classifier to the fallow class, the share of the oat was 27%.
We evaluated our classification results in comparison to the results of prior studies that used radar data.For example, the use of various combinations of polarizations and the RVI index by [22] when training the RF classifier made it possible to achieve an accuracy of 66%.In a study by [23], when classifying cropland in the Netherlands, OA was 69.5%.A combination of different polarizations and double polarization VIs was used as parameters for training by the nearest-neighbor method.Zhou et al. [47] achieved the highest overall classification accuracy (85%) using RF (VH-polarization); the accuracy of soybean determination was 88%.
Thus, the performance of our approach is quite high.For an additional assessment of the quality of the model, RF was applied on a sample from another region with similar climatic and soil conditions.

Test Results in Amur Region
The RF classifier was applied to label the DpRVI time series of the fields in the test sample located in the Amur Region.The values of the metrics for assessing the efficiency of the classification of a test sample are presented in Table 7.When using the classifier in another region, OA Amur was 83.1%.F1 for soybean decreased by 0.06, but for fallow and oat it increased by 0.06 and 0.01, respectively.1 0.78 0.78 0.77 Figure 10 shows the confusion matrix for the RF.The proportion correctly identified time series was: 94% for fallow, 97% for soybean and 68% for oat.Further, 31% of oat time series were attributed by RF to the fallow class, and among the time series assigned by the classifier to the fallow class, the share of the oat was 27%.   Figure 11 shows the RF confusion matrix for the test set in Amur Region.The proportion of correctly classified time series was 87% for the fallow and soybean and 75% for oat.Due to regional differences in seasonal DpRVI profiles, 14% oat time series were identified as fallow.There were also some difficulties in the separation of soybean and oat-11% of the oat time series were recognized as soybean.In general, however, we can say that when using the classifier on the fields of another region, there was no decrease in accuracy.

Crop Identification at the Field Level
Table 8 presents the results of crop identification in the fields of the Amur Region.The overall identification accuracy was 79% (6 errors per 29 fields).All fields where soybean grew, as well as the fallow land, were recognized correctly.

Identification at the Field Level
Table 8 presents the results of crop identification in the fields of the Amur Region.The overall identification accuracy was 79% (6 errors per 29 fields).All fields where soybean grew, as well as the fallow land, were recognized correctly.Figure 12 shows examples of crop identification in the fields.Figure 12a shows two soybean and four abandoned fields, located in Arkharinskiy District.Out of 12,413 soybean field pixels, 10,148 (82%) were recognized correctly.Problematic areas in the fields were identified as a fallow.Similarly, 7916 out of 8497 fallow land pixels were identified correctly (93%).The detection of growing crops in individual pixels of agricultural fields may be associated with the germination of crops sown in previous years.Figure 12c shows the example of correct oat detection in two fields (total area was 463 ha) in Ivanovskiy District.RF detected 40,317 oat pixels (94% of total amount).Black sites (fallow land) indicated wetlands.Figure 12d shows an example of correct identification of abandoned land in Ivanovskiy District.RF identified 32,203 out of 36,362 time series (88.6%) as fallow  Figure 12 shows examples of crop identification in the fields.Figure 12a shows two soybean and four abandoned fields, located in Arkharinskiy District.Out of 12,413 soybean field pixels, 10,148 (82%) were recognized correctly.Problematic areas in the fields were identified as a fallow.Similarly, 7916 out of 8497 fallow land pixels were identified correctly (93%).The detection of growing crops in individual pixels of agricultural fields may be associated with the germination of crops sown in previous years.Figure 12c shows the example of correct oat detection in two fields (total area was 463 ha) in Ivanovskiy District.RF detected 40,317 oat pixels (94% of total amount).Black sites (fallow land) indicated wetlands.Figure 12d shows an example of correct identification of abandoned land in Ivanovskiy District.RF identified 32,203 out of 36,362 time series (88.6%) as fallow land.Meanwhile, only four of ten oat were correctly identified (examples of incorrect identification are presented in Figure 12b).Five incorrectly identified fields were recognized as soybean, one field was recognized as fallow land.Five of six fields were located in the Arkharinskiy District.Figure 13 shows the DpRVI seasonal course for oat in the training set (Khabarovskiy District), in the Ivanovskiy District (plus one field in the Oktyabrskiy District), where oat fields were recognized correctly, and in the Arkharinskiy District, where the classifier made a mistake.In the Khabarovskiy and Ivanovskiy districts, the course of DpRVI for most of the season had similar trends (peak in July and decrease in values after harvesting).Oat in the Arkharinskiy District was characterized by low DpRVI in summer and a sharp increase in autumn with the formation of a second peak, temporally coinciding with the maximum in soybean, which led to incorrect classification.The presence of the second peak indicated the growth of the second crop, which was not documented in the crop database.It is known that perennial grasses are sown with crops; sowing seeds of timothy grass, clover, and alfalfa with oat increases its productivity and therefore is a fairly common practice [48].However, the presence of an additional crop that significantly affects the DpRVI seasonal course led to classification errors.In further studies, it is planned to separate the DpRVI time series of fields with overseeding of perennial grasses into a separate class.Meanwhile, only four of ten oat fields were correctly identified (examples of incorrect identification are presented in Figure 12b).Five incorrectly identified fields were recognized as soybean, one field was recognized as fallow land.Five of six fields were located in the Arkharinskiy District.Figure 13 shows the DpRVI seasonal course for oat in the training set (Khabarovskiy District), in the Ivanovskiy District (plus one field in the Oktyabrskiy District), where oat fields were recognized correctly, and in the Arkharinskiy District, where the classifier made a mistake.In the Khabarovskiy and Ivanovskiy districts, the course of DpRVI for most of the season had similar trends (peak in July and decrease in values after harvesting).Oat in the Arkharinskiy District was characterized by low DpRVI in summer and a sharp increase in autumn with the formation of a second peak, temporally coinciding with the maximum in soybean, which led to incorrect classification.The presence of the second peak indicated the growth of the second crop, which was not documented in the crop database.It is known that perennial grasses are sown with crops; sowing seeds of timothy grass, clover, and alfalfa with oat increases its productivity and therefore is a fairly common practice [48].However, the presence of an additional crop that significantly affects the DpRVI seasonal course led to classification errors.In further studies, it is planned to separate the DpRVI time series of fields with overseeding of perennial grasses into a separate class.
Prior research has attempted to identify abandoned cropland.For example, Qiu et al. [14] used Sentinel-1 VV polarization time series and achieved an accuracy of 90% when classifying abandoned cropland at the field level in China.The use of the DpRVI time series in our work made it possible to achieve unmistakable classification (all 10 fields were identified correctly).The accuracy of crop identification in the fields has also been previously assessed using radar data [47].In this study, fields with soybean were classified with an accuracy of 88% (the test sample consisted of 100 polygons).In general, even with a smaller number of fields and taking into the transfer of learning outcomes to a neighboring region, the classification accuracy at the field level in our study is consistent with the findings of studies published in leading scientific journals.Prior research has attempted to identify abandoned cropland.For example, Qiu et al. [14] used Sentinel-1 VV polarization time series and achieved an accuracy of 90% when classifying abandoned cropland at the field level in China.The use of the DpRVI time series in our work made it possible to achieve unmistakable classification (all 10 fields were identified correctly).The accuracy of crop identification in the fields has also been previously assessed using radar data [47].In this study, fields with soybean were classified with an accuracy of 88% (the test sample consisted of 100 polygons).In general, even with a smaller number of fields and taking into account the transfer of learning outcomes to a neighboring region, the classification accuracy at the field level in our study is consistent with the findings of studies published in leading scientific journals.

Comparison with Another SAR Data Time Series
The use of radar data for cropland mapping is not new to scientific research.Among the most popular dual-pol Sentinel-1 data for the cropland mapping, one can single out the ratio of VV/VH polarizations and the RVI from Level-1 Ground Range Detected High Resolution (GRD-HR) product.GRD-HR consists of focused SAR data that have been multi-looked and projected to ground range using an Earth ellipsoid model of resolution 10 m × 10 m on pixel.
We applied the approach proposed for DpRVI to the VV/VH polarization ratio and the RVI time series at the same sites.The number of time series of radar data used for training and testing DpRVI-, VV/VH-, and RVI-based models in the Khabarovskiy District, as well as number of time series in the Amur region, are presented in Table 9.As already mentioned in Section 2.3, we used 71,153 DpRVI time series to train the classifier, 30,550 series were used to test the Khabarovsk region, and 203,120 series were used to test the model on the fields of the Amur region.The number of time series obtained for the studied fields from the same images is higher for RVI and VV/VH than for DpRVI by 8-9%.RVI seasonal courses for the studied crops and study areas, as well as oat RVI courses in different districts are presented in Figures S1 and S2

Comparison with Another SAR Data Time Series
The use of radar data for cropland mapping is not new to scientific research.Among the most popular dual-pol Sentinel-1 data for the cropland mapping, one can single out the ratio of VV/VH polarizations and the RVI from Level-1 Ground Range Detected High Resolution (GRD-HR) product.GRD-HR consists of focused SAR data that have been multi-looked and projected to ground range using an Earth ellipsoid model of resolution 10 m × 10 m on pixel.
We applied the approach proposed for DpRVI to the VV/VH polarization ratio and the RVI time series at the same sites.The number of time series of radar data used for training and testing DpRVI-, VV/VH-, and RVI-based models in the Khabarovskiy District, as well as number of time series in the Amur region, are presented in Table 9.As already mentioned in Section 2.3, we used 71,153 DpRVI time series to train the classifier, 30,550 series were used to test the Khabarovsk region, and 203,120 series were used to test the model on the fields of the Amur region.The number of time series obtained for the studied fields from the same images is higher for RVI and VV/VH than for DpRVI by 8-9%.RVI seasonal courses for the studied crops and study areas, as well as oat RVI courses in different districts are presented in Figures S1 and S2.
We applied RF to train classifiers based on RVI and VV/VH time series.Tables 10 and 11 show that RVI and VV/VH classifiers demonstrated similar results and were less accurate in comparison with the DpRVI model.For the Khabarovskiy District, RVI OA test was 77% (the difference with DpRVI was 5%) and VV/VH OA test was 76.8% (5.2% less than DpRVI).F1 for RVI and VV/VH varied from 0.73 for the fallow class to 0.85 for the soybean class (0.04-0.07 less than DpRVI).RVI and VV/VH model application in the Amur region led to a decrease in classification accuracy by 2%.At the same time, just as for DpRVI, F1 for soybean decreased by 0.06, whereas oat and fallow F1 remained at the same level.Thus, we have shown that the use of the DpRVI series makes it possible to increase the accuracy of classification, in comparison with the commonly used radar indicators for cropland mapping.These results can be related to the fact that DpRVI is calculated using the SLC product, which contains both amplitude and phase information from which we can generate a covariance matrix and obtain many polarimetric features.This makes the DpRVI index a more interpretable and sustainable descriptor of the crop growth course.It is also worth noting that DpRVI effectively incorporates scattered wave information to describe phenological changes which are vital for monitoring crop time series.In turn, data processing for DpRVI is more complex and requires high-performance computing.

Conclusions
This study successfully established the possibility of using DpRVI time series for cropland classification in Khabarovsk District.This process leads to good results in the identification of three classes: soybean, fallow, and oat.The seasonal course of DpRVI for each class follows a similar pattern, with similar maxima and the same calendar day on which these peaks are reached.According to the results of classification using three machine learning methods, it was found that the overall accuracy of the RF, SVM, and QDA methods was 82.0%, 81.9%, and 81.1%, respectively.The RF classifier was applied for cropland mapping in another region (Amur region).The use of RF for the fields of the Amur region did not worsen the performance; the overall accuracy was 83.1%, and F1 was 0.86 for soybean, 0.84 for fallow, and 0.79 for oat.The accuracy of crop identification at the field level was 79%.In future studies, it is recommended to separate oats that have series of several vegetation indices based on Sentinel-1 and Sentinel-2 data, resolve inconsistencies in the spatial resolution, restore data, and add temperature, precipitation, and humidity variables to the model.Gella et al. [23]  used Dual Polarization SAR Vegetation Index (DPSVI) and Modified Radar Vegetation Index (MRVI) time series to identify eight agricultural crops in the Netherlands.The use of such VIs has been found to improve overall accuracy.
Khabarovsk District based on different DpRVI time series and ML methods.

Figure 2 .
Figure 2. Study area 2 (Amur Region).Red boxes means spatial areas with sites.The study areas are characterized by a monsoonal climate with cold snowy winters (the average January temperature varies from −25 • C in study area [SA] 2 to −20 • C in SA1) and humid warm summers (the average July temperature in the study areas exceeds 21 • C).The average annual rainfall is at the level of 600-700 mm per year.The main agricultural crop in the southern part of the Russian Far East is soybean; in 2021, soybean occupied 55% (9381 ha) in the Khabarovskiy District, 79% (28,596 hectares) in the Arkharinskiy District, 76% (82,425 ha) in the Ivanovskiy District, and 79% in the Oktyabrsky district (78,899 ha) of the total area of arable land.The land occupied by oats is 7% (1232 ha), 5% (1851 ha), less than 1% (865 ha), and just over 1% (1380 ha) in the Khabarovskiy District, Arkharinsky District, Ivanovsky District, and Oktyabrskiy District, respectively.

Figure 3 .
Figure 3. Sentinel-1 tracks for SA2.Red boxes means spatial areas with sites.

Figure 3 .
Figure 3. Sentinel-1 tracks for SA2.Red boxes means spatial areas with sites.

Figure 3 .
Figure 3. Sentinel-1 tracks for SA2.Red boxes means spatial areas with sites.

Figure 4 .
Figure 4. Flowchart of DpRVI time series generation for assembly of two slices.Figure 4. Flowchart of DpRVI time series generation for assembly of two slices.

Figure 4 .
Figure 4. Flowchart of DpRVI time series generation for assembly of two slices.Figure 4. Flowchart of DpRVI time series generation for assembly of two slices.

Figure 6 .
Figure 6.Flowchart for multiple sub-swath processing.2.2.2.Ground Reference Data Labeled DpRVI data seasonal series for pixels of agricultural fields of the Khabarovskiy District were divided into training data to build a model and test dataset to evaluate the quality of the model.Information on crop rotation, obtained from ground-based observations, was additionally verified by expert visual interpretation of satellite images.Overall, pixels from 80 fields were divided into three classes: soybean, oat, and fallow.Pixels of 40 fields were used to build a model, pixels of another 40 sites were used to eval-

Figure 6 .
Figure 6.Flowchart for multiple sub-swath processing.2.2.2.Ground Reference Data Labeled DpRVI data seasonal series for pixels of agricultural fields of the Khabarovskiy District were divided into training data to build a model and test dataset to evaluate the quality of the model.Information on crop rotation, obtained from ground-based observations, was additionally verified by expert visual interpretation of satellite images.Overall, pixels from 80 fields were divided into three classes: soybean, oat, and fallow.Pixels of 40 fields were used to build a model, pixels of another 40 sites were used to evaluate model accuracy.

Figure 6 .
Figure 6.Flowchart for multiple sub-swath processing.2.2.2.Ground Reference DataLabeled DpRVI data seasonal series for pixels of agricultural fields of the Khabarovskiy District were divided into training data to build a model and test dataset to evaluate

Sensors 2023 , 23 Figure 8 .
Figure 8. Trend of the cross-validation accuracy with the increasing number of random forest trees.

Figure 8 .
Figure 8. Trend of the cross-validation accuracy with the increasing number of random forest trees.

Figure 9 .
Figure 9. Cross-validation accuracy with the regularization parameter value.Figure 9. Cross-validation accuracy with the regularization parameter value.

Figure 9 .
Figure 9. Cross-validation accuracy with the regularization parameter value.Figure 9. Cross-validation accuracy with the regularization parameter value.

Figure 13 .
Figure 13.Oat DpRVI seasonal courses in the study districts.

Table 1 .
Specification for SAR images by Sentinel-1 data acquired for thestudy areas.

Table 2 .
Number of fields and their areas, included in training dataset.

Table 3 .
Number of fields and their areas included in the Amur region dataset.

Table 4 .
Number of training and test DpRVI time series.

Table 5 .
Mean DpRVI maxima values (with variation among fields) and maxima dates (DOY) for the studied districts by crops.
* There was only one oat field in Oktyabrskiy District.

Table 6 .
Test accuracy for different ML methods in Khabarovskiy District.

Table 7 .
RF test accuracy in Amur Region.

Table 8 .
Crop identification accuracy in Amur Region.

Table 8 .
Crop identification accuracy in Amur Region.

Table 9 .
Number of SAR time series.

Table 10 .
RF test accuracy for different SAR time series in Khabarovskiy District.

Table 11 .
RF test accuracy for different SAR time series in Amur region.