Selecting Relevant Biological Variables Derived from Sentinel-2 Data for Mapping Changes from Grassland to Arable Land Using Random Forest Classiﬁer

: Permanent grassland is one of the monitored categories of land use, land use change, and forestry (LULUCF) within the climate concept and greenhouse gas reduction policy (Regulation (EU) 2018 / 841). Mapping the conditions and changes of permanent grasslands is thus very important. The area of permanent grassland is strongly inﬂuenced by agricultural subsidy policies. Over the course of history, it is possible to trace di ﬀ erent shares of permanent grassland within agricultural land and areas with signiﬁcant changes from grassland to arable land. The need for monitoring permanent grassland and arable land has been growing in recent years. New subsidy policies determining farm management are beginning to a ﬀ ect land use, especially in countries that have joined the EU in recent waves. The large amount of freely available satellite data enables this monitoring to take place, mainly owing to data products of the Copernicus program. There are a large number of parameters (predictors) that can be calculated from satellite data, but ﬁnding the right combination is very di ﬃ cult. This study presents a methodical, systematic procedure using the random forest classiﬁer and its internal metric of mean decrease accuracy (MDA) to select the most suitable predictors to detect changes from permanent grassland to arable land. The relevance of suitable predictors takes into account the date of the satellite image, the overall accuracy of change detection, and the time required for calculations. Biological predictors, such as leaf area index (LAI), fraction absorbed photosynthetically active radiation (FAPAR), normalized di ﬀ erence vegetation index (NDVI), etc. were tested in the form of a time series from the Sentinel-2 satellite, and the most suitable ones were selected. FAPAR, canopy water content (CWC), and LAI seemed to be the most suitable. The proposed change detection procedure achieved a very high accuracy of more than 95% within the study site with an area of 8766 km 2 .


Introduction
The area and structure of agricultural land is significantly influenced by the development of methods and technologies of agricultural management. In recent years, in the context of agricultural management and the associated development of land use, it has influenced the effects of the global market and subsidy policies, especially agricultural, energy, and landscaping policies. These are mainly specific subsidy programs within the European Union, where subsidies are provided for agriculturally oriented lands for specific use of the given land unit [1][2][3][4]. As a result of newly introduced subsidy policies and changes in the food market, agricultural production and the structure of agricultural crops have changed in recent years. Mapping the structure of agricultural land use is very important [5], especially the structure of cultivated crops, changes in the share of individual agricultural areas, and the emphasis on changes between permanent grassland and arable land [5][6][7]. Permanent grasslands are important landscape components with high ecological value [8] and strong effect on reducing soil erosion [9]. Constant reduction in the area of these biomes has a very negative effect not only on the landscape as a whole in Europe [10] but also on the absorption of greenhouse gases due to the use of land for agricultural purposes [11].
It is obvious that changes from grassland to arable land not only play a major role in production change [12] but also influence the local landscape and rural development [13]. Grasslands are a very valuable habitat in terms of soil erosion reduction [14], which is a worldwide problem [15]. On the other hand, arable land is considered as an area with low soil erosion protection [16]. Changes from grassland to arable land may lead to biodiversity loss [17,18], especially in semiarid regions, where extreme climate conditions occur [19]. Mapping changes from grassland to arable land is essential to monitor both agriculture and climate conditions in order to find the right balance between needs of humankind and planetary sources [8]. Remote sensing technologies offer a fast and efficient way to map large-scale areas, especially in cases where a lot of satellite data are freely available, such as with Sentinel-2 satellites [7].
European authorities and national authorities of individual states, with the Czech Republic being no exception [20], provide considerable funding for the development of monitoring/control systems for the use of agricultural land. For these purposes, a specifically oriented geographic information system called the land parcel identification system (LPIS) was developed in the Czech Republic [21], in which registered land blocks (basic unit LPIS) are monitored on the basis of aerial and satellite images of very high spatial resolution. Information on users and specific use of the type of agricultural land is stored as attribute data of individual vector layers (http://eagri.cz/public/app/lpisext/lpis/verejny2/plpis/). The control and validation of these data take place on the basis of a field survey and using remote sensing methods. Monitoring changes within LPIS soil blocks offers effective standardization and time series from the satellite data, thereby providing a promising method to monitor the current use and changes of agricultural land as well as check for any deviations from the registered state [22,23].
It follows from the above that effectively monitoring the state of and changes in the use of agricultural land with the help of remote sensing is a highly relevant topic. From the point of view of both agricultural and environmentally oriented policies, there is high need for up-to-date information on the area and changes of permanent grassland and arable land [24][25][26][27], especially larger territorial units [28][29][30]. Recently, a number of methods have been developed to detect conditions and changes using remote sensing technologies. Freely available satellite images from Landsat and Sentinel-2 satellites, which contain extensive data archives [31][32][33][34], are widely used for this purpose. These archives allow time series suitable for monitoring land use/land cover changes to be created. Furthermore, there is a need to automate classification processes [35] so that satellite image time series can be processed quickly and efficiently with the required degree of accuracy. In this direction, certain progress has been made with data archives from the Sentinel-2 satellite, which, compared to Landsat, offers a higher spatial and temporal resolution as well as specifically designed spectral bands in the red edge [36][37][38][39]. The higher temporal resolution of Sentinel-2 data is suitable for use in a time series [40][41][42][43][44], which allows oscillations between individual phenological phases of vegetation to be captured.
To date, a wide range of specific studies have been published to map agricultural areas based on multitemporal data [22,40,41,[45][46][47][48][49][50]. Although significant progress has been made in the accuracy of classification of agricultural lands, the mapping of changes in permanent grassland and arable land remains a relevant research challenge [51][52][53][54]. Mapping changes from permanent grassland to arable land is a relatively challenging matter as the change detection process requires capturing and accurately analyzing changes during individual phenological phases [55], which are highly variable due to the high heterogeneity in arable land/permanent grassland use and different local and climatic conditions [56]. Different spectral bands and vegetation indices are widely used in change detection methods using time data [39,[57][58][59]. From this point of view, a time series of the satellite data can be divided according to whether one variable is used, i.e., a single vegetation index for a defined time period [41,[60][61][62], or more variables are used, for example, the original spectral bands + vegetation index, optional radar data, and optical data [54,[63][64][65]. In addition to the possibility of calculating traditionally conceived indices, data from Sentinel-2 offer calculations of specific indices that are freely available in the Sentinel Application Platform (SNAP) software [66]. These mainly consist of the following biological variables: fraction absorbed photosynthetically active radiation (FAPAR), leaf area index (LAI), fractional vegetation cover (FCOVER), content of chlorophyll A in the plant leaves (CAB), and canopy water content (CWC) [39,66,67].
An important methodological aspect of change detection is the selection of predictors that best describe and detect the observed phenomena [68][69][70]. This is mainly the effect of the time acquisition of a satellite image within a time series and a specific biological predictor or spectral band as many predictors are correlated and bring redundant information to the identification of the observed phenomena [51,71,72]. To date, various methods have been developed and tested to determine suitable predictors. A specific example is the use of the nonparametric classifier random forest [73], which has proven its reliability in many studies in remote sensing [49,53,61,[74][75][76][77]. The random forest classifier [73] is a nonparametric classifier that belongs to the family of decision trees. Its strengths include an overfitting resistance, the ability to classify large data volumes, and, above all, the identification of suitable predictors based on its internal mean decrease accuracy (MDA) and mean decrease Gini (MDG) index metrics [73]. Research and extensive discussions are currently underway as to which of the above metrics is more appropriate to reveal the relevance of the most important predictors [78][79][80]. In principle, both MDA and MDG values epend on mutual correlation of the input predictors but mainly on the number of decision trees [81]. In our case, 1000 decision trees were constantly set up. Another parameter is mtry, which determines the number of predictors in each node of the tree structure [73]. The mtry parameter is calculated as the square root of the total number of predictors [77] entering the classification process.
For this reason, according to some studies, it is recommended that the classification for the selection of significant predictors is repeated as the list of important predictors derived from one model of the random forest classifier may be skewed [80,82]. The final selection of suitable predictors based on the random forest classifier can take place empirically, where the most suitable predictors are selected based on the experience of the producer [78,83] or undertaken fully automatically [81][82][83][84]. The disadvantage of the fully automatic methods is that they are derived from a single model of the random forest classifier, which, according to Millard and Richardson [82], can lead to misleading results. Therefore, there is a continuing need to conduct studies to select suitable predictors and adapt them for specific purposes, such as detecting changes from permanent grassland to arable land.
The main objectives of this study were as follows: • developing and testing a methodology to detect changes from permanent grassland to arable land using the random forest classifier with the help of Sentinel-2 time series data; • identifying the most suitable predictors of the time series of images [22] on the basis of the selected MDA parameter; • finding statistically significant differences based on the overall average accuracy of the detection of changes between individual predictors [39]; and • discussing the achieved results and comparing them with similarly oriented studies.

•
The following research question was formulated: • Does one multitemporal predictor or a combination of them provide the best results within a time series?

Study Area
The study site is located in the Czech Republic, including the administrative units of the Pilsen, Karlovy Vary, and Ústí nad Labem regions. The central point of the site of interest is located at coordinates X = 358,773 and Y = 554,9188. The corner points have the coordinates X min = 300,475, Y min = 550,1255, X max = 409,195, and Y max = 559,9195. The information is all in the WGS84 UTM 33N coordinate system. The whole area is approximately 8766 km 2 ( Figure 1). The area of interest is vegetatively, topographically, and ecologically highly variable with frequent occurrence of permanent grassland and arable land. The study site was deliberately chosen due to the increased incidence of changes from permanent grassland to arable land. When selecting the study site, the criterion of localization within one scene was taken into account, but this requirement could not be met due to the occurrence of clouds. Changes from permanent grassland to arable land were detected based on the vector overlap of layers from the LPIS database. This vector overlay also served as a reference data source for training the random forest classifier.  (Figure 1). The area of interest is vegetatively, topographically, and ecologically highly variable with frequent occurrence of permanent grassland and arable land. The study site was deliberately chosen due to the increased incidence of changes from permanent grassland to arable land. When selecting the study site, the criterion of localization within one scene was taken into account, but this requirement could not be met due to the occurrence of clouds. Changes from permanent grassland to arable land were detected based on the vector overlap of layers from the LPIS database. This vector overlay also served as a reference data source for training the random forest classifier.

Input Data
A time series of optical data from the Sentinel-2 satellite was used to process this study. A total of 5 satellite images were selected ( Table 1). As the images were selected based on a predetermined criterion, i.e., images from 2015 to 2019, they had to be completely cloudless. In this case, the cloud cover was less than 5%. The satellite data were selected so that it was possible to find the highest possible number of cloudless images for each year between 2015 and 2019 during the growing season, which ranges from April to the end of September. According to Esch et al. [14], the most suitable period for capturing changes between grassland and arable land is immediately after the harvest of field crops, i.e., late summer or early autumn. A similar view is held by Možný et al. [85][86][87]. As one of the main tasks of this study was to determine the importance of time-lapse photography, it was necessary to include multitemporal data throughout the growing season. The time period from April to September was chosen due to a significant increase in temperature over recent years, especially in the Ore Mountains [88], which also includes part of the selected study site. Due to the general increase in the temperature, it can be assumed that all the field crops and grass species will be in the full phase

Input Data
A time series of optical data from the Sentinel-2 satellite was used to process this study. A total of 5 satellite images were selected ( Table 1). As the images were selected based on a predetermined criterion, i.e., images from 2015 to 2019, they had to be completely cloudless. In this case, the cloud cover was less than 5%. The satellite data were selected so that it was possible to find the highest possible number of cloudless images for each year between 2015 and 2019 during the growing season, which ranges from April to the end of September. According to Esch et al. [14], the most suitable period for capturing changes between grassland and arable land is immediately after the harvest of field crops, i.e., late summer or early autumn. A similar view is held by Možný et al. [85][86][87]. As one of the main tasks of this study was to determine the importance of time-lapse photography, it was necessary to include multitemporal data throughout the growing season. The time period from April to September was chosen due to a significant increase in temperature over recent years, especially in the Ore Mountains [88], which also includes part of the selected study site. Due to the general increase in the temperature, it can be assumed that all the field crops and grass species will be in the full phase of phenological growth at this time interval. Therefore, their spectral response will overlap, and it will be possible to better evaluate the effect of time-lapse satellite imagery. Potentially, it is possible to better evaluate sensitivity of the random forest classifier to the content of redundant information in the predictors that are subject to change detection. The images were downloaded from the United States Geological Survey (USGS) data archive [31] at Level 1C processing, with the images being manually converted to the surface reflectance. Table 1 documents the data used. In total, only 5 images were used as it was not possible to obtain a complete time series for each year (2015, 2016, 2017, 2018, and 2019) in a cloudless state between April and September. For each year, only completely cloudless images were obtained, as shown in Table 1. A time series was created from the selected images, which were subjected to partial analyses. Vector layers from the LPIS database were used as the reference data. LPIS is a geographic information system that is specially designed for the monitoring and registration of agricultural activities in the Czech Republic. The basic unit of this system is a soil block, which is defined as the agricultural land on which crops are grown in a regular sequence, possibly under greenhouses or under a fixed and portable cover of an agricultural crop that is not grassland. This definition is based on the valid legislation of the Czech Republic under the Ministry of Agriculture. The vector data are continuously updated, and their accuracy is verified on the basis of field surveys. The LPIS database contains information on agricultural crops and areas, which are listed here as two thematic classes of land cover: arable land and permanent grassland.
Arable land is defined as an agricultural area on which grass or other fodder is not grown. Permanent grassland is an area where grass has been grown for more than five years, and the land is subject to the European Union's agricultural subsidy policy.
Arable land and permanent grassland polygons were extracted from LPIS and served as the reference data for further processing. All soil blocks containing permanent grassland and arable land were extracted from the LPIS vector database. Identification of the reference change soil blocks was performed on the basis of the spatial overlap between 2015 and 2019. The LPIS vector data was used as a mask. The two LPIS vector layers were exported on 30 June 2015 and 30 June 2019. The polygons thus obtained were used as reference to generate training data for the random forest classifier.

Methods Used
The first methodological step was to perform atmospheric corrections. The downloaded satellite data were converted to surface reflectance values using the Sen2Cor v. 2.5 module [89,90]. Subsequently, 20 m bands were resampled by the nearest neighbor method to the same pixel size as the bands of the visible band and near infrared (NIR) (8), i.e., 10 m. Then, all the biological predictors [67] were calculated, which are standardly implemented in the freely available software SNAP [66], including the vegetation index NDVI. Biological predictors were selected on the basis of several elaborated studies focused on plant physiology or biomass monitoring [91][92][93][94] or precision agriculture [95,96] as well as based on our previous experiences and testing [71]. The original spectral bands from the red edge bands (spectral bands 5, 6, and 7), were chosen as further input predictors. Table 2 gives a list of all the predictors used. A detailed description of the algorithm and the method of calculation of FAPAR, FCOVER, LAI, CAB, and CWC can be found in the relevant literature (the field references in Table 2).  [94,104] After calculating the biological predictors, the classification of changes was performed using the random forest algorithm [73]. Because the random forest algorithm is intended for a supervised image classification, it was necessary to create training data in the form of training points. A total of 2000 training points were created by the method of stratified random sampling on the basis of change polygons from the LPIS database. The basic classes were defined as change surfaces and no-change surfaces. The category of change areas determined the change from permanent grassland to arable land. It should be noted that the category of change areas within the image was much less extensive compared to the territory without this type of change. However, the number of reference points for the change class was defined by the same number as that for the class where no changes took place, i.e., 1000 points each. The random forest classifier tends to place more weight on the classification class for which it has more training data available [105]. However, in the case of the random forest classifier, this is the recommended solution (see [105]).
The basic set of reference points for both categories was divided into two separate datasets at a rate of 50%, i.e., 50% of the training points from the basic set was used as the training points, and the remaining 50% was used as the validation points. A set of training and validation points was created individually for each separate change classification.
Subsequently, the changes from permanent grassland to arable land were classified in an iterative way for all the tested predictors ( Table 2). The classification of the changes for each predictor and for the full time series was performed a total of 30 times. i.e., when all the data entered the classification process (see Table 1). Then, the time relevance was evaluated, i.e., the date the image was taken based on the highest average MDA. In this case, the MDA value took on specific values for each image capture date. The higher its value, the higher the relevance of the specific image capture date. The results of previous surveys have shown [82] that if the classification of changes using the random forest algorithm is performed only once, the MDA values cannot be considered stable; in other words, the specific MDA value cannot be assigned to a specific image capture date in general norm. In order to obtain relevant and stable MDA values, it is appropriate to start from a general trend, which can be obtained by simply averaging the MDA values.
The selection of the most important predictors was made empirically based on the highest average MDA after 30 classifications. The three most important predictors were selected for the univariate time series, and five of the most important predictors were selected for the multivariate time series. Another time series (Figures S2-S7) arose only from the best predictors based on the highest average MDA.
All the predictors used in Table 2 were tested as a univariate time series. The multivariate time series contained two time series. The first multivariate time series was created only from the original spectral bands comprising the red edge band, i.e., 5, 6, and 7, while the second included all the biological predictors, including the vegetation index NDVI. All methods were performed to compare differences in the overall accuracy of change as it was more appropriate to use only the original spectral bands or biological predictors (Table 2) or both. The most important predictors evaluated based on the highest average MDA were named "REDUCED", which is always added at the end of the tested time series. The designation can be standardly observed in all the figures and tables where it occurs in this study.
To evaluate the suitability of the model, a criterion was chosen that a model with a smaller number of input predictors with approximately the same accuracy and less time required for the calculation will have higher weight than the one containing all the tested predictors (Table 2 and Figure S1). Therefore, taking into account the time aspect necessary to calculate all the repeated classifications of the changes, a total of 30 iterations per unit time were undertaken. For this reason, a "model score" was designed according to the following relationship: where T is the total time needed for the calculation (min), and F1 is the average F score for the change class (permanent grassland to arable land) after n iterations: For a better interpretation, the quantity S was normalized between the range of values < 0; 1 > according to the following relationship: The quantity S is a simple ratio between the time required for the calculation and the overall average accuracy of classification. The closer the value of S is to 0, the better. The basic assumption is that the classification of the changes with fewer input predictors will take less time than the classification with the original number of predictors tested. The quantity S should, therefore, take on lower values if the total time needed to calculate the classification of the changes is lower.
A standard accuracy assessment was performed for each of the 30 classifications of the changes from permanent grassland to arable land. A standard error matrix was used as a criterion for evaluating the accuracy, from which the overall accuracy of classification, including user and processing accuracies, were calculated [106,107]. Although there are already more modern approaches to evaluating the accuracy of classification [108,109], we decided to use traditional approaches in evaluating the accuracy [106,107] because the evaluation of the overall accuracy was implemented in the process of classifying changes in the form of automatic scripts in the R programming language [110]. Nontraditional methods [108,109] are more difficult to calculate and thus prolong the total time taken for the necessary calculations. The experiments were performed on a desktop workstation with an AMD Ryzen 3900X processor and 64 GB RAM.
The process of detecting changes involved statistical evaluation of the results based on the overall accuracy using the Friedman test [111][112][113], which revealed statistically significant differences at the applied level of significance, α = 0.05. As this test does not have the ability to detect statistically significant differences between combinations of the tested biological predictors (Table 2 and Figure S1), for this purpose, it was necessary to use the so-called post hoc test, which is able to detect these differences. The Nemenyi test was used as the post hoc test [111]. A general scheme of the whole methodological process of this study is presented in Figure 2.

Results
A total of 22 different combinations of biological predictors were tested, including the original spectral bands 5, 6, and 7 belonging to the red edge region. Both the univariate time series ( Figures  S2-S6A) and multivariate time series (Figures S6B and S7) were tested, including their reduced form, which is shown in green boxes in Figures S2-S7.
A general overview of the importance of individual spectral bands and biological predictors (hereinafter referred to as predictors) is provided in Figures S2-S7, which shows the average values of the MDA for the original dataset. The green box indicates a targeted selection of the most important predictors that were used to further the change classification experiments and then compare them with the original dataset based on a statistical test. From the box graphs, it is possible to derive the degree of relevance of the individual predictors. Because the individual predictors were tested in the form of univariate and multivariate time series, the random forest classifier was, in this case, based on the highest average MDA value able to automatically derive the most suitable date within the time series of the partial predictor.
The most frequent acquisition date that was marked as the most important was the August acquisition date from 2015 ( Figures S2, S4A, S5B, S6, and S7). These were the fifth spectral band, the sixth spectral band, FCOVER, CWC, NDVI, the multivariate time series from spectral bands including red edge, and the multivariate time series only containing biological predictors. In the case of the seventh spectral band (Figure S3), the April date from 2018 was chosen as the most important, while the June date from 2019 was chosen as the most suitable in other cases ( Figures S3B and S4B), namely, the time series of predictors FAPAR and LAI. The September date of 2018 was chosen in only one case for CAB ( Figure S5A). The second and third positions were, to a greater extent, alternately occupied by the September date from 2018 and the August date from 2016, which supplemented the June date from 2019. Figure 3 includes scatter plots that show the scatter of the overall accuracy values, including the user and processing accuracies, after 30 replications of the classification. Figure 3A summarizes the overall accuracies. The results can be divided into two basic groups. The first group had overall average accuracy around 99% and included the fifth and seventh spectral bands of the satellite Sentinel-2, FAPAR, FCOVER, CWC, and the multivariate time series consisting of spectral bands 5, 6, and 7. Some partial classification models achieved 100% accuracy, including the cases of the vegetation index NDVI and FAPAR. The second group were biological predictors whose variance in The last step was to calculate the change from grassland to arable land in the observed case study. The calculation was performed from the result database, and the total area of changes in km 2 and the percentage of changes of the total area were calculated.

Results
A total of 22 different combinations of biological predictors were tested, including the original spectral bands 5, 6, and 7 belonging to the red edge region. Both the univariate time series (Figures S2-S6A) and multivariate time series (Figures S6B and S7) were tested, including their reduced form, which is shown in green boxes in Figures S2-S7.
A general overview of the importance of individual spectral bands and biological predictors (hereinafter referred to as predictors) is provided in Figures S2-S7, which shows the average values of the MDA for the original dataset. The green box indicates a targeted selection of the most important predictors that were used to further the change classification experiments and then compare them with the original dataset based on a statistical test. From the box graphs, it is possible to derive the degree of relevance of the individual predictors. Because the individual predictors were tested in the form of univariate and multivariate time series, the random forest classifier was, in this case, based on the highest average MDA value able to automatically derive the most suitable date within the time series of the partial predictor.
The most frequent acquisition date that was marked as the most important was the August acquisition date from 2015 ( Figures S2, S4A, S5B, S6, and S7). These were the fifth spectral band, the sixth spectral band, FCOVER, CWC, NDVI, the multivariate time series from spectral bands including red edge, and the multivariate time series only containing biological predictors. In the case of the seventh spectral band (Figure S3), the April date from 2018 was chosen as the most important, while the June date from 2019 was chosen as the most suitable in other cases ( Figures S3B and S4B), namely, the time series of predictors FAPAR and LAI. The September date of 2018 was chosen in only one case for CAB ( Figure S5A). The second and third positions were, to a greater extent, alternately occupied by the September date from 2018 and the August date from 2016, which supplemented the June date from 2019. Figure 3 includes scatter plots that show the scatter of the overall accuracy values, including the user and processing accuracies, after 30 replications of the classification. Figure 3A summarizes the overall accuracies. The results can be divided into two basic groups. The first group had overall average accuracy around 99% and included the fifth and seventh spectral bands of the satellite Sentinel-2, FAPAR, FCOVER, CWC, and the multivariate time series consisting of spectral bands 5, 6, and 7. Some partial classification models achieved 100% accuracy, including the cases of the vegetation index NDVI and FAPAR. The second group were biological predictors whose variance in values reached the lowest values of about 97% in the overall accuracy of classification. These were generally the reduced models ( Figures S2-S7) of all the tested biological predictors, including the spectral bands, where, in general, a decrease in the overall accuracy of classification of changes and an increase in the variance of values was evident. The highest variance and decrease in values of the overall accuracy of classification was in the case of FCOVER and the vegetation index NDVI. In contrast, the most stable trend regarding the overall accuracy of classification was achieved by the biological predictors FAPAR and LAI and the time series composed of the original spectral bands from the Sentinel-2 satellite.   In the case of processing accuracy for the class of observed changes ( Figure 3B), a similar trend was observed as that of the overall accuracy, where, in three specific cases, it was evident (the tested original red edge bands) that it was possible to achieve 100% accuracy with a mean value of around 99.7%. The lowest values were reached in the case of the reduced model for the multivariate time series from the original spectral bands ( Figure S6B). It can be seen that the producer's accuracy values of the change class were similar for all the tested predictors ( Table 2) and were generally less scattered than the producer's accuracy of the no-change class ( Figure 3C). The highest producer's accuracy for the change class ( Figure 3B) was achieved by classifying the seventh spectral band together with the lowest variance of the processing accuracy values, and it can therefore be described as the most suitable predictor in this direction.
Regarding the processing accuracy in the no-change class ( Figure 3C), similar trends were observed, i.e., a decrease in the accuracy of the classification models where the input predictors were reduced (green boxes in Figures S2-S7). Here, the lowest values of producer's accuracy were below the limit of 95%, specifically in the cases of the biological predictor FCOVER and the seventh spectral band. It should be noted that these are still very high producer's accuracy values. The biological predictor CAB in its reduced form was certainly a surprise as it showed the smallest variance in the processing accuracy values. Figure 3D summarizes variance of the user precision values for the change class, where the best predictors were the biological predictors LAI, FCOVER, and FAPAR. Here, the highest decrease in values were observed in the reduced models for the biological predictors FAPAR, FCOVER, and CAB and the vegetation index NDVI. There was a noticeable difference of up to 5% between the maximum and minimum values in user accuracy of the change class. The last case ( Figure 3E), i.e., user accuracy for the no-change class, generally provided very balanced results, both in the case of the original predictors and the reduced ones. Again, there were cases that achieved 100% accuracy involving a wide range of predictors, including the fifth spectral band, FCOVER in a reduced form, the NDVI vegetation index, etc. Further details can be found by comparing the results in Figure 3.
The normalized metric S (Figure 4) is a function of the time needed to calculate the classification and F1 score of the change class (change from grassland to arable land), according to which it is possible to best evaluate the relevance of individual classification models (Figures S2-S7). The lowest values of this quantity were reached by the biological predictors with models in a reduced form (green boxes in Figures S2-S7), namely, FAPAR, CWC, LAI, NDVI, and FCOVER. All the other cases showed, more or less, the same results, and it was clear that metric S (Figure 4) took into account the time aspect very significantly and allowed evaluation of the potential relevance of the partial predictor for the sought change class.
The statistical analysis based on the Friedman test [111][112][113] revealed statistically significant differences between individual biological predictors, which occurred in a large number of cases ( Figure S1). In general, statistically significant differences occurred between the reduced models ( Figures S2-S7) and the original datasets of the tested predictors. An example is the statistically significant difference between the biological predictor FAPAR and the fifth spectral band in a reduced form (Figures S2-S7). Some other examples are the statistically significant differences between the vegetation index NDVI in its original form and its reduced model ( Figure S7). The same was true for the biological predictor LAI as well as for FAPAR and FCOVER. In general, no statistically significant differences were observed for the original predictors. Other cases of statistically significant differences include the biological predictor LAI and the dataset of the vegetation indices ( Figure S6B) or a comparison of the dataset of the original spectral bands (Figures S6B and S7) with the fifth, sixth, and seventh spectral bands in a reduced form and then the biological predictors FAPAR and FCOVER in a reduced form.  Figure 5A demonstrates the spatial distribution of changes, while 5B shows the spatial distribution of the probabilities of the expected changes within the soil blocks. In addition to the classic assignment of thematic information contained in the training data, the random forest classifier also provides the probability of a given classification class, i.e., in this case, the probability of a change from permanent grassland to arable land. Based on this, it is possible to estimate the degree of uncertainty of change detection within each specific soil block based on the assigned probability. Table 3 summarizes the areas of changes/no changes and classes of the probability. The changes from permanent grassland to arable land were detected on 28 km 2 (0.8% of the total observed area). The classes with the highest probability of change (61-80% and 81-100%) covered 12.1 km 2 . When comparing the outputs in Figure 5, it is clear that the largest share of changed soil blocks was in the probability range of 41-60%, while there were also a large share of soil blocks that had an expected change of 61-80% and 81-100%. The random forest classifier thus provides a tool for expressing the degree of uncertainty of both the change and no-change class, which is a very suitable feature, especially in the case of changes in vegetation areas where the degree of uncertainty of the detected changes is relatively high due to phenological phases of vegetation. A large number of statistically significant differences was found, and their general overview is offered in Figure S1. This fact shows that the null hypothesis does not apply, which assumes that the overall accuracies of the original and reduced predictors ( Figures S2-S7) do not provide identical results. The statistical test confirmed that if we purposefully select a certain number of predictors on the basis of the highest average MDA, we will obtain different results that are statistically significant. Figure 5A demonstrates the spatial distribution of changes, while 5B shows the spatial distribution of the probabilities of the expected changes within the soil blocks. In addition to the classic assignment of thematic information contained in the training data, the random forest classifier also provides the probability of a given classification class, i.e., in this case, the probability of a change from permanent grassland to arable land. Based on this, it is possible to estimate the degree of uncertainty of change detection within each specific soil block based on the assigned probability. Table 3 summarizes the areas of changes/no changes and classes of the probability. The changes from permanent grassland to arable land were detected on 28 km 2 (0.8% of the total observed area). The classes with the highest probability of change (61-80% and 81-100%) covered 12.1 km 2 . When comparing the outputs in Figure 5, it is clear that the largest share of changed soil blocks was in the probability range of 41-60%, while there were also a large share of soil blocks that had an expected change of 61-80% and 81-100%. The random forest classifier thus provides a tool for expressing the degree of uncertainty of both the change and no-change class, which is a very suitable feature, especially in the case of changes in vegetation areas where the degree of uncertainty of the detected changes is relatively high due to phenological phases of vegetation.

Discussion
The main objective of this study was to develop and test a methodology of change detection from permanent grassland to arable land using the random forest classifier and then to identify the most suitable predictors of the time series of images on the basis of the selected MDA parameter.
For this reason, the random forest classifier and its internal metric MDA were used and tested. This study took into account not only the average MDA of the input predictors (time frame), which, in our case, were the biological predictors (Table 2) and the original spectral bands, but also the time required to calculate the classification changes combined with the overall accuracy and the proposed S metric (see Equations (1)−(3)). There were a total of three criteria according to which the relevance of individual predictors were evaluated: (1) the time acquisition of the satellite images; (2) the speed of the calculation of change detection in accordance with the overall accuracy of the change class; and (3) the overall accuracy of change detection, which was statistically evaluated based on the Friedman test [111][112][113].
The overall accuracy of the detection of changes from permanent grassland to arable land was subject to a statistical evaluation based on the Friedman test ( Figure S1). Statistically significant differences between individual cases were detected (Table 2 and Figure S1). Statistically significant

Discussion
The main objective of this study was to develop and test a methodology of change detection from permanent grassland to arable land using the random forest classifier and then to identify the most suitable predictors of the time series of images on the basis of the selected MDA parameter.
For this reason, the random forest classifier and its internal metric MDA were used and tested. This study took into account not only the average MDA of the input predictors (time frame), which, in our case, were the biological predictors (Table 2) and the original spectral bands, but also the time required to calculate the classification changes combined with the overall accuracy and the proposed S metric (see Equations (1)−(3)). There were a total of three criteria according to which the relevance of individual predictors were evaluated: (1) the time acquisition of the satellite images; (2) the speed of the calculation of change detection in accordance with the overall accuracy of the change class; and (3) the overall accuracy of change detection, which was statistically evaluated based on the Friedman test [111][112][113].
The overall accuracy of the detection of changes from permanent grassland to arable land was subject to a statistical evaluation based on the Friedman test ( Figure S1). Statistically significant differences between individual cases were detected (Table 2 and Figure S1). Statistically significant differences were detected between the input predictors in a comprehensive form and their reduced models (green boxes in Figures S2-S7). The reduced model achieved a high classification accuracy ranging from 95 to 100%, including the user and processing accuracies for both the change class and the no-change class (Figure 3). If we compare our results with related studies, Xu et al. [49] evaluated changes between 1984 and 2016 in selected areas in Africa using the random forest classifier and Landsat data and achieved accuracy oscillating between 56 and 94%. Weeks et al. [53] used Landsat data for mapping grassland changes in New Zealand. The authors applied several traditional change detection techniques, such as image differencing, postclassification change detection, and visual interpretation, and reported lower accuracies of 47 to 56% in the case of image differencing and postclassification change detection and up to 98% accuracy in the case of visual interpretation. The random forest classifier was also used for mapping lowland native grassland communities in Tasmania based on Landsat and Worldview 2 data [75]. The authors of this study reached overall accuracies between 54 and 87%.
To evaluate the stability of the importance of input predictors, it is appropriate to repeat the image classification several times [80,82]. Based on this process, the best single model is selected using MDA or MDG. In some cases, it is not clear whether it is more appropriate to use MDA or MDG to detect the relevant predictors [83]. There are studies that say MDG is more appropriate [72], while others say MDA is more appropriate [73,82]. This study used MDA as a criterion to evaluate the time the images were taken (importance of the predictors) and an iterative approach to classify changes and generalized information based on the average MDA. The results confirm that MDA is able to select relevant predictors, and it can therefore be reliably used for feature selection based on the methodology proposed in this study. Proper selection of the relevance of biological variables tested in this study can be derived from stable MDA values, and the random forest classifier may respond relatively reliably to changes in the values of the input predictors. Thus, one of the goals of this study was fulfilled.
In this study, a total of 22 datasets consisting of individual biological predictors (Table 2) were tested, including the red edge bands, either in a univariate or multivariate form. The five most important predictors (FAPAR, CWC, LAI, NDVI, and FCOVER in a reduced form) can be marked as the most suitable based on the results of metric S (Figure 4). These biological predictors (Table 2) can contribute to the overall overview of the plant physiology in terms of photosynthesis, respiration, and transpiration [114,115] as well as biomass content [91]. Many studies have concentrated on modelling the relationship between biological variables, such as LAI or FAPAR [116][117][118][119]. In comparison to the traditionally used vegetation indices derived from remote sensing data, biological predictors provide complex and detailed information about physiological processes in plants. It was demonstrated that biological predictors (Table 2) provide reliable and accurate results and can serve as an alternative to standard vegetation indices, such as NDVI. This information is very valuable in several applications, e.g., in precision farming [95,96].
Among the most similar studies that have addressed determination of the best predictor for the purpose of detecting changes is the study by Klouček et al. [51]. To detect changes from permanent grassland to arable land, the authors used a pair of scenes from the Landsat 8 satellite between 2013 and 2016. The method of logistic regression was used to select the best predictors, and subsequent changes were classified using the support vector machines algorithm [103][104][105]. The best change detection was performed with 89.80% accuracy (Kappa 0.63).
The correct selection of the bitemporal images that replicate the phenological phases of vegetation may seem to be a suitable alternative, as recommended by Šandera and Štych [71], who, in their study, used a pair of Landsat images to detect changes similar to the above case [51] using boosting methods to classify changes from arable land to permanent grassland. The highest overall accuracy of 89.51% was achieved.
Our study used multitemporal Sentinel-2 data, including selected spectral bands comprising the red edge bands. The most valuable findings in this study can be treated as demonstration of the effectiveness of the Sentinel-2 satellite and its importance in agricultural mapping. Biological predictors have been freely available for a large number of users since the Sentinel-2 satellite launch. Our study also indirectly shows that optical data from the Sentinel-2 satellite with a higher spatial and temporal resolution are able to achieve high accuracy in detecting changes from grassland to arable land. On the other hand, the Sentinel-2 data have a limit to provide cloudless images for such a large area as our case study in the vegetation period. A similar study in which multitemporal Sentinel-2 data was used is the study of Grabska et al. [120], where the authors tried to reveal the time aspect of the acquisition of a satellite image and its overall effect on the accuracy of classification of the species composition of forest stands in mountainous terrain.
The reference data from the LPIS database can be considered the main shortcoming of our study, with errors in the form of outdated information in this database also being possible. In situ checks are performed randomly and not systematically, and 100% accuracy of all the LPIS data cannot therefore be guaranteed. This shortcoming can be eliminated in the future on the basis of a systematic field survey conducted by ourselves, where selected soil blocks will be monitored at regular intervals and supplemented with knowledge from the LPIS database.
The method we have developed based on the random forest classifier and its ability to predict probabilities of potential changes ( Figure 5) can serve as a source for targeted in situ inspections during the vegetation season. These targeted inspections can reveal potential false changes in the final landcover map/LPIS database and sensitivity of the random forest classifier to noise. The proposed procedure can be used to partially automate change detection and especially to update the LPIS database.

Conclusions
In this study, it was demonstrated that the random forest classifier and its internal MDA metric are able to successfully detect relevant predictors for changes from permanent grassland to arable land. It was demonstrated that if a multivariate time series of selected predictors is used, which are subsequently reduced to a selected number of the most important ones, the relevant results can be determined using the proposed normalized metric S. Empirical selection of the most important predictors in the form of very high classification accuracy of change detection from permanent grassland to arable land is of great significance. FAPAR, FCOVER, LAI, CWC, and NDVI can be considered the most relevant predictors here. Overall, all the set goals of this study were met. Future research will implement average MDG as an additional metric to evaluate the relevance of predictors provided by the random forest classifier. The proposed procedure can be used to partially automate change detection and especially to update the LPIS database. This study was performed in the Czech Republic. It would be desirable to carry out testing and evaluation both in other countries with similar climatic conditions and in regions with different geographical conditions. The input data, scripts, and other tools that were used to carry out this study, including instructions on how to use them, is freely available within the Github repository.

Supplementary Materials:
The following are available online at https://1drv.ms/u/s!Ah28WO5IO-6ZgmZuBugDHHpV18BR?e=gRmG8J. Figure S1: Pairwise comparison of the p values of the tested biological predictors based on the post-hoc Nemenyi test at 95% level of significance, Figure S2: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for Spectral Bands 5 and 6 of Sentinel 2 Satellite, Figure S3: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for Spectral Bands 7 and FAPAR, Figure S4: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for FCOVER and LAI, Figure S5: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for CAB and CWC, Figure S6: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for NDVI and Spectral Bands 5, 6, 7 together as one dataset, Figure S7: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for FAPAR, FCOVER, CAB, CWC and NDVI together as one dataset.
Sample data and R scripts used for this study are available within Github: https://github.com/Iricek/Remote-Sensing-in-R.
Funding: This study was supported by the European Union's Caroline Herschel Framework Partnership Agreement on Copernicus User Uptake under grant agreement no. FPA 275/G/GRO/COPE/17/10042, project FPCUP (Framework Partnership Agreement on Copernicus User Uptake), Action 2019-2-49 "Developing supports for monitoring and reporting of GHG emissions and removals from land use, land use change, and forestry" (219/SI2.818795/07 (CLIMA)) and the UNCE "Charles University Research Center program UNCE/HUM/018".