Next Article in Journal
Effects of Land Certification for Rural Farm Households in Ethiopia: Evidence from Gozamin District, Ethiopia
Previous Article in Journal
An Evaluation of Public Spaces with the Use of Direct and Remote Methods
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Selecting Relevant Biological Variables Derived from Sentinel-2 Data for Mapping Changes from Grassland to Arable Land Using Random Forest Classifier

EO4Landscape Research Team, Department of Applied Geoinformatics and Cartography, Faculty of Science, Charles University, Albertov 6, 128 42 Prague 2, Czech Republic
*
Author to whom correspondence should be addressed.
Land 2020, 9(11), 420; https://doi.org/10.3390/land9110420
Submission received: 4 September 2020 / Revised: 19 October 2020 / Accepted: 24 October 2020 / Published: 30 October 2020

Abstract

:
Permanent grassland is one of the monitored categories of land use, land use change, and forestry (LULUCF) within the climate concept and greenhouse gas reduction policy (Regulation (EU) 2018/841). Mapping the conditions and changes of permanent grasslands is thus very important. The area of permanent grassland is strongly influenced by agricultural subsidy policies. Over the course of history, it is possible to trace different shares of permanent grassland within agricultural land and areas with significant changes from grassland to arable land. The need for monitoring permanent grassland and arable land has been growing in recent years. New subsidy policies determining farm management are beginning to affect land use, especially in countries that have joined the EU in recent waves. The large amount of freely available satellite data enables this monitoring to take place, mainly owing to data products of the Copernicus program. There are a large number of parameters (predictors) that can be calculated from satellite data, but finding the right combination is very difficult. This study presents a methodical, systematic procedure using the random forest classifier and its internal metric of mean decrease accuracy (MDA) to select the most suitable predictors to detect changes from permanent grassland to arable land. The relevance of suitable predictors takes into account the date of the satellite image, the overall accuracy of change detection, and the time required for calculations. Biological predictors, such as leaf area index (LAI), fraction absorbed photosynthetically active radiation (FAPAR), normalized difference vegetation index (NDVI), etc. were tested in the form of a time series from the Sentinel-2 satellite, and the most suitable ones were selected. FAPAR, canopy water content (CWC), and LAI seemed to be the most suitable. The proposed change detection procedure achieved a very high accuracy of more than 95% within the study site with an area of 8766 km2.

1. Introduction

The area and structure of agricultural land is significantly influenced by the development of methods and technologies of agricultural management. In recent years, in the context of agricultural management and the associated development of land use, it has influenced the effects of the global market and subsidy policies, especially agricultural, energy, and landscaping policies. These are mainly specific subsidy programs within the European Union, where subsidies are provided for agriculturally oriented lands for specific use of the given land unit [1,2,3,4]. As a result of newly introduced subsidy policies and changes in the food market, agricultural production and the structure of agricultural crops have changed in recent years. Mapping the structure of agricultural land use is very important [5], especially the structure of cultivated crops, changes in the share of individual agricultural areas, and the emphasis on changes between permanent grassland and arable land [5,6,7]. Permanent grasslands are important landscape components with high ecological value [8] and strong effect on reducing soil erosion [9]. Constant reduction in the area of these biomes has a very negative effect not only on the landscape as a whole in Europe [10] but also on the absorption of greenhouse gases due to the use of land for agricultural purposes [11].
It is obvious that changes from grassland to arable land not only play a major role in production change [12] but also influence the local landscape and rural development [13]. Grasslands are a very valuable habitat in terms of soil erosion reduction [14], which is a worldwide problem [15]. On the other hand, arable land is considered as an area with low soil erosion protection [16]. Changes from grassland to arable land may lead to biodiversity loss [17,18], especially in semiarid regions, where extreme climate conditions occur [19]. Mapping changes from grassland to arable land is essential to monitor both agriculture and climate conditions in order to find the right balance between needs of humankind and planetary sources [8]. Remote sensing technologies offer a fast and efficient way to map large-scale areas, especially in cases where a lot of satellite data are freely available, such as with Sentinel-2 satellites [7].
European authorities and national authorities of individual states, with the Czech Republic being no exception [20], provide considerable funding for the development of monitoring/control systems for the use of agricultural land. For these purposes, a specifically oriented geographic information system called the land parcel identification system (LPIS) was developed in the Czech Republic [21], in which registered land blocks (basic unit LPIS) are monitored on the basis of aerial and satellite images of very high spatial resolution. Information on users and specific use of the type of agricultural land is stored as attribute data of individual vector layers (http://eagri.cz/public/app/lpisext/lpis/verejny2/plpis/). The control and validation of these data take place on the basis of a field survey and using remote sensing methods. Monitoring changes within LPIS soil blocks offers effective standardization and time series from the satellite data, thereby providing a promising method to monitor the current use and changes of agricultural land as well as check for any deviations from the registered state [22,23].
It follows from the above that effectively monitoring the state of and changes in the use of agricultural land with the help of remote sensing is a highly relevant topic. From the point of view of both agricultural and environmentally oriented policies, there is high need for up-to-date information on the area and changes of permanent grassland and arable land [24,25,26,27], especially larger territorial units [28,29,30]. Recently, a number of methods have been developed to detect conditions and changes using remote sensing technologies. Freely available satellite images from Landsat and Sentinel-2 satellites, which contain extensive data archives [31,32,33,34], are widely used for this purpose. These archives allow time series suitable for monitoring land use/land cover changes to be created. Furthermore, there is a need to automate classification processes [35] so that satellite image time series can be processed quickly and efficiently with the required degree of accuracy. In this direction, certain progress has been made with data archives from the Sentinel-2 satellite, which, compared to Landsat, offers a higher spatial and temporal resolution as well as specifically designed spectral bands in the red edge [36,37,38,39]. The higher temporal resolution of Sentinel-2 data is suitable for use in a time series [40,41,42,43,44], which allows oscillations between individual phenological phases of vegetation to be captured.
To date, a wide range of specific studies have been published to map agricultural areas based on multitemporal data [22,40,41,45,46,47,48,49,50]. Although significant progress has been made in the accuracy of classification of agricultural lands, the mapping of changes in permanent grassland and arable land remains a relevant research challenge [51,52,53,54]. Mapping changes from permanent grassland to arable land is a relatively challenging matter as the change detection process requires capturing and accurately analyzing changes during individual phenological phases [55], which are highly variable due to the high heterogeneity in arable land/permanent grassland use and different local and climatic conditions [56].
Different spectral bands and vegetation indices are widely used in change detection methods using time data [39,57,58,59]. From this point of view, a time series of the satellite data can be divided according to whether one variable is used, i.e., a single vegetation index for a defined time period [41,60,61,62], or more variables are used, for example, the original spectral bands + vegetation index, optional radar data, and optical data [54,63,64,65]. In addition to the possibility of calculating traditionally conceived indices, data from Sentinel-2 offer calculations of specific indices that are freely available in the Sentinel Application Platform (SNAP) software [66]. These mainly consist of the following biological predictors: fraction absorbed photosynthetically active radiation (FAPAR), leaf area index (LAI), fractional vegetation cover (FCOVER), content of chlorophyll A in the plant leaves (CAB), and canopy water content (CWC) [39,66,67].
An important methodological aspect of change detection is the selection of predictors that best describe and detect the observed phenomena [68,69,70]. This is mainly the effect of the time acquisition of a satellite image within a time series and a specific biological predictor or spectral band as many predictors are correlated and bring redundant information to the identification of the observed phenomena [51,71,72]. To date, various methods have been developed and tested to determine suitable predictors. A specific example is the use of the nonparametric classifier random forest [73], which has proven its reliability in many studies in remote sensing [49,53,61,74,75,76,77]. The random forest classifier [73] is a nonparametric classifier that belongs to the family of decision trees. Its strengths include an overfitting resistance, the ability to classify large data volumes, and, above all, the identification of suitable predictors based on its internal mean decrease accuracy (MDA) and mean decrease Gini (MDG) index metrics [73]. Research and extensive discussions are currently underway as to which of the above metrics is more appropriate to reveal the relevance of the most important predictors [78,79,80]. In principle, both MDA and MDG values epend on mutual correlation of the input predictors but mainly on the number of decision trees [81]. In our case, 1000 decision trees were constantly set up. Another parameter is mtry, which determines the number of predictors in each node of the tree structure [73]. The mtry parameter is calculated as the square root of the total number of predictors [77] entering the classification process.
For this reason, according to some studies, it is recommended that the classification for the selection of significant predictors is repeated as the list of important predictors derived from one model of the random forest classifier may be skewed [80,82]. The final selection of suitable predictors based on the random forest classifier can take place empirically, where the most suitable predictors are selected based on the experience of the producer [78,83] or undertaken fully automatically [81,82,83,84]. The disadvantage of the fully automatic methods is that they are derived from a single model of the random forest classifier, which, according to Millard and Richardson [82], can lead to misleading results. Therefore, there is a continuing need to conduct studies to select suitable predictors and adapt them for specific purposes, such as detecting changes from permanent grassland to arable land.
The main objectives of this study were as follows:
  • developing and testing a methodology to detect changes from permanent grassland to arable land using the random forest classifier with the help of Sentinel-2 time series data;
  • identifying the most suitable predictors of the time series of images [22] on the basis of the selected MDA parameter;
  • finding statistically significant differences based on the overall average accuracy of the detection of changes between individual predictors [39]; and
  • discussing the achieved results and comparing them with similarly oriented studies.
The following research question was formulated:
  • Does one multitemporal predictor or a combination of them provide the best results within a time series?

2. Study Area

The study site is located in the Czech Republic, including the administrative units of the Pilsen, Karlovy Vary, and Ústí nad Labem regions. The central point of the site of interest is located at coordinates X = 358,773 and Y = 554,9188. The corner points have the coordinates Xmin = 300,475, Ymin = 550,1255, Xmax = 409,195, and Ymax = 559,9195. The information is all in the WGS84 UTM 33N coordinate system. The whole area is approximately 8766 km2 (Figure 1). The area of interest is vegetatively, topographically, and ecologically highly variable with frequent occurrence of permanent grassland and arable land. The study site was deliberately chosen due to the increased incidence of changes from permanent grassland to arable land. When selecting the study site, the criterion of localization within one scene was taken into account, but this requirement could not be met due to the occurrence of clouds. Changes from permanent grassland to arable land were detected based on the vector overlap of layers from the LPIS database. This vector overlay also served as a reference data source for training the random forest classifier.

3. Materials and Methods

3.1. Input Data

A time series of optical data from the Sentinel-2 satellite was used to process this study. A total of 5 satellite images were selected (Table 1). As the images were selected based on a predetermined criterion, i.e., images from 2015 to 2019, they had to be completely cloudless. In this case, the cloud cover was less than 5%. The satellite data were selected so that it was possible to find the highest possible number of cloudless images for each year between 2015 and 2019 during the growing season, which ranges from April to the end of September. According to Esch et al. [14], the most suitable period for capturing changes between grassland and arable land is immediately after the harvest of field crops, i.e., late summer or early autumn. A similar view is held by Možný et al. [85,86,87]. As one of the main tasks of this study was to determine the importance of time-lapse photography, it was necessary to include multitemporal data throughout the growing season. The time period from April to September was chosen due to a significant increase in temperature over recent years, especially in the Ore Mountains [88], which also includes part of the selected study site. Due to the general increase in the temperature, it can be assumed that all the field crops and grass species will be in the full phase of phenological growth at this time interval. Therefore, their spectral response will overlap, and it will be possible to better evaluate the effect of time-lapse satellite imagery. Potentially, it is possible to better evaluate sensitivity of the random forest classifier to the content of redundant information in the predictors that are subject to change detection. The images were downloaded from the United States Geological Survey (USGS) data archive [31] at Level 1C processing, with the images being manually converted to the surface reflectance. Table 1 documents the data used. In total, only 5 images were used as it was not possible to obtain a complete time series for each year (2015, 2016, 2017, 2018, and 2019) in a cloudless state between April and September. For each year, only completely cloudless images were obtained, as shown in Table 1. A time series was created from the selected images, which were subjected to partial analyses.
Vector layers from the LPIS database were used as the reference data. LPIS is a geographic information system that is specially designed for the monitoring and registration of agricultural activities in the Czech Republic. The basic unit of this system is a soil block, which is defined as the agricultural land on which crops are grown in a regular sequence, possibly under greenhouses or under a fixed and portable cover of an agricultural crop that is not grassland. This definition is based on the valid legislation of the Czech Republic under the Ministry of Agriculture. The vector data are continuously updated, and their accuracy is verified on the basis of field surveys. The LPIS database contains information on agricultural crops and areas, which are listed here as two thematic classes of land cover: arable land and permanent grassland.
Arable land is defined as an agricultural area on which grass or other fodder is not grown.
Permanent grassland is an area where grass has been grown for more than five years, and the land is subject to the European Union’s agricultural subsidy policy.
Arable land and permanent grassland polygons were extracted from LPIS and served as the reference data for further processing. All soil blocks containing permanent grassland and arable land were extracted from the LPIS vector database. Identification of the reference change soil blocks was performed on the basis of the spatial overlap between 2015 and 2019. The LPIS vector data was used as a mask. The two LPIS vector layers were exported on 30 June 2015 and 30 June 2019. The polygons thus obtained were used as reference to generate training data for the random forest classifier.

3.2. Methods Used

The first methodological step was to perform atmospheric corrections. The downloaded satellite data were converted to surface reflectance values using the Sen2Cor v. 2.5 module [89,90]. Subsequently, 20 m bands were resampled by the nearest neighbor method to the same pixel size as the bands of the visible band and near infrared (NIR) (8), i.e., 10 m. Then, all the biological predictors [67] were calculated, which are standardly implemented in the freely available software SNAP [66], including the vegetation index NDVI. Biological predictors were selected on the basis of several elaborated studies focused on plant physiology or biomass monitoring [91,92,93,94] or precision agriculture [95,96] as well as based on our previous experiences and testing [71]. The original spectral bands from the red edge bands (spectral bands 5, 6, and 7), were chosen as further input predictors. Table 2 gives a list of all the predictors used. A detailed description of the algorithm and the method of calculation of FAPAR, FCOVER, LAI, CAB, and CWC can be found in the relevant literature (the field references in Table 2).
After calculating the biological predictors, the classification of changes was performed using the random forest algorithm [73]. Because the random forest algorithm is intended for a supervised image classification, it was necessary to create training data in the form of training points. A total of 2000 training points were created by the method of stratified random sampling on the basis of change polygons from the LPIS database. The basic classes were defined as change surfaces and no-change surfaces. The category of change areas determined the change from permanent grassland to arable land. It should be noted that the category of change areas within the image was much less extensive compared to the territory without this type of change. However, the number of reference points for the change class was defined by the same number as that for the class where no changes took place, i.e., 1000 points each. The random forest classifier tends to place more weight on the classification class for which it has more training data available [105]. However, in the case of the random forest classifier, this is the recommended solution (see [105]).
The basic set of reference points for both categories was divided into two separate datasets at a rate of 50%, i.e., 50% of the training points from the basic set was used as the training points, and the remaining 50% was used as the validation points. A set of training and validation points was created individually for each separate change classification.
Subsequently, the changes from permanent grassland to arable land were classified in an iterative way for all the tested predictors (Table 2). The classification of the changes for each predictor and for the full time series was performed a total of 30 times. i.e., when all the data entered the classification process (see Table 1). Then, the time relevance was evaluated, i.e., the date the image was taken based on the highest average MDA. In this case, the MDA value took on specific values for each image capture date. The higher its value, the higher the relevance of the specific image capture date. The results of previous surveys have shown [82] that if the classification of changes using the random forest algorithm is performed only once, the MDA values cannot be considered stable; in other words, the specific MDA value cannot be assigned to a specific image capture date in general norm. In order to obtain relevant and stable MDA values, it is appropriate to start from a general trend, which can be obtained by simply averaging the MDA values.
The selection of the most important predictors was made empirically based on the highest average MDA after 30 classifications. The three most important predictors were selected for the univariate time series, and five of the most important predictors were selected for the multivariate time series. Another time series (Figures S2–S7) arose only from the best predictors based on the highest average MDA.
All the predictors used in Table 2 were tested as a univariate time series. The multivariate time series contained two time series. The first multivariate time series was created only from the original spectral bands comprising the red edge band, i.e., 5, 6, and 7, while the second included all the biological predictors, including the vegetation index NDVI. All methods were performed to compare differences in the overall accuracy of change as it was more appropriate to use only the original spectral bands or biological predictors (Table 2) or both. The most important predictors evaluated based on the highest average MDA were named “REDUCED”, which is always added at the end of the tested time series. The designation can be standardly observed in all the figures and tables where it occurs in this study.
To evaluate the suitability of the model, a criterion was chosen that a model with a smaller number of input predictors with approximately the same accuracy and less time required for the calculation will have higher weight than the one containing all the tested predictors (Table 2 and Figure S1). Therefore, taking into account the time aspect necessary to calculate all the repeated classifications of the changes, a total of 30 iterations per unit time were undertaken. For this reason, a “model score” was designed according to the following relationship:
S = T F 1
where T is the total time needed for the calculation (min), and F1 is the average F score for the change class (permanent grassland to arable land) after n iterations:
F 1 = 2 × ( U s e r s   A c c u r a c y × P r o d u c e r s   A c c u r a c y ) ( U s e r s   A c c u r a c y + P r o d u c e r s   A c c u r a c y ) ,
For a better interpretation, the quantity S was normalized between the range of values < 0; 1 > according to the following relationship:
S = x i m i n ( x ) m a x ( x ) m i n ( x ) ,
The quantity S is a simple ratio between the time required for the calculation and the overall average accuracy of classification. The closer the value of S is to 0, the better. The basic assumption is that the classification of the changes with fewer input predictors will take less time than the classification with the original number of predictors tested. The quantity S should, therefore, take on lower values if the total time needed to calculate the classification of the changes is lower.
A standard accuracy assessment was performed for each of the 30 classifications of the changes from permanent grassland to arable land. A standard error matrix was used as a criterion for evaluating the accuracy, from which the overall accuracy of classification, including user and processing accuracies, were calculated [106,107]. Although there are already more modern approaches to evaluating the accuracy of classification [108,109], we decided to use traditional approaches in evaluating the accuracy [106,107] because the evaluation of the overall accuracy was implemented in the process of classifying changes in the form of automatic scripts in the R programming language [110]. Nontraditional methods [108,109] are more difficult to calculate and thus prolong the total time taken for the necessary calculations. The experiments were performed on a desktop workstation with an AMD Ryzen 3900X processor and 64 GB RAM.
The process of detecting changes involved statistical evaluation of the results based on the overall accuracy using the Friedman test [111,112,113], which revealed statistically significant differences at the applied level of significance, α = 0.05. As this test does not have the ability to detect statistically significant differences between combinations of the tested biological predictors (Table 2 and Figure S1), for this purpose, it was necessary to use the so-called post hoc test, which is able to detect these differences. The Nemenyi test was used as the post hoc test [111]. A general scheme of the whole methodological process of this study is presented in Figure 2.
The last step was to calculate the change from grassland to arable land in the observed case study. The calculation was performed from the result database, and the total area of changes in km2 and the percentage of changes of the total area were calculated.

4. Results

A total of 22 different combinations of biological predictors were tested, including the original spectral bands 5, 6, and 7 belonging to the red edge region. Both the univariate time series (Figures S2–S6A) and multivariate time series (Figures S6B and S7) were tested, including their reduced form, which is shown in green boxes in Figures S2–S7.
A general overview of the importance of individual spectral bands and biological predictors (hereinafter referred to as predictors) is provided in Figures S2–S7, which shows the average values of the MDA for the original dataset. The green box indicates a targeted selection of the most important predictors that were used to further the change classification experiments and then compare them with the original dataset based on a statistical test. From the box graphs, it is possible to derive the degree of relevance of the individual predictors. Because the individual predictors were tested in the form of univariate and multivariate time series, the random forest classifier was, in this case, based on the highest average MDA value able to automatically derive the most suitable date within the time series of the partial predictor.
The most frequent acquisition date that was marked as the most important was the August acquisition date from 2015 (Figures S2, S4A, S5B, S6, and S7). These were the fifth spectral band, the sixth spectral band, FCOVER, CWC, NDVI, the multivariate time series from spectral bands including red edge, and the multivariate time series only containing biological predictors. In the case of the seventh spectral band (Figure S3), the April date from 2018 was chosen as the most important, while the June date from 2019 was chosen as the most suitable in other cases (Figures S3B and S4B), namely, the time series of predictors FAPAR and LAI. The September date of 2018 was chosen in only one case for CAB (Figure S5A). The second and third positions were, to a greater extent, alternately occupied by the September date from 2018 and the August date from 2016, which supplemented the June date from 2019.
Figure 3 includes scatter plots that show the scatter of the overall accuracy values, including the user and processing accuracies, after 30 replications of the classification. Figure 3A summarizes the overall accuracies. The results can be divided into two basic groups. The first group had overall average accuracy around 99% and included the fifth and seventh spectral bands of the satellite Sentinel-2, FAPAR, FCOVER, CWC, and the multivariate time series consisting of spectral bands 5, 6, and 7. Some partial classification models achieved 100% accuracy, including the cases of the vegetation index NDVI and FAPAR. The second group were biological predictors whose variance in values reached the lowest values of about 97% in the overall accuracy of classification. These were generally the reduced models (Figures S2–S7) of all the tested biological predictors, including the spectral bands, where, in general, a decrease in the overall accuracy of classification of changes and an increase in the variance of values was evident. The highest variance and decrease in values of the overall accuracy of classification was in the case of FCOVER and the vegetation index NDVI. In contrast, the most stable trend regarding the overall accuracy of classification was achieved by the biological predictors FAPAR and LAI and the time series composed of the original spectral bands from the Sentinel-2 satellite.
In the case of processing accuracy for the class of observed changes (Figure 3B), a similar trend was observed as that of the overall accuracy, where, in three specific cases, it was evident (the tested original red edge bands) that it was possible to achieve 100% accuracy with a mean value of around 99.7%. The lowest values were reached in the case of the reduced model for the multivariate time series from the original spectral bands (Figure S6B). It can be seen that the producer’s accuracy values of the change class were similar for all the tested predictors (Table 2) and were generally less scattered than the producer’s accuracy of the no-change class (Figure 3C). The highest producer’s accuracy for the change class (Figure 3B) was achieved by classifying the seventh spectral band together with the lowest variance of the processing accuracy values, and it can therefore be described as the most suitable predictor in this direction.
Regarding the processing accuracy in the no-change class (Figure 3C), similar trends were observed, i.e., a decrease in the accuracy of the classification models where the input predictors were reduced (green boxes in Figures S2–S7). Here, the lowest values of producer’s accuracy were below the limit of 95%, specifically in the cases of the biological predictor FCOVER and the seventh spectral band. It should be noted that these are still very high producer’s accuracy values. The biological predictor CAB in its reduced form was certainly a surprise as it showed the smallest variance in the processing accuracy values. Figure 3D summarizes variance of the user precision values for the change class, where the best predictors were the biological predictors LAI, FCOVER, and FAPAR. Here, the highest decrease in values were observed in the reduced models for the biological predictors FAPAR, FCOVER, and CAB and the vegetation index NDVI. There was a noticeable difference of up to 5% between the maximum and minimum values in user accuracy of the change class. The last case (Figure 3E), i.e., user accuracy for the no-change class, generally provided very balanced results, both in the case of the original predictors and the reduced ones. Again, there were cases that achieved 100% accuracy involving a wide range of predictors, including the fifth spectral band, FCOVER in a reduced form, the NDVI vegetation index, etc. Further details can be found by comparing the results in Figure 3.
The normalized metric S (Figure 4) is a function of the time needed to calculate the classification and F1 score of the change class (change from grassland to arable land), according to which it is possible to best evaluate the relevance of individual classification models (Figures S2–S7). The lowest values of this quantity were reached by the biological predictors with models in a reduced form (green boxes in Figures S2–S7), namely, FAPAR, CWC, LAI, NDVI, and FCOVER. All the other cases showed, more or less, the same results, and it was clear that metric S (Figure 4) took into account the time aspect very significantly and allowed evaluation of the potential relevance of the partial predictor for the sought change class.
The statistical analysis based on the Friedman test [111,112,113] revealed statistically significant differences between individual biological predictors, which occurred in a large number of cases (Figure S1). In general, statistically significant differences occurred between the reduced models (Figures S2–S7) and the original datasets of the tested predictors. An example is the statistically significant difference between the biological predictor FAPAR and the fifth spectral band in a reduced form (Figures S2–S7). Some other examples are the statistically significant differences between the vegetation index NDVI in its original form and its reduced model (Figure S6A). The same was true for the biological predictor LAI as well as for FAPAR and FCOVER. In general, no statistically significant differences were observed for the original predictors. Other cases of statistically significant differences include the biological predictor LAI and the dataset of the vegetation indices (Figure S7) or a comparison of the dataset of the original spectral bands (Figures S6B and S7) with the fifth, sixth, and seventh spectral bands in a reduced form and then the biological predictors FAPAR and FCOVER in a reduced form.
A large number of statistically significant differences was found, and their general overview is offered in Figure S1. This fact shows that the null hypothesis does not apply, which assumes that the overall accuracies of the original and reduced predictors (Figures S2–S7) do not provide identical results. The statistical test confirmed that if we purposefully select a certain number of predictors on the basis of the highest average MDA, we will obtain different results that are statistically significant.
Figure 5A demonstrates the spatial distribution of changes, while 5B shows the spatial distribution of the probabilities of the expected changes within the soil blocks. In addition to the classic assignment of thematic information contained in the training data, the random forest classifier also provides the probability of a given classification class, i.e., in this case, the probability of a change from permanent grassland to arable land. Based on this, it is possible to estimate the degree of uncertainty of change detection within each specific soil block based on the assigned probability. Table 3 summarizes the areas of changes/no changes and classes of the probability. The changes from permanent grassland to arable land were detected on 28 km2 (0.8% of the total observed area). The classes with the highest probability of change (61–80% and 81–100%) covered 12.1 km2. When comparing the outputs in Figure 5, it is clear that the largest share of changed soil blocks was in the probability range of 41–60%, while there were also a large share of soil blocks that had an expected change of 61–80% and 81–100%. The random forest classifier thus provides a tool for expressing the degree of uncertainty of both the change and no-change class, which is a very suitable feature, especially in the case of changes in vegetation areas where the degree of uncertainty of the detected changes is relatively high due to phenological phases of vegetation.

5. Discussion

The main objective of this study was to develop and test a methodology of change detection from permanent grassland to arable land using the random forest classifier and then to identify the most suitable predictors of the time series of images on the basis of the selected MDA parameter.
For this reason, the random forest classifier and its internal metric MDA were used and tested. This study took into account not only the average MDA of the input predictors (time frame), which, in our case, were the biological predictors (Table 2) and the original spectral bands, but also the time required to calculate the classification changes combined with the overall accuracy and the proposed S metric (see Equations (1)−(3)). There were a total of three criteria according to which the relevance of individual predictors were evaluated: (1) the time acquisition of the satellite images; (2) the speed of the calculation of change detection in accordance with the overall accuracy of the change class; and (3) the overall accuracy of change detection, which was statistically evaluated based on the Friedman test [111,112,113].
The overall accuracy of the detection of changes from permanent grassland to arable land was subject to a statistical evaluation based on the Friedman test (Figure S1). Statistically significant differences between individual cases were detected (Table 2 and Figure S1). Statistically significant differences were detected between the input predictors in a comprehensive form and their reduced models (green boxes in Figures S2–S7). The reduced model achieved a high classification accuracy ranging from 95 to 100%, including the user’s and producer’s accuracies for both the change class and the no-change class (Figure 3). If we compare our results with related studies, Xu et al. [49] evaluated changes between 1984 and 2016 in selected areas in Africa using the random forest classifier and Landsat data and achieved accuracy oscillating between 56 and 94%. Weeks et al. [53] used Landsat data for mapping grassland changes in New Zealand. The authors applied several traditional change detection techniques, such as image differencing, postclassification change detection, and visual interpretation, and reported lower accuracies of 47 to 56% in the case of image differencing and postclassification change detection and up to 98% accuracy in the case of visual interpretation. The random forest classifier was also used for mapping lowland native grassland communities in Tasmania based on Landsat and Worldview 2 data [75]. The authors of this study reached overall accuracies between 54 and 87%.
To evaluate the stability of the importance of input predictors, it is appropriate to repeat the image classification several times [80,82]. Based on this process, the best single model is selected using MDA or MDG. In some cases, it is not clear whether it is more appropriate to use MDA or MDG to detect the relevant predictors [83]. There are studies that say MDG is more appropriate [72], while others say MDA is more appropriate [73,82]. This study used MDA as a criterion to evaluate the time the images were taken (importance of the predictors) and an iterative approach to classify changes and generalized information based on the average MDA. The results confirm that MDA is able to select relevant predictors, and it can therefore be reliably used for feature selection based on the methodology proposed in this study. Proper selection of the relevance of biological variables tested in this study can be derived from stable MDA values, and the random forest classifier may respond relatively reliably to changes in the values of the input predictors. Thus, one of the goals of this study was fulfilled.
In this study, a total of 22 datasets consisting of individual biological predictors (Table 2) were tested, including the red edge bands, either in a univariate or multivariate form. The five most important predictors (FAPAR, CWC, LAI, NDVI, and FCOVER in a reduced form) can be marked as the most suitable based on the results of metric S (Figure 4). These biological predictors (Table 2) can contribute to the overall overview of the plant physiology in terms of photosynthesis, respiration, and transpiration [114,115] as well as biomass content [91]. Many studies have concentrated on modelling the relationship between biological variables, such as LAI or FAPAR [116,117,118,119]. In comparison to the traditionally used vegetation indices derived from remote sensing data, biological predictors provide complex and detailed information about physiological processes in plants. It was demonstrated that biological predictors (Table 2) provide reliable and accurate results and can serve as an alternative to standard vegetation indices, such as NDVI. This information is very valuable in several applications, e.g., in precision farming [95,96].
Among the most similar studies that have addressed determination of the best predictor for the purpose of detecting changes is the study by Klouček et al. [51]. To detect changes from permanent grassland to arable land, the authors used a pair of scenes from the Landsat 8 satellite between 2013 and 2016. The method of logistic regression was used to select the best predictors, and subsequent changes were classified using the support vector machines algorithm [103,104,105]. The best change detection was performed with 89.80% accuracy (Kappa 0.63).
The correct selection of the bitemporal images that replicate the phenological phases of vegetation may seem to be a suitable alternative, as recommended by Šandera and Štych [71], who, in their study, used a pair of Landsat images to detect changes similar to the above case [51] using boosting methods to classify changes from arable land to permanent grassland. The highest overall accuracy of 89.51% was achieved.
Our study used multitemporal Sentinel-2 data, including selected spectral bands comprising the red edge bands. The most valuable findings in this study can be treated as demonstration of the effectiveness of the Sentinel-2 satellite and its importance in agricultural mapping. Biological predictors have been freely available for a large number of users since the Sentinel-2 satellite launch. Our study also indirectly shows that optical data from the Sentinel-2 satellite with a higher spatial and temporal resolution are able to achieve high accuracy in detecting changes from grassland to arable land. On the other hand, the Sentinel-2 data have a limit to provide cloudless images for such a large area as our case study in the vegetation period. A similar study in which multitemporal Sentinel-2 data was used is the study of Grabska et al. [120], where the authors tried to reveal the time aspect of the acquisition of a satellite image and its overall effect on the accuracy of classification of the species composition of forest stands in mountainous terrain.
The reference data from the LPIS database can be considered the main shortcoming of our study, with errors in the form of outdated information in this database also being possible. In situ checks are performed randomly and not systematically, and 100% accuracy of all the LPIS data cannot therefore be guaranteed. This shortcoming can be eliminated in the future on the basis of a systematic field survey conducted by ourselves, where selected soil blocks will be monitored at regular intervals and supplemented with knowledge from the LPIS database.
The method we have developed based on the random forest classifier and its ability to predict probabilities of potential changes (Figure 5) can serve as a source for targeted in situ inspections during the vegetation season. These targeted inspections can reveal potential false changes in the final landcover map/LPIS database and sensitivity of the random forest classifier to noise. The proposed procedure can be used to partially automate change detection and especially to update the LPIS database.

6. Conclusions

In this study, it was demonstrated that the random forest classifier and its internal MDA metric are able to successfully detect relevant predictors for changes from permanent grassland to arable land. It was demonstrated that if a multivariate time series of selected predictors is used, which are subsequently reduced to a selected number of the most important ones, the relevant results can be determined using the proposed normalized metric S. Empirical selection of the most important predictors in the form of very high classification accuracy of change detection from permanent grassland to arable land is of great significance. FAPAR, FCOVER, LAI, CWC, and NDVI can be considered the most relevant predictors here. Overall, all the set goals of this study were met. Future research will implement average MDG as an additional metric to evaluate the relevance of predictors provided by the random forest classifier. The proposed procedure can be used to partially automate change detection and especially to update the LPIS database. This study was performed in the Czech Republic. It would be desirable to carry out testing and evaluation both in other countries with similar climatic conditions and in regions with different geographical conditions. The input data, scripts, and other tools that were used to carry out this study, including instructions on how to use them, is freely available within the Github repository.

Supplementary Materials

The following are available online at https://1drv.ms/u/s!Ah28WO5IO-6ZgmZuBugDHHpV18BR?e=gRmG8J. Figure S1: Pairwise comparison of the p values of the tested biological predictors based on the post-hoc Nemenyi test at 95% level of significance, Figure S2: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for Spectral Bands 5 and 6 of Sentinel 2 Satellite, Figure S3: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for Spectral Bands 7 and FAPAR, Figure S4: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for FCOVER and LAI, Figure S5: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for CAB and CWC, Figure S6: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for NDVI and Spectral Bands 5, 6, 7 together as one dataset, Figure S7: Average Mean Decrease Accuracy after 30 iterations of classification of changes from permanent grasslands to arable lands for FAPAR, FCOVER, CAB, CWC and NDVI together as one dataset.
Sample data and R scripts used for this study are available within Github: https://github.com/Iricek/Remote-Sensing-in-R.

Author Contributions

Conceptualization, J.Š. and P.Š.; methodology, J.Š. and P.Š; software, J.Š.; validation, J.Š. and P.Š.; formal analysis, J.Š.; investigation, J.Š. and P.Š.; resources, J.Š.; data curation, J.Š.; writing—original draft preparation, J.Š. and P.Š.; visualization, J.Š.; supervision, P.Š.; funding acquisition, P.Š.; senior author, P.Š. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the European Union’s Caroline Herschel Framework Partnership Agreement on Copernicus User Uptake under grant agreement no. FPA 275/G/GRO/COPE/17/10042, project FPCUP (Framework Partnership Agreement on Copernicus User Uptake), Action 2019-2-49 “Developing supports for monitoring and reporting of GHG emissions and removals from land use, land use change, and forestry” (219/SI2.818795/07 (CLIMA)) and the UNCE “Charles University Research Center program UNCE/HUM/018”.

Acknowledgments

We would like to thank the European Union’s Caroline Herschel Framework Partnership Agreement on Copernicus User Uptake under grant agreement no. FPA 275/G/GRO/COPE/17/10042, project FPCUP (Framework Partnership Agreement on Copernicus User Uptake), Action 2019-2-49 “Developing supports for monitoring and reporting of GHG emissions and removals from land use, land use change, and forestry” (219/SI2.818795/07 (CLIMA)) for their support. We would like to thank the UNCE “Charles University Research Center program UNCE/HUM/018” for their support. We would like to thank the three anonymous reviewers and academic editor for their suggestions and feedback, which helped us to improve the quality of the article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. An Overview of the Common Agricultural Policy (CAP) in Ireland and Potential Regional and Sectoral Implications of the Future Reforms; Dublin. 2018. Available online: https://data.oireachtas.ie/ie/oireachtas/parliamentaryBudgetOffice/2018/2018-08-17_an-overview-of-the-common-agricultural-policy-cap-in-ireland-and-potential-regional-and-sectoral-implications-of-future-reforms_en.pdf (accessed on 15 June 2020).
  2. European Commission Food, Farming, Fisheries. Available online: https://ec.europa.eu/info/food-farming-fisheries (accessed on 4 May 2020).
  3. European Commission Income Support. Available online: https://ec.europa.eu/info/food-farming-fisheries/key-policies/common-agricultural-policy/income-support (accessed on 4 May 2020).
  4. European Commission Sustainable Land Use (Greening). Available online: https://ec.europa.eu/info/food-farming-fisheries/key-policies/common-agricultural-policy/income-support/greening_en (accessed on 4 May 2020).
  5. Soons, M.B.; Messelink, J.H.; Jongejans, E.; Heil, G.W. Habitat fragmentation reduces grassland connectivity for both short-distance and long-distance wind-dispersed forbs. J. Ecol. 2005, 93, 1214–1225. [Google Scholar] [CrossRef]
  6. Toland, J.; Jones, W.; Eldridge, J.; Thorpe, E.; O’hara, E. LIFE and Europe’s Grasslands: Restoring a Forgotten Habitat; European Communities: Brussels, Belgium, 2008; ISBN 978-92-79-10159-5. [Google Scholar]
  7. Schmidt, S.; Alewell, C.; Meusburger, K. Mapping spatio-temporal dynamics of the cover and management factor (C-factor) for grasslands in Switzerland. Remote Sens. Environ. 2018, 211, 89–104. [Google Scholar] [CrossRef]
  8. Weigelt, A.; Weisser, W.W.; Buchmann, N.; Scherer-Lorenzen, M. Biodiversity for multifunctional grasslands: Equal productivity in high-diversity low-input and low-diversity high-input systems. Biogeosciences 2009, 6, 1695–1706. [Google Scholar] [CrossRef] [Green Version]
  9. Auerswald, K.; Fiener, P. Soil organic carbon storage following conversion from cropland to grassland on sites differing in soil drainage and erosion history. Sci. Total Environ. 2019, 661, 481–491. [Google Scholar] [CrossRef] [PubMed]
  10. Ceballos, G.; Davidson, A.; List, R.; Pacheco, J.; Manzano-Fischer, P.; Santos-Barrera, G.; Cruzado, J. Rapid decline of a grassland system and its ecological and conservation implications. PLoS ONE 2010, 5, e8562. [Google Scholar] [CrossRef] [Green Version]
  11. Ammann, C.; Neftel, A.; Jocher, M.; Fuhrer, J.; Leifeld, J. Effect of management and weather variations on the greenhouse gas budget of two grasslands during a 10-year experiment. Agric. Ecosyst. Environ. 2020, 292, 106814. [Google Scholar] [CrossRef]
  12. Wimberly, M.C.; Janssen, L.L.; Hennessy, D.A.; Luri, M.; Chowdhury, N.M.; Feng, H. Cropland expansion and grassland loss in the eastern Dakotas: New insights from a farm-level survey. Land Use Policy 2017, 63, 160–173. [Google Scholar] [CrossRef]
  13. Verma, P.; Raghubanshi, A.S. Rural development and land use land cover change in a rapidly developing agrarian South Asian landscape. Remote Sens. Appl. Soc. Environ. 2019, 14, 138–147. [Google Scholar] [CrossRef]
  14. Schmidt, S.; Alewell, C.; Meusburger, K. Monthly RUSLE soil erosion risk of Swiss grasslands. J. Maps 2019, 15, 247–256. [Google Scholar] [CrossRef] [Green Version]
  15. Ahmad, N.S.B.N.; Mustafa, F.B.; Gideon, D. A systematic review of soil erosion control practices on the agricultural land in Asia. Int. Soil Water Conserv. Res. 2020, 8, 103–115. [Google Scholar] [CrossRef]
  16. Eshel, G.; Egozi, R.; Goldwasser, Y.; Kashti, Y.; Fine, P.; Hayut, E.; Kazukro, H.; Rubin, B.; Dar, Z.; Keisar, O.; et al. Benefits of growing potatoes under cover crops in a Mediterranean climate. Agric. Ecosyst. Environ. 2015, 211, 1–9. [Google Scholar] [CrossRef]
  17. Sperry, K.P.; Hilfer, H.; Lane, I.; Petersen, J.; Dixon, P.M.; Sullivan, L.L. Species diversity and dispersal traits alter biodiversity spillover in reconstructed grasslands. J. Appl. Ecol. 2019, 56, 2216–2224. [Google Scholar] [CrossRef]
  18. Pool, D.B.; Panjabi, A.O.; Macias-Duarte, A.; Solhjem, D.M. Rapid expansion of croplands in Chihuahua, Mexico threatens declining North American grassland bird species. Biol. Conserv. 2014, 170, 274–281. [Google Scholar] [CrossRef]
  19. Clay, D.E.; Clay, S.A.; Reitsma, K.D.; Dunn, B.H.; Smart, A.J.; Carlson, G.G.; Horvath, D.; Stone, J.J. Does the conversion of grasslands to row crop production in semi-arid areas threaten global food supplies? Glob. Food Sec. 2014, 3, 22–30. [Google Scholar] [CrossRef] [Green Version]
  20. SZIF We Support the Czech Agriculture. Available online: https://www.szif.cz/en?setCookie=true (accessed on 15 May 2020).
  21. Trojáček, P. New Land Parcel Identification System for Agricultural Subsidies in the Czech Republic. In Geoinformation for European-Wide Integration, Proceedings of the 22 nd EARSeL Symposium, Prague, Czech Republic, 4–6 June 2002; IOS Press: Amsterdam, The Netherlands, 2002. [Google Scholar]
  22. Esch, T.; Metz, A.; Marconcini, M.; Keil, M. Combined use of multi-seasonal high and medium resolution satellite imagery for parcel-related mapping of cropland and grassland. Int. J. Appl. Earth Obs. Geoinf. 2014, 28, 230–237. [Google Scholar] [CrossRef]
  23. SZIF National Subsidies. Available online: https://www.szif.cz/en/national_subsidies (accessed on 15 May 2020).
  24. Barrett, B.; Nitze, I.; Green, S.; Cawkwell, F. Assessment of multi-temporal, multi-sensor radar and ancillary spatial data for grasslands monitoring in Ireland using machine learning approaches. Remote Sens. Environ. 2014, 152, 109–124. [Google Scholar] [CrossRef] [Green Version]
  25. Hajj, M.E.; Baghdadi, N.; Belaud, G.; Zribi, M.; Cheviron, B.; Courault, D.; Hagolle, O.; Charron, F. Irrigated grassland monitoring using a time series of terraSAR-X and COSMO-skyMed X-Band SAR Data. Remote Sens. 2014, 6, 10002–10032. [Google Scholar] [CrossRef] [Green Version]
  26. Dusseux, P.; Corpetti, T.; Hubert-Moy, L.; Corgne, S. Combined use of multi-temporal optical and radar satellite images for grassland monitoring. Remote Sens. 2014, 6, 6163–6182. [Google Scholar] [CrossRef] [Green Version]
  27. Carlier, L.; Rotar, I.; Vlahova, M.; Vidican, R. Importance and functions of grasslands. Not. Bot. Horti Agrobot. Cluj-Napoca 2009, 37, 25. [Google Scholar]
  28. Lobell, D.B.; Field, C.B. Global scale climate--crop yield relationships and the impacts of recent warming. Environ. Res. Lett. 2007, 2, 14002. [Google Scholar] [CrossRef]
  29. Vertès, F.; Hatch, D.; Velthof, G.; Taube, F.; Laurent, F.; Loiseau, P.; Recous, S. Short-Term and Cumulative Effects of Grassland Cultivation on Nitrogen and Carbon Cycling in Ley-Arable Rotations. In Permanent and Temporary Grassland: Plant, Environment and Economy, Proceedings of the 14th Symposium of the European Grassland Federation, Ghent, Belgium, 3–5 September 2007; Instituut voor Landbouw- en Visserijonderzoek (ILVO): Merelbeke, Belgium, 2007; pp. 227–246. [Google Scholar]
  30. Arrouays, D.; Deslais, W.; Badeau, V. The carbon content of topsoil and its geographical distribution in France. Soil Use Manag. 2001, 17, 7–11. [Google Scholar] [CrossRef]
  31. USGS Earth Explorer. Available online: https://earthexplorer.usgs.gov/ (accessed on 20 August 2020).
  32. Wulder, M.A.; White, J.C.; Goward, S.N.; Masek, J.G.; Irons, J.R.; Herold, M.; Cohen, W.B.; Loveland, T.R.; Woodcock, C.E. Landsat continuity: Issues and opportunities for land cover monitoring. Remote Sens. Environ. 2008, 112, 955–969. [Google Scholar] [CrossRef]
  33. Woodcock, C.E.; Macomber, S.A.; Pax-Lenney, M.; Cohen, W.B. Monitoring large areas for forest change using Landsat: Generalization across space, time and Landsat sensors. Remote Sens. Environ. 2001, 78, 194–203. [Google Scholar] [CrossRef] [Green Version]
  34. Drusch, M.; Del Bello, U.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.; Martimort, P.; et al. Sentinel-2: ESA’s optical high-resolution mission for GMES operational services. Remote Sens. Environ. 2012, 120, 25–36. [Google Scholar] [CrossRef]
  35. Szostak, M.; Hawryło Pawełand Piela, D. Using of Sentinel-2 images for automation of the forest succession detection. Eur. J. Remote Sens. 2018, 51, 142–149. [Google Scholar] [CrossRef]
  36. Delegido, J.; Verrelst, J.; Alonso, L.; Moreno, J. Evaluation of sentinel-2 red-edge bands for empirical estimation of green LAI and chlorophyll content. Sensors 2011, 11, 7063–7081. [Google Scholar] [CrossRef] [Green Version]
  37. Fernández-Manso, A.; Fernández-Manso, O.; Quintano, C. SENTINEL-2A red-edge spectral indices suitability for discriminating burn severity. Int. J. Appl. Earth Obs. Geoinf. 2016, 50, 170–175. [Google Scholar] [CrossRef]
  38. Qiu, S.; He, B.; Yin, C.; Liao, Z. Assessments of Sentinel 2 vegetation red-edge spectral bands for improving land cover classification. Proc. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 42, 871–874. [Google Scholar] [CrossRef] [Green Version]
  39. Bannari, A.; Morin, D.; Bonn, F.; Huete, A.R. A review of vegetation indices. Remote Sens. Rev. 1995, 13, 95–120. [Google Scholar] [CrossRef]
  40. Csillik, O.; Belgiu, M. Cropland Mapping from Sentinel-2 Time Series Data Using Object-Based Image Analysis. In Proceedings of the 20th AGILE International Conference on Geographic Information Science Societal Geo-Innovation Celebrating, Wageningen, The Netherlands, 9–12 May 2017; Volume 20. [Google Scholar]
  41. Belgiu, M.; Csillik, O. Sentinel-2 cropland mapping using pixel-based and object-based time-weighted dynamic time warping analysis. Remote Sens. Environ. 2018, 204, 509–523. [Google Scholar] [CrossRef]
  42. Dong, Q.; Chen, X.; Chen, J.; Zhang, C.; Liu, L.; Cao, X.; Zang, Y.; Zhu, X.; Cui, X. Mapping Winter Wheat in North China Using Sentinel 2A/B Data: A Method Based on Phenology-Time Weighted Dynamic Time Warping. Remote Sens. 2020, 12, 1274. [Google Scholar] [CrossRef] [Green Version]
  43. Xu, Y.; Yu, L.; Cai, Z.; Zhao, J.; Peng, D.; Li, C.; Lu, H.; Yu, C.; Gong, P. Exploring intra-annual variation in cropland classification accuracy using monthly, seasonal, and yearly sample set. Int. J. Remote Sens. 2019, 40, 8748–8763. [Google Scholar] [CrossRef]
  44. de Oliveira Santos, C.; Augusto Camargo Lamparelli, R.; Kelly Dantas Araújo Figueiredo, G.; Dupuy, S.; Boury, J.; Luciano AC dos, S.; Torres R da, S.; Le Maire, G. Classification of crops, pastures, and tree plantations along the season with multi-sensor image time series in a subtropical agricultural region. Remote Sens. 2019, 11, 334. [Google Scholar] [CrossRef] [Green Version]
  45. Stefanski, J.; Mack, B.; Waske, B. Optimization of object-based image analysis with random forests for land cover mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 2492–2504. [Google Scholar] [CrossRef]
  46. Chen, J.; Liu, H.; Chen, J.; Peng, S. Trend forecast based approach for cropland change detection using Lansat-derived time-series metrics. Int. J. Remote Sens. 2018, 39, 7587–7606. [Google Scholar] [CrossRef]
  47. Yin, H.; Prishchepov, A.V.; Kuemmerle, T.; Bleyhl, B.; Buchner, J.; Radeloff, V.C. Mapping agricultural land abandonment from spatial and temporal segmentation of Landsat time series. Remote Sens. Environ. 2018, 210, 12–24. [Google Scholar] [CrossRef]
  48. Pflugmacher, D.; Rabe, A.; Peters, M.; Hostert, P. Mapping pan-European land cover using Landsat spectral-temporal metrics and the European LUCAS survey. Remote Sens. Environ. 2019, 221, 583–595. [Google Scholar] [CrossRef]
  49. Xu, Y.; Yu, L.; Zhao, F.R.; Cai, X.; Zhao, J.; Lu, H.; Gong, P. Tracking annual cropland changes from 1984 to 2016 using time-series Landsat images with a change-detection and post-classification approach: Experiments from three sites in Africa. Remote Sens. Environ. 2018, 218, 13–31. [Google Scholar] [CrossRef]
  50. Atzberger, C. Advances in remote sensing of agriculture: Context description, existing operational monitoring systems and major information needs. Remote Sens. 2013, 5, 949–981. [Google Scholar] [CrossRef] [Green Version]
  51. Klouček, T.; Moravec, D.; Komárek, J.; Lagner, O.; Štych, P. Selecting appropriate variables for detecting grassland to cropland changes using high resolution satellite data. PeerJ 2018, 6, e5487. [Google Scholar] [CrossRef]
  52. Chen, S.; Rao, P. Land degradation monitoring using multi-temporal Landsat TM/ETM data in a transition zone between grassland and cropland of northeast China. Int. J. Remote Sens. 2008, 29, 2055–2073. [Google Scholar] [CrossRef]
  53. Weeks, E.S.; Ausseil, A.-G.E.; Shepherd, J.D.; Dymond, J.R. Remote sensing methods to detect land-use/cover changes in N ew Z ealand’s ‘indigenous’ grasslands. N. Z. Geog. 2013, 69, 1–13. [Google Scholar] [CrossRef]
  54. Yang, X.; Smith, A.M.; Hill, M.J. Updating the grassland vegetation inventory using change vector analysis and functionally-based vegetation indices. Can. J. Remote Sens. 2017, 43, 62–78. [Google Scholar] [CrossRef]
  55. Pakzad, K.; Growe, S.; Heipke, C.; Liedtke, C.-E. Multitemporale Luftbildinterpretation: Strategie und Anwendung. KI 2001, 15, 10–16. [Google Scholar]
  56. Lurette, A.; Aubron, C.; Moulin, C.-H. A simple model to assess the sensitivity of grassland dairy systems to scenarios of seasonal biomass production variability. Comput. Electron. Agric. 2013, 93, 27–36. [Google Scholar] [CrossRef]
  57. Rouse, J.W., Jr.; Haas, R.H.; Schell, J.A.; Deering, D.W. Monitoring vegetation systems in the Great Plains with ERTS. In Proceedings of the Third Earth Resources Technology Satellite-1 Symposium, Washington, DC, USA, 10–14 December 1973; Volume 1, pp. 309–317. [Google Scholar]
  58. Nguyen, L.H.; Joshi, D.R.; Henebry, G.M. Improved change detection with trajectory-based approach: Application to quantify cropland expansion in South Dakota. Land 2019, 8, 57. [Google Scholar] [CrossRef] [Green Version]
  59. Nguyen, L.H.; Henebry, G.M. Characterizing land use/land cover using multi-sensor time series from the perspective of land surface phenology. Remote Sens. 2019, 11, 1677. [Google Scholar] [CrossRef] [Green Version]
  60. Xu, P.; Niu, Z.; Tang, P. Comparison and assessment of NDVI time series for seasonal wetland classification. Int. J. Digit. Earth 2018, 11, 1103–1131. [Google Scholar] [CrossRef]
  61. Choudhary, K.; Shi, W.; Boori, M.S.; Corgne, S. Agriculture Phenology Monitoring Using NDVI Time Series Based on Remote Sensing Satellites: A Case Study of Guangdong, China. Opt. Mem. Neural Netw. 2019, 28, 204–214. [Google Scholar] [CrossRef]
  62. Pan, Z.; Huang, J.; Zhou, Q.; Wang, L.; Cheng, Y.; Zhang, H.; Blackburn, G.A.; Yan, J.; Liu, J. Mapping crop phenology using NDVI time-series derived from HJ-1 A/B data. Int. J. Appl. Earth Obs. Geoinf. 2015, 34, 188–197. [Google Scholar] [CrossRef] [Green Version]
  63. Yin, L.; You, N.; Zhang, G.; Huang, J.; Dong, J. Optimizing Feature Selection of Individual Crop Types for Improved Crop Mapping. Remote Sens. 2020, 12, 162. [Google Scholar] [CrossRef] [Green Version]
  64. Ienco, D.; Interdonato, R.; Gaetano, R.; Minh, D.H.T. Combining Sentinel-1 and Sentinel-2 Satellite Image Time Series for land cover mapping via a multi-source deep learning architecture. ISPRS J. Photogramm. Remote Sens. 2019, 158, 11–22. [Google Scholar] [CrossRef]
  65. Bioresita, F.; Puissant, A.; Stumpf, A.; Malet, J.-P. Fusion of Sentinel-1 and Sentinel-2 image time series for permanent and temporary surface water mapping. Int. J. Remote Sens. 2019, 40, 9026–9049. [Google Scholar] [CrossRef]
  66. ESA SNAP Sentinel Application Platform v6.2. 2019. Available online: http://step.esa.int/main/download/snap-download/ (accessed on 15 June 2020).
  67. Weiss, M.; Baret, F. S2ToolBox Level 2 Products: LAI, FAPAR, FCOVER. Version 1.1. 2016. Available online: http//step.esa.int/docs/extra/ATBD_S2ToolBox_L2B_V1.1.pdf (accessed on 15 June 2020).
  68. Duro, D.C.; Franklin, S.E.; Dubé, M.G. A comparison of pixel-based and object-based image analysis with selected machine learning algorithms for the classification of agricultural landscapes using SPOT-5 HRG imagery. Remote Sens. Environ. 2012, 118, 259–272. [Google Scholar] [CrossRef]
  69. Duro, D.C.; Franklin, S.E.; Dubé, M.G. Multi-scale object-based image analysis and feature selection of multi-sensor earth observation imagery using random forests. Int. J. Remote Sens. 2012, 33, 4502–4526. [Google Scholar] [CrossRef]
  70. Georganos, S.; Grippa, T.; Vanhuysse, S.; Lennert, M.; Shimoni, M.; Kalogirou, S.; Wolff, E. Less is more: Optimizing classification performance through feature selection in a very-high-resolution remote sensing object-based urban application. GIScience Remote Sens. 2018, 55, 221–242. [Google Scholar] [CrossRef]
  71. Sandera, J.; Stych, P. Change detection work-flow for mapping changes from arable lands to permanent grasslands with advanced boosting methods. Geod. Vestn. 2019, 63, 379–394. [Google Scholar] [CrossRef]
  72. Nielsen, A.A.; Conradsen, K.; Simpson, J.J. Multivariate alteration detection (MAD) and MAF postprocessing in multispectral, bitemporal image data: New approaches to change detection studies. Remote Sens. Environ. 1998, 64, 1–19. [Google Scholar] [CrossRef] [Green Version]
  73. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  74. Thanh Noi, P.; Kappas, M. Comparison of random forest, k-nearest neighbor, and support vector machine classifiers for land cover classification using Sentinel-2 imagery. Sensors 2018, 18, 18. [Google Scholar] [CrossRef] [Green Version]
  75. Melville, B.; Lucieer, A.; Aryal, J. Object-based random forest classification of Landsat ETM+ and WorldView-2 satellite imagery for mapping lowland native grassland communities in Tasmania, Australia. Int. J. Appl. Earth Obs. Geoinf. 2018, 66, 46–55. [Google Scholar] [CrossRef]
  76. Shih, H.; Stow, D.A.; Tsai, Y.H. Guidance on and comparison of machine learning classifiers for Landsat-based land cover and land use mapping. Int. J. Remote Sens. 2019, 40, 1248–1274. [Google Scholar] [CrossRef]
  77. Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31. [Google Scholar] [CrossRef]
  78. Boonprong, S.; Cao, C.; Chen, W.; Bao, S. Random forest variable importance spectral indices scheme for burnt forest recovery monitoring—Multilevel RF-VIMP. Remote Sens. 2018, 10, 807. [Google Scholar] [CrossRef] [Green Version]
  79. Calle, M.L.; Urrea, V. Letter to the editor: Stability of random forest importance measures. Brief. Bioinform. 2011, 12, 86–89. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  80. Behnamian, A.; Millard, K.; Banks, S.N.; White, L.; Richardson, M.; Pasher, J. A systematic approach for variable selection with random forests: Achieving stable variable importance values. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1988–1992. [Google Scholar] [CrossRef] [Green Version]
  81. Genuer, R.; Poggi, J.-M.; Tuleau-Malot, C. Variable selection using random forests. Pattern Recognit. Lett. 2010, 31, 2225–2236. [Google Scholar] [CrossRef] [Green Version]
  82. Millard, K.; Richardson, M. On the importance of training data sample selection in random forest image classification: A case study in peatland ecosystem mapping. Remote Sens. 2015, 7, 8489–8515. [Google Scholar] [CrossRef] [Green Version]
  83. Diaz-Uriarte, R.; de Andrés, S.A. Variable selection from random forests: Application to gene expression data. arXiv 2005. Available online: https://arxiv.org/pdf/q-bio/0503025.pdf (accessed on 15 June 2020).
  84. Díaz-Uriarte, R.; De Andres, S.A. Gene selection and classification of microarray data using random forest. BMC Bioinform. 2006, 7, 3. [Google Scholar]
  85. Kursa, M.B.; Rudnicki, W.R. Feature Selection with the {Boruta} Package. J. Stat. Softw. 2010, 36, 1–13. [Google Scholar] [CrossRef] [Green Version]
  86. Recursive Feature Elimination. Available online: https://topepo.github.io/caret/recursive-feature-elimination.html (accessed on 15 June 2020).
  87. Možný, M.; Brázdil, R.; Dobrovolny, P.; Trnka, M. Cereal harvest dates in the Czech Republic between 1501 and 2008 as a proxy for March--June temperature reconstruction. Clim. Chang. 2012, 110, 801–821. [Google Scholar] [CrossRef]
  88. Hoy, A.; Feske, N.; Štěpánek, P.; Skalák, P.; Schmitt, A.; Schneider, P. Climatic Changes and Their Relation to Weather Types in a Transboundary Mountainous Region in Central Europe. Sustainability 2018, 10, 2049. [Google Scholar] [CrossRef] [Green Version]
  89. Muller-Wilm, U.; Louis, J.; Richter, R.; Gascon, F.; Niezette, M. Sentinel-2 level 2A prototype processor: Architecture, algorithms and first results. In Proceedings of the ESA Living Planet Symposium, Edinburgh, UK, 9–13 September 2013; pp. 9–13. [Google Scholar]
  90. Main-Knorn, M.; Pflug, B.; Louis, J.; Debaecker, V.; Müller-Wilm, U.; Gascon, F. Sen2Cor for sentinel-2. In Proceedings of the Image and Signal Processing for Remote Sensing XXIII, Warsaw, Poland, 11–13 September 2017; Volume 10427, p. 1042704. [Google Scholar]
  91. Wu, J.S.; Fu, G. Modelling aboveground biomass using MODIS FPAR/LAI data in alpine grasslands of the Northern Tibetan Plateau. Remote Sens. Lett. 2018, 9, 150–159. [Google Scholar] [CrossRef]
  92. Wang, Y.; Xie, D.; Liu, S.; Hu, R.; Li, Y.; Yan, G. Scaling of FAPAR from the Field to the Satellite. Remote Sens. 2016, 8, 310. [Google Scholar] [CrossRef] [Green Version]
  93. Zhang, S.; Chen, H.; Fu, Y.; Niu, H.; Yang, Y.; Zhang, B. Fractional vegetation cover estimation of different vegetation types in the Qaidam Basin. Sustainability 2019, 11, 864. [Google Scholar] [CrossRef] [Green Version]
  94. Tucker, C.J. Red and photographic infrared linear combinations for monitoring vegetation. Remote Sens. Environ. 1979, 8, 127–150. [Google Scholar] [CrossRef] [Green Version]
  95. Gebbers, R.; Ehlert, D.; Adamek, R. Rapid mapping of the leaf area index in agricultural crops. Agron. J. 2011, 103, 1532–1541. [Google Scholar] [CrossRef]
  96. Mišurec, J.; Tomíček, J.; Lukeš, P.; Zemědělství, K.K. Kvantitativní Odhad Vybraných Biofyzikálních Parametrů Porostů Zemědělských Plodin na Podkladě Družicových dat Sentinel 2 a Jeho Využití pro Tvorbu Aplikačních Map Precizního Zemědělství; Praha. 2019. Available online: http://www.gisat.cz/images/upload/8fe71_certifikovana-metodika-kvantitativniho-odhadu-biofyzikalnich-parametru.pdf (accessed on 20 August 2020).
  97. Sellers, P.J. Canopy reflectance, photosynthesis and transpiration. Int. J. Remote Sens. 1985, 6, 1335–1372. [Google Scholar] [CrossRef]
  98. Gitelson, A.A.; Kaufman, Y.J.; Stark, R.; Rundquist, D. Novel algorithms for remote estimation of vegetation fraction. Remote Sens. Environ. 2002, 80, 76–87. [Google Scholar] [CrossRef] [Green Version]
  99. Bréda, N.J.J. Leaf Area Index. In Encyclopedia of Ecology; Jørgensen, S.E., Fath, B., Eds.; Elsevier: Amsterdam, The Netherlands, 2008; pp. 2148–2154. ISBN 978-0-08-045405-4. [Google Scholar]
  100. Campbell, S. GAYLON The Researcher’s Complete Guide to Leaf Area Index (LAI). Available online: https://www.metergroup.com/environment/articles/lp80-pain-free-leaf-area-index-lai/ (accessed on 15 June 2020).
  101. Qiu, F.; Chen, J.M.; Croft, H.; Li, J.; Zhang, Q.; Zhang, Y.; Ju, W. Retrieving leaf chlorophyll content by incorporating variable leaf surface reflectance in the PROSPECT model. Remote Sens. 2019, 11, 1572. [Google Scholar] [CrossRef] [Green Version]
  102. Clevers, J.G.P.W.; Kooistra, L.; Schaepman, M.E. Estimating canopy water content using hyperspectral remote sensing data. Int. J. Appl. Earth Obs. Geoinf. 2010, 12, 119–125. [Google Scholar] [CrossRef]
  103. Cernicharo, J.; Verger, A.; Camacho, F. Empirical and physical estimation of canopy water content from CHRIS/PROBA data. Remote Sens. 2013, 5, 5265–5284. [Google Scholar] [CrossRef] [Green Version]
  104. Tucker, C.J.; Newcomb, W.W.; Los, S.O.; Prince, S.D. Mean and inter-year variation of growing-season normalized difference vegetation index for the Sahel 1981-1989. Int. J. Remote Sens. 1991, 12, 1133–1135. [Google Scholar] [CrossRef]
  105. Dalponte, M.; Ørka, H.O.; Gobakken, T.; Gianelle, D.; Næsset, E. Tree species classification in boreal forests with hyperspectral data. IEEE Trans. Geosci. Remote Sens. 2012, 51, 2632–2645. [Google Scholar] [CrossRef]
  106. Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 1991, 37, 35–46. [Google Scholar] [CrossRef]
  107. Congalton, R.G.; Green, K. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices; CRC Press: Boca Raton, FL, USA, 2008; ISBN 1420055135. [Google Scholar]
  108. Olofsson, P.; Foody, G.M.; Herold, M.; Stehman, S.V.; Woodcock, C.E.; Wulder, M.A. Good practices for estimating area and assessing accuracy of land change. Remote Sens. Environ. 2014, 148, 42–57. [Google Scholar] [CrossRef]
  109. Olofsson, P.; Foody, G.M.; Stehman, S.V.; Woodcock, C.E. Making better use of accuracy data in land change studies: Estimating accuracy and area and quantifying uncertainty using stratified estimation. Remote Sens. Environ. 2013, 129, 122–131. [Google Scholar] [CrossRef]
  110. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2019; Available online: https://www.r-project.org/ (accessed on 20 August 2020).
  111. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 2006, 7, 1–30. [Google Scholar]
  112. Friedman, M. The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc. 1937, 32, 675–701. [Google Scholar] [CrossRef]
  113. Friedman, M. A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Stat. 1940, 11, 86–92. [Google Scholar] [CrossRef]
  114. Baret, F.; Weiss, M.; Lacaze, R.; Camacho, F.; Makhmara, H.; Pacholcyzk, P.; Smets, B. GEOV1: LAI and FAPAR essential climate variables and FCOVER global time series capitalizing over existing products. Part1: Principles of development and production. Remote Sens. Environ. 2013, 137, 299–309. [Google Scholar] [CrossRef]
  115. Fuster, B.; Sánchez-Zapero, J.; Camacho, F.; Garcia-Santos, V.; Verger, A.; Lacaze, R.; Weiss, M.; Baret, F.; Smets, B. Quality Assessment of PROBA-V LAI, fAPAR and fCOVER Collection 300 m Products of Copernicus Global Land Service. Remote Sens. 2020, 12, 1017. [Google Scholar] [CrossRef] [Green Version]
  116. Zhang, F.; Zhou, G.; Nilsson, C. Remote estimation of the fraction of absorbed photosynthetically active radiation for a maize canopy in Northeast China. J. Plant Ecol. 2015, 8, 429–435. [Google Scholar] [CrossRef] [Green Version]
  117. North, P.R.J. Estimation of fAPAR, LAI, and vegetation fractional cover from ATSR-2 imagery. Remote Sens. Environ. 2002, 80, 114–121. [Google Scholar] [CrossRef]
  118. Li, W.; Weiss, M.; Waldner, F.; Defourny, P.; Demarez, V.; Morin, D.; Hagolle, O.; Baret, F. A generic algorithm to estimate LAI, FAPAR and FCOVER variables from SPOT4_HRVIR and Landsat sensors: Evaluation of the consistency and comparison with ground measurements. Remote Sens. 2015, 7, 15494–15516. [Google Scholar] [CrossRef] [Green Version]
  119. Hu, Q.; Yang, J.; Xu, B.; Huang, J.; Memon, M.S.; Yin, G.; Zeng, Y.; Zhao, J.; Liu, K. Evaluation of Global Decametric-Resolution LAI, FAPAR and FVC Estimates Derived from Sentinel-2 Imagery. Remote Sens. 2020, 12, 912. [Google Scholar] [CrossRef] [Green Version]
  120. Grabska, E.; Hostert, P.; Pflugmacher, D.; Ostapowicz, K. Forest stand species mapping using the Sentinel-2 time series. Remote Sens. 2019, 11, 1197. [Google Scholar] [CrossRef] [Green Version]
Figure 1. Map of the study area.
Figure 1. Map of the study area.
Land 09 00420 g001
Figure 2. Flowchart of the process of change detection from permanent grassland to arable land.
Figure 2. Flowchart of the process of change detection from permanent grassland to arable land.
Land 09 00420 g002
Figure 3. Dispersion of the values of total, processing, and user accuracies of the classification of changes from permanent grassland to arable land after 30 iterations. (A) Overall accuracies of the tested time series; (B) producer’s accuracies of the change class of the tested time series; (C) producer’s accuracies of the no-change class of the tested time series; (D) user’s accuracies of the change class of the tested time series; (E) user’s accuracies of the no-change class of the tested time series.
Figure 3. Dispersion of the values of total, processing, and user accuracies of the classification of changes from permanent grassland to arable land after 30 iterations. (A) Overall accuracies of the tested time series; (B) producer’s accuracies of the change class of the tested time series; (C) producer’s accuracies of the no-change class of the tested time series; (D) user’s accuracies of the change class of the tested time series; (E) user’s accuracies of the no-change class of the tested time series.
Land 09 00420 g003
Figure 4. The course of the values of the metric S for all the tested predictors.
Figure 4. The course of the values of the metric S for all the tested predictors.
Land 09 00420 g004
Figure 5. (A) Spatial arrangement of the changes from permanent grassland to arable land within the soil blocks in the years 2015–2019 for biological predictor FAPAR, (B) Probabilities of changes from permanent grassland to arable land for the biological predictor FAPAR within the soil blocks in the years 2015–2019.
Figure 5. (A) Spatial arrangement of the changes from permanent grassland to arable land within the soil blocks in the years 2015–2019 for biological predictor FAPAR, (B) Probabilities of changes from permanent grassland to arable land for the biological predictor FAPAR within the soil blocks in the years 2015–2019.
Land 09 00420 g005
Table 1. Acquired satellite images.
Table 1. Acquired satellite images.
Image NumberAcquisition Date
113.08.2015
227.08.2016
319.04.2018
416.09.2018
503.06.2019
Table 2. Tested biological predictors.
Table 2. Tested biological predictors.
Biological VariableDefinitionReferences
FAPARfraction of absorbed photosynthetically active radiation[92,97]
FCOVERfraction of vegetation cover[93,98]
LAIleaf area index[99,100]
CABchlorophyll content in the leaf[101]
CWCcanopy water content[102,103]
NDVInormalized difference vegetation index[94,104]
Table 3. Area of the changes from permanent grassland to arable land in the years 2015–2019, including the classes of probability for the biological predictor FAPAR.
Table 3. Area of the changes from permanent grassland to arable land in the years 2015–2019, including the classes of probability for the biological predictor FAPAR.
km2% of the Total Area
Total area3465.6100.0
Probability of change (%)
0–203112.589.8
21–40292.28.4
41–6048.81.4
61–802.80.1
81–1009.30.3
Change/No Change
Change28.00.8
No Change3437.699.2
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Share and Cite

MDPI and ACS Style

Šandera, J.; Štych, P. Selecting Relevant Biological Variables Derived from Sentinel-2 Data for Mapping Changes from Grassland to Arable Land Using Random Forest Classifier. Land 2020, 9, 420. https://doi.org/10.3390/land9110420

AMA Style

Šandera J, Štych P. Selecting Relevant Biological Variables Derived from Sentinel-2 Data for Mapping Changes from Grassland to Arable Land Using Random Forest Classifier. Land. 2020; 9(11):420. https://doi.org/10.3390/land9110420

Chicago/Turabian Style

Šandera, Jiří, and Přemysl Štych. 2020. "Selecting Relevant Biological Variables Derived from Sentinel-2 Data for Mapping Changes from Grassland to Arable Land Using Random Forest Classifier" Land 9, no. 11: 420. https://doi.org/10.3390/land9110420

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop