Random Forest Classification of Land Use, Land-Use Change and Forestry (LULUCF) Using Sentinel-2 Data—A Case Study of Czechia

Svoboda, Jan; Štych, Přemysl; Laštovička, Josef; Paluba, Daniel; Kobliuk, Natalia

doi:10.3390/rs14051189


by Jan Svoboda, Přemysl Štych *, Josef Laštovička, Daniel Paluba and Natalia Kobliuk

EO4Landscape Research Team, Department of Applied Geoinformatics and Cartography, Faculty of Science, Charles University, 12843 Prague, Czechia

* Author to whom correspondence should be addressed.

Remote Sens. 2022, 14(5), 1189; https://doi.org/10.3390/rs14051189

Submission received: 30 December 2021 / Revised: 8 February 2022 / Accepted: 17 February 2022 / Published: 28 February 2022

(This article belongs to the Special Issue Remote Sensing Applications in Land Use, Land-Use Change and Forestry (LULUCF))


Abstract:

Land use, land-use change and forestry (LULUCF) is a greenhouse gas inventory sector that evaluates greenhouse gas changes in the atmosphere from land use and land-use change. This study focuses on the development of a Sentinel-2 data classification according to the LULUCF requirements on the cloud-based platform Google Earth Engine (GEE). The methods are tested in selected larger territorial regions (two Czech NUTS 2 units) using data collected in 2018. The Random Forest method was used for classification. In terms of classification accuracy, a combination of these parameters was tested: The Number of Trees (NT), the Variables per Split (VPS) and the Bag Fraction (BF). A total of 450 combinations of different parameters were tested. The highest accuracy classification with an overall accuracy = 89.1% and Cohen’s Kappa = 0.84 had the following combination: NT = 150, VPS = 3 and BF = 0.1. For classification purposes, a mosaic was created using the median method. The resulting mosaic consisted of all Sentinel-2 bands in 10 and 20 m spatial resolution. Altitude values derived from SRTM and NDVI variance values were also included in the classification. These added bands were the most significant in terms of Gini importance.

Keywords:

Google Earth Engine; Random Forest; LULUCF; Sentinel-2; Czechia


1. Introduction

The land cover/land-use change (LCLUC) program is one of the most important sources of information on the development of global environmental change. LCLUC forms the primary source of data for numerous mathematical models that seek to define future development scenarios in many areas of the environment, including climate change [1,2]. The UN Secretariat on Climate Change and the Paris Agreement adopted under the United Nations Framework Convention on Climate Change (UNFCCC) have declared LCLUC monitoring to be highly relevant, as LCLUC has a significant impact on climate change and the global carbon cycle. For these purposes, a binding regulation is provided for the inventory and reporting of relevant land use classes, so-called LULUCF (land use, land-use change and forestry; see Decision 529/2013/EU, European Commission 2013). LULUCF information is collected and reported on an international scale and is one of the main input data sources for climate change modeling and GHG (greenhouse gas) emission estimates within the IPCC (Intergovernmental Panel on Climate Change).

The development of international agreements on climate and climate policy has been shaping the role of LULUCF. Researchers are increasingly developing sophisticated research strategies to represent the global dimension of land use and assess its impact on climate mitigation [3]. Full LULUCF integration fits well with ongoing international efforts to integrate forests and other aspects into the climate policy framework, e.g., the context of REDD+ (Reduced Emissions from Deforestation and Forest Degradation) [4,5,6]. Standardized methods and accurate and harmonized LULUCF data are a key factor in accounting and evaluating the changes over a long period and modeling climate change with predictive scenarios [7,8]. Earth observation (EO) is an effective and promising tool for monitoring LCLUC [9,10]. The wide use of satellite data is currently possible mainly due to the creation of freely available archives of satellite images from different missions (e.g., Landsat and the Copernicus program). The Copernicus program has brought new possibilities to EO. ESA is launching new satellite missions called Sentinels specifically for the operational needs of the Copernicus program. The Sentinel-2 multispectral optical dataset is now available with the aim to provide data with better resolutions (spatial, temporal and spectral) than traditional data, such as Landsat images. Sentinel-2 data have been available since 2015. The images are received via two parallel missions 2A and 2B and, in the case of the overlapping scenes, the temporal resolution is less than five days [11].

EO has considerable potential for monitoring LULUCF. In particular, medium-resolution images, such as Landsat (30 m) and Sentinel-2 (10 m), seem to be a suitable source of data for LULUCF [12]. Based on these opportunities, the EU and other international institutions are looking for new LULUCF strategies. However, the use of large volumes and a wide range of data causes significant difficulties related to the compatibility and harmonization of input data [13,14]. Within LULUCF, the status and development of the area of the following classes are inventoried and reported: Forest Land, Cropland, Grassland, Wetlands, Settlements and Other Land. The definition and harmonization of LCLUC inputs according to defined LULUCF classes is one of the most important tasks within international LULUCF reporting [15].

In the LCLUC classification process, machine learning methods, such as Random Forest (RF), are currently the most widely used and developed. Random Forest was first described in [16]. This method is widely used in multitemporal LCLUC classification; for example, it was applied in [17,18]. Its essence is the creation of decision trees, where each tree individually evaluates the class to which each individual pixel belongs. The classification of a pixel into a class is assigned within the tree based on input parameters [19,20,21].

LULUCF reporting in Czechia has been exclusively based on the cadastral land use information of the Czech Office for Surveying, Mapping and Cadaster (COSMC; www.cuzk.cz, accessed on 3 September 2021). The Czech land-use representation and the land-use change identification system use COSMC data. COSMC provides the annually updated areas for all land-use categories. In addition, data obtained from the Forest Management Institute (FMI) on forests (harvest, increment, felling, etc.) are used in the LULUCF categories involving forest land. However, according to many studies, e.g., [22,23], cadastral data are not able to reflect all changes in time that occur in the landscape and do not report them fully according to the LULUCF classification nomenclature. Thus, the current LULUCF reporting has several weaknesses that affect the quality of the collected data. Moreover, there is no database derived from EO data to meet the LULUCF criteria (annual update, classification nomenclature, minimum mapping unit, etc.). Therefore, this study focuses on the development of an RF-based classification method that allows the classification of Sentinel-2 data according to LULUCF requirements. Multispectral satellite data from the Sentinel-2 mission are used due to their high spatial and temporal resolution. The methodological procedures are developed and implemented on the freely accessible Google Earth Engine (GEE) cloud platform. The methodology and results of the study are in accordance with the LULUCF reporting process and are tested for selected larger territorial units in Czechia using data collected by Sentinel-2 in 2018. From the research point of view, the most important task is to find the most suitable combination of Random Forest classifier input parameters (Number of Trees, Variables per Split and Bag Fraction) to achieve the highest classification accuracy. The LULUCF classification is based on a multitemporal approach, which uses several images in the observed vegetation season. The Stratified Random Sampling method [24] is used to evaluate the accuracy of the classification.

This study has the following subresearch objectives:

Development and testing of methods of mosaicking, accurate cloud detection and masking for Sentinel-2 data in the GEE.
Creation of the LULUCF classification nomenclature for Czechia with a detailed semantic description and maximum compatibility with LULUCF.
Testing of RF classification algorithms on Sentinel-2 data for LULUCF classification with a classification accuracy of at least 85% (Kappa index value above 0.75) in larger territorial units of Czechia, specifically two NUTS 2 regions (NUTS is the Nomenclature of Territorial Units for Statistics used in the European Union).
Accuracy evaluation for individual LULUCF categories: Forest Land, Cropland, Grassland, Wetlands, Settlements and Other Land.
Discussion on the methodology used, data and achieved results with regard to the needs of LULUCF reporting.
Ultimately, presenting the created methods and outputs in a freely accessible research platform—GEE.

Research questions:

Is it possible to classify large area units with an overall accuracy of more than 85% on an annual basis using machine learning classification algorithms and high spatial and temporal resolution satellite data (Sentinel-2)?
For which LULUCF categories can a higher accuracy of Sentinel-2 data classification be achieved, and which categories appear to be problematic?
Which methods of mosaicking and cloud detection/masking are most suitable for data processing in the GEE cloud environment?

2. Materials and Methods

2.1. Area of Interest

Two NUTS 2 regions were analyzed for the purposes of this study, namely, Jihovýchod (CZ06) and Střední Morava (CZ07), as shown in Figure 1. The region of interest selection was guided by the criteria of a project, “Developing supports for monitoring and reporting of GHG emissions and removals from land use, land use change and forestry”, from which this study originates (https://www.copernicus-user-uptake.eu/user-uptake/details/developing-support-for-monitoring-and-reporting-of-ghg-emissions-and-removals-from-land-use-land-use-change-and-forestry-73, accessed on 22 September 2021). The total area of the region is up to 23,217 km². The region is very heterogeneous: the lowest point is at the confluence of rivers Morava and Dyje at an elevation of 150 m a. s. l., the highest point is the mountain of Praděd, reaching 1492 m a. s. l. The longest river is Morava, which forms the axis of the region and flows from north to south. The majority of the land is used for agriculture and vineyards in the lowlands in the southern parts. From south to west, east and north, the elevation of the area gradually increases. Forests begin to dominate with increasing altitude. Deciduous forests are found at lower altitudes (at the confluence of Morava and Dyje, Chřiby and Moravský kras) and coniferous forests predominate at higher altitudes (Beskydy, Jeseníky and Vysočina), which are often formed by monocultures of Norway spruce (Picea abies). The biggest cities of the area of interest are Brno (381,346 inhabitants), Olomouc (100,663 inhabitants), Zlín (74,935 inhabitants) and Jihlava (51,216 inhabitants).

2.2. Data

Freely available Sentinel-2 multispectral images from the joint ESA/European Commission Copernicus Mission were used for compositing and classification. The images were acquired in the late spring and early summer periods of 2018 and preprocessed through the Sen2Cor algorithm [11,25]. Therefore, the atmospherically corrected data (L2A) from both Sentinel-2A and Sentinel-2B satellites were used for this research. These data are provided in 10 m spatial resolution (B2 Blue, B3 Green, B4 Red, B8 NIR bands) and 20 m spatial resolution (B5–B7 and B8A Vegetation red edge and B11-B12 SWIR bands) [11,25]. Bands with a resolution of 20 m were resampled to a higher resolution of 10 m using the nearest neighbor method. Sentinel-2 images have a 12-bit radiometric resolution but are provided in a 16-bit radiometric resolution [11], specifically through unsigned integers [19] with values ranging from 0 to 65,535. Classifications were performed in GEE using JavaScript language, where the preprocessed Sentinel-2 Multispectral Instrument Level-2A dataset is available [25,26].

The digital elevation SRTM (The Shuttle Radar Topography Mission) radar data were used for classification. The dataset is provided within the GEE platform with approximately 30 m spatial resolution as an SRTM V3 (void-filled) product. The SRTM band was used as one of the input bands for the classification process.

The Copernicus CLC (Corine Land Cover) 2018 database provided within the GEE platform, the ZM 10 map data (“Základní mapa ČR v měřítku 1:10,000”, Basic map of the Czech Republic at a scale of 1:10,000; WMS from ČÚZK) and LPIS for the year 2018 (“Veřejný registr půd”, Public land register available from eAGRI; in shapefile format) were used for the creation of training and validation datasets. Historical orthophotos from 2017, 2018 and 2019 (WMS from ČÚZK) and historical imagery in Google Earth Pro software were used to verify training polygons and validation points. Google Earth Pro software provides imagery with very high spatial resolution: Maxar satellite imagery with up to 0.3 m spatial resolution (from 2015 to 2021) and CNES/Airbus imagery with up to 0.5 m spatial resolution (from 2015 to 2021). Users can examine these data using the internal Time Machine plugin.

2.3. Legend

The first basic methodological step was the creation of the classification nomenclature. The classification nomenclature follows the LULUCF regulations [15], which distinguish and report the status and development of areas of the following classes: Forest Land, Cropland, Grassland, Wetlands, Settlements and Other Land. Within the area of interest, the following classes were defined:

Forest Land includes vegetation that covers an area of at least 0.5 ha [27], i.e., woodlands, as well as clearcut localities where no forest is currently present but is expected to grow within the next few decades.

Cropland includes agricultural land and permanent crops, including vineyards, hop fields, gardens and orchards.

Grassland includes both natural and managed grasslands (pastures and meadows).

Settlements include, in addition to built-up areas, roads, urban greenery, gardens near houses, landfills and active quarries.

Wetlands include marshlands, bodies of water and watercourses.

Other land mainly includes rocks, subalpine stands of dwarf Norway spruces (Picea abies) and nonnative shrub mountain pines (Pinus mugo) in the higher parts of the Czech mountains, as well as woodland/trees outside forest (ToF), such as groves and alleys, which cannot be considered as a forest according to the LULUCF regulations.

In the initial stage of classification, the Woodland class was created instead of the Forest Land class to highlight all the forested areas. Due to differences between LULUCF classes Forest Land and Other Land, the Woodland class was later divided according to the LULUCF regulations into Forest Land (polygons equal to or greater than 0.5 ha), and the remaining polygons (areas of less than 0.5 ha) were added to the results of the Other Land class during the post-classification process.

Detailed information on LULUCF classes, including their content description, is given in Appendix A.

2.4. Methods

The complete methodological procedure of data processing is shown in Figure 2, which defines the procedures of preprocessing, mosaicking and classification, as well as methods for assessing accuracy and post-classification steps. The following parts describe the individual steps in more detail.

2.4.1. Cloud Masking and Mosaicking

Due to the size of the area of interest, a decision was made to create a mosaic for classification purposes. The mosaic was created by using the full potential of Sentinel-2 data, i.e., using images taken from both Sentinel-2A and Sentinel-2B. All images for mosaic creation were taken in the period from May to the end of July with a total cloud cover below 75% in the whole scene. This period was used mainly because there are only the last remnants of snow cover in the peak parts of the area of interest, and the main vegetation season takes place in the selected months. At the same time, it was the period in which there seemed to be the lowest cloud cover throughout the year 2018. This set of selected images was used in a further step—cloud masking.

For the cloud masking of Sentinel-2 data in GEE, the Sentinel-2: Cloud Probability dataset (so-called s2cloudless) was used [28]. It is constituted of a single band with 20 m spatial resolution that represents the probability of cloudiness (0–100%) for each pixel of all Sentinel-2 tiles in the entire archive. The selection of this approach was inspired by [29], who compared different Landsat 8 and Sentinel-2 cloud masking approaches, and the s2cloudless dataset significantly outperformed other methods. Cloud shadow was detected using an algorithm developed in GEE [30], which is based on cloud projection intersection (defined by the solar azimuth angle obtained in each Sentinel-2 tile metadata) with low-reflectance near-infrared pixels. The next parameter to detect low-reflectance near-infrared pixels as the cloud shadow is a distance from the cloud. After testing and visual inspection, the following parameters were chosen for the used algorithm: cld_prb_thresh (cloud probability, where higher values were considered as clouds) = 40%, nir_drk_thresh (reflectance in the NIR band, where lower values were considered as cloud shadows) = 0.15 and cld_prj_dist (maximum allowed distance in km to search for cloud shadows from cloud edges) = 1 km; erosion 2 pixels (resolution 20 m/pixel) and dilation 5.5 pixels (buffer 3.5 pixels) were applied for the elimination of small features and gaps in clouds and shadows.
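The per-pixel decision implied by these thresholds can be sketched as follows. This is only an illustrative sketch, not the GEE script used in the study: the function `classify_pixel` and its inputs (`dist_to_cloud_km`, `in_projection`) are hypothetical names, and the erosion and dilation steps are omitted. Only the three threshold values are taken from the text.

```python
# Thresholds reported in the text; everything else here is a hypothetical illustration.
CLD_PRB_THRESH = 40     # cloud probability (%); higher values count as cloud
NIR_DRK_THRESH = 0.15   # NIR reflectance; lower values count as dark (possible shadow)
CLD_PRJ_DIST_KM = 1.0   # maximum search distance for shadows from cloud edges (km)


def classify_pixel(cloud_prob, nir_reflectance, dist_to_cloud_km, in_projection):
    """Return 'cloud', 'shadow' or 'clear' for a single pixel.

    in_projection is assumed to be True when the pixel lies in the direction
    in which a nearby cloud casts its shadow (derived from the solar azimuth).
    """
    if cloud_prob > CLD_PRB_THRESH:
        return "cloud"
    is_dark = nir_reflectance < NIR_DRK_THRESH
    near_cloud = in_projection and dist_to_cloud_km <= CLD_PRJ_DIST_KM
    if is_dark and near_cloud:
        return "shadow"
    # Dark pixels outside the projected distance or direction are kept as clear.
    return "clear"
```

Under these assumptions, a dark pixel 2 km from the nearest cloud is kept as clear, which mirrors the behaviour described for dark pixels outside the defined distance and angle.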

Figure 3 illustrates the process of cloud masking. Figure 3a shows the initial step of masked shadows, clouds and the created buffer. Figure 3b shows the final masked area after all the parameters were applied. It is evident that not all pixels that were initially identified as clouds or cloud shadow (dark pixels) in Figure 3a were included in the final cloud mask. The final mask does not include objects that were eliminated by erosion, nor dark pixels that are not within the defined distance and angle from a detected cloud.

In the next step, a median mosaic was created, inspired by [31,32]. All available images with lower than 75% cloud cover were selected. All S-2 bands with a resolution of 10/20 m and the NDVI index (calculated from bands B4 and B8) were used. Only pixels that were identified as cloud free were included in the median calculation. The 75% cloud cover threshold was chosen to avoid data gaps, mainly in mountainous areas, where it was difficult to find pixels not affected by clouds or cloud shadows. Even if high cloud cover is documented in the metadata of a scene, some areas of that scene may still be cloud free. This higher threshold made it possible to work with a larger number of images, which resulted in a cloud-free mosaic. The median approach was chosen because it is not as affected by outliers as the average value. Figure 4 shows a graph comparing the quantile value ranges and the average values calculated from the available unmasked values (May to July) for the selected training polygons (ID 331—Cropland; ID 488—Grassland). For training polygon 331, the mean surface reflectance in 6 of the 10 bands was higher than 75% of the values from which the mean was calculated. This was caused by outliers.
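The robustness argument can be illustrated with a few lines of Python. The reflectance values below are invented for illustration (they are not taken from Figure 4), `None` stands for a masked observation, and the function name `composite` is hypothetical.

```python
from statistics import mean, median


def composite(values):
    """Compute the median and mean of the cloud-free observations of one pixel.

    Masked observations are represented by None and are skipped.
    """
    clear = [v for v in values if v is not None]
    return {"median": median(clear), "mean": mean(clear)}


# Hypothetical time series of one band: one masked value and one bright outlier
# (e.g., an undetected cloud remnant).
series = [0.04, 0.05, None, 0.05, 0.06, 0.45]
result = composite(series)
```

In this made-up series the single outlier pulls the mean to about 0.13, which is above four of the five valid observations, while the median stays at 0.05.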

The mosaic also includes a band representing the variance of the NDVI values in the period from May to October. This band helps to distinguish relatively invariant surfaces, such as buildings (small variance), from surfaces that change dynamically during the season, e.g., arable land, which shows a high variance of the NDVI value.
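As a minimal sketch of this band, NDVI can be computed from the red (B4) and near-infrared (B8) reflectances as (B8 − B4)/(B8 + B4), and its variance taken over the season. The function names and the sample reflectances are hypothetical, and the use of the population variance is an assumption; the text does not state which variance estimator was applied.

```python
from statistics import pvariance


def ndvi(red, nir):
    """Normalized difference vegetation index from B4 (red) and B8 (NIR)."""
    total = nir + red
    return (nir - red) / total if total != 0 else 0.0


def ndvi_variance(observations):
    """Population variance of NDVI over a list of (red, nir) observations."""
    return pvariance([ndvi(red, nir) for red, nir in observations])


# Invented seasonal observations: a roof changes little, a field changes a lot.
building = [(0.20, 0.22), (0.21, 0.22), (0.20, 0.23), (0.21, 0.23)]
arable = [(0.15, 0.20), (0.05, 0.45), (0.04, 0.50), (0.18, 0.22)]
```

With these invented values, `ndvi_variance(building)` is much smaller than `ndvi_variance(arable)`, which is the contrast the band is meant to capture.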

Figure 5 represents the variance of NDVI in the sample selected area. The map displayed on the left side of the image divides these values into three intervals. Forests and buildings have the lowest variance (see aerial image and ZM 10 in the middle and right map fields), and grasslands have a higher variance (visualized in yellow on ZM 10). Arable land shows the highest NDVI variance.

Another band added to the resulting mosaic was SRTM DEM containing altitude data with a spatial resolution of 30 m. The SRTM band was used together with Landsat 8 multispectral satellite data in [31]. These data were important for distinguishing surfaces that are similar in terms of land cover but differ in land use for LULUCF purposes. Examples are stone and paved surfaces, where it is necessary to distinguish blockfields (Other Land) from paved areas within the Settlements class. The resulting mosaic has a spatial resolution of 10 m. All input data with a resolution coarser than 10 m (S-2 bands with a resolution of 20 m and SRTM with a resolution of 30 m) were resampled using the Nearest Neighbor method.
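Nearest-neighbor resampling to a finer grid simply repeats each coarse value. The sketch below assumes an integer ratio between the grids (2 for 20 m to 10 m, 3 as an approximation for 30 m to 10 m) and perfectly aligned pixels; the function name `upsample_nearest` is hypothetical, and real resampling in GEE also involves reprojection, which is not shown.

```python
def upsample_nearest(grid, factor):
    """Upsample a 2D grid by an integer factor using nearest-neighbor values.

    Each coarse cell is repeated factor x factor times; no new values are created.
    """
    return [
        [row[c // factor] for c in range(len(row) * factor)]
        for row in grid
        for _ in range(factor)
    ]


# A 2 x 2 block of 20 m pixels becomes a 4 x 4 block of 10 m pixels.
coarse = [[1, 2],
          [3, 4]]
fine = upsample_nearest(coarse, 2)
```

Because values are only copied, the resampled bands keep their original reflectance or elevation values, which is one reason to prefer this method for classification input.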

The significance of the bands for classification was recorded using Gini importance for the 4 parameter combinations in Appendix C. As can be seen in Appendix C, the importance of SRTM elevation and NDVI variance is the most significant, whereas the B8 band is of the least importance.

2.4.2. LULUCF Classification

The Random Forest (RF) method was selected, tested and used for classification. This method has been successfully used in the classification of multitemporal satellite data, e.g., [18,31]. Compared to other classification algorithms (CART, SVM, kNN and MLC), this method achieved the best results in many studies [17,32,33,34]. It is a supervised, nonparametric classification method based on machine learning. Its essence is the creation of decision trees, where each tree individually evaluates to which class each individual pixel belongs; see [16,34]. The basic parameter is the Number of Trees (NT). Other adjustable classification parameters are the Variables per Split (VPS), Bag Fraction (BF), Max Nodes and Min Leaf Population.

2.4.3. Training Polygons

An important aspect of the resulting classification accuracy is the training data. The training polygons for this study were created by two methods. The first method is the semi-automatic creation of training polygons within the CORINE Land Cover 2018 (CLC 2018) vector layer. The second method was the manual creation of the additional training polygons.

From the CLC 2018 polygon database, the core areas of the training polygons were created using the Buffer function with a distance of −100 m. Inside these areas, circular training polygons with a diameter of 80 m were randomly generated. This can be seen in Figure 6, where one of these training polygons is visualized. These polygons/circles were generated with a minimum distance of 2500 m between them.

For some evaluated classes/surfaces, the procedure above produced no training polygons. This is due to both geometric and thematic characteristics. For example, in the case of watercourses, no polygon in this class retains a core area after the 100 m inward buffer. Only one polygon was generated for the Other Land class, which, however, did not include some important elements of this class, e.g., no training polygon was created on the territory of a photovoltaic power plant. Therefore, 7 training polygons were manually created for the Other Land class: 5 of them were located in rubble fields in the Hrubý Jeseník mountains, one covered rocks in the Suché skály nature reserve, and one covered scrub mountain pines in the alpine vegetation zone of Hrubý Jeseník. Polygons for specific areas of mountain meadows in the Beskydy mountains were collected manually. Training polygons for peatbogs and reeds were added to the Wetlands category. Due to the high heterogeneity of the Cropland class, some polygons were added to cover some specific types of land cover. It was found during preliminary classification testing that some areas within the Cropland class were misclassified as other classes. As a result, some additional training polygons were manually added at these localities. Polygons were also created for areas that were deforested due to droughts and bark-beetle disturbances (Ips typographus) and are currently in the initial stages of forest growth. Cropland and Grassland polygons were also manually added to better differentiate these surfaces. A total of 299 training polygons were created; their distribution within the area of interest can be seen in Figure 7. The number and structure of manually added training polygons are shown in Table 1.

A conversion table of CLC to LULUCF classes was created to systematically determine the LULUCF class. The individual training polygons determined from CLC belong to the LULUCF class, which was decided based on this conversion table, as presented in Appendix B.

All training polygons created from the CLC were verified using an orthophoto to check whether their declared land cover matched the LULUCF class. This verification was performed using orthophotos from ČÚZK available for the area of interest. Most of the images of the area were captured in 2018; only the western parts of the area were missing images from that year, so images from 2017 and 2019 were used there. If the land cover of the checked training polygon did not change in aerial photographs in this time interval (2017–2019), there was no reason to consider the class incorrectly assigned. If the land cover identified in the orthophoto did not match the declared CLC land cover, the training polygon was deleted or manually adjusted to lie within the declared land cover. Emphasis was placed on each polygon lying in its entirety in one class without extending into other classes. The assignment of training polygons to the Grassland class was checked using LPIS data and ZM 10 maps from ČÚZK. This detailed inspection was carried out because the Cropland and Grassland classes are difficult to distinguish using orthophotos.

2.4.4. Parameters of Classification

In the classification process, the most important task was to define a combination of parameter settings that would deliver the highest accuracy. The Number of Trees parameter was tested from 50 to 400 at 25-tree intervals, the Variables per Split parameter was tested from 1 to 6 at 1-variable intervals and the Bag Fraction parameter was tested from 0.1 to 0.5 at 0.1-fraction intervals. The remaining parameters were left at their default values: ‘NULL’ (that is, without limits) for Max Nodes and 1 for Min Leaf Population. A total of 450 combinations of the parameters Number of Trees, Variables per Split and Bag Fraction were generated and evaluated.
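The size of this search grid follows directly from the tested ranges: 15 values of NT, 6 values of VPS and 5 values of BF give 15 × 6 × 5 = 450 combinations. The following sketch only enumerates the combinations; the variable names are hypothetical, and the training and validation of each classifier in GEE are not shown.

```python
from itertools import product

# Parameter ranges as described in the text.
number_of_trees = list(range(50, 401, 25))                 # 50, 75, ..., 400
variables_per_split = list(range(1, 7))                    # 1, 2, ..., 6
bag_fraction = [round(0.1 * i, 1) for i in range(1, 6)]    # 0.1, 0.2, ..., 0.5

# Every (NT, VPS, BF) combination to be trained and validated.
grid = list(product(number_of_trees, variables_per_split, bag_fraction))
```

Each tuple in `grid` would then be passed to the classifier and scored; how this loop was implemented in the original GEE script is not described in the text.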

Per-pixel classification often produces a ‘salt-and-pepper’ effect. This effect was eliminated by filtering and replacing isolated pixels with neighboring values. At this step, areas represented by a single pixel were eliminated and replaced by the majority value of the pixels in a 3 × 3 kernel window. The effect of this step is documented in Figure 8a,b. The main complications are pixels on the borders between two classes and, in the case of tree growth, shadows cast by trees onto their immediate surroundings, which are mostly incorrectly classified as Wetlands. These isolated pixels were filtered. The minimum mapping unit of the classification is 2 pixels, i.e., 200 m².
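A minimal sketch of such a filter is given below. It assumes that a pixel counts as isolated when none of its eight neighbors shares its class, and that ties in the majority vote are resolved by the order in which `Counter` encounters the values; neither detail is stated in the text, and the function name `remove_isolated_pixels` is hypothetical.

```python
from collections import Counter


def remove_isolated_pixels(grid):
    """Replace single-pixel areas with the majority class of their 3 x 3 neighborhood."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            # Collect the classes of up to eight neighbors (fewer at the edges).
            neighbours = [
                grid[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, rows))
                for cc in range(max(c - 1, 0), min(c + 2, cols))
                if (rr, cc) != (r, c)
            ]
            # A pixel is isolated when no neighbor has the same class.
            if neighbours and grid[r][c] not in neighbours:
                out[r][c] = Counter(neighbours).most_common(1)[0][0]
    return out


single = [["C", "C", "C"],
          ["C", "W", "C"],
          ["C", "C", "C"]]
pair = [["C", "C", "C"],
        ["C", "W", "W"],
        ["C", "C", "C"]]
```

In this sketch the lone "W" pixel in `single` is replaced by "C", while the two adjacent "W" pixels in `pair` are kept, consistent with a minimum mapping unit of 2 pixels.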

2.4.5. Accuracy Assessment

The validation points were created in the ESRI ArcGIS Pro software, where the Create Accuracy Assessment Points tool (Spatial Analyst) was used. A total of 2235 points were created (in WGS 84/UTM zone 33N EPSG: 32633 coordinate system), and the Stratified Sampling method based on preliminary classification testing was used for the accuracy assessment and random creation of control points [24]. Control points were assigned to LULUCF classes in the same way as the training polygons; see Section 2.4.3. Due to the low number of control points generated randomly for the Other Land class, 31 points in this category were manually created. For effective validation, an innovative algorithm was created in the cloud-based platform GEE. The control points were uploaded to GEE, where the Classifier package was used to validate the classifications. Specifically, the classifier.confusionMatrix() function was used for confusion matrices, the errorMatrix() function for overall accuracy [35] and the ConfusionMatrix.kappa() function for the Kappa index [36]. The Kappa index value (Cohen’s Kappa) was calculated for each combination of input parameters. The combinations of input parameters that achieved the highest Kappa index value were selected, and validation matrices and overall accuracy were subsequently generated for them. The combinations of parameters with the highest accuracy were selected as the most suitable for classification.
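For readers who want to reproduce these two measures outside GEE, they can be computed from a confusion matrix as follows: overall accuracy is the share of correctly classified points, and Cohen's Kappa is (p_o − p_e)/(1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function names and the 2 × 2 example matrix are hypothetical and are not the study's data.

```python
def overall_accuracy(matrix):
    """Share of validation points on the diagonal of a square confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def cohens_kappa(matrix):
    """Cohen's Kappa: agreement corrected for the agreement expected by chance."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Invented two-class example: rows are reference classes, columns are predictions.
example = [[45, 5],
           [10, 40]]
```

For this invented matrix the overall accuracy is 0.85 and the chance agreement is 0.5, giving a Kappa of 0.7.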

2.4.6. Post-Processing Classification

According to the definitions of LULUCF, tree growth with an area of less than 0.5 ha cannot be considered as a forest, but as trees outside forest (ToF). For this reason, all Woodland patches with an area of less than 5000 m² (less than 50 pixels) were converted to the Other Land class. Figure 8b shows the state before the division of the Woodland class into Forest Land and ToF, and Figure 8c shows the state after the division. Based on this step, the minimum mapping unit for Forest Land differed from the other categories and was 5000 m² (0.5 ha).
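The area rule can be sketched as a connected-component pass over the classified raster: each Woodland patch is measured in pixels (100 m² each at 10 m resolution) and relabeled. The sketch assumes four-connectivity and a class label stored as a string; both are assumptions, as are the function and constant names, since the text does not describe how patches were delineated.

```python
from collections import deque

MIN_FOREST_PIXELS = 50  # 0.5 ha = 5000 m² = 50 pixels of 10 m x 10 m


def split_woodland(grid, min_pixels=MIN_FOREST_PIXELS):
    """Relabel Woodland patches as 'Forest Land' or 'Other Land' by patch size."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    seen = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "Woodland" or (r, c) in seen:
                continue
            # Breadth-first search collects one four-connected Woodland patch.
            patch, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                patch.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == "Woodland"
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            label = "Forest Land" if len(patch) >= min_pixels else "Other Land"
            for y, x in patch:
                out[y][x] = label
    return out
```

A patch of exactly 50 pixels is kept as Forest Land here, matching the rule that only areas of less than 0.5 ha are moved to Other Land.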

3. Results

3.1. Influence of Parameter Selection on the Resulting Accuracy of Classification

One of the main goals of the study was to develop an RF-based classification method that allows the classification of Sentinel-2 data in GEE according to the LULUCF requirements. In terms of classification, the most important task was to find the most suitable combination of RF classifier input parameters that would lead to the highest accuracy of the LULUCF classification. The evaluated parameters were Number of Trees, Variables per Split and Bag Fraction. For this purpose, an innovative script was developed in the GEE environment, which can evaluate hundreds of combinations of input parameters in a short time and use the overall accuracy and Kappa index to select the combination with the highest accuracy achieved.

From the achieved results of individual combinations (Appendix E), it was seen that the highest value of the Kappa index (κ) was achieved in the case of a combination of the settings of the parameters NT: 150, VPS: 3 and BF: 0.1 with the value κ = 0.8383. At the same time, this setting of the examined parameters achieved the highest overall accuracy of all combinations, with a value of 89.01%. The results of κ and the overall accuracy for the individual parameter combinations are given in Appendix E.

For a closer inspection of the settings and relevance of individual parameters, the average values of κ (calculated from testing control points) for individual values of input parameters are documented in Figure 9. For the NT parameter, each average value was calculated from a total of 30 κ values (6 values of the VPS parameter × 5 values of the Bag Fraction parameter); for the Variables per Split parameter, each average was calculated from 75 values, and for the Bag Fraction parameter, from 90 κ values. The parameter values used for the final (most accurate) classification are highlighted in red. The average value of κ for the NT parameter increased with the number of trees, and the highest value of κ was reached at the maximum number of trees (400). However, the increase in the value of κ above 150 trees was no longer as significant. The value of 150 trees was evaluated as the most suitable in combination with other parameters. For the VPS parameter, the highest average value of κ was reached at the parameter value of 2. A comparatively lower average value was obtained for the parameter value of 3, which was selected for the combination of the final classification. This parameter also had the largest range between the minimum and maximum average values of κ, which may indicate that this parameter has a significant effect on the resulting combination for the most accurate classification. For the Bag Fraction parameter, the highest average value of κ matched the finally selected value of 0.1. Other BF parameter settings evaluated had lower κ values.

Figure 10 describes in detail the average values of κ for the pair of evaluated parameters. The average values of the combination of VPS and NT (VPS/NT) were calculated from 5 values of κ, the combinations of BF and NT (BF/NT) were calculated from 6 values of κ and the combinations of VPS and BF (VPS/BF) were calculated from 15 values.
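The group sizes quoted for Figures 9 and 10 follow directly from the size of the parameter grid. The following is a minimal sketch of that bookkeeping, not the authors' GEE implementation. Only the grid dimensions (15 × 6 × 5 = 450) come from the text; the concrete grid values (`NT_VALUES`, `VPS_VALUES`, `BF_VALUES`) are assumptions, and `placeholder_kappa` is a hypothetical stand-in that returns a dummy value rather than a real result.

```python
from collections import defaultdict
from itertools import product
from statistics import mean

# Assumed grids: only their sizes (15 x 6 x 5 = 450) are taken from the text.
NT_VALUES = list(range(50, 401, 25))    # 15 values, spacing is an assumption
VPS_VALUES = [1, 2, 3, 4, 5, 6]         # 6 values, assumed
BF_VALUES = [0.1, 0.2, 0.3, 0.4, 0.5]   # 5 values, assumed


def placeholder_kappa(nt, vps, bf):
    """Hypothetical stand-in for a real accuracy assessment (dummy value)."""
    return 0.8


def grouped_means(results, keys):
    """Average kappa over all combinations that share the same values of `keys`.

    Returns {group: (mean kappa, number of kappa values averaged)}.
    """
    groups = defaultdict(list)
    for combo, kappa in results:
        groups[tuple(combo[k] for k in keys)].append(kappa)
    return {g: (mean(v), len(v)) for g, v in groups.items()}


# One entry per tested combination of the three parameters.
results = [
    ({"NT": nt, "VPS": vps, "BF": bf}, placeholder_kappa(nt, vps, bf))
    for nt, vps, bf in product(NT_VALUES, VPS_VALUES, BF_VALUES)
]

# Single-parameter averages (Figure 9) and pairwise averages (Figure 10).
for keys in (("NT",), ("VPS",), ("BF",), ("VPS", "NT"), ("BF", "NT"), ("VPS", "BF")):
    sizes = {n for _, n in grouped_means(results, keys).values()}
    print("/".join(keys), "-> kappa values per average:", sizes)
```

Grouping by a single key reproduces the counts of 30, 75 and 90 values per average, and grouping by a pair of keys reproduces the counts of 5, 6 and 15.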

In the evaluation of the VPS/NT combination, the VPS parameter values are expressed in rows and the NT values in columns. The highest value of κ was reached by the combination NT = 275 and VPS = 2 (NT = 150 and VPS = 3 are the values used in the final combination of three parameters for classification). The values of κ changed little within the rows, i.e., across the NT values, but differed more within the columns, i.e., across the VPS values. Therefore, the average value of κ is more affected by the VPS parameter than by the number of trees (especially obvious when the number of trees is set above 50). The high relevance of the VPS parameter is also evidenced by the evaluation of the VPS/BF parameter combination; in this case, the VPS influence appears even stronger. The combination with the highest value of κ (BF = 0.1, VPS = 3) was the same as the one selected as the most suitable in the combination of all three parameters. The comparison of these two parameters also showed the largest difference between the maximum and minimum average values of κ. In contrast, the smallest variability of κ values was seen in the BF/NT combinations, where the values of κ fell within the relatively narrow range of 0.811–0.820. BF = 0.1 and NT = 150 are the values contained in the resulting combination of three parameters.

3.2. Accuracy Assessment of the Classification

In addition to the calculation of κ, the values of overall accuracy (OA) and validation matrices were calculated for a detailed evaluation of the accuracy of the classification. The validation matrix for the resulting combination of classification parameters (NT: 150, VPS: 3 and BF: 0.1) is documented in Table 2. The overall accuracy reached 89.01%, and Cohen’s Kappa was 0.8383 for this combination. A closer inspection of the producer accuracy and user accuracy values showed that Settlements were most often misclassified as the Cropland class. Conversely, the Other Land class was most often misclassified as Settlements. More validation points of Settlements were classified as another class than validation points of other classes were classified as Settlements. It follows that the classified area of the Settlements class is probably slightly underestimated. Cropland was most often misclassified as Grassland and, conversely, Grassland was most often misclassified as Cropland. The assignment of Grassland points to the Cropland class (54 points) was the most common error in the classification and accounted for about one-fifth of all errors (2.24% of all validation points). The Cropland class appeared to be slightly overestimated (especially at the expense of Grassland). The Woodland class was most often misclassified as Grassland and vice versa. This could be caused by grass-like clearings that formed in the forests after the trees were logged. This type of forest could be observed within the area of interest as a consequence of bark beetle calamities and droughts. The Wetlands control points were most often misclassified as the Cropland class. The main cause of this phenomenon could be the location of control points in places with swamps, reeds and peat bogs, i.e., wetlands with a more substantial representation of the vegetation component. When comparing user accuracy and producer accuracy, the Wetlands class appeared to be slightly underestimated. The Other Land class seemed to be clearly underestimated, as less than half of the points belonging to this class were correctly classified (confusion mainly with Settlements). On the other hand, no control point from any other class was classified as Other Land.
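The accuracy measures discussed above can all be derived from a validation (confusion) matrix. The sketch below uses the standard definitions of overall accuracy, Cohen's Kappa, producer accuracy and user accuracy. The 3 × 3 matrix is hypothetical and is not Table 2; the orientation (rows = reference class, columns = classified class) is also an assumption.

```python
def accuracy_metrics(matrix):
    """Compute OA, Cohen's Kappa, producer and user accuracy.

    matrix[i][j] = number of validation points of reference class i
    that were classified as class j (assumed orientation).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_sums = [sum(row) for row in matrix]                          # reference totals
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]  # classified totals
    diagonal = sum(matrix[i][i] for i in range(n))                   # correctly classified

    overall = diagonal / total
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    # Producer accuracy: share of reference points of a class classified correctly.
    producer = [matrix[i][i] / row_sums[i] if row_sums[i] else None for i in range(n)]
    # User accuracy: share of points mapped to a class that truly belong to it.
    user = [matrix[j][j] / col_sums[j] if col_sums[j] else None for j in range(n)]
    return overall, kappa, producer, user


# Hypothetical validation matrix for three classes (illustration only).
example = [
    [40, 8, 2],
    [5, 30, 0],
    [6, 0, 9],
]

overall, kappa, producer, user = accuracy_metrics(example)
print(f"OA = {overall:.4f}, kappa = {kappa:.4f}")
print("producer accuracy:", [round(p, 3) for p in producer])
print("user accuracy:    ", [round(u, 3) for u in user])
```

In this hypothetical example, the third class has 15 reference points but only 11 points mapped to it, so its producer accuracy is lower than its user accuracy. This is the pattern that the text interprets as an underestimated class.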

The LULUCF classification was performed in the area of interest with a heterogeneous character both from a physical–geographical and socio-economic point of view. The heterogeneous character of the area was determined by the diverse representation of individual LULUCF classes. From the results documented in Table 3 and Figure 11, it is clear that the Cropland class had the highest area in 2018 with more than 42% of the total area. Grassland class occupied over 15%. The total area of agricultural land, i.e., Cropland and Grassland, accounted for 58%. Around 36% of the area of interest was classified as Forest Land, which is close to the average forest area in the Czech Republic (34%). The Settlements area was close to 5%. The remaining two categories did not exceed 1%, the Wetlands class occupied 0.77% and the Other Land occupied 0.96% of the area.

Figure 11 shows the spatial distribution of the classes examined. Large areas of the Cropland class occur in the most fertile areas south of Brno and in the vicinity of Olomouc. Forests occur mainly in less fertile mountain regions on the eastern and northern edges of the area of interest, and also in highland areas around Brno (Drahanská vrchovina and Chřiby). The Grassland class dominates in less favored areas for agriculture (LFA) in mountain and foothill areas, especially in the Beskydy, Hrubý Jeseník and Nízký Jeseník. The heterogeneous composition of the land cover can be seen in the hills in the western part called Vysočina (around the towns of Jihlava, Pelhřimov, Havlíčkův Brod and Žďár nad Sázavou). It is the most heterogeneous area within the area of interest. Significant water surfaces are located around the Novomlýnské reservoirs and near Tovačov, where there are large anthropogenic lakes created after sand mining.

In addition to the LULUCF class map for the entire evaluated area, detailed larger-scale maps of three selected areas are provided in Appendix D. The first row documents LULUCF between the Pilská reservoir and the village of Polnička. The local landscape is a composite of the individual monitored classes. To the south of the village of Polnička there is a quarry, which was successfully classified as Settlements according to the LULUCF nomenclature. The area documented in the second row has been strongly anthropogenically affected. There is the D1 highway with widely developed road infrastructure and large construction complexes. Both surfaces were well classified as Settlements. The sparsely built-up areas of the villages of Pávov and Nový Pávov (in the west) were also well classified. The third row shows the area inside the Libavá military district, which has been severely affected by droughts and bark beetle disturbances. In this area, certain weaknesses of the classification algorithm were documented, where some pixels without vegetation were incorrectly classified as Settlements. On the other hand, most of the affected forest area was successfully classified as Forest Land.

4. Discussion

The main aim of this study was to develop an RF-based classification method that allows the classification of Sentinel-2 data according to LULUCF requirements. Multispectral satellite data from the Sentinel-2 mission were used for their advanced spatial and temporal resolution. Methodological procedures were developed and implemented in the freely accessible cloud platform GEE. The methodology was tested, and the results were validated for a larger region (two NUTS 2 units with a total area of 23,217 km²) in Czechia using data collected in 2018. One motivation of this study was to demonstrate the high relevance and potential of EO data from the Copernicus program for LULUCF reporting. LULUCF reporting in Czechia has been exclusively based on cadastral land use information, which has several weaknesses that affect the quality of the collected data [22,23]. The compatibility of the cadastral land use categories with the defined LULUCF classes is also problematic. For example, the Other Land class in the cadastral register includes highly diverse surfaces that should be divided into individual LULUCF classes.

From the research point of view, the most important task was to test the Random Forest classifier for the purpose of LULUCF classification. Although LULUCF classes often include very different surfaces in terms of spectral reflectance, the positive finding is that the use of an RF classifier did not require these surfaces to be classified as separate classes and subsequently aggregated. Thus, quite different surfaces within one LULUCF class entered the classification, such as orchards and arable land as the Cropland class, or built-up areas and roads as the Settlements class. The Random Forest classifier was able to correctly classify these surfaces with relatively high accuracy. However, the basic condition was the quality of training. A semi-automatic method was developed for the creation of training polygons using the CLC 2018 database to provide high-quality training of the monitored classes. However, for minority classes and specific surfaces (approximately less than 2% of the total area), this method was not able to fully ensure a sufficient number of training polygons. For such classes, it was necessary to create training polygons manually.

A relevant research aspect was to test and find the most suitable combination of Random Forest classifier input parameters (Number of Trees, Variables per Split and Bag Fraction) to achieve the highest classification accuracy. For this purpose, an algorithm was created in GEE, which allows a large number of combinations of input parameters to be tested and then the one that achieves the highest accuracy to be selected. Among the individual combinations, the highest value of the Kappa index and the highest overall accuracy were achieved with the combination of parameters NT: 150, VPS: 3 and BF: 0.1, with κ: 0.8383 and an overall accuracy of 89.01%. For comparison, in other studies where GEE was used for the RF classifier implementation [31,37,38], the number of trees was set to 100 and the other parameters were left at their GEE default values (by default, VPS is defined as the square root of the number of bands of an image; BF is 0.5). Such a setting would lead to the result κ: 0.8206 and OA: 87.87% in this study. The RF classifier was also used to map Cropland in Southeast Asia using GEE [39], where a value of 300 was defined as the most appropriate NT and the default values were left for the other attributes. If this setting were used in this work, the result would be κ: 0.82 and OA: 87.79%.

The LULUCF classification used in this study is based on a multitemporal approach that uses several images during the observed vegetation season. Due to the large extent of the area of interest and the multitemporal approach, it was decided to create a mosaic for classification purposes. The mosaic was created using the full potential of Sentinel-2 data; all images taken in the period from May to the end of July with a total cloud cover below 75% were used to create the mosaic. The Sentinel-2: Cloud Probability dataset (so-called s2cloudless) was used for the cloud masking of Sentinel-2 data in GEE [29]. However, according to the results of experiments within this study, masking cloud shadows remains a complicated and poorly elaborated task in the s2cloudless dataset. For this reason, another algorithm in GEE was used [30]. Shadows are defined by the intersection of the cloud projection with low-reflectance near-infrared (NIR) pixels. The median method was chosen to create the resulting mosaic. For each pixel, the median of the cloudless values was determined as the final value. This method proved to be suitable for such a large area with very heterogeneous conditions prevalent throughout the year. The advantage of the median is that it suppresses the influence of outliers caused by noise or by imperfections of the cloud correction on the resulting mosaic. This method also allows the elimination of time-limited or exceptional surface conditions. For example, rapeseed (Brassica napus) has a highly different reflectance during the full flowering period compared to before or after flowering. This flowering phenological phase negatively influences the classification of the broadly defined LULUCF class Cropland. The choice of a suitable mosaicking method was also tested by [32], who evaluated the effect of the mosaicking method on the resulting overall accuracy value. In that study, the mosaic created by averaging the available surface reflectance values was classified with 88% accuracy, and the median mosaic had an accuracy of less than 86%. However, it should be noted that that study dealt with the classification of three smaller areas (25 km²) in Calabria (southern Italy) with lower heterogeneity and minimal cloud cover in the summer months. Another study [38] created a classified mosaic for each Landsat 8 and 7 spectral band as the 75th percentile of six values representing the average reflectance over each of six two-month periods (January–February, March–April, …) during a year. In [40], a classified mosaic was created using a minimum value for each month to ensure that the resulting values were not affected by clouds. Based on the experience gained in this study, this approach does not seem to be the best solution, because the recorded minimum values were often affected by cloud shadow.
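The difference between median, mean and minimum compositing can be illustrated on a single pixel. This is a minimal sketch in plain Python, not the GEE workflow used in the study; the reflectance values are hypothetical, and `None` marks an observation removed by the cloud mask.

```python
from statistics import mean, median


def composite(pixel_series, reducer):
    """Reduce the unmasked observations of one pixel to a single mosaic value.

    pixel_series: reflectance values over time; None marks a masked observation.
    reducer: function applied to the remaining values (e.g., median, mean, min).
    """
    valid = [v for v in pixel_series if v is not None]
    return reducer(valid) if valid else None


# Hypothetical NIR reflectance of one vegetated pixel over the season:
# 0.05 imitates an undetected cloud shadow, 0.72 an undetected cloud.
series = [0.31, None, 0.30, 0.05, 0.33, 0.72]

median_value = composite(series, median)
mean_value = composite(series, mean)
min_value = composite(series, min)

print("median:", median_value)
print("mean:  ", round(mean_value, 3))
print("min:   ", min_value)
```

The median stays at a typical clear-sky value, the mean is pulled by both outliers, and the minimum selects the shadowed observation, which corresponds to the problem with minimum-value mosaics noted above.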

In addition to the selection of the specific mosaicking method, the decision on the length of the time interval from which images will be used remains an important research question. In this work, an interval of 3 months from May to the end of July was used. Based on the testing, this interval was chosen because it covers significant vegetation/phenological phases and made it possible to create a mosaic based on relevant values for the entire area of interest. The selection of the time period also depends on weather conditions, especially cloudiness, which can vary between years. For this reason, it is not possible to recommend a standard optimal time period. A 3-month median mosaic was also used in [41] to map cropland using the Random Forest in China. That mosaic was created together with images from Landsat 8 and 7. In [39], cropland in Southeast Asia was analyzed using a time interval of 4 months for the mosaic.

The input of altitude values from the DEM (SRTM) in the classification proved to be highly useful. According to the calculated Gini importance, it was the most important band in the classification (Appendix C). This band also reached the highest significance in the study [31], which classified the territory of Mongolia using Landsat 8 and SRTM data. Another significant input was the calculated NDVI variance band for the period from May to October. The choice of this input information was inspired by several previous works; e.g., in [38], arable land mapping (corresponding to the Cropland class) was performed using Random Forest. For the classification, the authors used the standard deviation of NDVI from the three observed years. In terms of Gini importance, the standard deviation of NDVI achieved the fifth highest relevance of the twelve inputs compared. The elevation, slope, range of NDVI and minimum of NDVI had higher relevance in [38]. In this work, the NDVI variance band was second in terms of importance.
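The idea behind the NDVI variance band can be sketched for a single pixel as follows. NDVI is computed with its standard definition from near-infrared and red reflectance; the use of the population variance (`statistics.pvariance`) and all reflectance values are assumptions for illustration, since the text does not specify the exact variance estimator.

```python
from statistics import pvariance


def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    return (nir - red) / (nir + red)


def ndvi_variance(observations):
    """Variance of NDVI over a list of cloud-free (nir, red) observations.

    Population variance is an assumption made for this sketch.
    """
    return pvariance([ndvi(nir, red) for nir, red in observations])


# Hypothetical (NIR, red) reflectance pairs over one season.
forest_pixel = [(0.40, 0.04), (0.42, 0.04), (0.41, 0.04)]    # stable canopy
cropland_pixel = [(0.45, 0.04), (0.25, 0.15), (0.20, 0.18)]  # growth, then harvest

print("forest NDVI variance:  ", round(ndvi_variance(forest_pixel), 5))
print("cropland NDVI variance:", round(ndvi_variance(cropland_pixel), 5))
```

A surface whose vegetation cover changes strongly during the season, such as arable land, yields a much higher NDVI variance than a stable canopy, which is one plausible reason why such a band helps separate Cropland from other classes.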

From the point of view of the data used, the Sentinel-2 data appear to be a promising data source for LULUCF purposes. This is evidenced by several studies that have successfully used Sentinel-2 data to classify similar LULUCF classes [12]. This study confirmed the high relevance of modern classification methods based on machine learning (specifically RF) and the high perspective in the use of cloud-based technologies. The GEE environment is undergoing dynamic development due to its free availability for noncommercial use and its wide range of data and useful algorithms.

Considering the shortcomings of the data and methods used in this work with regard to the LULUCF regulations, an important aspect is the use of satellite data, which reflect land cover rather than land use. However, the LULUCF methodology is based more on a land use approach. From this point of view, the following research steps should focus on the possibilities of combining satellite data (Sentinel-2, Planet.com) and cadastral data with the maximum use of the advantages of these data sources for the purposes of accurate and time-compatible reporting. A LULUCF classification method based on standardized data (Copernicus data) could deliver comparable (harmonized) LULUCF results for use in the IPCC and could be applicable in many countries around the world. To evaluate the accuracy and applicability of the proposed method, it is necessary in the future to focus on the evaluation of LULUCF changes over several years. Classification inaccuracies may be more noticeable in change detection between two time periods. Moreover, it would be useful to use alternative methods for accuracy assessment, e.g., Mapcurves GOF for categorical variables. The created algorithm is openly available on GitHub to be freely used and developed: https://github.com/hawk919/LULUCF-GEE-classification/blob/main/CODE.

5. Conclusions

LULUCF is a greenhouse gas inventory sector that reports the extent of and changes in the following classes: Settlements, Cropland, Grassland, Forest Land, Wetlands and Other Land. Due to the international scope, different methodologies with different data are used. LULUCF data from Czechia are reported based on cadastral data, which have limited abilities to detect land-use changes [22,23]. On the other hand, EO data and methods have been significantly developed in recent periods, especially due to important programs and missions, e.g., Copernicus or Landsat. For this reason, the main aim of this study was to use and test the Sentinel-2 data for the purpose of LULUCF in the two selected NUTS 2 regions in Czechia in 2018. The methodological workflow was implemented in the freely accessible platform GEE. From the research point of view, the most important task was to find the most suitable combination of Random Forest classifier input parameters (Number of Trees, Variables per Split and Bag Fraction) to achieve the highest classification accuracy. The classification with the highest accuracy (the overall accuracy of 89.1% and Cohen’s Kappa of 0.84) was achieved with the following combination of parameters: NT = 150, VPS = 3 and BF = 0.1. The results indicate that the parameter VPS was of the greatest importance for the accuracy of the classification. To select and evaluate the relevance of parameters, an innovative algorithm was developed in GEE, which enabled the execution and evaluation of 450 classifications at once. This innovation is probably the biggest benefit of this work, because this method allows the users to operatively evaluate and select suitable classification parameters for the area of interest in a short time.

Due to the large extent of the area of interest and the multitemporal (multiple images) approach, it was decided to create a mosaic for classification purposes. The mosaic was created to exploit the full potential of Sentinel-2 data. For this reason, all images taken in the period from May to the end of July with cloud cover lower than 75% were used. The median method was chosen to create the resulting mosaic. This method proved to be suitable for such a large area with very heterogeneous conditions prevalent during the year. Moreover, altitude values derived from SRTM and NDVI variance values were added to the mosaic and used in the classification. The input of altitude values was highly useful, and according to the calculated Gini importance, it was the most important band in the classification (similar to the study [31]). The NDVI variance values were the second most significant in terms of Gini importance.

Another goal was to create a LULUCF classification nomenclature for Czechia with a detailed semantic description and maximum compatibility with LULUCF. To ensure compatibility with the LULUCF approach, the Woodland class was originally created and subsequently divided into Forest Land (areas greater than 0.5 ha) and Other Land (trees outside the forest with an area of less than 0.5 ha). The results of the classification show that the most dominant LULUCF classes within the area of interest in 2018 were Cropland and Woodland/Forest Land. The highest classification accuracy of over 90% was achieved for the classes Cropland and Woodland. In terms of producer accuracy, the Settlements (66.05%) and the Other Land (43.75%) were the most problematic classes.

The developed method is based on the classification of Sentinel-2 data using the Random Forest classifier in the cloud-based platform GEE. This approach seems to be very promising for the systematic implementation of EO data in LULUCF. The Sentinel-2 data from the Copernicus programme appear to be a relevant data source for LULUCF purposes. The following research steps should focus on the possibilities of combining satellite data (Sentinel-2) and cadastral data with the maximum exploitation of the advantages of individual data sources for the purposes of time-compatible LULUCF reporting. For this reason, it is necessary to focus on the evaluation and validation of LULUCF within a longer period and to assess the accuracy of the changes. A cloud-based classification method that uses standardized data (Copernicus data) and is applicable in many countries around the world could bring significant progress in the use of EO data in LULUCF. A closer dialogue between stakeholders/end-users and EO experts is the next important step toward that goal.

Author Contributions

Conceptualization, J.S. and P.Š.; methodology, J.S., P.Š., J.L. and D.P.; software, J.S. and D.P.; validation, J.L. and J.S.; formal analysis, J.S. and P.Š.; investigation, J.S. and J.L.; resources, J.L.; data curation, J.S.; writing—original draft preparation, J.S. and P.Š.; writing—review and editing, P.Š. and N.K.; visualization, J.S.; supervision, P.Š.; project administration, J.L. and P.Š.; funding acquisition, P.Š. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the European Union’s Caroline Herschel Framework Partnership Agreement on Copernicus User Uptake under grant agreement no. FPA 275/G/GRO/COPE/17/10042, project FPCUP (Framework Partnership Agreement on Copernicus User Uptake), Action 2019-2-49 “Developing supports for monitoring and reporting of GHG emissions and removals from land use, land use change and forestry” (219/SI2.818795/07 (CLIMA)).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The created algorithm can be found at the following link: https://github.com/hawk919/LULUCF-GEE-classification/blob/main/CODE.

Acknowledgments

We would like to thank the European Union’s Caroline Herschel Framework Partnership Agreement on Copernicus User Uptake under grant agreement no. FPA 275/G/GRO/COPE/17/10042, project FPCUP (Framework Partnership Agreement on Copernicus User Uptake), Action 2019-2-49 “Developing supports for monitoring and reporting of GHG emissions and removals from land use, land use change and forestry” (219/SI2.818795/07 (CLIMA)) for their support. We would like to thank Julia Röhrig from DLR and Arslan Ali Nadir from FMI for the coordination of the project FPCUP (CLIMA) and Tatsiana Danilchyk for her help with the script. Finally, we would like to thank the reviewers for their useful comments and IFER—Institute of Forest Ecosystem Research Ltd. for useful consultations.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. LULUCF Classification Nomenclature and Description of Categories

1. Forest Land

Forest Land is an area that should be covered with woody vegetation, and the tree canopy cover should be at least 10%. The area should exceed 0.5 ha and contain trees able to reach a minimum height of 5 m at maturity. It also includes systems with woody vegetation that currently fall below, but are expected to exceed, the threshold for the Forest Land category. These areas normally form part of the forest area which is temporarily unstocked as a result of human intervention, such as harvesting or natural causes, but which is expected to revert to forest (Figure A1a–d) [15,27].

Figure A1. (a) coniferous forest, (b) deciduous forest, (c) logged area and (d) recovery forest area (source: authors).

2. Cropland

This category includes managed land used for growing temporary (Figure A2a–c) and permanent crops (Figure A2d). It also includes arable land that was left for one or several years before being cultivated again or is temporarily used for grazing. Permanent crops include trees and shrubs that produce fruits, such as orchards and vineyards [15].

Figure A2. (a) plowed arable land, (b) ripe wheat, (c) different crops and (d) vineyard (source: authors).

3. Grassland

This category includes meadow (Figure A3a,c) and pasture land (Figure A3b) that is not considered cropland. Grasslands generally have vegetation dominated by permanent grasses. This category includes all grasslands from wild lands (Figure A3d) and agricultural to recreational. This category includes natural grasses (steppe vegetation, grasslands above tree line) and sparse vegetation (below tree crown cover of at least 10%) [15,27].

Figure A3. (a) late summer meadow, (b) pasture with cows, (c) spring meadow and (d) natural mountain grassland (source: authors).

4. Wetlands

This category includes land that is covered or saturated by water for the whole or part of the year (e.g., peatland and reeds; Figure A4d) and that does not fall into the Forest Land, Cropland, Grassland or Settlements category. It includes managed reservoirs, e.g., ponds or valley reservoirs (Figure A4a,b), and unmanaged natural rivers (Figure A4c) and lakes [15].

Figure A4. (a) fish pond, (b) water reservoir, (c) river and (d) marshes (source: authors).

5. Settlements

This category includes all developed land, including transportation infrastructure (Figure A5c) and human settlements of any size (Figure A5a,b). The important aspect of settlements is the terrestrial components of developed land that are managed and may influence CO₂ fluxes between the atmosphere and terrestrial carbon pools. In this context, the category “Settlements” includes all classes of urban tree formations, namely, trees grown along streets, in public and private gardens and in different kinds of parks, provided that such trees are functionally or administratively associated with cities or villages. Landfills, open pit mines (Figure A5d) and sludge ponds also fall into this category [15].

Figure A5. (a) Dense urban fabric, (b) discontinuous urban fabric, (c) highway and (d) sand mine (source: authors).

6. Other Land

This category includes bare soil, rock (Figure A6c,d), sand, ice and all unmanaged land areas that do not fall into any of the other five categories. Simultaneously, this class also includes uncultivated stands of trees (Figure A6a) or shrubs that do not meet the conditions set for the forest (area less than 0.5 ha). These are mainly bosques, unmanaged bushes, etc. Scrub mountain pines in the top parts of the mountains (Figure A6b) are also included, as they do not exceed a height of 5 m [15,27].

Figure A6. (a) Trees outside forest, (b) mountain woodland (Pinus mugo, Picea abies), (c) mountain woodland (Picea abies) with rocks and (d) blockfield (source: authors).

Appendix B. CORINE and LULUCF Categorizations

Table A1. CORINE Land Cover categories and their equivalent LULUCF categories in the case study.

CLC3	CLC Name	LULUCF
1.1.1	Continuous urban fabric	Settlements
1.1.2	Discontinuous urban fabric	Settlements
1.2.1	Industrial or commercial units and public facilities	Settlements
1.2.2	Road and rail networks and associated land	Settlements
1.2.4	Airports	Settlements
1.3.1	Mineral extraction sites	Settlements
1.3.2	Dump sites	Settlements
1.3.3	Construction sites	Settlements
1.4.1	Green urban areas	Settlements
1.4.2	Sport and leisure facilities	Settlements
2.1.1	Non-irrigated arable land	Cropland
2.2.1	Vineyards	Cropland
2.2.2	Fruit tree and berry plantations	Cropland
2.3.1	Pastures	Grassland
2.4.2	Complex cultivation patterns	Cropland/Grassland
2.4.3	Land principally occupied by agriculture, with significant areas of natural vegetation	Cropland/Grassland/Forest Land
3.1.1	Broad-leaved forest	Forest Land
3.1.2	Coniferous forest	Forest Land
3.1.3	Mixed forest	Forest Land
3.2.1	Natural grassland	Grassland
3.2.2	Moors and heathland	Grassland/Other Land
3.2.4	Transitional woodland/shrub	Forest Land
3.3.3	Sparsely vegetated areas	Other Land/Grassland
4.1.1	Inland marshes	Grassland/Wetlands
4.1.2	Peatbogs	Forest Land/Wetlands
5.1.1	Water courses	Wetlands
5.1.2	Water bodies	Wetlands

Source: [42].

Appendix C. Gini Importance

In Table A2, the Gini importance values of the individual bands and their rank in relation to the other bands are evaluated for four different combinations of input parameters. The bands are ordered by the average rank of importance from all four combinations.

Table A2. Gini importance of selected combinations of input parameters.

Band	NT: 100, VPS: 1, BF: 0.1		NT: 200, VPS: 2, BF: 0.2		NT: 300, VPS: 3, BF: 0.3		NT: 400, VPS: 4, BF: 0.4
Band	Gini Importance	Rank of Importance	Gini Importance	Rank of Importance	Gini Importance	Rank of Importance	Gini Importance	Rank of Importance
SRTM elevation	220.4	2.	544.8	1.	956.1	1.	1318.9	1.
NDVI variance	238.9	1.	452.3	2.	623.5	2.	736.2	2.
B11	180.9	5.	368.2	3.	526.9	3.	681.9	3.
B2	191.2	3.	321.1	5.	442.6	4.	545.4	5.
B6	172.8	8.	330.3	4.	438.3	5.	516.2	8.
B12	167.5	11.	317.7	6.	431.6	6.	550.4	4.
B8A	169.4	10.	311.1	8.	431.5	7.	528.4	6.
B5	170.2	9.	313.2	7.	427.3	8.	505.6	9.
NDVI	175.7	7.	305.2	10.	418.2	9.	516.5	7.
B4	185.3	4.	308.4	9.	402.8	11.	470.6	11.
B3	180.5	6.	304.9	11.	407.5	10.	480.3	10.
B7	155.0	13.	278.2	12.	375.6	12.	456.7	12.
B8	164.3	12.	276.9	13.	369.1	13.	449.3	13.
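The ordering of Table A2 by average rank can be reproduced from the Gini importance values alone. The sketch below copies the values from Table A2, ranks the bands within each parameter combination (rank 1 = highest importance) and averages the four ranks per band; the function and variable names are illustrative only.

```python
# Gini importance values copied from Table A2; one value per parameter
# combination, in the column order of the table.
GINI = {
    "SRTM elevation": [220.4, 544.8, 956.1, 1318.9],
    "NDVI variance": [238.9, 452.3, 623.5, 736.2],
    "B11": [180.9, 368.2, 526.9, 681.9],
    "B2": [191.2, 321.1, 442.6, 545.4],
    "B6": [172.8, 330.3, 438.3, 516.2],
    "B12": [167.5, 317.7, 431.6, 550.4],
    "B8A": [169.4, 311.1, 431.5, 528.4],
    "B5": [170.2, 313.2, 427.3, 505.6],
    "NDVI": [175.7, 305.2, 418.2, 516.5],
    "B4": [185.3, 308.4, 402.8, 470.6],
    "B3": [180.5, 304.9, 407.5, 480.3],
    "B7": [155.0, 278.2, 375.6, 456.7],
    "B8": [164.3, 276.9, 369.1, 449.3],
}


def ranks_per_combination(gini):
    """Rank bands within each parameter combination (1 = most important)."""
    bands = list(gini)
    n_combinations = len(next(iter(gini.values())))
    ranks = {band: [] for band in bands}
    for k in range(n_combinations):
        ordered = sorted(bands, key=lambda b: gini[b][k], reverse=True)
        for position, band in enumerate(ordered, start=1):
            ranks[band].append(position)
    return ranks


ranks = ranks_per_combination(GINI)
average_rank = {band: sum(r) / len(r) for band, r in ranks.items()}

# Print the bands from the lowest (best) to the highest average rank.
for band in sorted(average_rank, key=average_rank.get):
    print(f"{band:15s} ranks={ranks[band]} average={average_rank[band]:.2f}")
```

Note that the raw Gini importance values grow with the number of trees, so they are not directly comparable between the four combinations; the ranks are.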

Appendix D. LULUCF Classification in the Selected Localities

Figure A7. LULUCF classification in the selected localities: Pilská Reservoir and Polnička Village (the first row); Pávov and Nový Pávov (the second row); and Libavá Military District (the third row).

Appendix E

Appendix E is available at https://drive.google.com/file/d/12Uic7KqExjBV-IsucF2FwBcItAFfvLXc/view?usp=sharing.

References

Koomen, E.; Stillwell, J.; Bakema, A.; Scholten, H.J. (Eds.) Modelling Land-Use Change the GeoJournal Library; Springer: Dordrecht, The Netherlands, 2007; ISBN 978-1-4020-6484-5.
Verburg, P.H.; van Berkel, D.B.; van Doorn, A.M.; van Eupen, M.; van den Heiligenberg, H.A.R.M. Trajectories of land use change in Europe: A model-based exploration of rural futures. Landsc. Ecol. 2010, 25, 217–232.
Michetti, M. Modelling Land Use, Land-Use Change, and Forestry in Climate Change: A Review of Major Approaches. SSRN Electron. J. 2012, 46, 1–57.
Meyfroidt, P.; Lambin, E.F. Global Forest Transition: Prospects for an End to Deforestation. Annu. Rev. Environ. Resour. 2011, 36, 343–373.
Ellison, D.; Lundblad, M.; Petersson, H. Reforming the EU approach to LULUCF and the climate policy framework. Environ. Sci. Policy 2014, 40, 1–15.
Nielsen, T.D. From REDD+ forests to green landscapes? Analyzing the emerging integrated landscape approach discourse in the UNFCCC. For. Policy Econ. 2016, 73, 177–184.
Latta, G.S.; Baker, J.S.; Ohrel, S. A Land Use and Resource Allocation (LURA) modeling system for projecting localized forest CO₂ effects of alternative macroeconomic futures. For. Policy Econ. 2018, 87, 35–48.
Liu, S.; Li, Y.; Gao, Q.; Wan, Y.; Ma, X.; Qin, X. Analysis of LULUCF accounting rules after 2012. Adv. Clim. Chang. Res. 2011, 2, 178–186.
Alcantara, C.; Kuemmerle, T.; Prishchepov, A.V.; Radeloff, V.C. Mapping abandoned agriculture with multi-temporal MODIS satellite data. Remote Sens. Environ. 2012, 124, 334–347.
Hansen, M.C.; Stehman, S.V.; Potapov, P.V. Quantification of global gross forest cover loss. Proc. Natl. Acad. Sci. USA 2010, 107, 8650–8655.
Lastovicka, J.; Svec, P.; Paluba, D.; Kobliuk, N.; Svoboda, J.; Hladky, R.; Stych, P. Sentinel-2 data in an evaluation of the impact of the disturbances on forest vegetation. Remote Sens. 2020, 12, 1914.
Lewinski, S.; Malinowski, R.; Rybicki, M.; Gromny, E.; Nowakowski, A.; Jenerowicz, M.; Krupiński, M.; Krupiński, M.; Krätzschmar, E.; Guenther, S. Automatic Land Cover Classification of Europe with Sentinel-2 Imagery. 2019. Poster. In Proceedings of the Living Planet Symposium, MiCo–Milano Congressi, Milan, Italy, 13–17 May 2019.
Herold, M.; di Gregorio, A. Evaluating land-cover legends using the UN land-cover classification system. In Remote Sensing of Land Use and Land Cover: Principles and Applications; CRC Press: Boca Raton, FL, USA, 2012.
Defourny, P.; Mayaux, P.; Herold, M.; Bontemps, S. Global Land-Cover Map Validation Experiences: Toward the Characterization of Quantitative Uncertainty; Giri, C., Ed.; Remote Sensing of Land Use and Land Cover. Priciples and Applications; JRC73563; Taylor and Francis: Abingdon, UK, 2012; pp. 207–223. [Google Scholar]
Penman, J.; Gytarsky, M.; Hiraishi, T.; Krug, T.; Kruger, D.; Pipatti, R.; Buendia, L.; Miwa, K.; Ngara, T.; Tanabe, K.; et al. Good Practice Guidance for Land Use, Land-Use Change and Forestry; Institute for Global Environmental Strategies (IGES) for the IPCC: Kanagawa, Japan, 2003; 590p. [Google Scholar]
Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14 August 1995; pp. 278–282. [Google Scholar]
Rodriguez-Galiano, V.F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J.P. An assessment of the effectiveness of a random forest classifier for land-cover classification. ISPRS J. Photogramm. Remote Sens. 2012, 67, 93–104. [Google Scholar] [CrossRef]
Jin, Y.; Liu, X.; Chen, Y.; Liang, X. Land-cover mapping using Random Forest classification and incorporating NDVI time-series and texture: A case study of central Shandong. Int. J. Remote Sens. 2018, 39, 8703–8723. [Google Scholar] [CrossRef]
Mellor, A.; Haywood, A.; Stone, C.; Jones, S. The performance of random forests in an operational setting for large area sclerophyll forest classification. Remote Sens. 2013, 5, 2838–2856. [Google Scholar] [CrossRef] [Green Version]
Belgiu, M.; Drăguţ, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 2016, 114, 24–31. [Google Scholar] [CrossRef]
Šandera, J.; Štych, P. Selecting relevant biological variables derived from Sentinel-2 data for mapping changes from grassland to arable land using random forest classifier. Land 2020, 9, 420. [Google Scholar] [CrossRef]
Micek, O.; Feranec, J.; Stych, P. Land use/land cover data of the urban atlas and the cadastre of real estate: An evaluation study in the Prague metropolitan region. Land 2020, 9, 153. [Google Scholar] [CrossRef]
Pazúr, R.; Feranec, J.; Štych, P.; Kopecká, M.; Holman, L. Changes of urbanised landscape identified and assessed by the urban atlas data: Case study of Prague and Bratislava. Land Use Policy 2017, 61, 135–146. [Google Scholar] [CrossRef]
Manakos, I.; Tomaszewska, M.; Gkinis, I.; Brovkina, O.; Filchev, L.; Genc, L.; Gitas, I.Z.; Halabuk, A.; Inalpulat, M.; Irimescu, A.; et al. Comparison of global and continental land cover products for selected study areas in South Central and Eastern European Region. Remote Sens. 2018, 10, 1967. [Google Scholar] [CrossRef] [Green Version]
Google Earth Engine. Sentinel-2 MSI: MultiSpectral Instrument, Level-2A. Available online: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR (accessed on 16 February 2022).
Gorelick, N.; Hancher, M.; Dixon, M.; Ilyushchenko, S.; Thau, D.; Moore, R. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 2017, 202, 18–27. [Google Scholar] [CrossRef]
Kučera, M. Kategorie pozemků v Národní Inventarizaci LESŮ České Republiky. Acta Univ. Agric. Silvic. Mendel. Brun. 2010, 58, 223–232. [Google Scholar] [CrossRef] [Green Version]
Sentinel Hub. Sentinel Hub’s Cloud Detector for Sentinel-2 Imagery. Available online: https://medium.com/sentinel-hub/cloud-masks-at-your-service-6e5b2cb2ce8a (accessed on 16 February 2022).
López-Puigdollers, D.; Mateo-García, G.; Gómez-Chova, L. Benchmarking deep learning models for cloud detection in Landsat-8 and Sentinel-2 images. Remote Sens. 2021, 13, 992. [Google Scholar] [CrossRef]
Google Earth Engine. Sentinel-2 Cloud Masking with s2cloudles. Available online: https://developers.google.com/earth-engine/tutorials/community/sentinel-2-s2cloudless (accessed on 16 February 2022).
Noi Phan, T.; Kuch, V.; Lehnert, L.W. Land cover classification using Google Earth Engine and random forest classifier-the role of image composition. Remote Sens. 2020, 12, 2411. [Google Scholar] [CrossRef]
Praticò, S.; Solano, F.; Di Fazio, S.; Modica, G. Machine learning classification of mediterranean forest habitats in Google Earth Engine based on seasonal Sentinel-2 time-series and input image composition optimisation. Remote Sens. 2021, 13, 586. [Google Scholar] [CrossRef]
Abdel-Rahman, E.M.; Mutanga, O.; Adam, E.; Ismail, R. Detecting Sirex noctilio grey-attacked and lightning-struck pine trees using airborne hyperspectral data, random forest and support vector machines classifiers. ISPRS J. Photogramm. Remote Sens. 2014, 88, 48–59. [Google Scholar] [CrossRef]
Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
Stehman, S.V. Selecting and Interpreting Measures of Thematic Classification Accuracy. Remote Sens. Environ. 1997, 62, 77–89. [Google Scholar] [CrossRef]
Foody, G.M. Status of land cover classification accuracy assessment. Remote Sens. Environ. 2002, 80, 185–201. [Google Scholar] [CrossRef]
Tassi, A.; Gigante, D.; Modica, G.; Di Martino, L.; Vizzari, M. Pixel-vs. Object-based Landsat 8 data classification in Google Earth Engine using random forest: The case study of maiella national park. Remote Sens. 2021, 13, 2299. [Google Scholar] [CrossRef]
Phalke, A.R.; Özdoğan, M.; Thenkabail, P.S.; Erickson, T.; Gorelick, N.; Yadav, K.; Congalton, R.G. Mapping croplands of Europe, Middle East, Russia, and Central Asia using Landsat, Random Forest, and Google Earth Engine. ISPRS J. Photogramm. Remote Sens. 2020, 167, 104–122. [Google Scholar] [CrossRef]
Oliphant, A.J.; Thenkabail, P.S.; Teluguntla, P.; Xiong, J.; Gumma, M.K.; Congalton, R.G.; Yadav, K. Mapping cropland extent of Southeast and Northeast Asia using multi-year time-series Landsat 30-m data using a random forest classifier on the Google Earth Engine Cloud. Int. J. Appl. Earth Obs. Geoinf. 2019, 81, 110–124. [Google Scholar] [CrossRef]
Zurqani, H.A.; Allen, J.S.; Post, C.J.; Pellett, C.A.; Walker, T.C. Mapping and quantifying agricultural irrigation in heterogeneous landscapes using Google Earth Engine. Remote Sens. Appl. Soc. Environ. 2021, 23, 100590. [Google Scholar] [CrossRef]
Teluguntla, P.; Thenkabail, P.; Oliphant, A.; Xiong, J.; Gumma, M.K.; Congalton, R.G.; Yadav, K.; Huete, A. A 30-m Landsat-derived cropland extent product of Australia and China using random forest machine learning algorithm on Google Earth Engine cloud computing platform. ISPRS J. Photogramm. Remote Sens. 2018, 144, 325–340. [Google Scholar] [CrossRef]
Kosztra, B.; Büttner, G.; Hazeu, G.; Arnold, S. Updated CLC Illustrated Nomenclature Guidelines; European Environment Agency: Wien, Austria, 2017. [Google Scholar]

Figure 1. Area of interest—land cover is derived from CORINE Land Cover 2018 according to the method documented in Appendix B.

Figure 2. Workflow.

Figure 3. Demonstration of the cloud masking method and its effectiveness on a selected part of image 20180617T095029_20180617T095028_T33UXR, which has 64.95% cloudiness over the whole image: (a) the initial step with masked shadows, clouds and the created buffer; (b) the final masked area applied to all the parameters.

Figure 4. Spectral signatures of two selected training data polygons: (a) training polygon 331 (Cropland) and (b) training polygon 488 (Grassland).

Figure 5. Comparison of NDVI variance, the 2018 aerial image (ČÚZK) and ZM 10 (ČÚZK) in a selected sample area.

Figure 6. Scheme of creating training polygons.

Figure 7. Distribution of training polygons within the area of interest.

Figure 8. Post-processing of the classification: (a) map of the original classification, (b) map with isolated pixels replaced, (c) map with Forest land and ToF distinguished and (d) the same area in an orthophoto from 2018 (ČÚZK).

Figure 9. Average Kappa index values for individual parameters.

Figure 10. Average Kappa index values for individual combinations of parameters.

Figure 11. LULUCF classification of the area of interest, generalized as the dominant class in each ZSJ administrative unit. Note: The Other Land class does not appear in the legend as it does not predominate in any of the ZSJ.

Table 1. Number of training polygons for individual training classes.

LULUCF Class	Semi-Automatically Created Polygons	Manually Created Polygons	Total
Settlements	16	18	34
Cropland	122	24	146
Woodland	60	12	72
Grassland	18	13	31
Wetlands	2	6	8
Other land	1	7	8
Total sum	219	80	299

Table 2. Validation matrix of the final classification.

	Settlements	Cropland	Woodland	Grassland	Wetlands	Other Land	User Accuracy
Settlements	107	26	10	18	1	0	66.05%
Cropland	8	882	6	26	1	0	95.56%
Woodland	1	11	728	32	0	0	94.30%
Grassland	1	54	23	243	1	0	75.47%
Wetlands	0	5	1	1	17	0	70.83%
Other Land	16	1	0	1	0	14	43.75%
Producer Accuracy	80.45%	90.09%	94.79%	75.70%	85.00%	100.00%	Overall accuracy 89.01%
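
The per-class and overall figures in Table 2 follow directly from the matrix counts. The sketch below recomputes them in plain Python, assuming that rows hold the classified (map) labels and columns the reference labels, which is the orientation implied by user accuracy closing each row. The function name `accuracy_metrics` and the variable names are illustrative and are not taken from the study's GEE scripts.

```python
# Confusion matrix copied from Table 2.
# Assumption: rows = classified class, columns = reference class.
CLASSES = ["Settlements", "Cropland", "Woodland", "Grassland", "Wetlands", "Other Land"]
MATRIX = [
    [107, 26, 10, 18, 1, 0],
    [8, 882, 6, 26, 1, 0],
    [1, 11, 728, 32, 0, 0],
    [1, 54, 23, 243, 1, 0],
    [0, 5, 1, 1, 17, 0],
    [16, 1, 0, 1, 0, 14],
]


def accuracy_metrics(matrix):
    """Return user accuracies, producer accuracies, overall accuracy and Cohen's Kappa."""
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_totals)
    diagonal = [matrix[i][i] for i in range(n)]

    # User accuracy: correct samples divided by all samples classified as that class.
    user = [diagonal[i] / row_totals[i] for i in range(n)]
    # Producer accuracy: correct samples divided by all reference samples of that class.
    producer = [diagonal[j] / col_totals[j] for j in range(n)]
    # Overall accuracy: share of samples on the diagonal.
    overall = sum(diagonal) / total
    # Chance agreement expected from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total**2
    kappa = (overall - expected) / (1 - expected)
    return user, producer, overall, kappa


user, producer, overall, kappa = accuracy_metrics(MATRIX)
for name, ua, pa in zip(CLASSES, user, producer):
    print(f"{name:<12} UA={ua:6.2%} PA={pa:6.2%}")
print(f"Overall accuracy={overall:.2%} Kappa={kappa:.2f}")
```

With these counts the sketch reproduces the user and producer accuracies listed in the table and a Kappa of about 0.84. The overall accuracy comes out at roughly 89.08% (1991 of 2235 samples), which is closer to the 89.1% reported in the abstract than to the 89.01% printed in the table cell.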

Table 3. Areas and representations of individual LULUCF classes according to the final classification.

Class	Area	Percentage
Settlements	1059.8 km²	4.56%
Cropland	9936.4 km²	42.80%
Forest Land	8259.6 km²	35.58%
Grassland	3560.7 km²	15.34%
Wetlands	178.6 km²	0.77%
Other Land	222.1 km²	0.96%
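
The percentage column in Table 3 can be checked from the area column alone. The short sketch below assumes that each percentage is the class area divided by the sum of the six class areas (the classified area), not by an official regional area; the dictionary name `AREAS_KM2` is illustrative.

```python
# Class areas in square kilometres, copied from Table 3.
AREAS_KM2 = {
    "Settlements": 1059.8,
    "Cropland": 9936.4,
    "Forest Land": 8259.6,
    "Grassland": 3560.7,
    "Wetlands": 178.6,
    "Other Land": 222.1,
}

# Assumption: shares are relative to the total classified area.
total_km2 = sum(AREAS_KM2.values())
shares = {name: 100 * area / total_km2 for name, area in AREAS_KM2.items()}

for name, share in shares.items():
    print(f"{name:<12} {AREAS_KM2[name]:8.1f} km2 {share:6.2f}%")
print(f"Total        {total_km2:8.1f} km2")
```

Under that assumption the recomputed shares agree with the table to two decimal places.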

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Svoboda, J.; Štych, P.; Laštovička, J.; Paluba, D.; Kobliuk, N. Random Forest Classification of Land Use, Land-Use Change and Forestry (LULUCF) Using Sentinel-2 Data—A Case Study of Czechia. Remote Sens. 2022, 14, 1189. https://doi.org/10.3390/rs14051189

