Comparison of Support Vector Machines and Random Forests for Corine Land Cover Mapping

: Land cover information is essential in European Union spatial management, particularly that of invasive species, natural habitats, urbanization, and deforestation; therefore, the need for accurate and objective data and tools is critical. For this purpose, the European Union’s ﬂagship program, the Corine Land Cover (CLC), was created. Intensive works are currently being carried out to prepare a new version of CLC+ by 2024. The geographical, climatic, and economic diversity of the European Union raises the challenge to verify various test areas’ methods and algorithms. Based on the Corine program’s precise guidelines, Sentinel-2 and Landsat 8 satellite images were tested to assess classiﬁcation accuracy and regional and spatial development in three varied areas of Catalonia, Poland, and Romania. The method is dependent on two machine learning algorithms, Random Forest (RF) and Support Vector Machine (SVM). The bias of classiﬁcations was reduced using an iterative of randomized training, test, and veriﬁcation pixels. The ease of the implementation of the used algorithms makes reproducing the results possible and comparable. The results show that an SVM with a radial kernel is the best classiﬁer, followed by RF. The high accuracy classes that can be updated and classes that should be redeﬁned are speciﬁed. The methodology’s potential can be used by developers of CLC+ products as a guideline for algorithms, sensors, and the possibilities and difﬁculties of classifying different CLC


Introduction
Human activities have changed the environment for thousands of years. A very significant increase in population and socio-economic activities associated with production and consumption patterns have intensified environmental changes. The pressure on ecosystems, natural habitats, and biodiversity loss are among the most intense impacts of climate change over territories. These impacts translate into changes in land cover; therefore, an update and quality thematic mapping become a key indicator for better analysis and decision making to define potential climate actions. Changes in land cover become a significant driver of climate change, but at the same time, climate change can lead to changes in land cover; therefore, it is essential for good land management to continuously generate thematic maps of land cover with the best possible accuracy at appropriate scales to facilitate their usability. Land cover, combining components of the environment and human activity, is one of the most important biophysical features of the Earth's surface [1]. Current land cover maps are an essential element of environment management and monitoring [2]. Updating land cover maps requires technology that Corine Land Cover data are used for analyzing the changes in the land use [5] and deforestation rate assessment [6] and is also often used as input to various models, e.g., air pollution monitoring [7], calculating population density [8], and determining urban heat islands [9].
Currently, non-parametric algorithms such as Artificial Neural Networks (ANN) [10], Support Vector Machine (SVM), and Random Forest (RF) [11] are most used to classify land cover. They are good at handling multidimensional data and provide satisfactory classification results [12]. The validation of CLC products is a difficult task due to the wide range of distinguished areas that are complex in structure and spectrally heterogeneous, which forces the use of time-consuming optimization procedures [13], e.g., setting the threshold for the maximum number of pixels per class, additionally assigned to the training and test set using a specific sampling method [11]. Besides, to make the reference data representative and reliable, they are reduced by an internal buffer of a specific size, aimed at removing mixed pixels at the edges of the reference polygons [14].
The Landsat and Copernicus missions have gained tremendous popularity in land cover research due to their open access policy [15]. It is often necessary to increase the informational function of acquired images through data fusion. One of the methods used is multi-temporal composition, combining scenes acquired at different dates for the same location [16]. It is based on differences in the spectral reflection of objects depending on the season. This is one of the most commonly used techniques that allow a statistically significant increase in classification accuracy [17]. Gounaridis et al. [18] used such an approach to classify CLC 2012 data where 12 out of 15 Corine classes were identified in the research area, and then a 50 m wide internal buffer was used on the field data representing them. During training, 1000 pixels were chosen randomly for each class, and then the Random Forest algorithm classified the composition of W multi-temporal images obtained by the Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) scanners. An overall accuracy of 83% was obtained. Another possible method is to include vegetation indicators [19].
Altitude data also supplement multispectral data. Leinenkugel et al. [11] used a multitemporal composition obtained by Enhanced Thematic Mapper Plus (Landsat 7 ETM+) and Operational Land Imager (Landsat 8 OLI) scanners and additionally included the altitude data of the Shuttle Radar Topography Mission (SRTM). Analyses were conducted of three research areas in Europe that are geographically diverse from each other. CLC 2012 served as reference data for land cover, which was prepared by removing mixed pixels using two internal buffers (50 and 100 m wide) to check the effect of their size on classification accuracy. In the research areas, there were 20,000 to 50,000 training pixels for the entire dataset. Classifying with the Random Forest algorithm, overall accuracy was obtained in the range of 55% to 67%, depending on the area. The differences in the classification accuracy between the data on which the 50 and 100 m buffers were used turned out to be small, as the total accuracy values fluctuated by only 1 percentage point in favor of both the data with the larger and smaller buffer.
Multispectral and radar data are also used for research and are combined with digital terrain models. Balzter et al. [14] combined radar data acquired by the Sentinel-1A satellite with SRTM data. Using CLC 2006, a hybrid model was created containing 27 classes [14]. Training and verification pixels were selected at random. Besides, a maximum pixel threshold of 20,000 pixels per class was set, and then, the classification was carried out using the Random Forest algorithm, achieving an overall accuracy of 68%.
Despite the high popularity of multispectral and radar data in land cover studies, Novillo et al. [20] used data collected by the Multiangle Imaging SpectroRadiometer (MISR) located on the Terra platform in order to gather image data for the same area at different sensor inclination angles. Based on 2006 CLC data for level 3, 35 classes were identified, followed by an internal buffer width of 137.5 m. The classification was performed 100 times with the k-means algorithm for each of the seven sensor deflection combinations. The combination with all mirror tilting configurations achieved the highest overall accuracy (70%). The best total accuracy (over 75%) was obtained for the following classes: nonirrigated arable land (211), agro-forestry areas (244), and coniferous forest (312). Many classes obtained poor results (about 20%): construction sites (133), green urban areas (141), fruit trees and berry plantations (222), annual crops associated with permanent crops (241), bare rocks (332), sparsely vegetated areas (333), and inland marshes (411); however, these classes had a small set of samples for training and verification (several dozen pixels).
During the past two decades, the Random Forest (RF) and Support Vector Machine (SVM) classifiers have drawn attention to image classification in remote sensing applications. Sheykhmousa et al. [21] highlighted that in particular, land use and land cover (LULC) mapping is the most common application of remote sensing data for a variety of environmental studies, and based on an extensive review of existing articles, they consider that the growing applications of LULC mapping alongside the need for updating the existing maps have offered new opportunities to effectively develop innovative RS image classification techniques in various land management domains to address local, regional, and global challenges. As a result of the review article, the relatively high similar SVM and RF performance in terms of classification accuracy makes them among the most popular machine learning classifiers within the RS community and demonstrates competitive results with CNNs [21].
Since the Random Forest algorithm is popular in land cover classification and gives satisfactory results, Rodriguez-Galiano et al. [22] conducted a detailed analysis of the impact of input data properties on classification accuracy. On the multi-temporal composition of images obtained by the TM scanner, they determined 14 land cover classes. They checked the impact of data reduction on the overall accuracy of the classification. On the one hand, it saves time in the classification process, and on the other hand, it is necessary to have a representative number of samples for each class. It was found that only when the training dataset was reduced by 70%, overall accuracy decreased. They used the same methodology when checking the classifier's resistance to the influence of noisy data, which can often negatively affect the accuracy of the classification, e.g., using 50% of the noisy data, the out-of-bag (OOB) error increased significantly [23].
Although there are many studies on land cover classification [24] in the literature, few studies address the Corine Land Cover classification, specifically, level 3. The class definition characteristics of the Corine Land Cover make the land cover types included Remote Sens. 2021, 13, 777 4 of 36 in the project only present in Europe, so it is not easy to compare it with classifications made for other areas, which narrows the scope of comparison. Current Corine Land Cover classification studies mostly cover small areas and referred to the generalized first and second levels [25]. Corine Land Cover has numerous limitations [26], and the new edition of CLC+ is intended to overcome them. One of its elements will be pixel-based classification, which makes it necessary to check the classification's accuracy for particular classes. The results obtained in the study may provide valuable information for the teams preparing classifications for individual European areas. The main source of optical data in the CLC+ project is the multi-temporal data acquired by the Sentinel-2 satellite, with the Landsat 8 satellite filling in any gaps in the data; therefore, in this paper, data acquired by both Sentinel-2 and Landsat 8 satellites were checked to show the classification accuracy discrepancies depending on data types.
Additionally, by comparing the most popular machine learning algorithms such as Random Forest and Support Vector Machines, the bias of classifications that are often based on a single method was reduced. Due to the ease of implementation of the algorithms used, it is possible to reproduce the results and make them comparable easily. The influence of the number of pixels per class to achieve a given overall accuracy was also examined in this study. This helps determine the number of samples required when collecting master data and preparing it for classification in a CLC project. This study was undertaken due to the lack of comparative studies in the literature on CLC classes over varying areas, comparing classifiers and using iterative accuracy assessment.
Our intention in this study was to assess the automatic land classification accuracy according to Corine guidelines based on Sentinel-2 and Landsat 8 multitemporal images (spring, summer, and autumn 2018/2019). For this purpose, areas located in various European Union parts, with a diversified history, spatial management, land cover, and different cartographic traditions were selected; however, due to the Corine program guidelines, the minimal mapping unit had to be 500 × 500 m, which significantly reduced the image information potential. The research procedure was based on two scenarios: 1) classification of twenty-meter pixel classification of the Sentinel-2 bands (10 m pixels were resampled to 20 m) and also on the thirty-meter pixel sizes of the Landsat 8; 2) in the next step, all pixels were resampled to 60 m, the basic classifications were made on them, and the results were generalized with a majority filter to the size of 8 × 8 pixels (480 × 480 m compared to the required 500 × 500 m in the Corine program). This allowed us to assess the effectiveness of the classifiers and allowed for the potential of automatic classification according to the official European CLC 2018 dataset. CLC data served as the input layer and were divided into training (25%), test (25%), and validation (50%) sets. Random Forest (RF) and Support Vector Machine (SVM) algorithms in the open-source R programming language were used as the classifiers. The main idea was to verify free-of-charge methods, which can be easily implemented on multicore processors. The classification patterns were randomly selected based on the 3rd CLC level. One of the key elements was the impact of the number of training pixels on the obtained results; it allowed us to optimize field reconnaissance during a verification phase (in the research, we used archival (2018) CLC layer and responsible satellite images).

Materials and Methods
Densely populated metropolitan with surrounding suburban areas characterized by very heterogeneous land cover, but also fragments of large-scale farming and protected areas on five research areas were selected, each covering an area of 50 × 50 km, located in Romania (northern and southern Braila), Catalonia (Tarragona), and in Poland (eastern and western part of Warsaw). This enabled diverse land cover forms to test the impact of geographic variability as well as local variations resulting from the Corine Land Cover national teams. In each country, the areas depicted on the cloudless Sentinel-2 granules as well as on the Landsat 8 image fragment were selected. Multispectral image data were very heterogeneous land cover, but also fragments of large-scale farming and protected areas on five research areas were selected, each covering an area of 50 × 50 km, located in Romania (northern and southern Braila), Catalonia (Tarragona), and in Poland (eastern and western part of Warsaw). This enabled diverse land cover forms to test the impact of geographic variability as well as local variations resulting from the Corine Land Cover national teams. In each country, the areas depicted on the cloudless Sentinel-2 granules as well as on the Landsat 8 image fragment were selected. Multispectral image data were acquired in spring, summer, and autumn 2018, but due to intense cloud cover on some of the collected scenes, the decision was made to replace them with data acquired in 2019

Research Areas
Braila (Romania) is a 4700 km 2 county situated in eastern Romania. Braila is a medium-size county representing 2% of the total area of Romania ( Figure 1). The main geographic characteristic is the Baraganului Plain because it is surrounded in the Eastside by the Danube river. The surface is flat and crossed by Buzau, Calmatui, and Siret river confluences. Here, the Danube forms an island called The Big Island of Braila with a surface of 710 km 2 , used mainly in agriculture, and a group of seven small islands with dominant swamps called The Small Island of Braila and contains the Natural Park Small Pound of Braila. Braila is known as an agricultural county because of the soil quality, mostly chernozem, resulting in 80% of the entire county surface (except urban areas) being used as arable land. The main crops cultivated in the area are grain, corn, barley, oats, sunflower, soy, and rye [27,28]. The urban centers are the cities of Braila and Galati, including the Braila district and fragments of four other districts.  The Tarragona (Catalonia) research area ( Figure 2) is located in Tarragona's province on the Mediterranean Sea coast, along the famous tourist region of Costa Dorada. The designated area includes the Prades mountain range with oak and pine forests. The area is characterized by rich topography with height differences up to 1203 m a.s.l. (Tossal de la Baltasana). Agriculture consists mainly of vineyards, olive groves, and hazelnut cultivation [29]. The main type of soil is calcareous soil, as well as regosol and lessive soil.
The river network is poorly developed, there are also no water reservoirs. The largest city in the region is Tarragona, an important center of the petrochemical industry that exerts a strong anthropopressure on the environment by emitting large amounts of air pollutants [30].
The Tarragona (Catalonia) research area ( Figure 2) is located in Tarragona's province on the Mediterranean Sea coast, along the famous tourist region of Costa Dorada. The designated area includes the Prades mountain range with oak and pine forests. The area is characterized by rich topography with height differences up to 1203 m a.s.l. (Tossal de la Baltasana). Agriculture consists mainly of vineyards, olive groves, and hazelnut cultivation [29]. The main type of soil is calcareous soil, as well as regosol and lessive soil. The river network is poorly developed, there are also no water reservoirs. The largest city in the region is Tarragona, an important center of the petrochemical industry that exerts a strong anthropopressure on the environment by emitting large amounts of air pollutants [30]. The Warsaw (Poland) area includes a fragment of the Mazovian Lowland (Masovian Plain), the largest geographical region in central Poland; see Figure 3. The river network consists of the Vistula and its tributaries, the Wieprz and Pilica, and the Sulejowski reservoir is a characteristic hydrological object. The distribution of soils is characterized by considerable diversification, mainly light soils (arenosols and podzolic), as well as lesser soils and marshes in the river valleys and floodplain terraces. Most of them are not very fertile. The species structure of forest communities consists mainly of pine forests with an The Warsaw (Poland) area includes a fragment of the Mazovian Lowland (Masovian Plain), the largest geographical region in central Poland; see Figure 3. The river network consists of the Vistula and its tributaries, the Wieprz and Pilica, and the Sulejowski reservoir is a characteristic hydrological object. The distribution of soils is characterized by considerable diversification, mainly light soils (arenosols and podzolic), as well as lesser soils and marshes in the river valleys and floodplain terraces. Most of them are not very fertile. The species structure of forest communities consists mainly of pine forests with an admixture of birch, oak, and alder. Agricultural land is predominant, and south of Warsaw is one of Poland's largest orchard areas [31]. The main urban center in the region is Warsaw, which is densely built and populated, resulting in a strong anthropopressure on the environment through significant emissions of air pollutants [32], and the transformation of the landscape through intensive urbanization [33].

Methods
The effectiveness of satellite image classifications according to the Corine Land Cover guidelines was evaluated according to the following methods ( Figure 4): • Sentinel-2 and Landsat 8 satellite image acquisition. The images were acquired, then the accuracy of atmospheric and geometric corrections was assessed (by identifying pixels with ground patterns for atmospheric correction, and then verification by overlapping multitemporal images, despite the acquisition of data for the same scene, the pixels of individual images may differ. For this purpose, the pixel grids of the images were harmonized and aligned), • obtaining official Corine Land Cover data for 2018 from the European Environment Agency (EEA),

Methods
The effectiveness of satellite image classifications according to the Corine Land Cover guidelines was evaluated according to the following methods ( Figure 4):

•
Sentinel-2 and Landsat 8 satellite image acquisition. The images were acquired, then the accuracy of atmospheric and geometric corrections was assessed (by identifying pixels with ground patterns for atmospheric correction, and then verification by overlapping multitemporal images, despite the acquisition of data for the same scene, the pixels of individual images may differ. For this purpose, the pixel grids of the images were harmonized and aligned), • obtaining official Corine Land Cover data for 2018 from the European Environment Agency (EEA), • assigning a common frame with satellite images by polygon buffers to eliminate pixels of surrounding land cover forms, • the random selection of pixels represents different land cover forms, • optimization of SVM and RF classifiers, iterative accuracy assessment (100 times), • image classifications with the best algorithms and parameters, accuracy assessment, • final review of land cover maps.

Input Data
Sentinel-2 images (level 2A) were acquired using Copernicus Open Access Hub. Only scenes with cloudiness not exceeding 10% were selected for further processing (Table 2). First, in the ENVI 5.3 software, all bands were resampled to the pixel size of sixty meters using the nearest neighbor method. For each research area, multitemporal compositions were created, collecting the spring, summer, and autumn scenes. The images were exported to the GeoTIFF format. Landsat 8 data were collected based on the Earth Resources Observation and Science (EROS) Science Processing Architecture (ESPA) service, in which images were ordered with surface reflectance already performed. In the ENVI 5.3 program, the same procedures of resampling to 60 m pixel size, layer stacking, and exporting to GeoTIFF were carried out. The acquired Sentinel-2 and Landsat 8 images were corrected atmospherically; Sentinel-2 atmospheric correction was based on an algorithm proposed in Atmospheric/Topographic Correction for Satellite Imagery [34]. The method performs atmospheric correction based on the LIBRADTRAN radiative transfer model [35]. Landsat 8 applies Land Surface Reflectance Code (LaSRC), based on the 6S radiative transfer model and a heritage from the MODIS MCD09 products [36]. Nevertheless, due to the analysis of images from different periods and areas, the authors verified the atmospheric correction quality. For this purpose, a ground-based ASD FieldSpec 4 (with the ASD ContactProbe; Analytical Spectral Devices, Inc., Longmont, CO, USA) spectrometric measurements of large 59 homogeneous and dominant calibration targets (asphalt, concrete, gravel, and paving stones) in Warsaw and surrounding areas were made. In the first step, the ASD FieldSpec spectrometric data were resampled to the resolution of Sentinel-2 and Landsat OLI, and then, pixel values corresponding to the localization of the field polygons on which the ground measurements were collected. The comparison results are the values of the Root Mean Square Error (RMSE); the results are presented in the Appendix A, Table A1.
Corine Land Cover data for 2018 were downloaded as the shapefile using the Copernicus Land Monitoring Service. Corine's and image data were clipped to specific research areas and projected into Universal Transverse Mercator (UTM) coordinate system (UTM 31N-Catalonia, UTM 34N-Poland, UTM 35N-Romania).
Internal buffer with a width of 100 m was used to minimize the negative edge impact on the spectral land cover patterns by reducing each polygon by its size ( Figure 5). The value of the buffer width was selected due to Corine Land Cover's technical specification, which specified that the drawn boundaries' geometric accuracy is greater than 100 m [4]. Polygon buffers were automatically created, and the small polygons with an area less than 25 ha were filtered. Due to the errors noticed in the Corine Land Cover 2018 map, a verification by photointerpretation of 2018 Sentinel-2 satellite images was made [37], and the total area used for classification and its proportions between classes for each area vary, indicating an unbalanced dataset (Table 3); therefore, each dataset was balanced with a threshold of 100, 200, 300, 400, and 500 pixels. For machine learning algorithms, it is important to balance the dataset. The main reason for this is to give equal priority to each class during the training process.

Classification and Accuracy Assessment
The procedures presented in this section were conducted using the R programming language and the R Studio 4.0.0 integrated development environment [38][39][40]. The first step was to extract each class's patterns from the image pixels using the raster and rgdal packages [41,42]. Then, on the obtained datasets, the algorithm training parameters were optimized, and the tuning of all classifiers' parameters was performed on its entire set of patterns. The parameter optimization procedures were performed on the references derived from the Multispectral Instrument Sentinel-2 (MSI) and Landsat 8 OLI images ( Table  4). The randomForest package was used to implement the Random Forest algorithm [43]. The ntree parameter was set to 500 because the OOB error values above this number usually stabilize [44]. Tuning was applied to the mtry parameter, and the mtry with the lowest OOB error value was selected. In the case of the Support Vector Machines algorithm, the e1071 package was used as an implementation [45]. The learning parameters were optimized using the grid search method, where each combination of the parameters is checked from the pool of parameters. Tuning was performed for linear, polynomial, radial kernel

Classification and Accuracy Assessment
The procedures presented in this section were conducted using the R programming language and the R Studio 4.0.0 integrated development environment [38][39][40]. The first step was to extract each class's patterns from the image pixels using the raster and rgdal packages [41,42]. Then, on the obtained datasets, the algorithm training parameters were optimized, and the tuning of all classifiers' parameters was performed on its entire set of patterns. The parameter optimization procedures were performed on the references derived from the Multispectral Instrument Sentinel-2 (MSI) and Landsat 8 OLI images ( Table 4). The randomForest package was used to implement the Random Forest algorithm [43]. The ntree parameter was set to 500 because the OOB error values above this number usually stabilize [44]. Tuning was applied to the mtry parameter, and the mtry with the lowest OOB error value was selected. In the case of the Support Vector Machines algorithm, the e1071 package was used as an implementation [45]. The learning parameters were optimized using the grid search method, where each combination of the parameters is checked from the pool of parameters. Tuning was performed for linear, polynomial, radial kernel function (RBF), and sigmoid kernel functions. The coef0 parameter for polynomial and sigmoid kernels was left at the default value of 0 and the degree parameter for the polynomial kernel was left at its default of 3. One of our goals in this study was to compare the accuracy of classification according to algorithms and the number of pixels used in training. Scenarios were prepared to focus on an influence of 100, 200, 300, 400, and 500 pixels for each class of the training set. For this purpose, an iterative accuracy assessment was used, which was repeated 100 times, according to the following steps • random selection of reference pixels for training and testing datasets in the 50:50 ratio, it was ensured that they were divided according to their belonging to a polygon (pixels from a given polygon could be included in only one of the datasets (training or test) in order to meet the condition of their independence [37]. The rngtools and doRNG packages were used to generate random seeds and optimize the iterative accuracy assessment's execution time. The multiple cores of the processor utilized doParallel, and foreach package using a PSOCK (Parallel Socket Cluster), a cluster was used [46,47]. • resampling of a proper number of pixels for training (previous step), • training of Random Forest and Support Vector Machines classifiers, • accuracy assessment of each classification, • saving classification accuracy results and a random seed.
Then, the classification accuracy of the remaining areas was assessed on the algorithms' selected parameters, where the parameters turned out to be satisfactory in all cases. The following metrics were used to assess the classification accuracies [48]: • overall accuracy (OA)-the ratio of all correctly classified pixels to the sum of all pixels, • producer accuracy (PA)-the ratio of correctly classified pixels of a given class to all reference pixels of a given class, • user accuracy (UA)-the ratio of correctly classified pixels of a given class to all pixels classified to this class, • Kappa coefficient measures the correspondence between two datasets, each of which identifies items into mutually exclusive categories presenting how much better the classifier is performing over the performance of a classifier that simply guesses at random according to the frequency of each class. Kappa is always less than or equal to 1. Values of 0 or less indicate that the classifier is useless. There is no standardized way to interpret its values. Kappa coefficient was used only to present the final results, but the index is characterized by a high correlation with the overall accuracy, and thus the redundancy of information is doubled [49,50]. • F1-score-is the weighted harmonic mean of the user and producer accuracy [51]. Then, the average accuracy of all classes' F1-score coefficient values was compared in the acquired results. The best algorithms were selected to test the effect of the number of training pixels on the classification accuracy (five classification scenarios were verified in which 100, 200, 300, 400, and 500 pixels were tested per class in the training and testing sets, as well).
For the final maps' production, models with the highest average F1-score for all scenario classes were selected. Finally, according to the Corine guidelines, a median filter with an 8 × 8 window was used (it means that pixel size is 60 m, so areas smaller than 480 × 480 m were excluded, confirming the CLC rules to eliminate smaller areas than 500 × 500 m, which is the Minimum Mapping Unit of the Corine program). The land cover classes of the final results were colored following the Corine Land Cover specification [4].

Results
Evaluation of individual classifiers and datasets, their results for the average F1-score of all classes need to be compared ( Figure 7). Sentinel-2 images scored the top seven, while the SVM classifiers took the top three best places (RBF, linear, and polynomial kernel functions). Large and homogeneous southern Braila polygons (35TNK) scored the top four on the Sentinel-2 data (F1-score achieved 88%). In the case of images obtained by the Landsat OLI sensor, the best classifier was the SVM algorithm with a linear kernel function for the Catalonia area (F1-score oscillated around 77%). The worst results were obtained for the Random Forest classifier for the Landsat 8 satellite for Catalonia (70% F1) and Sentinel-2 for northern Braila (70% F1). Then, the average accuracy of all classes' F1-score coefficient values was compared in the acquired results. The best algorithms were selected to test the effect of the number of training pixels on the classification accuracy (five classification scenarios were verified in which 100, 200, 300, 400, and 500 pixels were tested per class in the training and testing sets, as well).
For the final maps' production, models with the highest average F1-score for all scenario classes were selected. Finally, according to the Corine guidelines, a median filter with an 8 × 8 window was used (it means that pixel size is 60 m, so areas smaller than 480 × 480 m were excluded, confirming the CLC rules to eliminate smaller areas than 500 × 500 m, which is the Minimum Mapping Unit of the Corine program). The land cover classes of the final results were colored following the Corine Land Cover specification [4].

Results
Evaluation of individual classifiers and datasets, their results for the average F1-score of all classes need to be compared ( Figure 7). Sentinel-2 images scored the top seven, while the SVM classifiers took the top three best places (RBF, linear, and polynomial kernel functions). Large and homogeneous southern Braila polygons (35TNK) scored the top four on the Sentinel-2 data (F1-score achieved 88%). In the case of images obtained by the Landsat OLI sensor, the best classifier was the SVM algorithm with a linear kernel function for the Catalonia area (F1-score oscillated around 77%). The worst results were obtained for the Random Forest classifier for the Landsat 8 satellite for Catalonia (70% F1) and Sentinel-2 for northern Braila (70% F1). Remote Sens. 2021, 13, x FOR PEER REVIEW 13 of 35 The Support Vector Machine algorithm with the radial kernel function (SVM RBF) algorithm achieved very good results for all areas and all classes of the research (the highest average accuracies of the F1-score oscillating between 73% and 88%; Figure 8); therefore, the further focus was on this classifier and on the Random Forest (it achieved up to 5% lower results than SVM RBF), predicting the result images on both the Sentinel-2 and Landsat 8 imagery. Comparing the classification outcomes with the original Corine Land Cover maps, it can be seen that post-classification maps contain more details, which can be seen in particular in more heterogeneous areas. There is also a difference in the degree of generalization between the Landsat 8 and Sentinel-2 images, which show land cover in more detail. In terms of classifiers, the difference is particularly noticeable for the areas of Braila, where the Random Forest classifier generalized agricultural areas more than the Support Vector Machine, which classified technical objects and roads as class 112 (discontinuous urban fabric). For research areas in Romania, there are significant differences in the case of class 242 (complex cultivation patterns), which in the original study was strongly underestimated comparing with class 211 (non-irrigated arable land). As for the area of northern Braila, there is a discrepancy in the identification of water reservoirs (class 512), which remained without water on the satellite images during the analyzed period.
RF_Braila_35TNL_Sentinel-2 Average value of the F1 score (%) Classification sets The Support Vector Machine algorithm with the radial kernel function (SVM RBF) algorithm achieved very good results for all areas and all classes of the research (the highest average accuracies of the F1-score oscillating between 73% and 88%; Figure 8); therefore, the further focus was on this classifier and on the Random Forest (it achieved up to 5% lower results than SVM RBF), predicting the result images on both the Sentinel-2 and Landsat 8 imagery. Comparing the classification outcomes with the original Corine Land Cover maps, it can be seen that post-classification maps contain more details, which can be seen in particular in more heterogeneous areas. There is also a difference in the degree of generalization between the Landsat 8 and Sentinel-2 images, which show land cover in more detail. In terms of classifiers, the difference is particularly noticeable for the areas of Braila, where the Random Forest classifier generalized agricultural areas more than the Support Vector Machine, which classified technical objects and roads as class 112 (discontinuous urban fabric). For research areas in Romania, there are significant differences in the case of class 242 (complex cultivation patterns), which in the original study was strongly underestimated comparing with class 211 (non-irrigated arable land). As for the area of northern Braila, there is a discrepancy in the identification of water reservoirs (class 512), which remained without water on the satellite images during the analyzed period. In the case of Catalonia, in relation to the original Corine Land Cover map, there was a change of classes 221 (vineyards) and 222 (fruit trees and berry plantations), which should be identified as class 242 (complex cultivation patterns). Particular differences in the classification of the above-mentioned classes can be noticed between the Sentinel-2 and Landsat 8 images. There are visible differences in the distinction between forest types, e.g., coniferous forests (312), which have decreased compared to deciduous forests (311). In the Catalonia area, the building classes were very well distinguished: 111 (continuous urban fabric), 112 (discontinuous urban fabric), and 121 (industrial or commercial units).
In the case of Warsaw granules, fruit trees and berry plantations were classified well (222). As in Catalonia, the different forest classes (311, 312, and 313) were distinguished in more detail than in the original CLC study. The Sentinel-2 images show the Fruit trees and berry plantations class (222) in more detail than the Landsat images. Difficulties were noticed in distinguishing classes 111 (continuous urban fabric) and 121 (industrial or commercial units) due to mixed pixels representing different classes characterized by granularity, discontinuity, and mosaic character (Figure 8).  The comparison of accuracy according to the F1-score of individual classes for 100 repetitions of independent classifications is presented in Figures 9-12, and the values of the overall accuracy and the Kappa coefficient in Figure 13. The highest values of the median F1-score were obtained for spectrally homogenous classes, e.g., 511 (water courses), 512 (water bodies), and 523 (sea and ocean); coniferous forests (312) and vineyards (221).          All areas obtained satisfactory overall accuracy and Kappa coefficient results ( Figure  13). The highest values of the median total accuracy were achieved for the northern area of Braila and the lowest total accuracy was for eastern Warsaw. Regarding the classifiers, the Support Vector Machines with a radial nucleus obtained higher maximum values of the total accuracy and the Kappa coefficient, but they were characterized by a larger spread than the Random Forest. In the case of imaging, higher median values were obtained for the Sentinel-2 satellite data. In all cases, higher accuracies were obtained using Sentinel-2 images than Landsat 8. In the classifiers for areas in Romania and Catalonia, Support Vector Machines with the RBF turned out to be better. In contrast, for areas located in Poland, higher accuracy was obtained for the Random Forest algorithm. Analyzing the map ( Figure 14) and the error matrix (Table 5) of the North Braila area, the best results were achieved by water courses (511; F1 100%), water bodies (512; F1 99%), and rice fields (213; F1 98%). The problems were noticed in the case of the transitional woodland-shrub (324; F1 69%), which is often confused with land principally occupied by agriculture, with significant areas of natural vegetation (243) as well as complex cultivation patterns (242). The lowest values were obtained for the building classes: Industrial or commercial units (121; F1 57%) and discontinuous urban fabric (112; F1 64%), often mixing with each other. Sentinel-2 images could receive higher values of the F1-score median as well as a smaller spread of final results for the categories of anthropogenic land cover than in the case of Landsat 8 images. The OLI scanner's classification results obtained slightly higher median results for pastures (231) and land principally occupied by agriculture, with significant natural vegetation areas (243).
Individual research areas were characterized by high variability within classes: The lowest median results were obtained for classes such as: olive groves (223), complex cultivation patterns (242), Land principally occupied by agriculture, with significant areas of natural vegetation (243), and transitional woodland-shrubs (324).
All areas obtained satisfactory overall accuracy and Kappa coefficient results ( Figure  13). The highest values of the median total accuracy were achieved for the northern area of Braila and the lowest total accuracy was for eastern Warsaw. Regarding the classifiers, the Support Vector Machines with a radial nucleus obtained higher maximum values of the total accuracy and the Kappa coefficient, but they were characterized by a larger spread than the Random Forest. In the case of imaging, higher median values were obtained for the Sentinel-2 satellite data.
In all cases, higher accuracies were obtained using Sentinel-2 images than Landsat 8. In the classifiers for areas in Romania and Catalonia, Support Vector Machines with the RBF turned out to be better. In contrast, for areas located in Poland, higher accuracy was obtained for the Random Forest algorithm.
Analyzing the map ( Figure 14) and the error matrix (Table 5) of the North Braila area, the best results were achieved by water courses (511; F1 100%), water bodies (512; F1 99%), and rice fields (213; F1 98%). The problems were noticed in the case of the transitional woodland-shrub (324; F1 69%), which is often confused with land principally occupied by agriculture, with significant areas of natural vegetation (243) as well as complex cultivation patterns (242). The lowest values were obtained for the building classes: Industrial or commercial units (121; F1 57%) and discontinuous urban fabric (112; F1 64%), often mixing with each other.

Discussion
In our opinion, there is a lack of comparison of CLC products obtained using different areas, sensors, and classifiers in the literature. We focused on conducting analyzes on individual Sentinel-2 imaging granules, selecting three countries characterized by a different history, which influenced spatial management (starting from large-scale agricultural crops (South Braila), a large urban agglomeration (Warsaw) with direct operating facilities, including the suburban zone, which on the one hand provides protected areas and climatic resorts along Vistula river, as well as typical suburban zones, including small farms providing food for suburban towns), but also coastal and river port areas (Tarragona, North Braila) and well-known tourist areas. The selection of tools was not accidental, because, based on the literature review, we wanted to use machine learning algorithms that are widely available and based on a programming language, which allows us to automate the procedure for classifying subsequent areas. In many cases, the results obtained on the original data (20 m Sentinel-2 and 30 m Landsat 8) were much more accurate than on the generalized data (60 m), and then filtered to a resolution of 480 × 480 m, which is a requirement of Corine Land Cover. In our work, we compared results obtained using four different SVM kernels and the Random Forest classifier. We also included a multi-temporal composition, which gave satisfying results without having to search for additional datasets to make the images more informative (other sensors, radar, lidar, etc.). This is especially important considering adding additional data is time-consuming (especially if the product is developed for the whole of Europe). We investigated the relation between the number of pixels used for training per class and the classifier used. Such works are not rare, but they mostly focus on either problem, not on both as we carry out here. Our results will help inform further CLC+ creators about the required minimum number of pixels per class they need to acquire in order to deliver acceptable results. Our study is particularly relevant now that a new version of CLC+ is being developed, with a heavy focus on creating solutions employing classification approaches that enrich the final product.
Despite the advantages and numerous applications, CLC products and analyses carried out should be treated with caution, and researchers should be aware of the restrictions associated with them [26]. Despite the continuous improvement of definitions, a significant disadvantage is the ambiguity of the class description, as Jansen and Di Gregorio [53] pointed out. Diaz-Pacheco and Gutiérrez [54] highlighted a significant impact of subjective photo interpretation on results, which translates into thematic accuracy, insufficient standardization, and harmonization of data. The difference in thematic accuracy in the scale of regions and countries is also noted by García-Alvarez and Camacho-Olmedo [55]; problems with detailed analyses of land cover were noticed by Di Sabatino et al. [56]. According to the European Environmental Agency [57], the most challenging and subjective classes in photointerpretation are classes: 242, 243, 313, and 324 [57]; in the case of our classifications, it was confirmed only in the case of class 324 (East and West Warsaw) and 242 (Catalonia). The rest of the areas offered an acceptable level of classification accuracies (Table 12). It is also worth noting that in 2012 there was a change in the methodology for creating the German CLC [58], which caused a significant decrease in area in classes 242, 243, and an increase in area in class 231. The standard photointerpretation (used in CLC 1990(used in CLC , 2000(used in CLC , and 2006) was replaced by an original methodology based on the use of the existing Digital Land Cover Model for Germany (DLM-DE) national data, which is much more detailed (the size of the minimum mapping unit (MMU) is 1 ha). This makes it impossible to analyze changes between versions with the old and the new methodology. Besides, the Remote Sens. 2021, 13, 777 26 of 36 CLC technical team verification in many countries has shown that a common mistake in photointerpretation is the overestimation of classes 242 and 243 [59]. The process of creating CLC is time-consuming, expensive, and the final product is published with a significant delay between data collection and publication. F-score for complex agricultural areas (classes 242 and 243) oscillate around 80% for Brailas (F1: 81%, 75%, 79%), the same accuracies were observed by Denize et al. [60], who used Sentinel-2 data and SVM and RF algorithms for classifying crop residues, bare soil, winter crop, and grassland scoring an overall accuracy of 81%. For the same source of data and Random Forest algorithms for maize, onion, sunflower, and sugar beet, Immitzer et al. [61] achieved 65-76%, which is a similar result to the Warsaw area in our study.
Phiri et al. [62], based on an analysis of 222 papers oriented on land cover mapping, confirmed that SVM and Random Forest algorithms offered the best accuracies in the case of Sentinel-2 images (89-92%) than Landsat-based maps; although the mean classification results of Random Forest and SVM were comparable (~89%), as the maximal, as well (~93-95%) [62]. General observations were confirmed in our case, where we could notice a similar range of the best results, but in our case, SVM RBF obtained results better by two to three percent than RF.
Analyzing the sensor's effect on classification accuracy, better total accuracy results (7-8%) were obtained for the MSI sensor than OLI regardless of the algorithm used. For the Random Forest a Support Vector Machines algorithm, the differences between the MSI and OLI sensors were 8% and 7% in favor of MSI, respectively. The advantage of the MSI scanner over OLI in overall accuracy (2-8%) was also recognized by Phiri et al. [62], who collated 20 scientific papers on land cover classification. The researchers noted that the MSI scanner was 8% better for the Random Forest algorithm, while for Support Vector Machines the difference was smaller at only 2%. Similar conclusions were also reached by Forkuor et al. [63] by resampling the resolution of Landsat 8 to Sentinel-2 (10 m), obtaining higher overall accuracies (2-4%) for the MSI scanner than OLI. Additionally, when resampling Sentinel-2 to Landsat 8 resolution (30 m), performed by Topaloglu et al. [64], it led to better total accuracy results (2%) for the MSI scanner than OLI. Despite the longer operational time of the Landsat series of satellites than Sentinel-2, allowing for the development of data processing methods, an important element seems to be the larger number of multispectral bands (Sentinel-2: 13 bands, Landsat 8: 8 bands) of the MSI scanner increasing the classification accuracy (Table 13).
An important element affecting land cover classification accuracy is the number of classes identified (Table 13). Increasing the number of land cover classes identified contributes to a decrease and blurring of spectral differences between classes, resulting in decreased classification accuracy [65]. Additionally, an analysis of 64 articles on land cover classification by the team of Thinh [66] highlights a significant decrease in the total accuracy when increasing the number of classes. In the present study, the highest accuracies were obtained for areas where 10 and 14 classes were present, which significantly affected the classification accuracy and contributed to the differences in the results obtained from other authors who usually designated fewer classes; however, it is not only the number of classes determined that matters, but also the choice of methods for processing the data and making them more informative. Demirkan et al. [67], although identifying only four firstlevel Corine classes, achieved weaker results than the vast majority of other researchers identifying more classes. The reason for the more inferior result could be the use of only a single satellite image. An acceptable way to increase the data's informativeness is to create a multi-temporal composition from satellite images acquired at different times. This allows us to not only reduce the influence of a single image and errors that may occur in it, but especially helps to distinguish vegetation types that change their spectral reflectance depending on the analyzed period. In this paper, a multi-temporal composition was used from satellite images acquired in spring, summer, and autumn; however, there are "denser" time series in the literature, as in the case of the team of Forkuor et al. [63], who used Sentinel-2 and Landsat 8 imagery from five dates, obtaining high total accuracy results (89-93%). Increasing the time series does not always produce higher classification accuracy results as shown by the study of Weinmann and Weidner [68] who, using imagery acquired by the MSI scanner for four dates with a small area (16 × 16 km 2 ) and the same number of classes, obtained total accuracy results of up to 80%. It is worth noting that it is impossible for many areas to acquire satellite imagery due to the cloud cover present, which makes analysis impossible. Another important aspect of differences in classification accuracy is the spatial resolution of satellite data used. On the one hand, the high spatial resolution allows the classification of land cover forms with high precision; on the other hand, land cover requires generalization. That is why many classification systems such as Corine Land Cover allow for the presence of another class in a given class of elements, e.g., farm buildings in agricultural areas, technical paths in forests, and small water bodies in forests. Such generalization is often possible by resampling satellite data to a lower resolution, allowing mixed upland cover classes to occur sporadically within large homogeneous classes. Researchers obtain similar results for several classes regardless of whether they use a higher [69] or lower resolution [64]; however, for the classification of the third level Corine Land Cover data, it was necessary to lower the resolution in order to generalize the land cover classes, allowing a dozen classes to obtain results comparable to other authors. Similar methods are used to identify dozens of Corine classes, as exemplified by the study carried out by Novillo et al. [20], who used 275 m resolution MISR scanner imagery for this purpose, but this was at the cost of reduced precision in class boundary delineation.
Comparing the classifiers used in this study, higher results were obtained for Support Vector Machines than Random Forest for both MSI and OLI scanners (6% and 7% higher, respectively). For the results obtained by Demirkan et al. [67] for four classes, the accuracy differences do not appear in the comparison made by Thanh Noi and Kappas [66]. Only at six classes, a slight advantage in classification accuracy (by 3%) of Support Vector Machines over Random Forest is seen. Slight differences are also found by Forkuor et al. [63], where for the OLI scanner Random Forest is 1% better, and for MSI, there is an inverse relationship where the Support Vector Machines algorithm is 1% better. Phiri et al. [62] found a 4% advantage of Support Vector Machines for the OLI scanner. The MSI scanner Random Forest was found to be 5% better than SVM. The process of selecting hyperparameters for the classifiers, as well as the nature of the data and the number of classes, may have influenced the discrepancies.
Analyzing the results obtained for each Corine level 3 class (Table 14) and comparing them with other authors should be considered satisfactory. Since the classification was carried out in three different study areas, where geographical variability could have significantly influenced the obtained values, we decided to give the minimum and maximum value obtained for a given class from these areas. In the case of anthropogenic sites for classes 111 and 112, a minimum producer accuracy (72% and 77%, respectively) was obtained, comparable to the results of 75% and 77% from Balzter et al. [14]. Despite the use of different methods by the team of Balzter [14], who used Sentinel 1A radar imagery combined with SRTM data, this shows the possibility of successfully classifying development areas for Corine Land Cover. In the case of class 211 agricultural land, the obtained producer accuracies by Gudmann et al. [71] are remarkably high (97%), being slightly higher than the obtained maximum accuracies (95%) for this class. This may have been influenced by the small study area (500 km 2 ) being five times smaller than each study area in this paper, and also the geographical nature of the area may have influenced the result. In the case of classes 242 and 243, where difficulties in classification were encountered, resulting in 61% and 50% of the producer's minimum accuracy, respectively. Problems in their classification were also found by the team of Gudmann et al. [71], who obtained 34% and 26% of the producer's accuracy, respectively. This may indicate the inability to identify these classes, and the need to find other methods to classify them effectively is urgent. Additionally, for class 324, the weakest result was obtained (min. PA 56%), which is comparable to other researchers (50% [14]; 42% [71]). For deciduous forests (311), a comparable result (min. PA 63%) was obtained with Balzter et al. [14] (PA 67%). Noticeable differences were found for the coniferous forest class (312), for which the minimum value of the producer's accuracy (95%) differs significantly from the results obtained by the researchers (70% [14], 58% [71]). The difference of the result in the case of Baltzer et al. [14] can be understood by different characteristics of the radar data, but in the case of the team of Gudmann et al. [71], it differs significantly despite the use of a multi-temporal composition from images acquired by the Sentinel-2 satellite and the Random Forest algorithm. Significant differences are also found for the producer accuracy score (44%) for water bodies (512) obtained by Gudmann et al. [71]. Surprisingly, the low accuracy in particular of the characteristic spectral reflectance curve for water, the rarely occurring problems in identifying this class [72], and the lack of occurrence of other classes related to water areas. The differences obtained may have been influenced by the nature of the study area and its size. Regarding a usage of more sophisticated classifiers such as those based on deep learning methods, there is a multitude of approaches one can select; simple implementations such as a deep multilayered perceptron offers comparable results as obtained by SVM or RF, but optimizing procedure requires more time and better servers [72]. In order to compare shallow and deep learning methods, it is necessary to prepare the data in an appropriate way to compare them objectively (Liu, 2017) [73], also bias can be introduced without following uniform rules, where often one method is favored and its parameters optimized, while other algorithms are not optimized and are assigned default parameter values [74]. Additionally, it should be emphasized that for deep learning methods a very large number of samples were needed for training which requires more computing resources and longer time for optimizing the parameters [73]. In our case, the challenge for the classifiers was to determine patterns of the same classes, but biogeographically different areas, where different spatial management principles apply. For example, research carried out by Ulmas and Liiv (2020) [72] where a convolutional machine learning model with a modified U-Net structure was used to classify land cover according to Corine Land Cover nomenclature did not yield better results, e.g., lowest results were obtained for the classes: discontinuous urban fabric (class 112; F1-score: 1%, and in our case the lowest 42% for Landsat 8, in case of Sentinel-2 results were better; Figures 9-12), industrial or commercial units (class 121; F1-score: 36%, and in our case 4%), sport and leisure facilities (class: 142; F1 0%, and in our case 42%), rice fields (class 213; F1 12%, and in our case 42%), fruit trees and berry plantations (class 222; F1 27%, and in our case 28%). Additionally, for water area classes where there are usually no problems with their classification low accuracies were obtained (F1 38-83%, and we observed the lowest accuracy in Warsaw's 8%, but its median achieved 84% (class 512), excluding this one case, the lowest accuracy scored 48%, but the lowest median 84%). Additionally, the superiority of deep learning methods over Random Forest and Support Vector Machines was not observed in other land cover classification studies [74,75]. Naturally, this is a subject of discussion since much depends on the subject of classification (number of classes, data, how distinct classes are, etc.). Most works focus on using artificial neural networks and their variations to classify datasets structured in the same way as for SVM or RF [76][77][78][79]. That is, to classify images using pixel-based approaches. This approach disregards the natural advantages of more advanced deep learning techniques such as convolutional or deconvolutional neural networks, which can exploit relations between pixels and objects on the image [73]. Proper use of those advantages allows for more accurate results that are also easier to analyze (such as providing class labels to image parts instead of individual pixels) [80]. Moreover, when deciding to investigate deep learning, one must be ready to invest a significant amount of time into processing and training steps. The use of deep learning enormously increases processing times compared to classical machine learning algorithms [81], especially when large areas are involved. This, combined with the expertise required by deep learning, leads to interesting but challenging problems that are yet to be thoroughly investigated.

Conclusions
The result of the work is the verification of the potential of both the Landsat 8 images that formed the basis of the Corine program and the Sentinel-2 data. The conducted analyzes showed a several percent advantage of the Sentinel-2 data, despite the fact that the classification was carried out on the basis of resampled data to a resolution of 60 m.
Another key result is the analysis of the usefulness of algorithms. The Support Vector Machine algorithm with the radial kernel function (SVM RBF) achieved significant results for all research areas. The following part focused on this classifier and Random Forest, making the prediction of the result images on both Sentinel-2 and Landsat 8 imagery. SVM results are characterized by a smaller range of data distributions such as deciduous forests (311), water courses (511), and water bodies (512) than in the case of the results obtained for Random Forest, which, however, obtained a smaller range of data distribution for permanently irrigated land (211). Comparing the classification images with the original Corine Land Cover study shows that more detail is included, which makes the class areas less homogeneous than in the original study. There is also a difference in the degree of generalization between results acquired from the Landsat 8 and Sentinel-2 imagery. The performed analyzes confirmed that the presented results are not the best that can be obtained based on the RF and SVM algorithms and the used data, higher results were obtained for the original resolutions (20 m Sentinel-2 and 30 m Landsat 8), but the Corine program guidelines required generalization to a resolution of 500 × 500 m (in our case we used 480 × 480 m), which lowered the presented results by several percent. Sentinel-2 shows the land cover in more detail, offering 8-11% higher results, e.g., a particularly big difference can be observed for the areas of South and North Braila. Random Forest classifier better generalized agricultural areas than Support Vector Machines, which classified technical objects and some roads as class 112 (industrial or commercial units). For research areas in Romania, there are significant differences in the share of class 242 (complex cultivation patterns, which in the original study was strongly underestimated) at the expense of class 211 (non-irrigated arable land). As for the area of northern Braila, there is a discrepancy in the identification of water reservoirs (class 512), which remained without water during the satellite imaging.
Concerning the area of Catalonia, the original study by Corine Land Cover, there was a significant change in the share of classes 221 (vineyards) and 222 (fruit trees and berry plantations) in favor of class 242 (complex cultivation patterns), indicating their distribution in more detail. Considerable differences in the classification of the abovementioned classes can be seen between the Sentinel-2 and Landsat 8 images. The changes can also be seen in the forest type area where the coniferous forest area (312) has decreased in favor of deciduous forests (311). In the Catalonia area, the building classes were very well distinguished: 111 (continuous urban fabric), 112 (discontinuous urban fabric), and 121 (industrial or commercial units). In the case of Warsaw areas, orchard areas (222) represented the very well-classified class. As in Catalonia, the different forest classes (311, 312, and 313) were mapped in more detail than in the original CLC study. The Sentinel-2 images show the orchard class (222) in more detail than the Landsat images. Difficulties were noticed in distinguishing classes 111 (continuous urban fabric) and 121 (industrial or commercial units), because it is a class with small areas with very closed pixels values so that the thresholding process of pixels representing is very confused and given classes mixed with each other, characterized by large granularity, discontinuity, and mosaic character. The survey carried out although three test areas in different parts of Europe is still a small area in comparison to the whole European area, so it is necessary to investigate further areas and check individual combinations of classes. In this study, a total of 23 out of 44 Corine coverage classes were checked. Due to the nature of the area, given classes may obtain lower or higher accuracy depending on the co-occurrence of classes with similar terrain characteristics. Results from this research could help Corine Land Cover+ producers with decisions related to selecting algorithms, satellite data, and the possibilities and difficulties of classifying individual Corine Land Cover classes. This study indicates that Corine Land Cover classes could be classified with high accuracy.
The most important conclusions from the study are: • Sentinel-2 satellite images allowed to classify land cover with better overall accuracy (8-10%) than the Landsat 8 data, • Support Vector Machines algorithm with an RBF kernel function achieved the best results and obtained higher overall accuracy results (6-7%) than Random Forest, • the best classification results were achieved for classes: Sea and ocean (523