Use of Sentinel-2 and LUCAS Database for the Inventory of Land Use , Land Use Change , and Forestry in Wallonia , Belgium

Due to its cost-effectiveness and repeatability of observations, high resolution optical satellite remote sensing has become a major technology for land use and land cover mapping. However, inventory compilers for the Land Use, Land Use Change, and Forestry (LULUCF) sector are still mostly relying on annual census and periodic surveys for such inventories. This study proposes a new approach based on per-pixel supervised classification using Sentinel-2 imagery from 2016 for mapping greenhouse gas emissions and removals associated with the LULUCF sector in Wallonia, Belgium. The Land Use/Cover Area frame statistical Survey (LUCAS) of 2015 was used as training data and reference data to validate the map produced. Then, we investigated the performance of four widely used classifiers (maximum likelihood, random forest, k-nearest neighbor, and minimum distance) on different training sample sizes. We also studied the use of the rich spectral information of Sentinel-2 data as well as single-date and multitemporal classification. Our study illustrates how open source data can be effectively used for land use and land cover classification. This classification, based on Sentinel-2 and LUCAS, offers new opportunities for LULUCF inventory of greenhouse gas on a European scale.


Introduction
Under the United Nations Framework Convention on Climate Change (UNFCCC) and the Kyoto Protocol, each country has to submit an annual inventory of national emissions of greenhouse gas (GHG).This national inventory reports anthropogenic emissions and removals resulting from human activities for a calendar year.It is divided into five sectors: (1) Energy, (2) Industrial Processes and Product Use, (3) Agriculture, (4) Land Use, Land Use Change, and Forestry, and (5) Waste.This study focuses on the Land Use, Land Use Change, and Forestry (LULUCF) sector, which differs from the other sectors due to its particularity in presenting sources and sinks of GHG.For example, land uses (LU) such as forest and grassland have the ability to accumulate atmospheric emission CO 2 (net sinks), whereas croplands act as a net source.Human activities (e.g., afforestation, deforestation, forest management, cropland management, wetland drainage) induce changes in carbon stocks of LU categories and their conversions.Information about land area between emitting and absorbing categories of land is needed to estimate carbon stocks emissions and removals of GHG associated with LULUCF activities [1].Such inventory reports have to provide information regarding land identification for estimating change in carbon stock.According to [2] definitions, GHG emissions are Land 2018, 7, 154 2 of 16 estimated through the classification of the land into six LU categories: (1) forest land, (2) cropland, (3) grassland, (4) wetlands, (5) settlements, and (6) other land.Only broad and nonprescriptive definitions are provided for these LU categories: Countries may use their own definitions, which may or may not refer to internationally accepted definitions.The definitions of LU categories may incorporate land cover (LC) type, be LU-based, or be a combination of the two [3].In Belgium, the whole country area can be described by the first five categories.Therefore, the category "other land" has been removed from the Belgian inventory.
In Belgium, LU planning was transferred from the federal to the regional governments (Flanders, Wallonia, and Brussels) at the end of the 20th century.As a consequence, no LULC map has been produced at the national level.The current LULC map of Wallonia (Carte d'Occupation des sols de Wallonie (COSW)) dates from 2007 and is based on a combination of several data sources such as a cadastral plan and agricultural census.According to Reference [4], the LULC map of Wallonia relies on outdated, mixed, and incomplete LC and LU information.For this study, we were looking for an approach that would integrate data available at the European scale (i.e., not regional datasets) and be updatable on an annual basis.
The Belgian National Inventory, initially developed by Reference [5], consists of a regular grid of points covering the whole country.The sampling unit is of 1 × 2 km, representing a specific area of 200 ha.The current methodology uses an estimation of the area proportions [1].The proportion of each LU category is calculated through the division of the number of points per category by the total number of points.Then, area estimates are established by multiplying the proportion of each category by the total area.The three regions are responsible for delivering their GHG inventories, which are then compiled in a common report to produce to the Belgian GHG inventory.
The attribution of the points to one of the five LU categories is made through two-phase geoprocessing.The first phase automatically attributes an LU category when the point is covered by a regional vector layer related to the LU (see Section 2.3).Following this first phase, a certain number of points (~20%) remain undiagnosed and are then classified by the expert interpretation of aerial photography.This second step requires the manual intervention of operators and impedes processing efficiency because of its time-consuming nature and expensive cost.To address this problem, the present project aims to develop a method for extracting land use and land use changes information in order to facilitate and improve the annual reporting of LULUCF.For this purpose, building up an automated approach for mapping the emissions and removals resulting from the LULUCF sector should assist the elaboration of the National Inventory.Only a minority of countries make a direct use of satellite remote sensing for the GHG inventory preparation.In contrast, most European countries use derived products such as the Corine Land Cover [3].
The present article aims to define the best parameters to produce an LULUCF classification with Sentinel-2 imagery from the perspective of complying with the Intergovernmental Panel on Climate Change (IPCC) guidelines.To this end, the performances of four classifiers (maximum likelihood, random forest, k-nearest neighbor, and minimum distance) were examined under different sample sizes.We also investigated the new possibilities provided by the Sentinel-2 regarding the high temporal resolution and large spectral information for an accurate LULC mapping.In order to ensure a consistent classification at the European level, the study used sampling data from the Land Use/Cover Area frame statistical Survey (LUCAS) (© Eurostat) and data sources from the Copernicus program.Both types of data are characterized by a harmonized methodology, a European coverage, and free access.They could thus stand for a prototype of LULUCF classification at the European scale.This method provides a new opportunity for a common European reporting that would enable comparisons between different member states and increase the "credibility" of the European Union GHG inventory [6].

Materials and Methods
The methodology used in this study involved a pixel-wise classification technique using Sentinel-2 imagery.Different supervised classification algorithms were applied (maximum likelihood, random forest, k-nearest neighbor, and minimum distance) in order to compare their performance for land use and land cover classification (see Section 3.2).The four classifiers were implemented in the Sentinel Application Platform (SNAP) toolbox to run the experiments.The classification was trained using the LUCAS 2015 database (© Eurostat) (see Section 2.2).The training samples were randomly divided into 10 balanced datasets with the corresponding sizes of 1%, 2%, 3%, 5%, 10%, 30%, 40%, 60%, 80%, and 100% of the total training data to assess the effect of the training sample size on the resulting classification (see Section 3.1).Then, the quality of the results was assessed using two sets of references data, the Belgian inventory grid (as described previously (see Sections 1 and 2.3)) and a part of the LUCAS database which was divided randomly between training and validation datasets.Finally, the classification was validated in a confusion matrix and also through the comparison of LULC statistics from the LULUCF map produced and the land areas derived from the 2016 Belgian inventory and the Corine Land Cover (see Section 3.3).Figure 1

Sentinel-2 Data
The Sentinel-2 mission consists of two satellites, Sentinel-2A and Sentinel-2B, that measure reflected solar spectral radiances in 13 spectral bands ranging from visible to shortwave infrared (SWIR) bands [7,8].Sentinel-2A turned operational on 28 November, 2015, and Sentinel-2B on 7 July, 2017 [9].The data are characterized by a spatial resolution ranging from 10 to 60 m depending on the spectral band and a temporal resolution of 10 days with one satellite [9].

Sentinel-2 Data
The Sentinel-2 mission consists of two satellites, Sentinel-2A and Sentinel-2B, that measure reflected solar spectral radiances in 13 spectral bands ranging from visible to shortwave infrared (SWIR) bands [7,8].Sentinel-2A turned operational on 28 November 2015, and Sentinel-2B on 7 July 2017 [9].The data are characterized by a spatial resolution ranging from 10 to 60 m depending on the spectral band and a temporal resolution of 10 days with one satellite [9].
Sentinel-2 imagery has gained in interest for LULC mapping (e.g., [10][11][12][13]).For the remote sensing community, it presents four major advantages: (1) Covering all the continental surfaces with high spatial resolution, (2) being free of charge, (3) having a large number of spectral bands, and (4) offering the opportunity to build a time series.
In this study, Sentinel-2 images were downloaded from the Copernicus Open Access Hub (https: //scihub.copernicus.eu/).Despite not having a common agreement on the preprocessing of the data, the images were corrected by means of a Sen2Cor processor (version 2.4.0;© ESA) to reach level-2A, which provides bottom-of-atmosphere (BOA) reflectance.According to Reference [13], the comparison of the four major atmospheric correction methods (Sen2Cor, MAJA, 6S, and iCOR) showed minor differences between them.However, in a previous study, Sen2Cor yielded the best R 2 results for all the spectral bands [14].The 20 m spatial resolution bands (B5, B6, B7, B8a, B11, and B12) were then resampled with 10 m resolution.To cover the whole of Wallonia, eight tiles were required (Figure 2 and Table 1) to produce three mosaics with Sentinel Application Platform software (SNAP © ESA).Unfortunately, it was impossible for us to realize four mosaics for the different seasons of the year 2016.This was due to two main reasons.First, the year 2016 was only characterized by the presence of a single satellite (S2A).Second, the weather conditions of the year 2016 were really bad and we were not able to acquire images for the fall period.Each mosaic was cloud-free and presented a homogeneous tree cover.
Land 2018, 7, x FOR PEER REVIEW 4 of 15 high spatial resolution, (2) being free of charge, (3) having a large number of spectral bands, and (4) offering the opportunity to build a time series.
In this study, Sentinel-2 images were downloaded from the Copernicus Open Access Hub (https://scihub.copernicus.eu/).Despite not having a common agreement on the preprocessing of the data, the images were corrected by means of a Sen2Cor processor (version 2.4.0;© ESA) to reach level-2A, which provides bottom-of-atmosphere (BOA) reflectance.According to Reference [13], the comparison of the four major atmospheric correction methods (Sen2Cor, MAJA, 6S, and iCOR) showed minor differences between them.However, in a previous study, Sen2Cor yielded the best R² results for all the spectral bands [14].The 20 m spatial resolution bands (B5, B6, B7, B8a, B11, and B12) were then resampled with 10 m resolution.To cover the whole of Wallonia, eight tiles were required (Figure 2 and Table 1) to produce three mosaics with Sentinel Application Platform software (SNAP © ESA).Unfortunately, it was impossible for us to realize four mosaics for the different seasons of the year 2016.This was due to two main reasons.First, the year 2016 was only characterized by the presence of a single satellite (S2A).Second, the weather conditions of the year 2016 were really bad and we were not able to acquire images for the fall period.Each mosaic was cloud-free and presented a homogeneous tree cover.

LUCAS Dataset
The European Land Use/Cover Area frame statistical Survey (LUCAS) point database was used as a training sample and reference data.The database provides detailed attributes for LC and LU accompanied by a set of ground photographs covering all European countries [15].The dataset is based on a systematic 2 × 2 km grid of points on which each sample is assigned to one of the eight LC categories through a two-phase sampling.The validation of the first phase is made through photointerpretation, whereas the second phase selects a part of the dataset points that will be further visited by a field surveyor.The LUCAS survey is a unique dataset because it uses a classification that is comparable among member states.For this study, LUCAS 2015 was used as a training sample and validation dataset.The dataset was converted into the 5 categories of land as defined by Reference [2] through a conversion table.Then, all the points were updated and validated by means of the interpretation of the Sentinel-2 mosaic.Numerous studies have highlighted a class imbalance problem [15,16].To obviate this problem and avoid having points located in the boundary of two different land categories, the location of some points were modified.The major modifications occurred within the wetlands, settlements, and grassland.Indeed, we increased significantly the number of wetlands and settlements points and decreased the number of grassland and cropland points to make up a more equal proportion of the classes.The resulting sampling scheme and distribution of each category of land are presented in Figure 3.

National Belgian Grid Dataset
The second reference dataset for the LULUCF classification is the national Belgian grid used in the current inventory.The dataset is composed of 8439 points systematically distributed over Wallonia (Figure 4).This dataset corresponds to the description found in Section 1 with the use of several regional datasets and the interpretation of aerial photography of 0.25 m spatial resolution (© SPW).The first step of the geoprocessing attributes the category forest land based upon the regional forest inventory.The second step assigns the categories cropland and grassland according to SIGEC data (detailed agricultural data filled by each farmer for the purpose of compliance with EU Common Agricultural Policy).Third, sectoral plans are used to cross-check and refine the analysis.Finally, a cross-check with the use of sectoral plans and the buildings reference database (PICC) permits the identification of settlements [17].The use of regional datasets to assign points in LU categories has three main limitations.First, the regional datasets were not created for this purpose and their definitions may thus differ from the LULUCF classifications.Second, the datasets may not coincide in time with 2016.These vector layers follow their own cycle of updating that is independent from the LULUCF inventory.Finally, the interpretation of aerial photography is based on a mosaic that spans over several months (from 10 June, 2016, to 1 November, 2016).This induces variations such as light reflectance, vegetation cover, and seasonal changes that may lead to differences of interpretation by the operators.
problem [15,16].To obviate this problem and avoid having points located in the boundary of two different land categories, the location of some points were modified.The major modifications occurred within the wetlands, settlements, and grassland.Indeed, we increased significantly the number of wetlands and settlements points and decreased the number of grassland and cropland points to make up a more equal proportion of the classes.The resulting sampling scheme and distribution of each category of land are presented in Figure 3.

National Belgian Grid Dataset
The second reference dataset for the LULUCF classification is the national Belgian grid used in the current inventory.The dataset is composed of 8439 points systematically distributed over Wallonia (Figure 4).This dataset corresponds to the description found in Section 1 with the use of several regional datasets and the interpretation of aerial photography of 0.25 m spatial resolution (© SPW).The first step of the geoprocessing attributes the category forest land based upon the regional forest inventory.The second step assigns the categories cropland and grassland according to SIGEC data (detailed agricultural data filled by each farmer for the purpose of compliance with EU Common Agricultural Policy).Third, sectoral plans are used to cross-check and refine the analysis.Finally, a cross-check with the use of sectoral plans and the buildings reference database (PICC) permits the identification of settlements [17].The use of regional datasets to assign points in LU categories has three main limitations.First, the regional datasets were not created for this purpose and their definitions may thus differ from the LULUCF classifications.Second, the datasets may not coincide in time with 2016.These vector layers follow their own cycle of updating that is independent from the LULUCF inventory.Finally, the interpretation of aerial photography is based on a mosaic that spans over several months (from 10 June, 2016, to 1 November, 2016).This induces variations such as light reflectance, vegetation cover, and seasonal changes that may lead to differences of interpretation by the operators.

Training Size
Training size can significantly impact classification accuracy [18].Indeed, the classifiers act on the partition of the data or the space featured into classes.Consequently, the ideal training set for a specific classification may vary considerably.In order to evaluate the effect of the training sample size on the performance of the classifier, different configurations of the training sample were implemented (Figure 5).In total, 10 balanced training datasets of different sizes (1%, 2%, 3%, 5%, 10%, 30%, 40%, 60%, 80%, and 100% of the total training data) were implemented.

Training Size
Training size can significantly impact classification accuracy [18].Indeed, the classifiers act on the partition of the data or the space featured into classes.Consequently, the ideal training set for a specific classification may vary considerably.In order to evaluate the effect of the training sample size on the performance of the classifier, different configurations of the training sample were implemented (Figure 5).In total, 10 balanced training datasets of different sizes (1%, 2%, 3%, 5%, 10%, 30%, 40%, 60%, 80%, and 100% of the total training data) were implemented.As a result of Figure 5, the LUCAS database was randomly divided into a training sample of 372 points (30% of the dataset) and a validation dataset of 834 points (70% of the dataset).We maintained the balanced characteristic of both datasets, which means that we preserved an adequate representability of each class on the scene (forest land 22%, cropland 25%, grassland 22%, wetlands 12%, and settlements 19% of the points).

Classification Algorithms
Several studies have compared the performance of different classifiers for LULC classification (e.g., [19,20]).Their results have shown that it is difficult to produce a general statement to advise on the classifier ranking [15].Four popular classifiers-the maximum likelihood classification (MLC), random forest (RF), k-nearest neighbor (KNN), and the minimum distance classification (MDC)-were selected and implemented in SNAP.SNAP presents two major advantages.First, the determination of the tuning parameters for supervised classification consists only of defining the number of training samples and trees (for RF).Second, its open access and user interface permit an easy implementation for a further utilisation by the public authorities in charge of the LULUCF inventory.The four classifiers' performances for LULC classification using Sentinel-2 image data were compared in order to achieve the highest accuracy.
The best overall accuracy ((OA) = 91.1% and 76% with the LUCAS validation dataset and the Belgian grid, respectively) was achieved with the MLC algorithm (Figure 6).The overall accuracy suggested that the MLC algorithm was more suitable for this specific objective.The confidence interval (error tolerance) at a confidence level of 95% [21] based on the equation of Reference [22] is shown in Table 2.As a result of Figure 5, the LUCAS database was randomly divided into a training sample of 372 points (30% of the dataset) and a validation dataset of 834 points (70% of the dataset).We maintained the balanced characteristic of both datasets, which means that we preserved an adequate representability of each class on the scene (forest land 22%, cropland 25%, grassland 22%, wetlands 12%, and settlements 19% of the points).

Classification Algorithms
Several studies have compared the performance of different classifiers for LULC classification (e.g., [19,20]).Their results have shown that it is difficult to produce a general statement to advise on the classifier ranking [15].Four popular classifiers-the maximum likelihood classification (MLC), random forest (RF), k-nearest neighbor (KNN), and the minimum distance classification (MDC)-were selected and implemented in SNAP.SNAP presents two major advantages.First, the determination of the tuning parameters for supervised classification consists only of defining the number of training samples and trees (for RF).Second, its open access and user interface permit an easy implementation for a further utilisation by the public authorities in charge of the LULUCF inventory.The four classifiers' performances for LULC classification using Sentinel-2 image data were compared in order to achieve the highest accuracy.
The best overall accuracy ((OA) = 91.1% and 76% with the LUCAS validation dataset and the Belgian grid, respectively) was achieved with the MLC algorithm (Figure 6).The overall accuracy suggested that the MLC algorithm was more suitable for this specific objective.The confidence interval (error tolerance) at a confidence level of 95% [21] based on the equation of Reference [22] is shown in Table 2.

Accuracy Assessment
The quality of the map produced was evaluated quantitatively.Two accuracy assessments were performed.First, the OA was chosen because it is widely used, easy to interpret, and has a high practical value [16].Second, an assessment of the LULC statistics was carried out using the National Inventory Report of 2016 and the LULUCF classification.The LULC statistics were the main information used for producing the LU conversion matrix (kha) of Belgium from different time periods needed for each inventory.Finally, a visual check of all the resulting products was carried out.The different classifications were empirically compared to each other and with the three Sentinel-2 mosaics.

Accuracy Assessment
The quality of the map produced was evaluated quantitatively.Two accuracy assessments were performed.First, the OA was chosen because it is widely used, easy to interpret, and has a high practical value [16].Second, an assessment of the LULC statistics was carried out using the National Inventory Report of 2016 and the LULUCF classification.The LULC statistics were the main information used for producing the LU conversion matrix (kha) of Belgium from different time periods needed for each inventory.Finally, a visual check of all the resulting products was carried out.The different classifications were empirically compared to each other and with the three Sentinel-2 mosaics.

LUCAS Validation Dataset
Winter-Spring-Summer ( Another way to analyze the results was to compare the land areas from the GHG Inventory Report of 2016 and the LULUCF classification with Sentinel-2 imagery.According to Table 6, all the classes except wetlands were within the same range of surface area.To foster the discussion, the statistics of land areas from the Corine Land Cover 2012 were added to the table.

Discussion
Numerous challenges are encountered when deriving information on LULC through the use of remote sensing.Among them, one may mention the elaboration of the training sample, the choice of the best classifier, the selection of the bands, the multitemporal analysis, and the evaluation of map accuracy.
Two pieces of interesting information could be retrieved from an investigation of the classifiers' performance under a different sampling size.First, MLC and RF performed similarly.The difference in their performance was not statistically significant (less than 5%) [16].However, the highest accuracy was achieved with MLC in relation to both reference datasets (LUCAS and the National Belgian grid).Second, although a large training dataset was desirable, the maximum OA was reached with only 30% of the LUCAS dataset for all the classification algorithms.
Estimating the accuracy of a LULC classification is an important issue.Only a few existing databases are available for validation, and these data, although called "reference data", may be imperfect and could have a significant impact on the accuracy assessment [23].The first reference dataset, the Belgian inventory grid, suffered from synchronicity problems, which have been pointed out as a critical aspect of the accuracy assessment [24].Indeed, the last updating of the Belgian inventory grid was carried out in 2014 and was compared here with Sentinel-2 imagery from 2016.

Discussion
Numerous challenges are encountered when deriving information on LULC through the use of remote sensing.Among them, one may mention the elaboration of the training sample, the choice of the best classifier, the selection of the bands, the multitemporal analysis, and the evaluation of map accuracy.
Two pieces of interesting information could be retrieved from an investigation of the classifiers' performance under a different sampling size.First, MLC and RF performed similarly.The difference in their performance was not statistically significant (less than 5%) [16].However, the highest accuracy was achieved with MLC in relation to both reference datasets (LUCAS and the National Belgian grid).Second, although a large training dataset was desirable, the maximum OA was reached with only 30% of the LUCAS dataset for all the classification algorithms.
Estimating the accuracy of a LULC classification is an important issue.Only a few existing databases are available for validation, and these data, although called "reference data", may be imperfect and could have a significant impact on the accuracy assessment [23].The first reference dataset, the Belgian inventory grid, suffered from synchronicity problems, which have been pointed out as a critical aspect of the accuracy assessment [24].Indeed, the last updating of the Belgian inventory grid was carried out in 2014 and was compared here with Sentinel-2 imagery from 2016.However, this validation dataset enabled us to define the correct size of the sampling data, the classification algorithm, and the land area statistics.In contrast, the LUCAS validation dataset was based on Sentinel-2 imagery with 10 m spatial resolution and not on the aerial photography of 0.25 m.Thus, we observed errors related to the boundaries of small features, such as rivers less than 10 m wide, in the Belgian grid validation.The LUCAS validation dataset was mainly used to validate the classification accuracy due to its synchronicity with the scene.
To achieve the best overall accuracy, we tested three different seasons (winter, spring, and summer 2016) for single-date and multitemporal classifications.We also investigated the rich spectral information of Sentinel-2 data and their 10 bands with the use of two different combinations of spectral bands: 4 spectral bands at 10 m spatial resolution (B2, B3, B4, and B8) and the 10 spectral bands (4 bands at 10 m and 6 bands at 20 m, resampled at 10 m).
According to Tables 4 and 5, the summer season was the best period to carry out a single-date classification.The spring season had the second-best OA and the winter season yielded the worst OA.The winter season classification showed numerous confusions between settlements and croplands and between forest land and wetlands.The addition of the 20 m spectral bands resampled at 10 m enabled us to increase significantly the OA of single-date classification.
However, when using a time series of Sentinel-2 images, the results were more contrasted.According to Table 6, the use of the complete spectral information had a detrimental impact on the OA.The resampled bands had an impact on small linear features such as roads and rivers less than 10 m wide, as well as on the settlements category.We observed a substantial increase in the settlements category due to the generalization of the spectral information generated by the 20 m spatial resolution.Several authors [25,26] have also noticed these transformations along the borders between map categories: They lead to the reduction or amplification of the LULC categories.
The results indicated that the classifications of multidate images yielded slightly better results (Table 5) than single-date classification (Tables 3 and 4).We observed an increase of the OA of 1.1% between tritemporal (winter-spring-summer, 4 bands) and bitemporal classification (spring-summer, 4 bands).This gap was even smaller between single-date and bitemporal classifications (0.1%).Clearly, the running time of a single-date classification was faster than a multidate classification, bringing into question the necessity of using such a large amount of data for carrying out the LULUCF classification.
The per-pixel classification constituted a limitation for the LULUCF classification.It used spectral information of single pixels and ignored spatial information of groups of pixels called objects.Consequently, the OA assessment based on specific points of the Belgian grid may have corresponded to isolated pixels that had a different response than their surroundings, thus producing a lower OA.In this study, we used per-pixel classification because object-based classification requires a pixel size ranging from 0.25 to 5 m in order to be able to identify individual structures [27].The object-based approach was not appropriate with the 10 m spatial resolution of Sentinel-2 imagery and the mapping objectives.
The comparison between land areas from the current inventory and the LULUCF classification provided interesting insights.First, despite the variance of the OA results, land areas were consistent in both results except for the wetlands categories.The Corine Land Cover 2012 was used here as an alternative reference dataset for investigating the wetlands area.The high value of the wetlands area in the Belgian grid could be explained by the nonexplicit representation of the land in the Belgian grid.Eighty points of the grid corresponded to wetlands and, according to this methodology, each point represented 200 ha, leading to a total area of 16,000 ha.In contrast, the Sentinel-2 classification had Land 2018, 7, 154 14 of 16 10 m spatial resolution and did not include peat bogs and swamps, which could explain the lower land area values of this category.Therefore, we considered that the wetlands area of the Belgian grid was overestimated, whereas the wetlands area of the classification was underestimated.
Finally, the total runtime of this methodology, including the elaboration of the LUCAS training and validation dataset, the supervised classification with MLC, and the accuracy assessment, took approximately a week, whereas the manual interpretation of the aerial photography spanned over two months.This study therefore constitutes a practical and operational way for inventory compilers to produce a cost-effective and accurate high-resolution LULC map.

Conclusions
This article presented a prototype for a fully automatic production of a LULUCF map using Sentinel-2 data based on per-pixel supervised classification and using existing databases such as the LUCAS and the Belgian inventory grid for LULUCF.The originality of this approach was ensured by the use of open-source, freely available data with global coverage, making the application of this methodology at the European level possible.Moreover, Copernicus data and the LUCAS are virtually free tools for inventory compilers.However, remote sensing is not commonly used to represent land areas.Indeed, most countries still rely on a methodology based on the National Forest Inventory in LULUCF reporting.
Further research will involve a more complex time-series analysis of Sentinel-2 and a combination of Sentinel-1 and Sentinel-2.This will enable us to benefit from complementary information from both sensors and the availability of data from all weather conditions.We also plan to use very high-resolution data such as aerial photography and normalized digital surface model data to better comply with the Belgian definition of land.
provides a general flowchart of the methodology used in this study.Land 2018, 7, x FOR PEER REVIEW 3 of 15 implemented in the Sentinel Application Platform (SNAP) toolbox to run the experiments.The classification was trained using the LUCAS 2015 database (© Eurostat) (see Section 2.2).The training samples were randomly divided into 10 balanced datasets with the corresponding sizes of 1%, 2%, 3%, 5%, 10%, 30%, 40%, 60%, 80%, and 100% of the total training data to assess the effect of the training sample size on the resulting classification (see Section 3.1).Then, the quality of the results was assessed using two sets of references data, the Belgian inventory grid (as described previously (see Sections 1 and 2.3)) and a part of the LUCAS database which was divided randomly between training and validation datasets.Finally, the classification was validated in a confusion matrix and also through the comparison of LULC statistics from the LULUCF map produced and the land areas derived from the 2016 Belgian inventory and the Corine Land Cover (see Section 3.3).Figure1provides a general flowchart of the methodology used in this study.

Figure 1 .
Figure 1.Workflow of the Land Use, Land Use Change, and Forestry (LULUCF) classifications with Sentinel-2 imagery.

Figure 1 .
Figure 1.Workflow of the Land Use, Land Use Change, and Forestry (LULUCF) classifications with Sentinel-2 imagery.

Figure 2 .
Figure 2. Tiles arrangement allowing the realization of the three mosaics of Sentinel-2 in Wallonia, Belgium.Eight tiles (T31UDS, T31UES, T31UFS, T31UGS, T31UER, T31UFS, T31UGR, and T31UFQ) were used to produce three mosaics.The data spanned from December 2016 (winter mosaic), March to May 2016 (spring mosaic), and July to September 2016 (summer mosaic).The mosaics were cloud-free and characterized by a uniform leaf-on state of the trees.

Figure 2 .
Figure 2. Tiles arrangement allowing the realization of the three mosaics of Sentinel-2 in Wallonia, Belgium.Eight tiles (T31UDS, T31UES, T31UFS, T31UGS, T31UER, T31UFS, T31UGR, and T31UFQ) were used to produce three mosaics.The data spanned from December 2016 (winter mosaic), March to May 2016 (spring mosaic), and July to September 2016 (summer mosaic).The mosaics were cloud-free and characterized by a uniform leaf-on state of the trees.

Figure 3 .
Figure 3. Distribution of the training data in Wallonia, Belgium.(a) The original Land Use/Cover Area frame statistical Survey (LUCAS) points were composed of 1206 points distributed all around Wallonia (forest land = 324 points; cropland = 339 points; grassland = 418 points; wetlands = 15 points; settlements = 110 points).To avoid a class imbalance problem, the location of some points was modified.(b) After the modification, the distribution of the points was: forest land = 270 points; cropland = 296 points; grassland = 268 points; wetlands = 146 points; settlements = 226 points.(c) and (d) show a pie chart of the classes distribution for the original and modified LUCAS points.

Figure 3 .
Figure 3. Distribution of the training data in Wallonia, Belgium.(a) The original Land Use/Cover Area frame statistical Survey (LUCAS) points were composed of 1206 points distributed all around Wallonia (forest land = 324 points; cropland = 339 points; grassland = 418 points; wetlands = 15 points; settlements = 110 points).To avoid a class imbalance problem, the location of some points was modified.(b) After the modification, the distribution of the points was: forest land = 270 points; cropland = 296 points; grassland = 268 points; wetlands = 146 points; settlements = 226 points.(c,d) show a pie chart of the classes distribution for the original and modified LUCAS points.

Figure 4 .
Figure 4. Distribution of the national Belgian grid in Wallonia, Belgium.(a) Explanation of the sampling design; (b) Distribution of the points.It is composed of 8439 points distributed all around Wallonia (forest land = 2782 points; cropland = 2282 points; grassland = 2002 points; wetlands = 80 points; settlements = 1293 points); (c) Pie chart of the classes distribution.

Figure 4 .
Figure 4. Distribution of the national Belgian grid in Wallonia, Belgium.(a) Explanation of the sampling design; (b) Distribution of the points.It is composed of 8439 points distributed all around Wallonia (forest land = 2782 points; cropland = 2282 points; grassland = 2002 points; wetlands = 80 points; settlements = 1293 points); (c) Pie chart of the classes distribution.

Land 2018, 7 , 15 Figure 5 .
Figure 5.The performance of the different sample sizes of the maximum likelihood classification with the Belgian grid as reference data.The graph shows that when the training size was large enough (more than 100 samples), the classification accuracy only slightly increased.At 372 samples (30% of the total training data), the overall accuracy (OA) reached a maximum.After that, it slightly decreased and finally increased slowly to reach the same OA.

Figure 5 .
Figure 5.The performance of the different sample sizes of the maximum likelihood classification with the Belgian grid as reference data.The graph shows that when the training size was large enough (more than 100 samples), the classification accuracy only slightly increased.At 372 samples (30% of the total training data), the overall accuracy (OA) reached a maximum.After that, it slightly decreased and finally increased slowly to reach the same OA.
(a) Winter classification; (b) spring classification; (c) summer classification.The reference data used here were the LUCAS validation dataset with 834 points.The summer classification yielded the best OA with 88.7%, followed by the spring classification (83.2%) and winter classification (73.9%).
(a) Winter-spring-summer classification with 4 bands; (b) spring-summer classification with 4 bands; (c) winter-spring-summer classification with 10 bands; (d) spring-summer classification with 10 bands.The reference data used here were the LUCAS validation dataset with 834 points.The winter-spring-summer classification with 4 bands yielded the best OA with 92.6%.It was followed by the spring-summer classification with 4 bands (91.5%) and the spring-summer classification with 10 bands (90.2%).The winter-spring-summer classification with 10 bands yielded the worst OA (86.8%).

Figure 7 15 Figure 7 .
Figure 7 presents the best supervised per-pixel classification with a maximum likelihood classification algorithm using Sentinel-2 images from winter, spring, and summer 2016.The ArcGIS 10.5 (Analysis tools) toolbox was used to lay out the map and carry out the accuracy assessment because of its widespread use by the Walloon authorities.Land 2018, 7, x FOR PEER REVIEW 12 of 15

Figure 7 .
Figure 7. Winter-spring-summer classification with four bands (B2, B3, B4, B8a) of Wallonia, Belgium.This classification was produced with Sentinel-2 data from the three-season mosaics (winter, spring, and summer 2016).The maximum likelihood classification algorithm was used with a training dataset of 372 samples (30% of the dataset).This classification reached an OA of 92.6%.

Table 1 .
The dates of the images used to produce the three mosaics (winter, spring, and summer 2016).

Table 3 .
The three-season confusion matrixes with four spectral bands (B2, B3, B4, and B8).(a) Winter classification; (b) spring classification; (c) summer classification.The reference data used here were the LUCAS validation dataset with 834 points.The summer classification yielded the best OA with 88.7%, followed by the spring classification (83.2%) and winter classification (73.9%).

Table 6 .
Comparison of land areas in kha from the three best classification products (winter-springsummer with 10 bands, spring-summer with 4 bands, and summer with 10 bands) to the Greenhouse Gas (GHG) Inventory Report and the Corine Land Cover 2012.(a)The winter-spring-summer classification with 4 bands (best OA reached, 92.6%);(b) the spring-summer classification with 4 bands (second-best OA, 91.5%); (c) the summer classification with 10 bands (third-best OA, 91.4%).All the land areas were within the same range except the wetlands, which varied from 2.86 kha (winter-spring-summer with 10 bands) to 16.00 kha (GHG Inventory Report of 2016).