First 1-M Resolution Land Cover Map Labeling the Overlap in the 3rd Dimension: The 2018 Map for Wallonia

: Land cover maps contribute to a large diversity of geospatial applications, including but not limited to land management, hydrology, land use planning, climate modeling and biodiversity monitoring. In densely populated and highly fragmented landscapes as observed in the Walloon region (Belgium), very high spatial resolution is required to depict all the infrastructures, buildings and most of the structural elements of the semi-natural landscapes (like hedges and small water bodies). Because of the resolution, the vertical dimension needs explicit handling to avoid discontinuities incompatible with many applications. For example, how to map a river ﬂowing under a bridge? The particularity of our data is to provide a two-digit land cover code to label all the overlapping items. The identiﬁcation of all the overlaps resulted from the combination of remote sensing image analysis and decision rules involving ancillary data. The ﬁnal product is therefore semantically precise and accurate in terms of land cover description thanks to the addition of 24 classes on top of the 11 pure land cover classes. The quality of the map has been assessed using a state-of-the-art validation scheme. Its overall accuracy is as high as 91.5%, with an average producer’s accuracy of 86% and an average user’s accuracy of 91%. Dataset: The dataset can be visualized and downloaded from the following web //


Summary
The data was produced in the frame of the WALOUS project, which aimed at describing separately the land use and the land cover of the Walloon region. This paper is about the pure land cover component, which resulted from remote sensing image analysis that fused object-based and pixel-based approaches.
pixel-based approaches. The relevance of the land cover (LC) map is defined with regards to its purpose and later on to its effective use by a diversity of users. This map was therefore designed hand-in-hand with the user community [1]. Because of the diversity of users interested in the future LC map, the final specifications were adopted based on an analysis of the respective priorities. At the end of the design process, a consensus was agreed on with respect to the characteristics of the map: the LC typology, the minimum overall accuracy, the spatial resolution, the update frequency and the minimum mapping unit (MMU). Because of the diversity of users interested in the future LC map, these characteristics were decided based on an analysis of their priorities. At the end of the design process, a consensus was agreed on with respect to the characteristics of the map: a spatial resolution of 1 m 2 , an MMU of 15 m 2 (adapted according to some classes), a frequency of update between 3 and 5 years, and an overall accuracy of the product of about 85%. In terms of typology, additional information was requested in case of overlap between two LC classes. These particularities make the described LC map unique in its kind.

Data Format
The data are from a single band raster image snapped on the INSPIRE reference grid of the Walloon region ( Figure 1) at 1 m resolution. The total area mapped was 16,902 km 2 . It is stored in GeoTIFF format with georeferencing information (Belgian Lambert 2008 projection). It can therefore be directly opened in any geographic information system or image analysis software. The land cover classes are stored as integer values (bytes), which are described in Section 2.1.
The MMU, that is the smallest feature mapped, varies depending on the landscape. It is of 15 m 2 for all features outside of forested areas. Inside forested areas, the MMU is enlarged to 500 m 2 for tree types and bare soil. A companion map delineating the land use based on cadastral parcels [2] was also produced along with the LC map.  The MMU, that is the smallest feature mapped, varies depending on the landscape. It is of 15 m 2 for all features outside of forested areas. Inside forested areas, the MMU is enlarged to 500 m 2 for tree types and bare soil. A companion map delineating the land use based on cadastral parcels [2] was also produced along with the LC map.

Land Cover Legend
Land cover is defined as the physical and biological cover of the land surface, including artificial surfaces, agricultural areas, forests, (semi-)natural areas, wetlands and water bodies. The legend of this data is a pure land cover legend, derived from the INSPIRE Pure Land Cover Components (PLCC-INSPIRE) legend. It includes the following 11 main classes: artificial soil sealing, building, railways, bare soil, water, permanent grassland, alternating grassland and bare soil, broadleaved trees, broadleaved shrubs, coniferous trees, coniferous shrubs. Table 1 provides the numeric code of each main class as well as the percentage that it represents over Wallonia. In addition, 24 classes (resulting from the integration of auxiliary databases) complete the legend for all the overlapping items found in the landscape. These classes inform the users about LC features that are not visible from a bird's-eye view. The code of these classes is a double-digit number with the code of the lower class followed by the code of the upper class (Table 2).

Strengths of the LC Map
Besides its unique resolution, the most innovative characteristic of this LC map compared to most existing LC maps is the information describing all land surfaces, including hidden ones such as those underneath an elevated highway or below a river crossover. Thanks to a curated planimetric reference of roads, buildings, water bodies and railways delineating their exact footprint, double-labeled classes give information about LC items hidden on the orthophotos. For instance, a road passing through a forest is often not fully detected by aerial remote sensing. Where the road strip is masked by the tree crown, the LC map indicates that there is an artificial coating under a tree. It is the same for a stream under a bridge, a grassland area under a viaduct, etc. Figure 2 illustrates some double-labeled classes. This map information provides to the users both land representation either of top or above elements only, or bottom or below elements only, depending on their needs. This original feature directly enables multiple important applications, such as connectivity network assessment, total canopy area for ecological habitat connectivity [3], or imperviousness analysis for runoff mitigation. Furthermore, the primary advantage of double labels is the completeness of the land surface characterization as expected from any LC map. Dealing with these overlaps in the vertical dimension was actually mandatory because of the very high resolution of the map.

Strengths of the LC Map
Besides its unique resolution, the most innovative characteristic of this LC map compared to most existing LC maps is the information describing all land surfaces, including hidden ones such as those underneath an elevated highway or below a river crossover. Thanks to a curated planimetric reference of roads, buildings, water bodies and railways delineating their exact footprint, doublelabeled classes give information about LC items hidden on the orthophotos. For instance, a road passing through a forest is often not fully detected by aerial remote sensing. Where the road strip is masked by the tree crown, the LC map indicates that there is an artificial coating under a tree. It is the same for a stream under a bridge, a grassland area under a viaduct, etc. Figure 2 illustrates some double-labeled classes. This map information provides to the users both land representation either of top or above elements only, or bottom or below elements only, depending on their needs. This original feature directly enables multiple important applications, such as connectivity network assessment, total canopy area for ecological habitat connectivity [3], or imperviousness analysis for runoff mitigation. Furthermore, the primary advantage of double labels is the completeness of the land surface characterization as expected from any LC map. Dealing with these overlaps in the vertical dimension was actually mandatory because of the very high resolution of the map. Another strength of this mapping study is the production of very precise land cover area estimates for each class thanks to this wall-to-wall mapping effort at 1 m 2 and a statistically sound confusion matrix based on solid visual interpretation (convergence of five experienced photointerpreters). This confusion matrix allows correcting the pixel-counting area of each land cover class to provide probably one of the very first unbiased area estimates of land cover based on a wall-towall mapping approach at 1 m resolution over a large administrative region.

Accuracy Assessment
In order to validate the map, a stratified probabilistic sampling plan of 1511 validation points was created. The stratification was based on the pixel-based classification of the aerial imagery of Wallonia 2018 (orthophotos) before post-processing. The strata correspond to each class of the map in order to also have a good precision on the user's and, likely, producer's accuracies. To ensure a conservative accuracy assessment and a complete representativeness of the samples, points were also taken in the shadow class as obtained at this stage from orthophotos classification, where the land cover identification is more challenging. Another strength of this mapping study is the production of very precise land cover area estimates for each class thanks to this wall-to-wall mapping effort at 1 m 2 and a statistically sound confusion matrix based on solid visual interpretation (convergence of five experienced photo-interpreters). Thi confusion matrix allows correcting the pixel-counting area of each land cover class to provide probably one of the very first unbiased area estimates of land cover based on a wall-to-wall mapping approach at 1 m resolution over a large administrative region.

Accuracy Assessment
In order to validate the map, a stratified probabilistic sampling plan of 1511 validation points was created. The stratification was based on the pixel-based classification of the aerial imagery of Wallonia 2018 (orthophotos) before post-processing. The strata correspond to each class of the map in order to also have a good precision on the user's and, likely, producer's accuracies. To ensure a conservative accuracy assessment and a complete representativeness of the samples, points were also Data 2020, 5, 117 5 of 11 taken in the shadow class as obtained at this stage from orthophotos classification, where the land cover identification is more challenging.
Five different experts visually interpreted the full set of points using the original aerial imagery at 25 cm resolution, available in both natural color (blue, green, red) and near-infrared false color. In case of conflicts between two or more experts, additional information was collected until a consensus was found for each point.
Because of the stratified sampling, a weight (w) was assigned to each point depending on the strata (i) in which the point was located. The weight was proportional to the area of the class (S i ) divided by the total area of the map, and inversely proportional to the number of points taken in each class (n), as shown in Equation (1) [4].
These weights (Table 3) are used to remove the sampling bias caused by the stratified sampling. The estimated overall accuracy (OA) of the LC map reaches 91.5%, which is statistically greater than 90% with a confidence level of 95%. Table 4 shows the confusion matrix along with the user's and producer's accuracies. The user's accuracy (UA) is defined as the probability, for each class, that the label of a pixel is correct; the producer's accuracy (PA) is the probability that a field feature is correctly classified on the map. The mean map UA is around 91% and the mean map PA, 86%. All accuracy values are very good except the producer accuracy of the bare soils. Most of these errors are due to confusion with artificial coating or permanent grassland patches with a very low or Data 2020, 5, 117 6 of 11 dry vegetation cover. Mislabeling is also common between types of tree classes: coniferous trees have the second-worst UA and PA.
Based on the confusion matrix, it is possible to estimate the area of each class in the Walloon region (bias-adjusted estimator from [5]). Table 5 puts in parallel the unbiased area estimates with the areas actually occupied in the final version of the map ("mapped area"). This shows that bare soils and artificial buildings are under-represented on the map, while artificial coatings and permanent grassland are over-represented.

Methods
Several data sources are available in Wallonia with each having its own spatial reference and its own LC classification. In order to reach the most complete and up-to-date possible LC map of Wallonia at a very high resolution, elements of each data source were used depending on their relevance and their quality (mostly their currentness, precision and accuracy). The approach proceeded by data fusion to give a unique result targeting the best possible combination. The final product was then post-processed in a semi-automated way to further improve its quality and enrich its legend with double labels.

Data Design
A broad user consultation was organized at the start of the project [1]. The objectives of this consultation were (1) to understand what users would do with the product, (2) to match the user needs with INSPIRE technical recommendations and obligations [6], and (3) to define the users' priorities. On a total of 16 days, the representatives of 23 institutions were individually interviewed with open and closed questions. Among those questions, the respondents received 10 points to be prioritized between temporal resolution, MMU and OA requirements. Users' stories depicting the optimum requirements were compiled from open-ended questions and a discussion. After individual interviews, a plenary meeting helped to reach a consensus with users in case of conflicting needs and to refine the typology of the map. Small adjustments of the data design were also agreed at the semestrial steering committee of the project, based on illustrative examples of the remaining issues.

Input Data
The main imagery source was a 2.5 Tb mosaic of orthorectified aerial images (Figure 3  In addition to the airborne datasets, images from the Sentinel-1 and 2 satellites provided information about temporal dynamic LC features (grassland in rotation, difference between broadleaved and coniferous trees). Sentinel-1 10 m resolution images were acquired every 2 to 6 days by the C-Band SAR instrument, while the Sentinel-2 instrument recorded multispectral reflectance at 10 and 20 m resolution every 3 to 5 days.
Beside the imagery dataset, an LC layer of Wallonia at 2 m resolution was used for the preparation of a training dataset. This product is an outcome of the Lifewatch European Research Infrastructure Consortium that aims to delineate and characterize ecological homogeneous units, ecotopes, throughout Europe and in Belgium using 2015 orthophotos [7].
As part of the WALOUS project, a planimetric reference was built based on vector auxiliary data. It includes roads, railways, rivers and buildings of Wallonia. Both the Service Public de Wallonie (SPW) and the Belgium National Geographic Institute (IGN) provided the geospatial databases that complete the planimetric reference. From the linear ancillary data, the road network was completed and made continuous using toolboxes from the open-source platform QGIS and GRASS GIS. In particular, the linear referencing toolbox from QGIS was used to solve completeness issues in the most precise road dataset.
Finally, external auxiliary data reinforced the training dataset, such as a forest mask created by the Université de Liège for the year 2016, which provides information about clear-cuts and coniferous and broadleaved trees, and the anonymized land parcel identification system data of Wallonia.

Image Classification
This section summarizes the steps of processing data from the aerial and satellite imagery available for 2018. More details are available in [8]. Two complementary approaches yielded conceptually different LC classifications based on the very high-resolution airborne dataset: a pixelbased approach and an object-based approach.
First, a pixel-based classification approach used Orfeo ToolBox (OTB), an open source C++ library for state-of-the-art remote sensing [9], to develop a processing chain using a random forest classifier. Thanks to the metadata given with the orthophotos tiling, large homogeneous zones of similar flight conditions were defined in order to reduce the size of the area to be processed. In addition to the airborne datasets, images from the Sentinel-1 and 2 satellites provided information about temporal dynamic LC features (grassland in rotation, difference between broadleaved and coniferous trees). Sentinel-1 10 m resolution images were acquired every 2 to 6 days by the C-Band SAR instrument, while the Sentinel-2 instrument recorded multispectral reflectance at 10 and 20 m resolution every 3 to 5 days.
Beside the imagery dataset, an LC layer of Wallonia at 2 m resolution was used for the preparation of a training dataset. This product is an outcome of the Lifewatch European Research Infrastructure Consortium that aims to delineate and characterize ecological homogeneous units, ecotopes, throughout Europe and in Belgium using 2015 orthophotos [7].
As part of the WALOUS project, a planimetric reference was built based on vector auxiliary data. It includes roads, railways, rivers and buildings of Wallonia. Both the Service Public de Wallonie (SPW) and the Belgium National Geographic Institute (IGN) provided the geospatial databases that complete the planimetric reference. From the linear ancillary data, the road network was completed and made continuous using toolboxes from the open-source platform QGIS and GRASS GIS. In particular, the linear referencing toolbox from QGIS was used to solve completeness issues in the most precise road dataset.
Finally, external auxiliary data reinforced the training dataset, such as a forest mask created by the Université de Liège for the year 2016, which provides information about clear-cuts and coniferous and broadleaved trees, and the anonymized land parcel identification system data of Wallonia.

Image Classification
This section summarizes the steps of processing data from the aerial and satellite imagery available for 2018. More details are available in [8]. Two complementary approaches yielded conceptually different LC classifications based on the very high-resolution airborne dataset: a pixel-based approach and an object-based approach.
First, a pixel-based classification approach used Orfeo ToolBox (OTB), an open source C++ library for state-of-the-art remote sensing [9], to develop a processing chain using a random forest classifier. Thanks to the metadata given with the orthophotos tiling, large homogeneous zones of similar flight conditions were defined in order to reduce the size of the area to be processed. Command lines for the Data 2020, 5, 117 8 of 11 creation of the dataset and the classifier were managed by zones. The processing was parallelized through a SLURM job scheduling system.
The training dataset was first built on the 2 m reference land cover layer from the Lifewatch product and on the planimetric reference. Potential shadows were predicted based on the DSM and the sun position. The limits of the shadows were then refined using a threshold on the NIR values. Four shadow subclasses were created by crossing the predicted shadowy areas with the reference dataset for permanent grassland, artificial coating, bare soils and grassland in rotation during the year. This reference dataset was then eroded using a multiclass mathematical morphology operator [10].
To reduce the salt and pepper effect inherent in most pixel-based approaches, a mean shift algorithm [11] was applied to the aerial imagery, prior to the classification. This classification was performed by a locally trained random forest algorithm on the smoothed images, with an a priori probability defined by the height class.
In addition to the classification of all land cover classes, a targeted classifier was developed to better delineate artificial buildings. This classifier was a convolutional deep neural network based on the Robosat tool [12]. It was trained with a binary map of buildings derived from high spatial resolution vector data.
The object-based (OBIA) classification approach consists in a complete toolchain using GRASS GIS, a full-fledged geographical information system [13]. All tools in the OBIA approach [14,15] have been specifically designed for very large datasets, allowing parallel computing.
Recent studies have shown that using a constant segmentation parameter across space does not lead to ideal segments, as spatial structures differ significantly between different parts of the territory [16][17][18]. The OBIA approach in this study took this into account by optimizing the segmentation within small tiles, delineated using a module for the creation of semantically useful cutlines inspired by [19]. In order to speed up the region-growing segmentation, superpixels were first created in a rapid run of the SLIC algorithm [20]. After segmentation, diverse statistics (spectral, shape, texture, height, neighbors, x and y coordinates) were gathered for each segment. Using the existing reference vector databases, segments overlapped by a single class were selected for training. Outliers (e.g., segments labeled as artificial buildings that have a tree growing over them in the image) were eliminated through simple tests, mainly based on thresholds on NDVI and height. The dataset was divided into strata according to the date at which the photos were taken. A subsample of the very large set of training segments was then used to train a different random forest model on each of the strata and the resulting models were applied to each strata's tiles. The OBIA approach focuses specifically on the quality of the classification of urban landscapes, for which it is particularly well suited.
A pixel-based classification of time series of coarser resolution (10m) satellite images was used to consolidate the classification of LC classes with a strong phenology change during the year. To better discriminate between broadleaved deciduous and needleleaved evergreen forests, as well as larch stands, which are deciduous needleleaved trees, one long-term average of Sentinel-1 images and 2 seasonal composites of Sentinel-2 images (one composite for February-March in the leaf-off period and another composite for April-June for the leaf-on period) was classified with a random forest algorithm.
The open source Sen2-Agri system [21] then provided information about agricultural areas. It is an operational standalone processing system generating agricultural products from Sentinel-2 (A&B) and Landsat 8 time series during the growing season. For the LC map of Wallonia, Sen2-Agri generated a binary cropland mask map and a cultivated crop-type map for 2018.

Data Fusion
Fusion of data consisted in integrating all the interesting inputs of the LC of Wallonia into a unique result. A random forest algorithm operated this integration into object outputs from the OBIA segmentation. The methodology used GRASS GIS, python and R tools and the processing was parallelized on all the computer nodes available [22]. First, a list of statistics from all the input data was extracted by object. Those features were: the entropy-based analysis of class probabilities [23] from both the per-pixel and the OBIA results, the class proportions within each object from the pixel-based results, the mean DHM values, the modal classes of the Sentinel-based results, the modal classes from binary versions of the ancillary vector datasets, the modal classes of a forest mask as well as the modal classes from the rasterized version of the agricultural land parcel information system. Statistics were then extracted for 3000 reference points labeled by visual interpretation and stratified in 3 strata to cover different types of mismatches between the input classifications. This set of points was used as training for a random forest classifier.
Each segment was classified and the class likelihood was stored in addition to the labels. The mean DHM was then used to distinguish coniferous and broadleaved trees lower than 3 m. Finally, polygons smaller than the MMU (15 m 2 for all classes and 500 m 2 for trees located inside the forest mask polygons) were merged with their neighbors sharing the largest border.

Post-Processing
With the same accuracy assessment methods used for the final land cover product, the accuracy of the fully automated classification and data fusion reached 86.3%. Despite this acceptable overall accuracy, an interactive consolidation of the map was completed using all available information about the land surface in Wallonia.
In the first place, the quality of different databases was controlled to use the most reliable in an automated approach. Roads, buildings, water bodies and railways were completed and consolidated by combining the planimetric reference and other databases of coarser precision but better completeness. Confusions between tree types were corrected by combining classified Sentinel-2 data of several years, the 2019 forest mask of the Walloon Region and the Lifewatch database. Finally, LPIS data from 2017 to 2019 were used to better discriminate between grasslands in rotation during the year from permanent grasslands.
In a second step, a visual consolidation was achieved by experts' photo interpretation. It allowed correcting all types of classes and removal of macroscopic errors in very specific situations. This step focused on confusion between bare soil and artificial coating, and further improved broadleaved and needleleaved forest discrimination.

User Notes
The data described in this paper are from a pure land cover map with a spatial resolution of 1 m, a minimum mapping unit of 100 m 2 in forest and 15 m 2 elsewhere, and double labels in case of vertically overlapping classes. It is provided as a GeoTIFF file supported in any GIS software.
Defining a pure land cover typology means that the land use is not involved in the class definition. For instance, a parcel of grass in our typology potentially belongs to different land uses such as residential (garden), primary production (pasture), nature conservation (wetland), or tertiary services (sport). However, agricultural practices also shape the land cover, and the alternating herbaceous class aims to distinguish these practices from a land cover perspective. This class primarily encompasses annual crops but also, by definition, intensive hay meadows and pastures that were plowed during the inter-season. More information on agricultural classes and all other land use categories is available in the companion land use map [2].
The very high spatial resolution of this map makes it relevant for a land cover typology that neglects mixed pixels. Indeed, even if mixed pixels do occur at any scale, their relative proportion at 1 m 2 is very small [24]. Because most of the sources had a better spatial resolution than the final data, a majority rule has been used to agglomerate these sources at 1 m resolution.
The use of a minimum mapping unit improves the readability and the usability of the map. It is very useful to remove small canopy gaps inside forests, giving some context to the land cover interpretation. It also contributes to remove artefacts, thereby increasing the overall map quality.
However, users are warned that this comes with the cost of a loss of information is some cases, when very small spatial objects are removed from the map.
Last but not least, users are invited to take advantage of the versatility of a land cover map with double labels. This solution was proposed to resolve conflicting user needs, and it is possible because the project could rely on ancillary data in addition to the orthophotos. Double labels enable the completeness of the high resolution land cover map. With double labels, it is possible to maintain the connectivity of the road network, the rail network and the river network. It is also possible to determine when a road hinders animal movement (e.g., forest under viaducts [3]). For statistics, users can select the component of the double label that matches their needs. They can also use both components, but then the total area represented becomes larger than the total area of the territory. Funding: This research was funded by the Walloon Government and supported by the Service Public de Wallonie as part of the WALOUS project, which aimed to develop reproducible methods for the land cover and land use maps of Wallonia.