Article

Development of Semantic Maps of Vegetation Cover from UAV Images to Support Planning and Management in Fine-Grained Fire-Prone Landscapes

by Bianka Trenčanová 1,2,*, Vânia Proença 2 and Alexandre Bernardino 2

1 ISR—Institute for Systems and Robotics, Instituto Superior Tecnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal
2 MARETEC—Marine, Environment and Technology Center, Instituto Superior Tecnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(5), 1262; https://doi.org/10.3390/rs14051262
Submission received: 29 December 2021 / Revised: 28 February 2022 / Accepted: 2 March 2022 / Published: 4 March 2022
(This article belongs to the Special Issue Use of Remote Sensing Techniques for Wildlife Habitat Assessment)

Abstract

In Mediterranean landscapes, the encroachment of pyrophytic shrubs is a driver of more frequent and larger wildfires. High-resolution mapping of vegetation cover is essential for sustainable land planning and management for wildfire prevention. Here, we propose methods to simplify and automate the segmentation of shrub cover in high-resolution RGB images acquired by UAVs. The main contribution is a systematic exploration of the best practices to train a convolutional neural network (CNN) with a segmentation network architecture (U-Net) to detect shrubs in heterogeneous landscapes. Several semantic segmentation models were trained and tested on partitions of the available data with alternative methods of data augmentation, patch cropping, rescaling and hyperparameter tuning (the number of filters, dropout rate and batch size). The most effective practices were data augmentation, patch cropping and rescaling. The developed classification model achieved an average F1 score of 0.72 on three separate test datasets, even though it was trained on a relatively small training dataset. This study demonstrates the ability of state-of-the-art CNNs to map fine-grained land cover patterns from RGB remote sensing data. Because model performance is affected by the quality of the data and labeling, an optimal selection of pre-processing practices is a prerequisite for improving the results.

1. Introduction

Remote sensing is a primary source of data for vegetation mapping and, owing to continual developments in geo-information technologies, its range of applications keeps broadening. The use of remotely sensed satellite imagery is an effective way of acquiring data for various land cover mapping applications [1,2,3,4]. Satellites can map large areas in single acquisitions, but their spatial, spectral and temporal resolutions are typically too coarse for some applications. They also suffer from cloud cover contamination and are limited by fixed acquisition timing and costly data acquisition [5]. These issues can be mitigated by the low-altitude flight of unmanned aerial vehicles (UAVs). Although originally developed for military purposes, UAVs have become an important commercial tool for monitoring the Earth’s surface, revolutionizing the acquisition of fine-grained data due to their high spatial resolution, low cost and application versatility. Their other advantages are their flexibility in obtaining data from target areas that are often difficult to reach, the minimization of disturbances in inspected areas, and the provision of real-time data [6]. UAVs have therefore found their place in various fields, including ecology and wildlife conservation [7,8], agriculture and forestry [9], firefighting [10], and disaster zone mapping [11].

1.1. Remote Sensing and Land Cover Mapping

The applications of remote sensing imagery in vegetation assessments are very diverse, including the monitoring of species after fire events [6,12] and of tree health and diseases [13,14]; the mapping of ecosystem structure and function [15], plant communities [16] and individual species [17]; and the assessment of biodiversity [7] and plant diseases [18], among many other applications.
Shrubs, the target class in this paper, are low- to mid-sized woody plants with perennial stems. From an aerial perspective, they occur in various shapes and sizes and form complex clusters of individuals, which makes the mapping and monitoring of their distribution and growth challenging. Moreover, unless they clearly stand out from their environment, such as in a desert [19], they are difficult to delimit from their surroundings [20,21]. This becomes especially problematic in intricate and diverse forest environments containing different types of vegetation stratified in overlapping patches and vertical layers. The large intra-class variance of the observed patterns and the considerable overlap of shrubs’ spectral signatures with those of other vegetation types make their detection harder, even for machine learning models, and often lead to the misclassification of vegetation types [22]. Labeled datasets to inform classification algorithms in this domain are scarce and often require manual labeling by visual inspection of imagery or field surveys. Given the features of shrub vegetation and of the land cover patterns, this is a time-consuming process that can leave an insufficient amount of labeled data, creating the need to artificially increase data volume through heavy augmentation.
Since large volumes of unlabeled high-resolution data are infeasible for humans to process manually, they are often analyzed with artificial neural networks (ANNs). Convolutional neural networks (CNNs), a sub-class of ANNs, allow state-of-the-art deep learning algorithms to handle growing quantities of land observation data in an automated way. Segmentation networks are an emerging group of neural networks that focus specifically on semantic segmentation, i.e., pixel-level segmentation, which is especially effective in land cover classification. Semantic segmentation [23] labels each pixel of an image with a corresponding class, and the resulting high-resolution map is typically of the same size as the input image—a so-called dense prediction. Such networks are also able to learn the spatial configuration of labels and class-specific structures [24]. The detection can target either one specific class [25] or multiple classes at the same time [26]. The two main challenges of the existing methods are intra-class inconsistency and inter-class indistinction [27].
One of the main research questions in deep image analysis is how to provide pixel-level high-resolution segmentation. There are two main approaches to this problem: (1) using dilated (à trous) convolutions, e.g., DeepLab [28], and (2) connecting pooling and un-pooling layers, e.g., DeconvNet, SegNet or U-Net [29]. Among the first networks to focus on semantic segmentation was the fully convolutional network (FCN) [30]. It uses a traditional CNN as a feature extractor but replaces the fully connected layers with up-convolutions, producing spatial feature maps, instead of classification scores, that are further up-sampled to a dense pixel-wise output. An improvement of the FCN is the already mentioned SegNet [31], which consists of an encoder part to extract spatial features and a decoder part to up-sample the feature maps. Similar to FCN and SegNet is the fully convolutional semantic segmentation network U-Net [32], which is discussed further in the next section. SegNet and U-Net can densely label every pixel at the original resolution of the image thanks to their down-sample/up-sample architecture: high-level representations are learnt via convolutions and then up-sampled back to the original resolution via deconvolution. These networks are computationally efficient and able to learn spatial dependencies among classes. Their drawback is low geometric accuracy [33].
The research on how state-of-the-art classification tools perform in complex land cover mapping tasks is scarce [34], even more so when it comes to shrubs. Shrub detection with CNNs has been addressed for Ziziphus lotus in Cabo de Gata-Níjar Natural Park (Almería, Spain) [19]. In that study, the vegetation was sparse, with shrub cover contrasting with bare soil. After combining GoogLeNet with data augmentation, transfer learning (fine tuning) and pre-processing, an F1 score of 97% was achieved. The pre-processing techniques that improved the detection performance the most were background elimination and long-edge detection. Random flipping, scaling, cropping, and brightness adjustment were used for data augmentation. However, in most landscapes, shrubs are a very general and heterogeneous group of vegetation types with individuals of variable shapes, sizes, and distribution patterns, forming irregular and complex clusters of individuals [19]. High intra-class and low inter-class variance is a challenge when mapping shrub cover, causing difficulties in distinguishing shrubs from their surroundings [35] or from other vegetation classes. M. Mahdianpari et al. [34] used multispectral data, containing more complementary information, as a way to alleviate the problem of classifying spectrally similar vegetation types. They also found InceptionResNetV2 to be the most effective state-of-the-art CNN (compared to DenseNet121, InceptionV3, VGG16, VGG19, Xception and ResNet50) for classifying complex multispectral remote sensing wetland scenes (F1 score of 93%). In their pursuit of maximizing the distinction between the target vegetation type (weeds) and the surroundings, C. Hung et al. [35] argued for the use of imagery from different seasons to take advantage of phenological dynamics and seasonal changes in vegetation appearance, as well as for performing the survey at lower flight altitudes (below 100 m [36]) or using higher-resolution sensors to obtain more detail.
In [37], the authors used U-Net to differentiate between various forested classes using satellite imagery. They showed that the classification could be improved by using a combination of multispectral and synthetic aperture radar imagery rather than only one type of data. They achieved an F1 score of 86% on old-growth forest and 62% on old-growth plantations, whereas the F1 scores for secondary forest and young plantations were only 45% and 11%, respectively. This indicates that classification results can differ significantly, even between classes with similar species and patterns but with individuals at different stages of their life cycle. This can be a challenge in mixed landscapes such as the one discussed in our paper. The authors of that study also suggested that the classification results could be improved by using imagery with better spatial resolution. Another study [38] employed U-Net to recognize poplar and coniferous trees from RGB satellite images. Classifying species into either one of the groups was successful, with a mean accuracy of up to 96%; however, the network was not able to separate species within the same group. U-Net was also applied in [39] to detect the presence or absence of trees and large shrubs in Australia. Using multispectral satellite imagery with a pixel size of 3.2 m and a panchromatic band with a pixel size of 80 cm, the overall classification accuracy, as well as precision and recall, were reported to be around 90%. The trees and shrubs were classified jointly in sparse shrubland areas, open savanna woodlands and rangelands, environments that are less complex than the one studied here.

1.2. Case Study: Fire-Prone Mediterranean Landscapes

In our study, we intended to develop a set of processing techniques for classifying shrubs—a key structural component of interest in a fire-prone Mediterranean region. Fire and herbivory are two important sources of ecological disturbance that shape landscapes and species communities in the Mediterranean basin [39]. Today, natural disturbance regimes have been replaced by modified regimes driven by human land use and anthropogenic disturbances [40]. Notably, the decline in farming and grazing in marginal farmland areas is a major trend affecting Mediterranean landscapes, which has been followed either by afforestation or land abandonment, both resulting in a cessation of moderate disturbance regimes, an accumulation of fuel loads and increased fire risk [41,42]. In the case of land abandonment, natural regeneration is often associated with the establishment and expansion of shrub species in abandoned fields [39,41]. In the Mediterranean basin, many shrub species are fire-tolerant, with traits that, under the dry and warm weather conditions typical of Mediterranean areas, favor recurrent fires and, ultimately, the dominance and long-term persistence of shrubs. Consequently, ecosystem recovery through secondary succession is halted, as is the natural reestablishment of native forests [43].
This paper reports results from a case study farm, Quinta da França (Figure 1), which integrates agricultural and forest land uses. The farm’s management is guided by sustainability principles and focuses on promoting environmental services provided by agroforestry activities and sustainable forest management. Notably, the farm’s forest area experienced its last major fire event in 1996 and is now managed for carbon storage and sequestration, with an estimated 7000 tons of CO2 per year. Its management is focused on reducing fire risk, increasing carbon sequestration, and conserving biodiversity. Vegetation cover and its level of development are heterogeneous. Tree cover is dominated by a deciduous native species, Pyrenean oak (Quercus pyrenaica), with trees of different ages, including mature trees and patches of regenerating trees. This species is able to regenerate vegetatively after fire; mature stands are characterized by a low level of flammability, while regenerating stands are associated with a high level of flammability [44]. The understory is dominated by perennial broom shrubs (Cytisus striatus, C. multiflorus), which are characterized by an ability to resprout after fire and by fire-stimulated seed germination [45,46]. The structural complexity of the landscape, composed of regenerating tree patches and a pyrophytic shrub layer, increases fire proneness and vulnerability to fire spread, requiring regular shrub control. The use of livestock for biomass regulation is now being implemented through targeted grazing.

1.3. Objectives

The general goal of our work was to develop a method for high-resolution land cover mapping applicable to the case study’s forest area, with a focus on fire-prone shrub vegetation. Unlike other studies, e.g., [17], where the target vegetation contrasts with bare soil or the ground cover, here shrub patches have irregular shapes and are interspersed with other types of cover, including herbaceous patches and rock outcrops. A dedicated automatic classifier, able to differentiate shrub cover from multiple cover types at small spatial scales, is needed for this type of environment. Maps of vegetation cover will ultimately serve as a foundation for better-informed landscape planning and grazing management and for research into innovative ways to integrate livestock production, biodiversity conservation and fire prevention in the fire-prone landscapes of Mediterranean regions.
The main objectives and contributions of this paper are: (i) the creation of pixel-based labeled datasets for the training, validation and testing of machine learning models for the classification of a fire-prone vegetation type (shrubs) in natural color UAV images; (ii) a systematic analysis of the best training practices for increasing the accuracy of a state-of-the-art CNN (U-Net) in automatically segmenting the key vegetation type in images; (iii) the evaluation of the feasibility and performance of semantic segmentation of shrub cover in a complex heterogeneous landscape.

2. Materials and Methods

2.1. Study Area

Quinta da França (Figure 1) is located in Covilhã, Portugal. The climate is Mediterranean with warm and dry summers, and most precipitation occurs from October to May. Summer is a critical season regarding the risk of forest fires, with the average temperature reaching 22.2 °C and only 10 mm of rainfall in August (https://en.climate-data.org/europe/portugal/covilha/covilha-6944/, accessed on 2 March 2022).
The area of interest is an oak forest (Figure 1b) of about 200 ha. This area includes, as of June 2018, a grazing parcel of about 100 ha, where the use of cattle is being tested as a nature-based solution for biomass regulation through grazing and trampling. The re-introduction of herbivores into fire-prone regions is being promoted as an environmentally sustainable and cost-effective tool for wildfire prevention [47]. However, such interventions also imply trade-offs and require thorough land planning and regular monitoring for which a detailed land cover mapping is essential.

2.2. Data Description

This study uses a set of images acquired by a hexacopter carrying two cameras: a VIS GITUP2 camera (Shenzhen, China) with an RGB filter (370–680 nm) and a 170° (fisheye) lens, and a NIR Mapir Survey2 NDVI camera (San Diego, CA, USA) (Red: 660 nm, NIR: 850 nm) with a 16 MP (4608 × 3456 px) Sony Exmor IMX206 (Bayer RGB) sensor (Tokyo, Japan) and a 90° lens. The experiments presented in this paper use the RGB images acquired by the VIS GITUP2 camera. Drawbacks of these images were the use of a fisheye lens and motion blur, which caused distortion and made the annotation more challenging, especially in peripheral areas of the images. In this paper, we exploit only the information contained in the RGB images, since this makes the presented method more convenient for use in combination with most aerial imaging systems, including off-the-shelf UAVs.
The flight altitude relative to the take-off point was 120 m, the velocity was 5 m/s, and photos were taken every 5 s. The resolution is approximately 6.25 cm/px. The images were provided to us by the landowners and were originally acquired for purposes other than this study. Their primary objective was to cover a full forest site (about 200 ha), which came at the expense of a lower level of detail in the images. The UAV was assembled by the company Terraprima—Serviços Ambientais, Sociedade Unipessoal, Lda (Samora Correia, Portugal. https://www.terraprima.pt/pt, accessed on 2 March 2022).
We had access to three image sets, acquired in August 2019, December 2019, and August 2020. Training data were selected from the August 2019 image set. Test data were selected from all image sets, which allowed us to verify the generalization ability of the developed models on data from different years and different seasons.
The images from all image sets were converted into PNG format and sliced into smaller square-shaped tiles with dimensions (800 × 800) px, corresponding to approximately (50 × 50) m patches of land. The tile size was chosen based on the size of the objects of interest and the amount of context. Selected tiles were then labeled at pixel level for the objects of interest. Four land cover classes were identified: shrubs, trees, shadows, and rocks.
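A minimal sketch of this tiling step is shown below, assuming the converted PNG images are read with Pillow and NumPy; the file name is illustrative and not part of the original study.

```python
import numpy as np
from PIL import Image

TILE = 800  # px, corresponding to roughly 50 x 50 m at ~6.25 cm/px

def slice_into_tiles(image_path, tile=TILE):
    """Slice a large aerial image into non-overlapping square tiles."""
    img = np.asarray(Image.open(image_path).convert("RGB"))
    h, w, _ = img.shape
    tiles = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            tiles.append(img[y:y + tile, x:x + tile])
    return tiles

# A 4608 x 3456 px image yields 5 x 4 = 20 complete 800 x 800 px tiles;
# 13 of these were selected for the Training Partition described below.
tiles = slice_into_tiles("august_2019_image.png")  # hypothetical file name
```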
For training, we selected a partition of 13 tiles from one of the original RGB TIFF images of 4608 × 3456 px taken in August 2019. We chose the tiles we considered as the most representative in terms of all different land cover configurations present in the image. We denote this set the Training Partition. For testing, we defined the following Test Partitions:
  • One (800 × 800) px tile derived from the same image from which training tiles were taken, which was not previously used for training;
  • Two (800 × 800) px tiles derived from two other images that were taken during the same flight;
  • Two (800 × 800) px tiles derived from one image that was taken during the same season but in a following year (August 2020—summer season);
  • Two (800 × 800) px tiles derived from one image that was taken during a different season (December 2019—winter season).
The reasoning behind this approach was to test the performance of the trained models on highly similar data (1 and 2), on seasonally similar data (3) and on highly distinct data taken during a different phenological stage (4).
Land cover classification requires a fine-grained understanding of an image and its context, meaning that dense pixel-level annotations, such as semantic or instance segmentation, were needed. While the former labels each pixel with a corresponding class, the latter also classifies each instance of a class separately. For the purposes of this paper, semantic segmentation was sufficient. Table 1 shows the pixel share of the four classes in the training partition. It can be observed that the data are unbalanced, i.e., the classes are uneven, which is representative of this type of landscape.
The Labelbox application (https://labelbox.com/, accessed on 2 March 2022) was used for labeling. Segmentations approximated by superpixels were used to facilitate the annotation process, rather than the selection of individual pixels. The Superpixel tool of Labelbox computes segment clusters of pixels with similar color, which leads to more efficient annotation than manual labeling, especially for objects with complex boundaries. The only parameter that was adjusted was the segment cluster size. The smallest setting (‘XS’) was chosen not only due to the complexity of the objects’ boundaries, but also due to the low inter-class variance of the target vegetation class, where more conservative calculations of the pixel color had to be applied. The drawback, in comparison to manual labeling, is that the algorithm can find patterns that are not relevant to the specific task or setting, leading to under- or over-segmentation. For such cases, there is an option of additional manual editing using the Eraser and Pen tools, which are equivalent to manual labeling. These tools were also used during the labeling phase. We assigned pixels to classes through visual inspection, based on our knowledge of and experience with Mediterranean vegetation, as well as our in situ familiarity with the farm’s vegetation and its distribution.
The final product of the process was a set of hand-crafted dense pixel-level semantic segmentation maps, where each pixel was assigned a label of a corresponding class (Figure 2). Pixel-based classification maps accurately capture the geometry of an image, such as corners and fine elements, but can face issues such as noise or an incorrect characterization of context-dependent classes [33]. The labeling process was challenging because the boundaries between shrubs and other vegetation types were often indistinguishable. Additionally, there may be some incoherent labeling of shadows that coexisted with other classes.

2.3. Model

A state-of-the-art U-Net model (https://github.com/hlamba28/UNET-TGS, accessed on 2 March 2022), named TGS U-Net, was used as the basis for this work. The model builds on the original U-Net architecture (Figure 3), which extracts features with convolutional layers in the encoding part and restores the original size of the image in the decoding part. The TGS U-Net takes an input image of size (128 × 128 × 3) and gradually reduces its spatial dimensions while increasing the number of channels (from 128 × 128 × 3 to 8 × 8 × 256), and then gradually increases the spatial dimensions and decreases the depth (from 8 × 8 × 256 to 128 × 128 × 1).
The main building block of the TGS U-Net consists of two consecutive 2D convolutional layers with batch normalization and ReLU; batch normalization was used to improve training. The number of filters starts at 16 and is doubled at every convolution step. There are four such blocks on the encoder side, each followed by a max pooling layer, which halves the image dimensions, and a dropout layer. The fifth convolutional block forms a bottleneck with the maximum depth and minimum spatial dimensions, after which comes the decoder side, with four symmetrical deconvolution layers concatenated with the feature maps from the encoder side, each followed by a dropout layer and a convolutional block, which helps the model assemble a more precise output. The number of filters is halved at each step, while the resolution is doubled. Ultimately, the output layer for binary classification is a sigmoid, which assigns to each pixel a probability of belonging to the target class.
In this work, we kept most of the architecture of the original TGS U-Net but experimented with the effect of different input sizes and numbers of filters on the quality of the segmentation in our scenario.
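As a compact illustration, the following Keras sketch reproduces the structure described above, parameterized by the input size, initial number of filters and dropout rate that are varied in the experiments. It follows the publicly available TGS U-Net in spirit, but it is only an assumed reconstruction, not the authors’ exact code.

```python
from tensorflow.keras import layers, models

def conv_block(x, n_filters):
    """Two 3x3 convolutions, each followed by batch normalization and ReLU."""
    for _ in range(2):
        x = layers.Conv2D(n_filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.Activation("relu")(x)
    return x

def build_unet(input_size=(128, 128, 3), n_filters=16, dropout=0.05):
    inputs = layers.Input(input_size)
    x, skips = inputs, []
    # Encoder: four blocks, doubling the filters and halving the spatial size
    for i in range(4):
        x = conv_block(x, n_filters * 2 ** i)
        skips.append(x)
        x = layers.MaxPooling2D(2)(x)
        x = layers.Dropout(dropout)(x)
    # Bottleneck (8 x 8 x 256 for a 128 x 128 x 3 input with 16 base filters)
    x = conv_block(x, n_filters * 2 ** 4)
    # Decoder: up-convolutions concatenated with the encoder feature maps
    for i in reversed(range(4)):
        x = layers.Conv2DTranspose(n_filters * 2 ** i, 3, strides=2, padding="same")(x)
        x = layers.concatenate([x, skips[i]])
        x = layers.Dropout(dropout)(x)
        x = conv_block(x, n_filters * 2 ** i)
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(x)  # per-pixel shrub probability
    return models.Model(inputs, outputs)
```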

2.4. Model Training

We trained different models and systematically evaluated the effect of training parameters on performance. One of the most important parameters was the network input size. We tried the following choices: 128 × 128 px, 144 × 144 px, 192 × 192 px, 240 × 240 px, 288 × 288 px, 400 × 400 px, and 496 × 496 px. This required adapting the input tiles (of size 800 × 800 px) so that they could be processed by the network. First, we cropped the tiles into patches of smaller sizes, and then we resized these patches to the network input size. This may induce a rescaling of the resolution that impacts the segmentation quality. In the experiments, we evaluated this impact by creating several patch sets: S1 with 832 patches of (100 × 100) px, S2 with 208 patches of (200 × 200) px, S3 with 117 patches of (300 × 300) px, S4 with 52 patches of (400 × 400) px, and S5 with 52 patches of (500 × 500) px.
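The sketch below illustrates this crop-and-resize step with OpenCV, using the S3 patch size (300 × 300 px) and a 144 × 144 px network input (roughly the 1:2 scale discussed later). It uses non-overlapping crops for simplicity, which matches the reported counts for S1, S2 and S4; the counts for S3 and S5 suggest that partially overlapping crops were used for those sets, so this is an illustrative approximation rather than the exact procedure.

```python
import cv2
import numpy as np

def crop_and_resize(tile, patch_size=300, input_size=144):
    """Cut an (800 x 800) px tile into patch_size crops and resize each crop
    to the network input size (300 -> 144 is roughly a 1:2 scale)."""
    h, w = tile.shape[:2]
    patches = []
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            crop = tile[y:y + patch_size, x:x + patch_size]
            patches.append(cv2.resize(crop, (input_size, input_size),
                                      interpolation=cv2.INTER_AREA))
    return np.stack(patches)
```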
Data augmentation was then applied to each of these patch sets, generating three variants with sizes of around 800, 1600 and 3800 samples. All of these combinations were called training datasets, as these were the actual data used in the network training. The same data augmentation techniques were used to generate all training datasets; these were random rotations, skews, flips, random brightness, elastic distortions, and shears from the Augmentor library (Marcus D Bloice, Peter M Roth, Andreas Holzinger, Biomedical image augmentation using Augmentor, Bioinformatics, https://github.com/mdbloice/Augmentor, accessed on 2 March 2022). Afterwards, patches were fed into the model with different input sizes, corresponding to different scale factors, depending on the patch dimensions (this is further explained in Section 3.1.2). Figure 4 summarizes all training datasets used.
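A minimal sketch of such an Augmentor pipeline is shown below. The directory names, probabilities and magnitudes are illustrative assumptions, since the exact settings are not stated in the text; the operations themselves (rotations, flips, skews, shears, brightness changes and elastic distortions) are the ones listed above, and ground_truth() keeps the segmentation masks aligned with the images.

```python
import Augmentor

# Paired augmentation of image patches and their segmentation masks
p = Augmentor.Pipeline("patches/images", output_directory="augmented")  # hypothetical paths
p.ground_truth("patches/masks")                                          # augment masks identically
p.rotate(probability=0.7, max_left_rotation=15, max_right_rotation=15)
p.flip_left_right(probability=0.5)
p.flip_top_bottom(probability=0.5)
p.skew(probability=0.3, magnitude=0.2)
p.shear(probability=0.3, max_shear_left=10, max_shear_right=10)
p.random_brightness(probability=0.3, min_factor=0.7, max_factor=1.3)
p.random_distortion(probability=0.3, grid_width=4, grid_height=4, magnitude=4)
p.sample(1664)  # e.g. grow a patch set to the intermediate dataset size
```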
The model was trained with the Adam optimizer with a learning rate of 1 × 10−5. Predictions were compared to labels with the binary cross entropy loss function. To prevent overfitting, early stopping was triggered if the validation loss did not improve for 10 consecutive epochs, and the learning rate was reduced when the validation loss did not improve for five consecutive epochs. Each pixel was assigned to the target class when its predicted probability exceeded the threshold of 0.5. Each Training Dataset was split into a training and a validation set with a 9:1 ratio. The validation set was never used to update the model weights; it was only used to evaluate the model’s generalization ability during training and to make decisions on the training process. Each model was trained for up to 50 epochs.
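In Keras, this training configuration can be expressed roughly as follows. The build_unet function refers to the sketch in Section 2.3, X and y stand in for the augmented patches and masks, and the batch size and learning-rate reduction factor are assumptions, since they are not fixed by this paragraph.

```python
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

model = build_unet(input_size=(144, 144, 3), n_filters=16)
model.compile(optimizer=Adam(learning_rate=1e-5),
              loss="binary_crossentropy",
              metrics=["accuracy"])

callbacks = [
    EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True),
    ReduceLROnPlateau(monitor="val_loss", patience=5, factor=0.5),  # factor assumed
]

# X, y: augmented image patches and binary shrub masks; 9:1 train/validation split
history = model.fit(X, y, validation_split=0.1, batch_size=32,  # batch size assumed
                    epochs=50, callbacks=callbacks)
```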
The cloud service Google Colab (https://colab.research.google.com/, accessed on 2 March 2022) was used for training and evaluating the models. Deep learning methods were implemented using Keras (https://keras.io/, accessed on 2 March 2022) with a TensorFlow (https://www.tensorflow.org, accessed on 2 March 2022) backend. Given the memory limit of 12 GB and the time limit of 12 h that come with the free version of the service, this paper also aimed to explore setups with a reasonable trade-off between working within these limits and yielding good results. This increases the usability and practicality for future exploration with off-the-shelf computational systems. The developed code and the datasets used are publicly available (https://github.com/firefrontproject/Shrub-detection-with-U-Net, accessed on 2 March 2022).

2.5. Evaluation Method

The main evaluation metric is the F1 score, a class-specific measure of segmentation accuracy suitable for unbalanced datasets, such as the ones used in this paper:
$$\mathrm{F1\ score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where:

$$\mathrm{Precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad \mathrm{Recall} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}}$$

and TP denotes true positives, FP false positives and FN false negatives.
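A small NumPy helper that computes these quantities pixel-wise from a predicted probability map and a ground-truth mask could look as follows; this is a sketch for clarity, not the evaluation code from the authors’ repository.

```python
import numpy as np

def f1_score(pred, truth, threshold=0.5, eps=1e-7):
    """Pixel-wise F1 score for a binary (shrub vs. non-shrub) prediction."""
    pred = (pred >= threshold).astype(bool)   # apply the 0.5 decision threshold
    truth = truth.astype(bool)
    tp = np.sum(pred & truth)                 # true positives
    fp = np.sum(pred & ~truth)                # false positives
    fn = np.sum(~pred & truth)                # false negatives
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    return 2 * precision * recall / (precision + recall + eps)
```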

3. Results

In this section, we present the results of several experiments designed to study the best model configurations on the training/validation sets and to evaluate the generalization ability of the proposed models on test datasets. We varied model training according to the following conditions: (i) amount of training data, including augmentations; (ii) network input size; (iii) patch size; and (iv) hyperparameter tuning (number of filters, dropout rate and batch size). We started with a preliminary analysis of the effect of the amount of training data, patch size, and rescaling (network input size) by looking at performance metrics on the validation sets. This preliminary study allowed us to pre-select a set of promising model configurations for our problem. Then, we tested the trained models on the independent test datasets described in Section 2.2 to assess generalization to different conditions (same vs. different areas, days, seasons).

3.1. Preliminary Analysis in the Training Sets

In the first experiment (Section 3.1.1), we evaluated the performance of the system in a multi-class scenario. We considered not only shrubs but also trees, rocks, and shadows, as they are the most prominent patterns in the aerial images. Having noticed that the results for shrubs were not satisfactory, we decided to perform binary classifications in the remaining experiments (shrub cover vs. non-shrub cover), with more favorable results. In Section 3.1.2, we evaluate the influence of patch size, data augmentation and network input size in the performance of the model. In Section 3.1.3, we assess the influence of other hyperparameters.

3.1.1. Multi-Class Segmentation

The TGS U-Net was trained separately for each of the four classes on the 832-sample dataset (without data augmentation). The model input dimensions were (128 × 128) px. The main characteristic of this dataset is its small patch size of (100 × 100) px and, consequently, the highest number of original (i.e., non-augmented) samples. The results (for the target class, i.e., shrubs) obtained with this dataset became the baseline for our study, because no other treatment of the data was used in the first part of this experiment. The differences in performance among the classes for the smallest (832-sample) dataset can be found in Table 2.
Confusion matrices of all classes from the initial experiment can be found in Table 3. Shrubs were often confused with trees, owing to the low inter-class variance between these two classes, and with shadows, which overlapped with many of the shrubs in the images used.
Differences in the performance of the different classes were also assessed visually in the form of continuous maps (heatmaps) and binary predictions (Figure 5). Using predictions in the form of heatmaps, instead of discrete classes, can be particularly useful in landscapes with many transitions among vegetation species or types, where pixels can contain more than one vegetation type. Because many shrubs occur around and under trees, the heatmaps could in fact be more useful than binary maps for aiding an expert in the decision-making process regarding landscape management.
In this first experiment, we noticed that the classification of shrubs (the class of most interest for our applications) underperformed with respect to the other classes. Thus, we decided to train the remaining models with just two classes: shrub vs. non-shrub. Therefore, in the remainder of the experiments we consider only models for binary classification.
In the second part of the initial experiment, for the target class (shrubs), the dataset was augmented to contain 1664 and 3832 samples. We note that no post-processing of the images was conducted. For the shrub class, the performance rose with the growing size of the training dataset, with F1 = 0.31 for the smallest (832), F1 = 0.63 for the intermediate (1664) and F1 = 0.68 for the largest dataset (3832).

3.1.2. Binary Segmentation

In the previous experiment, we verified that small patch sizes and multi-class segmentation did not perform well. In this experiment, we evaluated the simpler case of binary segmentation (shrub vs. non-shrub) and used the training datasets characterized by larger patch sizes, which increased memory and, above all, time requirements. As a result, some of the experiments were left out because they were infeasible to conduct; specifically, the rescaling experiments combining the largest training datasets (3808 instances) with the larger network input sizes were left out of the study, even though the largest datasets exhibited the best performance. Rescaling experiments with the worst-performing (the smallest, 808 instances) training sets and larger network input sizes were also left out. In the end, a set of 21 experiments was performed, exploring the impact of the patch size and of the rescaling of the model input on the performance. Data augmentation was also assessed simultaneously. No image post-processing was applied in this case either. A summary of the conducted experiments can be found in Table 4.
The impacts of patch size and rescaling were then studied. The accuracy was expected to improve with increasing patch size, because a larger patch captures more spatial context, as illustrated in Figure 6.
The tested scales were 1:1 (input size : patch size), as in [48], and 1:2, as in [49]. Because the input must be compatible with the four max-pooling layers contained in the architecture of the TGS U-Net, and therefore must be divisible by 2^4 = 16, the scales are only approximate, as can be seen from Figure 4. Table 5 shows how changing the model input size changes the amount of ground area represented by one pixel.
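As a simple illustration of this relationship, using the approximate 6.25 cm/px native resolution reported in Section 2.2 and the S3 patch size with a 144 × 144 px input (the values in Table 5 may differ slightly):

```python
native_res = 6.25            # cm per pixel in the original imagery (Section 2.2)
patch_px, input_px = 300, 144

ground_extent = patch_px * native_res / 100     # metres covered by one patch side
effective_res = patch_px * native_res / input_px  # cm per pixel after resizing
print(f"{ground_extent:.2f} m per patch side, ~{effective_res:.1f} cm/px after resizing")
# -> 18.75 m per patch side, ~13.0 cm/px after resizing to a 144 x 144 input
```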
The results of data augmentation and patch size variation for the model input size of 128 × 128 px are shown in Figure 7.
In Figure 8, we present the results of different combinations of patch size and input size, resulting in different patch scaling to fit the network input size.
The three best-performing models (F1 = 0.90), shown in Figure 8, took 50, 46 and 60 h to train and qualitatively did not add much value compared to a model that took only four hours to train, shown in Figure 9. The best trade-off between training time and performance was achieved by a model using patch size S3 with 1664 (300 × 300) px samples, with a 50% reduction in spatial dimensions (S3-1664_144 × 144). It achieved a validation F1 score of 0.82 in about four hours of training.

3.1.3. Hyperparameter Tuning

This section addresses the impact of the initial number of filters, the dropout rate and the batch size on performance. The search was manual, using the following values (a simple sweep of this kind is sketched after the list):
  • The initial number of filters: 16, 32 and 64 [50];
  • The dropout rate: 0.05, 0.2, 0.5 and 0.75 [50,51];
  • The batch size: 15, 32 [52,53,54] and 50.
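One way to organize such a sweep is sketched below. It is written as a full grid for compactness, whereas the search in this paper was manual and not exhaustive; build_unet and f1_score refer to the earlier sketches, and X, y, X_val, y_val are placeholders for the training and validation data.

```python
from itertools import product

filter_counts = [16, 32, 64]
dropout_rates = [0.05, 0.2, 0.5, 0.75]
batch_sizes = [15, 32, 50]

results = {}
for n_filters, dropout, batch in product(filter_counts, dropout_rates, batch_sizes):
    model = build_unet(input_size=(144, 144, 3), n_filters=n_filters, dropout=dropout)
    model.compile(optimizer="adam", loss="binary_crossentropy")
    model.fit(X, y, validation_split=0.1, batch_size=batch, epochs=50, verbose=0)
    # score each configuration on held-out patches with the pixel-wise F1 helper
    results[(n_filters, dropout, batch)] = f1_score(model.predict(X_val), y_val)

best_config = max(results, key=results.get)
```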
The results of the hyperparameter tuning are summarized in Figure 10.

3.2. Test Data

In this final test phase, all trained models (see Table 4) were evaluated on independent test datasets derived from the test partitions described in Section 2.2, which represented highly similar, seasonally similar, and highly distinct data. The summer and winter images used for the test partitions are depicted in Figure 11.
The highest performance on test data (F1 of 0.76 to 0.77) was achieved with patch size S3, as can be seen in Table 6.

4. Discussion

4.1. Preliminary Analysis in the Training Sets

In this section, we elaborate on the results obtained in Section 3.1 regarding multi-class and binary segmentation, as well as on the use of hyperparameter tuning to improve the classification results.

4.1.1. Multi-Class Segmentation

Here, we analyze the results obtained in the validation partition of this set (Section 3.1). In the first part of the initial experiment, we demonstrated that shrubs had the second lowest F1 score of all the classes. We also noted that the tree class significantly outperformed shrubs (even when compared to the augmented shrub dataset from the second part of this initial experiment). One reason for this could be that trees were a much more balanced class without any artificial adjustments to the data (accounting for 48.58% of pixels across the dataset, unlike shrubs, which only accounted for 20.99%). More importantly, however, trees seem to have clearer boundaries and a more regular shape, which makes them easier to distinguish than other classes. They also suffer less from high intra- and low inter-class variance. Shrubs, in contrast, are more challenging due to their diversity of shapes, cover patterns, and even reflectance values (i.e., color). Because of this high variety of features, which could lead to misclassification, the neural network showed some limitations when compared to trees, which explains the lower precision obtained for this class. The very high accuracy of the rock class is misleading in this case because it was an underrepresented cover type that generated a small sample, with only a 1.09% share of pixels in the dataset—the large number of true negatives masked the significant number of false positives and false negatives evident in the values of precision and recall. The recall scores were low, which means that the algorithm still underperformed when identifying rocks within their own class. The main reason is the limited number of training examples from which the neural network could learn the various patterns and reflectance values exhibited by rocks. Furthermore, rocks can be covered with vegetation (grasses, mosses, lichens, and even small shrubs) and bordered by vegetation, which affects their shape and reflectance, making it even more challenging to classify them correctly.
In the second part of the initial experiment, we showed that the largest dataset achieved the highest F1 score. This was an expected result, since the model had more learning examples, and data augmentation aided in encoding more invariance, making the learning process more robust. However, the performance began to converge between the intermediate and the largest dataset. Thus, although there might still be potential for further improvement by extending the data through augmentation, the performance gains would likely be marginal.
Overall, patch sizes of 100 × 100 px performed poorly in comparison to other patch sizes. Small patches likely failed to capture enough of the spatial detail and the fine-grained boundaries between the class and the background. There were presumably too many patches consisting of only a part of one object, not capturing enough of the context. Moreover, contrary to the remaining datasets used in our study, the patches here were upscaled (from 100 × 100 px to 128 × 128 px), which could increase blur and break down relevant patterns. Additionally, scaling factors above 1 provide little improvement in performance because no additional information is gained, while the larger inputs occupy more space in GPU memory [55].

4.1.2. Binary Segmentation

Here, we explored the impact of the different binary segmentation experiments on the results. First, we investigated the impact of data augmentation, which notably improved performance. The greatest differences among the F1 scores of models trained with smaller patch sizes were between the 808- and 1658-instance datasets, while these differences began to plateau at 3808 samples. Apparently, there was not a sufficient amount of information in the 808-sample datasets. (The F1 score of 0 for patch size S5 with the smallest dataset reflects a degeneracy of the network to 0 recall, i.e., no detections.) Doubling the dataset size to 1658 already seemed satisfactory, and expanding it even further may not compensate for the added computational cost of training. For the larger patch size S4, the F1 score equalized among the different-sized datasets.
Next, we explored the impact of the patch size. This was motivated by the studies of [48,56], suggesting that the accuracy should improve with the increasing patch size because a larger patch captures more spatial context. Increasing the patch size improved the performance for smaller training sets (808), while increasing it beyond (300 × 300) px for the larger sets (1658, 3808) proved to be unjustified, since it did not improve the classification results, similar to [35] (Figure 7). Instead, it increased time and computational requirements.
Finally, resizing was studied. Resizing images to smaller resolutions may lead to a loss of information [57]. Reina et al. [40] achieved a better performance with minimal down-scaling, whereas other studies report that down-scaling the input patch can contribute to a better filtering of the relevant spatial patterns ([49,58]). This can, therefore, depend on the content of the images and the target group. The goal was to find out which approach would work for the data used in this paper. Scaling down images too much could significantly hamper the ability to detect structures and textures. Higher F1 scores were achieved when the size of the rescaled patch was closer to the input size. This is especially important in cases where the size of the objects of interest is already small [59], or where downscaling would lead to a loss of relevant context information [48,57]. However, it is an interesting technique for shortening the training time [59], and the scale of 1:2 is a good trade-off between the small decrease in performance and a shorter training time [49].

4.1.3. Hyperparameter Tuning

Similar to [50], adding more filters improved the performance only up to a certain point (32 filters), after which it started to decrease (64 filters), disagreeing with the general notion that deeper networks achieve better accuracies [29]. Using more filters made the network deeper and more complicated, which was probably not necessary for the kind of data used in this study, or it introduced too many learnable parameters for the available data, which caused overfitting. The best-performing model with 32 filters achieved an F1 score of 0.84 but took 10 h to train, while the model with 16 filters achieved an F1 score of 0.82 in half the time.
The metrics generally worsened with the increasing dropout rate. The only exception was recall, which increased to 1 with the highest dropout rate. The model simply labeled most of the pixels as shrubs, producing many false positives. The best performance was achieved with the smallest dropout rate, which was part of the original setting. There was less deterioration in metrics between the dropout rates 0.05 and 0.2, but this became more apparent with larger dropout rates. The change was especially pronounced between dropouts 0.2 and 0.5, where the decline, especially in accuracy but also in precision and F1 score, was significant.
Batch size is a hyperparameter that, like many other hyperparameters, depends on many factors, such as the type of problem or data. Some authors [54] reported the best results when using a batch size as small as 2 or 4, while others [60] favored batch sizes as large as 128. The batch size did not have much of an impact on the results in this study. Considering that further exploration of batch size tuning would depend on the computational resources available, and that a batch size of 32 is generally recommended as a suitable value in many cases, further experimentation with this hyperparameter was not carried out in our work.
There are many other hyperparameters that could be explored to improve the classification results, but the optimal model generally depends more on the data used (https://jakevdp.github.io/PythonDataScienceHandbook/05.03-hyperparameters-and-model-validation.html, accessed on 2 March 2022) than on the hyperparameters. Nevertheless, it is important not only to tune the hyperparameters, but also to choose diligently which ones to tune, since some of them may have a significant impact on the results, while others have almost none.

4.2. Test Data

As expected, the models performed worse on the new data. Some examples of the test results are shown in Table 6. The reason is that the test data did not come from the same dataset as the training and validation data (excluding test dataset 1). The best average performance across all the experiments was achieved on test dataset 2 (F1 = 0.70), while test datasets 1 and 3 performed equally. The greatest culprit behind the gap between validation and test results was most likely the spatial distribution of vegetation in the test patches, which differed from that in the training and validation sets. Due to the rather small dataset, the models could not learn enough different spatial distributions of the target class. Patches in test set 3 came from images taken in a different year, which was most likely the dominant factor in the performance decrease. The best evaluations were on test set 2, because these images were taken on the same day as the training images. The image from which the training and validation patches were derived did not cover a sufficiently representative sample of the shrub patterns in the area. The winter images were too different to be extrapolated from the summer data; a separate model would be necessary.
Furthermore, higher testing performances were generally achieved by models using larger patch sizes, larger dataset sizes and larger model input dimensions, in accordance with the validation results from Section 3.1.2. Data augmentation, patch size and model input dimensions (i.e., downscaling in our experiments) proved to be beneficial for the training and classification performance. The hyperparameter tuning did not bring any significant improvement in performance, either for the validation or for the test sets. Generally, the gaps between validation and test scores are relative to the data, the selected metrics and the models (https://machinelearningmastery.com/the-model-performance-mismatch-problem/, accessed on 2 March 2022).

5. Conclusions

This paper explored the potential of detecting irregular shrub cover in a complex heterogeneous landscape with U-Net. We presented a systematic analysis of the most important training parameters of a U-Net neural network when creating models for the segmentation of shrubs in RGB images acquired from a UAV. Due to their fire tolerance and high flammability, shrubs are of priority interest in terms of fire risk assessment and preventive management in Mediterranean regions, and their mapping is fundamental for better-informed land management and the reduction in forest fire hazards. This work consisted of two main parts: creating and manually labeling datasets and developing methods to increase detection accuracy using a U-Net neural network. We evaluated the impact of data augmentation, tiling, rescaling and hyperparameter tuning (number of filters, dropout rate and batch size) on the accuracy of the system. With respect to data augmentation, we observed that the largest datasets containing 3808 samples yielded the highest F1 scores. Regarding patch size, patches with (300 × 300) px, in combination with the largest datasets, provided the best results. For the larger datasets, larger patches did not improve performance, but increased the training time and computational demands. As for downscaling, degrading the image resolution typically leads to a loss of information, but the scale 1:2 significantly decreased the training time, while maintaining good performance levels. The configuration of pre-processing techniques yielding the best results depends on the problem and on the object of interest [19]. Hence, finding an optimal set of methods requires exhaustive research but could reap large benefits.
The major identified limitations were the amount of labeled data and the difficulty of ensuring high precision when labeling. Using larger datasets with patches derived from several images taken during multiple flights could have a significant positive effect on the results. High-quality labels remain one of the central elements of image classification success. Due to the detailed boundaries of shrubs and some ambiguities with background elements, labeling by multiple annotators could help improve the quality of the data. Using only RGB data was also a limitation, but this paper shows that it is possible to achieve reasonable results for some applications with such an inexpensive sensor.
Thus, based on the results achieved in this paper, we believe that further improvements in performance could be achieved by:
  • Further enlargement of the datasets, either from more labeled data from spatially and temporally distinct samples, or by employing more data augmentation variants;
  • Decreasing labeling incoherency, especially in case of frequently overlapping classes, by using more annotators and stricter rules on how to label mixed classes;
  • Alternatively, or in addition to the previous point, reducing the demands on precise segmentation and allowing a less precise approach to labeling, e.g., selecting random regions of interest (ROI) within the class area without identifying the exact borders of the class object. This could yield higher volumes of samples while dramatically reducing labeling time;
  • The systematic search of hyperparameters for augmentation and pre-processing techniques suitable for these particular data and tasks. Due to limited computational resources, we could not perform an exhaustive search of hyperparameters, but we noticed their importance in optimizing performance.
Finally, considering the nature and the objective of this task, using heatmaps in combination with expert opinions could be a better option than using binary predictions. This work has the potential to serve as an information tool for land planning and grazing management and could be also modified and repurposed to map other vegetation types, such as trees, or be used as a forest inventory tool.

Author Contributions

B.T., Conceptualization, Investigation, Methodology, Writing; V.P., Conceptualization, Methodology, Review and Editing, Supervision; A.B., Conceptualization, Methodology, Review and Editing, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by projects “SILVPAST—Cost-efficient implementation of silvo-pastoral mosaic systems of black oak” (PDR2020-101-031873) and (PCIF/SSI/0096/2017) “FIREFRONT—Real—Time Forest Fire Mapping and Spread Forecast Using Unmanned Aerial Vehicles”, and by FCT/MCTES (PIDDAC) through project LARSyS—FCT Pluriannual funding 2020–2023 (UIDB/50009/2020) and CEECIND/04469/2017 (V.Proença).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank Terraprima for making the images used in this research available. We would also like to thank the editor and the reviewers for their comments which helped us improve the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ahmed, B.; Noman, M.A.A. Land cover classification for satellite images based on normalization technique and Artificial Neural Network. In Proceedings of the 2015 International Conference on Computer and Information Engineering (ICCIE), Rajshahi, Bangladesh, 26–27 November 2015; pp. 138–141.
  2. Fröhlich, B.; Bach, E.; Walde, I.; Hese, S.; Schmullius, C.; Denzler, J. Land cover classification of satellite images using contextual information. ISPRS Ann. Photogramm. Remote Sens. Spat. Inf. Sci. 2013, II-3/W1, 1–6.
  3. Kussul, N.; Lavreniuk, M.; Skakun, S.; Shelestov, A. Deep Learning Classification of Land Cover and Crop Types Using Remote Sensing Data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 778–782.
  4. Vanjare, A.; Omkar, S.N.; Senthilnath, J. Satellite Image Processing for Land Use and Land Cover Mapping. Int. J. Image Graph. Signal Process. 2014, 6, 18–28.
  5. Matese, A.; Toscano, P.; Di Gennaro, S.F.; Genesio, L.; Vaccari, F.P.; Primicerio, J.; Belli, C.; Zaldei, A.; Bianconi, R.; Gioli, B. Intercomparison of UAV, Aircraft and Satellite Remote Sensing Platforms for Precision Viticulture. Remote Sens. 2015, 7, 2971–2990.
  6. Pérez-Rodríguez, L.A.; Quintano, C.; Marcos, E.; Suarez-Seoane, S.; Calvo, L.; Fernández-Manso, A. Evaluation of Prescribed Fires from Unmanned Aerial Vehicles (UAVs) Imagery and Machine Learning Algorithms. Remote Sens. 2020, 12, 1295.
  7. Getzin, S.; Wiegand, K.; Schöning, I. Assessing biodiversity in forests using very high-resolution images and unmanned aerial vehicles. Methods Ecol. Evol. 2012, 3, 397–404.
  8. Mangewa, L.J.; Ndakidemi, P.A.; Munishi, L.K. Integrating UAV Technology in an Ecological Monitoring System for Community Wildlife Management Areas in Tanzania. Sustainability 2019, 11, 6116.
  9. Csillik, O.; Cherbini, J.; Johnson, R.; Lyons, A.; Kelly, M. Identification of Citrus Trees from Unmanned Aerial Vehicle Imagery Using Convolutional Neural Networks. Drones 2018, 2, 39.
  10. Kinaneva, D.; Hristov, G.; Raychev, J.; Zahariev, P. Early Forest Fire Detection Using Drones and Artificial Intelligence. In Proceedings of the 2019 42nd International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 20–24 May 2019; pp. 1060–1065.
  11. Kerle, N.; Nex, F.; Gerke, M.; Duarte, D.; Vetrivel, A. UAV-Based Structural Damage Mapping: A Review. ISPRS Int. J. Geo-Inf. 2019, 9, 14.
  12. Sankey, T.; Donager, J.; McVay, J.; Sankey, J.B. UAV lidar and hyperspectral fusion for forest monitoring in the southwestern USA. Remote Sens. Environ. 2017, 195, 30–43.
  13. Baena, S.; Moat, J.; Whaley, O.; Boyd, D.S. Identifying species from the air: UAVs and the very high resolution challenge for plant conservation. PLoS ONE 2017, 12, e0188714.
  14. Malenovský, Z.; Lucieer, A.; King, D.H.; Turnbull, J.D.; Robinson, S.A. Unmanned aircraft system advances health mapping of fragile polar vegetation. Methods Ecol. Evol. 2017, 8, 1842–1857.
  15. Langford, Z.L.; Kumar, J.; Hoffman, F.M.; Breen, A.L.; Iversen, C.M. Arctic Vegetation Mapping Using Unsupervised Training Datasets and Convolutional Neural Networks. Remote Sens. 2019, 11, 69.
  16. Lopatin, J.; Fassnacht, F.E.; Kattenborn, T.; Schmidtlein, S. Mapping plant species in mixed grassland communities using close range imaging spectroscopy. Remote Sens. Environ. 2017, 201, 12–23.
  17. Cao, J.; Leng, W.; Liu, K.; Liu, L.; He, Z.; Zhu, Y. Object-Based Mangrove Species Classification Using Unmanned Aerial Vehicle Hyperspectral Images and Digital Surface Models. Remote Sens. 2018, 10, 89.
  18. Sladojevic, S.; Arsenovic, M.; Anderla, A.; Culibrk, D.; Stefanovic, D. Deep Neural Networks Based Recognition of Plant Diseases by Leaf Image Classification. Available online: https://www.hindawi.com/journals/cin/2016/3289801/ (accessed on 26 December 2020).
  19. Guirado, E.; Tabik, S.; Alcaraz-Segura, D.; Cabello, J.; Herrera, F. Deep-Learning Convolutional Neural Networks for Scattered Shrub Detection with Google Earth Imagery. arXiv 2017, arXiv:1706.00917. Available online: http://arxiv.org/abs/1706.00917 (accessed on 23 December 2020).
  20. Ayhan, B.; Kwan, C. Tree, Shrub, and Grass Classification Using Only RGB Images. Remote Sens. 2020, 12, 1333.
  21. Hellesen, T.; Matikainen, L. An Object-Based Approach for Mapping Shrub and Tree Cover on Grassland Habitats by Use of LiDAR and CIR Orthoimages. Remote Sens. 2013, 5, 558–583.
  22. Lopatin, J.; Dolos, K.; Kattenborn, T.; Fassnacht, F.E. How canopy shadow affects invasive plant species classification in high spatial resolution remote sensing. Remote Sens. Ecol. Conserv. 2019, 5, 302–317.
  23. Zhou, Q.; Yang, W.; Gao, G.; Ou, W.; Lu, H.; Chen, J.; Latecki, L.J. Multi-scale deep context convolutional neural networks for semantic segmentation. World Wide Web 2019, 22, 555–570.
  24. Volpi, M.; Tuia, D. Dense Semantic Labeling of Subdecimeter Resolution Images With Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2017, 55, 881–893.
  25. Wen, D.; Huang, X.; Liu, H.; Liao, W.; Zhang, L. Semantic classification of urban trees using very high resolution satellite imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2017, 10, 1413–1424.
  26. Paisitkriangkrai, S.; Sherrah, J.; Janney, P.; Van Den Hengel, A. Semantic labeling of aerial and satellite imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 9, 2868–2881.
  27. Yu, C.; Wang, J.; Peng, C.; Gao, C.; Yu, G.; Sang, N. Learning a Discriminative Feature Network for Semantic Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1857–1866.
  28. Chen, L.-C.; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, A.L. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. arXiv 2017, arXiv:1606.00915. Available online: http://arxiv.org/abs/1606.00915 (accessed on 27 December 2020).
  29. Li, R.; Liu, W.; Yang, L.; Sun, S.; Hu, W.; Zhang, F.; Li, W. DeepUNet: A Deep Fully Convolutional Network for Pixel-Level Sea-Land Segmentation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3954–3962.
  30. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. arXiv 2015, arXiv:1411.4038. Available online: http://arxiv.org/abs/1411.4038 (accessed on 2 November 2020).
  31. Badrinarayanan, V.; Kendall, A.; Cipolla, R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation. arXiv 2016, arXiv:1511.00561. Available online: http://arxiv.org/abs/1511.00561 (accessed on 27 December 2020).
  32. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv 2015, arXiv:1505.04597. Available online: http://arxiv.org/abs/1505.04597 (accessed on 13 October 2020).
  33. Stoian, A.; Poulain, V.; Inglada, J.; Poughon, V.; Derksen, D. Land Cover Maps Production with High Resolution Satellite Image Time Series and Convolutional Neural Networks: Adaptations and Limits for Operational Systems. Remote Sens. 2019, 11, 1986.
  34. Mahdianpari, M.; Salehi, B.; Rezaee, M.; Mohammadimanesh, F.; Zhang, Y. Very Deep Convolutional Neural Networks for Complex Land Cover Mapping Using Multispectral Remote Sensing Imagery. Remote Sens. 2018, 10, 1119.
  35. Hung, C.; Xu, Z.; Sukkarieh, S. Feature Learning Based Approach for Weed Classification Using High Resolution Aerial Images from a Digital Camera Mounted on a UAV. Remote Sens. 2014, 6, 12037–12054. [Google Scholar] [CrossRef] [Green Version]
  36. Ashapure, A.; Jung, J.; Chang, A.; Oh, S.; Maeda, M.; Landivar, J. A Comparative Study of RGB and Multispectral Sensor-Based Cotton Canopy Cover Modelling Using Multi-Temporal UAS Data. Remote Sens. 2019, 11, 2757. [Google Scholar] [CrossRef] [Green Version]
  37. Solórzano, J.V.; Mas, J.F.; Gao, Y.; Gallardo-Cruz, J.A. Land Use Land Cover Classification with U-Net: Advantages of Combining Sentinel-1 and Sentinel-2 Imagery. Remote Sens. 2021, 13, 3600. [Google Scholar] [CrossRef]
  38. Korznikov, K.A.; Kislov, D.E.; Altman, J.; Doležal, J.; Vozmishcheva, A.S.; Krestov, P.V. Using U-Net-Like Deep Convolutional Neural Networks for Precise Tree Recognition in Very High Resolution RGB (Red, Green, Blue) Satellite Images. Forests 2021, 12, 66. [Google Scholar] [CrossRef]
  39. Pereira, H.M.; Navarro, L.M. (Eds.) Rewilding European Landscapes; Springer International Publishing: Cham, Switzerland, 2015; ISBN 978-3-319-12038-6. [Google Scholar]
  40. Pausas, J.G.; Paula, S. Fuel shapes the fire-climate relationship: Evidence from Mediterranean ecosystems: Fuel shapes the fire-climate relationship. Glob. Ecol. Biogeogr. 2012, 21, 1074–1082. [Google Scholar] [CrossRef]
  41. Fernandes, P.M. Fire-smart management of forest landscapes in the Mediterranean basin under global change. Landsc. Urban Plan. 2013, 110, 175–182. [Google Scholar] [CrossRef] [Green Version]
  42. Fernández-Manjarrés, J.; Ruiz-Benito, P.; Zavala, M.; Camarero, J.; Pulido, F.; Proença, V.; Navarro, L.; Sansilvestri, R.; Granda, E.; Marqués, L.; et al. Forest Adaptation to Climate Change along Steep Ecological Gradients: The Case of the Mediterranean-Temperate Transition in South-Western Europe. Sustainability 2018, 10, 3065. [Google Scholar] [CrossRef] [Green Version]
  43. Álvarez-Martínez, J.; Gómez-Villar, A.; Lasanta, T. The use of goats grazing to restore pastures invaded by shrubs and avoid desertification: A preliminary case study in the Spanish Cantabrian Mountains. Degrad. Dev. 2016, 27, 3–13. [Google Scholar] [CrossRef]
  44. Silva, J.S.; Moreira, F.; Vaz, P.; Catry, F.; Godinho-Ferreira, P. Assessing the relative fire proneness of different forest types in Portugal. Plant Biosyst.-Int. J. Deal. Asp. Plant Biol. 2009, 143, 597–608. [Google Scholar] [CrossRef]
  45. Cruz, Ó.; García-Duro, J.; Riveiro, S.F.; García-García, C.; Casal, M.; Reyes, O. Fire Severity Drives the Natural Regeneration of Cytisus scoparius L. (Link) and Salix atrocinerea Brot. Communities and the Germinative Behaviour of These Species. Forests 2020, 11, 124. [Google Scholar] [CrossRef] [Green Version]
  46. Tarrega, R.; Calvo, L.; Trabaud, L. Effect of High Temperatures on Seed Germination of Two Woody Leguminosae. Vegetatio 1992, 102, 139–147. [Google Scholar] [CrossRef]
  47. Lovreglio, R.; Meddour-Sahar, O.; Leone, V. Goat grazing as a wildfire prevention tool: A basic review. IForest-Biogeosci. For. 2014, 7, 260–268. [Google Scholar] [CrossRef] [Green Version]
  48. Reina, G.A.; Panchumarthy, R.; Thakur, S.P.; Bastidas, A.; Bakas, S. Systematic Evaluation of Image Tiling Adverse Effects on Deep Learning Semantic Segmentation. Front. Neurosci. 2020, 14, 65. [Google Scholar] [CrossRef] [PubMed]
  49. Rakhlin, A.; Davydow, A.; Nikolenko, S. Land Cover Classification from Satellite Imagery with U-Net and Lovász-Softmax Loss. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 257–2574. [Google Scholar]
  50. Zhang, P.; Ke, Y.; Zhang, Z.; Wang, M.; Li, P.; Zhang, S. Urban Land Use and Land Cover Classification Using Novel Deep Learning Models Based on High Spatial Resolution Satellite Imagery. Sensors 2018, 18, 3717. [Google Scholar] [CrossRef] [Green Version]
  51. Zhang, F.; Du, B.; Zhang, L. Saliency-Guided Unsupervised Feature Learning for Scene Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2175–2184. [Google Scholar] [CrossRef]
  52. Bengio, Y. Practical Recommendations for Gradient-Based Training of Deep Architectures. arXiv 2012, arXiv:12065533. Available online: http://arxiv.org/abs/1206.5533 (accessed on 14 December 2020).
  53. Keskar, N.S.; Mudigere, D.; Nocedal, J.; Smelyanskiy, M.; Tang, P.T.P. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. arXiv 2017, arXiv:160904836. Available online: http://arxiv.org/abs/1609.04836 (accessed on 14 December 2020).
  54. Masters, D.; Luschi, C. Revisiting Small Batch Training for Deep Neural Networks. arXiv 2018, arXiv:180407612. Available online: http://arxiv.org/abs/1804.07612 (accessed on 9 November 2020).
  55. Zheng, L.; Zhao, Y.; Wang, S.; Wang, J.; Tian, Q. Good Practice in CNN Feature Transfer. arXiv 2016, arXiv:160400133. Available online: http://arxiv.org/abs/1604.00133 (accessed on 2 December 2020).
  56. Kattenborn, T.; Eichel, J.; Wiser, S.; Burrows, L.; Fassnacht, F.E.; Schmidtlein, S. Convolutional Neural Networks accurately predict cover fractions of plant species and communities in Unmanned Aerial Vehicle imagery. Remote Sens. Ecol. Conserv. 2020, 6, 472–486. [Google Scholar] [CrossRef] [Green Version]
  57. Zhang, W.; Tang, P.; Zhao, L. Remote Sensing Image Scene Classification Using CNN-CapsNet. Remote Sens. 2019, 11, 494. [Google Scholar] [CrossRef] [Green Version]
  58. Müllerová, J.; Brůna, J.; Bartaloš, T.; Dvořák, P.; Vítková, M.; Pyšek, P. Timing Is Important: Unmanned Aircraft vs. Satellite Imagery in Plant Invasion Monitoring. Front. Plant Sci. 2017, 8, 887. [Google Scholar] [CrossRef] [Green Version]
  59. Audebert, N.; Le Saux, B.; Lefèvre, S. Beyond RGB: Very high resolution urban remote sensing with multimodal deep networks. ISPRS J. Photogramm. Remote Sens. 2018, 140, 20–32. [Google Scholar] [CrossRef] [Green Version]
  60. Iglovikov, V.; Mushinskiy, S.; Osin, V. Satellite Imagery Feature Detection using Deep Convolutional Neural Network: A Kaggle Competition. arXiv 2017, arXiv:170606169. Available online: http://arxiv.org/abs/1706.06169 (accessed on 2 December 2020).
Figure 1. (a) Location of Quinta da França in Portugal (Source: QGIS); (b) Detail of Quinta da França, with its borders shown in yellow. The red border delimits the main area of oak forest on the farm. The black point marks the location of the data used in this paper (Source: Terraprima - Sociedade Agrícola Lda., 2012).
Figure 2. An example of a labeled tile and its binary masks. Upper left: original image tile, Upper right: labeled image tile (red—shrubs, orange—trees, yellow—shadows, light yellow—rocks). Bottom from left: binary mask of shrubs, trees, shadows, and rocks.
Figure 3. The original U-Net architecture [32]. Blue boxes: multi-channel feature maps. White boxes: copied feature maps. The number of channels is denoted at the top of each box, and the height and width of each layer are shown at the lower-left edge of the box. The arrows denote the different operations.
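For readers who want a concrete starting point, the sketch below shows a minimal Keras-style U-Net with a configurable number of base filters and dropout rate (two of the hyperparameters explored later, see Figure 10). It is an illustrative reconstruction under assumed defaults, not the exact depth or configuration used in this study; build_unet and its parameters are hypothetical names.

```python
# Minimal U-Net-style encoder-decoder sketch (illustrative, not the authors' exact model).
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters, dropout):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return x

def build_unet(input_size=128, base_filters=16, dropout=0.2):
    inputs = layers.Input((input_size, input_size, 3))
    # Encoder: two downsampling stages (the original U-Net in Figure 3 uses four).
    c1 = conv_block(inputs, base_filters, dropout)
    p1 = layers.MaxPooling2D()(c1)
    c2 = conv_block(p1, base_filters * 2, dropout)
    p2 = layers.MaxPooling2D()(c2)
    # Bottleneck.
    b = conv_block(p2, base_filters * 4, dropout)
    # Decoder with skip connections (the "copied feature maps" of Figure 3).
    u2 = layers.Conv2DTranspose(base_filters * 2, 2, strides=2, padding="same")(b)
    c3 = conv_block(layers.concatenate([u2, c2]), base_filters * 2, dropout)
    u1 = layers.Conv2DTranspose(base_filters, 2, strides=2, padding="same")(c3)
    c4 = conv_block(layers.concatenate([u1, c1]), base_filters, dropout)
    # Single-channel sigmoid output: a per-pixel probability heatmap for one class.
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam", loss="binary_crossentropy")
```

One such binary model per class (shrubs, trees, shadows, rocks) would produce the heatmaps shown in Figure 5.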
Figure 4. Training datasets used in experiments with different patch sizes and network input sizes.
Figure 5. Examples of classification results on validation data for the used classes. From left: the original image, the binary mask, the heatmap prediction and the binary prediction (with a threshold of 0.5). From top: shrubs, trees, shadows and rocks. Black contours represent the boundaries of the binary mask of the respective class (shown in red in the last row for better visibility).
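The binary predictions in Figure 5 are obtained by thresholding the predicted heatmap at 0.5. A minimal sketch of that step (the array names are illustrative):

```python
import numpy as np

# "heatmap" stands in for the per-pixel class probabilities predicted by the network.
heatmap = np.random.rand(128, 128)
binary_prediction = (heatmap >= 0.5).astype(np.uint8)  # threshold of 0.5, as in Figure 5
```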
Figure 6. Examples of patches of different sizes (from left to right: samples from patch sets S1, S2, S3, S4 and S5).
Figure 7. Impact of data augmentation and patch size on the F1 score. Model input: 128 × 128 px.
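Figure 7 compares training with and without augmented patches. The snippet below sketches one common way to augment image/mask pairs with Keras' ImageDataGenerator; the chosen transformations, their parameters and the array names are assumptions for illustration, not the exact augmentation settings of this study.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in arrays; in practice these hold the cropped training patches and their binary masks.
train_images = np.zeros((10, 128, 128, 3), dtype=np.float32)
train_masks = np.zeros((10, 128, 128, 1), dtype=np.float32)

# Example augmentation pipeline; the parameters are illustrative, not the paper's settings.
data_gen_args = dict(rotation_range=90, horizontal_flip=True,
                     vertical_flip=True, zoom_range=0.1)
image_datagen = ImageDataGenerator(**data_gen_args)
mask_datagen = ImageDataGenerator(**data_gen_args)

# The same seed ensures images and their masks receive identical random transforms.
seed = 42
image_iter = image_datagen.flow(train_images, batch_size=32, seed=seed)
mask_iter = mask_datagen.flow(train_masks, batch_size=32, seed=seed)
train_generator = zip(image_iter, mask_iter)
```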
Figure 8. Impact of downscaling on F1 score.
Figure 9. Qualitative comparison of the performance of the models with the longest training times and the best trade-off model in terms of time and performance, C-1664_144 × 144.
Figure 10. Results of tuning different hyperparameters: (a) the number of filters (16, 32 and 64); (b) the dropout rate (0.05, 0.2, 0.5 and 0.75) and (c) the batch size (15, 32 and 50).
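Figure 10 varies the number of filters, the dropout rate and the batch size. A compact sketch of such a search is shown below, reusing the hypothetical build_unet helper from the earlier U-Net sketch. Note that the figure varies one hyperparameter at a time, whereas this loop iterates over the full grid, and that the data arrays here are small stand-ins.

```python
import numpy as np
from itertools import product

# Stand-in arrays; in practice these are the augmented training and validation patches.
train_images, train_masks = np.zeros((8, 128, 128, 3)), np.zeros((8, 128, 128, 1))
val_images, val_masks = np.zeros((4, 128, 128, 3)), np.zeros((4, 128, 128, 1))

# Hyperparameter ranges taken from the caption of Figure 10.
filter_counts = [16, 32, 64]
dropout_rates = [0.05, 0.2, 0.5, 0.75]
batch_sizes = [15, 32, 50]

results = {}
for n_filters, dropout, batch_size in product(filter_counts, dropout_rates, batch_sizes):
    model = build_unet(base_filters=n_filters, dropout=dropout)  # hypothetical helper, see the U-Net sketch
    model.compile(optimizer="adam", loss="binary_crossentropy")
    history = model.fit(train_images, train_masks,
                        validation_data=(val_images, val_masks),
                        batch_size=batch_size, epochs=5, verbose=0)
    results[(n_filters, dropout, batch_size)] = min(history.history["val_loss"])

best = min(results, key=results.get)  # combination with the lowest validation loss
```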
Figure 11. Summer (a) and winter (b) images. Yellow squares depict the tiles selected for labeling for test set 3 and test set 4, respectively.
Table 1. Pixel share of classes in the training partition (13 tiles).

Class       | Shrubs    | Trees     | Shadows   | Rocks
Pixel count | 1,746,204 | 4,042,008 | 1,213,720 | 90,313
Pixel share | 20.99%    | 48.58%    | 14.59%    | 1.09%
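The percentages in Table 1 follow directly from the pixel counts; they are consistent with 13 tiles of 800 × 800 px (8,320,000 pixels in total), which the quick check below assumes.

```python
# Reproduce the class shares in Table 1 from the raw pixel counts.
pixel_counts = {"Shrubs": 1_746_204, "Trees": 4_042_008, "Shadows": 1_213_720, "Rocks": 90_313}
total_pixels = 13 * 800 * 800  # 13 tiles of 800 x 800 px (assumed tile size)

for cls, count in pixel_counts.items():
    print(f"{cls}: {100 * count / total_pixels:.2f}%")  # 20.99%, 48.58%, 14.59%, 1.09%
```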
Table 2. Class performance for the 832-sample dataset (without data augmentation).

Metric    | Shrubs | Trees | Shadows | Rocks
Accuracy  | 0.46   | 0.79  | 0.90    | 0.99
Precision | 0.23   | 0.83  | 0.77    | 0.72
Recall    | 0.96   | 0.83  | 0.63    | 0.18
F1 score  | 0.31   | 0.83  | 0.69    | 0.29
Table 3. Confusion matrix of the validation set of the initial experiment (rows: classified; columns: reference; values as a share of all validation pixels).

Classified \ Reference | Shrubs | Trees  | Shadows | Rocks
Shrubs                 | 28.44% | 1.87%  | 0.08%   | 0.00%
Trees                  | 6.83%  | 43.69% | 0.89%   | 0.00%
Shadows                | 5.17%  | 2.15%  | 9.52%   | 0.00%
Rocks                  | 0.74%  | 0.05%  | 0.08%   | 0.48%
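Per-class precision, recall and F1 can be derived from a confusion matrix such as Table 3. The sketch below shows that derivation using the values from the table; it illustrates the standard definitions rather than reproducing the exact evaluation code of the paper.

```python
import numpy as np

classes = ["Shrubs", "Trees", "Shadows", "Rocks"]
# Rows: classified (predicted) class; columns: reference (true) class, as in Table 3.
cm = np.array([[28.44, 1.87, 0.08, 0.00],
               [6.83, 43.69, 0.89, 0.00],
               [5.17, 2.15, 9.52, 0.00],
               [0.74, 0.05, 0.08, 0.48]]) / 100.0

for i, cls in enumerate(classes):
    precision = cm[i, i] / cm[i, :].sum()   # correct pixels / all pixels classified as this class
    recall = cm[i, i] / cm[:, i].sum()      # correct pixels / all reference pixels of this class
    f1 = 2 * precision * recall / (precision + recall)
    print(f"{cls}: precision={precision:.2f}, recall={recall:.2f}, F1={f1:.2f}")
```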
Table 4. A summary of conducted experiments (class: shrubs, augmented, for all experiments).

No. | Dataset Size | Patch Dimensions (px) | Model Input Dimensions (px)
1   | 808  | 200 × 200 | 128 × 128
2   | 1658 | 200 × 200 | 128 × 128
3   | 1658 | 200 × 200 | 192 × 192
4   | 3808 | 200 × 200 | 128 × 128
5   | 3808 | 200 × 200 | 192 × 192
6   | 808  | 300 × 300 | 128 × 128
7   | 1658 | 300 × 300 | 128 × 128
8   | 1658 | 300 × 300 | 144 × 144
9   | 3808 | 300 × 300 | 128 × 128
10  | 3808 | 300 × 300 | 144 × 144
11  | 3808 | 300 × 300 | 288 × 288
12  | 808  | 400 × 400 | 128 × 128
13  | 1658 | 400 × 400 | 128 × 128
14  | 1658 | 400 × 400 | 192 × 192
15  | 1658 | 400 × 400 | 400 × 400
16  | 3808 | 400 × 400 | 128 × 128
17  | 808  | 500 × 500 | 128 × 128
18  | 1658 | 500 × 500 | 128 × 128
19  | 1658 | 500 × 500 | 240 × 240
20  | 1658 | 500 × 500 | 496 × 496
21  | 3808 | 500 × 500 | 128 × 128
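The datasets in Table 4 combine a patch size (the crop taken from a tile) with a model input size (the resolution the patch is rescaled to before being fed to the network). A minimal sketch of that cropping-and-rescaling step is shown below; the function name, the use of OpenCV and the non-overlapping grid are illustrative assumptions, not the paper's exact pipeline.

```python
import cv2
import numpy as np

def crop_and_rescale(tile, patch_size, input_size):
    """Crop non-overlapping patches of patch_size px from a tile and resize them to input_size px."""
    patches = []
    h, w = tile.shape[:2]
    for y in range(0, h - patch_size + 1, patch_size):
        for x in range(0, w - patch_size + 1, patch_size):
            patch = tile[y:y + patch_size, x:x + patch_size]
            patches.append(cv2.resize(patch, (input_size, input_size)))
    return np.stack(patches)

# Example: 300 x 300 px patches rescaled to a 144 x 144 px network input (cf. row 10 of Table 4).
tile = np.zeros((800, 800, 3), dtype=np.uint8)  # stand-in for a real 800 x 800 px image tile
patches = crop_and_rescale(tile, patch_size=300, input_size=144)
```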
Table 5. Resulting scales.

Patch Size (px) | Patch Size (m) | Scale (cm/1 px) | Model Input Size (px) | Scale (cm/1 px) | Model Input Size (px) | Scale (cm/1 px) | Model Input Size (px) | Scale (cm/1 px)
800 | 50.00 | 6.25 | –   | –     | –   | –     | –   | –
500 | 31.25 | –    | 128 | 24.41 | 240 | 13.02 | 496 | 6.30
400 | 25.00 | –    | 128 | 19.53 | 192 | 13.02 | 400 | 6.25
300 | 18.75 | –    | 128 | 14.65 | 144 | 13.02 | 288 | 6.51
200 | 12.50 | –    | 128 | 9.77  | 192 | 6.51  | –   | –
100 | 6.25  | –    | 128 | 4.88  | –   | –     | –   | –
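The scales in Table 5 are simply the ground footprint of a patch divided by the number of pixels after rescaling to the model input size. A one-line check, using the 500 px patch (31.25 m) resampled to a 240 px input:

```python
def scale_cm_per_px(patch_size_m, model_input_px):
    """Ground sampling distance after rescaling, in cm per pixel."""
    return patch_size_m * 100 / model_input_px

print(round(scale_cm_per_px(31.25, 240), 2))  # 13.02, matching Table 5
```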
Table 6. Comparison of validation and test results of a chosen model for each patch size: S2, S3, S4 and S5.

Dataset Size | Patch Dimensions (px) | Model Input Dimensions (px) | F1 (Validation Data) | Test Set No. | F1 (Test Data)
3808 | 200 × 200 (S2) | 192 × 192 | 0.82 | 1 | 0.60
3808 | 200 × 200 (S2) | 192 × 192 | 0.82 | 2 | 0.64
3808 | 200 × 200 (S2) | 192 × 192 | 0.82 | 3 | 0.63
3808 | 300 × 300 (S3) | 288 × 288 | 0.90 | 1 | 0.77
3808 | 300 × 300 (S3) | 288 × 288 | 0.90 | 2 | 0.76
3808 | 300 × 300 (S3) | 288 × 288 | 0.90 | 3 | 0.62
1658 | 400 × 400 (S4) | 400 × 400 | 0.90 | 1 | 0.69
1658 | 400 × 400 (S4) | 400 × 400 | 0.90 | 2 | 0.68
1658 | 400 × 400 (S4) | 400 × 400 | 0.90 | 3 | 0.61
1658 | 500 × 500 (S5) | 496 × 496 | 0.90 | 1 | 0.67
1658 | 500 × 500 (S5) | 496 × 496 | 0.90 | 2 | 0.67
1658 | 500 × 500 (S5) | 496 × 496 | 0.90 | 3 | 0.58
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
