1. Introduction
Land cover (LC) maps are essential tools for measuring and monitoring the state of natural landscapes [1]. Their applicability to landscape ecology [2,3], climate change vulnerability [4,5], natural capital and ecosystem service assessments [6,7] or conservation work [8] is largely determined by their resolution and accuracy.
High-resolution LC maps were historically produced by detailed ground surveys or drawn by hand from photo-interpretation of aerial imagery [
9,
10]. Machine Learning (ML) image classification methods, such as Random Forests and Support Vector Machines, have since been used for automating these labour-intensive manual approaches at a range of scales and across different habitats [
11,
12,
13]. These relatively simple ML approaches typically require extensive image pre-processing and feature engineering to create input data with sufficiently low intra-class variability, incorporating spatial features through texture analysis or grouping pixels into objects using image segmentation. To overcome these limitations, deep learning models using Convolutional Neural Networks (CNNs) learn spatial patterns directly from raw imagery, which enables models to be applied efficiently at scale [
14].
The spatial resolution of LC maps is determined by the ground resolution of the image data, which is typically upward of 10 m for public satellite data [
15]. Although sufficient for some purposes, commonly available coarse-resolution LC maps preclude the accurate surveying and monitoring of small-scale, patchily distributed habitats and of fragmented or mosaic habitats at a landscape scale [
16]. To address this, unmanned aerial vehicles (UAVs) are increasingly used to collect very-high-resolution images for mapping LC at a high level of detail [
17]. Whilst advances in deep learning show great promise for combining UAVs and Machine Learning analysis [
18,
19], issues with weather conditions, operational factors and consumer-grade sensors or cameras remain, which makes operating UAVs at a landscape scale impractical [
20].
Alternatively, aerial surveys from piloted aircraft provide a compromise between high-resolution imagery and large spatial coverage, resulting in greater consistency of imagery at a landscape scale [
20]. In addition to spatial resolution, another key feature of LC maps is their LC class resolution—i.e., the specificity of the LC class schema. For example, European LC maps categorise habitats into relatively broad classes (forest, water, fields, etc.), which are useful for generalised LC monitoring [
21], but this reduced thematic detail results in landscape characteristics being defined less precisely [
22]. Within the United Kingdom (UK), current nationally available LC data sets such as the Centre for Ecology and Hydrology [
23] or the Living England [
24] LC maps provide context for LC classes and spatial resolution needed at a national level but do not provide regional information tailored to a specific area of interest [
25].
In this study, we consider the Peak District National Park (PDNP) in the UK, where the need for a high spatial and class resolution LC map is particularly relevant. The PDNP is an International Union for Conservation of Nature (IUCN) Category V Protected Area, characterised by the interaction of people (such as farming, housing, tourism and industry) and nature over time [
26]. This interaction has created a mixed landscape of farmland and other developed land along with sites designated primarily for nature conservation and other land uses [
27]. Landscape management planning therefore operates over large spatial scales, addressing a range of ecosystem processes, conservation objectives and land uses [
28]. Nevertheless, in the UK, there remains limited coverage of high-resolution LC data sets to support the delivery of landscape-scale conservation objectives, evidenced by the fact that the last time LC was extensively surveyed and classified in National Parks (NPs) was in 1991 by visual interpretation of aerial photography [
9].
Here, we have adapted (and slightly updated) the 1991 schema to create a new LC map of the PDNP using CNN-based semantic segmentation. However, two challenges must be overcome in order to use CNNs to predict the small-scale variations in LC typically found in UK NPs. First, both raw and annotated data must be available at very high resolution. Second, CNNs must be able to handle the strongly non-uniform distribution of LC classes as well as the inherent variability of large-scale aerial photography related to image capture, such as the time of day and seasonality [11].
We have addressed these two challenges by (1) creating a very-high-resolution data set and (2) developing a multi-stage CNN semantic segmentation approach. To overcome the first challenge, we created an extensive data set of over 1000 image patches of 64 m × 64 m at 12.5 cm ground resolution, which we manually annotated using an updated version of the LC schema from [9,29]. LC classes range from woodland subclasses to moorland mosaics, and patches are distributed across the entire PDNP (spanning 1439 km²). We have made this data set publicly available, including the raw RGB data [30], and we envision that it could become a new standard data set for benchmarking UK LC prediction models. To overcome the second challenge, we developed a multi-stage approach that handles the non-uniform distribution of LC classes. We trained CNNs to classify RGB aerial photography obtained at 12.5 cm ground resolution [30] and, leveraging the multi-level structure of the hierarchical LC schema, first predicted high-level classes, which we then used as a mask to predict their low-level subclasses. We then overlaid model predictions with a topography layer of urban classes [31] to generate highly detailed LC maps. Further, secondary soil data were used to aid classification between some subclasses.
Finally, we demonstrate the applicability of this model by quantifying metrics of habitat fragmentation of wet grassland and rush pasture across designated primary habitat areas. In summary, by developing a multi-stage approach to train CNNs and creating a detailed LC data set, we were able to detect small-scale LC features across landscapes at scales that are fine enough to inform local management decisions.
2. Methods
2.1. Study Area
This study concerns the Peak District National Park (PDNP), England, United Kingdom, which totals 1439 km² (see Figure 1). The PDNP is an upland area at the southern end of the Pennines, most of which is above 300 m in altitude and with the highest point at 636 m. The PDNP contains a variety of landscapes that range from largely uninhabited broad, open moorlands in the Dark Peak [32] to more-enclosed farmlands and wooded valleys in the White Peak and South West Peak [32]. The landscapes have been shaped by variations in geology and landform and the long settlement and use of these landscapes by people.
2.2. Image Data
Orthorectified aerial digital photography of the entire PDNP (1439 km²) was obtained at 12.5 cm ground resolution through the Aerial Photography Great Britain (APGB) agreement for UK public sector bodies [30]. Standard aerial photography images were used, containing wide-band red, green and blue (RGB) spectral channels, collected on seven different dates between April 2019 and June 2022 (Figure 1b).
2.3. Convolutional Neural Network
Convolutional Neural Networks (CNNs) are deep neural networks that use convolutional layers to process image data efficiently; they have a strong track record in remote sensing and ecology applications [
14,
33]. In this study, we consider the task of performing semantic segmentation, i.e., predicting the (LC) class for each pixel of the input RGB image. To this end, we used a CNN model specialised for this task: the U-Net [
14,
34,
35,
36]. U-Nets are characterised by their (U-shaped) layout of hidden layers (
Figure 2b). First, a stack of encoding convolutional layers abstracts task-relevant information from the input image, which is then used by decoding layers to predict the LC class of each pixel. We used U-Nets adapted from Iakubovskii [
37], which were pretrained on ImageNet before being optimised on this task, reducing training time and the number of data points required. All classifiers used the same model architecture and training parameters (see Section 2.7).
2.4. Land Cover Schema
The land cover classification system used here originated from the Monitoring Landscape Change in England and Wales survey undertaken by Hunting Technical Services Ltd. (1986) [29]. This was a national classification suitable for mapping UK habitats from a combination of ground surveys and aerial photographs. Taylor et al. [9,29] modified the classification categories to account for the more-specialised LC classes found in UK national parks and for the fact that features were mapped solely from aerial photography. This schema is representative of the landscape within the UK and is well-suited for monitoring LC classes in the PDNP and other UK national parks, and for surveying in detail single-species classes, mixed-species classes or intensively managed areas [38].
We further adapted the land cover schema from Taylor et al. [
9,
29], with the addition of a new wetland vegetation class (F3d, wet grassland and rush pasture) and an extra subclass of upland heath (D1b, peaty soil upland heath): see
Table 1. Wet grassland and rush pasture (F3d) occurs on poorly drained, usually acidic soils and contributes to the richness of invertebrate fauna supporting key species. The habitat consists of various species-rich types of fen meadow and rush pasture such as purple moor grass (
Molinia caerulea) and rushes, especially sharp-flowered rush (
Juncus acutiflorus). In the landscape, it can often be found fragmented as part of the mosaic of farmland habitats and also moorland areas. Because of this, the habitat does not represent a fixed phytosociological community [
39], but as a land cover class, it is of interest as it is a cosmopolitan, patchily distributed habitat across the whole study area that occurs both within areas of moorland (D) and grassland (E). Therefore, it was decided to include it in both the D and E classifiers (see
Section 2.8).
Table 1. Land cover schema adapted from [9,10,29]. LC80 is the original schema; LC20 is the updated schema that we have created. LC80-main denotes the main class. Only classes that are present in PDNP are included.
LC80-Main | LC80 | LC20 | Name | New? |
---|---|---|---|---|
C (Wood and forest land) | C1 | C1 | Broadleaved high forest | - |
C | C2 | C2 | Coniferous high forest | - |
C | C4 | C4 | Scrub | - |
C | C5 | C5 | Clear felled/newly planted trees | - |
D (Moor and heath land) | D1 | D1a | Upland heath | - |
D | D1 | D1b | Upland heath, peaty soil | Yes |
D | D2b | D2b | Upland grass moor | - |
D | D2d | D2d | Blanket peat grass moor | - |
D | D3 | D3 | Bracken | - |
D | D6a | D6a | Upland heath/grass mosaic | - |
D | D6c | D6c | Upland heath/blanket peat mosaic | - |
E (Agro-pastoral land) | E2a | E2a | Improved pasture | - |
E | E2b | E2b | Rough pasture | - |
F (Water and wetland) | F2 | F2 | Open water, inland | - |
F | F3a | F3a | Peat bog | - |
F | D2/E2 | F3d | Wet grassland and rush pasture | Yes |
G (Rock and coastal land) | G2 | G2 | Inland bare rock | - |
H (Developed land) | H1a | H1a | Urban area | - |
H | H1b | H1b | Major transport route | - |
H | H2a | H2a | Quarries and mineral working | - |
H | H2b | H2b | Derelict land | - |
H | H3a | H3a | Isolated farmsteads | - |
H | H3b | H3b | Other developed land | - |
I (Unclassified land) | I | I | Unclassified land | - |
Some LC classes from the schema by [9,29] were excluded in this study; the reasons for the exclusions are as follows:
C3 (mixed high forest)—the aim of the new modelling was to resolve C1 (broadleaved) and C2 (coniferous) at the resolution of single trees, so it was decided to exclude C3 (which would normally consider large parcels of woodlands to be a mix of broadleaved and coniferous trees).
D4 (unenclosed lowland areas)—the distinction between D4 and E classes is based on whether the land is “enclosed for stock control purposes” [
29]. This cannot be done based on 64 m × 64 m image patches as used for input data by the CNNs. D4 was therefore excluded.
D6b (upland mosaic heath/bracken)—although we have labelled these areas in the train/test data sets, we decided to merge these areas after classification into D3 (bracken). This was done because both D3 and D6b were relatively rare and therefore difficult to learn, while together they provided more data points (though combined, this was still one of the rarest classes).
D7 (eroded areas)—the large areas of eroded peat (D7a) that were present in the Peak District in the 1991 census [
9] have now been revegetated by the establishment of grasses and moorland plants in the past decades [
20,
40]. The few remaining patches of eroded peat are typically small patches or narrow strips in the bases of gullies.
D8 (coastal heath)—not present in the Peak District.
E1 (cultivated land)—this is barely present in the Peak District and was therefore excluded. (And where still present, it is relabelled to E2a).
F1 (coastal open water), F3b (freshwater marsh), F3c (saltmarsh), G2b (coastal rock) and G3 (other coastal features)—all not present in the Peak District.
2.5. Selecting Image Patches for Training and Testing
The image data were available as 1 km × 1 km tiles, which were split into 64 m × 64 m (i.e., 512 pixels × 512 pixels) patches for input to the CNN models. Our goal was to create a data set that sufficiently covered the spatial extent of the PDNP, the variety in LC classes, the variability within LC classes and the variability in image acquisition across different regions (caused by different flight dates). We therefore selected image patches across the PDNP with the following procedure:
First, we used the 1991 census data [
9] to select 50 tiles (of 1 km² each) that were representative of the overall LC distribution (of 1991). To do so, we generated 50,000 random samples of 50 tiles, computed the L1 loss of the LC area distribution of the sample compared to the LC area distribution of the entire PDNP, and selected the sample with the lowest L1 loss. This resulted in a sample of 50 tiles that was spatially distributed across the PDNP, illustrated in
Figure 3. From each of these tiles, we randomly selected nine image patches (of 64 m × 64 m)—one from each block of a 3 × 3 grid across the tile—resulting in 450 image patches. This approach was used to prevent bias in sample selection and to ensure the accuracy metrics used for validation were representative of the final mapped outputs [
41]. However, some classes are much more prevalent than others (
Figure 3). This meant that rare classes were very unlikely to be sampled sufficiently for model training. Therefore, an additional 577 patches were selected manually, both across the same 50 tiles and from 30 extra tiles chosen to boost rare classes (
Figure 1c).
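The tile selection step described above can be illustrated with a minimal sketch. It assumes that each tile is represented as a dictionary of LC class areas; the function names, the toy data and the random seed are hypothetical and are not taken from the published code.

```python
import random


def area_fractions(tiles):
    """Normalised LC area distribution summed over a collection of tiles."""
    totals = {}
    for tile in tiles:
        for lc_class, area in tile.items():
            totals[lc_class] = totals.get(lc_class, 0.0) + area
    grand_total = sum(totals.values())
    return {lc_class: area / grand_total for lc_class, area in totals.items()}


def l1_loss(dist_a, dist_b):
    """Sum of absolute differences between two class distributions."""
    classes = set(dist_a) | set(dist_b)
    return sum(abs(dist_a.get(c, 0.0) - dist_b.get(c, 0.0)) for c in classes)


def select_representative_tiles(tiles, n_select, n_draws, seed=0):
    """Draw n_draws random samples of n_select tiles; keep the lowest-loss one."""
    rng = random.Random(seed)
    target = area_fractions(tiles)  # distribution of the whole area
    best_idx, best_loss = None, float("inf")
    for _ in range(n_draws):
        idx = rng.sample(range(len(tiles)), n_select)
        loss = l1_loss(area_fractions([tiles[i] for i in idx]), target)
        if loss < best_loss:
            best_idx, best_loss = sorted(idx), loss
    return best_idx, best_loss


# Toy example: 20 tiles with three main classes (the study drew 50,000
# samples of 50 tiles from the 1991 census data).
toy_rng = random.Random(1)
toy_tiles = [
    {"C": toy_rng.random(), "D": toy_rng.random(), "E": toy_rng.random()}
    for _ in range(20)
]
chosen, loss = select_representative_tiles(toy_tiles, n_select=5, n_draws=200)
```

Because the search is purely random, more draws can only keep or lower the best loss found for a given seed.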
For the purpose of training a model that predicts land cover, we distinguish the following three types of data:
Training: (geospatial) data that are used to train the CNN classification models. These consist of both input data (aerial imagery) and land cover annotations.
Testing: data that are used to evaluate the performance of trained CNNs. Importantly, these data are not used to further improve CNN performance nor to assess convergence but only to quantify its final performance. These consist of both input imagery and land cover annotations.
Prediction: this is the entire area that is classified with the converged model (for further analysis). Only input imagery data are available a priori, from which the model predicts the land cover annotations.
2.6. Land Cover Annotation
The 1027 patches (64 m × 64 m) selected for training and testing the models were annotated manually by image-interpretation according to the LC schema of
Table 1. We labelled land cover using visual interpretation of aerial images because this allowed us to draw accurate LC class boundaries consistently at scale. Annotations were first done by one human expert interpreter, and afterwards, they were checked (and corrected where necessary) by a second expert interpreter. Uncertain annotations were verified in the field, leading to a highly detailed and accurate data set representative of upland UK landscapes. This data set was split randomly into 70% for training and 30% for testing the CNN models. The same split was used for all models and is maintained in the public data set [
42].
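A fixed random 70/30 split of this kind could be sketched as follows. The seed, the rounding rule and the function name are assumptions for illustration; the text does not state the exact number of patches in each subset, and the split in the public data set is the authoritative one.

```python
import random


def split_patches(patch_ids, train_fraction=0.7, seed=0):
    """Shuffle patch identifiers once and split them into train and test sets."""
    rng = random.Random(seed)  # fixed seed so every model sees the same split
    shuffled = list(patch_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return sorted(shuffled[:n_train]), sorted(shuffled[n_train:])


# 1027 annotated patches, identified here simply by an integer index.
train_ids, test_ids = split_patches(range(1027))
```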
We used publicly available woodlands data to aid and speed up the manual annotation of large woodlands in image patches [
43]. Further, Ordnance Survey National Geographic Database (OS NGD) data were used for mapping the F2, G2 and H classes (
Figure 2, [
31]). Lastly, primary habitat data for wet grassland and rush pasture were used for the habitat fragmentation analysis [
44]. These are all summarised in
Table 2.
2.7. Model Training
We explored two popular CNN backbones [14], ResNet50 and EfficientNet-B1 [37]. Visual inspection of large areas of predictions (continuous areas larger than single 64 m × 64 m patches) showed that EfficientNet tended to classify patches as a whole, despite the overlap (padding) added between image patches, leading to block-like predictions at a large scale (see Figure 4 for an example). It was therefore decided to use ResNet50 networks going forward. All CNNs were trained for 60 epochs using a batch size of 10, after which the best-performing iteration based on the training loss was selected. Next, we considered two loss functions (cross entropy and focal loss) and optimised five CNNs for each loss function using an Adam optimizer for all four classifiers (main, C, D and E).
Table 3 reports the mean, standard deviation and maximum of the accuracy across the five runs for each setting. The best (i.e., maximum accuracy) model was then selected for the final predictions.
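To make the difference between the two loss functions concrete, the sketch below evaluates both for a single pixel, given the probability that the model assigns to the correct class. It uses the standard focal loss form, -(1 - p)^γ log(p); the value of the focusing parameter `gamma` in the example is an assumption chosen for illustration, not the value used in this study.

```python
import math


def cross_entropy(p_true):
    """Cross-entropy loss for one pixel, given the probability of the true class."""
    return -math.log(p_true)


def focal_loss(p_true, gamma):
    """Focal loss for one pixel: cross entropy down-weighted by (1 - p)^gamma."""
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


# Compare a confidently correct pixel with a poorly predicted one.
# gamma = 2.0 is an illustrative value only.
easy_ce, hard_ce = cross_entropy(0.9), cross_entropy(0.2)
easy_fl, hard_fl = focal_loss(0.9, gamma=2.0), focal_loss(0.2, gamma=2.0)
```

With `gamma = 0` the focal loss reduces to cross entropy; larger values shrink the contribution of well-classified pixels, which is why focal loss is often considered for strongly imbalanced class distributions.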
CNNs received RGB image patches of 512 × 512 pixels as input. To avoid edge effects, we used a padding of 22 pixels (meaning neighbouring image patches slightly overlapped). Image patches were grouped per tile of 1000 m × 1000 m, so the edges of these tiles were predicted without overlap. Input images were z-scored, and during training, data were augmented by random horizontal and/or vertical flipping. For training the main classifiers, LC annotations were relabelled to their corresponding main class (e.g., C1 was relabelled to C). For training the detailed classifiers, LC annotations that were not relevant to the classifier (e.g., C1 is not relevant to the D classifier) were blanked out and did not contribute to the loss during training.
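The relabelling and blanking-out of annotations described above could look roughly like the sketch below, written for small nested lists rather than image arrays. The sentinel value `IGNORE` and all function names are hypothetical. The special handling of F3d (mapped to D for the main classifier, and kept in both the D and E detailed classifiers) follows Section 2.8.

```python
IGNORE = -1  # hypothetical sentinel for pixels excluded from the loss

# F3d is presented to the main classifier as D (see Section 2.8).
MAIN_OVERRIDES = {"F3d": "D"}
# F3d is a valid target for both the D and the E detailed classifiers.
SHARED_SUBCLASSES = {"F3d": {"D", "E"}}


def to_main_class(label):
    """Map a detailed label such as 'C1' to its main class 'C'."""
    return MAIN_OVERRIDES.get(label, label[0])


def relevant_to(label, main_class):
    """Whether a detailed label is a training target for a detailed classifier."""
    if label in SHARED_SUBCLASSES:
        return main_class in SHARED_SUBCLASSES[label]
    return label[0] == main_class


def labels_for_main_classifier(label_patch):
    """Relabel every pixel to its main class."""
    return [[to_main_class(label) for label in row] for row in label_patch]


def labels_for_detailed_classifier(label_patch, main_class):
    """Keep relevant labels; blank out the rest so they are ignored in the loss."""
    return [
        [label if relevant_to(label, main_class) else IGNORE for label in row]
        for row in label_patch
    ]


patch = [["C1", "D1a", "F3d"], ["E2a", "F2", "C2"]]
main_targets = labels_for_main_classifier(patch)
d_targets = labels_for_detailed_classifier(patch, "D")
e_targets = labels_for_detailed_classifier(patch, "E")
```

In a deep learning framework, the blanked-out pixels would typically be excluded through the loss function's ignore mechanism rather than by a Python-level sentinel.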
2.8. Multi-Stage Semantic Segmentation
We developed a multi-stage approach because of the hierarchical LC schema, the high number of classes (23), the strong non-uniformity of the class distribution (
Figure 3) and the intra-class variance caused by the large area of interest (PDNP, 1439 km²). The classification process was split into four stages (
Figure 2a). First, one CNN model was used to predict the main classes directly from the RGB data (
Figure 2a, first step). Second, OS NGD data were used to overwrite these predictions with any F2 (open water), G (rock) or H (developed land) LC class (
Figure 2a, second step). Third, three separate CNN classifiers were used for the prediction of the detailed sub-classes (
Figure 2a, third step). These detailed predictions were then masked using the combined classes from the previous step. For example, the output of the C-classifier would predict, directly from the RGB data, detailed C1, C2, C4 and C5 classes (see
Table 1) at locations classified as C by the main classifier. Fourth, soil data [
45] were used to disambiguate between subclasses of D (moorlands) with or without peaty soil: e.g., D1a or D1b (
Figure 2a, fourth step).
The new LC class wet grassland and rush pasture (F3d) posed a challenge for the classifiers, as it typically occurs in small patches both within moorlands (D) and grasslands (E). As CNNs rely on the context of the RGB image for classification, these different types of habitat surroundings were initially found to confuse the CNN models. Therefore, we decided to: (1) include F3d as a category in both the detailed D classifier and the detailed E classifier and (2) for the purpose of training the main classifier only, remap any F3d polygons to D class. In other words, to the CNN classifiers, F3d was presented as a subclass of D (moorlands) while allowing the possibility to classify E (grasslands) into F3d given its presence in grasslands too.
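The first three stages can be summarised per pixel in a short sketch. It assumes that all layers have already been rasterised onto the same grid and represents them as nested lists; the function and variable names are hypothetical, and the fourth stage (peat or non-peat labelling from soil data) is left out.

```python
def combine_stages(main_pred, os_layer, detailed_preds):
    """Combine main-class, OS and detailed predictions pixel by pixel.

    main_pred: 2D list of main classes predicted by the main classifier.
    os_layer: 2D list holding an OS-derived LC class, or None where absent.
    detailed_preds: dict mapping a main class to the 2D list of detailed
        classes predicted by that class's detailed classifier.
    """
    combined = []
    for r, row in enumerate(main_pred):
        out_row = []
        for c, main_class in enumerate(row):
            os_class = os_layer[r][c]
            if os_class is not None:
                # Stage 2: OS data overwrite the model prediction.
                out_row.append(os_class)
            elif main_class in detailed_preds:
                # Stage 3: use the detailed classifier of the predicted main class.
                out_row.append(detailed_preds[main_class][r][c])
            else:
                # No detailed classifier for this main class: keep it as is.
                out_row.append(main_class)
        combined.append(out_row)
    return combined


main = [["C", "D"], ["E", "D"]]
os_data = [[None, None], [None, "H1a"]]
detailed = {
    "C": [["C1", "C2"], ["C4", "C5"]],
    "D": [["D3", "D1"], ["D2", "D6"]],
    "E": [["E2a", "E2b"], ["F3d", "E2a"]],
}
lc_map = combine_stages(main, os_data, detailed)
```

Note how the E classifier is free to output F3d in the bottom-left pixel, reflecting the inclusion of wet grassland and rush pasture in both the D and E detailed classifiers.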
2.9. Single-Stage Semantic Segmentation
For comparison with the multi-stage models, detailed LC classes were also predicted directly using conventional single-stage semantic segmentation. U-Nets were trained using exactly the same protocols and parameters as previously described. Again, five networks were trained, and the best-performing network was selected for further analysis (
Table 3). Networks were trained to predict the detailed LC classes directly.
2.10. Merger with OS Layer for Developed Land
Ordnance Survey (OS) data were used to map the water (F2), rock (G2) and developed land (H) classes, as these had already been accurately and recently mapped by Ordnance Survey [
31]. After the main classifier predicted the main class of the land cover, these predictions were overwritten by the OS layer (i.e., areas that contained OS polygons replaced the model-predicted polygons,
Figure 2a). To do so, OS polygon classes were relabelled to our LC schema (the relabelling key is available online:
https://github.com/pdnpa/cnn-land-cover/blob/main/content/os_to_lc_mapping.json, accessed on 5 November 2023). Quarry (H2a) OS annotations were found to be inaccurate, and therefore, they were all manually verified and deleted if necessary.
2.11. Post-Processing of Model Predictions
Some detailed classes were distinguished based on secondary soil data (
Figure 2a). Specifically, some D classes had peat-soil and non-peat-soil variants (D1a and D1b, D2b and D2d, and D6a and D6c). To identify these, the model predicted D1, D2 and D6 generally, and predictions were subsequently labelled as peat/non-peat based on the ‘Peaty Soils Location’ data set from Natural England [
45]. For each predicted D1, D2 and D6 polygon, the intersection with the peaty soils layer polygons was computed, and it was then assigned the peat label if the intersection was greater than 50% of the area of the predicted polygon.
However, model predictions in the paper are shown without this post-processing step, as it is a deterministic separation of some classes that does not change the performance or accuracy of any class (but does create extra classes).
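The 50% intersection rule can be written as a small function. The sketch assumes that the polygon area and its intersection area with the peaty soils layer have already been computed in GIS software; the function name is hypothetical, and the mapping to peat variants follows Table 1.

```python
# (non-peat variant, peat variant) for each generally predicted class.
PEAT_VARIANTS = {
    "D1": ("D1a", "D1b"),
    "D2": ("D2b", "D2d"),
    "D6": ("D6a", "D6c"),
}


def assign_peat_label(predicted_class, polygon_area, peat_intersection_area):
    """Return the peat variant if more than 50% of the polygon lies on peaty soil."""
    if predicted_class not in PEAT_VARIANTS:
        return predicted_class  # classes without peat variants are unchanged
    non_peat, peat = PEAT_VARIANTS[predicted_class]
    if peat_intersection_area > 0.5 * polygon_area:
        return peat
    return non_peat


label_mostly_peat = assign_peat_label("D1", polygon_area=100.0, peat_intersection_area=80.0)
label_little_peat = assign_peat_label("D2", polygon_area=100.0, peat_intersection_area=10.0)
```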
2.12. Statistics
All CNN models were evaluated on withheld test data only (30% of the 1027 image patches in our data set—the same train/test split was used for all models and analyses throughout this study). Predictions were evaluated by pixel-wise comparison between the human-annotated LC labels and the model-predicted LC labels. The following evaluation metrics were used, where TP = true positive, FP = false positive and FN = false negative predicted pixels, and
c indexes the LC class (e.g., C1, C2, …):
Further, the overall accuracy of a CNN classifier was computed as:
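The formulas themselves are not reproduced in this text. As a hedged illustration, the sketch below assumes the conventional per-class definitions that match the TP, FP and FN notation, namely precision = TP / (TP + FP) and sensitivity = TP / (TP + FN), and takes overall accuracy to be the fraction of correctly predicted pixels. These definitions and all names are assumptions, not confirmed by the source.

```python
def per_class_metrics(true_labels, pred_labels):
    """Pixel-wise precision and sensitivity for every LC class c."""
    pairs = list(zip(true_labels, pred_labels))
    metrics = {}
    for c in sorted(set(true_labels) | set(pred_labels)):
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        metrics[c] = {
            "precision": tp / (tp + fp) if tp + fp else float("nan"),
            "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        }
    return metrics


def overall_accuracy(true_labels, pred_labels):
    """Fraction of pixels whose predicted class equals the annotated class."""
    correct = sum(1 for t, p in zip(true_labels, pred_labels) if t == p)
    return correct / len(true_labels)


# Flattened toy example of annotated and predicted pixel labels.
annotated = ["C1", "C1", "C2", "C2", "C2"]
predicted = ["C1", "C2", "C2", "C2", "C1"]
class_metrics = per_class_metrics(annotated, predicted)
accuracy = overall_accuracy(annotated, predicted)
```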
2.13. Habitat Fragmentation Indices
Wet grassland and rush pasture provides a relevant case study for the application of our model because, whilst fragmented and patchily distributed across the broader landscape, in total it covers a large area within the PDNP. Understanding the patch density, distribution and other structural properties of such fragmented habitats at fine scales could help to inform future management objectives and form the basis of monitoring projects. The most obvious components of this class are the rushes (
Juncus spp.)—all plant names follow [
46]—purple moor grass (
Molinia caerulea) and sedges (
Carex spp.), which are clearly visible in aerial imagery at the scale used here [
47].
To quantify the habitat fragmentation of wet grassland and rush pasture habitat (F3d), we focused on the cluster of F3d habitats in the South West Peak [
32]. We used the GIS layer of Habitat Networks by Natural England [
44] and selected the “Purple Moor Grass & Rush Pasture” habitats that occurred within the PDNP. We then evaluated our model predictions within these ‘Primary Habitat’ (PH) polygons from Natural England [
44] and, in particular, the model-predicted F3d polygons. (Model-predicted F3d polygons that extended across the boundary of PH polygons were cut off at the boundary except for the analysis of the buffer zone (see below)).
The following metrics were used for the analysis of habitat fragmentation:
Area of F3d in habitat polygon (fraction): total area of model-predicted F3d polygons inside one PH polygon, divided by the area of that PH polygon.
Total F3d edge length normalised by habitat area (1/km): sum of edge lengths of model-predicted F3d polygons inside one PH polygon divided by the area of that one PH polygon.
Average nearest neighbour distance (km): average nearest-neighbour distance between model-predicted F3d polygons inside one PH polygon. PH polygons with fewer than two model-predicted F3d polygons were ignored.
Number of predicted F3d polygons: number of model-predicted F3d polygons inside one PH polygon.
Area of habitat polygon (km²): total area of one PH polygon.
Average global isolation (km): average of global distance of all model-predicted F3d polygons inside one PH polygon, where global distance is the mean distance of the focal model-predicted F3d polygon to all other model-predicted F3d polygons (inside that PH polygon).
Habitat polygon edge length (km): total edge length of one PH polygon.
Total area F3d in 50 m buffer (km²): total area of model-predicted F3d within the 50 m buffer zone around one PH polygon.
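Two of the distance-based metrics listed above can be sketched as follows. As a simplifying assumption, each model-predicted F3d polygon is reduced to a single (x, y) point in km, such as its centroid; how distances between polygons were actually measured is not stated here, and the function names are hypothetical.

```python
import math


def _distance(a, b):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def mean_nearest_neighbour_distance(points):
    """Average distance from each patch to its closest other patch."""
    if len(points) < 2:
        return None  # PH polygons with fewer than two F3d patches are ignored
    nearest = [
        min(_distance(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ]
    return sum(nearest) / len(nearest)


def mean_global_isolation(points):
    """Average, over patches, of the mean distance to all other patches."""
    if len(points) < 2:
        return None
    per_patch = [
        sum(_distance(p, q) for j, q in enumerate(points) if j != i) / (len(points) - 1)
        for i, p in enumerate(points)
    ]
    return sum(per_patch) / len(per_patch)


# Three F3d patches inside one PH polygon (coordinates in km, toy values).
patches = [(0.0, 0.0), (3.0, 4.0), (0.0, 8.0)]
nn_distance = mean_nearest_neighbour_distance(patches)
isolation = mean_global_isolation(patches)
```

For real polygons, a geometry library would be needed to measure edge-to-edge distances, edge lengths and areas; the point-based version only conveys how the two averages differ.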
2.14. Data and Software Availability
The Convolutional Neural Networks (CNNs) were trained in Python 3.7 using PyTorch as the automatic differentiation package [
48]. All code is available at
https://github.com/pdnpa/cnn-land-cover, accessed on 5 November 2023.
QGIS 3.26 and ArcGIS 3.1.2 were used as the GIS software and for map compositions. Other figures were made using matplotlib [
49], plotly and Inkscape 0.92.
We have made the train and test data sets of RGB images and land cover annotations publicly available [
42]. We have also written an interpretation key with example images and descriptions of each habitat [
38]. The model-predicted land cover of the entire PDNP is available upon request.
5. Conclusions
We developed a multi-stage approach for classifying hierarchical LC schemes with large variations in the density of each class. Deconstructing the classification process into multiple steps achieved high accuracy on a large number of LC classes (95% accuracy on main classes, 92% on C, 72% on D and 87% on E), outperforming single-stage semantic segmentation for uneven class distributions. LC was predicted at high resolution (12.5 cm), enabling the identification of small habitat patches such as individual trees, heather patches and scrub. The multi-stage approach was also able to handle complex cosmopolitan habitats such as wet grassland and rush pasture, which occurs both within moorlands and grasslands, by including it in more than one detailed classifier.
Our approach can be used to detect a wide range of habitats from the same aerial image data: from those with a broad species mix and mosaics to single species. This has wide-ranging applications in landscape ecology and biodiversity monitoring, especially in regions where important habitats are small and mixed. This work helps to overcome the current limitations in spatial resolution and habitat detail for understanding species movement and distributions and measuring progress against nature recovery targets, such as those set out in the UK’s “25 Year Environment Plan” and the UN’s “Sustainable Development Goals”.