1. Introduction
Advances in remote sensing technology have significantly enhanced environmental monitoring by providing extensive spatial coverage and frequent temporal observations. Satellite data, in particular, offer a synoptic view that is crucial for studying large and remote areas, and they provide cost-effective solutions for studying various phenomena across scales [1]. Satellite datasets generated from multiple sensors are invaluable but massive and complex, and sophisticated analytical approaches are needed to extract meaningful information from them [2].
Machine learning (ML) has emerged as a powerful instrument for leveraging satellite data in environmental applications. It enables efficient processing and analysis of large volumes of data to identify patterns and make predictions that would be impractical with traditional analytical methods [3]. In previous studies, ML techniques applied to satellite data have been instrumental in various environmental monitoring applications, such as detecting volcanic impacts [4] and monitoring environmental changes, including vegetation cover change [5,6], land-use and land-cover classification [2], and even microalgae [7,8]. These applications demonstrate the versatility and wide-ranging impact of combining remote sensing with ML to address environmental challenges, e.g., monitoring specific environmental parameters such as vegetation water content [9], power-line vegetation [6], and coastal marine debris [10]. Improvements in the accuracy of ML algorithms have also contributed to more effective decision-making and resource management strategies [11,12], as well as to innovative solutions for environmental monitoring in aquatic ecosystems, such as smartphone-based microalgae monitoring platforms [8] and cloudiness assessment in marine environments [13]. Together, these examples show the diverse range of fields, from environmental to health-related studies, in which ML techniques can extract valuable insights from remote sensing data.
Wetlands are dynamic environments that experience seasonal and long-term changes influenced by many factors, such as hydrological fluctuations, climate variability, and anthropogenic activities. They undergo periodic changes due to seasonal flooding, vegetation growth cycles, and long-term shifts in hydrology. Monitoring wetlands over time has contributed to a better understanding of their expansion or loss due to factors such as climate change, land-use conversion, and conservation efforts [14]. Wetlands are characterized by a water table near or above the surface [15] and are highly productive, providing a wide array of critical ecosystem services that make them essential for supporting biodiversity [16], maintaining ecological balance, and regulating global cycles [17]. They support many native and rare species [18], contribute to water purification and to the regulation of the water cycle, nutrients, and climate by acting as carbon sinks [19], and act as a buffer against natural hazards and disasters such as floods. An estimated 4–6% of the world’s land surface is wetland, corresponding to approximately 7–9 million km² [20]. However, the definition of wetland varies among sources, which leads to variation in modeled wetland extents; global estimates as high as 27 million km² have been reported [21]. The vastness and diversity of wetland landscapes and the need for near-continuous assessment make traditional wetland assessment methods difficult to apply [22].
The wetlands of New Zealand, once-thriving ecosystems, are now in concerning decline due to urbanization, agriculture, and changing land use, posing a threat to biodiversity [23]. Preserving wetlands is critical, yet globally, 71% of wetlands have been converted to other land uses since 1900 [24]. In New Zealand, more than 90% of wetlands have been lost. The remaining wetlands are diverse: freshwater wetlands comprise primarily bogs (rainfall-fed), fens (rainfall- and groundwater-fed), swamps (groundwater- and surface water-fed), marshes (surface water-fed), and areas of shallow water [25], while coastal areas include saltmarshes and mangroves.
Enhanced monitoring systems are critical for wetland conservation. New Zealand’s National Policy Statement for Freshwater Management 2020 (NPS-FM) requires regional councils to identify and map natural inland wetlands larger than 0.05 ha [26], a process that, until now, has been carried out mostly manually. Current manual wetland mapping methods in New Zealand are resource-intensive and time-consuming, depend on local expertise, and vary in methodological standards across regions. Wetland monitoring could therefore benefit from automated and semi-automated approaches that combine machine learning with high-resolution satellite imagery and other data. Such methods can significantly enhance wetland mapping and classification while facilitating monitoring and decision-making. By leveraging these technologies, New Zealand could establish a nationally consistent wetland inventory that supports NPS-FM implementation.
Remote sensing techniques (including multispectral and hyperspectral imagery), together with geospatial analysis technologies, have proven useful for mapping and monitoring wetland dynamics and classifying wetland vegetation, allowing for a better understanding of these ecosystems and enabling more effective conservation and management strategies [14]. Traditional remote sensing methods rely on spectral indices such as the Normalized Difference Vegetation Index (NDVI) and Normalized Difference Water Index (NDWI) to distinguish wetlands from other land cover types [27]. Multispectral imagery from satellites such as Landsat, Sentinel-2, and MODIS, as well as hyperspectral imagery, is used to characterize spectral variations associated with wetland vegetation, water presence, and soil moisture conditions [28]. Moreover, LiDAR (Light Detection and Ranging) has contributed to further assessment and monitoring of different types of wetlands and their water storage capacity [29].
To refine wetland classification, predictive models can include ancillary GIS datasets that provide context, such as Digital Elevation Models (DEMs), soil maps, and hydrological layers, which account for topographic and hydrological constraints [30,31]. However, few studies have assessed ecologically relevant predictors for wetland detection [32,33,34]. Given the natural complexity of wetland ecosystems and the difficulty of spectrally separating wetlands from other land cover types, combining multiple remote sensing datasets with other data sources and features, such as spectral indices, radar backscatter, topographic variables, and soil properties, can help to improve classification accuracy. Nevertheless, not all features contribute equally to classification performance, and redundant or non-informative features may introduce noise and reduce model efficiency [35].
Wetland classification accuracy has benefited from machine learning algorithms such as Random Forest (RF), Support Vector Machines (SVMs), and deep learning models such as Convolutional Neural Networks (CNNs) [22,36]. RF has been shown to outperform traditional thresholding and unsupervised classification methods by leveraging a broader range of spectral, textural, and topographic inputs [37]. Rodriguez-Galiano et al. [35] found that while SVM performed well in wetland classification, it was more computationally expensive and more sensitive to parameter tuning than RF. In addition, Histogram-Based Gradient Boosting (HGB) is an efficient and scalable variant of gradient boosting that is particularly well suited to large, structured datasets. By discretizing continuous features into histogram bins, HGB reduces memory usage and accelerates training without compromising predictive accuracy [38]. This approach could be especially beneficial in data-intensive domains such as wetland monitoring, where vast remote sensing datasets must be processed rapidly. XGBoost (Extreme Gradient Boosting) is another widely used algorithm, known for its speed, accuracy, and scalability. It extends traditional gradient boosting with system optimizations such as parallelized tree construction, together with regularization that helps prevent overfitting [39]. These properties make XGBoost particularly effective for classification and regression on structured data. Its native handling of missing data, customizable objective functions, and tree pruning strategies make it a powerful tool for large-scale environmental applications and predictive modeling tasks where both performance and interpretability are critical.
Each ML algorithm has different strengths in handling remote sensing and geospatial datasets. RF is widely used due to its robustness to noise, ability to handle high-dimensional data, and interpretability [36]. SVMs offer strong generalization capabilities but can be computationally expensive and require careful hyperparameter tuning [35]. Gradient boosting methods such as XGBoost and LightGBM excel at handling imbalanced datasets and feature interactions, making them effective for complex classification tasks [40]. A less commonly explored but promising approach is the Multilayer Perceptron Classifier (MLPC), a type of artificial neural network (ANN) that can capture non-linear relationships in data. Unlike the tree-based and kernel-based models above, MLPC learns hierarchical feature representations through multiple layers, making it particularly useful for high-dimensional satellite imagery and multi-source geospatial datasets [22]. Its advantage lies in its ability to model intricate patterns in spectral, textural, and topographic features, although it requires more computational resources and careful tuning of hyperparameters such as the number of hidden layers and the activation function.
With the increasing availability of high-resolution satellite imagery and the growing complexity of environmental datasets, machine learning has become central to modern ecological monitoring. Classifiers such as RF and MLPC have been widely used in remote sensing applications, including wetland classification. However, when applied pixel by pixel, they can be limited in environments such as wetlands, where spatial context and structural coherence are critical for distinguishing diverse and fragmented ecosystem types.
Wetland characteristics vary across spatial scales, from local, site-specific hydrological features to broad regional wetland distributions. Comparing classification performance at multiple spatial scales, from high-resolution UAV and aerial imagery to large-scale satellite-derived national and global wetland datasets, could be essential for understanding scale-dependent classification performance. Some features, such as vegetation indices, texture, and radar backscatter, may be more or less informative depending on the resolution [22]. In addition, comparing wetland detection across temporal and spatial scales, over multiple years and from fine to large spatial extents, can provide critical insights into water dynamics, intra-annual variation in water availability, and vegetation phenology. Because water dynamics and phenology are both difficult to capture in single-date imagery, such comparisons help to reduce classification uncertainty and improve accuracy [22]. Temporal comparisons also improve model generalization: training classification models on multi-year datasets increases their robustness to seasonal and inter-annual variations in wetland conditions [34].
Our motivation for the research presented in this paper was to test the suitability of four machine learning methods for detecting and mapping wetlands at a national scale from fine-spatial-resolution optical remote sensing imagery. If successful, this would enable routine, automated monitoring of wetland systems. This would be especially useful for smaller wetlands (<~0.5 ha), which are currently excluded from available data [41] but which government policy requires to be monitored [26]. Our aim was to develop a machine learning model that can rapidly detect wetlands from widely available eight-band Planet SuperDove imagery, alongside ancillary geospatial data such as topographic data, and to use it to map wetland extent and likelihood across New Zealand’s diverse landscapes. Through feature importance analysis, this study sought to refine wetland classification models, leading to more precise mapping and monitoring and ultimately supporting effective decision-making for wetland conservation. We used only data that are available internationally (both imagery and ancillary data); therefore, the methods developed here can be transferred globally.
To achieve this aim, the research was guided by two objectives. The first was to develop a pixel-based machine learning classification workflow that integrates satellite imagery and geospatial datasets to detect and delineate wetlands across varying landscapes; this workflow evaluates, through an ecological lens, how effectively different remote sensing-derived features, such as spectral indices and topographic variables, improve classification accuracy. The second was to compare the performance of machine learning algorithms to determine their suitability for wetland detection, assessing the trade-offs between model accuracy and computational efficiency, as well as potential future improvements to the models.
2. Materials and Methods
The wetland detection models developed in this research were applied across New Zealand. Wetlands are characterized by the presence of water, hydric soil, and specialized vegetation adapted to a wet environment [42]. Alongside satellite imagery, important biophysical wetland characterization features were identified [43] and represented by geospatial layers used as supporting layers in the machine learning models. Importantly, all the data used are available globally, ensuring transferability of the methods. The data types, sources, and processing are presented below and in Figure 1. A summary of the data used is provided in Table 1.
Satellite imagery: Eight-band SuperDove imagery from PlanetScope [44] was obtained using the Planet API (PSB.SD), including all bands (coastal blue, blue, green I, green, yellow, red, red edge, and near-infrared) corrected to surface reflectance (analytic_8b_sr_udm2 bundle) [45]. A mosaic across New Zealand was produced, primarily from images acquired in spring, between mid-September and October 2024. Spring was selected because snow is less likely in mountain areas and because it is early in the growing season, which may reduce the influence of vegetation growth on classification. For locations with no usable data (for example, due to cloud), later images from November and December 2024 were included. Images were prioritized for the mosaics based on the image clarity recorded in the image metadata; each mosaic pixel received data from only one source image, with no averaging between images. Importantly, this allowed the wetland prediction for each pixel to be based on a single image, with the image identifier retained in the metadata. Figure 2 shows the mosaic grid across New Zealand and two example grid boxes, indicating the locations and dates of the images used in the wetland classification.
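The single-source mosaicking rule can be illustrated with a short NumPy sketch. It assumes the candidate images are already co-registered on one grid and pre-sorted by a clarity score, and that a pixel with every band equal to a `nodata` value is missing; the function name and the `nodata` convention are hypothetical and do not describe the Planet API or the exact tooling used. The returned index array is what lets each predicted pixel be traced back to one image identifier.

```python
import numpy as np

def priority_mosaic(images, nodata=0):
    """Combine co-registered images so each pixel comes from exactly one image.

    images: list of (image_id, array) pairs ordered from highest to lowest
        priority (e.g. by a clarity score from image metadata). Each array
        has shape (bands, rows, cols) on the same grid.
    A pixel counts as missing when all of its bands equal `nodata`.
    Returns the mosaic and a (rows, cols) array of source-image indices
    (-1 where no image had data).
    """
    bands, rows, cols = images[0][1].shape
    mosaic = np.full((bands, rows, cols), nodata, dtype=images[0][1].dtype)
    source = np.full((rows, cols), -1, dtype=np.int32)
    for idx, (_, arr) in enumerate(images):
        has_data = np.any(arr != nodata, axis=0)   # pixel observed in this image
        fill = has_data & (source == -1)           # only pixels not yet filled
        mosaic[:, fill] = arr[:, fill]             # copy values; no averaging
        source[fill] = idx
    return mosaic, source

# Toy example: image "a" (higher priority) only covers the top row.
a = np.zeros((8, 2, 2), dtype=np.uint16)
a[:, 0, :] = 100
b = np.full((8, 2, 2), 200, dtype=np.uint16)
mosaic, source = priority_mosaic([("img_a", a), ("img_b", b)])
# Map indices to identifiers (only meaningful where source >= 0)
source_ids = np.array(["img_a", "img_b"])[source]
```

Because a filled pixel is never revisited, lower-priority images only contribute where every higher-priority image was empty.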
Geospatial data derived from satellite images: The Normalized Difference Vegetation Index (NDVI) and Normalized Difference Water Index (NDWI) were derived from SuperDove images as supporting layers. The NDVI was included to help distinguish wetlands from agricultural areas, for which higher values were expected; the NDWI was included to help distinguish wetlands from areas of open water. To calculate the NDVI and NDWI from PlanetScope imagery, we normalized the spectral bands and computed the indices using the standard formulas:

$$\mathrm{NDVI} = \frac{\mathrm{NIR} - \mathrm{Red}}{\mathrm{NIR} + \mathrm{Red}}, \qquad \mathrm{NDWI} = \frac{\mathrm{Green} - \mathrm{NIR}}{\mathrm{Green} + \mathrm{NIR}}$$

where NIR (near-infrared) is band 8, red is band 6, and green is band 4 in the underlying SuperDove imagery.
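The two indices can be sketched directly on an eight-band array using the band numbers given above. This assumes the green/NIR (McFeeters-style) form of the NDWI, which is consistent with the bands named in the text, and a stack ordered as SuperDove bands 1 to 8. Pixels where the denominator is zero are returned as NaN. Because both indices are ratios, a common scaling of the reflectance values (e.g. integer-scaled surface reflectance) cancels out.

```python
import numpy as np

# 1-based SuperDove band numbers, as given in the text
GREEN, RED, NIR = 4, 6, 8

def normalized_difference(a, b):
    """Return (a - b) / (a + b) as float64, with NaN where a + b == 0."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    denom = a + b
    out = np.full(denom.shape, np.nan)
    np.divide(a - b, denom, out=out, where=denom != 0)
    return out

def ndvi_ndwi(stack):
    """stack: (8, rows, cols) surface-reflectance array in SuperDove band order."""
    nir, red, green = stack[NIR - 1], stack[RED - 1], stack[GREEN - 1]
    ndvi = normalized_difference(nir, red)
    ndwi = normalized_difference(green, nir)   # green/NIR form of the NDWI
    return ndvi, ndwi

# Two pixels: a vegetated pixel and an all-zero (no-data) pixel
stack = np.zeros((8, 1, 2))
stack[NIR - 1, 0, 0] = 0.5
stack[RED - 1, 0, 0] = 0.1
stack[GREEN - 1, 0, 0] = 0.1
ndvi, ndwi = ndvi_ndwi(stack)
```

For the vegetated pixel, NDVI = (0.5 − 0.1)/(0.5 + 0.1) ≈ 0.67 and NDWI ≈ −0.67, the expected signs for vegetation rather than open water.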
Geospatial data from other sources: We included elevation (Forest And Buildings removed Copernicus Digital Elevation Model, FABDEM [46]), with lower values expected to indicate a higher likelihood of wetlands; the Hydrological Soil Group (HySOG) [47], which indicates areas with a high propensity for soil saturation; saturated soil hydraulic conductivity (Ksat) [48], which similarly identifies areas with poor drainage; and the Topographic Wetness Index (TWI), derived from the MERIT and MERIT Hydro datasets [49,50]. The TWI is a widely used hydrological index based on terrain characteristics (slope and upstream contributing area) that indicates areas more likely to be hydrologically saturated, i.e., areas with both a low slope and a large upslope contributing area. We used the MERIT DEM elevation tiles to calculate slope in radians and combined this with the upstream area (UPA) extracted from the MERIT Hydro dataset, using the following formula:

$$\mathrm{TWI} = \ln\left(\frac{A}{\tan\beta}\right)$$

where A is the UPA in square meters and β is the local slope angle, so that tan β is the local slope gradient [51]. The TWI was calculated on a per-pixel basis. During processing, no-data values were masked, and invalid TWI values resulting from division by zero or negative slopes were assigned a defined no-data value. The resulting TWI layer is an important hydrological predictor that reflects water accumulation potential across the landscape.
Available wetland data: To train the detection model, GIS data layers of wetlands and wetland types were obtained. The Land Cover Database (LCDBv5) by Manaaki Whenua [52] provides national-scale land cover information at a relatively coarse spatial scale, as it was derived from ~20 m Sentinel-2 imagery. In this research, LCDBv5 was used to label locations as wetland or non-wetland while ensuring that sampling covered a range of land cover types. The sampling scheme is outlined in Section 2.2.
Table 1.
Data sources and preprocessing: input datasets used for wetland classification model development. The table details the data type, source, and preprocessing steps applied to each layer. Satellite imagery from PlanetScope SuperDove (8-band) formed the base stack, with additional spectral indices (NDVI and NDWI), topographic layers (FABDEM and TWI), and soil properties (HySOG and Ksat) incorporated as ancillary predictors. All datasets were resampled and aligned to a common 3 m spatial resolution. Reference land cover and wetland labels were derived from LCDB v5.0 with wetland updates, and stratified random sampling was used to generate training points for model input.
| Data Type | Layer/Index | Source | Preprocessing Notes |
|---|---|---|---|
| Satellite Imagery | PlanetScope SuperDove 8 bands (Coastal Blue to NIR) | Planet (https://www.planet.com, accessed on 28 July 2025) | Mosaic created from same-day swath; resampled to ~3 m; used as base stack |
| Spectral Indices | NDVI and NDWI | Derived from PlanetScope | Normalized and aligned to Planet imagery; single-band COGs |
| Topography | FABDEM (DEM) | Copernicus FABDEM (Forest and Buildings removed DEM) [46] | Resampled to 3 m; aligned and stored as float32 COGs |
| Topography | TWI (Topographic Wetness Index) | Derived from MERIT DEM [49] and MERIT Hydro [50] | UPA and slope harmonized; invalid values masked; stored as COG |
| Soil Properties | HySOG (Hydrologic Soil Group) | HYSOGs250m, Ross et al., 2018 [47] | Categorical input; encoded using OneHotEncoder; resampled and tiled |
| Soil Properties | Ksat (Saturated Hydraulic Conductivity) | Gupta et al., 2021 [48] | Continuous layer; resampled to 3 m; normalized |
| Reference Land Cover | LCDB v5.0 + Wetland Type Update | Manaaki Whenua – Landcare Research [52] | Used for training sample generation and label reference |
| Training Samples | Stratified random points within LCDB polygons | Derived using ArcGIS Pro 3.1 (Section 2.2) | Buffered internally (20 m); used for feature extraction |
2.1. Data Preparation
To ensure spatial consistency, all input raster datasets were aligned to a common spatial grid based on the Planet ~3 m mosaics. Categorical layers were resampled using the nearest-neighbor method, while continuous layers used cubic interpolation to preserve gradients. Rasters from multiple sources, including spectral bands, topographic models, soil characteristics, and derived indices, were stacked into unified multiband datasets and saved in compressed, tiled formats to optimize read/write efficiency during training and prediction. We used a combination of Python 3.13 libraries to create the data stack. All data stacks were projected to the same coordinate reference system (EPSG:2193, NZGD2000 / New Zealand Transverse Mercator 2000), and the native ~3 m ground resolution of the Planet images was maintained, with other data resampled as required to match it. To facilitate data handling, we tiled the data stack into 768 m × 768 m data cubes (256 × 256 pixels at 3 m) across New Zealand, resulting in 582,900 files totaling 1.7 TB. Each file had 14 bands: the eight SuperDove bands, two spectral indices (NDVI and NDWI), two topographic variables (elevation and TWI), and two soil variables (HySOG and Ksat). Data for example areas are shown in Figure 3.
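The stacking and tiling steps can be sketched in memory with NumPy. At 3 m resolution, a 768 m tile side corresponds to 768 / 3 = 256 pixels. The band names and their order below are hypothetical labels for the 14 layers listed above, and the sketch stores HySOG codes as float32 for simplicity, deferring one-hot encoding to the modeling stage. A production pipeline would instead write compressed, tiled GeoTIFFs with a raster I/O library.

```python
import numpy as np

PIXEL_SIZE_M = 3.0
TILE_SIZE_M = 768.0
TILE_PX = int(TILE_SIZE_M / PIXEL_SIZE_M)   # 256 pixels per tile side

# Hypothetical names/order for the 14-band stack described in the text
BAND_NAMES = ["coastal_blue", "blue", "green_i", "green", "yellow", "red",
              "red_edge", "nir", "ndvi", "ndwi", "elevation", "twi",
              "hysog", "ksat"]

def stack_layers(layers):
    """Stack aligned 2-D arrays (dict: band name -> array) into (14, rows, cols)."""
    return np.stack([np.asarray(layers[name], dtype=np.float32)
                     for name in BAND_NAMES])

def iter_tiles(stack, tile_px=TILE_PX):
    """Yield (row_offset, col_offset, tile) windows; edge tiles may be smaller."""
    _, rows, cols = stack.shape
    for r in range(0, rows, tile_px):
        for c in range(0, cols, tile_px):
            yield r, c, stack[:, r:r + tile_px, c:c + tile_px]

# Example: a 300 x 520 pixel area (900 m x 1560 m) split into tiles
layers = {name: np.zeros((300, 520)) for name in BAND_NAMES}
tiles = list(iter_tiles(stack_layers(layers)))
```

Keeping the row and column offsets with each tile makes it straightforward to reassemble predictions later.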
2.2. Sample Point Generation and Feature Extraction
Stratified random sampling within LCDBv5 polygon features, including those defining known wetland classes, was used to build a representative training dataset. A stratified random sample of 1000 points for each LCDBv5 land cover class was generated, with a minimum distance of 50 m between points to reduce spatial autocorrelation. This ensured both class balance and geographic diversity across the sample set. We applied an internal buffer of 20 m to the LCDBv5 polygons to avoid sampling close to class boundaries, which are uncertain because they were derived from 20 m satellite imagery.
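The sampling rule (a fixed number of points per class, at least 50 m apart) can be illustrated with a simple greedy procedure. This is a NumPy sketch rather than the ArcGIS Pro tool used in the study. It assumes candidate locations in a projected CRS (meters) that already lie inside the 20 m internally buffered polygons, and it applies the distance constraint across all classes. The function name and the candidate grid are hypothetical.

```python
import numpy as np

def stratified_min_distance_sample(xy, labels, n_per_class=1000,
                                   min_dist=50.0, seed=0):
    """Draw up to n_per_class points per class, keeping all points >= min_dist apart.

    xy: (n, 2) candidate coordinates in meters, assumed to lie inside the
        internally buffered reference polygons already.
    labels: (n,) class label of each candidate.
    Returns indices into xy of the accepted points.
    """
    rng = np.random.default_rng(seed)
    accepted = []
    for cls in np.unique(labels):
        candidates = rng.permutation(np.flatnonzero(labels == cls))
        taken = 0
        for i in candidates:
            if taken == n_per_class:
                break
            if accepted:
                dist = np.hypot(*(xy[accepted] - xy[i]).T)
                if dist.min() < min_dist:
                    continue            # too close to an already accepted point
            accepted.append(i)
            taken += 1
    return np.array(accepted, dtype=int)

# Candidate points on a 10 m grid covering two classes (x >= 100 m -> class 1)
gx, gy = np.meshgrid(np.arange(0.0, 200.0, 10.0), np.arange(0.0, 200.0, 10.0))
xy = np.column_stack([gx.ravel(), gy.ravel()])
labels = (xy[:, 0] >= 100).astype(int)
picked = stratified_min_distance_sample(xy, labels, n_per_class=5)
```

A greedy pass like this may return fewer than the requested number of points when a class's area is too small to hold them at the required spacing.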
For each sample point, values from all feature bands in the stacked raster tiles were extracted, resulting in a database of 68,406 points. Only points located within the central “core” region of each tile were considered, avoiding edge effects that may arise from clipping or resampling. Extracted data were stored along with spatial coordinates and class labels for use in model training.
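Feature extraction at the sample points reduces to array indexing once each point's pixel position within its tile is known. In the sketch below, the core region is defined by a fixed pixel margin. The text does not state how the core was defined, so `margin_px` and the function name are hypothetical.

```python
import numpy as np

def extract_core_features(tile, rows, cols, margin_px=16):
    """Extract per-point feature vectors, keeping only points in the tile core.

    tile: (bands, tile_rows, tile_cols) array; rows, cols: pixel indices of
    sample points inside this tile. margin_px is a hypothetical edge width.
    Returns an (n_core_points, bands) array and the boolean core mask.
    """
    rows = np.asarray(rows)
    cols = np.asarray(cols)
    _, n_rows, n_cols = tile.shape
    in_core = ((rows >= margin_px) & (rows < n_rows - margin_px) &
               (cols >= margin_px) & (cols < n_cols - margin_px))
    features = tile[:, rows[in_core], cols[in_core]].T   # one row per point
    return features, in_core

tile = np.arange(2 * 64 * 64).reshape(2, 64, 64)
features, in_core = extract_core_features(tile, rows=[0, 32], cols=[0, 40])
```

The point at the tile corner is discarded, and the remaining point yields one row containing its value in each band.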
2.3. Machine Learning Model Development
To identify the most effective classification strategy for wetland detection in New Zealand, we implemented and compared four supervised machine learning models: Random Forest (RF), eXtreme Gradient Boosting (XGB), Histogram-Based Gradient Boosting (HGB), and Multi-Layer Perceptron Classifier (MLPC). These models were selected for their demonstrated performance in remote sensing applications and their complementary strengths. RF has been widely used in ecological modeling due to its robustness, interpretability, and tolerance of overfitting and noisy labels [36]. XGB is a gradient boosting framework known for its scalability, accuracy, and ability to natively handle missing values and class imbalance through regularization and optimized tree construction [39]. HGB, a more recent variant of boosting, reduces training time and memory usage by discretizing continuous variables into histograms, making it highly efficient for large geospatial datasets [38]. MLPC, a neural network model, excels at capturing complex non-linear interactions between features, offering advantages when spectral variability is high and the data are high-dimensional [22].
Each model was trained on a harmonized feature stack comprising eight-band PlanetScope imagery, vegetation indices (NDVI and NDWI), topographic predictors (FABDEM and TWI), and soil hydrological variables (HySOG and Ksat), ensuring a comprehensive representation of wetland characteristics. Categorical variables (e.g., HySOG) were one-hot encoded, while continuous variables were standardized.
Modeling used the scikit-learn Python library (version 1.5), with OneHotEncoder for the categorical features (e.g., HySOG) and linear scaling (standardization) for all other continuous variables. Each stage was optimized for accuracy, spatial consistency, and computational efficiency. All candidate classifiers were trained on the extracted point-level dataset, with data types harmonized before training. Feature importance analysis was conducted to identify the most informative predictors. The models were validated using stratified hold-out samples, and performance was evaluated based on overall accuracy and class-specific metrics.
The trained classifier was applied across the full set of stacked raster tiles. Each tile was read, features were preprocessed as required, and class predictions were generated on a per-pixel basis, including both wetland/non-wetland classification and probability maps that reflected classification confidence. The outputs were written as single-band classification rasters with consistent metadata, resolution, and spatial extent.
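Per-pixel prediction on a tile amounts to reshaping the band stack into a table with one row per pixel, calling the classifier, and reshaping the results back into rasters. The sketch assumes any fitted estimator exposing `predict_proba` and `classes_` (for example, a scikit-learn `Pipeline`) whose feature order matches the tile's band order. The wetland label value and the toy threshold model are hypothetical stand-ins.

```python
import numpy as np

def predict_tile(model, tile, wetland_label=1):
    """Classify every pixel of a (bands, rows, cols) tile.

    Returns a class raster and a wetland-probability raster (float32).
    """
    bands, rows, cols = tile.shape
    X = tile.reshape(bands, -1).T                     # (pixels, bands)
    proba = model.predict_proba(X)                    # (pixels, n_classes)
    classes = np.asarray(model.classes_)
    class_map = classes[proba.argmax(axis=1)].reshape(rows, cols)
    wet_col = int(np.flatnonzero(classes == wetland_label)[0])
    prob_map = proba[:, wet_col].reshape(rows, cols).astype(np.float32)
    return class_map, prob_map

class ThresholdModel:
    """Toy stand-in for a fitted classifier: P(wetland) = 0.9 if band 0 > 0.5."""
    classes_ = np.array([0, 1])

    def predict_proba(self, X):
        p = np.where(X[:, 0] > 0.5, 0.9, 0.2)
        return np.column_stack([1.0 - p, p])

tile = np.zeros((14, 2, 3))
tile[0, 1, :] = 1.0                                   # bottom row looks "wet"
class_map, prob_map = predict_tile(ThresholdModel(), tile)
```

Deriving the class map from the same probabilities as the confidence map keeps the two outputs consistent pixel by pixel.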
Model performance was assessed using stratified hold-out validation, with evaluation metrics including overall accuracy, precision, recall, and F1 score at both the class and macro levels. This comparative framework allowed us to systematically evaluate trade-offs in classification accuracy, computational efficiency, spatial consistency, and generalizability across models, guiding the selection of scalable methods for national wetland monitoring.
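The evaluation metrics follow standard definitions, spelled out in the sketch below. Per-class precision is TP/(TP + FP), recall is TP/(TP + FN), and F1 is their harmonic mean. The macro F1 is the unweighted mean over classes, while the weighted F1 weights each class by its support. scikit-learn's `classification_report` and `precision_recall_fscore_support` report the same quantities; this version only makes the definitions explicit on a small hypothetical example.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Per-class precision/recall/F1 plus accuracy, macro F1, and weighted F1."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    per_class = {}
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp > 0 else 0.0
        recall = tp / (tp + fn) if tp + fn > 0 else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall > 0 else 0.0)
        per_class[c.item()] = {"precision": precision, "recall": recall,
                               "f1": f1, "support": int(np.sum(y_true == c))}
    f1s = np.array([m["f1"] for m in per_class.values()])
    support = np.array([m["support"] for m in per_class.values()])
    return {
        "per_class": per_class,
        "accuracy": float(np.mean(y_true == y_pred)),
        "macro_f1": float(f1s.mean()),                           # unweighted mean
        "weighted_f1": float(np.average(f1s, weights=support)),  # by class support
    }

# 1 = wetland, 0 = non-wetland
report = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                                [1, 1, 0, 0, 0, 0, 1, 0])
```

Here accuracy is 0.75 and the wetland F1 is 2/3, while the majority class scores 0.8. This illustrates how overall accuracy can hide weaker performance on the minority wetland class.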
2.4. National-Scale Deployment of the Model Outputs
To support national-scale wetland classification, seamless classification maps spanning all of New Zealand (area: ~268,000 km²) were generated. Predicted tiles were grouped by spatial proximity and mosaicked into larger continuous rasters. Overlapping tiles were merged using the first-valid-pixel rule (i.e., taking the value of the first non-NA pixel in a stack of overlapping tiles), so that tile boundaries were not visually prominent in the final output. Final mosaics were compressed and saved as cloud-optimized GeoTIFFs. This approach enabled consistent and scalable production of high-resolution wetland maps aligned with New Zealand’s national monitoring and policy needs.
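The first-valid-pixel merge can be sketched in NumPy as follows. It assumes single-band class tiles given with pixel offsets that lie entirely within the output extent, and a no-data value of 255; both are illustrative choices. Raster libraries offer the same rule for georeferenced files, for example the `method="first"` option of `rasterio.merge.merge`.

```python
import numpy as np

def merge_first_valid(tiles, out_shape, nodata=255, dtype=np.uint8):
    """Mosaic (row_offset, col_offset, array) tiles with a first-valid-pixel rule.

    Tiles are visited in list order; an output pixel that already holds a
    valid value is never overwritten by later overlapping tiles.
    """
    out = np.full(out_shape, nodata, dtype=dtype)
    for r, c, tile in tiles:
        h, w = tile.shape
        window = out[r:r + h, c:c + w]                  # view into `out`
        fill = (window == nodata) & (tile != nodata)
        window[fill] = tile[fill]
    return out

# Two 2 x 2 class tiles overlapping by one column
t1 = np.full((2, 2), 1, dtype=np.uint8)
t2 = np.full((2, 2), 2, dtype=np.uint8)
merged = merge_first_valid([(0, 0, t1), (0, 1, t2)], out_shape=(2, 3))
```

In the overlapping column, the value from the first tile is kept, and the second tile only fills pixels that are still empty.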
3. Results
3.1. Model Evaluation and Comparison
To evaluate the performance of different machine learning models in classifying wetlands from high-resolution PlanetScope imagery and environmental covariates, we tested four classifiers (RF, HGB, XGB, and MLPC) across six predictor combinations. Three of these used only SuperDove imagery with different band combinations: red, green, and blue (RGB; bands 6, 4, and 2); RGB and near-infrared (RGBI; bands 6, 4, 2, and 8); and the full eight-band PlanetScope stack (PSS8B). One combination used the full eight-band images augmented with vegetation indices (NDVI and NDWI), and the remaining two additionally included terrain-derived variables (DEM and TWI) and, in the final combination, soil properties (Ksat and HySOG).
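The six predictor combinations can be written as a small configuration that selects columns from the full feature table. The column names are hypothetical labels. The composition of the fifth set (eight bands, the two indices, DEM, and TWI) is inferred from the feature-set progression described in this section, in which FABDEM and TWI are added before the soil layers.

```python
import numpy as np

PSS8B = ["coastal_blue", "blue", "green_i", "green",
         "yellow", "red", "red_edge", "nir"]

# SuperDove bands 6, 4, 2 (and 8) correspond to red, green, blue (and nir)
FEATURE_SETS = {
    "RGB": ["red", "green", "blue"],
    "RGBI": ["red", "green", "blue", "nir"],
    "PSS8B": PSS8B,
    "PSS8B_NDVI_NDWI": PSS8B + ["ndvi", "ndwi"],
    "PSS8B_NDVI_NDWI_DEM_TWI": PSS8B + ["ndvi", "ndwi", "dem", "twi"],
    "PSS8B_NDVI_NDWI_DEM_TWI_Ksat_HySOG":
        PSS8B + ["ndvi", "ndwi", "dem", "twi", "ksat", "hysog"],
}

def select_features(X, column_names, feature_set):
    """Return the columns of X (samples x features) belonging to one feature set."""
    idx = [column_names.index(name) for name in FEATURE_SETS[feature_set]]
    return X[:, idx]

all_columns = FEATURE_SETS["PSS8B_NDVI_NDWI_DEM_TWI_Ksat_HySOG"]
X = np.arange(2 * 14).reshape(2, 14)      # two samples, all 14 features
rgb = select_features(X, all_columns, "RGB")
```

Looping over `FEATURE_SETS` and the four classifiers then reproduces the 6 × 4 experimental grid, with every model evaluated on the same hold-out points.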
As illustrated in Figure 4, overall classification accuracy peaked at 0.89 for the RF, XGB, and HGB models when using the full feature set, while RGB-based combinations plateaued at approximately 0.83. However, overall accuracy masked important differences in class-level performance. Weighted macro F1 scores (Figure 5) showed superior performance for wetland classes when enriched environmental inputs were used. Specifically, the full combination of predictor variables (PSS8B_NDVI_NDWI_DEM_TWI_Ksat_HySOG) yielded the highest F1 score for wetland classification, up to 0.73 using XGB, compared with ≤0.60 for RGB-based models. This indicates that incorporating soil and hydrological layers notably enhances the model’s ability to correctly identify wetlands, which are typically under-represented and spectrally mixed classes in complex landscapes. Complete model performance metrics are provided in Appendix A.
Across all feature combinations, model-level comparisons revealed subtle but consistent performance differences. HGB and RF generally outperformed MLPC and XGB in terms of both overall accuracy and F1-score consistency, particularly when richer environmental variables were included. HGB achieved the highest accuracy (0.89) and competitive weighted F1 scores (up to 0.72), suggesting that it robustly captures both majority- and minority-class patterns. RF showed similar strengths, particularly in preserving class balance across heterogeneous inputs. MLPC, while slightly trailing in overall accuracy, performed stably across most band combinations, indicating that it may be more resilient to reduced input dimensionality. In contrast, XGB was the most sensitive to input features, performing well with full stacks (F1 = 0.73) but less reliably with RGB-only inputs (F1 = 0.56). These results suggest that the tree ensembles HGB and RF offer the most balanced trade-off between accuracy and class-level sensitivity, especially for detecting spectrally and structurally complex wetland features.
When using the full spectral and environmental stack (PSS8B_NDVI_NDWI_DEM_TWI_Ksat_HySOG), both HGB and RF achieved high per-class F1 scores for wetlands (0.74 and 0.73, respectively), indicating strong sensitivity and precision in distinguishing wetland areas from non-wetlands. MLPC also performed reasonably on wetlands (F1 ≈ 0.71), though it was slightly more variable across simpler band combinations. XGB achieved comparable wetland F1 performance (0.73) with the full feature set but was more affected by reduced inputs, dropping to 0.60 with RGBI and as low as 0.56 with RGB. Notably, while overall accuracy was high for all models, only HGB and RF consistently maintained wetland F1 scores above 0.70 across the more complex feature sets, highlighting their robustness in classifying ecologically important but often under-represented wetland categories.
Among the models that included TWI and HySOG predictors, XGB consistently performed best overall. When only the TWI was added to the feature set, all models showed modest improvements, but XGB achieved a slightly higher wetland F1 score (0.69) and recall (0.63) than the others, suggesting stronger sensitivity to terrain-related features. Once HySOG and other hydrological variables were introduced, performance gains became more substantial across the board. In this scenario, XGB again led, with the highest wetland F1 score (0.73) and a recall of 0.67, indicating marked improvements in both precision and sensitivity. RF and HGB also performed strongly, both reaching an F1 score of 0.72, while MLPC lagged slightly, at 0.71. These patterns suggest that XGB is particularly effective at leveraging hydrological and soil-related inputs for wetland detection, making it the most reliable model in the TWI- and HySOG-enhanced scenarios.
The progression of feature sets reveals clear trends in how band combinations influence the performance of all four classifiers. At the base level, models using only the core 8-band PlanetScope composite (Coastal Blue to Near-Infrared) already achieved strong accuracy (~0.87–0.88) and wetland F1 scores of around 0.67–0.72. MLPC and RF performed slightly better than HGB and XGB in wetland recall and F1, indicating that these models identified wetlands effectively even without additional predictors. Adding the NDVI and NDWI modestly improved wetland-specific metrics, particularly for RF and XGB, which both gained in recall and precision. This suggests that vegetation and moisture indices, although derived from the existing spectral bands, offer marginal gains in discriminating wetland areas. Further incorporating FABDEM and the TWI provided incremental gains: although overall accuracy remained similar, wetland precision and recall increased slightly for most models. Interestingly, MLPC showed a small dip in recall relative to the NDVI/NDWI feature set, suggesting some sensitivity to terrain variables.
The most notable jump came after adding HySOG and Ksat to the input features. All models improved in wetland-related metrics, with RF and XGB reaching F1 scores of at least 0.72 and wetland recall of around 0.65–0.67. This suggests that soil group and hydraulic conductivity are informative for wetland classification, offering information complementary to the spectral bands and vegetation indices. While the base PlanetScope bands provide a solid foundation, adding spectral indices (NDVI, NDWI), topographic predictors (FABDEM, TWI), and especially the soil and hydraulic layers (HySOG, Ksat) leads to consistent and meaningful improvements. Among the models, XGB consistently ranks highest or ties for the highest wetland F1 score, followed closely by RF, demonstrating their robustness across progressively enriched feature sets.
3.2. Feature Importance
The feature importance analysis reveals notable differences between RF and MLPC in how they prioritize features for wetland classification (Figure 6). In both models, Green I, NDWI, and NDVI rank among the most influential features, indicating that spectral bands and spectral indices play a key role in distinguishing wetlands. However, the models differ in how they rank and weight these features.
In RF, Green I is the most important feature, followed by the NDWI, TWI, and NDVI (Figure 6), suggesting that the model relies on both spectral and topographic features. RF also assigns higher importance to the TWI and FABDEM, indicating that it integrates topographic information into its classification decisions. MLPC, on the other hand, places greater emphasis on Red Edge, which emerges as its most important feature. Near-Infrared, Red, and Coastal Blue also rank higher in MLPC than in RF, reinforcing its reliance on spectral patterns.
Among the lower-ranked features, HySOG and Ksat consistently appear as the least influential in both models, suggesting that soil properties contribute less to the classification outcome than spectral and hydrological indicators. In addition, Coastal Blue is relatively more important in MLPC than in RF, suggesting that MLPC may capture finer spectral details that RF does not prioritize as strongly.
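Tree ensembles expose impurity-based importances, but MLPC does not, so a model-agnostic measure is needed to rank features for both models on a common scale. The sketch below uses scikit-learn's permutation_importance as one such measure; it is an assumption that this matches the method behind Figure 6, and because the data are synthetic (with this study's predictor names reused only as column labels), the resulting rankings carry no ecological meaning.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Predictor names used only as labels for synthetic columns
FEATURES = ["Coastal Blue", "Blue", "Green I", "Green", "Yellow", "Red", "Red Edge",
            "NIR", "NDVI", "NDWI", "FABDEM", "TWI", "Ksat", "HySOG"]

X, y = make_classification(n_samples=800, n_features=len(FEATURES), n_informative=6,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "RF": RandomForestClassifier(n_estimators=200, random_state=0),
    # Neural networks need scaled inputs, so MLPC is wrapped with a scaler
    "MLPC": make_pipeline(StandardScaler(),
                          MLPClassifier(hidden_layer_sizes=(64,), max_iter=1000, random_state=0)),
}

rankings = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Shuffle one feature at a time on held-out data and record the mean drop in wetland F1
    result = permutation_importance(model, X_test, y_test, scoring="f1",
                                    n_repeats=10, random_state=0)
    order = np.argsort(result.importances_mean)[::-1]  # most important first
    rankings[name] = [FEATURES[i] for i in order]

for name, ranked in rankings.items():
    print(name, ranked[:5])
```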
3.3. Wetland Prediction Across New Zealand
Each model was used to produce wetland maps and wetland class likelihoods across New Zealand for September–October 2024. Figure 7 shows example predictor variables and the wetland predictions from each model for an area in northern New Zealand that was excluded from the training data and was therefore unseen by the models. Comparing the LCDBv5 (ground-truth data, Figure 7b) with the basemap image suggests that LCDBv5 underestimates wetland extent here, as a clear riverine wetland in the northeast of the area is absent from it. The predictions in the third row of Figure 7 show distinct variability in predicted wetland extent: RF (Figure 7i) clearly under-predicts extent, whereas HGB (Figure 7j) and XGB (Figure 7k) both match the LCDBv5 extent reasonably well while also predicting riverine wetlands absent from the LCDBv5 data. Although adding Ksat (together with HySOG) coincided with the largest gains in wetland metrics, its coarse ~1 km spatial resolution (Figure 7g) reduces the spatial fidelity of the predictions, highlighting the need for higher-resolution, more detailed soil information.
The prediction likelihood maps for each model (Figure 7m–p) indicate where the models are most confident in wetland prediction, with the highest values coinciding with the wetland extent mapped in LCDBv5. These maps show good potential for improving the binary wetland maps, for example, by softening the classification. They could also be used for multi-temporal monitoring of wetland ecosystems, with the prediction likelihood serving as a proxy for wetland condition or status, although this would need further analysis.
Each model was used to predict wetland extent and likelihood across New Zealand, and these data have been made available under an open license. An example map for HGB is shown in Figure 8a, alongside the LCDBv5 wetland extent in Figure 8b. The comparison indicates a generally good representation of current wetlands at the national scale, although the model predictions contain far more local detail, including small wetlands, owing to the ~3 m imagery used.
Figure 8c shows the potential area of wetlands in New Zealand [48,49], indicating the likely extent of historical wetland systems lost to drainage by European settlers since ~1800. Importantly, the model does not appear to over-predict wetlands in areas where they have been lost, indicating that it can discern between areas with a propensity for wetlands and those with current wetland systems. However, the total national wetland area predicted by each model (Figure 9) suggests that wetland extent is likely to be over-predicted, although this cannot be confirmed because the reference data miss small wetlands (<1 ha) [41] and may misrepresent wetland edges. Increasing the classification probability threshold from 0.5 to 0.66 reduces the predicted wetland area by as much as 64% for the RF model, and smoothing the probability maps with a 3 × 3 mean spatial filter reduces the total area further. This indicates that substantial areas of predicted wetland lie close to the classification threshold and that many predicted wetlands are only a few pixels in extent (<~0.01 ha). Consequently, the models can likely be improved, particularly if additional training data with higher spatial precision become available.
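The effect of the threshold and the 3 × 3 mean filter on total area can be reproduced on a toy probability map. At a nominal 3 m pixel size, one pixel covers 9 m² (0.0009 ha), so a patch smaller than ~0.01 ha spans about 11 pixels or fewer. The sketch assumes scipy.ndimage.uniform_filter as the mean filter and uses a synthetic map containing one coherent 20 × 20 pixel wetland and five isolated pixels with probability 0.6; the helper wetland_area_ha is hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter

PIXEL_SIZE_M = 3.0                       # assumed nominal SuperDove pixel size
PIXEL_AREA_HA = PIXEL_SIZE_M ** 2 / 1e4  # 9 m^2 = 0.0009 ha per pixel


def wetland_area_ha(prob, threshold=0.5, smooth=False):
    """Total predicted wetland area (ha) for a 2-D wetland probability map."""
    if smooth:
        # 3 x 3 moving-window mean of the probabilities before thresholding
        prob = uniform_filter(prob, size=3, mode="nearest")
    return np.count_nonzero(prob >= threshold) * PIXEL_AREA_HA


# Synthetic map: one coherent 20 x 20 pixel wetland plus five isolated
# single-pixel detections whose probability sits just above 0.5
prob = np.zeros((60, 60))
prob[10:30, 10:30] = 0.9
prob[45, 5:55:10] = 0.6

for label, kwargs in [("p >= 0.50", {}),
                      ("p >= 0.66", {"threshold": 0.66}),
                      ("3x3 mean, p >= 0.50", {"smooth": True})]:
    print(f"{label}: {wetland_area_ha(prob, **kwargs):.4f} ha")
```

On this map, raising the threshold to 0.66 removes the five isolated pixels (405 to 400 pixels), while smoothing removes them and also trims the four corner pixels of the coherent patch (396 pixels), mirroring how near-threshold and few-pixel detections drive the national area reductions.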
4. Discussion
The classification results demonstrate that all four machine learning models (RF, HGB, XGB, and MLPC) performed well in detecting wetlands from high-resolution PlanetScope imagery and environmental covariates, with varying strengths. With the full feature stack (PSS8B_NDVI_NDWI_FABDEM_TWI_Ksat_HySOG), RF and HGB achieved the highest overall accuracy (both 0.89), followed closely by XGB and MLPC (both 0.88). For binary wetland classification, the highest F1 scores were recorded by XGB (0.73) and by RF and HGB (both 0.72), indicating a strong ability to distinguish wetland from non-wetland areas when comprehensive spectral and terrain inputs were provided. MLPC also performed competitively (wetland F1 score of 0.71), despite its relatively lower spatial consistency.
While each model offers unique strengths, the ensemble tree methods (especially RF and XGB) provided the most balanced trade-offs between accuracy, class sensitivity, and computational efficiency. MLPC, though more sensitive to input dimensionality and noise, remains a valuable tool for spectrally based generalization. These findings support the potential of automated, pixel-based classifiers for national-scale wetland detection, while also emphasizing the need for continued refinement of model design, training data, and spatial post-processing to improve boundary delineation and class generalization. A comparison of the four models is presented in Table 2.
The differences in performance in classifying wetlands overall can be attributed to several factors. One factor is that wetlands often exhibit high intra-class variability owing to differences in vegetation, water levels, and soil types. Because the overall wetland class comprises many different wetland types, this variability makes it more difficult for the models to generalize, leading to lower precision and recall than for more homogeneous categories in which such distinctions are less necessary.
Classification performance is also influenced by the quality and resolution of the training data. In this study, training labels were derived from the Land Cover Database (LCDBv5), which focuses primarily on terrestrial vegetation classes and provides only a limited representation of wetlands. As a result, wetlands were often captured in the dataset as vegetated areas with wetland context rather than as distinct hydrological or ecological entities. This limited scope, combined with the absence of comprehensive ground-truth data for wetlands, constrained the models’ ability to capture the full diversity and boundaries of wetland ecosystems. Enhancing the training dataset with field-verified wetland observations would be critical for improving classification accuracy, especially for under-represented or non-vegetated wetlands.
Wetlands are inherently dynamic ecosystems, and their boundaries are often diffuse and transitional. Seasonal fluctuations in water levels, vegetation growth, and hydrological connectivity, along with human-induced changes such as drainage or land conversion, can significantly alter wetland appearance over time. These factors blur the distinction between wetland and non-wetland areas, challenging pixel-based classification models that rely on static imagery. As a result, even well-performing models may struggle to delineate wetlands consistently and accurately, particularly in regions with high seasonal variability or anthropogenic disturbance. The models should therefore be tested with imagery acquired at other times of year (e.g., summer), including an assessment of their suitability for monitoring wetland state. We suggest that changes in the wetland classification probability may serve as an indicator of wetland status, but further research is needed to test this.
The differences in feature importance between models highlight fundamental distinctions in how each model processes information. As decision-tree-based models, RF, XGB, and HGB tend to prioritize features that create clear hierarchical splits, which may explain the high importance that RF assigns to Green I, NDWI, and TWI. This suggests that the tree-based models leverage both spectral and terrain-based indicators, making them particularly effective when elevation and hydrological factors help to distinguish wetlands from non-wetlands.
In contrast, MLPC assigns greater importance to spectral bands such as Red Edge and Near-Infrared, which suggests that the neural network captures fine-grained spectral variations that RF does not model explicitly. This difference may reflect MLPC learning smooth, non-linear combinations of features, whereas RF separates classes through a hierarchy of single-feature threshold splits. The higher ranking of Coastal Blue in MLPC further suggests that it detects subtle spectral signatures that RF does not exploit as effectively.
The consistently low importance of HySOG and Ksat in the wetland detection models suggests that soil properties alone do not strongly differentiate wetlands. However, this does not mean that these features are irrelevant; rather, they may be highly correlated with other variables or lack sufficient spatial variation to significantly influence model predictions. Soil data with finer spatial resolution might help to improve model performance. The lower ranking of the TWI in MLPC than in RF also suggests that the neural network may not leverage topographic features as effectively as RF, possibly because these features produce clear separations in decision trees but are less influential in the learned representations of a neural network.
These findings suggest that RF is likely better suited for datasets where topographic and hydrological indicators are crucial, while MLPC may provide better classification accuracy when spectral variations are the dominant distinguishing factors. If the goal is to enhance classification performance, a hybrid approach combining both models could be beneficial, leveraging RF for structure-based classification and MLPC for spectrally based differentiation.
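One simple form of the suggested hybrid is soft voting, which averages the class probabilities of RF and MLPC. The sketch uses scikit-learn's VotingClassifier on synthetic data; the hyperparameters are illustrative assumptions rather than those used in this study, and stacking would be an alternative design.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the predictor table (class 1 = wetland)
X, y = make_classification(n_samples=600, n_features=10, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)
X_train, X_test, y_train = X[:450], X[450:], y[:450]

hybrid = VotingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
        ("mlpc", make_pipeline(StandardScaler(),
                               MLPClassifier(hidden_layer_sizes=(64, 32),
                                             max_iter=500, random_state=0))),
    ],
    voting="soft",  # average the predicted class probabilities of both models
)
hybrid.fit(X_train, y_train)
proba = hybrid.predict_proba(X_test)  # column 1 = wetland likelihood
```

Unequal weights (the weights parameter) could favor RF where topographic inputs dominate, but they would need to be tuned on held-out data.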
Most earlier wetland classifications using satellite imagery and machine learning used RF algorithms, with varying degrees of accuracy [40,50]. These studies used different pixel resolutions; for example, one national-scale classification for France used a 5 m ground resolution based on DEM pixels, combined with 10 m Sentinel-2 imagery [40]. Our models, based on ~3 m SuperDove 8-band data, achieved wetland F1 scores of 0.71–0.73, compared with an F1 score of 0.75 for the French model. However, as noted above, these scores could improve if better ground-truth data for model training become available.
Evaluating model transferability is important to determine whether classification models trained on high-resolution imagery, such as ~3 m SuperDove data, can still perform well when applied to coarser-resolution satellite imagery, such as 10 m Sentinel-2 or 30 m Landsat data. This is particularly relevant for large-scale wetland mapping, where higher-resolution data may not always be necessary. In addition, wetlands across New Zealand have varying hydrological and vegetation characteristics that appear differently depending on the spatial scale. Classification models therefore need to be adaptable to different resolutions to capture these differences accurately [51].
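A first transferability test can simulate coarser imagery by block-averaging the ~3 m bands. The sketch assumes simple box aggregation, which ignores sensor point-spread functions and spectral band differences between SuperDove, Sentinel-2, and Landsat; an aggregation factor of 3 gives ~9 m pixels (close to Sentinel-2's 10 m), and a factor of 10 gives ~30 m (Landsat). The helper block_mean is hypothetical.

```python
import numpy as np


def block_mean(band, factor):
    """Aggregate a 2-D band by averaging non-overlapping factor x factor blocks.

    Trailing rows/columns that do not fill a complete block are trimmed.
    """
    rows = band.shape[0] // factor * factor
    cols = band.shape[1] // factor * factor
    trimmed = band[:rows, :cols]
    # Split each axis into (blocks, pixels-in-block) and average within blocks
    blocks = trimmed.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


band_3m = np.arange(36, dtype=float).reshape(6, 6)  # toy 6 x 6 band at ~3 m
band_9m = block_mean(band_3m, 3)                    # 2 x 2 band at ~9 m
band_30m = block_mean(np.ones((100, 100)), 10)      # 10 x 10 band at ~30 m
```

In the toy band, the four ~9 m pixels average to 7, 10, 25, and 28; a model trained on the 3 m stack could then be evaluated on similarly aggregated predictors.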
Comparing MLPC with traditional ML algorithms like RF, as well as boosting methods, provides insights into their respective trade-offs in terms of classification accuracy, computational efficiency, and generalization. Additionally, model comparison helps determine whether deep learning-based methods offer a substantial performance gain over ensemble-based techniques, given the constraints of available training data. By systematically evaluating different algorithms, this research aimed to establish a reliable and interpretable wetland classification model that balances predictive power with ecological relevance for improved wetland monitoring and conservation strategies.
As a neural network, MLPC can automatically learn feature interactions from the data that RF may miss without explicit feature engineering [53]. Studies across various domains, from medical diagnosis to cybersecurity, underscore the strength of multi-layer perceptrons (MLPs) in handling complex non-linear relationships, as well as their flexible architectural design [54,55,56]. These findings highlight the broad applicability and effectiveness of MLPs in classification tasks, reinforcing their value in capturing intricate data patterns.
The model validation exercise revealed some underlying issues with the training data extracted from LCDBv5. These data were developed using Sentinel-2 satellite imagery, with a reported average location error of approximately 15 m, equivalent to about five ~3 m PlanetScope pixels. The higher-resolution PlanetScope data are therefore inherently mismatched with these labels for pixel-level classification. More sample points are needed to address this, and the best source may be ground-truth field maps of wetland types from other sources, such as local government agencies, although these maps have limited coverage. Because many wetlands have undergone extensive land-use change, it would also be helpful to incorporate land-use data, which could explain or suppress false-positive detections where the biophysical context suggests the presence of wetlands but the land has been converted to other uses.
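The scale of this mismatch can be quantified: if the 15 m label error is treated as a displacement of wetland edges, any wetland pixel within 15 m of a non-wetland pixel may carry the wrong label. The sketch below, which assumes scipy.ndimage.distance_transform_edt, a nominal 3 m pixel, and two hypothetical square wetlands, computes the fraction of wetland pixels inside that zone; the helper fraction_near_edge is hypothetical.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

PIXEL_SIZE_M = 3.0       # assumed nominal PlanetScope pixel size
LOCATION_ERROR_M = 15.0  # reported average LCDBv5 location error


def fraction_near_edge(mask, error_m=LOCATION_ERROR_M, pixel_m=PIXEL_SIZE_M):
    """Fraction of wetland pixels lying within error_m of the nearest non-wetland pixel."""
    # Euclidean distance (in metres) from each wetland pixel to the nearest zero pixel
    dist = distance_transform_edt(mask, sampling=pixel_m)
    wet = mask.astype(bool)
    return float((dist[wet] <= error_m).mean())


small = np.zeros((20, 20), dtype=np.uint8)
small[5:15, 5:15] = 1   # hypothetical 30 m x 30 m wetland (10 x 10 pixels)
large = np.zeros((50, 50), dtype=np.uint8)
large[5:45, 5:45] = 1   # hypothetical 120 m x 120 m wetland (40 x 40 pixels)

print(fraction_near_edge(small), fraction_near_edge(large))
```

All pixels of the 30 m × 30 m wetland fall within 15 m of its edge, compared with 700 of 1,600 pixels (about 44%) for the 120 m × 120 m wetland, so small wetlands are the most exposed to label misregistration.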
In this context, a consistent, high-resolution national wetland map is urgently needed to support regional governments and conservation agencies in meeting their obligations under the NPS-FM and to address long-standing gaps in wetland monitoring data. Previous efforts, such as the LCDBv5 and the associated Manaaki Whenua wetland layer, provide broad national coverage, but they are often limited in spatial resolution and typically classify vegetated wetlands as a single category without capturing type-specific distinctions [41,52]. Internationally, large-scale efforts, such as the 5 m national wetland map of France [43], the U.S. National Wetlands Inventory (NWI), and global products such as WAD2M [21], demonstrate the value and feasibility of mapping wetlands at scale, yet they often rely on coarser imagery or static input layers. These legacy maps also struggle to represent small, fragmented, or hydrologically dynamic wetland systems. The growing availability of high-resolution satellite imagery, such as SuperDove, and advances in machine learning present an opportunity to develop scalable, spatially explicit wetland maps with greater thematic detail and accuracy [22,44]. A national-scale classification framework built from transferable, automated methods can reduce the burden of manual mapping, improve consistency, and enhance the repeatability of wetland assessments across diverse landscapes and time periods [14,57,58].
Feature importance assessment can help to optimize machine learning models by identifying the most relevant variables, thereby reducing computational costs and improving generalization [36]. Previous studies have used several techniques, such as permutation importance, SHAP (SHapley Additive exPlanations), and recursive feature elimination, to determine which features contribute most to wetland classification [33,34]. Including ecologically relevant features helps to align the classification model with real-world wetland dynamics, enhancing both interpretability and applicability to conservation.
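Recursive feature elimination repeatedly refits a model and discards the least important feature until a target number remains. A minimal sketch using scikit-learn's RFE with a random forest; the synthetic data, the reuse of this study's predictor names as column labels, and the choice to keep five features are illustrative assumptions, so the selected names carry no meaning for New Zealand wetlands.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

# Predictor names used only as labels for synthetic columns
FEATURES = ["Coastal Blue", "Blue", "Green I", "Green", "Yellow", "Red", "Red Edge",
            "NIR", "NDVI", "NDWI", "FABDEM", "TWI", "Ksat", "HySOG"]

X, y = make_classification(n_samples=600, n_features=len(FEATURES), n_informative=5,
                           n_redundant=2, weights=[0.8, 0.2], random_state=0)

# Drop one feature per iteration (step=1), ranked by the forest's
# impurity-based importances, refitting until five features remain
selector = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
               n_features_to_select=5, step=1)
selector.fit(X, y)

selected = [name for name, keep in zip(FEATURES, selector.support_) if keep]
ranking = dict(zip(FEATURES, selector.ranking_))
print("Selected:", selected)
```

Retained features receive rank 1; with step=1, the nine eliminated features receive ranks 2 to 10, with 10 for the first one removed.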
The pixel-based modeling approach has limitations when classifying objects that are not homogeneous across their extent. This was apparent in each model's predictions, given the reduction in national wetland area after the classification probability threshold was raised and the classification probabilities were spatially filtered. Vegetated wetlands are one such type of entity, with considerable internal variation among the pixels that make up a single object. Deep learning models may help to delineate such land classes, and the MLPC architecture could serve as an initial model for developing deep-learning-based convolutional neural network (CNN) models.
While the pixel-based machine learning models used in this study perform well in identifying general wetland presence, a key limitation is their inability to delineate precise wetland boundaries. The classification outputs indicate areas likely to contain wetlands based on spectral, hydrological, and topographic predictors but do not inherently represent ecologically meaningful or legally mappable wetland extents. This is particularly relevant where wetlands occur as fragmented or seasonally variable features embedded within heterogeneous landscapes. As a result, while these models are highly effective for screening and broad-scale wetland detection, further refinement (such as object-based image analysis or integration with field-validated delineation datasets) may be necessary to support regulatory applications or restoration planning. Future work should also explore region-based deep learning architectures to enhance the spatial coherence and boundary accuracy of predicted wetland areas.
It is important to note that the definition of wetlands varies across ecological, hydrological, and policy frameworks. In the context of this study, wetlands were defined narrowly as vegetated areas with persistent or seasonally wet conditions, consistent with the dominant wetland classes represented in the LCDB training data. This operational definition excludes non-vegetated wetland types such as open water bodies, ephemeral saturated flats, and mineral wetlands with sparse vegetation. Consequently, the model outputs reflect this focused scope and may not capture the full range of wetland forms present across New Zealand. Future work could broaden this definition to improve representation of hydrologically dynamic and non-vegetated wetland systems.
Several strategies could be adopted to address variation in wetland classification. First, multi-temporal classification can incorporate seasonal variability, providing a dynamic view of wetland changes throughout the year. This approach recognizes that wetlands can exhibit significant seasonal changes in vegetation and water levels, which is critical for accurate classification. Second, including Synthetic Aperture Radar (SAR) data would enhance the classification process. As reported in the review by Adeli et al. [59], SAR data have proven effective in wetland classification because SAR can penetrate cloud cover and provide high-resolution images regardless of lighting conditions. This makes SAR particularly valuable for monitoring wetlands in cloudy or rainy environments. For example, Varugu et al. [60] demonstrated the use of SAR for multi-temporal monitoring of a tidal wetland system. Furthermore, SAR imagery at longer wavelengths (L band) can detect wetlands beneath a vegetation canopy, as demonstrated in Amazonia by Hess et al. [61]. New L-band missions such as NISAR are likely to provide valuable data for wetland monitoring [62].
Finally, knowledge transfer could be leveraged by feeding the MLPC model outputs into more sophisticated deep-learning-based convolutional neural network (CNN) models. This approach aims to improve predictive accuracy by exploiting the nuanced pattern recognition capabilities of CNNs, thereby enhancing the overall effectiveness of wetland classification. Together, these steps represent a comprehensive strategy for improving the monitoring and management of wetlands through advanced technological means. The global coverage of Planet data and the use of global supporting datasets provide an opportunity to upscale the machine learning approach used in this paper to the global level while retaining high spatiotemporal resolution, and the approach can be adapted to local contexts, as demonstrated in this research.