Article

Eye in the Sky for Sub-Tidal Seagrass Mapping: Leveraging Unsupervised Domain Adaptation with SegFormer for Multi-Source and Multi-Resolution Aerial Imagery

Satish Pawar, Aris Thomasberger, Stefan Hein Bengtson, Malte Pedersen and Karen Timmermann

1 Section for Coastal Ecology, National Institute of Aquatic Resources, Technical University of Denmark, Kgs. Lyngby, 2800 Copenhagen, Denmark
2 Visual Analysis and Perception Lab, AAU CREATE, Aalborg University, 9000 Aalborg, Denmark
3 Pioneer Center for AI, 1350 Copenhagen, Denmark
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(14), 2518; https://doi.org/10.3390/rs17142518
Submission received: 6 May 2025 / Revised: 13 July 2025 / Accepted: 17 July 2025 / Published: 19 July 2025
(This article belongs to the Special Issue High-Resolution Remote Sensing Image Processing and Applications)

Abstract

The accurate and large-scale mapping of seagrass meadows is essential, as these meadows form primary habitats for marine organisms and large sinks for blue carbon. Image data available for mapping these habitats are often scarce or are acquired through multiple surveys and instruments, resulting in images of varying spatial and spectral characteristics. This study presents an unsupervised domain adaptation (UDA) strategy that combines histogram matching with the transformer-based SegFormer model to address these challenges. Unoccupied aerial vehicle (UAV)-derived imagery (3 cm resolution) was used for training, while orthophotos from airplane surveys (12.5 cm resolution) served as the target domain. The method was evaluated across three Danish estuaries (Horsens Fjord, Skive Fjord, and Lovns Broad) using one-to-one, leave-one-out, and all-to-one histogram matching strategies. The highest performance was observed at Skive Fjord, achieving an F1-score/IoU of 0.52/0.48 for the leave-one-out test, corresponding to 68% of the performance of a benchmark model trained on both domains. These results demonstrate the potential of this lightweight UDA approach to generalize across spatial, temporal, and resolution domains, enabling the cost-effective and scalable mapping of submerged vegetation in data-scarce environments. This study also sheds light on contrast as a significant property of target domains that impacts image segmentation.

1. Introduction

Coastal ecosystems are highly productive and serve as biodiversity hotspots due to the presence of submerged aquatic vegetation (SAV) such as seagrasses and macroalgae [1,2,3]. Eelgrass (Zostera marina) is a seagrass found along Northern European and North American coasts, forming a habitat for commercially important fish and shellfish species, maintaining nutrient balance, and improving water quality [4,5]. These ecologically important vegetated habitats are degrading globally due to anthropogenic activities, resulting in the loss of habitat area [6,7,8,9]. The ecological importance and central role of eelgrass and similar submerged vegetation call for their regular monitoring and mapping. However, unlike intertidal seagrass species, subtidal seagrass remains submerged during low tides and is often difficult to study due to the interference of the water column.
The eelgrass found in Danish coastal waters is monitored via in situ techniques comprising diver-based visual field surveys and photo/video evidence collected using remotely operated vehicles (ROVs) [3,10]. Although these observations are ecologically accurate, they represent only a limited area surrounding the surveyed transects and lack positional accuracy, owing to the unavailability of global navigation satellite system (GNSS) data underwater. Seagrass mapping through satellite remote sensing and unoccupied aerial vehicles (UAVs) has shown promising results [10] for limited geographical areas. Low-altitude surveys using UAV cameras can acquire a high level of detail at the cost of geographical coverage, since the field of view (FOV) decreases with altitude. Multiple aerial surveys, involving sensors with different properties, may be undertaken to achieve large-area mapping, leading to the accumulation of datasets collected over time and space. Additionally, environmental conditions such as water quality, depth, and seasonal differences in seagrass meadows can yield diverse information [11]. The variations within such datasets introduce spectral shifts and spatial scale differences between the distributions.
Conventional pixel-based image analysis (PBIA) and object-based image analysis (OBIA), which rely on conventional machine learning algorithms, are unsuitable for large image datasets. PBIA uses spectral similarity to cluster pixels into predefined classes, while the more advanced OBIA uses additional properties such as shape, texture, and spatial similarity between pixels to classify objects. However, the appearance of underwater objects is subject to depth and light penetration, which is governed by water quality. Considering the large dataset size generated across multiple sampling instances by different imaging sensors, PBIA would need customization for individual images, with careful selection of training pixels. Similarly, OBIA requires extensive parameter tuning of conventional machine learning algorithms in order to map objects over the same area observed in different seasons [12]. Deep learning algorithms based on artificial neural networks have shown potential for generalized learning from large datasets [13], making them more reliable for big data analysis. The exceptional performance of DeepLab, a convolutional neural network (CNN), has been demonstrated in mapping eelgrass from UAV images [14]. However, this performance is achieved at the cost of large training sets and high computational demands [15], and the large labeled datasets required for training deep learning models are often limited for shallow underwater benthic habitats [16]. Figure 1 summarizes these image segmentation techniques. A robust mapping system capable of handling variations in image properties and space-time differences while learning from minimal data resources is necessary in order to map SAV over large coastlines.
Domain adaptation is a useful technique for overcoming data limitations by training on labeled but limited datasets (source domains) in order to adapt to a large, unlabeled dataset (target domain). In Denmark, countrywide airborne imaging surveys using airplane-based optical sensors are conducted for administrative purposes, acquiring images at high resolution (12.5 cm). These surveys cover the terrestrial regions of Denmark, along with most of the coastal areas where important habitats are located. Airplane-based imaging sensors at higher altitudes have a wider FOV, capturing greater contextual information but fewer spatial details of the eelgrass. Hence, this airplane-acquired orthophoto imagery forms an ideal target domain for domain adaptation to map large areas, while UAV images collected at low altitude (≈3 cm spatial resolution) and their annotations form the source domain. The two datasets, acquired at different spatial resolutions, were also collected during different seasons and years, capturing eelgrass growth at varying stages. This temporal and spatial mismatch introduces both a spectral shift and a scale difference between the two distributions.
This study proposes a novel approach to addressing this scale variation and spectral shift using a transformer-based image segmentation model, SegFormer, coupled with histogram matching. The self-attention mechanism, which is central to transformer networks, enables each element in a sequence or image to contextually aggregate information from all other positions within that sequence. The SegFormer architecture utilizes mix transformer (MiT) encoders to process images in overlapping patches of varying sizes (Figure 1d), enabling hierarchical feature extraction and efficient learning across multiple spatial scales [17]. This property of SegFormer can be leveraged to adapt to the spatial scale difference between the source and target domains. While SegFormer functions as a resolution-agnostic segmentation tool, the spectral shift between the two domains remains unresolved. This shift is mitigated using histogram matching, whereby the probability distribution of pixel brightness intensities of a source domain image is adjusted based on that of a reference image from the target domain. This can be carried out in an unsupervised manner as an image augmentation step prior to training SegFormer, without pairing the domain images in any particular manner. Typically, domain adaptation for image segmentation involves two neural networks working in tandem: a generative adversarial network (GAN) for domain alignment, coupled with a segmentation model such as a fully convolutional network (FCN) or U-Net [18]. The GAN's performance is improved based on feedback from the segmentation network, and this dual arrangement demands a larger training set for the collaborative training. A semi-supervised domain adaptation technique for cross-site seagrass mapping with WorldView-2 imagery still requires a few target labels [19]. Although the GAN-based ColorMapGAN has been applied to correct spectral shift [20], using histogram matching for this task minimizes the training data and computational resources required. However, to achieve generalized performance with histogram matching, it is essential to select target domain images that capture diverse underwater conditions across the region. Hence, a systematic analysis is required to evaluate the image properties and test their combinations in order to identify the images that yield optimal performance.
This study presents a domain adaptation technique for mapping seagrass over large underwater coastal regions using high-resolution imagery from two distinct sources. Histogram matching, employed as an unsupervised domain adaptation (UDA) technique in conjunction with the SegFormer segmentation model, was evaluated across three underwater regions of Danish estuaries. The study further investigates the influence of target domain images with diverse characteristics to understand their impact on image segmentation. The results offer promising insights into addressing data scarcity while demonstrating the potential for large-scale seagrass mapping.

2. Materials and Methods

2.1. Source and Target Images

The UAV platform used to acquire the source (VHR) imagery was a DJI Phantom 4 RTK, a lightweight, consumer-grade quadcopter. Its payload was a 20-megapixel RGB camera equipped with a 1-inch CMOS sensor, featuring an 84° field of view, a focal length of 8.8 mm (24 mm full-frame equivalent), and an aperture range of f/2.8 to f/11. Flights were conducted at four locations, namely, Nykøbing Mors, Lovns Broad, Horsens Fjord, and Nissum Broad (Figure 2①–④; Table 1, Sites 1–4), at an altitude of 100 m, achieving a ground sample distance (GSD) of ≈3 cm. The number of UAV images collected varied between flights. One georeferenced orthomosaic was created for each location by stitching the obtained images using the image processing software Agisoft Metashape Professional® ver. 1.7.4 (Agisoft, 2023). The Nykøbing Mors site (Figure 2④) is located along the eastern coast of the island of Mors in the Limfjorden, Denmark (56°46′48.1″N 8°51′43.6″E), and is characterized by patchy seagrass growth forms caused by different pressure factors such as waves, ice cover during winter, high summer temperatures, sediment perturbation by lugworms, and anthropogenic disturbance from fishing gear and boating. Eutrophication limits the seagrass growth to a maximum depth of 2.5 m. High levels of eutrophication also play an important role at the study site located along the northeastern coast of Lovns Broad (Figure 2②), likewise part of the Limfjorden (56°37′52.1″N 9°13′57.6″E). The Broad is subject to high nutrient loadings, highly organic sediments, and low light conditions, limiting seagrass growth and making the aerial monitoring of seagrass beds exceptionally challenging. The third study area in the Limfjorden was located in Nissum Broad (Nissum Bredning; Figure 2③), which exhibits similar features, dominated by high levels of eutrophication and low light conditions. The study site in Horsens Fjord is located in a shallow fjord on the east coast of Jutland, Denmark (55°49′46.0″N 9°59′32.6″E). Here, a large-scale seagrass transplantation exercise (51 × 78 m) was carried out in July 2017. The transplantation site was organized in a chessboard pattern, with alternating vegetated and unvegetated 3 × 3 m squares at water depths of 1.2 to 1.6 m [16].
All UAV images were obtained with a 90° nadir-viewing angle. The front and side image overlaps were set to 75%, and the flight speed was maintained at 3.5 m/s. Ground truthing of the UAV images was performed using a UAV-mounted underwater camera system after each high-altitude flight [21]. These underwater camera images confirmed the presence or absence of eelgrass at set aerial image locations. All flights were planned and executed using the flight mission planning software UgCS ver. 4.7.685.
The target domain airplane imagery (orthophotos) was obtained from the online portal of the Danish Agency for Data Supply and Infrastructure (SDFI) (https://dataforsyningen.dk/ (accessed on 19 August 2024)). This imagery is collected annually by the SDFI during the spring season at 12.5 cm GSD with 8-bit RGB (red, green, and blue) channels. The georeferenced 1 km × 1 km images are made publicly available for use in Denmark. The target images were used for accuracy assessment. The primary differences between the source and target domains stem from differences in spatial resolution and in the brightness values of the RGB channels. Another important factor is the acquisition time, as vegetation growth peaks during and after the summer, due to the optimum light available to vegetation. This ample sunlight also drives the growth of other floating vegetation that may overlap the eelgrass canopy, changing its appearance in images.
The underwater light environment is a function of water depth and water constituents, which give rise to absorption and scattering and thereby to light attenuation. Blue wavelengths scatter more intensely, while the longer red wavelengths are absorbed rapidly. This differential attenuation reduces the color contrast within images, giving them a characteristically hazy and uniform appearance. We used contrast to compare the individual channels of the images, defined as the ratio of the difference to the sum of the maximum and minimum pixel intensities in the image (Equation (1)), where a value of 1 indicates the highest contrast:

$\mathrm{Contrast} = \dfrac{I_{\max} - I_{\min}}{I_{\max} + I_{\min}}$ (1)
Source domain images showed a contrast of 1 or nearly 1 in the red channel, with the exception of the Nykøbing Mors image (Table 2). The contrasts of the target domain images were lower than those of the source images, with the exception of the Skive Fjord image.
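For reference, a minimal sketch of the per-channel contrast computation in Equation (1), written in Python with NumPy; the function name and the 8-bit RGB array layout are assumptions for illustration, not code from the study:

```python
import numpy as np

def channel_contrast(image: np.ndarray) -> dict:
    """Contrast (Equation (1)) per channel of an 8-bit RGB image
    shaped (height, width, 3)."""
    contrast = {}
    for i, name in enumerate(("red", "green", "blue")):
        band = image[..., i].astype(np.float64)
        i_max, i_min = band.max(), band.min()
        # Guard against an all-black channel, where the denominator is zero.
        contrast[name] = (i_max - i_min) / (i_max + i_min) if (i_max + i_min) > 0 else 0.0
    return contrast
```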

2.2. Training Data Preparation

Ground truth segmentation masks were produced according to the annotation protocol (Appendix A) using QGIS version 3.10. The protocol sets a standard map scale (1:125) and zoom level (100%) for the annotation process. This was crucial to avoid human-induced bias in the ground truth and to maintain a consistent level of detail in the segmentation masks, irrespective of the image domain. The ground truth masks are single-channel binary masks containing pixel values of 1 and 0 for eelgrass presence and absence, respectively. Pairs of image samples of 512 × 512 × 3 pixels (height × width × channel) and corresponding 512 × 512 pixel segmentation masks were obtained for training, validation, and testing. The training, validation, and test set sizes are given in Table 3. Due to the large meadow sizes at some of the training sites (Figure 2①, Horsens Fjord, and Figure 2④, Nykøbing Mors), samples containing 100% eelgrass cover were present within the dataset. Hence, the surrounding bare sand patches were also included in the analysis to reduce the similarity between the images. The natural variations between the meadows and the bare patches present within a meadow also reduce the similarity and spatial autocorrelation.

2.3. Histogram Matching (HM) with SegFormer

Histogram matching, also known as histogram specification, is a commonly used image enhancement technique in which the pixel frequency distribution of a reference image is used to calibrate the pixel frequency distribution of a given image. In our implementation, we calibrate the histograms of the source domain $D_s = \{(X_i, Y_i)\}$ based on the distribution of the target domain $D_t = \{X_j\}$, where $X_i$ and $X_j$ are the source and target images, respectively, and $Y_i$ are the source domain labels. Mathematically, histogram matching for 3-channel color images is performed by mapping each source image's cumulative distribution function (CDF) $F_c$ onto the CDF $G_c$ of a randomly selected target image for each color channel $c$; the matched output is given by $G_c^{-1}(F_c(x))$ for pixel intensity $x$. The steps involved in the study (Figure 3a) and an example of the histogram matching process for image augmentation are shown in Figure 3b. The images of both domains are 8-bit images ($2^8 = 256$ intensity levels), with pixel brightness intensities between 0 and 255 in each color channel. The histograms of a source domain image are matched with those of a target domain image, and the resulting augmented image carries the spatial features of the source image and the spectral characteristics transferred from the target domain. This augmentation is applied prior to training SegFormer.
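As an illustration, a minimal sketch of this augmentation step using scikit-image's match_histograms; the study does not name its implementation, so the function, array layout, and random sampling details here are assumptions:

```python
import numpy as np
from skimage.exposure import match_histograms

def hm_augment(source_image: np.ndarray, target_pool: list,
               rng: np.random.Generator) -> np.ndarray:
    """Match the per-channel histograms of a source (UAV) tile to those of a
    randomly drawn target-domain (orthophoto) tile before training."""
    reference = target_pool[rng.integers(len(target_pool))]
    # channel_axis=-1 matches each RGB channel's CDF independently.
    matched = match_histograms(source_image, reference, channel_axis=-1)
    return matched.astype(source_image.dtype)
```

Note that the label mask $Y_i$ is left untouched, since histogram matching only remaps pixel intensities and does not alter image geometry.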

2.4. Experimental Setup

The SegFormer family of models (MiT-b0 to MiT-b5) is available pre-trained on a large image dataset, ImageNet-1k, comprising 1.2 million images across 1000 object categories. Hence, a preliminary zero-shot test of MiT-b0 on the target images was carried out without any training on the eelgrass dataset. This was performed to ensure that the model carried no prior knowledge of seagrass segmentation and that the experiments in this study were the only source of such knowledge. Secondly, an accuracy benchmark was determined by training and testing an Oracle model. The Oracle model provides an idealized scenario for gauging the expected benchmark performance given the available datasets and the model's parameters. This Oracle model was trained on both source and target samples (n = 3560). Its performance was tested on 15% of the target set (n = 35), which was reserved as a test set, providing the upper bound on the accuracy achievable in the domain adaptation experiments.
Testing the domain adaptation setup involved randomly matching the histograms of source domain samples with those of target domain samples. Since the number of test samples at the three target domain sites varied (a = 46, b = 169, and c = 20), random sampling from the entire target pool would strongly favor images from site b. To avoid this bias, the histogram matching experiments were organized into three approaches, one-to-one, leave-one-out, and all-to-one, maintaining an equal probability of selecting target samples irrespective of the number of images available at each site.
  • In the one-to-one approach, images from a single site were used for random histogram matching, and the model was tested across all three test sites.
  • In the leave-one-out approach, images from two sites were used for histogram matching, while the third site served as the test domain.
  • In the all-to-one approach, images from all the sites were used collectively.
For both the leave-one-out and all-to-one experiments, a site was first selected at random, followed by a random image from that site. This ensured an equal probability of image selection across sites, regardless of the number of images available at each location. Beyond addressing the sampling bias, these three strategies were also designed to simulate different levels of domain knowledge and operational scenarios in remote sensing. The one-to-one approach reflects a situation where only a single reference site is available, the leave-one-out approach tests the model's ability to generalize to unseen domains, and the all-to-one strategy evaluates performance when histogram references are pooled from multiple domains, mimicking a more generalized training setup. These configurations help assess the model's robustness to spectral variability and domain shift, which are key challenges in large-scale, multi-source habitat mapping. The experiments also provide a baseline for understanding whether ideal images for histogram matching can be identified. Similar approaches have shown promising results in aerial imagery, where histogram matching was found to be a lightweight yet effective alternative to adversarial domain adaptation methods [22]. Training was carried out with a batch size of 32 for up to 25 epochs, with early stopping after 5 epochs without improvement in the loss. A learning rate of 6 × 10⁻⁵ was used, as described in the original SegFormer implementation study [17]. The Adam optimizer was used for all experiments. Training and testing were carried out on an NVIDIA Tesla V100 GPU with 16 GB of memory within a high-performance computing (HPC) system.
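A condensed sketch of this training configuration and the site-balanced reference sampling, assuming the HuggingFace transformers implementation of SegFormer; the checkpoint name and helper function are illustrative, not taken from the study:

```python
import numpy as np
import torch
from transformers import SegformerForSemanticSegmentation

# MiT-b0 encoder pre-trained on ImageNet-1k; the decode head is re-initialized
# for the two classes used here (0 = sand/background, 1 = eelgrass).
model = SegformerForSemanticSegmentation.from_pretrained("nvidia/mit-b0", num_labels=2)
optimizer = torch.optim.Adam(model.parameters(), lr=6e-5)  # rate from the SegFormer paper

def pick_reference(site_pools: dict, rng: np.random.Generator):
    """Site-balanced sampling for leave-one-out and all-to-one runs: draw a
    site first, then an image within it, so site b's larger pool does not
    dominate the histogram references."""
    site = rng.choice(sorted(site_pools))
    pool = site_pools[site]
    return pool[rng.integers(len(pool))]
```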

2.5. Accuracy Assessment and Domain Alignment Comparison

Metrics suitable for image segmentation in remote sensing applications were selected [23]. Precision and recall evaluate the predicted segmentation with respect to the ground truth: precision is the proportion of predicted positive pixels that are actually positive, while recall is the proportion of all positive pixels that are correctly predicted. High recall represents better minimization of false negative predictions. The F1-score (Equation (2)) is the harmonic mean of precision and recall, representing both measures in a consolidated manner. F1-scores range between 0 and 1, where 1 indicates that all pixels were accurately predicted, while 0 indicates that no predictions were correct.
$F1\text{-}score = \dfrac{2 \times \mathrm{True\ Positives}}{2 \times \mathrm{True\ Positives} + \mathrm{False\ Negatives} + \mathrm{False\ Positives}}$ (2)
The intersection over union (IoU) (Equation (3)) measures the overlap between the predicted segmentation mask and the ground truth mask. Whereas the F1-score evaluates pixel-to-pixel matches, the IoU expresses the spatial match by comparing the shape of the prediction to that of the ground truth. It is calculated by dividing the area of intersection by the area of union between the ground truth and the predicted segmentation. Like the F1-score, the IoU ranges between 0 and 1, where 1 indicates that the predicted shape exactly matches the ground truth, while 0 indicates no overlap between the prediction and the ground truth.
$\mathrm{IoU} = \dfrac{\text{Area of intersection}}{\text{Area of union}}$ (3)
The F1-scores and IoU of each test sample were calculated and averaged for the entire test site.
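A minimal sketch of these per-sample metrics (Equations (2) and (3)) for binary masks; the function name is assumed for illustration:

```python
import numpy as np

def f1_and_iou(pred: np.ndarray, truth: np.ndarray):
    """Pixel-wise F1-score (Equation (2)) and IoU (Equation (3)) for a pair
    of binary eelgrass masks of the same shape."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = np.count_nonzero(pred & truth)    # true positives
    fp = np.count_nonzero(pred & ~truth)   # false positives
    fn = np.count_nonzero(~pred & truth)   # false negatives
    # If both masks are empty, treat the prediction as a perfect match.
    f1 = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    iou = tp / (tp + fp + fn) if (tp + fp + fn) else 1.0
    return f1, iou
```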
The Bhattacharyya distance (Equation (4)) was used to compare the normalized histograms of a source image and its histogram-matched version against those of the target image used for the matching. For two discrete probability distributions, P and Q, obtained from two images, we have the following equation:
$D_B(P, Q) = -\ln\big(BC(P, Q)\big)$ (4)
where the Bhattacharyya coefficient is $BC(P, Q) = \sum_{i=1}^{n} \sqrt{P_i Q_i}$; for normalized histograms, BC ranges from 0 to 1, and the distance $D_B$ is 0 for identical distributions and grows as the overlap decreases. The Bhattacharyya distance was averaged over the normalized histograms of the red, green, and blue channels of the source and target images. Additionally, to visualize the transformed classes within the above three images, a t-distributed stochastic neighbor embedding (t-SNE) plot of 100 randomly selected pixels from each class was prepared. The t-SNE visualization projects high-dimensional information onto a two-dimensional unitless plane, providing an intuitive comparison.
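A minimal sketch of this channel-averaged distance (Equation (4)); the 256-bin histogramming and function name are assumptions for illustration:

```python
import numpy as np

def bhattacharyya_distance(image_a: np.ndarray, image_b: np.ndarray) -> float:
    """Mean Bhattacharyya distance (Equation (4)) over the R, G, and B
    channels of two 8-bit images, using 256-bin normalized histograms."""
    distances = []
    for c in range(3):
        p, _ = np.histogram(image_a[..., c], bins=256, range=(0, 256))
        q, _ = np.histogram(image_b[..., c], bins=256, range=(0, 256))
        p = p / p.sum()  # normalize counts to probability distributions
        q = q / q.sum()
        bc = np.sum(np.sqrt(p * q))                # Bhattacharyya coefficient
        distances.append(-np.log(max(bc, 1e-12)))  # clamp to avoid log(0)
    return float(np.mean(distances))
```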

3. Results

The study tested histogram matching at three test sites: Horsens Fjord, Skive Fjord, and Lovns Broad (sites a, b, and c, respectively). The preliminary test of SegFormer without any fine-tuning on the eelgrass images resulted in sporadic multi-label predictions, indicating no prior seagrass segmentation knowledge; any subsequent segmentations are therefore based on the fine-tuning performed during our experiments. The Oracle model produced a mean F1-score of 0.76 and a mean IoU of 0.70. Its training time was 1 h 12 min.

3.1. One-to-One Tests

The one-to-one tests used HM references from a single site and evaluated the model on each test site individually. The highest F1-score was obtained for site b when site a was used for HM (Table 4). A similar IoU (0.44) was obtained for site b when sites a and c were used as histogram references. Test site c had the lowest accuracy when site a was used for HM. The maximum F1-scores obtained (0.46 and 0.47, both at site b) correspond to 61% and 62% of the Oracle model's F1-score, respectively. The average training time across the three training sessions was 42 min.

3.2. Leave-One-Out

The highest accuracy among all sites was observed at site b (Table 5). The metrics obtained for site b with the leave-one-out approach were the highest across all three types of experiments performed in this study. The accuracy observed for test site c was higher than in the one-to-one (Table 4) and all-to-one (Table 6) experiments. The F1-score and IoU obtained for site b were 68% of the benchmark Oracle's F1-score and IoU. The average training time across the three training regimes was 41 min.

3.3. All-to-One

Similar to the one-to-one and leave-one-out experiments, test site b showed the highest accuracy with the all-to-one approach (Table 6). The accuracy at site c was the lowest when all sites were used for histogram matching. The all-to-one approach showed the lowest accuracy metrics for all three sites compared with the one-to-one and leave-one-out tests. The highest F1-score and IoU obtained in this test, at site b, were 53% and 55% of the benchmark performance, respectively. The training time of the model was 41 min.

3.4. Domain Alignment Comparison and Limitations

The t-SNE plot (Figure 4) was used to visualize the domain gap alignment between pixels from the UAV imagery and the orthophotos. The t-SNE transforms high-dimensional pixel data (pixels of two classes sampled from three 3-channel images) into two-dimensional unitless coordinates. The eelgrass and sand pixels sampled from the UAV image (Figure 4: gray and brown dots) show a distinct separation from pixels of the same classes obtained from the orthophotos (Figure 4: pink and green dots) before histogram matching. The UAV pixels after histogram matching (Figure 4: cyan and orange) align with those from the orthophotos.
This transformation of the pixels before and after histogram matching was quantified using the Bhattacharyya distance (DB), which measures the similarity between two probability distributions: values close to 0 indicate high overlap, while larger values indicate less overlap. In this study, DB was estimated for the histograms of two randomly picked UAV and orthophoto images before and after histogram matching. The distance prior to histogram matching was 0.81; after matching, it was 0.13, a reduction of 0.68. Some of the test images and the respective predictions are visualized in Figure 5. The F1-score and IoU were averaged over each test site.
Certain misclassifications were observed at the maximum depths in the images, as well as in shallow regions where exposed sand appears different from the areas with eelgrass. Prediction in the maximum-depth region of the Skive Fjord image (Figure 6a) resulted in an eelgrass region being predicted as sand. Another misclassification was observed in shallow areas, where some submerged pixels were classified as eelgrass (Figure 6b).

3.5. Model Implementation at Lovns Broad

To demonstrate the system's mapping capability over large coastal areas, the model with the highest accuracy in the leave-one-out test (F1-score = 0.52) was used to map the eelgrass at Lovns Broad, an enclosed part of the Limfjorden fjord system (Figure 7). The coastal region with a depth of less than 3 m was selected for prediction, excluding the deeper region (>3 m), considering the depth limit of eelgrass in Danish waters. The orthophoto images covering the region were processed in subsets to generate an eelgrass prediction map. The result shows eelgrass mapped over a large coastal area using the domain-adapted model; the unprocessed regions comprise terrestrial land surface or deep water, as mentioned above.
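A sketch of how such a large-area map can be assembled by tiling the orthophoto into 512 × 512 patches and stitching the per-patch class maps; the model/processor objects follow the HuggingFace SegFormer API, and the non-overlapping tiling logic is an assumption about the workflow, not code from the study:

```python
import numpy as np
import torch

@torch.no_grad()
def predict_scene(model, processor, ortho: np.ndarray, tile: int = 512) -> np.ndarray:
    """Tile a large orthophoto (H x W x 3) into non-overlapping patches, run
    SegFormer on each, and stitch the argmax class maps into a scene mask."""
    height, width, _ = ortho.shape
    mask = np.zeros((height, width), dtype=np.uint8)
    for y in range(0, height - tile + 1, tile):
        for x in range(0, width - tile + 1, tile):
            patch = ortho[y:y + tile, x:x + tile]
            inputs = processor(images=patch, return_tensors="pt")
            logits = model(**inputs).logits  # SegFormer outputs at 1/4 resolution
            upsampled = torch.nn.functional.interpolate(
                logits, size=(tile, tile), mode="bilinear", align_corners=False)
            mask[y:y + tile, x:x + tile] = (
                upsampled.argmax(dim=1)[0].numpy().astype(np.uint8))
    return mask  # 1 = eelgrass, 0 = sand/background
```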

4. Discussion

The study demonstrates the mapping of seagrass meadows spread over large areas with a single deep learning model and two multi-source datasets. Although high-resolution imaging sensors on UAVs have been widely used for mapping seagrass with the PBIA and OBIA methods [24,25,26], as well as with deep learning [27], their limitation lies in their inability to cover large areas at once [28,29]. Both multi-rotor and fixed-wing UAVs are limited in aerial coverage compared with manned aircraft, due to battery constraints and the visual line of sight (VLoS) required for flying drones, rendering them less applicable for seagrass management and conservation [30,31]. Also, the spatial bias in flying UAVs may restrict mapping to areas known to host seagrass and accessible within VLoS, excluding other potential seagrass meadows. Satellite imagery, on the other hand, can capture images at high resolution but may suffer from interference from sun glint and cloud cover and requires sophisticated atmospheric correction prior to use [32]. Moreover, users may not opt for commercial satellite imagery, due to the budgetary requirements of large-scale, high-resolution satellite images [33].
While the methodological components of this study (SegFormer and histogram matching) are individually established, their integration for mapping submerged aquatic vegetation across multi-source, multi-resolution datasets is novel in the context of coastal habitat monitoring. The approach offers a lightweight, scalable alternative to more complex domain adaptation techniques, making it accessible for ecological applications. The spatial patterns observed, particularly the consistent performance at Skive Fjord, suggest that image contrast and water clarity are key factors influencing eelgrass detectability. These insights can inform future monitoring strategies and image acquisition planning for coastal management.
By leveraging the technique described in this study, it becomes possible to utilize image data available from various platforms to target larger areas, with images collected over a longer duration. This is crucial for investigating changes in habitat and relating them to the environmental factors of a water body. The overall averaged metrics reported here for each study site are lowered by the low accuracy of some of the images. This is observed in Figure 5c,d, where the predictions may be acceptable for mapping applications but may still be considered substandard from an image analysis perspective. Although the F1-score and IoU are both common image segmentation metrics, the IoU, by considering the shape of the prediction, evaluates more harshly than the F1-score and hence yields lower values. The maximum accuracy achieved in the tests is 68% of the benchmark accuracy demonstrated by the Oracle model. Although this is not close to the benchmark, it indicates that the model was not overfitted to the existing dataset, which is crucial for achieving generalized predictions in time and space. The reduction of 0.68 in the Bhattacharyya distance after matching the histograms indicates a substantial domain gap reduction, which is also visualized in the t-SNE plot (Figure 4). The Bhattacharyya distance can thus serve as a metric for quantifying the level of spectral shift between the available domains.
The advantage of SegFormer + HM lies in its simplicity of implementation, without additional neural networks or large datasets. Other state-of-the-art domain adaptation methods were not tested alongside histogram matching, as a similar investigation on terrestrial remote sensing imagery of the same spatial resolution but with a spectral shift found histogram matching to perform comparably to the adversarial networks ColorMapGAN and CycleGAN [34]. Secondly, the training time of SegFormer + HM depends on the training data size and is unaffected by the number of target domain images used for histogram matching.

4.1. Limitations and Constraints

The results at the individual sites showed site b (Skive Fjord) to have the highest accuracy on both metrics in all tests. This consistently better accuracy indicates that performance depends on the properties of the images being tested. Therefore, estimating the contrasts and the Bhattacharyya distances of image histograms prior to histogram matching may prove useful. Matching the histogram of a low-contrast image in order to segment a higher-contrast image yielded better accuracy than using a higher-contrast image to segment a lower-contrast one. The ability of histogram matching to reduce the domain gap is attributable to the fact that the same habitat, with similar spectral features, is captured in the two domains. However, intertidal regions of sparse seagrass and sandy substratum may exhibit extreme spectral dissimilarity, degrading the performance of histogram matching.
Water turbidity can be one of the factors influencing image contrast and thereby image quality [35]. Since Skive Fjord (site b) and Lovns Broad (site c) lie within the same water body, differences in water quality at the two sites are the likely cause of this image difference. Among the physical properties of the locations, the primary factor influencing the accuracy of individual images was depth. The maximum and minimum depths at which eelgrass was observed were 2.77 and 0.9 m at site a and 2.3 and 0.4 m at site b, respectively. As depth increased, segmentation accuracy decreased due to the reduced light conditions in the water column (Figure 6a). Similarly, in very shallow water (depth < 0.5 m), the difference between exposed and submerged sand also resulted in misclassification at site b (Figure 6b), even though the overall accuracy was high for the leave-one-out test.
The biological properties of eelgrass plants and their density may also affect segmentation accuracy. The growth period of submerged plants occurs when optimum levels of sunlight are available, from May to October [36]. Since the orthophotos are collected during spring, we expected uniform eelgrass growth within the test orthophotos. Wind-generated surface waves were observed to cause salt-and-pepper noise (speckle) in the images, where the sides of waves receiving sunlight appear brighter than the opposite sides. Although this does not affect segmentation significantly, knowledge of wind speed forecasts can be beneficial when planning UAV flights.

4.2. Potential for Coastal Habitat Mapping

Wide coverage of coastal areas can also be obtained using very high-resolution satellites, such as PlanetScope and Pléiades, and the SegFormer + HM setup combined with high-resolution satellite images can be extended to map other marine habitats. This can prove economical for mapping vast tropical marine habitats such as coral reefs and seagrass meadows, where water clarity is superior to that of temperate regions. However, matching the histograms of UAV images to satellite images would require prior adjustment of the bit depth of the UAV images, as satellite images are acquired at 12- to 16-bit depth, while UAV cameras are generally limited to 8-bit depth. The scalability of SegFormer + HM therefore depends on this bit-depth (radiometric resolution) difference, as well as on spatial resolution. Secondly, the optical characteristics of the water with respect to suspended particles, phytoplankton, CDOM, and wind speed [33], which impact the detectability of eelgrass, need further investigation in order to strengthen the mapping capabilities of this setup. In situ measurement of turbidity during orthophoto flights can give insight into the levels of turbidity affecting eelgrass detectability.
Considering that the eelgrass depth limit is a bioindicator of water clarity [37], estimating this depth limit from the mapping process would be of high ecological significance. Another possible extension is mapping co-occurring habitats using multi-label training sets, as eelgrass has been observed to occur alongside other submerged vegetation, such as macroalgae, and shellfish such as blue mussels (Mytilus edulis).

5. Conclusions

This study demonstrates a practical and scalable approach for mapping subtidal seagrass habitats using unsupervised domain adaptation (UDA) with the SegFormer model and histogram matching. The method effectively addresses the challenges posed by spatial resolution differences and spectral shifts in multi-source aerial imagery. By training the model on UAV-acquired high-resolution imagery and applying it to lower-resolution orthophotos, we achieved meaningful segmentation results across three Danish estuarine locations: Horsens Fjord, Skive Fjord, and Lovns Broad.
Among the tested sites, Skive Fjord yielded the highest performance in the leave-one-out setup, achieving an F1-score of 0.52 and an IoU of 0.48, which corresponds to 68% of the benchmark model’s performance (F1 = 0.76, IoU = 0.70). These results highlight the potential of this UDA strategy for generalization across spatial, temporal, and resolution domains, even when trained on limited labeled data. The study also found that histogram matching from lower-contrast images improved the model’s performance on higher-contrast targets, providing a practical guideline for selecting reference images in future applications.
While the absolute accuracy metrics may not appear high, overfitting to the current dataset was avoided, and the approach enables the generalized, large-scale mapping of underwater vegetation in data-scarce environments, offering a cost-effective alternative to traditional methods. The successful mapping of eelgrass in Lovns Broad, using a model trained on other sites, further demonstrates the method's transferability and operational potential.
This work contributes a simple yet effective UDA pipeline for coastal habitat mapping, with implications for monitoring and managing seagrass ecosystems across broad spatial and temporal scales.

Author Contributions

Conceptualization, S.P., K.T., and A.T.; data curation, S.P., A.T., M.P., and S.H.B.; formal analysis, S.P., A.T., S.H.B., and M.P.; investigation, S.P., S.H.B., and M.P.; methodology, S.P., S.H.B., and M.P.; validation, S.P., A.T., S.H.B., and M.P.; supervision, K.T.; writing—original draft, all authors; writing—review and editing, all authors. All authors have read and agreed to the published version of the manuscript.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Data Availability Statement

The original data presented in the study are openly available on HuggingFace at https://doi.org/10.57967/hf/5205.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Annotation Protocol for the Segmentation of Shallow Water Eelgrass Habitats

1. Image selection: Images of areas with prior knowledge of underwater features are selected.
2. Reproject to UTM: Geo-referenced images of the above areas are re-projected into UTM projections.
3. Use of in situ observations: In situ point information is overlaid on the images, if available.
4. Set the visualization for annotation: Visualization properties during segmentation are pre-decided based on the airplane orthophoto resolution (12.5 cm) and are kept fixed during the annotation process. An image scale of 1:125 is used, i.e., 1 cm on the computer screen represents 125 cm of ground distance. The zoom level is maintained at 100%.
5. Inclusion/exclusion: Eelgrass patches less than 1 m in diameter are excluded from annotation. Similarly, sand patches smaller than 1 m are excluded.
6. Create vector annotations in the UTM projection: Geographic vectors carrying the value 1 for eelgrass and 0 for sand/background are annotated over the image while maintaining the properties set out above. The vector files are then transformed into raster images (see the sketch after this list).
7. Check the spatial overlap of the raster: The UAV image and its corresponding annotated raster image are checked for their spatial overlap. Patches of image annotations are cropped, ensuring that no overlap occurs between the patches.
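A minimal sketch of the rasterization in steps 6 and 7, using geopandas and rasterio; the file paths and the "class" attribute name are hypothetical, and the protocol does not prescribe a specific toolchain:

```python
import geopandas as gpd
import rasterio
from rasterio import features

def rasterize_annotations(vector_path: str, image_path: str, out_path: str) -> None:
    """Burn vector annotations (1 = eelgrass, 0 = sand/background) into a
    single-band mask aligned with the UAV image's grid and UTM projection."""
    gdf = gpd.read_file(vector_path)
    with rasterio.open(image_path) as src:
        meta = src.meta.copy()
        shapes = ((geom, int(val)) for geom, val in zip(gdf.geometry, gdf["class"]))
        mask = features.rasterize(
            shapes, out_shape=(src.height, src.width),
            transform=src.transform, fill=0, dtype="uint8")
    meta.update(count=1, dtype="uint8")  # same extent/CRS, one binary band
    with rasterio.open(out_path, "w", **meta) as dst:
        dst.write(mask, 1)
```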

References

  1. Barbier, E.B.; Hacker, S.D.; Kennedy, C.; Koch, E.W.; Stier, A.C.; Silliman, B.R. The Value of Estuarine and Coastal Ecosystem Services. Ecol. Monogr. 2011, 81, 169–193. [Google Scholar] [CrossRef]
  2. Duarte, C.M.; Cebrián, J. The Fate of Marine Autotrophic Production. Limnol. Oceanogr. 1996, 41, 1758–1766. [Google Scholar] [CrossRef]
  3. Duffy, J.E.; Benedetti-Cecchi, L.; Trinanes, J.; Muller-Karger, F.E.; Ambo-Rappe, R.; Boström, C.; Buschmann, A.H.; Byrnes, J.; Coles, R.G.; Creed, J.; et al. Toward a Coordinated Global Observing System for Seagrasses and Marine Macroalgae. Front. Mar. Sci. 2019, 6, 317. [Google Scholar] [CrossRef]
  4. Cole, S.G.; Moksnes, P.O. Valuing Multiple Eelgrass Ecosystem Services in Sweden: Fish Production and Uptake of Carbon and Nitrogen. Front. Mar. Sci. 2016, 2, 121. [Google Scholar] [CrossRef]
  5. Plummer, M.L.; Harvey, C.J.; Anderson, L.E.; Guerry, A.D.; Ruckelshaus, M.H. The Role of Eelgrass in Marine Community Interactions and Ecosystem Services: Results from Ecosystem-Scale Food Web Models. Ecosystems 2013, 16, 237–251. [Google Scholar] [CrossRef]
  6. Griffiths, L.L.; Connolly, R.M.; Brown, C.J. Critical Gaps in Seagrass Protection Reveal the Need to Address Multiple Pressures and Cumulative Impacts. Ocean Coast. Manag. 2020, 183, 104946. [Google Scholar] [CrossRef]
  7. Orth, R.J.; Dennison, W.C.; Duarte, C.M.; Fourqurean, J.W.; Heck, K.L.; Hughes, A.R.; Kendrick, G.A.; Kenworthy, W.J.; Short, F.T.; Waycott, M.; et al. A Global Crisis for Seagrass Ecosystems. Bioscience 2006, 56, 987–996. [Google Scholar] [CrossRef]
  8. Valdemarsen, T.; Canal-Verges, P.; Kristensen, E.; Holmer, M.; Kristiansen, M.D.; Flindt, M.R. Vulnerability of Zostera marina Seedlings to Physical Stress. Mar. Ecol. Prog. Ser. 2010, 418, 119–130. [Google Scholar] [CrossRef]
  9. Waycott, M.; Duarte, C.M.; Carruthers, T.J.B.; Orth, R.J.; Dennison, W.C.; Olyarnik, S.; Calladine, A.; Fourqurean, J.W.; Heck, K.L.; Hughes, A.R.; et al. Accelerating Loss of Seagrasses across the Globe Threatens Coastal Ecosystems. Proc. Natl. Acad. Sci. USA 2009, 106, 12377–12381. [Google Scholar] [CrossRef] [PubMed]
  10. Lønborg, C.; Thomasberger, A.; Stæhr, P.A.U.; Stockmarr, A.; Sengupta, S.; Rasmussen, M.L.; Nielsen, L.T.; Hansen, L.B.; Timmermann, K. Submerged Aquatic Vegetation: Overview of Monitoring Techniques Used for the Identification and Determination of Spatial Distribution in European Coastal Waters. Integr. Environ. Assess. Manag. 2021, 18, 892–908. [Google Scholar] [CrossRef] [PubMed]
  11. Larkum, A.W.D.; Kendrick, G.A.; Ralph, P.J. Seagrasses of Australia: Structure, Ecology and Conservation; Springer: Cham, Switzerland, 2018; ISBN 9783319713540. [Google Scholar]
  12. Thomasberger, A.; Nielsen, M.M.; Flindt, M.R.; Pawar, S.; Svane, N. Comparative Assessment of Five Machine Learning Algorithms for Supervised Object-Based Classification of Submerged Seagrass Beds Using High-Resolution UAS Imagery. Remote Sens. 2023, 15, 3600. [Google Scholar] [CrossRef]
  13. Ahmed, S.F.; Bin Alam, M.S.; Hassan, M.; Rozbu, M.R.; Ishtiak, T.; Rafa, N.; Mofijur, M.; Shawkat Ali, A.B.M.; Gandomi, A.H. Deep Learning Modelling Techniques: Current Progress, Applications, Advantages, and Challenges. Artif. Intell. Rev. 2023, 56, 13521–13617. [Google Scholar] [CrossRef]
  14. Tallam, K.; Nguyen, N.; Ventura, J.; Fricker, A.; Calhoun, S.; O’Leary, J.; Fitzgibbons, M.; Robbins, I.; Walter, R.K. Application of Deep Learning for Classification of Intertidal Eelgrass from Drone-Acquired Imagery. Remote Sens. 2023, 15, 2321. [Google Scholar] [CrossRef]
  15. Bhatnagar, S.; Gill, L.; Ghosh, B. Drone Image Segmentation Using Machine and Deep Learning for Mapping Raised Bog Vegetation Communities. Remote Sens. 2020, 12, 2602. [Google Scholar] [CrossRef]
  16. Lowe, S.C.; Misiuk, B.; Xu, I.; Abdulazizov, S.; Baroi, A.R.; Bastos, A.C.; Best, M.; Ferrini, V.; Friedman, A.; Hart, D.; et al. BenthicNet: A Global Compilation of Seafloor Images for Deep Learning Applications. Sci. Data 2025, 12, 230. [Google Scholar] [CrossRef] [PubMed]
  17. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.M.; Luo, P. SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers. Adv. Neural Inf. Process. Syst. 2021, 15, 12077–12090. [Google Scholar]
  18. Hoffman, J.; Tzeng, E.; Park, T.; Zhu, J.Y.; Isola, P.; Saenko, K.; Efros, A.A.; Darrell, T. CyCADA: Cycle-Consistent Adversarial Domain Adaptation. In Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholm, Sweden, 10–15 July 2018; Volume 5, pp. 3162–3174. [Google Scholar]
  19. Islam, K.A.; Hill, V.; Schaeffer, B.; Zimmerman, R.; Li, J. Semi-Supervised Adversarial Domain Adaptation for Seagrass Detection Using Multispectral Images in Coastal Areas. Data Sci. Eng. 2020, 5, 111–125. [Google Scholar] [CrossRef] [PubMed]
  20. Tasar, O.; Happy, S.L.; Tarabalka, Y.; Alliez, P. ColorMapGAN: Unsupervised Domain Adaptation for Semantic Segmentation Using Color Mapping Generative Adversarial Networks. IEEE Trans. Geosci. Remote Sens. 2020, 58, 7178–7193. [Google Scholar] [CrossRef]
  21. Thomasberger, A.; Nielsen, M.M. UAV-Based Subsurface Data Collection Using a Low-Tech Ground-Truthing Payload System Enhances Shallow-Water Monitoring. Drones 2023, 7, 647. [Google Scholar] [CrossRef]
  22. Ghosh Mondal, T.; Shi, Z.; Zhang, H.; Chen, G. Class-Wise Histogram Matching-Based Domain Adaptation in Deep Learning-Based Bridge Element Segmentation. J. Civ. Struct. Health Monit. 2025, 15, 1973–1989. [Google Scholar] [CrossRef]
  23. Maxwell, A.E.; Warner, T.A.; Guillén, L.A. Accuracy Assessment in Convolutional Neural Network-Based Deep Learning Remote Sensing Studies—Part 2: Recommendations and Best Practices. Remote Sens. 2021, 13, 2450. [Google Scholar] [CrossRef]
  24. McKenzie, L.J.; Langlois, L.A.; Roelfsema, C.M. Improving Approaches to Mapping Seagrass within the Great Barrier Reef: From Field to Spaceborne Earth Observation. Remote Sens. 2022, 14, 2604. [Google Scholar] [CrossRef]
  25. Hamad, I.Y.; Staehr, P.A.U.; Rasmussen, M.B.; Sheikh, M. Drone-Based Characterization of Seagrass Habitats in the Tropical Waters of Zanzibar. Remote Sens. 2022, 14, 680. [Google Scholar] [CrossRef]
  26. Krause, J.R.; Hinojosa-Corona, A.; Gray, A.B.; Watson, E.B. Emerging Sensor Platforms Allow for Seagrass Extent Mapping in a Turbid Estuary and from the Meadow to Ecosystem Scale. Remote Sens. 2021, 13, 3861. [Google Scholar] [CrossRef]
  27. Tahara, S.; Sudo, K.; Yamakita, T.; Nakaoka, M. Species Level Mapping of a Seagrass Bed Using an Unmanned Aerial Vehicle and Deep Learning Technique. PeerJ 2022, 10, e14017. [Google Scholar] [CrossRef] [PubMed]
  28. Carpenter, S.; Byfield, V.; Felgate, S.L.; Price, D.M.; Andrade, V.; Cobb, E.; Strong, J.; Lichtschlag, A.; Brittain, H.; Barry, C.; et al. Using Unoccupied Aerial Vehicles (UAVs) to Map Seagrass Cover from Sentinel-2 Imagery. Remote Sens. 2022, 14, 477. [Google Scholar] [CrossRef]
  29. Price, D.M.; Felgate, S.L.; Huvenne, V.A.I.; Strong, J.; Carpenter, S.; Barry, C.; Lichtschlag, A.; Sanders, R.; Carrias, A.; Young, A.; et al. Quantifying the Intra-Habitat Variation of Seagrass Beds with Unoccupied Aerial Vehicles (UAVs). Remote Sens. 2022, 14, 480. [Google Scholar] [CrossRef]
  30. Jessin, J.; Heinzlef, C.; Long, N.; Serre, D. A Systematic Review of UAVs for Island Coastal Environment and Risk Monitoring: Towards a Resilience Assessment. Drones 2023, 7, 206. [Google Scholar] [CrossRef]
  31. Elma, E.; Gaulton, R.; Chudley, T.R.; Scott, C.L.; East, H.K.; Westoby, H.; Fitzsimmons, C. Evaluating UAV-Based Multispectral Imagery for Mapping an Intertidal Seagrass Environment. Aquat. Conserv. Mar. Freshw. Ecosyst. 2024, 34, e4230. [Google Scholar] [CrossRef]
  32. Roelfsema, C.M.; Lyons, M.; Kovacs, E.M.; Maxwell, P.; Saunders, M.I.; Samper-Villarreal, J.; Phinn, S.R. Multi-Temporal Mapping of Seagrass Cover, Species and Biomass: A Semi-Automated Object Based Image Analysis Approach. Remote Sens. Environ. 2014, 150, 172–187. [Google Scholar] [CrossRef]
  33. Nahirnick, N.K.; Reshitnyk, L.; Campbell, M.; Hessing-Lewis, M.; Costa, M.; Yakimishyn, J.; Lee, L. Mapping with Confidence; Delineating Seagrass Habitats Using Unoccupied Aerial Systems (UAS). Remote Sens. Ecol. Conserv. 2019, 5, 121–135. [Google Scholar] [CrossRef]
  34. Yaras, C.; Kassaw, K.; Huang, B.; Bradbury, K.; Malof, J.M. Randomized Histogram Matching: A Simple Augmentation for Unsupervised Domain Adaptation in Overhead Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2024, 17, 1988–1998. [Google Scholar] [CrossRef]
  35. Huber, S.; Hansen, L.B.; Nielsen, L.T.; Rasmussen, M.L.; Sølvsteen, J.; Berglund, J.; Paz von Friesen, C.; Danbolt, M.; Envall, M.; Infantes, E.; et al. Novel Approach to Large-Scale Monitoring of Submerged Aquatic Vegetation: A Nationwide Example from Sweden. Integr. Environ. Assess. Manag. 2022, 18, 909–920. [Google Scholar] [CrossRef] [PubMed]
  36. Eriander, L. Light Requirements for Successful Restoration of Eelgrass (Zostera marina L.) in a High Latitude Environment–Acclimatization, Growth and Carbohydrate Storage. J. Exp. Mar. Bio. Ecol. 2017, 496, 37–48. [Google Scholar] [CrossRef]
  37. Krause-Jensen, D.; Greve, T.M.; Nielsen, K. Eelgrass as a Bioindicator under the European Water Framework Directive. Water Resour. Manag. 2005, 19, 63–75. [Google Scholar] [CrossRef]
Figure 1. Summary of the image segmentation methods applicable for habitat mapping. (a) The pixel-based algorithm groups the pixels based on their spectral similarity. (b) Object-based methods go beyond pixels but are only applicable to a limited dataset. (c) CNN-based networks provide high accuracy and robust implementation. (d) Transformer-based networks such as SegFormer are more scalable and robust than CNNs.
Figure 2. Locations of the training and testing images acquired in coastal water bodies in the Limfjord (top left) and the Horsens and Vejle Fjords (top right) in Danish coastal waters. The numbers within red circles indicate training image locations (1–4), the letters within green circles indicate test image locations (a–c), and the black box (□) over the Limfjorden map indicates the Lovns Broad area.
Figure 3. (a) Flowchart of the steps involved in the study and (b) an example of the histogram-matching process between the source and target domains. The source image (top left) obtained by the UAV at the eelgrass transplantation site of Horsens Fjord indicates a maximum brightness intensity of >150 in its green channel, as seen in the histogram. The target domain (top right) obtained by the airplane camera over the same location contains a lower brightness intensity of <100 in all channels. This image is used to augment the source image via histogram matching. The annotated ground truth depicts eelgrass (green) and sand (yellow), along with the augmented image output that is used as a training set for SegFormer.
Figure 4. The eelgrass and sand pixels in the RGB space of the UAV and airplane-ortho, along with the histogram-matched UAV image (HM) visualized in a two-dimensional space. The axes represent unitless coordinates for the low-dimensional transformation.
Figure 5. Examples of test images (left column) and predictions, together with their F1-scores (middle column), compared alongside the respective ground truths (right column).
Figure 6. Examples of test images (left column) and misclassifications observed in the predictions (middle column), along with the respective ground truths (right column).
Figure 7. Area of the Lovns Broad (red rectangle) located in Limfjorden, mapped using the unsupervised domain adaptation. The eelgrass is represented in green, while sand is represented using yellow, along with locations with no data from the area.
Table 1. Capture locations of the images, along with maximum depth, image area, eelgrass cover, and the year of collection of the source and target domain images obtained over Danish coastal waters.

UAV Image Locations (Source Domain):

| Site | Location | Max Depth (m) | Image Area (m²) | Eelgrass Cover (m²) | Eelgrass Cover (%) | Month and Year of Collection |
|------|----------|---------------|-----------------|---------------------|--------------------|------------------------------|
| 1 | Horsens Fjord | 3 | 564,045.2 | 56,280 | 10 | August 2020 |
| 2 | Lovns Broad | 1 | 98,675.42 | 40,030.8 | 40 | February 2021 |
| 3 | Nissum Broad | 1.1 | 147,839.6 | 24,284.84 | 16 | July 2020 |
| 4 | Nykøbing Mors | 0.9 | 88,393 | 60,955.23 | 68 | April 2021 |

Orthophoto Image Locations (Target Domain):

| Site | Location | Max Depth (m) | Image Area (m²) | Eelgrass Cover (m²) | Eelgrass Cover (%) | Year of Collection |
|------|----------|---------------|-----------------|---------------------|--------------------|--------------------|
| a | Horsens Fjord | 3 | 184,416 | 38,048 | 20.6 | 2023 |
| b | Skive Fjord | 2.1 | 728,894.05 | 179,373.09 | 24.06 | 2023 |
| c | Lovns Broad | 1 | 98,673.85 | 54,265.54 | 55 | 2021 |
Table 2. Contrasts in the color channels of the source and target domain images.

| Channel | Source (UAV): Horsens Fjord | Source (UAV): Lovns Broad | Source (UAV): Nissum Broad | Source (UAV): Nykøbing Mors | Target (Orthophoto): Horsens Fjord | Target (Orthophoto): Skive Fjord | Target (Orthophoto): Lovns Broad |
|---------|------|------|------|------|------|------|------|
| Red | 1 | 0.95 | 1 | 0.67 | 0.73 | 1 | 0.75 |
| Green | 1 | 0.82 | 0.91 | 0.56 | 0.61 | 0.68 | 0.67 |
| Blue | 0.95 | 0.77 | 0.86 | 0.6 | 0.53 | 0.6 | 0.63 |
Table 3. Training, testing, and validation sample sizes from each site.

| Site | Location | Training and Validation Samples |
|------|----------|---------------------------------|
| 1 | Horsens Fjord | 1756 |
| 2 | Lovns Broad | 256 |
| 3 | Nissum Broad | 661 |
| 4 | Nykøbing Mors | 148 |

| Site | Location | Test Samples |
|------|----------|--------------|
| a | Horsens Fjord (airplane) | 45 |
| b | Skive Fjord (airplane) | 169 |
| c | Lovns Broad (airplane) | 20 |
Table 4. Average of F1-scores and IoU values obtained for one-to-one histogram matches and tests.

| HM Site | Test Site a (Mean F1 / Mean IoU) | Test Site b (Mean F1 / Mean IoU) | Test Site c (Mean F1 / Mean IoU) |
|---------|----------------------------------|----------------------------------|----------------------------------|
| a | 0.45 / 0.42 | 0.46 / 0.44 | 0.06 / 0.04 |
| b | 0.43 / 0.40 | 0.43 / 0.40 | 0.11 / 0.08 |
| c | 0.40 / 0.38 | 0.47 / 0.44 | 0.17 / 0.15 |
Table 5. Average of F1-scores and IoU values obtained for leave-one-out histogram matching.

| HM Sites | Test Site | Mean F1-Score | Mean IoU |
|----------|-----------|---------------|----------|
| b and c | a | 0.32 | 0.31 |
| c and a | b | 0.52 | 0.48 |
| a and b | c | 0.47 | 0.43 |
Table 6. Average of F1-scores and IoU values for all-to-one tests.

| Test Site | Mean F1-Score | Mean IoU |
|-----------|---------------|----------|
| a | 0.34 | 0.32 |
| b | 0.41 | 0.39 |
| c | 0.07 | 0.05 |
