Article

Integrating AI for In-Depth Segmentation of Coastal Environments in Remote Sensing Imagery

by Pelagia Drakopoulou 1,*, Paraskevi Tzouveli 1, Aikaterini Karditsa 2 and Serafim Poulos 3,4
1 Artificial Intelligence and Learning Systems Laboratory, School of Electrical and Computer Engineering, National Technical University of Athens, 9, Iroon Polytechniou Str., Zografou, 15772 Athens, Greece
2 Department of Port Management and Shipping, National and Kapodistrian University of Athens, 34400 Psachna, Evia, Greece
3 Laboratory of Physical Geography, Department of Geology and Geoenvironment, Panepistimoupolis, Zografou, 15784 Athens, Greece
4 Institute of Applied and Computational Mathematics, Foundation for Research and Technology-Hellas, 100, Nikolaou Plastira Str., Vassilika Vouton, 70013 Heraklion, Crete, Greece
* Author to whom correspondence should be addressed.
Remote Sens. 2026, 18(2), 325; https://doi.org/10.3390/rs18020325
Submission received: 7 December 2025 / Revised: 9 January 2026 / Accepted: 14 January 2026 / Published: 19 January 2026

Highlights

What are the main findings?
  • Direct fine-tuning on the Greek Coastline dataset captures region-specific coastal features but suffers from limited generalization and weaker class separation due to data scarcity.
  • A two-stage fine-tuning strategy significantly improves segmentation accuracy and class differentiation, particularly for visually similar and transitional coastal classes.
What are the implications of the main finding?
  • Progressive domain adaptation using a large, pre-labeled coastal dataset is crucial for achieving reliable, high-resolution segmentation in complex coastal environments.
  • Loss reweighting and staged training enhance performance without architectural changes, making the approach scalable to other data-scarce coastal regions.

Abstract

Mapping coastal landforms is critical for the sustainable management of ecosystems influenced by both natural dynamics and human activity. This study investigates the application of Transformer-based semantic segmentation models for pixel-level classification of key surface types such as water, sandy shores, rocky areas, vegetation, and built structures. We utilize a diverse, multi-resolution dataset that includes NAIP (1 m), Quadrangle (6 m), Sentinel-2 (10 m), and Landsat-8 (15 m) imagery from U.S. coastlines, along with high-resolution aerial images of the Greek coastline provided by the Hellenic Land Registry. Due to the lack of labeled Greek data, models were pre-trained on U.S. datasets and fine-tuned using a manually annotated subset of Greek images. We evaluate the performance of three advanced Transformer architectures, with Mask2Former achieving the most robust results, further improved through a coastal-class weighted focal loss to enhance boundary precision. The findings demonstrate that Transformer-based models offer an effective, scalable, and cost-efficient solution for automated coastal monitoring. This work highlights the potential of AI-driven remote sensing to replace or complement traditional in-situ surveys, and lays the foundation for future research in multimodal data integration and regional adaptation for environmental analysis.

1. Introduction

Coastal environments are dynamic and ecologically critical systems that provide essential services such as shoreline protection, biodiversity support, and economic benefits [1,2,3,4]. However, these areas are increasingly under pressure due to both natural processes and human activities, including urbanization, infrastructure development, and climate change.
Coastal zones across the European Mediterranean are experiencing growing environmental pressures, with approximately 20% of the total shoreline currently being affected by erosion [5,6]. Greece, which has a coastline exceeding 16,000 km and more than 6000 beach zones [7], faces an even more critical situation: over one-third of its national shoreline is undergoing active erosion, with loose-sediment beaches being the most vulnerable [8]. Future projections under high-emission (SSP5-8.5) IPCC scenarios indicate that up to 99% of these beaches could retreat by more than 20% of their maximum width, around 72% may experience retreat exceeding 50%, and up to 20% could vanish entirely by the end of the century [9,10]. These changes pose significant threats to ecosystems [11], tourism-based economies [12], and cultural heritage [13], underscoring the socio-economic consequences for coastal communities [14] and the need for accurate, scalable coastal monitoring.
Traditional monitoring methods, such as in situ surveys or manual image interpretation, are often expensive, labor-intensive, and spatially constrained. Remote sensing (RS) technologies offer a scalable solution through satellite and aerial imagery, but their effective use is limited by the complexity of coastal landscapes and the need for detailed, localized interpretation.
Recent developments in Artificial Intelligence (AI), particularly in deep learning, have enabled more advanced analysis of geospatial data. Semantic segmentation, which classifies each pixel in an image into predefined categories, has become a valuable tool to extract environmental characteristics from RS images. Convolutional Neural Networks (CNNs) have been widely used in this domain due to their capacity for hierarchical feature learning [15]. Building on these advances, Transformer-based models such as MaskFormer and Mask2Former have recently demonstrated superior performance, owing to their ability to capture long-range dependencies and richer contextual information [16,17].
Despite these advances, coastal applications face a notable bottleneck: the limited availability of high-resolution, annotated datasets specific to coastal environments. Many state-of-the-art models are pretrained on urban or generic datasets (e.g., Cityscapes [18], ADE20K [19]), which do not generalize well to heterogeneous shorelines. The Coast Train dataset [20] represents a significant step toward addressing this, offering labeled coastal imagery from the United States. However, regional data gaps remain, particularly in morphologically diverse coastlines such as Greece, where features like tidal zones, sediment transitions, and dune systems are underrepresented in training data.
Manual annotation of coastal imagery is also a major barrier. It requires detailed pixel-wise labeling, often supplemented by in situ data or expensive sensing techniques such as LiDAR. This limits the operational deployment of AI models, particularly in data-scarce or remote regions.
This study proposes a two-stage semantic segmentation pipeline to address these challenges. Specifically, we achieve the following:
  • Fine-tune transformer-based semantic segmentation models on the publicly available Coast Train dataset, covering various coastal imagery.
  • Adapt the pretrained models to the Greek coastline using a limited, manually labeled set of high-resolution aerial images.
  • Evaluate model performance quantitatively using mean Intersection over Union (mIoU) and qualitatively through visual analysis of segmentation outputs.
  • Demonstrate the effectiveness of transfer learning and localized fine-tuning in enhancing model accuracy across diverse coastal environments.

2. Related Work

Traditional approaches to mapping coastal environments relied on manual interpretation or classical image processing techniques, which often lack scalability and robustness. Recent progress in Artificial Intelligence has significantly improved the analysis of remote sensing (RS) imagery, enabling more detailed modeling of coastal dynamics [21]. Despite this global trend, the application of machine learning for coastal mapping in Greece remains limited.
Early studies employed CNN-based architectures such as U-Net and its variants, achieving strong performance in sea–land segmentation and shoreline delineation [22,23]. Additional improvements were achieved through attention mechanisms, edge-aware learning, and hybrid modules, as demonstrated in works such as [24,25,26]. These approaches addressed challenges related to complex boundaries, varying resolutions, and low-contrast coastal scenes.
Beyond binary water/land separation, more recent research has focused on multi-class coastal segmentation. For example, [27] applied a U-Net-based model to classify pixels into categories such as water, sand, vegetation, and built structures, highlighting the need for finer-grained coastal information. Other works have incorporated temporal analyses of coastal dynamics, such as multi-year shoreline change detection [24], or focused on multi-resolution robustness [28]. Lightweight architectures have also been developed for real-time segmentation in aquatic or urban water environments [29].
General-purpose models such as the Segment Anything Model (SAM) [30] have attracted interest in coastal mapping due to their prompt-based segmentation capabilities. However, reviews such as [31,32] emphasize that RS-specific spectral variability and resolution differences often require domain adaptation for optimal coastal performance.
Recent advances in transformer-based segmentation address these limitations by leveraging self-attention mechanisms capable of modeling both local and global spatial contexts. This characteristic is particularly beneficial for high-resolution coastal imagery, where intricate boundaries and heterogeneous surface types are common. Building on these developments, our study evaluates three state-of-the-art transformer-based models—SegFormer [33], MaskFormer [34], and Mask2Former [35]—for the multi-class segmentation of coastal remote sensing images.
Transformer architectures have recently attracted substantial attention in remote sensing image segmentation due to their ability to model long-range spatial dependencies and integrate multi-scale contextual information [33,34,35]. In this work, we evaluate three representative transformer-based models—SegFormer, MaskFormer, and Mask2Former—due to their strong performance across diverse segmentation benchmarks and their increasing adoption in Earth observation applications.
SegFormer [33] introduces a hierarchical Transformer encoder that produces multi-resolution feature maps without relying on positional encodings. This design enables effective capture of both fine details and global structure while maintaining computational efficiency. A lightweight MLP decoder fuses the multi-scale features into the final segmentation prediction, making SegFormer suitable for high-resolution remote sensing tasks where efficiency is important.
MaskFormer [34] reconceptualizes semantic segmentation as a mask-classification problem rather than per-pixel labeling. A Transformer decoder processes a set of learnable queries, each predicting a binary mask and a class label. This set-based prediction strategy improves robustness to class imbalance, ambiguous coastlines, and heterogeneous land–water boundaries commonly found in coastal imagery.
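As a rough illustration of mask classification, the sketch below combines per-query class probabilities with per-query mask probabilities into a per-pixel label map, following the semantic inference rule described in the MaskFormer paper. The toy queries, the two class names, and the three-pixel image are invented for illustration and are not taken from this study.

```python
def semantic_inference(class_probs, mask_probs):
    """Combine per-query class and mask probabilities into per-pixel labels.

    class_probs[q][c]: probability that query q belongs to class c
                       (the "no object" class is assumed to be dropped already).
    mask_probs[q][p]:  probability that flattened pixel p belongs to query q.
    Returns one class index per pixel.
    """
    num_classes = len(class_probs[0])
    num_pixels = len(mask_probs[0])
    labels = []
    for p in range(num_pixels):
        # Score of class c at pixel p: sum over queries of p_q(c) * m_q(p).
        scores = [
            sum(cp[c] * mp[p] for cp, mp in zip(class_probs, mask_probs))
            for c in range(num_classes)
        ]
        labels.append(max(range(num_classes), key=scores.__getitem__))
    return labels


# Toy example: two queries, two classes (0 = water, 1 = sediment), three pixels.
class_probs = [[0.9, 0.1], [0.2, 0.8]]
mask_probs = [[0.95, 0.60, 0.05], [0.05, 0.40, 0.95]]
labels = semantic_inference(class_probs, mask_probs)
print(labels)  # [0, 0, 1]
```

Because each pixel's score aggregates evidence from every query, a pixel near a boundary can still receive a confident label when one query dominates it, which is one way to read the robustness to ambiguous land–water boundaries mentioned above.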
Mask2Former [35] extends MaskFormer by introducing masked attention, where attention is computed only within spatial regions defined by predicted masks. Combined with a hierarchical backbone such as the Swin Transformer [36] and a multi-scale pixel decoder, this approach enhances boundary delineation and improves segmentation performance in complex scenes, including coastal regions with subtle spectral transitions.

3. Study Areas and Data

In this study, we implement and compare three Transformer-based semantic segmentation models, SegFormer, MaskFormer, and Mask2Former, to evaluate their suitability for coastal remote sensing applications. Coastal environments are characterized by strong spatial heterogeneity, elongated linear structures, and gradual transitions between land and water, which pose challenges for traditional pixel-based and local-context models. The self-attention mechanism employed by Transformer architectures enables the modeling of long-range dependencies while preserving fine-scale spatial details, making these models particularly well suited for complex coastal scenes.
The selected architectures jointly capture local texture information (e.g., sediment patterns, vegetation structure, and built-up areas) and global spatial context (e.g., shoreline continuity and coastal morphology). This combination is critical for accurately delineating dynamic coastal features such as surf zones, sediment–water boundaries, and vegetated buffers. By evaluating these models across multi-resolution datasets and distinct geographic regions, we aim to assess their robustness, transferability, and practical applicability for automated coastal monitoring and environmental analysis.
To support the semantic segmentation of coastal environments, we employed two complementary datasets: the publicly available Coast Train dataset [20] from the United States and high-resolution aerial imagery from the Greek coastline, provided by the Hellenic Cadastre [37]. These datasets differ in geographic origin, spatial resolution, and coastal morphology, offering a diverse basis for evaluating geographic generalization and cross-regional model adaptation. The inclusion of morphologically distinct coastlines introduces variability in landforms, textures, and environmental conditions—factors that are critical for training robust segmentation models.
A small subset of the Greek imagery was manually annotated to enable supervised fine-tuning. Both datasets used the same labeling scheme, enabling a unified training pipeline and a consistent evaluation across regions.
The Coast Train dataset [20] is a large-scale, human-labeled collection of orthomosaic and satellite imagery from diverse U.S. coastal environments. It provides pixel-level annotations across several land cover classes relevant to coastal monitoring.
The imagery spans spatial resolutions from 0.05 m (orthomosaics) to 10 m (Sentinel-2) and 15 m (Landsat-8). This multi-resolution design enables learning across different sensor types and spatial scales, supporting generalization to heterogeneous inputs.
For preprocessing, images smaller than 300 × 300 pixels and non-coastal samples were removed, yielding a curated set of 645 images. All remaining images were resized to 512 × 512 pixels for compatibility with the transformer-based models. The dataset was split into 70% training, 15% validation, and 15% testing, ensuring balanced representation across classes.
The class taxonomy is shown in Table 1.
To evaluate cross-regional generalization, a complementary dataset was constructed using high-resolution aerial imagery (25 cm per pixel) from sections of the southwestern Greek coastline, obtained from the Hellenic Cadastre [37]. These images depict coastal morphologies not represented in Coast Train, including narrow beaches, rocky outcrops, mixed sediment types, and dense shoreline vegetation (Figure 1).
Raw images were divided into 512 × 512 patches using zero-padding where necessary to ensure consistent dimensions (Figure 2, Figure 3 and Figure 4). This procedure preserved fine spatial detail while maintaining compatibility with model input requirements.
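The tiling step can be sketched as follows, assuming that padding is applied only to patches that run past the right or bottom edge of the image. The function name `tile_with_zero_padding` and the single-band list-of-rows representation are hypothetical simplifications; the authors' actual implementation is not described in the text.

```python
def tile_with_zero_padding(image, tile=512):
    """Split a 2-D image (list of rows) into tile x tile patches.

    Patches that run past the right or bottom edge are filled with zeros.
    Returns a list of (row_offset, col_offset, patch) tuples.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            patch = []
            for r in range(top, top + tile):
                if r < height:
                    row = image[r][left:left + tile]
                    row = row + [0] * (tile - len(row))  # pad the right edge
                else:
                    row = [0] * tile  # pad the bottom edge
                patch.append(row)
            patches.append((top, left, patch))
    return patches


# Small demonstration: a 3 x 5 image cut into 2 x 2 patches.
demo = [[1, 2, 3, 4, 5],
        [6, 7, 8, 9, 10],
        [11, 12, 13, 14, 15]]
demo_patches = tile_with_zero_padding(demo, tile=2)
print(len(demo_patches))  # 6
```

Keeping the row and column offsets alongside each patch makes it straightforward to stitch per-patch predictions back into the original image frame.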
A total of 420 manually annotated patches were generated using the Segments.ai platform. Annotations followed the Coast Train label definitions (Table 1). Special care was taken to accurately capture transitional shoreline zones—areas where automated models typically struggle due to fine-grained boundaries or mixed surface types.
The patches were divided into 60% training, 20% validation, and 20% testing. The training portion was used exclusively for fine-tuning, while validation and test sets served to assess cross-regional adaptation.
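A minimal sketch of the 60/20/20 split is given below. It assumes a simple seeded random shuffle, which the text does not specify; the function name `split_indices` and the seed value are illustrative only.

```python
import random


def split_indices(n_items, train_pct, val_pct, seed=0):
    """Shuffle item indices and split them into train / val / test lists.

    Percentages are integers; the test set receives whatever remains.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_train = n_items * train_pct // 100
    n_val = n_items * val_pct // 100
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]
    return train, val, test


# Greek patches: 420 items split 60 / 20 / 20.
greek_train, greek_val, greek_test = split_indices(420, 60, 20)
print(len(greek_train), len(greek_val), len(greek_test))  # 252 84 84
```

Under these proportions the training portion contains 252 patches, with 84 each for validation and testing.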
The Coast Train dataset served as the primary training source, while the Greek patches enabled targeted fine-tuning and systematic evaluation in a distinct geographic region. This dual-dataset framework supports both large-scale model development and region-specific adaptation.
The pilot site selected for this study lies along the southwestern coastal zone of the Peloponnese, Greece, specifically bordering the Kyparissiakos Gulf. The study area extends approximately 60 km along the shoreline and reaches up to 300 m inland (Figure 5). This zone includes a wide range of geomorphological formations, such as sandy beaches, vegetated backshores, active dune systems, and rocky coastal segments. Settlements including Kyparissia, Filiatra, and Kalo Nero are interspersed along this stretch, introducing diverse land-use patterns and varying levels of human-induced alteration.
The natural variability within the study region, ranging from unaltered coastal ecosystems to semi-urbanized zones, provides an ideal ground truth for evaluating segmentation model performance in real-world settings. Particularly relevant are the multiple hydrological inputs, such as the Neda River and smaller streams that flow into the Gulf, which contribute to sediment dynamics and influence nearshore patterns. Much of the area is included in the Natura 2000 ecological network, which underscores its ecological sensitivity and the need for high-resolution, frequent environmental monitoring to support effective conservation and land-use decision-making.

4. Methodology

4.1. Training Framework

Semantic segmentation of coastal imagery is essential for applications such as environmental monitoring, shoreline assessment, and coastal zone management. In this study, we evaluate three transformer-based models—SegFormer, MaskFormer, and Mask2Former—for pixel-level classification into seven coastal classes: water, sea foam, sediment, vegetation, development, natural terrain, and unknown.
A central challenge lies in the fact that the Greek Coastline dataset contains only a small number of manually annotated samples, limiting its suitability for fully supervised training. To address this, we implement and compare two training strategies: (i) direct fine-tuning solely on the Greek annotations, and (ii) a two-stage domain adaptation process leveraging the large, labeled Coast Train dataset prior to adaptation to the Greek domain.
In the first approach, we directly fine-tune the pre-trained transformer models on the manually annotated subset of the Greek Coastline dataset (Section 5.4 and Section 5.5), without using the Coast Train dataset as an intermediate domain. This method evaluates whether region-specific training data alone are sufficient for effective segmentation (Figure 6).
Fine-tuning on the limited Greek annotations enables the models to adapt to local morphological and spectral patterns, including shoreline geometries, sediment types, and vegetation structures. However, the restricted size of the dataset results in reduced generalization, particularly for visually similar classes (e.g., sediment vs. natural terrain) and transition zones such as sea foam or mixed vegetation–sand regions. Although this approach avoids additional preprocessing steps, it produces lower segmentation accuracy and weaker boundary consistency compared to the two-stage strategy (Section 5.6).

4.2. Classification Scheme

To achieve higher accuracy and improved generalization, we adopt a two-stage fine-tuning strategy grounded in transfer learning principles (Figure 7). This process enables the models to first learn broad coastal characteristics before adapting to region-specific Greek imagery.
In the first stage, the transformer models are fine-tuned on the labeled Coast Train dataset, which includes diverse orthomosaic and satellite imagery from a wide range of U.S. coastal settings. Exposure to heterogeneous landforms, spectral conditions, and coastal structures allows the models to learn generalizable representations of coastal environments.
This stage also bridges the gap between generic pre-training (e.g., ImageNet) and the specialized coastal domain, establishing meaningful semantic priors before adaptation to the Greek region.
In the second stage, the models are fine-tuned on the manually annotated Greek Coastline patches. This step addresses domain shifts arising from differences in terrain morphology, vegetation composition, lighting conditions, and imaging resolution.
Despite the limited number of annotated samples, fine-tuning on this localized dataset significantly improves segmentation performance, particularly for narrow shoreline structures and region-specific spectral signatures. The two-stage strategy therefore provides a practical approach for deploying deep learning models in data-scarce regions by combining large-scale general-domain training with targeted local adaptation.

5. Experiments

5.1. Experimental Setup

The experimental framework was designed to ensure reproducibility and rigorously evaluate the transferability of Transformer-based models to diverse coastal geographies. All experiments were conducted on the Kaggle cloud computing platform using an NVIDIA Tesla P100 GPU (16 GB of VRAM), manufactured by NVIDIA Corporation (Santa Clara, CA, USA). Model training and evaluation were implemented using the PyTorch deep learning framework (version 1.13.1; Meta Platforms, Inc., Menlo Park, CA, USA) and the MMSegmentation library.

5.1.1. Architectural Configurations

The backbones are existing Transformer architectures, namely SegFormer (2021), MaskFormer (2021), and Mask2Former (2022), selected for their ability to capture long-range dependencies, which is critical for the continuity of linear coastal features. We utilized the “Large” variants (e.g., SegFormer-B5, Mask2Former-L) to maximize the feature extraction capacity during the two-stage domain adaptation.

5.1.2. Hyperparameters and Optimization

To maintain consistency across architectures and training stages, the following hyperparameters were employed:
  • Optimizer: AdamW, with a weight decay of 0.01.
  • Learning Rate: A base learning rate of 1 × 10⁻⁴ was used, governed by a poly learning rate schedule with a power of 0.9 to ensure smooth convergence.
  • Batch Size: A total batch size of 8 (4 images per GPU iteration).
  • Training Duration: Stage 1 (Coast Train) was conducted for 160,000 iterations, while Stage 2 (Greek Adaptation) was performed for 40,000 iterations to prevent overfitting on the smaller Greek dataset.
  • Data Augmentation: Standard techniques, including random horizontal flipping, random scaling (0.5× to 2.0×), and random cropping to 512 × 512 pixels, were applied to enhance model robustness.
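The poly learning rate schedule listed above can be sketched as a plain function. This version assumes a base learning rate of 1e-4 (our reading of the stated value), decay towards zero with no minimum learning rate, and no warm-up phase; none of these details are confirmed by the text, and the function name `poly_lr` is illustrative.

```python
def poly_lr(iteration, max_iters, base_lr=1e-4, power=0.9):
    """Polynomial ("poly") learning-rate decay from base_lr towards zero."""
    progress = min(iteration, max_iters) / max_iters
    return base_lr * (1.0 - progress) ** power


# Learning rate at the start, midpoint, and end of a 160,000-iteration run.
schedule = [poly_lr(i, 160_000) for i in (0, 80_000, 160_000)]
print(schedule)
```

With a power of 0.9 the decay is close to linear, so at the midpoint the learning rate is still slightly above half of its initial value.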

5.1.3. Loss Function

To address the inherent class imbalance in coastal scenes (e.g., the dominance of ‘Water’ over ‘Sea foam’), we utilized a combination of Cross-Entropy Loss and Dice Loss. In Stage 2, a coastal-class weighted focal loss was integrated to prioritize the accurate delineation of high-contrast boundaries, such as the sediment–water interface.
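To make the idea of a class-weighted focal loss concrete, the sketch below evaluates the standard focal loss formulation for a single pixel. The class weights, the focusing parameter `gamma = 2.0`, and the two-class setup are hypothetical; the text does not report the values used in Stage 2, and this is not presented as the authors' implementation.

```python
import math


def weighted_focal_loss(probs, target, class_weights, gamma=2.0):
    """Focal loss for one pixel.

    probs:         predicted class probabilities (sum to 1, target entry > 0).
    target:        index of the ground-truth class.
    class_weights: per-class weights (hypothetical values, not from the paper).
    gamma:         focusing parameter; gamma = 0 gives weighted cross-entropy.
    """
    p_t = probs[target]
    return -class_weights[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical weights that emphasise a rare class (index 1) over a common one.
weights = [1.0, 3.0]
easy = weighted_focal_loss([0.95, 0.05], target=0, class_weights=weights)
hard = weighted_focal_loss([0.60, 0.40], target=1, class_weights=weights)
print(easy < hard)  # True
```

The factor `(1 - p_t) ** gamma` shrinks the contribution of pixels that are already classified confidently, so training effort concentrates on uncertain pixels, which in coastal scenes tend to lie along class boundaries.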

5.2. Evaluation Methodology

Model performance was assessed using the Mean Intersection over Union (mIoU), a standard metric for multi-class semantic segmentation that quantifies agreement between predicted and ground-truth labels.
For a given class i, the Intersection over Union (IoU) is defined as follows:
\[ \mathrm{IoU}_i = \frac{TP_i}{TP_i + FP_i + FN_i} \]
where $TP_i$, $FP_i$, and $FN_i$ denote the true positives, false positives, and false negatives for class $i$, respectively.
The overall mIoU across all $k$ classes is computed as follows:
\[ \mathrm{mIoU} = \frac{1}{k} \sum_{i=1}^{k} \mathrm{IoU}_i . \]
This metric is particularly informative for coastal segmentation, where classes may exhibit fine-scale boundaries and strong spatial imbalance. Reporting mIoU for both training strategies allows for a comprehensive comparison of generalization capability, region-specific adaptation, and the overall effectiveness of the proposed framework.
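The two definitions above translate directly into a computation over a confusion matrix. The sketch below is a minimal pure-Python version; skipping classes that never occur in either the labels or the predictions is an assumption made here to avoid division by zero, and the toy matrix is invented for illustration.

```python
def iou_per_class(confusion):
    """Per-class IoU from a square confusion matrix.

    confusion[t][p] counts pixels with true class t and predicted class p.
    Classes with no true or predicted pixels are reported as None.
    """
    k = len(confusion)
    ious = []
    for i in range(k):
        tp = confusion[i][i]
        fn = sum(confusion[i]) - tp
        fp = sum(confusion[t][i] for t in range(k)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else None)
    return ious


def mean_iou(confusion):
    """Average IoU over the classes that actually occur."""
    valid = [v for v in iou_per_class(confusion) if v is not None]
    return sum(valid) / len(valid)


# Toy two-class example (rows: ground truth, columns: prediction).
toy = [[8, 2],
       [1, 9]]
toy_ious = iou_per_class(toy)  # 8/11 for class 0, 9/12 for class 1
toy_miou = mean_iou(toy)
```

Because every class contributes equally to the mean regardless of its pixel count, a rare class such as sea foam influences mIoU as much as a dominant class such as water.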

5.3. Quantitative Results

This section presents a comprehensive evaluation of the proposed two-stage training pipeline across three transformer-based architectures—SegFormer, MaskFormer, and Mask2Former. Models were first fine-tuned on the Coast Train dataset (Stage 1) and subsequently adapted to the Greek Coastline dataset via targeted fine-tuning (Stage 2). Performance is reported using mean Intersection over Union (mIoU), mean accuracy, and F1 score. We additionally evaluate a baseline condition in which models are trained directly on the Greek Coastline dataset without Stage 1 pretraining.
Overall, the results demonstrate (i) strong Stage 1 generalization from Coast Train, (ii) limited effectiveness when training directly on Greek imagery, and (iii) substantial improvements following Stage 2 fine-tuning, particularly for models pretrained on Cityscapes.

5.4. Stage 1: Training on the Coast Train Dataset

Table 2, Table 3 and Table 4 summarize the Stage 1 results. Across all architectures, Cityscapes pretraining consistently outperforms ADE20K pretraining, reflecting the closer visual and structural similarity between Cityscapes and coastal scenes. Among the SegFormer variants, SegFormer-B5 pretrained on Cityscapes achieved the best performance (82.69% mIoU). MaskFormer-Large reached 82.18% mIoU, while the Mask2Former-Large model pretrained on Cityscapes attained the highest Stage 1 score of 84.06% mIoU.
The pretraining dataset comparison (Table 5) reveals that Cityscapes initialization provides consistent advantages across architectures, with the largest gain observed for SegFormer-B5 (+5.32%). This finding suggests that pretraining on datasets with similar visual characteristics to the target domain (in this case, structured scenes with clear boundaries between land cover types) facilitates more effective transfer learning than pretraining on diverse but visually dissimilar datasets like ADE20K.
Across all architectures, Mask2Former-Large (Cityscapes) provides the strongest foundation for downstream fine-tuning, establishing this as the baseline for subsequent experiments.

5.5. Direct Training on the Greek Coastline Dataset

Direct fine-tuning on Greek imagery, without Stage 1, provides insight into how well models adapt using only local, limited annotations. Results are reported in Table 6. While functional, performance is consistently lower than that of the two-stage approach. Mask2Former-Large pretrained on Cityscapes achieved the best direct score (82.42% mIoU), outperforming SegFormer and MaskFormer variants in this baseline setting.
The reduced performance of direct training compared to the Stage 1 results (82.42% vs. 84.06% for Mask2Former) demonstrates the value of large-scale pretraining on diverse coastal imagery. The Greek dataset (420 patches) is too small on its own for models to learn robust coastal representations, even when they are initialized with general-purpose weights from Cityscapes or ADE20K.

5.6. Stage 2: Fine-Tuning on the Greek Coastline Dataset

The second training stage adapts each model to the Greek coastal domain using a limited set of manually annotated high-resolution aerial images. Fine-tuning consistently improves performance across all architectures, with Mask2Former-Large achieving the strongest results (85.43% mIoU). Table 7 summarizes Stage 2 results.
Comparing Stage 1 and Stage 2 reveals clear gains from geographic adaptation. Mask2Former-Large improves by 1.37 percentage points of mIoU (from 84.06% to 85.43%) after exposure to even a small number of Greek training patches, highlighting the efficiency of targeted fine-tuning. The improvement is particularly notable given that Stage 2 training uses only 252 images (60% of 420 patches), demonstrating that strategic transfer learning can achieve substantial performance gains with minimal additional annotation effort.
Table 8 presents a consolidated view of the final Stage 2 performance across all three architectures. The results clearly demonstrate Mask2Former’s superiority, with a 4.84 percentage point advantage over MaskFormer and an 8.62 percentage point advantage over SegFormer.

6. Inference on Unseen Data

6.1. Model Predictions

Inference was performed using the best model of each architecture on unseen Greek coastal images. Based on the classification shown in Figure 8, Figure 9, Figure 10 and Figure 11 illustrate representative predictions. Mask2Former demonstrates the most coherent shoreline boundaries, reduced confusion between sediment and terrain, and improved delineation of dynamic features such as sea foam.

6.2. Model Comparison (Stage 2)

Our experimental evaluation demonstrates two critical findings, summarized in Table 9 and Table 10.
The first finding is that Mask2Former achieves the best performance in coastal segmentation tasks. The systematic architectural comparison (Table 8) shows that Mask2Former-Large achieves 85.43% mIoU on challenging Greek coastal imagery, outperforming MaskFormer by 4.84 percentage points and SegFormer by 8.62 percentage points. This performance advantage is consistent across all evaluation metrics (mean accuracy: 94.33%, F1 score: 96.27%). The masked attention mechanism in Mask2Former proves to be particularly effective for capturing both fine-grained coastal boundaries and broader spatial context.
The second key finding is that two-stage domain adaptation proves to be the most effective training strategy. As shown in Table 10, this reaches an mIoU of 85.43%, which represents gains of 3.01 percentage points over direct Greek training and 1.37 percentage points over Stage-1-only training.
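The gains quoted above are differences in percentage points of mIoU, not relative improvements. The short check below reproduces them from the Mask2Former-Large values reported in the text; the dictionary keys are illustrative labels.

```python
# mIoU values (percent) for Mask2Former-Large as reported in the text.
miou = {"direct": 82.42, "stage1": 84.06, "two_stage": 85.43}

# Differences in percentage points, rounded to two decimals.
gain_over_direct = round(miou["two_stage"] - miou["direct"], 2)
gain_over_stage1 = round(miou["two_stage"] - miou["stage1"], 2)

# The same gain over direct training expressed as a relative improvement (percent).
relative_over_direct = 100 * (miou["two_stage"] - miou["direct"]) / miou["direct"]

print(gain_over_direct, gain_over_stage1)  # 3.01 1.37
```

Expressed relative to the direct-training score, the 3.01-point gain corresponds to an improvement of roughly 3.7%.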
Three key insights emerge from our comprehensive evaluation:
1. Domain-general training provides a strong baseline. Stage 1 models capture broad coastal structure but miss fine regional details. Cityscapes pretraining consistently outperforms ADE20K across all architectures (Table 5), with Mask2Former achieving 84.06% vs. 83.92%. This reflects closer visual similarity between Cityscapes’ urban environments and coastal scenes, both of which contain distinct boundaries between land cover types, water bodies, and infrastructure.
2. Localized fine-tuning is highly effective. Even 420 annotated Greek patches significantly improved accuracy (+1.37% mIoU for Mask2Former), boundary sharpness, and class separability. Direct training on Greek data alone achieves only 82.42% mIoU, demonstrating that the Stage 1 pretraining on Coast Train provides essential foundational features that accelerate adaptation to the target domain.
3. Mask2Former is consistently the best performer. Its masked attention mechanism yields superior adaptation in complex coastal zones, especially along foam–water and sediment–terrain transitions. Final performance: 85.43% mIoU vs. 80.59% for MaskFormer and 76.81% for SegFormer. The architecture’s ability to selectively attend to relevant spatial regions while suppressing irrelevant background enables more precise delineation of coastal features.
Overall, the combination of Mask2Former architecture and two-stage domain adaptation enables scalable, geographically adaptable coastal segmentation while minimizing annotation costs.

6.3. Qualitative Analysis

The qualitative analysis, shown in Figure 9, Figure 10, Figure 11 and Figure 12, indicates that Transformer-based architectures—especially Mask2Former—more effectively address the typical “fragmentation” problem observed in conventional CNN-based models for coastal remote sensing. By exploiting global self-attention mechanisms, these models preserve the spatial coherence of thin, elongated structures, including seawalls, narrow sandbars, and the swash zone. In the Greek test locations, where the landscape shifts sharply from exposed rock to deep water, the models exhibit a strong contextual understanding, accurately labeling small vegetation patches embedded in the natural terrain that local-window convolutions frequently misclassify as noise. This “structural prior,” strengthened by the two-stage training framework, allows the model to prioritize global land–water connectivity before refining local spectral signatures, resulting in boundaries that are geomorphologically more plausible than those generated by baseline approaches.
However, a deeper visual inspection uncovers persistent challenges in visually ambiguous transition zones, specifically where sediment merges into natural terrain or where shadows from coastal cliffs mimic water signatures. In these high-entropy regions, the constraints of RGB-only imagery become evident, as the models occasionally struggle to differentiate between spectrally similar classes like wet sand and rocky substrate under varying illumination. Furthermore, dynamic sea-state variables, such as surf foam and wave turbidity, introduce localized “salt-and-pepper” artifacts, reflecting the difficulty of cross-regional adaptation from the leaf-off conditions of US NAIP data to the micro-tidal Mediterranean environment. These qualitative findings suggest that while Transformer-based “long-range dependencies” significantly improve boundary stability, the next leap in reliability will require multimodal integration (e.g., NIR for water penetration or LiDAR for elevation) to resolve classes that are spectrally inseparable in the visible spectrum.

7. Discussion

7.1. Implications of the Results

The results demonstrate the strong potential of AI-driven segmentation for mapping and monitoring coastal environments. By systematically evaluating three transformer-based architectures, the proposed framework achieves high segmentation accuracy (85.43% mIoU) while capturing the heterogeneity of coastal landscape features.
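For readers less familiar with the headline metric, the sketch below shows one common way to compute mean Intersection over Union (mIoU) from a confusion matrix. It is an illustration under stated assumptions, not the evaluation code used in this study: the helper names are hypothetical, classes that appear in neither the reference nor the prediction are skipped, and how the "unknown" class was treated in the reported scores is not specified here.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int],
                     num_classes: int) -> List[List[int]]:
    """Rows index the reference class, columns the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix: List[List[int]]) -> float:
    """Average of per-class IoU = TP / (TP + FP + FN) over present classes."""
    ious = []
    for c in range(len(matrix)):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                 # reference c, predicted other
        fp = sum(row[c] for row in matrix) - tp  # predicted c, reference other
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both reference and prediction
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Six toy pixels using the class ids of Table 1
    # (0 = water, 2 = sediment, 4 = natural terrain).
    reference = [0, 0, 0, 2, 2, 4]
    predicted = [0, 0, 2, 2, 2, 4]
    cm = confusion_matrix(reference, predicted, num_classes=7)
    print(round(mean_iou(cm), 4))  # 0.7778
```

In the toy example, one water pixel predicted as sediment lowers the IoU of both classes to 2/3, while natural terrain scores 1, giving a mean of 7/9.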
The two-stage training pipeline effectively leverages large-scale public data and adapts it to local conditions with minimal manual annotation. Only 420 manually annotated Greek coastal patches, when combined with transfer learning, yield substantial gains for Mask2Former: +3.01 percentage points mIoU over direct training and +1.37 percentage points over Stage 1 alone. Accurate coastal mapping is essential for environmental protection and sustainable coastal management, supporting early detection of shoreline change, erosion, and habitat degradation [2,38].
AI-based monitoring also enables the detection of unauthorized development in coastal buffer zones, supporting evidence-based enforcement of environmental regulations [39]. This automated approach provides a scalable and cost-efficient alternative to traditional surveys, useful for applications such as shoreline monitoring, urban encroachment assessment, and coastal resilience planning. Moreover, improved segmentation of geomorphological features can assist in disaster risk modelling and climate adaptation efforts [40]. Overall, the proposed framework aligns closely with the goals of integrated coastal zone management, enabling repeatable and data-driven coastal assessment [31].

7.2. Challenges and Limitations

Despite the strong performance, several limitations warrant discussion. A primary limitation of this study is the need for manually annotated, high-resolution labels. Although only a small set of Greek coastline images was annotated, generating pixel-level masks remains time-consuming and costly, which can hinder rapid deployment in new regions. Nevertheless, high-quality annotations remain essential for reliable model adaptation [24,31].
Another challenge concerns domain shift between the Coast Train dataset and the Greek coastline. Differences in geomorphology, texture, sensor characteristics, and illumination can reduce generalization. Our two-stage fine-tuning strategy mitigated these effects, as evidenced by the improvement of 3.01 percentage points mIoU over direct training on Greek data alone. However, further improvements could be achieved by incorporating more diverse coastal imagery from multiple geographic regions and climate zones. The current evaluation on a single Mediterranean coastal region, while demonstrating proof of concept, does not establish generalization to global coastal environments.
The study also relied exclusively on RGB aerial imagery, reflecting the constraints of nationally provided datasets (Hellenic Cadastre). Access to additional modalities (e.g., infrared bands for vegetation indices, multispectral data for water quality assessment, LiDAR for elevation-based coastal zone delineation) could further enhance classification accuracy and robustness in complex environments. Multispectral integration represents a particularly promising direction, as spectral indices provide complementary information that could reduce confusion between sediment and natural terrain classes.
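To make the idea of spectral indices concrete, the sketch below computes two widely used normalized-difference indices, NDVI = (NIR − Red)/(NIR + Red) and NDWI = (Green − NIR)/(Green + NIR), for single pixels. It is illustrative only: the present study used RGB imagery without a near-infrared band, and the reflectance values in the example are hypothetical numbers chosen for the demonstration, not measurements.

```python
def normalized_difference(a: float, b: float) -> float:
    """(a - b) / (a + b), returning 0.0 when both bands are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index; high over green vegetation."""
    return normalized_difference(nir, red)


def ndwi(green: float, nir: float) -> float:
    """Normalized Difference Water Index; high over open water."""
    return normalized_difference(green, nir)


if __name__ == "__main__":
    # Hypothetical surface reflectances (unitless, 0 to 1).
    vegetation = {"green": 0.10, "red": 0.05, "nir": 0.45}
    water = {"green": 0.08, "red": 0.05, "nir": 0.02}
    for name, px in (("vegetation", vegetation), ("water", water)):
        print(name,
              round(ndvi(px["nir"], px["red"]), 3),
              round(ndwi(px["green"], px["nir"]), 3))
```

With these example values the two surfaces fall on opposite sides of zero in both indices, which is the kind of separation that RGB bands alone may not provide.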
The current framework does not explicitly model temporal dynamics or seasonal variations. Future work should validate model consistency across multi-temporal imagery and assess performance degradation due to seasonal morphological alterations.

7.3. Future Research Directions

Future work may focus on several promising directions:
  • Prompt-based annotation: Interactive segmentation models such as the Segment Anything Model (SAM) could reduce annotation effort and accelerate dataset creation.
  • Finer-grained semantic classes: Introducing more detailed taxonomies (e.g., rocky vs. sandy coasts, infrastructure sub-classes) would enable more informative coastal analyses.
  • Multimodal data integration: Combining RGB with elevation, infrared, or multispectral data could improve class separability, especially in visually ambiguous regions.
  • Architectural and training improvements: Exploring next-generation segmentation models (e.g., OneFormer, InternImage) and training strategies (e.g., curriculum learning, multiscale sampling) may yield performance gains.
  • Weak supervision and label transfer: Techniques such as pseudo-labeling, domain adaptation, and prompt-tuned segmentation could reduce reliance on manual labeling.
  • Continual learning: Adapting models incrementally as new coastal imagery becomes available would enable long-term monitoring without full retraining.
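As an illustration of the weak-supervision direction listed above, the following minimal sketch shows confidence-thresholded pseudo-labeling: a pixel keeps the model's predicted class only when the predicted probability reaches a threshold and is otherwise excluded from training. This is a generic technique rather than a method used in this study; the function name, the 0.9 threshold, and the reuse of class id 6 ("unknown" in Table 1) as an ignore label are assumptions made for the example.

```python
from typing import List, Sequence

IGNORE_ID = 6  # "unknown" in Table 1; treating it as an ignore label is an assumption


def pseudo_label(probabilities: Sequence[Sequence[float]],
                 threshold: float = 0.9) -> List[int]:
    """Return one label per pixel, or IGNORE_ID when the model is unsure."""
    labels = []
    for pixel_probs in probabilities:
        # Index of the most probable class for this pixel.
        best_class = max(range(len(pixel_probs)), key=pixel_probs.__getitem__)
        confident = pixel_probs[best_class] >= threshold
        labels.append(best_class if confident else IGNORE_ID)
    return labels


if __name__ == "__main__":
    # Three toy pixels with probabilities over three classes.
    probs = [
        [0.95, 0.03, 0.02],  # confident: class 0
        [0.50, 0.40, 0.10],  # ambiguous: ignored
        [0.05, 0.02, 0.93],  # confident: class 2
    ]
    print(pseudo_label(probs))  # [0, 6, 2]
```

A stricter threshold yields fewer but cleaner pseudo-labels; a suitable value would need to be validated against held-out manual annotations.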

8. Conclusions

This study introduced a two-stage semantic segmentation framework tailored to coastal environments, systematically evaluating three transformer-based architectures (SegFormer, MaskFormer, Mask2Former) combined with a transfer learning strategy. We make two key contributions. First, we establish that the Mask2Former architecture with masked attention provides superior performance for coastal segmentation, achieving 85.43% mIoU on challenging Greek coastal imagery and outperforming MaskFormer (80.59%) and SegFormer (76.81%). This systematic architectural comparison, absent in the prior coastal segmentation literature, provides empirical guidance for method selection in operational coastal monitoring systems. Second, we validate a two-stage domain adaptation strategy that achieves efficient transfer from large-scale U.S. coastal data (Coast Train) to a data-scarce Greek coastal region. This approach outperforms both direct training on limited Greek data (+3.01 percentage points mIoU) and single-stage training on Coast Train alone (+1.37 percentage points mIoU). The results demonstrate that strategic transfer learning enables strong performance with only 420 manually annotated patches, addressing a critical barrier to AI deployment in regions lacking extensive labeled datasets.
The findings highlight the effectiveness of hybrid training pipelines that balance generalization and locality, demonstrating that strong performance can be obtained even with limited manual annotation. The success of transformer-based models, particularly Mask2Former’s masked attention mechanism, further underscores the growing relevance of attention-based architectures in geospatial and environmental applications where both local fine-grained detail and global spatial context are essential.

Broader Implications

The proposed pipeline offers a scalable and cost-effective solution for high-resolution coastal monitoring, supporting climate-resilient planning, shoreline change detection, and sustainable land-use management. As pressures on coastal zones intensify, such AI-based tools provide a robust alternative to traditional field surveys, enabling more frequent, consistent, and wide-area assessments.
The methodological contributions—systematic transformer architecture evaluation and validated two-stage adaptation strategy—are immediately applicable to operational coastal management. National mapping agencies, environmental protection authorities, and coastal zone planners can leverage this framework to establish automated monitoring systems that complement or replace labor-intensive manual interpretation. The achieved segmentation accuracy enables the precise quantification of shoreline retreat rates, critical for erosion risk assessment and coastal adaptation planning under sea-level rise scenarios.
From a research perspective, this work establishes important benchmarks: (i) the first systematic comparison of transformer architectures for coastal segmentation, and (ii) validated cost–benefit metrics for transfer learning strategies in geographic adaptation scenarios. These contributions provide a foundation for future research in multimodal integration (combining optical, radar, and elevation data), temporal modeling (multi-year change detection), and global-scale coastal monitoring systems.
Future extensions incorporating multi-temporal data, multispectral features, or ancillary geospatial layers (bathymetry, wave exposure indices, historical shoreline positions) may further enhance model reliability and support informed decision-making at multiple governance levels. The demonstrated effectiveness of transformer architectures and strategic transfer learning suggests that comprehensive, automated coastal monitoring at national or continental scales is now technically feasible, with the remaining challenges primarily occurring in operational deployment, computational optimization, and multi-regional validation rather than fundamental methodological limitations.
In conclusion, this work advances the state-of-the-art in AI-driven coastal remote sensing by demonstrating that careful attention to architecture selection and training strategy can achieve robust performance even under challenging conditions of limited labeled data and significant domain shift. The practical applicability of these methods to pressing coastal management challenges positions this research as a step toward operational, scalable, and cost-effective automated coastal monitoring systems that can support evidence-based environmental policy and climate adaptation planning.

Author Contributions

Conceptualization, P.T., A.K. and S.P.; methodology, P.D. and P.T.; software, P.D.; validation, P.T., A.K. and S.P.; formal analysis, P.T., A.K. and S.P.; investigation, P.D. and P.T.; resources, P.D., P.T., A.K. and S.P.; data curation, P.D. and A.K.; writing—original draft preparation, P.D., P.T. and A.K.; writing—review and editing, P.D. and A.K.; visualization, P.D.; supervision, P.T., A.K. and S.P.; project administration, P.T.; funding acquisition, S.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the project ATHINAIKI RIVIERA (MIS 5217202), entitled “Study for the development of smart infrastructure along the coastal front of Southwest Attica, with emphasis on the promotion of the natural and cultural reserve for the enhancement of tourist services”, Attica Regional Programme (2014–2020). The APC was funded by the National and Kapodistrian University of Athens (Project Number 17454).

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon request. Publicly available datasets used in this study are referenced in the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Kontogianni, A.; Tourkolias, C.; Damigos, D.; Skourtos, M. Assessing Sea Level Rise Costs and Adaptation Benefits Under Uncertainty in Greece. Environ. Sci. Policy 2013, 37, 61–78.
  2. Liebowitz, D.; Nielsen, K.; Dugan, J.; Morgan, S.; Malone, D.; Largier, J.; Hubbard, D.; Carr, M. Ecosystem Connectivity and Trophic Subsidies of Sandy Beaches. Ecosphere 2016, 7, e01503.
  3. Neumann, B.; Vafeidis, A.T.; Zimmermann, J.; Nicholls, R.J. Future Coastal Population Growth and Exposure to Sea-Level Rise and Coastal Flooding—A Global Assessment. PLoS ONE 2015, 10, e0118571.
  4. Schooler, E.; Zage, D.; Sedayao, J.; Moustafa, H.; Brown, A.; Ambrosin, M. An Architectural Vision for a Data-Centric IoT: Rethinking Things, Trust and Clouds. In Proceedings of the 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), Atlanta, GA, USA, 5–8 June 2017.
  5. EUROSION. Living with Coastal Erosion in Europe: Sediment and Space for Sustainability. Part II: Maps and Statistics; European Commission: Brussels, Belgium, 2004.
  6. van de Wal, R.; Melet, A.; Bellafiore, D.; Camus, P.; Ferrarin, C.; Oude Essink, G.; Haigh, I.D.; Lionello, P.; Luijendijk, A.; Toimil, A. Sea Level Rise in Europe: Impacts and Consequences. In Sea Level Rise in Europe: 1st Assessment Report of the Knowledge Hub on Sea Level Rise (SLRE1); Copernicus Publications: Göttingen, Germany, 2024.
  7. Bourma, E.; Perivoliotis, L.; Petihakis, G.; Korres, G.; Frangoulis, C.; Ballas, D.; Zervakis, V.; Tragou, E.; Katsafados, P.; Spyrou, C.; et al. The Hellenic Marine Observing, Forecasting and Technology System—An Integrated Infrastructure for Marine Research. J. Mar. Sci. Eng. 2022, 10, 329.
  8. Alexandrakis, G.; Karditsa, A.; Poulos, S.; Ghionis, G.; Kampanis, N.A. Vulnerability Assessment for the Erosion of the Coastal Zone to a Potential Sea Level Rise: The Case of the Aegean Hellenic Coast. In Environmental Systems; UNESCO: Oxford, UK, 2010.
  9. Monioudi, I.N.; Karditsa, A.; Chatzipavlis, A.; Alexandrakis, G.; Andreadis, O.P.; Velegrakis, A.F.; Poulos, S.E.; Ghionis, G.; Petrakis, S.; Sifnioti, D. Assessment of the Vulnerability of the Eastern Cretan Beaches (Greece) to Sea Level Rise. Reg. Environ. Change 2016, 16, 1951–1962.
  10. Monioudi, I.N.; Velegrakis, A.F.; Chatzipavlis, A.; Rigos, A.; Karambas, T.; Vousdoukas, M.I.; Hasiotis, T.; Koukourouvli, N.; Peduzzi, P.; Manoutsoglou, E. Assessment of Island Beach Erosion Due to Sea Level Rise in the Aegean Archipelago. Nat. Hazards Earth Syst. Sci. 2017, 17, 449–466.
  11. Dimitriadis, C.; Karditsa, A.; Almpanidou, V.; Anastasatou, M.; Petrakis, S.; Poulos, S.; Koutsoubas, D.; Sourbes, L.; Mazaris, A.D. Sea Level Rise Threatens Critical Nesting Sites of Marine Turtles in the Mediterranean. Reg. Environ. Change 2022, 22, 56.
  12. Karditsa, A.; Niavis, S.; Paramana, T.; Monioudi, I.; Poulos, S.; Hatzaki, M. Is the Insular Coastal Tourism of Western Greece at Risk Due to Climate-Induced Sea Level Rise? Ocean Coast. Manag. 2024, 251, 107088.
  13. Novikova, A.; Belova, N.; Baranskaya, A.; Aleksyutina, D.; Maslakov, A.; Zelenin, E.; Shabanova, N.; Ogorodov, S. Dynamics of Permafrost Coasts of Baydaratskaya Bay (Kara Sea) Based on Multi-Temporal RS Data. Remote Sens. 2018, 10, 1481.
  14. Karditsa, A.; Poulos, S. Socio-economic Risk Assessment of Setback Zones in Beaches Threatened by SLR-Induced Retreat. Anthr. Coasts 2024, 7, 25.
  15. Long, J.; Shelhamer, E.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015.
  16. Cheng, B.; Misra, I.; Schwing, A.; Kirillov, A.; Girdhar, R. Masked-Attention Mask Transformer for Universal Image Segmentation. arXiv 2021, arXiv:2112.01527.
  17. Cheng, B.; Misra, I.; Schwing, A.G.; Kirillov, A.; Girdhar, R. Masked-Attention Mask Transformer for Universal Image Segmentation. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022.
  18. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
  19. Zhou, B.; Zhao, H.; Puig, X.; Fidler, S.; Barriuso, A.; Torralba, A. Scene Parsing through ADE20K Dataset. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
  20. Buscombe, D.; Wernette, P.; Fitzpatrick, S.; Favela, J.; Goldstein, E.B.; Enwright, N.M. A 1.2 Billion Pixel Human-Labeled Dataset for Coastal Classification. Sci. Data 2023, 10, 46.
  21. Pourzangbar, A.; Jalali, M.; Brocchini, M. Machine Learning Application in Modelling Marine and Coastal Phenomena. Front. Environ. Eng. 2023, 2, 1235557.
  22. Li, R.; Liu, W.; Yang, L.; Sun, S.; Hu, W.; Zhang, F.; Li, W. DeepUNet: A Deep FCN for Sea–Land Segmentation. arXiv 2017, arXiv:1709.00201.
  23. Shamsolmoali, P.; Zareapoor, M.; Wang, R.; Zhou, H.; Yang, J. A Novel Deep Structure U-Net for Sea–Land Segmentation. arXiv 2020, arXiv:2003.07784.
  24. Lv, Q.; Wang, Q.; Song, X.; Ge, B.; Guan, H.; Lu, T.; Tao, Z. Research on Coastline Extraction and Dynamic Change from RS Images Based on Deep Learning. Front. Environ. Sci. 2024, 12, 1443512.
  25. Heidler, K.; Mou, L.; Baumhoer, C.; Dietz, A.; Zhu, X.X. HED-UNet: Combined Segmentation and Edge Detection for Antarctic Coastline Monitoring. arXiv 2021, arXiv:2103.01849.
  26. Lymperopoulos, E.; Tzouveli, P.; Kollias, S. Satellite Image Super-Resolution for Forest Localization. In Proceedings of the 2023 International Conference on Machine Intelligence for GeoAnalytics and Remote Sensing (MIGARS), Hyderabad, India, 27–29 January 2023.
  27. Scala, P.; Manno, G.; Ciraolo, G. Semantic Segmentation of Coastal Aerial/Satellite Images: Application to Coastline Detection. Comput. Geosci. 2024, 192, 107504.
  28. Mahmoud, A.; Mohamed, S.; Helmy, A.; Nasr, A. BDCN-UNet: Advanced Shoreline Extraction Integrating Deep Learning. Earth Sci. Inform. 2025, 18, 187.
  29. Ye, J.; Li, P.; Zhang, Y.; Guo, Z.; Zeng, S.; Zhan, Y. MLHI-Net: Multi-level hybrid lightweight water body segmentation network for urban shoreline detection. Sci. Rep. 2025, 15, 4746.
  30. Fang, Y.; Xu, C.; Chen, T.; Li, X. Sea–Land Segmentation with Prompt Learning-Based SAM. Remote Sens. 2023, 16, 3432.
  31. Chen, J.; Zhang, L. Remote Sensing Image Interpretation for Coastal Zones: A Review. Remote Sens. 2024, 16, 4701.
  32. Jahan, I.; Islam, M. Coastal Boundary Extraction for Erosion Monitoring: A Review. Remote Sens. Appl. 2024, 5, 9.
  33. Xie, E.; Wang, W.; Yu, Z.; Anandkumar, A.; Alvarez, J.; Luo, P. SegFormer: Simple and Efficient Transformer for Semantic Segmentation. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 6–14 December 2021.
  34. Cheng, B.; Schwing, A.; Kirillov, A. Per-Pixel Classification Is Not All You Need. In Proceedings of the 35th Conference on Neural Information Processing Systems (NeurIPS 2021), Online, 6–14 December 2021.
  35. Cheng, B.; Misra, I.; Schwing, A.; Kirillov, A.; Girdhar, R. Masked-Attention Mask Transformer. In Proceedings of the 36th Conference on Neural Information Processing Systems (NeurIPS 2022), New Orleans, LA, USA, 28 November–9 December 2022.
  36. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. arXiv 2021, arXiv:2103.14030.
  37. Hellenic Cadastre. National Cadastral and Mapping Agency of Greece. Available online: https://ktimatologio.gr/ (accessed on 21 May 2024).
  38. Turner, R.; Subak, S.; Adger, W. Pressures, Trends, and Impacts in Coastal zones: Interactions between socioeconomic and natural systems. Environ. Manag. 1996, 20, 159–173.
  39. Seto, K.; Güneralp, B.; Hutyra, L. Global Forecasts of Urban Expansion to 2030. Proc. Natl. Acad. Sci. USA 2012, 109, 16083–16088.
  40. Hinkel, J.; Lincke, D.; Vafeidis, A.T.; Perrette, M.; Nicholls, R.J.; Tol, R.S.; Marzeion, B.; Fettweis, X.; Ionescu, C.; Levermann, A. Coastal flood damage and adaptation costs under 21st century sea-level rise. Proc. Natl. Acad. Sci. USA 2014, 111, 3292–3297.
Figure 1. Example image (a), label (b), and overlay (c) from Coast Train [20] (San Diego, California).
Figure 2. Image preprocessing pipeline: zero-padding and patch extraction.
Figure 3. Example 512 × 512 patch from the Greek coastline dataset.
Figure 4. Original Greek image (left) and corresponding segmentation mask (right).
Figure 5. Pilot study site along the Kyparissiakos Gulf (yellow box).
Figure 6. Direct fine-tuning in the Greek coastline.
Figure 7. Overview of the proposed two-stage domain adaptation framework.
Figure 8. Segmentation color legend.
Figure 9. Inference of the best SegFormer model on example Greek coastline images. For color classification see Figure 8.
Figure 10. Inference of the best MaskFormer model on example Greek coastline images. For color classification see Figure 8.
Figure 11. Inference of the best Mask2Former model on example Greek coastline images. For color classification see Figure 8.
Figure 12. Qualitative comparison of the best Stage 2 SegFormer, MaskFormer, and Mask2Former models. For color classification see Figure 8.
Table 1. Class definitions used in both datasets.

| Id | Label |
|---|---|
| 0 | water |
| 1 | sea foam |
| 2 | sediment |
| 3 | development |
| 4 | natural terrain |
| 5 | vegetation |
| 6 | unknown |

Table 2. Stage 1: SegFormer results on the Coast Train dataset.

| Model | Pretrained | mIoU (%) | Mean Acc (%) | F1 (%) |
|---|---|---|---|---|
| SegFormer-B4 | Cityscapes | 79.20 | 86.09 | 93.54 |
| SegFormer-B4 | ADE20K | 76.15 | 85.69 | 92.83 |
| SegFormer-B5 | Cityscapes | 82.69 | 90.60 | 94.23 |
| SegFormer-B5 | ADE20K | 77.37 | 85.45 | 92.45 |

Table 3. Stage 1: MaskFormer results on the Coast Train dataset.

| Model | Pretrained | mIoU (%) | Mean Acc (%) | F1 (%) |
|---|---|---|---|---|
| MaskFormer-Base | ADE20K | 81.40 | 89.64 | 93.96 |
| MaskFormer-Large | ADE20K | 82.18 | 91.76 | 94.79 |

Table 4. Stage 1: Mask2Former results on the Coast Train dataset.

| Model | Pretrained | mIoU (%) | Mean Acc (%) | F1 (%) |
|---|---|---|---|---|
| Mask2Former-Large | Cityscapes | 84.06 | 93.42 | 95.45 |
| Mask2Former-Large | ADE20K | 83.92 | 91.14 | 94.67 |

Table 5. Ablation study: impact of pretraining dataset (Stage 1).

| Model | Cityscapes mIoU (%) | ADE20K mIoU (%) | Cityscapes Advantage |
|---|---|---|---|
| SegFormer-B4 | 79.20 | 76.15 | +3.05% |
| SegFormer-B5 | 82.69 | 77.37 | +5.32% |
| Mask2Former-Large | 84.06 | 83.92 | +0.14% |

Note: Cityscapes pretraining consistently outperforms ADE20K.

Table 6. Direct Training on Greek Coastline Dataset (No Stage 1).

| Model | mIoU (%) | Mean Acc (%) | F1 (%) |
|---|---|---|---|
| SegFormer-B5 (Cityscapes) | 72.46 | 78.87 | 94.56 |
| MaskFormer-Large (ADE20K) | 78.79 | 84.48 | 94.27 |
| Mask2Former-Large (Cityscapes) | 82.42 | 91.28 | 96.20 |

Table 7. Stage 2: Fine-tuning results on Greek coastline dataset.

| Model | mIoU (%) | Mean Acc (%) | F1 (%) |
|---|---|---|---|
| SegFormer-B5 (Cityscapes) | 76.81 | 84.19 | 94.79 |
| MaskFormer-Large (ADE20K) | 80.59 | 89.93 | 93.87 |
| Mask2Former-Large (Cityscapes) | 85.43 | 94.33 | 96.27 |

Table 8. Stage 2 final performance: architecture comparison.

| Architecture | mIoU (%) | Mean Acc (%) | F1 (%) |
|---|---|---|---|
| SegFormer-B5 (Cityscapes) | 76.81 | 84.19 | 94.79 |
| MaskFormer-Large (ADE20K) | 80.59 | 89.93 | 93.87 |
| Mask2Former-Large (Cityscapes) | 85.43 | 94.33 | 96.27 |

Table 9. Performance summary: best Mask2Former-Large model across training stages.

| Training Stage | mIoU (%) | Improvement |
|---|---|---|
| Stage 1 (Coast Train only) | 84.06 | baseline |
| Direct Greek Training | 82.42 | −1.64% |
| Stage 2 (Two-Stage Adaptation) | 85.43 | +1.37% |

Note: Improvements relative to Stage 1 baseline (84.06%).

Table 10. Training strategy impact analysis.

| Training Strategy | Annotation Requirement | mIoU (%) | Relative Performance |
|---|---|---|---|
| Stage 1 Only (Coast Train) | 645 US images | 84.06 | baseline |
| Direct Greek Training | 420 Greek patches | 82.42 | −1.95% |
| Two-Stage Adaptation | 645 US + 420 Greek | 85.43 | +1.63% |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Drakopoulou, P.; Tzouveli, P.; Karditsa, A.; Poulos, S. Integrating AI for In-Depth Segmentation of Coastal Environments in Remote Sensing Imagery. Remote Sens. 2026, 18, 325. https://doi.org/10.3390/rs18020325
