Article

GTSegNet: An Island Coastline Segmentation Model Based on Collaborative Perception Strategy

1 School of Geographical Sciences, Liaoning Normal University, Dalian 116029, China
2 Liaoning Provincial Key Laboratory of Physical Geography and Geomatics, Liaoning Normal University, Dalian 116029, China
3 School of Environmental Science and Engineering, Dalian University of Technology, Dalian 116081, China
4 The School of Geodesy and Geomatics, Wuhan University, Wuhan 430079, China
5 The Institute of Marine Sustainable Development, Liaoning Normal University, Dalian 116029, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Remote Sens. 2026, 18(4), 607; https://doi.org/10.3390/rs18040607
Submission received: 7 January 2026 / Revised: 5 February 2026 / Accepted: 12 February 2026 / Published: 14 February 2026

Highlights

What are the main findings?
  • Proposed GTSegNet, a novel collaborative perception framework that effectively tackles boundary blurring and topological discontinuity in complex island environments. It integrates a Graph Contextual Modeling Module (GCB) to capture global semantic information and a Morphological Topology-Aware Refinement Module (TARM) to sharpen boundaries.
  • Achieved state-of-the-art performance with an mIoU of 96.96% and a Recall of 98.54% on the self-constructed S2_China_Islands_2024 dataset. Quantitative and qualitative evaluations demonstrate its significant superiority over mainstream methods like U-Net and Mask2Former in accurately identifying small-scale islands.
What are the implications of the main findings?
  • Technically, demonstrated that combining global contextual dependencies with morphological priors is a highly efficient strategy for high-resolution remote sensing tasks. The synergistic interaction between GCB and TARM provides a new paradigm for maintaining topological consistency in challenging maritime segmentation scenarios.
  • Practically, provided a robust and automated tool capable of large-scale marine mapping and stable monitoring of coastal dynamics over time. Its excellent generalization ability, validated on both Landsat-8 imagery and multi-temporal datasets, offers critical scientific support for marine resource management and ecological protection.

Abstract

Island coastline segmentation plays a crucial role in remote sensing image processing, especially when the island environment is complex and the scale is small, making segmentation challenging. Complex island morphology and the prevalence of small islands are the main causes of boundary blurring and topological discontinuity in island coastline segmentation. Therefore, this study proposes GTSegNet, an island coastline segmentation method designed to address the issues of boundary blurring and topological discontinuity in complex backgrounds. First, by introducing the Graph Contextual Modeling Module (GCB), the model captures global information and addresses the issue of local features being neglected due to complex backgrounds and scale differences, thereby improving the model’s ability to discern blurry boundaries. Second, the Morphological Topology-Aware Refinement Module (TARM) is used for boundary sharpening and false-response suppression, specifically addressing the issue of topological discontinuity and thus improving the accuracy of boundary localization and the continuity of topological structures. The two modules work synergistically, significantly improving the boundary accuracy and topological continuity of the island coastline. Training and comparative experiments conducted on the newly constructed island coastline dataset demonstrate that GTSegNet achieves outstanding performance with an mIoU of 96.96% and a Recall of 98.54%. Compared to other remote sensing semantic segmentation methods, GTSegNet consistently exhibits stable advantages in both quantitative and qualitative assessments, showcasing its great potential for large-scale marine mapping and macro-scale monitoring tasks.

1. Introduction

Island coastline refers to the boundary between the island and the ocean, typically the area where the island’s surface meets the seawater [1]. As a key zone of land–ocean interaction, it serves as a crucial basis for scientifically assessing island resources and maintaining regional ecological balance, and it is a central element in achieving sustainable coastal zone development [2,3]. On a global scale, island coastlines hold significant ecological, economic, and strategic value. They are core habitats for marine biodiversity, support industries such as fisheries, tourism, transportation, and resource development, and play a key role in marine rights, disaster prevention, and climate change research [4,5,6]. Therefore, accurate extraction and dynamic monitoring of island coastlines are essential foundations for marine scientific research and sustainable coastal zone management. In recent years, driven by global climate change and increasing marine development intensity, changes in island coastlines have accelerated and their morphology has become increasingly complex, particularly in areas with intricate topography such as small islands and reefs, detrital and muddy coasts, and nearshore shoals [7]. Against this backdrop, the demand for high-precision, automated coastline extraction has significantly increased to support decision-making across various domains, including ecological assessment, resource management, and disaster prevention and mitigation [8]. Compared to traditional manual surveying and aerial photography, satellite remote sensing offers advantages such as wide coverage, high temporal resolution, and long-term traceability, enabling the rapid acquisition of high-quality imagery over large areas, multiple time periods, and various spectra [9,10]. In recent years, the open sharing of high-resolution satellite data from sources such as the European Sentinel and the U.S.
Landsat satellites has greatly expanded the spatiotemporal coverage of remote sensing imagery and enhanced the accuracy and efficiency of coastline extraction. This development has allowed remote sensing-based coast extraction technologies to better meet the demand for high-precision automated extraction, providing more reliable technical support for fields such as ecological monitoring, resource management, and disaster prevention and mitigation [11,12].
Automated coastline extraction methods based on remote sensing imagery are diverse and can generally be categorized into threshold segmentation, edge detection, active contours, machine learning, multi-scale analysis, and object-based approaches [13]. The threshold segmentation method distinguishes water from land by setting spectral difference thresholds (e.g., NDWI). Although the process is simple, the choice of threshold is highly sensitive to environmental conditions, limiting its robustness [14]. Edge detection operators (e.g., Sobel, Canny) extract coastline features using gradient-based methods. While widely used, these operators are sensitive to noise and surface texture and often produce broken or spurious contours [15]. Active contour models (e.g., Snake) iteratively fit the shape of the coastline, achieving high precision, but at the cost of increased model complexity and computational demands [16]. Machine learning methods (e.g., supervised classification, clustering) can achieve semi-automated or fully automated recognition using statistical features. However, they rely heavily on feature engineering and the quality of training samples, and their performance can be unstable in complex scenarios [17]. Multi-scale and object-based methods can extract more details in complex backgrounds, but are still limited in terms of direction consistency and processing efficiency over large areas [18,19]. In general, traditional methods still exhibit shortcomings in terms of refinement, automation, and cross-regional adaptability [20].
In recent years, deep learning has demonstrated powerful feature extraction and representation capabilities in remote sensing image analysis, particularly excelling in multi-scale feature modeling and contextual semantic fusion [21]. In coastline extraction tasks, modern deep convolutional neural networks can capture both fine spatial details and long-range dependencies, making them highly suitable for complex, irregular coastlines with intricate boundaries and small-scale island features in high-resolution, multi-source remote sensing imagery. Existing studies have shown that deep learning-based semantic segmentation methods significantly outperform traditional approaches in terms of accuracy and automation. The Sentinel-2 Water Edges Dataset (SWED) work of Seale et al. [22] integrates U-Net with a Sobel edge loss to enhance boundary details, significantly reducing inaccurate segmentation of fine sandy beaches and rocky coastlines and achieving an F1 score of approximately 93%. However, the method handles detail inadequately in areas with complex boundaries and strong texture variation, failing to fully extract local features and producing inaccurate segmentation or unclear boundaries. Aghdami-Nia et al. [23] improved U-Net and integrated multispectral information, improving the precision of coastline extraction from Landsat-8 imagery. However, the method still relies on simple convolutional neural networks and does not fully leverage complex feature fusion, underperforming in particular on small-target segmentation. Sun et al. [24] built a multi-level model that performs well against cluttered backgrounds; however, its high computational cost and limited global context modeling make it inefficient for large-scale tasks. Notably, simple spectral-index methods can sometimes outperform complex deep models in fixed settings. For example, O’Sullivan et al. [25] found that in a Landsat-based Irish coast segmentation task, NDWI precision (97.2%) exceeded that of U-Net (95.0%), suggesting that deep models need further improvement in data adaptability and robustness. Although deep learning models show clear gains for coastline extraction, they still struggle to refine ragged boundaries, handle scale variation, and detect small islands.
In recent years, researchers have made significant efforts to improve the accuracy of remote sensing image semantic segmentation, focusing primarily on three main areas: attention mechanisms, multi-scale feature modeling, and edge-aware optimization [26,27,28]. Attention mechanisms enhance feature representation by focusing on key regions. Common approaches, such as Dual Attention networks and the pyramidal pooling attention mechanism, have achieved significant improvements in precision [29]. Multi-scale feature modeling improves the fusion of global semantics and local details without increasing computational load, using methods like pyramid pooling and dilated convolutions. Networks such as PSPNet and RAANet have shown excellent performance on high-resolution remote sensing imagery [30,31]. Edge-aware optimization improves boundary precision in complex scenes by incorporating explicit edge information. For example, the edge-aware network proposed by Yuan et al. has demonstrated an effective enhancement of boundary precision in remote sensing cloud detection [32,33]. However, when dealing with small islands and cluttered scenes, the absence of explicit edge cues still leaves coastlines blurred and causes small targets to be missed.
To address the challenges faced by existing methods in complex island coastline extraction, this paper proposes an innovative deep learning-based framework. The framework uses ResNet-50 [34] as the backbone network and constructs a segmentation network that integrates multi-scale pyramid pooling and graph contextual modeling. The network enhances long-range semantic dependencies through a non-local attention mechanism and introduces a topology refinement module, which adaptively optimizes boundary details using morphological gradient priors. Experiments confirm that this model repairs fragmented coastlines and readily detects small islands, reducing computational cost while keeping the output precise.

2. Materials

2.1. Study Area

This study selects nine typical island regions along the eastern and southern coasts of China as research subjects, covering provinces such as Liaoning, Shandong, Zhejiang, Fujian, and Guangdong. Spanning the Yellow and South China Seas, the chosen sites cover varied settings, from tropical shores to temperate coasts and from silt-laden shallows to clear offshore waters. They comprehensively reflect the characteristics of island coasts in typical scenarios and exhibit distinct geographical differentiation and representativeness; their locations are shown in Figure 1.
Specifically, the study area includes Changhai County, Liaoning (T4, 122°10′–122°50′E, 38°50′–39°10′N), where the islands are dense and the coast is winding; offshore Yantai, Shandong (T6, 120°40′–121°00′E, 27°50′–28°10′N), where the islands are close to the land and are significantly influenced by human activities; the Zhoushan Archipelago, Zhejiang (T1, T3, 121°40′–121°50′E, 29°50′–30°00′N), significantly affected by tides and sediment, with various island and coastline forms; Hong Kong (T2, 113°45′–114°00′E, 22°10′–22°20′N); Pingtan Island, Fujian (T5, 119°20′–119°50′E, 25°20′–25°40′N), with varied coastal forms, large island size, and both natural and anthropogenic features; the Taiwan Strait (T7, 119°30′–119°50′E, 23°00′–23°30′N), with widely distributed islands and complex strait waters; offshore Shantou, Guangdong (T8, 117°00′–117°30′E, 23°50′–24°10′N), with various forms of islands and coastlines, combining both natural and anthropogenic features; and Weizhou Island, Guangxi (T9, 109°00′–109°10′E, 21°00′–21°05′N), with clear and transparent waters, distinct coastlines, and typical tropical island coasts.

2.2. S2_China_Islands_2024 Dataset and Data Processing Strategy

This study relies on the Google Earth Engine (GEE) platform (https://developers.google.com/earth-engine/, accessed on 26 April 2025) which integrates global multi-source remote sensing imagery and high-performance computing capabilities. It supports large-scale spatiotemporal image filtering, preprocessing, and export operations, providing an efficient approach to rapidly build remote sensing datasets [35,36].
Based on this, this study constructs a remote sensing image dataset for the semantic segmentation of the island coastline in China, named S2_China_Islands_2024. The dataset uses the European Space Agency (ESA) Sentinel-2 satellite as the primary data source [37], covering typical coastal island regions in Guangdong, Fujian, Liaoning, and other provinces. The image time range is set from 1 October 2024 to 30 December 2024, with strict control to ensure that cloud cover is below 1%. This temporal selection aligns with established spatiotemporal analyses, indicating that the fall months provide the optimal window for acquiring clear optical imagery in the study area. Securing high-fidelity observations during this period is a prerequisite for ensuring the reliability of the manual boundary delineation in the proposed dataset [38,39]. The masking of clouds and cirrus is performed using the quality band (QA60) provided by GEE [40], ensuring the clarity and consistency of the imagery. For band selection, we chose four bands with 10 m spatial resolution: blue, green, red, and near-infrared (B2, B3, B4, B8), and generated high-quality cloud-free images using pixel-wise mean compositing. To effectively enhance the spectral difference between water and non-water areas, we computed the Normalized Difference Water Index (NDWI):
\mathrm{NDWI} = \frac{B_3 - B_8}{B_3 + B_8},
where $B_3$ and $B_8$ are the remotely sensed reflectance values of Sentinel-2 bands 3 (green) and 8 (near-infrared), respectively.
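The index above can be sketched in a few lines of NumPy; the small `eps` guard against zero denominators and the example reflectance values are illustrative additions, not part of the dataset pipeline:

```python
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    """NDWI from Sentinel-2 B3 (green) and B8 (NIR) reflectance.

    Values near +1 indicate open water; values near -1 indicate land or
    vegetation. `eps` avoids division by zero over no-data pixels (an
    implementation choice, not part of the formula itself).
    """
    green = green.astype(np.float64)
    nir = nir.astype(np.float64)
    return (green - nir) / (green + nir + eps)

# Water pixels (high green, low NIR reflectance) score well above zero:
water = ndwi(np.array([0.08]), np.array([0.02]))  # ≈ 0.6
land = ndwi(np.array([0.05]), np.array([0.30]))   # ≈ -0.71
```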
Data labeling was performed using ArcGIS Pro 3.1.5 for pixel-level precise annotation, with categories including “island” and “water”. The samples and labeled images for each region were then cropped using a sliding window (with a step size of 128). After removing samples from areas without islands, a total of 6721 valid images of size 256 × 256 were obtained. The images were randomly divided into a training set and a validation set in an 80:20 ratio, resulting in 5377 training images and 1344 validation images, which were used for model optimization, and some data samples are shown in Figure 2.
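The sliding-window cropping and 80:20 random split described above can be sketched as follows; the island class id (1) and the random seed are assumptions for illustration, not values stated in the paper:

```python
import numpy as np

def tile_image(img: np.ndarray, mask: np.ndarray, size: int = 256, stride: int = 128):
    """Crop an image/label pair into tiles with a sliding window (step 128),
    discarding tiles whose label contains no island pixels."""
    tiles = []
    h, w = img.shape[:2]
    for y in range(0, h - size + 1, stride):
        for x in range(0, w - size + 1, stride):
            m = mask[y:y + size, x:x + size]
            if (m == 1).any():  # keep only tiles that contain island pixels
                tiles.append((img[y:y + size, x:x + size], m))
    return tiles

def train_val_split(samples, ratio: float = 0.8, seed: int = 42):
    """Random 80:20 split into training and validation subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    cut = int(len(samples) * ratio)
    return [samples[i] for i in idx[:cut]], [samples[i] for i in idx[cut:]]
```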

3. Methodology

3.1. Overall GTSegNet Architecture

To address issues such as complex backgrounds, insufficient long-range dependencies, and blurred boundaries in island coastline segmentation, this paper proposes an improved deep semantic segmentation framework, GTSegNet; the network architecture is illustrated in Figure 3. This study proposes an improved semantic segmentation architecture based on a deep residual network (ResNet-50), which balances global semantic perception and local detail preservation. Our design introduces two dedicated modules. First, the TARM extracts and sharpens boundaries using morphological operations, yielding more precise contours. Second, the GCB models relationships between pixels as a graph, helping the network capture how distant regions relate. The overall architecture ensures sensitivity to small-scale islands while maintaining robustness in complex maritime backgrounds.

3.2. Topology-Aware Refinement Module

To improve boundary accuracy in island coastline segmentation and maintain topological structure, this paper proposes the Topology-Aware Refinement Module (TARM). This module refines boundaries using morphological operations, exploiting the difference between max-pooling and min-pooling responses, and thereby alleviates the blurred boundaries produced by earlier methods. It significantly enhances segmentation performance, especially when dealing with complex coastlines and small-scale islands [41,42,43]; the structure of this block is shown in Figure 4. The module enhances segmentation performance through the following key steps:
  • Calculation of the Morphological Gradient: The morphological gradient is a classical method for highlighting object contours. TARM computes the morphological gradient as the difference between the max pooling and the negative max pooling (i.e., the min pooling) of the feature map. This operation strengthens the model’s response to irregular island borders while preserving the true landform, allowing for sharp and clear boundary results.
  • Convolution and Gating Mechanism: After the morphological gradient features are enhanced by convolutional layers, they are adjusted by a gating mechanism [44]. This mechanism automatically controls the enhancement or suppression of the morphological gradient in boundary regions through a trainable gating coefficient. The gating keeps edge structures connected and helps the final contours stay faithful to the real boundaries.
  • Target Refinement and Connectivity Assurance: In traditional convolutional neural networks, downsampling operations often lead to blurred boundaries, which in turn affect segmentation accuracy. TARM combines the calculation of the morphological gradient with the extraction of convolutional features to precisely refine the target boundaries and reduce the occurrence of false responses. Especially when handling small-scale islands and complex coastlines, TARM effectively enhances the feature response in boundary regions, reducing the boundary blurring caused by downsampling and spatial smoothing operations. Additionally, TARM preserves island connectivity, ensuring that no parts are lost and that the boundary remains continuous. This flow can be denoted as follows:
D(F) = \max_{(u, v) \in \Omega} F(u, v)
E(F) = \min_{(u, v) \in \Omega} F(u, v)
G_m(F_{graph}) = D(F_{graph}) - E(F_{graph})
R = \sigma(\alpha) \cdot M\left(G_m(F_{graph})\right)
F_{refine} = F_{graph} + R
where $F_{graph}$ is the output feature map of the Graph Context Block; $D(F)$ denotes the dilation operator, which takes the maximum value in the neighborhood to highlight the external contour of edges; $E(F)$ represents the erosion operator, which takes the minimum value in the neighborhood to highlight the internal contour of edges; $G_m(F_{graph}) = D(F_{graph}) - E(F_{graph})$ is the morphological gradient, representing the difference between dilation and erosion and used to extract edge contours; $M$ refers to a module comprising convolution and nonlinear activation functions (e.g., Conv + BN + ReLU) for further encoding of edge features; $R = \sigma(\alpha) \cdot M(G_m(F_{graph}))$ is the edge refinement residual term, where $\sigma$ denotes the Sigmoid activation function and $\alpha$ is a scaling coefficient; and $F_{refine} = F_{graph} + R$ is the final output feature map, which combines the original features with the edge refinement residual to achieve topological structure optimization.
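The equations above can be sketched as a small PyTorch module. This is a minimal sketch, assuming a 3×3 neighborhood Ω and a single Conv + BN + ReLU block for M; the paper does not specify these hyperparameters:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TARM(nn.Module):
    """Sketch of the Topology-Aware Refinement Module: dilation/erosion via
    max pooling, morphological gradient, a conv encoder M, and a gated
    residual. Kernel size and encoder depth are illustrative assumptions."""

    def __init__(self, channels: int, k: int = 3):
        super().__init__()
        self.k, self.pad = k, k // 2
        # M: Conv + BN + ReLU encoding of the edge map
        self.m = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        self.alpha = nn.Parameter(torch.zeros(1))  # trainable gating coefficient

    def forward(self, f_graph: torch.Tensor) -> torch.Tensor:
        # D(F): grayscale dilation = max over the neighborhood Ω
        d = F.max_pool2d(f_graph, self.k, stride=1, padding=self.pad)
        # E(F): grayscale erosion = -max_pool(-F), i.e., min over Ω
        e = -F.max_pool2d(-f_graph, self.k, stride=1, padding=self.pad)
        g_m = d - e                                   # morphological gradient
        r = torch.sigmoid(self.alpha) * self.m(g_m)   # gated edge residual R
        return f_graph + r                            # F_refine
```

The erosion-as-negated-max-pool trick keeps the module expressible with standard pooling ops, so no custom morphology kernels are needed.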

3.3. Graph Context Block

To enhance the global semantic consistency of the feature map, this paper introduces the Graph Contextual Modeling Module (GCB). Its core idea is to construct pixel points as graph nodes and leverage graph attention mechanisms to capture long-range dependencies and spatial structural correlations, addressing the limitations of traditional convolutional operations in global context modeling [44]; the structure of this block is shown in Figure 5. The specific implementation of this module includes the following steps:
  • Model Projection and Similarity Calculation: First, the input feature map is linearly projected into a lower-dimensional space. In the lower-dimensional space, the semantic similarity between nodes is calculated, generating a similarity matrix that represents the relationships between pixels. This step involves encoding each pixel in the feature map, allowing the network to capture the similarities between different pixels and represent them within the graph structure.
  • Attention mechanism for graph structure: Using the calculated similarity matrix, GCB constructs an attention map for each node (pixel). This map computes the similarity between pixels and applies a Top-K selection mechanism for weighted aggregation [45]. The Top-K selection helps maintain the sparsity of the graph structure, avoiding computational redundancy in fully connected self-attention calculations, while retaining important long-range dependencies. It efficiently captures long-range dependencies and aggregates features related to the target region.
  • Weighted Aggregation and Contextual Feature Update: After similarity calculation and Top-K selection, GCB fuses neighborhood information with target pixel features through weighted aggregation to generate new contextual features. These aggregated features capture long-range dependencies in the image, improving the structural perception and semantic consistency of the target, thus improving the extraction of meaningful features in complex scenarios. This flow can be denoted as follows:
s_{ij} = \frac{q_i \cdot k_j}{\sqrt{d_k}}
\alpha_{ij} = \frac{\exp(s_{ij})}{\sum_{j \in N(i)} \exp(s_{ij})}
y_i = \sum_{j \in N(i)} \alpha_{ij} v_j
F_{graph} = F + \gamma Y
where $s_{ij}$ is the similarity score; $q_i$ and $k_j$ are the query and key vectors; $d_k$ is the feature dimension; $\alpha_{ij}$ is the attention weight; $N(i)$ denotes the set of neighboring nodes of node $i$; $y_i$ is the aggregated contextual feature; $v_j$ is the value vector; $F_{graph}$ is the output feature map; $F$ is the input feature map; $Y$ is the global context feature map; and $\gamma$ is a learnable parameter.
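The four equations can be sketched as a PyTorch module; the projection dimension and K are illustrative assumptions, and the √d_k scaling follows the standard scaled dot-product convention:

```python
import torch
import torch.nn as nn

class GraphContextBlock(nn.Module):
    """Sketch of the Graph Context Block: pixels become graph nodes, scaled
    dot-product similarity defines edges, and Top-K selection keeps only the
    strongest neighbours per node before softmax aggregation."""

    def __init__(self, channels: int, dim: int = 32, top_k: int = 8):
        super().__init__()
        self.q = nn.Conv2d(channels, dim, 1)
        self.k = nn.Conv2d(channels, dim, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learnable fusion weight γ
        self.dim, self.top_k = dim, top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        n = h * w
        q = self.q(x).flatten(2).transpose(1, 2)        # (B, N, dim)
        k = self.k(x).flatten(2)                        # (B, dim, N)
        v = self.v(x).flatten(2).transpose(1, 2)        # (B, N, C)
        s = (q @ k) / self.dim ** 0.5                   # s_ij, shape (B, N, N)
        # Top-K sparsification: keep only the K largest scores per node
        thresh = s.topk(min(self.top_k, n), dim=-1).values[..., -1:]
        s = s.masked_fill(s < thresh, float("-inf"))
        a = s.softmax(dim=-1)                           # α_ij over N(i)
        y = (a @ v).transpose(1, 2).reshape(b, c, h, w) # aggregated context Y
        return x + self.gamma * y                       # F_graph = F + γY
```

With γ initialized to zero, the block starts as an identity mapping and learns how much global context to inject, a common stabilization choice for attention residuals.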

3.4. Implementation Details

To validate the superior performance of GTSegNet in extracting island coastlines from remote sensing imagery and ensure the fairness of the comparative experiments, this study adopts the PyTorch 2.0.0 deep learning framework, with the runtime environment set to Python 3.8 (Ubuntu 20.04) and GPU acceleration provided by CUDA 11.8. In terms of hardware, an NVIDIA RTX 4090D (24 GB VRAM) is used as the primary computing device. For training parameters, the AdamW optimizer is used, combined with momentum = 0.95 and the Cosine Annealing learning rate decay strategy to ensure stable convergence and strong generalization performance. The initial learning rate is set to 0.0001, with a batch size of 32 and 50 epochs. Additionally, dropout = 0.2 is introduced during network training to mitigate the risk of overfitting.
In terms of the loss function, a combined loss of Cross-Entropy Loss and Dice Loss is used to balance pixel-level classification accuracy and boundary delineation capability. This configuration ensures fast model convergence while effectively improving robustness and edge prediction performance in segmentation tasks.
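A minimal sketch of the stated training configuration follows. The CE/Dice mixing weight, the AdamW beta pairing for the stated momentum of 0.95, and the stand-in one-layer model are assumptions; the paper does not specify them:

```python
import torch
import torch.nn as nn

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    """Soft Dice loss for binary island/water segmentation (sketch)."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum()
    return 1 - (2 * inter + eps) / (prob.sum() + target.sum() + eps)

def combined_loss(logits: torch.Tensor, target: torch.Tensor, w_dice: float = 0.5) -> torch.Tensor:
    """Cross-entropy + Dice; the 0.5 weighting is an illustrative assumption."""
    ce = nn.functional.binary_cross_entropy_with_logits(logits, target)
    return ce + w_dice * dice_loss(logits, target)

model = nn.Conv2d(4, 1, 1)  # stand-in for GTSegNet (4 input bands)
# AdamW with lr = 1e-4; beta1 = 0.95 mirrors the stated momentum (assumption)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, betas=(0.95, 0.999))
# Cosine annealing over the 50-epoch schedule
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=50)
```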

3.5. Evaluation Metrics

To quantitatively evaluate the performance of the proposed GTSegNet in island coastline segmentation, we adopt four commonly used segmentation evaluation metrics: mean Intersection over Union (mIoU, %) [46], mean Pixel Accuracy (mPA, %) [47], mean Precision (mPrecision, %) [48], and mean Recall (mRecall, %) [49]. These metrics provide a comprehensive reflection of the model’s ability to distinguish between categories and its overall performance in the segmentation task. These metrics are calculated as follows:
\mathrm{mPrecision} = \frac{TP}{TP + FP},
\mathrm{mRecall} = \frac{TP}{TP + FN},
\mathrm{IoU} = \frac{TP}{TP + FP + FN},
\mathrm{mPA} = \frac{1}{N} \sum_{i=1}^{N} \frac{TP_i}{TP_i + FN_i}
where $TP$ is the number of samples the model correctly predicts as positive, $TN$ is the number of samples the model correctly predicts as negative, $FP$ is the number of actual negative samples the model incorrectly predicts as positive, and $FN$ is the number of actual positive samples the model incorrectly predicts as negative.
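For a binary island/water map, these quantities reduce to a few pixel counts. A minimal sketch for a single class, using the standard recall denominator $TP + FN$:

```python
import numpy as np

def seg_metrics(pred: np.ndarray, gt: np.ndarray):
    """IoU, precision, and recall for a binary prediction/label pair,
    where class 1 is the positive (island) class."""
    tp = np.sum((pred == 1) & (gt == 1))  # true positives
    fp = np.sum((pred == 1) & (gt == 0))  # false positives
    fn = np.sum((pred == 0) & (gt == 1))  # false negatives
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return iou, precision, recall

pred = np.array([[1, 1, 0, 0]])
gt   = np.array([[1, 0, 1, 0]])
# TP = 1, FP = 1, FN = 1 → IoU = 1/3, precision = 0.5, recall = 0.5
```

The mean variants (mIoU, mPrecision, mRecall) simply average these per-class values over the island and water classes.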

4. Results

4.1. Comparative Experiments

To comprehensively evaluate the performance of the proposed method, we selected seven representative semantic segmentation models for comparison with GTSegNet, including UNet, DeepLabv3+, DDRNet, BiSeNet, PSPNet, SegFormer, and Mask2Former. UNet uses a symmetric encoder-decoder structure and bridges the encoder and decoder with skip connections to fuse multi-scale features, thus finely preserving image details and accurately localizing segmentation boundaries [50]; DeepLabv3+ adds a lightweight decoding module to the DeepLabv3 framework to finely recover spatial information lost during downsampling, thus enhancing segmentation performance at object boundaries [51]; DDRNet consists of two deep branches with high and low resolutions, performing multiple bidirectional feature fusions between the branches, and introduces a Deep Aggregation Pyramid Pooling Module (DAPPM) to expand the receptive field and integrate multi-scale context, achieving a good balance between real-time performance and precision [52]; BiSeNet uses a dual-branch architecture with a Spatial Path and a Context Path, retaining high-resolution details while using a low-resolution backbone to extract large-scale semantic information. The two paths are fused to balance segmentation speed and accuracy [53]; PSPNet introduces the Pyramid Pooling Module, which pools the feature map at multiple scales and merges local and global features, providing effective global context priors to improve pixel-level parsing capabilities [30].
In addition, we compared two advanced transformer-based segmentation models: SegFormer and Mask2Former. SegFormer uses a hierarchical Transformer encoder (removing explicit positional encoding) and a lightweight all-MLP decoder, achieving a simple architecture that efficiently outputs high-precision segmentation results without complex designs [54]; Mask2Former uses a Transformer decoder combined with a masked attention mechanism, restricting cross-attention to the predicted mask region. This allows the model to handle semantic, instance, and panoptic segmentation tasks within the same architecture, achieving state-of-the-art performance on various segmentation benchmarks [55].
The quantitative evaluation results are shown in Table 1. GTSegNet significantly outperforms the comparison methods in all four metrics. Specifically, mIoU reaches 96.96%, an improvement of 3.34 percentage points over the second-best model, Mask2Former (93.62%), which highlights its advantage in pixel-level classification accuracy. Additionally, GTSegNet achieves the highest values in mPA (98.54%), mPrecision (98.37%), and mRecall (98.54%), with an overall lead of approximately 1 percentage point. This indicates that the network remains robust by suppressing false predictions, ensuring the consistency of the final output. It should be noted that in this experiment the values of mPA and mRecall are identical, owing to the extreme class imbalance inherent in the coastline extraction task: most pixels belong to simple background, so the dominant class yields nearly identical per-class accuracy and recall. In summary, GTSegNet not only surpasses existing methods in overall accuracy, but also maintains optimal performance across key metrics, demonstrating its excellent performance in island extraction tasks from remote sensing imagery.
To qualitatively assess performance differences, this paper selects several representative regions and visualizes and compares the extraction results produced by various methods.
As shown in Figure 6, the first two rows show the input images and their corresponding ground-truth labels. The yellow rectangular boxes in the figure highlight the areas of interest. In scenes (a), (b), and (c), due to the complexity of the winding peninsula-shaped island coasts, the compared models fail to accurately extract the edges of the island coasts. After segmentation by U-Net, DeepLabv3+, and DDRNet, the extracted island coastlines show reduced curvature and connectivity issues, causing island contours to remain open and fail to form complete coastlines. In addition, there is an issue of excessive smoothing, which does not accurately reflect the complex edge features of the islands. In scenes (d), (e), and (f), which contain multiple isolated islands, different networks show various problems. First, DeepLabv3+, PSPNet, and Mask2Former miss small islands. Even when some small islands are detected, their predicted contours are severely distorted, often degenerating into a simple point or blob, completely losing the true geographic shape. Next, U-Net, BiSeNet V2, and SegFormer face frequent fragmentation and disconnection issues when extracting the coastlines of large islands. The output coastlines generally suffer from excessive smoothing, with curvature much lower than the true values, making it impossible to accurately capture the complex geometric shapes of the coastlines. In the above complex scenarios, our proposed GTSegNet model demonstrates stronger robustness, with a higher detection rate for small islands.
The generated coastlines closely match the real ones in terms of continuity and contour details. Finally, in the composite topographic scenes (g) and (h), which include both peninsulas and isolated islands, the issues of the various networks are reflected not only in missed and false detections, but also in the low spatial structural quality of their outputs (e.g., coastline breaks and excessive smoothing). GTSegNet specifically addresses these challenges, with significant improvements in coastline continuity and morphological fidelity.
In summary, GTSegNet demonstrates optimal performance in boundary clarity, regional connectivity, and control of false negatives/positives by combining global context modeling with topology detail optimization. Crucially, the comparisons demonstrate that the quantitative improvement of approximately 4% is not merely statistical, but reflects substantial enhancements specifically concentrated near the coastline boundaries. As evidenced by the visual comparisons in Figure 6 scenes (a–h), GTSegNet effectively resolves critical topological issues, including boundary fractures, over-smoothing, and missed detections of small islands. These corrections ensure higher geometric fidelity and topological continuity, which are essential for preserving the morphological integrity of islands and ensuring the precise extraction of coastline lengths, thereby validating the model’s practical superiority beyond pixel-level metrics.

4.2. Model Complexity and Inference Efficiency Analysis

To evaluate the feasibility of GTSegNet for real-time satellite edge applications, we conducted a comprehensive assessment of model complexity (parameters) and inference speed (FPS). All tests were performed using the hardware configuration detailed in Section 3.4. The results are provided in Table 2, and the efficiency trade-off is visualized in Figure 7, where larger bubbles indicate higher latency.
As shown in Figure 7 and Table 2, GTSegNet strikes an optimal balance between model complexity and inference speed. While lightweight models such as BiSeNet (291 FPS) and DDRNet (207 FPS) achieve higher speeds, their significantly lower parameter counts restrict their ability to capture fine-grained coastline details.
Among high-capacity models, GTSegNet stands out with competitive efficiency. With 46.7 M parameters, it achieves 92 FPS (10.87 ms latency), comparable to DeepLabv3+ (91 FPS) and faster than the heavier Mask2Former (83 FPS, 12.05 ms). While not the absolute fastest compared to ultra-light models, GTSegNet offers a favorable trade-off, providing sufficient real-time capability for satellite-based segmentation without compromising the representational power needed for high-precision tasks.
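The throughput figures above map directly to per-frame latency via latency = 1000 / FPS. A quick sketch using the FPS values quoted in the text:

```python
# FPS values as reported in the text (parameter counts are given only
# for GTSegNet, 46.7 M, so they are omitted here).
models = {
    "BiSeNet": 291,
    "DDRNet": 207,
    "GTSegNet": 92,
    "DeepLabv3+": 91,
    "Mask2Former": 83,
}

def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a given throughput."""
    return 1000.0 / fps

for name, fps in sorted(models.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {fps:4d} FPS -> {latency_ms(fps):5.2f} ms")
```

This reproduces the latencies cited above, e.g. 92 FPS yields 10.87 ms and 83 FPS yields 12.05 ms.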

4.3. GTSegNet Training Parameter Analysis and Stability Verification

To ensure the reliability and reproducibility of the performance evaluation of the GTSegNet model and verify the stability of its training process, this study performs a systematic comparative analysis of key training hyperparameters—Learning Rate [56] and Optimizer [57]. The purpose of this experiment is to verify the training stability of GTSegNet and its ability to maintain high performance under different parameter configurations, which is a crucial prerequisite for assessing its generalizability.
The learning rate is a critical factor affecting the model’s convergence speed and accuracy. To identify the best setting, we tested four learning rates from 0.0001 to 0.0004 in increments of 0.0001. The experimental results are shown in Table 3. When the learning rate is 0.0001, the mIoU reaches its highest value of 96.96%, with mPA of 98.54%, mPrecision of 98.37%, and mRecall of 98.54%, significantly outperforming the other configurations. This suggests that a smaller learning rate effectively suppresses gradient oscillations, facilitates stable convergence, and significantly improves segmentation accuracy. A larger learning rate (e.g., 0.0004) accelerates training but reduces precision in detail recovery, with mIoU dropping to 95.82%, indicating that an excessively high learning rate may cause unstable convergence and accuracy loss. Therefore, selecting 0.0001 as the final learning rate achieves the best segmentation accuracy and ensures a stable training process.
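The sweep described above can be sketched as a small grid search; only the two mIoU values quoted in the text are included (the full set is in Table 3):

```python
# Learning-rate grid: 0.0001 to 0.0004 in steps of 0.0001.
grid = [round(1e-4 * k, 4) for k in range(1, 5)]
assert grid == [0.0001, 0.0002, 0.0003, 0.0004]

# mIoU (%) for the two settings quoted in the text; the remaining
# entries would be filled in from Table 3.
reported_miou = {0.0001: 96.96, 0.0004: 95.82}

# Select the learning rate with the highest mIoU.
best_lr = max(reported_miou, key=reported_miou.get)
print(best_lr)  # 0.0001
```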
Optimizers play a key role in the convergence speed, accuracy, and generalization ability of deep learning models. In this study, we compared four commonly used optimizers: SGD, a classical optimizer with good stability but prone to local optima [58]; Adam, which combines first- and second-moment estimates and is suitable for most tasks with faster convergence [59]; AdaGrad, an adaptive learning rate optimizer that works well with sparse data but suffers from premature learning rate decay [60]; and AdamW, which adds decoupled weight decay to Adam to help mitigate overfitting [61]. The experimental results, shown in Table 4, indicate that AdamW outperforms the other optimizers and achieves the best performance across all evaluation metrics: mIoU reaches 96.96%, mPA 98.54%, mPrecision 98.37%, and mRecall 98.54%. In contrast, AdaGrad performs the worst due to premature learning rate decay (mIoU = 95.48%), while SGD (mIoU = 95.91%) and Adam (mIoU = 96.42%) perform better but still fall short of AdamW. These results indicate that AdamW not only improves model accuracy but also effectively prevents overfitting, demonstrating the best stability and generalization ability.
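For reference, AdamW differs from Adam chiefly in its decoupled weight-decay term, applied directly to the weights rather than folded into the gradient. A minimal single-parameter sketch (not the authors’ training code; the hyperparameters are illustrative):

```python
def adamw_step(w, g, m, v, lr=1e-4, b1=0.9, b2=0.999,
               eps=1e-8, wd=0.01, t=1):
    """One AdamW update on a scalar parameter w with gradient g.
    The weight decay wd * w is added outside the adaptive gradient
    term, which is the 'decoupled' part of AdamW."""
    m = b1 * m + (1 - b1) * g              # first-moment estimate
    v = b2 * v + (1 - b2) * g * g          # second-moment estimate
    m_hat = m / (1 - b1 ** t)              # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * (m_hat / (v_hat ** 0.5 + eps) + wd * w)
    return w, m, v

w, m, v = 1.0, 0.0, 0.0
w, m, v = adamw_step(w, g=0.5, m=m, v=v)
print(w)  # slightly below 1.0: gradient step plus decoupled decay
```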
Based on the above experimental results, the optimal hyperparameter configuration for GTSegNet is a learning rate of 0.0001 and the AdamW optimizer. This configuration not only effectively improves training stability but also excels in all four metrics: mIoU, mPA, mPrecision, and mRecall, ensuring optimal segmentation accuracy and detail recovery, while also guaranteeing the fairness of the evaluation and the rigor of the conclusions.

4.4. Ablation Study

To comprehensively validate the effectiveness of the core modules of the proposed GTSegNet, this study designed a systematic ablation experiment. The experiment uses ResNet-50 as the backbone network and PSPNet with the pyramid pooling module as the baseline, progressively introducing TARM and GCB to compare the impact of different module combinations on the segmentation results. A second group of experiments visualizes feature responses with Gradient-weighted Class Activation Mapping (Grad-CAM) [62], revealing changes in the model’s focus on semantic regions at the feature level and intuitively demonstrating how the modules enhance semantic representation and boundary recognition. Together, the two experimental groups systematically validate the effectiveness of the proposed modules on both quantitative metrics and visualization results.
The quantitative analysis, shown in Table 5, reveals that the Baseline model has limited performance across all metrics, with an mIoU of 93.70% and an mPA of 96.87%. Introducing TARM significantly improved boundary refinement: mIoU increased by 1.96 percentage points (from 93.70% to 95.66%), mPA by 0.94 points (from 96.87% to 97.81%), mPrecision by 1.14 points (from 96.60% to 97.74%), and mRecall by 0.94 points (from 96.87% to 97.81%). Introducing GCB, which models global dependencies, effectively mitigated the omission problem: mIoU increased by 1.25 points (from 93.70% to 94.95%), mPA by 0.69 points (from 96.87% to 97.56%), mPrecision by 0.65 points (from 96.60% to 97.25%), and mRecall by 0.69 points (from 96.87% to 97.56%). Finally, combining TARM and GCB increased mIoU by 3.26 points (from 93.70% to 96.96%), mPA by 1.67 points (from 96.87% to 98.54%), mPrecision by 1.77 points (from 96.60% to 98.37%), and mRecall by 1.67 points (from 96.87% to 98.54%), fully verifying the complementary advantages of TARM and GCB in boundary refinement and global modeling.
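The ablation gains over the baseline can be recomputed directly from the mIoU values reported in Table 5:

```python
# mIoU (%) per ablation configuration, as reported in Table 5.
miou = {
    "Baseline": 93.70,
    "Baseline + TARM": 95.66,
    "Baseline + GCB": 94.95,
    "Baseline + TARM + GCB": 96.96,
}

# Gain in percentage points relative to the baseline.
for name, score in miou.items():
    delta = round(score - miou["Baseline"], 2)
    print(f"{name:24s} {score:.2f}  (+{delta:.2f})")
```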
Qualitative analysis further supports the quantitative conclusions, with the results shown in Figure 8. The ablation models were compared qualitatively in typical peninsula-shaped island scenes (a–d). The Baseline results generally suffer from coastline smoothing, with the extracted contours showing significantly lower curvature than the real terrain. After introducing the Topology-Aware Refinement Module (TARM), the model’s ability to capture coastline details improved markedly and the curvature was notably enhanced; similarly, introducing the Graph Contextual Modeling Module (GCB) strengthened long-range dependencies, making the coastline orientation more accurate. Notably, however, when TARM or GCB was introduced individually, each unexpectedly caused patchy local missed detections in certain areas (indicated by the white rectangles in the figure) that the baseline model had identified correctly. This suggests that, while a single module enhances a specific capability, it may introduce instability in other dimensions. In isolated island scenes (e–h), the baseline can detect the targets, but their contour shapes are distorted. In contrast, the models with the proposed modules, especially the full model, significantly improved the integrity of the island contours, and the extracted results closely matched the true shapes, further validating the universal advantage of the proposed modules in improving shape fidelity. The GTSegNet model, which combines both modules, performs best in boundary clarity and target coherence, nearly eliminating detail loss and artifacts.
To further reveal the function of each module, we use Grad-CAM to perform attention visualization on the different model variants, thereby enhancing the interpretability of the modules used in this study, with the results shown in Figure 9. The transition from blue to red indicates increasing model attention to the corresponding regions. The response of the Baseline model is relatively scattered, and some real target areas do not receive adequate attention. After adding TARM, the red and yellow regions in the heatmap concentrate noticeably around the target area; the model becomes more sensitive to edge information, effectively enhancing boundary clarity and reducing background interference. After GCB is introduced, the response area extends across the entire target, significantly enhancing the capture of global structure. Finally, the activation of the TARM + GCB model almost exclusively covers the real target area, with background interference effectively suppressed, further enhancing the model’s semantic focus.
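The core of the Grad-CAM computation [62] behind Figure 9 can be sketched on a toy feature map; the array shapes and random values here are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.random((4, 8, 8))   # (channels, H, W) from a conv layer
gradients = rng.random((4, 8, 8))     # d(class score)/d(activations)

# Channel weights alpha_k: global average pooling of the gradients.
weights = gradients.mean(axis=(1, 2))

# Class activation map: ReLU of the weighted sum of activation channels.
cam = np.maximum((weights[:, None, None] * activations).sum(axis=0), 0.0)
cam /= cam.max()                      # normalize to [0, 1] for display
print(cam.shape)  # (8, 8)
```

Upsampled to the input resolution and overlaid as a heatmap, this map yields the blue-to-red visualizations shown in Figure 9.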

5. Discussion

5.1. Model Applications

Accurate and efficient island coastline extraction is crucial for coastal zone resource management, ecosystem monitoring, and island dynamic change analysis. However, due to the complex geomorphological features of islands and significant influences from tides and human activities, achieving high-precision, operationally feasible automatic coastline extraction remains a challenge. To validate the practical value and generalization ability of the GTSegNet model proposed in this study in real and diverse scenarios, we applied it to typical islands in different seas of China and conducted large-scale, multi-temporal coastline extraction experiments.
By segmenting remote sensing images of multiple island regions (such as islands in Liaoning, Guangdong, and Fujian provinces) from 2020 to 2025, this study demonstrates the model’s potential for large-scale island coastline monitoring.
From the results shown in Figure 10, the GTSegNet model can effectively identify the coastline contours of most islands and successfully capture the main characteristics of the coastline of the islands.
Further analysis shows that the GTSegNet model demonstrates stable and efficient performance in island coastline extraction tasks across different regions. In certain island areas of Liaoning Province, the area in 2020 and 2025 was 232.62 km² and 229.73 km², respectively, a change of about 1.26%; the perimeter increased from 37.49 km to 38.48 km, a change of approximately 2.65%. For some islands in Guangdong Province, the area changed from 241.25 km² to 238.66 km², a change of approximately 1.08%, while the perimeter increased from 290.99 km to 292.56 km, a change of approximately 0.54%. In contrast, some islands in Fujian Province had an area of 321.47 km² in 2020 and 331.91 km² in 2025, a change of approximately 3.27%, with the perimeter increasing from 236.71 km to 241.18 km, a change of approximately 1.89%. The most significant change was due to the construction of Xiamen Xiang’an International Airport, a project that involved large-scale land reclamation and artificial alterations to the coast.

5.2. Generalization Ability of GTSegNet

To further validate the effectiveness and generalizability of the proposed method, we conducted a cross-dataset evaluation on a public dataset [63]. This study selected the widely used Landsat-8 OLI land–water segmentation dataset [64] for experimental comparison and analysis. This dataset is widely used in land–water semantic segmentation research and serves as an important benchmark for evaluating the performance of coastline extraction algorithms. It comprises multi-temporal images with cloud cover below 5% and provides two commonly used band combinations: the true-color RGB bands (4-3-2) and a near-infrared composite (5-6-4). Considering the input band configuration of the proposed method, this study selected the true-color RGB bands (4-3-2) for training and validation, with the results shown in Figure 11.
In this cross-dataset evaluation, GTSegNet demonstrated outstanding overall performance on the Landsat-8 OLI land–water segmentation dataset, as shown in Table 6. In particular, its mean intersection over union (mIoU) reached 96.66%, an improvement of nearly 4 percentage points over the best result reported in the original literature (92.98%), marking a substantial breakthrough in segmentation accuracy.
Although GTSegNet is not the highest in terms of overall accuracy (98.91%) and recall (98.35%), its leading performance in mIoU is more convincing. This is because in land–water segmentation tasks with imbalanced class distributions, overall accuracy is often dominated by the background class (e.g., vast oceans) that occupies the majority of pixels, making it difficult to accurately reflect the segmentation quality of foreground targets (e.g., coastline boundaries). The advantage of GTSegNet in the key metric mIoU demonstrates the effectiveness of the synergistic interaction between its proposed TARM and GCB modules, which optimize the segmentation of complex coastline boundaries and are crucial for the practical application of coastline extraction.
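The class-imbalance argument can be made concrete with a toy example (assumed numbers, for illustration only): on a tile that is 95% water, a prediction missing most of the land still attains high overall accuracy, while the land IoU exposes the failure.

```python
import numpy as np

# Toy 100x100 tile: 0 = water (95%), 1 = land (5%).
gt = np.zeros((100, 100), dtype=int)
gt[:10, :50] = 1                      # 500 land pixels

# A poor prediction that recovers only 100 of the 500 land pixels.
pred = np.zeros_like(gt)
pred[:10, :10] = 1

accuracy = (pred == gt).mean()        # dominated by the water class
inter = np.logical_and(pred == 1, gt == 1).sum()
union = np.logical_or(pred == 1, gt == 1).sum()
land_iou = inter / union              # reflects foreground quality

print(f"accuracy = {accuracy:.2%}, land IoU = {land_iou:.2%}")
```

Here overall accuracy is 96% despite the land IoU being only 20%, illustrating why mIoU is the more informative metric for coastline boundaries.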
Environmental dynamics, such as vegetation phenology and tidal fluctuations, inevitably alter the spectral characteristics of coastal boundaries [65,66]. Furthermore, variable atmospheric conditions, particularly differing levels of cloud cover, impose varying degrees of interference and partial occlusion [67]. Unlike conventional spectral-thresholding techniques that are inherently susceptible to such shifts, the proposed GTSegNet improves robustness against these variations through collaborative perception strategies. Specifically, the Graph Contextual Modeling (GCB) and Topology-Aware Refinement (TARM) modules capture intrinsic topological structures and global semantic contexts, which remain robust despite environmental and atmospheric changes. This capability is empirically validated by the model’s superior generalization performance on the Landsat-8 OLI land–water segmentation dataset [64]. As this dataset comprises multi-temporal imagery spanning different years, distinct climatic zones, and varying cloud conditions (up to 5%), the consistent high performance confirms that GTSegNet possesses strong generalization capabilities and is robust to the complex spatiotemporal and atmospheric challenges.

5.3. Potential for Multi-Modal Extension

Although this study focuses on validating the effectiveness of GTSegNet using Sentinel-2 optical imagery, we recognize the unique advantages of Synthetic Aperture Radar (SAR) in maritime remote sensing [68]. Unlike optical sensors constrained by solar illumination and cloud cover, SAR provides day-and-night all-weather imaging capabilities, which are particularly valuable for continuous coastal monitoring in cloud-prone tropical and subtropical regions [69]. Recent studies have shown that SAR data, either alone or in combination with optical imagery, can support automated coastline extraction by exploiting the backscatter contrast between relatively smooth water surfaces and rougher land areas [70,71]. These findings indicate that SAR-based or multi-modal approaches can serve as effective complements when optical observations are unavailable or severely degraded.
Despite these advantages, the current implementation of GTSegNet is primarily tailored for optical imagery and exploits reflectance-based cues for precise water–land discrimination. These cues are intrinsically associated with optical sensing mechanisms and differ from the microwave backscattering characteristics of SAR data. Nevertheless, the core network architecture, particularly the GCB and TARM modules, is not inherently modality-specific. Consequently, by adapting the input representations or incorporating optical–SAR data fusion strategies [72], the proposed framework provides a solid foundation for extension to SAR-based coastal monitoring in future research.

6. Conclusions

In this study, the GTSegNet model is proposed to address a series of challenges in complex island coastline extraction, particularly boundary blurring, insufficient long-range dependency modeling, and small-scale island detection. Through in-depth analysis of the limitations of traditional coastline extraction methods, this study uses multi-source remote sensing data and introduces an innovative deep learning architecture, aiming to improve the accuracy, robustness, and generalizability of island coastline extraction.
GTSegNet is based on the ResNet-50 architecture and captures long-range dependencies through the Graph Contextual Modeling Module (GCB), enhancing semantic consistency. To overcome the limitations of traditional methods in handling boundary blurring and topological structure degradation, the Topology-Aware Refinement Module (TARM) was designed. This module strengthens edge details and object connectivity by incorporating morphological gradient information, significantly improving segmentation accuracy.
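For illustration, a morphological gradient [41] of the kind TARM draws on can be computed as dilation minus erosion with a 3×3 structuring element. This minimal sketch on a binary land mask is an assumed form, not the authors’ implementation:

```python
import numpy as np

def morph_gradient(mask: np.ndarray) -> np.ndarray:
    """Dilation minus erosion with a 3x3 structuring element;
    the result is non-zero only on region boundaries."""
    padded = np.pad(mask, 1, mode="edge")
    # Stack the nine 3x3 neighbourhood shifts of the mask.
    shifts = [padded[i:i + mask.shape[0], j:j + mask.shape[1]]
              for i in range(3) for j in range(3)]
    stack = np.stack(shifts)
    return stack.max(axis=0) - stack.min(axis=0)  # dilation - erosion

mask = np.zeros((6, 6), dtype=int)
mask[2:4, 2:4] = 1                    # a tiny 2x2 "island"
grad = morph_gradient(mask)
print(grad.sum() > 0)                 # boundary ring highlighted
```

The gradient fires exactly along the island boundary and is zero elsewhere, which is the edge cue TARM injects to sharpen coastline contours.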
By constructing the high-quality S2_China_Islands_2024 dataset and conducting extensive quantitative experiments and ablation analyses, the experimental results show that GTSegNet significantly outperforms traditional methods and existing deep learning approaches across multiple evaluation metrics. It achieves an mIoU of 96.96%, mPA of 98.54%, mPrecision of 98.37%, and mRecall of 98.54%, representing improvements of 2–4 percentage points in mIoU over U-Net (92.84%), PSPNet (93.70%), and Mask2Former (93.62%), further validating the effectiveness and applicability of the model in different environments. Ablation experiments and Grad-CAM visualization analysis further reveal the synergistic gains of the GCB and TARM modules in enhancing semantic focus and boundary refinement, clarifying each module’s key contribution to the model’s performance.
Additionally, this study also demonstrates the application potential of GTSegNet in large-scale remote sensing imagery. By segmenting remote sensing images from multiple typical regions in 2020 and 2025, the generalizability and sensitivity of the model to temporal changes were validated. This achievement not only provides an advanced tool for the precise extraction of the island coastline but also provides scientific support for decision-making in fields such as marine resource management and ecological protection.
In general, the GTSegNet model overcomes several limitations of traditional methods in complex island coast extraction, offering superior segmentation accuracy and broad application potential. With the continued growth of remote sensing data and the ongoing advancement of deep learning technologies, GTSegNet is expected to be widely applied in more remote sensing image analysis tasks, providing more efficient and accurate solutions for large-scale ocean monitoring.

Author Contributions

Conceptualization, F.W. and S.Z.; methodology, Y.Z.; software, Y.Z. and Y.L.; validation, F.W., Y.H. and Z.C.; formal analysis, F.W. and Z.C.; investigation, Z.L.; resources, F.W.; data curation, Y.H. and H.Y.; writing—original draft preparation, Y.Z.; writing—review and editing, F.W.; visualization, Y.Z.; supervision, S.Z.; project administration, H.Y.; funding acquisition, Y.L. and P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (NSFC), grant number 42502295, and the Doctoral Research Initiation Fund Project of Liaoning Province, grant number 2025-BS-0770.

Data Availability Statement

The datasets generated during the study are available from the corresponding author upon reasonable request.

Acknowledgments

The authors would like to sincerely thank the editors and the anonymous reviewers.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wessel, P.; Smith, W.H. A global, self-consistent, hierarchical, high-resolution shoreline database. J. Geophys. Res. Solid Earth 1996, 101, 8741–8743. [Google Scholar] [CrossRef]
  2. Derolez, V.; Bec, B.; Munaron, D.; Fiandrino, A.; Pete, R.; Simier, M.; Souchu, P.; Laugier, T.; Aliaume, C.; Malet, N. Recovery trajectories following the reduction of urban nutrient inputs along the eutrophication gradient in French Mediterranean lagoons. Ocean Coast. Manag. 2019, 171, 1–10. [Google Scholar] [CrossRef]
  3. Genz, A.S.; Fletcher, C.H.; Dunn, R.A.; Frazer, L.N.; Rooney, J.J. The predictive accuracy of shoreline change rate methods and alongshore beach variation on Maui, Hawaii. J. Coast. Res. 2007, 23, 87–105. [Google Scholar] [CrossRef]
  4. Suárez-de Vivero, J.L.; Mateos, J.C.R.; del Corral, D.F.; Barragán, M.J.; Calado, H.; Kjellevold, M.; Miasik, E.J. Food security and maritime security: A new challenge for the European Union’s ocean policy. Mar. Policy 2019, 108, 103640. [Google Scholar] [CrossRef]
  5. Brown, C.J.; Smith, S.J.; Lawton, P.; Anderson, J.T. Benthic habitat mapping: A review of progress towards improved understanding of the spatial ecology of the seafloor using acoustic techniques. Estuar. Coast. Shelf Sci. 2011, 92, 502–520. [Google Scholar] [CrossRef]
  6. Duffy, J.E. Biodiversity and the functioning of seagrass ecosystems. Mar. Ecol. Prog. Ser. 2006, 311, 233–250. [Google Scholar] [CrossRef]
  7. Sannigrahi, S.; Joshi, P.K.; Keesstra, S.; Paul, S.K.; Sen, S.; Roy, P.; Chakraborti, S.; Bhatt, S. Evaluating landscape capacity to provide spatially explicit valued ecosystem services for sustainable coastal resource management. Ocean Coast. Manag. 2019, 182, 104918. [Google Scholar] [CrossRef]
  8. McEvoy, S.; Haasnoot, M.; Biesbroek, R. How are European countries planning for sea level rise? Ocean Coast. Manag. 2021, 203, 105512. [Google Scholar] [CrossRef]
  9. Chavez, P.S., Jr. An improved dark-object subtraction technique for atmospheric scattering correction of multispectral data. Remote Sens. Environ. 1988, 24, 459–479. [Google Scholar] [CrossRef]
  10. Geng, X.; Jiao, L.; Li, L.; Liu, F.; Liu, X.; Yang, S.; Zhang, X. Multisource joint representation learning fusion classification for remote sensing images. IEEE Trans. Geosci. Remote Sens. 2023, 61, 4406414. [Google Scholar] [CrossRef]
  11. Wang, J.; Wang, S.; Zou, D.; Chen, H.; Zhong, R.; Li, H.; Zhou, W.; Yan, K. Social network and bibliometric analysis of unmanned aerial vehicle remote sensing applications from 2010 to 2021. Remote Sens. 2021, 13, 2912. [Google Scholar] [CrossRef]
  12. Kim, J.-I.; Kim, H.-C.; Kim, T. Robust mosaicking of lightweight UAV images using hybrid image transformation modeling. Remote Sens. 2020, 12, 1002. [Google Scholar] [CrossRef]
  13. Gens, R. Remote sensing of coastlines: Detection, extraction and monitoring. Int. J. Remote Sens. 2010, 31, 1819–1836. [Google Scholar] [CrossRef]
  14. McFeeters, S.K. The use of the Normalized Difference Water Index (NDWI) in the delineation of open water features. Int. J. Remote Sens. 1996, 17, 1425–1432. [Google Scholar] [CrossRef]
  15. Alesheikh, A.A.; Ghorbanali, A.; Nouri, N. Coastline change detection using remote sensing. Int. J. Environ. Sci. Technol. 2007, 4, 61–66. [Google Scholar] [CrossRef]
  16. Kass, M.; Witkin, A.; Terzopoulos, D. Snakes: Active contour models. Int. J. Comput. Vis. 1988, 1, 321–331. [Google Scholar] [CrossRef]
  17. van der Werff, H.M. Mapping shoreline indicators on a sandy beach with supervised edge detection of soil moisture differences. Int. J. Appl. Earth Obs. Geoinf. 2019, 74, 231–238. [Google Scholar] [CrossRef]
  18. Zhang, T.; Yang, X.; Hu, S.; Su, F. Extraction of coastline in aquaculture coast from multispectral remote sensing images: Object-based region growing integrating edge detection. Remote Sens. 2013, 5, 4470–4487. [Google Scholar] [CrossRef]
  19. Sekovski, I.; Stecchi, F.; Mancini, F.; Del Rio, L. Image classification methods applied to shoreline extraction on very high-resolution multispectral imagery. Int. J. Remote Sens. 2014, 35, 3556–3578. [Google Scholar] [CrossRef]
  20. Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
  21. Yuan, X.; Shi, J.; Gu, L. A review of deep learning methods for semantic segmentation of remote sensing imagery. Expert Syst. Appl. 2021, 169, 114417. [Google Scholar] [CrossRef]
  22. Seale, C.; Redfern, T.; Chatfield, P.; Luo, C.; Dempsey, K. Coastline detection in satellite imagery: A deep learning approach on new benchmark data. Remote Sens. Environ. 2022, 278, 113044. [Google Scholar] [CrossRef]
  23. Aghdami-Nia, M.; Shah-Hosseini, R.; Rostami, A.; Homayouni, S. Automatic coastline extraction through enhanced sea-land segmentation by modifying Standard U-Net. Int. J. Appl. Earth Obs. Geoinf. 2022, 109, 102785. [Google Scholar] [CrossRef]
  24. Sun, S.; Mu, L.; Feng, R.; Chen, Y.; Han, W. Quadtree decomposition-based Deep learning method for multiscale coastline extraction with high-resolution remote sensing imagery. Sci. Remote Sens. 2024, 9, 100112. [Google Scholar] [CrossRef]
  25. O’Sullivan, C.; Kashyap, A.; Coveney, S.; Monteys, X.; Dev, S. Enhancing coastal water body segmentation with landsat irish coastal segmentation (lics) dataset. Remote Sens. Appl. Soc. Environ. 2024, 36, 101276. [Google Scholar] [CrossRef]
  26. Niu, Z.; Zhong, G.; Yu, H. A review on the attention mechanism of deep learning. Neurocomputing 2021, 452, 48–62. [Google Scholar] [CrossRef]
  27. Elizar, E.; Zulkifley, M.A.; Muharar, R.; Zaman, M.H.M.; Mustaza, S.M. A review on multiscale-deep-learning applications. Sensors 2022, 22, 7384. [Google Scholar] [CrossRef]
  28. Xu, L.; Ren, J.; Yan, Q.; Liao, R.; Jia, J. Deep edge-aware filters. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 1669–1678. [Google Scholar]
  29. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual attention network for scene segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 3146–3154. [Google Scholar]
  30. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid scene parsing network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
  31. Liu, R.; Tao, F.; Liu, X.; Na, J.; Leng, H.; Wu, J.; Zhou, T. RAANet: A residual ASPP with attention framework for semantic segmentation of high-resolution remote sensing images. Remote Sens. 2022, 14, 3109. [Google Scholar] [CrossRef]
  32. Chen, Y.; Yang, Z.; Zhang, L.; Cai, W. A semi-supervised boundary segmentation network for remote sensing images. Sci. Rep. 2025, 15, 2007. [Google Scholar] [CrossRef]
  33. Yuan, K.; Meng, G.; Cheng, D.; Bai, J.; Xiang, S.; Pan, C. Efficient cloud detection in remote sensing images using edge-aware segmentation network and easy-to-hard training strategy. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 61–65. [Google Scholar]
  34. Koonce, B. ResNet 50. In Convolutional Neural Networks with Swift for TensorFlow; Apress: Berkeley, CA, USA, 2021; pp. 63–72. [Google Scholar]
  35. Zhao, Q.; Yu, L.; Li, X.; Peng, D.; Zhang, Y.; Gong, P. Progress and trends in the application of Google Earth and Google Earth Engine. Remote Sens. 2021, 13, 3778. [Google Scholar] [CrossRef]
  36. Mutanga, O.; Kumar, L. Google earth engine applications. Remote Sens. 2019, 11, 591. [Google Scholar] [CrossRef]
  37. Hoerber, T.C. The European Space Agency and the European Union: The next step on the road to the stars. J. Contemp. Eur. Res. 2009, 5, 405–414. [Google Scholar] [CrossRef]
  38. Li, J.; Wu, Z.; Hu, R.; Wang, X. Spatial-Temporal Approach and Dataset for Enhancing Cloud Detection in Sentinel-2 Imagery: A Case Study in China. Remote Sens. 2024, 16, 973. [Google Scholar]
  39. He, T.; Liang, S.; Song, D.-X. Spatio-temporal differences in cloud cover of Landsat-8 OLI observations across China during 2013–2016. J. Geogr. Sci. 2018, 28, 429–444. [Google Scholar] [CrossRef]
Figure 1. Overview of the study area. The figure shows the Zhoushan Archipelago (T1, T3) in Zhejiang, Hong Kong (T2), Changhai County (T4) in Liaoning, Pingtan Island (T5) in Fujian, Yantai (T6) in Shandong, the Taiwan Strait (T7), offshore Shantou (T8) in Guangdong, and Weizhou Island (T9) in Guangxi.
Figure 2. Examples of images from the constructed S2_China_Islands_2024 dataset.
Figure 3. Overall structure of GTSegNet, as developed in this study.
Figure 4. Topology-Aware Refinement Module (TARM) structure.
Figure 5. Graph Context Block (GCB).
Figure 6. Visual comparison of island coastline extraction results between GTSegNet and other mainstream networks (red areas represent islands, black areas represent background, and yellow rectangular boxes highlight key comparison areas). Panels (a–h) depict various scenes.
Figure 7. Comparison of the complexity and inference efficiency of different network models.
Figure 8. Comparison among the prediction results obtained by GTSegNet when trained with different image modalities (red areas represent islands, black areas represent background, and yellow and white rectangular boxes highlight key comparison areas). Panels (a–h) depict various scenes.
Figure 9. Visualization of the ablation experiment results of GTSegNet. In the label images, black represents the background area, and red represents the island area. In the last four rows of the heatmaps, brighter regions indicate areas with a greater impact on the model's decision-making; the transition from blue to red represents increasing model attention to the corresponding regions. Panels (a–h) depict various scenes.
Figure 10. Island extraction results for certain islands in Liaoning, Guangdong, and Fujian provinces in 2020 and 2025.
Figure 11. The performance of GTSegNet on the Landsat-8 OLI land–water segmentation dataset ((a,c) are the RGB images, and (b,d) are the corresponding segmentation results produced by GTSegNet).
Table 1. Quantitative evaluation of the island coastline extraction results of different models on the S2_China_Islands_2024 dataset.

| Method      | mIoU (%)  | mPA (%)   | mPrecision (%) | mRecall (%) |
|-------------|-----------|-----------|----------------|-------------|
| UNet        | 92.84     | 96.61     | 95.95          | 96.61       |
| DeepLabv3+  | 93.46     | 96.83     | 96.40          | 96.83       |
| DDRNet      | 93.54     | 96.93     | 96.39          | 96.93       |
| BiSeNet     | 93.36     | 96.78     | 96.33          | 96.78       |
| PSPNet      | 93.70     | 96.87     | 96.60          | 96.87       |
| SegFormer   | 93.53     | 96.80     | 96.47          | 96.80       |
| Mask2Former | 93.62     | 96.84     | 96.52          | 96.84       |
| GTSegNet    | **96.96** | **98.54** | **98.37**      | **98.54**   |

The best performance is marked in bold.
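The per-class metrics behind Table 1 (IoU, precision, recall) follow directly from the pixel-level confusion counts; mIoU additionally averages the IoU over the island and background classes. A minimal NumPy sketch, illustrative only and not the authors' evaluation code:

```python
import numpy as np

def binary_segmentation_metrics(pred, label):
    """IoU, precision, and recall for one class.

    pred, label: boolean arrays of the same shape (True = island pixel).
    """
    tp = np.logical_and(pred, label).sum()    # island predicted as island
    fp = np.logical_and(pred, ~label).sum()   # background predicted as island
    fn = np.logical_and(~pred, label).sum()   # island predicted as background
    iou = tp / (tp + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return iou, precision, recall

# Toy 4x4 masks: the prediction misses one island pixel and adds one false alarm.
label = np.zeros((4, 4), dtype=bool)
label[1:3, 1:3] = True            # 4 true island pixels
pred = label.copy()
pred[1, 1] = False                # one false negative
pred[0, 0] = True                 # one false positive

iou, precision, recall = binary_segmentation_metrics(pred, label)
print(round(iou, 3), round(precision, 3), round(recall, 3))  # → 0.6 0.75 0.75
```

For the full mIoU, the same computation is repeated with `pred` and `label` inverted (background class) and the two IoU values are averaged.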
Table 2. Comparison of model complexity (Parameters) and inference speed (FPS) among different methods.

| Method      | Params (M) | FPS | Infer (ms) |
|-------------|------------|-----|------------|
| UNet        | 30.7       | 127 | 7.87       |
| DeepLabv3+  | 43.7       | 91  | 10.99      |
| DDRNet      | 6.13       | 207 | 4.83       |
| BiSeNet     | 3.34       | 291 | 3.44       |
| PSPNet      | 42.5       | 101 | 9.90       |
| SegFormer   | 24.2       | 143 | 6.99       |
| Mask2Former | 47.4       | 83  | 12.05      |
| GTSegNet    | 46.7       | 92  | 10.87      |
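The FPS and per-image inference-time columns in Table 2 are reciprocals of each other (FPS ≈ 1000 / latency in ms). A quick consistency check over the reported values, transcribed here as an assumption from the table:

```python
# Table 2 values: (params in M, FPS, inference time in ms), as reported.
models = {
    "UNet":        (30.7, 127, 7.87),
    "DeepLabv3+":  (43.7,  91, 10.99),
    "DDRNet":      (6.13, 207, 4.83),
    "BiSeNet":     (3.34, 291, 3.44),
    "PSPNet":      (42.5, 101, 9.90),
    "SegFormer":   (24.2, 143, 6.99),
    "Mask2Former": (47.4,  83, 12.05),
    "GTSegNet":    (46.7,  92, 10.87),
}

for name, (_, fps, ms) in models.items():
    # Reported FPS should equal 1000 / latency, up to rounding.
    assert abs(fps - 1000.0 / ms) < 0.6, name

print("all rows consistent")
```

This confirms the two columns encode the same measurement, so the trade-off in Figure 7 is between parameter count and a single speed axis.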
Table 3. Accuracy changes of mIoU, mPA, mPrecision, and mRecall under different learning rates.

| Learning Rate | mIoU (%)  | mPA (%)   | mPrecision (%) | mRecall (%) |
|---------------|-----------|-----------|----------------|-------------|
| 0.0004        | 95.82     | 97.96     | 97.84          | 97.68       |
| 0.0003        | 96.34     | 98.12     | 98.06          | 98.02       |
| 0.0002        | 96.75     | 98.37     | 98.21          | 98.21       |
| 0.0001        | **96.96** | **98.54** | **98.37**      | **98.54**   |

The best performance is marked in bold.
Table 4. Accuracy changes of mIoU, mPA, mPrecision, and mRecall under different optimizers.

| Optimizer | mIoU (%)  | mPA (%)   | mPrecision (%) | mRecall (%) |
|-----------|-----------|-----------|----------------|-------------|
| AdaGrad   | 95.48     | 97.36     | 97.18          | 97.22       |
| SGD       | 95.91     | 97.84     | 97.63          | 97.79       |
| Adam      | 96.42     | 98.12     | 98.03          | 98.10       |
| AdamW     | **96.96** | **98.54** | **98.37**      | **98.54**   |

The best performance is marked in bold.
Table 5. Results of the ablation experiments on each module in GTSegNet.

| Name                   | mIoU (%) | mPA (%) | mPrecision (%) | mRecall (%) |
|------------------------|----------|---------|----------------|-------------|
| Baseline               | 93.70    | 96.87   | 96.60          | 96.87       |
| Baseline + TARM        | 95.66    | 97.81   | 97.74          | 97.81       |
| Baseline + GCB         | 94.95    | 97.56   | 97.25          | 97.56       |
| Baseline + TARM + GCB  | 96.96    | 98.54   | 98.37          | 98.54       |
Table 6. Comparison of GTSegNet with the models in the Landsat-8 OLI land–water segmentation dataset article.

| Method          | Accuracy (%) | Recall (%) | mIoU (%)  |
|-----------------|--------------|------------|-----------|
| RefineNet       | 99.04        | 99.05      | 92.42     |
| FC-DenseNet     | **99.55**    | **99.55**  | 92.72     |
| DeepLabV3+      | 99.40        | 99.40      | 92.98     |
| PSPNet          | 99.50        | 99.51      | 92.63     |
| SegNet          | 98.64        | 98.64      | 91.21     |
| U-Net           | 99.38        | 99.38      | 92.79     |
| GTSegNet (ours) | 98.91        | 98.35      | **96.66** |

The best performance is marked in bold.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Zhu, Y.; Wang, F.; Hou, Y.; Cui, Z.; Yu, H.; Zhang, S.; Liao, Z.; Li, P.; Lu, Y. GTSegNet: An Island Coastline Segmentation Model Based on Collaborative Perception Strategy. Remote Sens. 2026, 18, 607. https://doi.org/10.3390/rs18040607

