Abstract
Airborne LiDAR bathymetry (ALB) provides dense three-dimensional point clouds that enable the detailed mapping of tidal flat environments. However, surface classification using these point clouds remains challenging due to residual noise, water surface reflectivity, and subtle class boundaries that persist even after standard preprocessing. To address these challenges, this study introduces Tidal flat-Attentive Graph Network (TAGNet), a graph-based deep learning framework designed to leverage both local geometric relationships and global contextual cues for the point-wise classification of tidal flat surface classes. The model incorporates multi-scale EdgeConv layers for capturing fine-grained neighborhood structures and employs squeeze-and-excitation channel attention to enhance global feature representation. To validate TAGNet’s effectiveness, classification was conducted on ALB point clouds collected from adjacent tidal flat regions, focusing on four major surface classes: exposed flat, sea surface, sea floor, and vegetation. In benchmarking tests against baseline models, including Dynamic Graph Convolutional Neural Network, PointNeXt with Single-Scale Grouping, and PointNet Transformer, TAGNet consistently achieved higher macro F1-scores. Moreover, ablation studies isolating positional encoding, attention mechanisms, and detrended Z-features confirmed their complementary contributions to TAGNet’s performance. Notably, the full TAGNet outperformed all baselines by a substantial margin, particularly when distinguishing closely related classes, such as sea floor and exposed flat. These findings highlight the potential of graph-based architectures specifically designed for ALB data in enhancing the precision of coastal monitoring and habitat mapping.
1. Introduction
Tidal flats and shallow coastal waters are highly dynamic environments that serve as critical interfaces between terrestrial and aquatic ecosystems. They provide a diverse range of ecosystem services, supporting biodiversity, enhancing nutrient cycling, and storing carbon, and they serve as nursery habitat for many marine species [,]. They also protect coastal areas by buffering against storm surges and mitigating erosion. Given these ecological and socio-economic functions, the accurate and timely mapping of intertidal zones is essential for effective coastal zone management, marine spatial planning, and disaster response []. As such mapping applications are accomplished using remote sensing data, distinguishing between land, water, and submerged seabed surface classes in these areas is fundamental for understanding geomorphological changes, tidal inundation patterns, and habitat distribution [,]. However, elevation gradients are typically gradual in coastal environments, and the optical characteristics of surfaces can vary considerably due to water’s reflectivity, sediment suspension, and vegetation cover, making such separation difficult. Misclassification in these transitional zones can introduce substantial error into digital elevation models (DEMs) and tidal inundation maps, affecting ecosystem monitoring. Accurate delineation of these surface types is further complicated by mixed returns and waveform overlap, both of which are especially problematic in shallow waters and complex intertidal areas. Consequently, effective classification requires high-resolution, multi-dimensional datasets that capture the subtle topographic changes and resolve the reflectance effects characteristic of transitional zones.
Among available remote sensing technologies, airborne LiDAR bathymetry (ALB) has proven to be an invaluable tool for capturing high-resolution three-dimensional (3D) data in coastal zones, particularly in shallow and complex nearshore environments where other sensor types, such as optical or radar systems, are hindered by turbidity, wave disturbance, or the lack of surface features []. The ALB method is especially effective for integrated sea–land surveys over reefs, beaches, and small islands [,]. Compared to other sensors, ALB systems can penetrate shallow water and simultaneously acquire returns from land, water, and submerged terrain surfaces, producing dense point-cloud data with waveform-derived attributes, such as amplitude, width, and echo order []. These attributes have proven useful for distinguishing surface types in complex coastal environments.
However, classifying ALB point clouds in shallow-water or intertidal zones remains challenging due to both environmental complexity and sensor-related limitations, and it is also operationally demanding, as large data volumes and hybrid workflows combining automated and manual correction require substantial time and computational effort. The point clouds often contain ambiguous or mixed returns caused by shallow water dynamics, sea surface reflectivity, and low-relief seabed morphologies. To address these issues, previous studies have predominantly relied on waveform-derived features, such as amplitude, pulse width, and echo order, to infer surface types [,]. Although these attributes offer important cues, their limited feature dimensionality and sensitivity to external factors like depth, turbidity, and reflectance often reduce classification accuracy. Thus, point-wise learning techniques that can directly leverage geometric and spatial-contextual cues from raw ALB point cloud data may offer improvement.
To overcome the limitations of waveform-only approaches, several studies have adopted point cloud-based classification frameworks that leverage the full 3D structure captured by ALB systems. For instance, Tysiac [] assessed coastal erosion in the Baltic Sea using the CANUPO classifier by exploiting geometric descriptors of underwater terrain, and Wagner et al. [] employed multiple features, including the echo ratio, point density, and elevation, to distinguish submerged macrophytes from other surfaces, differentiating between vegetation, water, and seafloor surface types with high accuracy. Lowell and Calder [] proposed a hybrid bathymetric surface identification method that integrates pulse attributes with density-based measures using machine learning techniques. Similarly, Janowski et al. [] introduced an automated LiDAR bathymetry classification pipeline that combines rule-based analysis and statistical features with object-based image analysis to map seabed geomorphology in shallow coastal zones. Collectively, these studies highlight the advantages of ALB point cloud classification methods in capturing local topographic structure, while also revealing the constraints of handcrafted feature engineering in complex intertidal environments.
Several hybrid approaches have sought to further enhance classification accuracy by integrating ALB point clouds with complementary data sources. For example, Guo et al. [] developed a water–land classification method that combines waveform feature statistics with point cloud neighborhood analysis, improving the identification of sea surface and inland water boundaries. Su et al. (2025) [] proposed a hybrid seabed sediment-classification framework that fuses ALB data with multispectral satellite imagery at the feature level, using a SIFT-RANSAC model for geometric alignment and a dual-branch convolutional neural network (CNN) for classification. This model effectively differentiated sediments into five categories. While these methods demonstrate notable performance gains, they remain dependent on precise sensor alignment and data-specific configurations.
In summary, although point cloud-based and hybrid frameworks have improved surface classification in shallow-water environments, their reliance on handcrafted features, scene-specific tuning, and complex registration processes limits their scalability and generalizability across diverse intertidal settings. To overcome these challenges, TAGNet (Tidal flat–Attentive Graph Network) was developed as a graph-based deep learning framework that directly learns discriminative representations from raw ALB point clouds through a multi-attention graph architecture. By eliminating dependence on auxiliary sensor inputs and manual feature engineering, TAGNet demonstrates robust performance under residual geometric noise, reflective artifacts, spectral ambiguity, and shallow-relief conditions characteristic of tidal flats. In addition, its modular design enables seamless integration with future waveform or multispectral data formats while maintaining strong standalone capability.
TAGNet was evaluated using ALB point clouds acquired from eight subregions of the Hwangdo tidal flat on Korea’s west coast. A region-wise leave-one-region-out cross-validation scheme was employed to assess spatial generalizability, and TAGNet was compared with representative baseline networks and ablation variants to validate its effectiveness.
2. Materials and Methods
2.1. Study Area
The study site was situated in the Hwangdo tidal flat region of Korea’s west coast, which hosts a diverse range of coastal environments, including exposed flats, vegetated areas, and shallow waters. This area was selected as a representative site for evaluating the model’s classification performance when applied to ALB data from complex intertidal zones.
To enable a region-wise performance evaluation and assess the spatial generalizability of the classification model, the study area was divided into eight rectangular subregions (subregions 1–8), considering both geomorphic continuity and class distribution. The subregions encompassed a broad range of topographic and environmental characteristics, such as vegetated land near the shoreline, intertidal flats with various sediment types, and submerged areas with varying depths. This region-wise setup enabled the detailed evaluation of model performance under heterogeneous conditions, particularly at ambiguous class boundaries, such as those between the sea surface and seafloor classes. Figure 1 shows the subregion boundaries and class distributions overlaid on an orthophoto basemap of the study area. To effectively visualize the spatial distributions of the labeled classes, a random subset of 10,000 points was sampled from the full dataset.
Figure 1.
Study area in the Hwangdo tidal flat region. Eight subregions were defined to evaluate spatial generalization. Across these subregions, points were randomly sampled (n = 10,000) from the full annotated ALB point cloud dataset for display purposes. Point classifications are denoted by color.
2.2. Airborne LiDAR Bathymetry Data and Point Cloud Labeling
The ALB dataset used in this study was acquired using the Chiroptera-5 sensor manufactured by Leica Geosystems AG (Heerbrugg, Switzerland). The sensor operates at pulse repetition frequencies of 500 kHz in topographic mode and 200 kHz in bathymetric mode, with a scan rate of 4200 rpm and a field of view of 40°, using a circular scanning pattern to ensure uniform point distributions across swaths. The system provides horizontal and vertical accuracies of approximately 15 and 5 cm, respectively. Data acquisition took place on 10 November 2024, between 14:00 and 16:10 KST during the ebb tide under neap tidal conditions. The flights were conducted at an altitude of 500 m (1640 ft) and a speed of approximately 250 km/h (135 knots).
The raw waveform data were processed into discrete return point clouds using Leica LiDAR Survey Studio (LSS, version 2.2), and subregion-specific parameters were manually adjusted to suit each local environment. Most preprocessing steps, such as geometric noise removal, outlier filtering, and density resampling, were executed automatically within the LSS pipeline, with additional manual corrections applied based on visual inspections of point intensity, surface slope, and spatial distribution.
However, residual artifacts caused by reflective water surfaces, beam-angle distortion, and local turbidity variations remained after preprocessing. These residual uncertainties were left for TAGNet to handle, so that training and evaluation reflected realistic tidal conditions and the model’s robustness to them could be enhanced.
The final dataset comprised approximately 108 million points, with an average density of 17.9 points/m². All points were manually classified into four primary categories: exposed flat, sea surface, sea floor, and vegetation. Although the LSS software provided an initial automatic classification, domain experts refined the labels using auxiliary data such as waveform intensity and orthophotos. The resulting annotated point clouds served as reliable ground truth data for training and evaluating supervised deep learning-based classification models. The overall class distribution was exposed flat (38.4%), sea surface (52.0%), sea floor (9.1%), and vegetation (0.6%); the small proportion of vegetation points reflects their limited spatial extent within the upper intertidal zone. This dataset was obtained through a research collaboration with a government agency responsible for marine and coastal monitoring. Due to institutional data-sharing restrictions, the raw point cloud data cannot be publicly released; however, the statistical summaries provided in Appendix A (Table A1) document the characteristics of the dataset used in this study.
2.3. Proposed Architecture: TAGNet
The proposed model, TAGNet, is a dynamic, graph-based architecture designed for the point-wise classification of ALB point clouds (Figure 2). Building upon the principle of the dynamic graph convolutional neural network (DGCNN) [], TAGNet introduces two key innovations: a feature augmentation strategy that enriches raw 3D coordinates with geometry-aware descriptors and a global attention mechanism that enhances inter-class discrimination in tidal environments, where sea and seafloor surfaces often exhibit geometric similarity. In contrast to generic graph-based networks, TAGNet is specifically tailored to address the challenges of tidal-flat environments. Unlike urban or terrestrial settings, tidal flats contain smooth water surfaces, submerged seafloors, exposed mudflats, and sparse vegetation within the same scan, resulting in geometrically similar yet semantically distinct classes. Existing models such as PointNet [], PointNet++ [], and DGCNN often underperform in these scenarios due to limited feature expressiveness and weak inter-class discrimination. By integrating tidal-flat–aware feature augmentation with SE-based global channel attention, TAGNet achieves robust classification performance, even under specular reflection and variable bathymetric gradients.
Figure 2.
Overall architecture of the proposed model, Tidal flat-Attentive Graph Network (TAGNet). Colored dots indicate the point classes for visualization purposes.
Let the input point cloud be defined as

$$P = \{\, p_i = (x_i, y_i, z_i) \in \mathbb{R}^{3} \mid i = 1, \dots, N \,\} \tag{1}$$

where N denotes the total number of points. For each point $p_i$, an augmented feature vector is constructed as

$$\mathbf{f}_i = \big[\, \mathbf{b}_i,\ \mathrm{PE}(\tilde{x}_i, \tilde{y}_i),\ z_i^{\mathrm{det}} \,\big] \in \mathbb{R}^{d} \tag{2}$$

where $\mathbf{b}_i \in \mathbb{R}^{8}$ is the base descriptor, which includes the normalized coordinates $(\tilde{x}_i, \tilde{y}_i, \tilde{z}_i)$ and the local statistics $(\mu_{z,i}, \sigma_{z,i}, r_{z,i})$, i.e., the mean, standard deviation (SD), and range of the z values (elevations) of the k-nearest neighbors (k-NNs) of point $p_i$ (k = 10 for local statistics); PE(·) represents the sinusoidal positional encoding (PE) function, which is calculated for the $\tilde{x}$ and $\tilde{y}$ values with four frequencies $\omega_m$ (m = 0, …, 3) for both sine and cosine, totaling 16 channels, and scaled by 0.1 for stability; and $z_i^{\mathrm{det}}$ is the detrended elevation. This augmentation yields an input dimension of d = 8 (base), d = 24 (base + PE), or d = 25 (base + PE + detrended z), depending on the ablation setting. Unlike generic feature embeddings, these descriptors are specifically designed for tidal flats: normalized planar coordinates and positional encodings provide spatial context for separating sea surface points from exposed flat points, while local elevation (Z) statistics enhance sensitivity to subtle bathymetric gradients, enabling effective discrimination between sea surface and seafloor points. Furthermore, the local elevation range highlights vertical irregularity, which is essential for identifying the sparse vegetation points.
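The augmentation described above can be illustrated with a minimal NumPy sketch. It covers only the descriptors named in the text (normalized coordinates, local z statistics, and the planar positional encoding), so it yields 22 channels rather than the paper's d = 24. The min-max normalization, the frequencies $2^m\pi$, the brute-force neighbor search in XYZ, and all function names are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def normalize_xyz(xyz):
    """Min-max scale each axis to [0, 1] (an assumed normalization)."""
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    return (xyz - lo) / np.maximum(hi - lo, 1e-9)

def sinusoidal_pe(xy, num_freqs=4, scale=0.1):
    """Sine/cosine encoding of (x, y): 2 coords * num_freqs * 2 = 16 channels."""
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi      # assumed frequencies 2^m * pi
    angles = xy[:, :, None] * freqs                    # (N, 2, F)
    pe = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)  # (N, 2, 2F)
    return scale * pe.reshape(len(xy), -1)             # (N, 4F)

def local_z_stats(xyz, k=10):
    """Mean, SD, and range of z over each point's k nearest neighbours."""
    d2 = ((xyz[:, None, :] - xyz[None, :, :]) ** 2).sum(axis=-1)  # brute force, (N, N)
    idx = np.argsort(d2, axis=1)[:, :k]                # neighbour set includes the point itself
    z_nb = xyz[idx, 2]                                 # (N, k)
    z_range = z_nb.max(axis=1) - z_nb.min(axis=1)
    return np.stack([z_nb.mean(axis=1), z_nb.std(axis=1), z_range], axis=1)

def augment(xyz, k=10):
    """Concatenate normalized XYZ (3), local z statistics (3), and planar PE (16)."""
    norm = normalize_xyz(xyz)
    return np.concatenate(
        [norm, local_z_stats(xyz, k), sinusoidal_pe(norm[:, :2])], axis=1
    )
```

The brute-force distance matrix is quadratic in the number of points, which is acceptable for small patches (for example, 512 points) but would need a spatial index for full scenes.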
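To capture local geometric relationships, TAGNet employs dynamic graph construction, which allows neighborhood relationships to be updated as the feature space evolves. Four EdgeConv modules, with output channel sizes of 64, 64, 64, and 1024, respectively, are stacked within the model, and their outputs are concatenated to form a 1216-D local multi-scale descriptor ($\mathbf{F}_{\mathrm{local}} \in \mathbb{R}^{N \times 1216}$). For each point $p_i$, a neighborhood $\mathcal{N}(i)$ is defined by its k-NNs, with k = 20. The first EdgeConv module uses Euclidean k-NNs in $\mathbb{R}^{3}$ (XYZ space), and subsequent EdgeConv modules recompute the k-NNs in the current feature space. Edge features are computed as

$$\mathbf{e}_{ij} = h_{\Theta}\big(\mathbf{x}_i,\ \mathbf{x}_j - \mathbf{x}_i\big), \quad j \in \mathcal{N}(i) \tag{3}$$

where $\mathbf{x}_i$ is the feature of point $p_i$ at the input of the module and $h_{\Theta}$ indicates a shared multilayer perceptron (MLP). Point-wise feature aggregation is then performed via max pooling:

$$\mathbf{x}_i' = \max_{j \in \mathcal{N}(i)} \mathbf{e}_{ij} \tag{4}$$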
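As a rough illustration of the EdgeConv stacking described here, the following NumPy sketch builds a k-NN graph, forms edge inputs from each point's feature and its offset to each neighbor, applies a shared layer, and max-pools over neighbors. The shared MLP is reduced to a single linear layer with ReLU and random weights, the neighbor set includes the point itself, and the input is plain XYZ; these are simplifying assumptions, and the function names are hypothetical.

```python
import numpy as np

def knn_indices(feats, k):
    """Indices of the k nearest neighbours of every row (brute force)."""
    d2 = ((feats[:, None, :] - feats[None, :, :]) ** 2).sum(axis=-1)
    return np.argsort(d2, axis=1)[:, :k]                 # (N, k), includes self

def edge_conv(feats, weight, bias, k=20):
    """e_ij = ReLU([x_i, x_j - x_i] W + b), followed by max pooling over j."""
    idx = knn_indices(feats, k)                           # graph built in the current feature space
    x_i = np.repeat(feats[:, None, :], k, axis=1)         # (N, k, C)
    x_j = feats[idx]                                      # (N, k, C)
    edge_in = np.concatenate([x_i, x_j - x_i], axis=-1)   # (N, k, 2C)
    edges = np.maximum(edge_in @ weight + bias, 0.0)      # (N, k, C_out)
    return edges.max(axis=1)                              # (N, C_out)

def multi_scale_descriptor(xyz, widths=(64, 64, 64, 1024), k=20, seed=0):
    """Stack EdgeConv layers and concatenate their outputs (64+64+64+1024 = 1216)."""
    rng = np.random.default_rng(seed)
    outputs, x = [], xyz
    for c_out in widths:
        w = rng.normal(scale=0.1, size=(2 * x.shape[1], c_out))  # untrained placeholder weights
        x = edge_conv(x, w, np.zeros(c_out), k)
        outputs.append(x)
    return np.concatenate(outputs, axis=1)
```

Because each layer recomputes the neighbors from its own input, the first layer's graph is Euclidean in XYZ and later graphs follow the learned feature space, which is the dynamic behavior the text describes.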
In parallel, a 1 × 1 (point-wise) convolution maps the local tensor from 1216 to 1024 channels, followed by max pooling over points to obtain a global descriptor $\mathbf{g} \in \mathbb{R}^{1024}$. To emphasize informative channels and suppress redundant ones, a squeeze-and-excitation (SE) module [] with a reduction ratio of 16 is applied as

$$\tilde{\mathbf{g}} = \mathbf{g} \odot \sigma\big(W_2\, \delta(W_1 \mathbf{g})\big) \tag{5}$$

where $\delta$ and $\sigma$ denote the ReLU and sigmoid functions, respectively, and $\odot$ represents element-wise multiplication. The SE module adaptively recalibrates channel-wise feature responses by modeling global interdependencies, allowing the network to highlight discriminative information and suppress noise. It compresses the global descriptor into a 64-dimensional bottleneck and then re-expands it to 1024 dimensions, using $W_1 \in \mathbb{R}^{64 \times 1024}$ and $W_2 \in \mathbb{R}^{1024 \times 64}$. The refined global feature, $\tilde{\mathbf{g}}$, is then broadcast and concatenated with the local descriptor, yielding a 2240-D fused, per-point representation (1216 + 1024).
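The channel recalibration and fusion steps can be sketched in a few lines of NumPy. The sketch assumes bias-free fully connected layers and uses placeholder weights; the names `se_gate` and `fuse_local_global` are hypothetical and only illustrate the shapes involved (1024 to 64 to 1024, then 1216 + 1024 = 2240).

```python
import numpy as np

def se_gate(g, w1, w2):
    """Squeeze-and-excitation on a global descriptor.

    g: (1024,), w1: (64, 1024), w2: (1024, 64); biases omitted (assumption).
    """
    hidden = np.maximum(w1 @ g, 0.0)                  # ReLU bottleneck, (64,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))       # sigmoid gate in (0, 1), (1024,)
    return g * gate                                   # element-wise channel reweighting

def fuse_local_global(local_feats, g_refined):
    """Broadcast the refined global feature to every point and concatenate."""
    n = local_feats.shape[0]
    tiled = np.broadcast_to(g_refined, (n, g_refined.shape[0]))
    return np.concatenate([local_feats, tiled], axis=1)  # (N, 1216 + 1024)
```

Since the gate lies strictly between 0 and 1, the module can only attenuate channels relative to one another; it never amplifies a channel beyond its input magnitude.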
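The fused feature is fed to a lightweight point-wise classifier, consisting of a 1 × 1 convolution, batch normalization (BN), and ReLU activation, followed by a dropout layer (rate = 0.5). The classifier structure can, thus, be summarized as

$$\mathbf{h}_i \in \mathbb{R}^{2240} \;\rightarrow\; \mathrm{Conv}_{1 \times 1} \rightarrow \mathrm{BN} \rightarrow \mathrm{ReLU} \rightarrow \mathrm{Dropout}(0.5) \;\rightarrow\; \mathbf{o}_i \in \mathbb{R}^{C} \tag{6}$$

where C = 4 corresponds to the four semantic classes {exposed flat, sea surface, sea floor, vegetation}. Final per-point predictions are obtained using Softmax activation, as expressed in Equation (7):

$$\hat{p}_{i,c} = \frac{\exp(o_{i,c})}{\sum_{c'=1}^{C} \exp(o_{i,c'})}, \quad c = 1, \dots, C \tag{7}$$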
A layer-by-layer summary is provided in Appendix B (Table A2).
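For illustration, the final prediction step (softmax over the four class logits, then the most probable class per point) might look like the following NumPy sketch. The class index order is an assumption; the text lists the classes but does not state how they are indexed.

```python
import numpy as np

# Assumed index order; the source names the classes but not their indices.
CLASSES = ("exposed flat", "sea surface", "sea floor", "vegetation")

def softmax(logits):
    """Row-wise softmax with max subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def predict(logits):
    """Map per-point logits of shape (N, 4) to class names."""
    probs = softmax(logits)
    return [CLASSES[i] for i in probs.argmax(axis=1)]
```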
In summary, TAGNet integrates feature augmentation, dynamic graph convolutions, and global channel attention into a unified framework, making it particularly effective for ALB applications in coastal and tidal flat monitoring, where land–water boundaries are often ambiguous.
2.4. Model Benchmarking
To evaluate TAGNet, a region-wise leave-one-region-out cross-validation scheme was applied to the eight tidal flat survey subregions. In each step of the validation process, one subregion was excluded for testing, while the remaining seven were used for training. For example, when Subregion 1 served as the test set, the network was trained on subregions 2–8. This procedure was repeated until each subregion was used as the test set, and the final performance was reported as the average across all eight tests, thereby providing a reliable estimate of the model’s cross-region generalizability.
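The fold construction and averaging can be sketched with the standard library alone. The helper names are hypothetical, and the use of the sample SD (rather than the population SD) when summarizing folds is an assumption, since the text does not state which was reported.

```python
from statistics import mean, stdev

def leave_one_region_out(regions):
    """Yield (training regions, held-out region) for every region in turn."""
    regions = list(regions)
    for held_out in regions:
        yield [r for r in regions if r != held_out], held_out

def summarize(fold_scores):
    """Mean and sample SD of per-fold scores (sample SD is an assumption)."""
    return mean(fold_scores), stdev(fold_scores)

# Eight subregions give eight folds; the first fold trains on 2-8 and tests on 1.
folds = list(leave_one_region_out(range(1, 9)))
```

Splitting by region rather than by random points keeps spatially adjacent, highly correlated points out of the test set, which is what makes the averaged score an estimate of cross-region generalization.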
Model performance was assessed using class-wise precision, recall, and the F1-score, derived from the confusion matrix of each subregion. Given the pronounced class imbalance, the macro-averaged F1-score, defined as the unweighted mean of the per-class F1-scores, was adopted as the primary evaluation metric. Other measures, such as overall accuracy and IoU, were not considered, as they tend to obscure class-level variability in tidal flat environments.
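A small pure-Python sketch shows how per-class and macro-averaged F1-scores follow from a confusion matrix, and why overall accuracy can mislead under class imbalance. The convention of scoring a class as 0 when it has no true positives, false positives, or false negatives is an assumption; the toy matrix is invented for illustration and is not data from the study.

```python
def per_class_f1(cm):
    """cm[t][p] is the number of points of true class t predicted as class p."""
    n = len(cm)
    scores = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[t][c] for t in range(n)) - tp   # predicted c, but another class
        fn = sum(cm[c]) - tp                        # true c, predicted otherwise
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

def macro_f1(cm):
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_f1(cm)
    return sum(scores) / len(scores)

def accuracy(cm):
    total = sum(sum(row) for row in cm)
    return sum(cm[c][c] for c in range(len(cm))) / total

# Toy two-class case: the rare class (5 points) is never predicted.
toy = [[95, 0], [5, 0]]
```

On the toy matrix, accuracy is 0.95 while the rare class has an F1-score of 0, so the macro average drops to roughly 0.49; this is the kind of class-level failure that an accuracy figure would hide.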
To provide context, three state-of-the-art point cloud classification networks were selected as baselines for comparison: DGCNN, PointNeXt-SSG [], and PointNet Transformer []. These models were chosen because they represent the three major structural paradigms in 3D point learning (graph-based, hierarchical, and transformer-based, respectively) and, as current and popular models used for ALB point clouds, provide fair and reproducible standards for comparison. Additionally, all three architectures operate directly on XYZ inputs without RGB or multi-view information, do not rely on large-scale pretraining, and maintain comparable parameter sets and computational budgets, ensuring methodological consistency under the constraints inherent to ALB data, where texture and multi-sensor cues are unavailable.
All models were trained using unweighted cross-entropy loss with the Adam optimizer. Unless otherwise specified, the initial learning rate was set to , and a ReduceLROnPlateau scheduler was applied based on the validation macro F1-score. Model selection was performed by retaining the checkpoint that achieved the highest macro F1-score during validation, with early termination triggered after 10 consecutive epochs without improvement. To ensure fair comparisons among methods, training was conducted with a batch size of 16 and 512 points per patch. In TAGNet, the neighborhood size was fixed at k = 20, and the input feature dimensionality followed the augmentation setting. Unless explicitly disabled in ablation experiments, TAGNet employed SE channel attention as the default global attention mechanism. All experiments were implemented in PyTorch 2.9.0+cu126 and executed on a single NVIDIA T4 GPU via Google Colab Pro.
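The checkpoint-selection and early-termination logic described above can be sketched independently of any framework. The class name `EarlyStopper` is hypothetical; in a PyTorch training loop, the same validation score would also be passed to `torch.optim.lr_scheduler.ReduceLROnPlateau` configured with `mode="max"`, which is how that scheduler tracks a metric that should increase.

```python
class EarlyStopper:
    """Track the best validation macro F1 and stop after `patience` stale epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best_score = float("-inf")
        self.best_epoch = None
        self.bad_epochs = 0

    def update(self, epoch, score):
        """Record one epoch; return True when training should stop."""
        if score > self.best_score:
            # New best: this is the epoch whose checkpoint would be retained.
            self.best_score, self.best_epoch, self.bad_epochs = score, epoch, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(scores, patience=10):
    """Replay a list of validation scores; return (stop epoch, best epoch)."""
    stopper = EarlyStopper(patience)
    for epoch, score in enumerate(scores):
        if stopper.update(epoch, score):
            return epoch, stopper.best_epoch
    return None, stopper.best_epoch
```

For example, replaying the scores 0.5, 0.6, and then twenty epochs at 0.55 stops at epoch 11 (ten consecutive epochs without improvement) and keeps the checkpoint from epoch 1.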
2.5. Ablation Tests
To further assess the relative importance of each module, three ablation tests were conducted using the same region-wise leave-one-region-out protocol. In the first test, PE was disabled while all other modules were retained (creating the model “TAGNet-PE OFF”). This isolates the contribution of the sinusoidal encoding of the planar coordinates, which is particularly crucial for distinguishing between geometrically similar surfaces, such as the sea surface and exposed flats.
In the second test, the SE attention module was disabled, while the other modules, including PE, were preserved (“TAGNet-SE OFF”). Since SE reweights global channels to emphasize discriminative cues and suppress flat or noisy responses, this ablation test highlights the role of global channel attention in tidal flat classification.
In the third test, Z-detrending was disabled (“TAGNet-Z-detrended OFF”), excluding the detrended elevation feature from the augmentation process. This feature is designed to enhance subtle bathymetric gradients, making it especially useful for distinguishing the sea surface from the seafloor despite their overlapping absolute elevations. Unless modified by the specific ablation test, all training parameters were the same as those described in Section 2.4 for model benchmarking. Model performance was reported as the region-wise macro F1-score, averaged across the eight held-out test subregions.
3. Results
3.1. Overall Performance of TAGNet Across Subregions
A region-wise leave-one-region-out protocol was employed to assess the generalizability of the models across eight tidal flat subregions. In each fold of the test, one subregion was reserved for testing while the remaining seven were used for training, and the F1-scores were averaged across all eight tests. Table 1 presents the per-class and overall F1-scores for TAGNet and the three baseline models.
Table 1.
Per-class and overall (macro) mean F1-scores (± SD) for three baseline networks and TAGNet, determined based on an eight-fold region-wise cross-validation protocol.
TAGNet consistently achieved the highest F1-scores across all surface classes, recording 0.78 ± 0.08 (±SD) for exposed flats, 0.78 ± 0.22 for sea surfaces, 0.78 ± 0.20 for the seafloor, and 0.66 ± 0.39 for vegetation. The relatively small deviations indicate that TAGNet maintains stable performance across heterogeneous tidal flat regions, even under varying bathymetric and surface conditions. In contrast, PointNeXt-SSG exhibited notably high SDs (up to 0.40), suggesting reduced robustness to spatial and environmental differences among the subregions. Compared with DGCNN, TAGNet again demonstrated substantial improvements, particularly for the exposed flat and seafloor classes, where geometric similarity often leads to misclassification. Lastly, PointNet Transformer achieved competitive results for the sea surface and seafloor classes but still lagged behind TAGNet. The overall macro F1-score of 0.69 ± 0.11 further confirms that TAGNet achieved the most balanced and reliable classification performance across the surface classes.
These findings show that integrating feature augmentation with global channel attention enabled TAGNet to capture subtle geometric and elevation cues, thereby achieving robust ALB point cloud classification in heterogeneous tidal flat environments. It should also be noted that the relatively low F1-score for the vegetation class is likely related to its limited proportion (~0.6%) and spatial sparseness within the upper intertidal zone, which constrained the diversity of training samples, rather than it being solely attributable to architectural or parameter-level factors.
Figure 3 visually represents the region-wise classification results for each subregion and model, along with the ground truth data. Comparing the model results to the ground truth data, it is evident that TAGNet not only produces clearer inter-class boundaries but also reduces misclassification.
Figure 3.
Subregion-wise classification results for the four tested models, along with ground truth data for comparison. Each row corresponds to one test subregion (from top to bottom: subregions 1–8), and columns represent the models: (a) ground truth data, (b) DGCNN, (c) PointNeXt-SSG, (d) PointNet Transformer, and (e) TAGNet.
DGCNN captured local geometric structure to some extent, but the boundaries between the seafloor and sea surface remained indistinct; PointNeXt-SSG produced more stable predictions overall but introduced noisy misclassifications in vegetation-covered and exposed-flat areas; and PointNet Transformer, benefiting from its transformer-based design, yielded relatively sharp boundaries, but difficulties nonetheless persisted in distinguishing between the planar sea surface and seafloor classes. In contrast, TAGNet achieved the most consistent performance, with feature augmentation and SE attention supporting clearer inter-class boundaries. Notably, TAGNet provided robust discrimination between the seafloor and sea surface as well as reliable identification of the vegetation surface class. These results show that the proposed model maintains strong performance even under the ambiguous surface conditions of tidal flats. Region-wise comparisons further highlight TAGNet’s generalization ability across heterogeneous coastal environments.
To further examine the contribution of elevation (Z) information, and to avoid redundant visualizations across subregions, subregions 2 and 6 were selected for detailed cross-sectional analysis, as these areas exhibited significant overlap between the sea surface and seafloor classes in the XY plane while remaining distinguishable along the Z axis (Figure 4). Such conditions are particularly challenging for classification in tidal flats, where specular water reflections and flat terrain amplify ambiguities.
Figure 4.
Z–X cross-sectional comparisons of the classification results for subregions 2 (left) and 6 (right), including (a) the ground truth data and model results for (b) PointNet Transformer and (c) TAGNet.
For this comparison, PointNet Transformer and TAGNet were chosen. PointNet Transformer is a recently developed transformer-based model that exhibited a relatively competitive performance in the benchmark tests, whereas DGCNN and PointNeXt-SSG demonstrated clear limitations (Figure 3) and were, therefore, excluded from this analysis. This allowed a focused comparison between a state-of-the-art baseline model and the proposed model. The results indicate that PointNet Transformer struggles to delineate the boundary between the sea surface and the seafloor, with frequent mixing of the two classes along their interface. In contrast, owing to its feature augmentation and Z-detrending modules, which explicitly encode local elevation statistics, TAGNet produced a much clearer separation. Notably, even in Subregion 6, characterized by flat relief and water-induced reflection noise, TAGNet maintained a robust ability to discriminate between the two classes. These findings confirm that the proposed model effectively leverages vertical information to address one of the most critical challenges facing ALB-based surface classification in tidal flats.
3.2. Effects of Ablation on TAGNet’s Performance
An ablation study was conducted to quantify the contribution of the three main TAGNet components: the PE, SE attention, and Z-detrending modules. The evaluation followed the same region-wise leave-one-region-out protocol described in Section 3.1, with mean per-class F1-scores summarized in Table 2. The results indicate that the full TAGNet consistently outperformed all three ablated variants, confirming the complementary roles of the three modules. Disabling PE resulted in lower performance for sea surface and exposed flat identification, highlighting the importance of spatial encoding in distinguishing planar surfaces with minimal elevation differences. The absence of SE attention markedly reduced performance for the vegetation and exposed flat classes, suggesting that global channel reweighting is crucial for suppressing noisy responses and enhancing discriminative features. When Z-detrending was removed, the performance for the sea surface and seafloor classes declined, reflecting the importance of local elevation normalization in capturing subtle bathymetric gradients under conditions of overlapping absolute heights.
Table 2.
Per-class and overall (macro) mean F1-scores (averaged across the eight subregions; ±SD) for the full TAGNet model and its ablation variants.
It is also worth noting that the relatively lower F1-score for the vegetation class (Table 2) stems primarily from its small proportion and spatial sparsity within the dataset rather than from a specific architectural weakness. By contrast, the sea floor class maintained stable accuracy despite its relatively small share, suggesting that performance degradation is primarily driven by extreme class scarcity rather than moderate class imbalance.
The SDs of all the models were relatively small (mostly ±0.1–0.3), showing that TAGNet maintains stable performance across heterogeneous tidal flat environments, although the ablated variants exhibit larger fluctuations, indicating reduced robustness. The macro F1 results further confirm that the full TAGNet model provides the most balanced overall performance, effectively integrating the three complementary modules into a stable and generalizable framework.
Overall, the ablation study demonstrates how each module contributes unique discriminative power, and their integration enables TAGNet to achieve balanced and robust classification performance in the heterogeneous tidal flat environment.
Figure 5 qualitatively illustrates the trends seen in Table 2. The PE OFF ablative variant produced blurred boundaries between water and exposed flats, the SE OFF variant generated fragmented vegetation patches and noisy predictions, and the Z-detrended OFF variant misclassified shallow seafloor points as surface water. In contrast, the full TAGNet yielded the most coherent and stable predictions across all subregions, with clearer class boundaries and reduced noise. The consistency between Table 2 and Figure 5 substantiates the complementary effects of PE, SE attention, and Z-detrending, confirming that their integration is essential for robust classification in tidal flats.
Figure 5.
Subregion-wise classification results for the TAGNet model and the three ablation variants, along with ground truth data for comparison. Each row corresponds to one test subregion (from top to bottom: subregions 1–8), and columns represent the models: (a) ground truth data, (b) TAGNet-PE OFF, (c) TAGNet-SE OFF, (d) TAGNet-Z-detrended OFF, and (e) TAGNet (Full).
4. Discussion
The comparative evaluation against three strong baselines—DGCNN, PointNeXt-SSG, and PointNet Transformer—underscores the advantages of TAGNet in ALB classification. Although DGCNN effectively captures local geometric relationships through dynamic graph convolution, its reliance on raw coordinates limits its ability to differentiate between the sea surface and seafloor, where absolute elevations often overlap. PointNeXt-SSG, an improved successor to PointNet++, incorporates residual connections and scalable operators but employs only single-scale grouping, which is insufficient for capturing the multi-scale elevational variations typical of tidal flats. PointNet Transformer leverages global self-attention; however, the absence of the explicit encoding of local elevation statistics and bathymetric gradients results in unstable predictions under noisy or low-relief conditions.
TAGNet addresses these shortcomings by combining geometry-aware feature augmentation with global channel attention. The augmentation module incorporates normalized coordinates, local elevation statistics, and PEs, which jointly enhance the separability of classes with subtle geometric differences. The SE mechanism further reweights global channels, effectively suppressing noisy responses from reflective water surfaces and sparse vegetation. Ablation results confirm that each component contributes complementary discriminative power: PE primarily improves planar surface discrimination, channel attention is crucial for correctly identifying vegetation, and Z-detrending enhances the robustness of sea surface–seafloor differentiation.
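The channel reweighting described above can be illustrated with a minimal NumPy sketch of a squeeze-and-excitation gate applied to an already pooled global feature vector. The 1024-channel width and reduction ratio of 16 follow Table A2; the random weights, the function name `se_reweight`, and the assumption that pooling over points supplies the squeeze step are illustrative and do not reproduce the trained model.

```python
import numpy as np

def se_reweight(global_feat, w1, b1, w2, b2):
    """Apply an SE-style gate to a pooled feature vector of shape (C,).

    w1: (C // r, C) bottleneck weights, w2: (C, C // r) expansion weights.
    """
    hidden = np.maximum(w1 @ global_feat + b1, 0.0)   # bottleneck + ReLU
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden + b2)))  # sigmoid, one weight per channel
    return global_feat * gate, gate

# Illustrative (untrained) parameters; sizes taken from Table A2.
rng = np.random.default_rng(0)
channels, reduction = 1024, 16
bottleneck = channels // reduction
w1 = rng.normal(scale=0.05, size=(bottleneck, channels))
b1 = np.zeros(bottleneck)
w2 = rng.normal(scale=0.05, size=(channels, bottleneck))
b2 = np.zeros(channels)

global_feat = rng.normal(size=channels)
reweighted, gate = se_reweight(global_feat, w1, b1, w2, b2)
```

Each gate value lies strictly between 0 and 1, so the block can only attenuate a channel relative to the others; this is the mechanism by which responses dominated by noise can be down-weighted once the gate is learned.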
These findings demonstrate that TAGNet is not merely an incremental variant of existing graph- or transformer-based models, but a tailored architecture optimized for the heterogeneous and noisy characteristics of tidal flats. By explicitly encoding geometry-aware descriptors and applying global channel attention, TAGNet achieves more balanced and robust performance across regions, thereby providing a reliable foundation for large-scale coastal and tidal monitoring.
Although validation in this study was performed within a single tidal-flat site, TAGNet was designed as a generalizable framework for ALB classification. The region-wise leave-one-region-out protocol, conducted over eight spatially heterogeneous subregions, partially mitigates site dependence by testing the model across diverse local morphologies. Future work will extend the validation to multiple coastal sites to further assess cross-site generalizability.
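The leave-one-region-out protocol can be sketched as a simple fold generator: each subregion is held out once for testing while the remaining subregions are used for training. The function name and the integer region identifiers are illustrative assumptions; the sketch covers only the split logic, not data loading or training.

```python
def leave_one_region_out(region_ids):
    """Yield (train_regions, test_region) pairs, holding out each region exactly once."""
    regions = sorted(set(region_ids))
    for held_out in regions:
        train = [r for r in regions if r != held_out]
        yield train, held_out

# Eight subregions, as in this study, give eight folds.
folds = list(leave_one_region_out(range(1, 9)))
for train_regions, test_region in folds:
    print(f"test on region {test_region}, train on regions {train_regions}")
```

Splitting by region rather than by random point sampling keeps spatially adjacent, highly correlated points from appearing in both the training and test sets.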
Beyond data diversity, model diversity is another important axis of evaluation. While a broader comparison with recently introduced large-scale pretrained or multimodal point cloud networks would further enrich the analysis, such models typically rely on RGB, normal maps, or pretraining corpora that are not applicable to single-source ALB data. Therefore, this study focused on evaluating representative architectures that ensure methodological fairness and domain relevance. Future work will extend this comparison to include cross-modal and pretrained frameworks as suitable datasets become available.
Beyond these strengths, several limitations should also be acknowledged. First, tidal-level variation during ALB acquisition may shift the apparent boundary between Sea surface and Exposed flat and alter elevation- or intensity-related attributes, potentially introducing classification inconsistencies. Incorporating auxiliary tidal records or temporal normalization strategies could help mitigate such effects in future studies. Second, TAGNet contains approximately 2.7 million trainable parameters, about 2.3 times as many as the PointNet Transformer implementation used in this study (~1.2 million). This increased capacity, which results mainly from the feature-augmentation and channel-attention modules, is expected to introduce a modest computational overhead, although the model remained practical to train under the eight-fold region-wise cross-validation setup used in this study.
5. Conclusions
This study introduces TAGNet, a dynamic graph-based architecture specifically designed for classifying ALB point cloud data in coastal and tidal flat environments. TAGNet integrates geometry-aware feature augmentation, dynamic graph convolutions, and global channel attention, thereby enabling robust discrimination among exposed flats, sea surfaces, seafloors, and vegetation.
Comprehensive experiments using a region-wise leave-one-region-out evaluation strategy demonstrated that TAGNet consistently outperforms state-of-the-art baseline models, such as DGCNN, PointNeXt-SSG, and PointNet Transformer. TAGNet’s feature augmentation ability proved essential for distinguishing geometrically similar classes, while its SE mechanism enhanced robustness by suppressing noisy responses and emphasizing discriminative channels. Ablation analyses further confirmed that PE, Z-detrending, and channel attention each provide complementary benefits, with their integration yielding the most balanced classification performance.
By explicitly addressing the unique challenges of tidal flats—low-relief terrain, specular reflection, and heterogeneous land–water boundaries—TAGNet offers a reliable framework for the large-scale monitoring of coastal and intertidal ecosystems using ALB. Future research will extend this study to multi-sensor data fusion, temporal monitoring, and larger-scale regional generalization to further enhance the applicability of TAGNet for coastal management and environmental monitoring.
Funding
This research was supported by the Korea Institute of Marine Science & Technology (KIMST), funded by the Ministry of Oceans and Fisheries (RS-2023-00254717).
Data Availability Statement
The airborne LiDAR bathymetry dataset used in this study was acquired as part of an institutional research project and is not publicly available due to data sharing restrictions.
Acknowledgments
The author acknowledges the Korea Institute of Ocean Science & Technology (KIOST), the leading institution of the funded project, for overall project coordination. The author also gratefully acknowledges GeoStory Co., Ltd. for their technical support in the acquisition of airborne LiDAR bathymetry data used in this study.
Conflicts of Interest
The author declares no conflict of interest.
Appendix A
For each region, Table A1 lists the total number of 3D points and the percentage of each class (Exposed flat, Sea surface, Sea floor, Vegetation). Percentages were computed relative to the total points in each region.
Table A1.
Region-wise distribution of labeled points in the ALB dataset.
| Region | Total Points | Exposed Flat (%) | Sea Surface (%) | Sea Floor (%) | Vegetation (%) |
|---|---|---|---|---|---|
| Region 1 | 14,266,166 | 90.7 | 4.9 | 1.7 | 2.6 |
| Region 2 | 13,695,564 | 0.2 | 88.3 | 11.5 | 0.0 |
| Region 3 | 13,331,637 | 94.4 | 4.3 | 0.9 | 0.4 |
| Region 4 | 18,418,598 | 0.1 | 85.1 | 14.8 | 0.0 |
| Region 5 | 9,567,602 | 92.9 | 5.3 | 1.8 | 0.0 |
| Region 6 | 17,139,573 | 2.6 | 83.4 | 14.1 | 0.0 |
| Region 7 | 5,267,644 | 92.1 | 3.8 | 1.2 | 3.0 |
| Region 8 | 16,223,791 | 10.4 | 74.5 | 15.1 | 0.0 |
| Total | 107,910,575 | 38.4 | 52.0 | 9.1 | 0.6 |
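As a consistency check on Table A1, the pooled class percentages in the Total row can be recomputed as point-weighted averages of the per-region percentages. The sketch below uses only the values printed in the table; the variable and function names are illustrative.

```python
CLASSES = ("Exposed flat", "Sea surface", "Sea floor", "Vegetation")

# (total points, class percentages) per region, copied from Table A1.
REGIONS = {
    "Region 1": (14_266_166, (90.7, 4.9, 1.7, 2.6)),
    "Region 2": (13_695_564, (0.2, 88.3, 11.5, 0.0)),
    "Region 3": (13_331_637, (94.4, 4.3, 0.9, 0.4)),
    "Region 4": (18_418_598, (0.1, 85.1, 14.8, 0.0)),
    "Region 5": (9_567_602, (92.9, 5.3, 1.8, 0.0)),
    "Region 6": (17_139_573, (2.6, 83.4, 14.1, 0.0)),
    "Region 7": (5_267_644, (92.1, 3.8, 1.2, 3.0)),
    "Region 8": (16_223_791, (10.4, 74.5, 15.1, 0.0)),
}

def pooled_percentages(regions):
    """Return the total point count and the point-weighted class percentages."""
    total = sum(n for n, _ in regions.values())
    pooled = []
    for i in range(len(CLASSES)):
        class_points = sum(n * pct[i] / 100.0 for n, pct in regions.values())
        pooled.append(100.0 * class_points / total)
    return total, pooled

total_points, pooled = pooled_percentages(REGIONS)
for name, pct in zip(CLASSES, pooled):
    print(f"{name}: {pct:.2f}%")
```

The region totals sum to the 107,910,575 points reported in the Total row. The recomputed percentages agree with that row to within roughly 0.1 percentage point; small gaps are expected because the per-region percentages are themselves rounded to one decimal place.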
Appendix B
Table A2 summarizes each layer’s name, input/output dimensions, and operation type to enhance reproducibility.
Table A2.
Layer-by-layer configuration of the proposed TAGNet. ⊕ denotes feature concatenation.
| Stage | Layer/Operation | Input → Output Channels | k | Description/Function |
|---|---|---|---|---|
| Input and feature augmentation | Feature construction | , → d | - | Geometry- and physics-aware feature augmentation with local Z statistics (k = 10) and positional encoding |
| EdgeConv module 1 | Conv2d(2 × d, 64, 1) + BN + ReLU | 2 × d → 64 | 20 | Initial local feature extraction using Euclidean k-NN in XYZ space. |
| EdgeConv module 2 | Conv2d(128, 64, 1) + BN + ReLU | 128 → 64 | 20 | Dynamic k-NN updated in feature space; captures fine-scale geometric context. |
| EdgeConv module 3 | Conv2d(128, 64, 1) + BN + ReLU | 128 → 64 | 20 | Deeper local descriptor refinement (mid-level spatial relations). |
| EdgeConv module 4 | Conv2d(128, 64 → 128 → 1024, 1) + BN + ReLU | 128 → 1024 | 20 | Multi-stage convolution for high-dimensional local embedding. |
| Local aggregation | Concatenation (f1 ⊕ f2 ⊕ f3 ⊕ f4) | 64 + 64 + 64 + 1024 = 1216 | - | Multi-scale local descriptor concatenation. |
| Global branch | Conv1d(1216, 1024, 1) + BN + ReLU → MaxPool | 1216 → 1024 | - | Global feature extraction via 1 × 1 conv + max pooling over N points. |
| Channel attention | SE block (reduction 16) | 1024 → 64 → 1024 | - | Emphasizes informative channels; suppresses redundancy. |
| Feature fusion | Concatenate (local ⊕ global) | 1216 + 1024 = 2240 | - | Fused representation per point. |
| Classifier | Conv1d(2240, 512, 1) + BN + ReLU → Dropout (0.5) → Conv1d(512, C, 1) → Softmax | 2240 → 512 → C = 4 | - | Point-wise semantic prediction for {Exposed flat, Sea surface, Sea floor, Vegetation}. |
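The channel widths in Table A2 can be cross-checked with a few lines of arithmetic. The sketch below assumes the usual DGCNN edge-feature convention, in which each EdgeConv input concatenates a point feature with a neighbor difference and therefore doubles the channel width; this assumption is consistent with the 2 × d → 64 and 128 → 64 entries but is not stated explicitly in the table. All names are illustrative.

```python
EDGECONV_OUT = (64, 64, 64, 1024)  # output channels of EdgeConv modules 1-4 (Table A2)
GLOBAL_OUT = 1024                  # width of the pooled global feature
SE_REDUCTION = 16                  # SE block reduction ratio
CLASSIFIER_HIDDEN = 512
NUM_CLASSES = 4

def edgeconv_in_channels(prev_channels):
    """Assumed edge feature [x_i, x_j - x_i]: input width is twice the previous width."""
    return 2 * prev_channels

local_width = sum(EDGECONV_OUT)              # f1 (+) f2 (+) f3 (+) f4
se_bottleneck = GLOBAL_OUT // SE_REDUCTION   # hidden width of the SE block
fused_width = local_width + GLOBAL_OUT       # per-point local (+) broadcast global

print("EdgeConv 2-4 input channels:", edgeconv_in_channels(EDGECONV_OUT[0]))
print("local descriptor width:", local_width)
print("SE bottleneck width:", se_bottleneck)
print("classifier widths:", fused_width, "->", CLASSIFIER_HIDDEN, "->", NUM_CLASSES)
```

These values match the table: a 1216-channel local descriptor, a 64-channel SE bottleneck, and a 2240-channel fused representation feeding the classifier.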
References
- Barbier, E.B.; Hacker, S.D.; Kennedy, C.; Koch, E.W.; Stier, A.C.; Silliman, B.R. The value of estuarine and coastal ecosystem services. Ecol. Monogr. 2011, 81, 169–193.
- Murray, N.J.; Phinn, S.R.; DeWitt, M.; Ferrari, R.; Johnston, R.; Lyons, M.B.; Clinton, N.; Thau, D.; Fuller, R.A. The global distribution and trajectory of tidal flats. Nature 2019, 565, 222–225.
- Zhang, K.; Dong, X.; Liu, Z.; Gao, W.; Hu, Z.; Wu, G. Mapping tidal flats with Landsat 8 images and Google Earth Engine: A case study of the China’s eastern coastal zone circa 2015. Remote Sens. 2019, 11, 924.
- Ji, X.; Yang, B.; Wang, Y.; Tang, Q.; Xu, W. Full-waveform classification and segmentation-based signal detection of single-wavelength bathymetric LiDAR. IEEE Trans. Geosci. Remote Sens. 2022, 60, 4208714.
- Guo, Y.; Feng, C.; Xu, W.; Liu, Y.; Su, D.; Qi, C.; Dong, Z. Water–land classification for single-wavelength airborne LiDAR bathymetry based on waveform feature statistics and point cloud neighborhood analysis. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103268.
- Xu, W.; Zhang, F.; Jiang, T.; Feng, Y.; Liu, Y.; Dong, Z.; Tang, Q. Feature curve-based registration for airborne LiDAR bathymetry point clouds. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102883.
- Yang, F.; Su, D.; Zhang, K.; Ma, Y.; Wang, M.; Yang, A. Mosaicing of airborne LiDAR bathymetry strips based on Monte Carlo matching. Mar. Geophys. Res. 2017, 38, 303–311.
- Li, S.; Su, D.; Yang, F.; Zhang, H.; Wang, X.; Guo, Y. Bathymetric LiDAR and multibeam echo-sounding data registration methodology employing a point cloud model. Appl. Ocean Res. 2022, 123, 103147.
- Guenther, G.C.; Cunningham, A.G.; LaRocque, P.E.; Reid, D.J. Meeting the accuracy challenge in airborne bathymetry. In Proceedings of the EARSeL Workshop on Lidar Remote Sensing of Land and Sea, Dresden, Germany, 16–17 June 2000.
- Su, D.; Gao, H.; Yang, A.; Wang, J.; Mai, X.; Liu, X.; Wu, Z. Classification of seabed sediment by combining airborne LiDAR bathymetry and multispectral remote sensing images. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2025, 18, 9023–9032.
- Tysiac, P. Bringing bathymetry LiDAR to coastal zone assessment: A case study in the Southern Baltic. Remote Sens. 2020, 12, 3740.
- Wagner, N.; Franke, G.; Schmieder, K.; Mandlburger, G. Automatic classification of submerged macrophytes at Lake Constance using laser bathymetry point clouds. Remote Sens. 2024, 16, 2257.
- Lowell, K.; Calder, B. Extracting shallow-water bathymetry from LiDAR point clouds using pulse attribute data: Merging density-based and machine learning approaches. Mar. Geod. 2021, 44, 259–286.
- Janowski, L.; Wroblewski, R.; Rucinska, M.; Kubowicz-Grajewska, A.; Tysiac, P. Automatic classification and mapping of the seabed using airborne LiDAR bathymetry. Eng. Geol. 2022, 301, 106615.
- Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph CNN for learning on point clouds. ACM Trans. Graph. 2019, 38, 146.
- Charles, R.Q.; Su, H.; Kaichun, M.; Guibas, L.J. PointNet: Deep learning on point sets for 3D classification and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 652–660.
- Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep hierarchical feature learning on point sets in a metric space. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Volume 30.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Qian, G.; Li, Y.; Peng, H.; Mai, J.; Hammoud, H.; Elhoseiny, M.; Ghanem, B. PointNeXt: Revisiting PointNet++ with improved training and scaling strategies. Adv. Neural Inf. Process. Syst. 2022, 35, 23192–23204. Available online: https://arxiv.org/abs/2206.04670 (accessed on 15 September 2025).
- Guo, M.-H.; Cai, J.; Liu, Z.-N.; Mu, T.-J.; Martin, R.; Hu, S. PCT: Point Cloud Transformer. Comput. Vis. Media 2021, 7, 187–199.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Published by MDPI on behalf of the International Society for Photogrammetry and Remote Sensing. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).