3. Results
3.1. Pairwise Comparison
Pairwise comparison of the 48 workflow outputs revealed substantial variability in flow accumulation structure and channel continuity across processing configurations. Visual inspection of the cluster raster stacks (Figures 3, 4, and 5) demonstrates clear differences in drainage concentration patterns depending on ground filtering method, interpolation technique, depression-filling algorithm, and flow routing scheme.
Workflows based on CSF and MCC generally produced coherent, spatially continuous high-accumulation corridors aligned with the main channel network. Accumulation values (up to ~300 contributing pixels) were concentrated along well-defined drainage paths, particularly in configurations using Planchon and Darboux sink filling combined with D8 or Dinf routing. In contrast, configurations using Wang and Liu depression filling frequently exhibited structured, grid-like artifacts and spatially fragmented accumulation patterns, especially when coupled with D8 routing.
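Accumulation here is counted in contributing pixels. To make that quantity concrete, the sketch below computes D8 flow directions (steepest descent to one of eight neighbors, with diagonal drops divided by √2) and accumulates contributing cells on a small grid. It assumes the DEM is already depression-free, and the function name `d8_accumulation` and the toy DEM are hypothetical. It illustrates the general technique only and is not the implementation used in the study's workflows.

```python
import numpy as np

# Row/column offsets of the eight D8 neighbours.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_accumulation(dem: np.ndarray, cell: float = 1.0) -> np.ndarray:
    """Count the cells (including itself) draining through each cell under D8 routing.

    Assumes a depression-free DEM; a cell with no lower neighbour is treated as an outlet.
    """
    rows, cols = dem.shape
    receiver = np.full(rows * cols, -1)  # flat index of each cell's downslope neighbour
    for r in range(rows):
        for c in range(cols):
            best = 0.0
            for dr, dc in OFFSETS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dist = cell * (np.sqrt(2.0) if dr != 0 and dc != 0 else 1.0)
                    slope = (dem[r, c] - dem[rr, cc]) / dist
                    if slope > best:  # keep the steepest downhill neighbour
                        best = slope
                        receiver[r * cols + c] = rr * cols + cc
    acc = np.ones(rows * cols)
    # Visit cells from highest to lowest so each donor is counted before its receiver.
    for idx in np.argsort(dem, axis=None)[::-1]:
        if receiver[idx] >= 0:
            acc[receiver[idx]] += acc[idx]
    return acc.reshape(rows, cols)


bowl = np.array([[3, 2, 3], [2, 1, 2], [3, 2, 3]], dtype=float)
print(d8_accumulation(bowl))  # the centre cell collects all 9 cells
```

Thresholding such a count yields a binary channel mask; the threshold is an additional processing choice that this sketch does not fix.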
PMF-based workflows exhibited systematically lower maximum accumulation values (≤100 pixels) and reduced channel continuity. The resulting flow networks appeared more diffuse and spatially fragmented compared to CSF and MCC outputs, suggesting that aggressive ground smoothing during filtering influenced downstream accumulation structure.
Quantitatively, pairwise IoU scores reflected these structural differences. Higher IoU values were observed among workflows sharing the same ground filtering method, indicating that ground classification exerts strong control over final channel representation. In contrast, workflows differing in ground filtering but sharing interpolation or routing methods exhibited substantially lower spatial agreement.
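The pairwise IoU referred to throughout this section can be sketched as follows. The sketch assumes that each workflow output has already been thresholded into a boolean channel mask on a common grid. The helper names and the toy masks labeled "A" and "B" are illustrative placeholders and do not come from the study's code.

```python
from itertools import combinations

import numpy as np


def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean channel masks on the same grid."""
    a, b = a.astype(bool), b.astype(bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # both masks empty: treat as identical
    return float(np.logical_and(a, b).sum() / union)


def pairwise_iou(masks: dict[str, np.ndarray]) -> dict[tuple[str, str], float]:
    """IoU for every unordered pair of workflow outputs, keyed by workflow name."""
    return {(p, q): iou(masks[p], masks[q]) for p, q in combinations(sorted(masks), 2)}


masks = {
    "A": np.array([[1, 1, 0], [0, 1, 0]]),
    "B": np.array([[1, 0, 0], [0, 1, 1]]),
}
print(pairwise_iou(masks))  # 2 shared cells out of 4 channel cells in total
```

Grouping these pairwise scores by the component a workflow uses (e.g., its ground filter) produces the per-component distributions summarized below.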
These results indicate that ground filtering introduces the largest structural divergence among workflows, while interpolation and routing primarily modulate drainage density and spatial detail. Importantly, high inter-method agreement did not uniformly correspond to visually coherent accumulation patterns, reinforcing the distinction between methodological consensus and external plausibility.
3.2. Component-Level Performance Analysis
Figure 6 shows a box plot comparison of IoU distributions for each workflow component, revealing distinct performance hierarchies and allowing quantitative assessment of how each processing step influences inter-method agreement in the final channel extraction. The vertical axis range (0.80–1.00) was chosen to highlight performance differences while maintaining visual clarity across all component comparisons.
Ground filtering methods exhibited the most pronounced differences, with PMF achieving the highest median IoU (0.900; IQR = 0.046) and clearly exceeding CSF (median = 0.854, IQR = 0.046) and MCC (median = 0.857, IQR = 0.054) in terms of pairwise consensus (Figure 6a). The 5.1% relative difference in median IoU between PMF and CSF is the largest spread observed in any component category, indicating that ground filtering is the most influential processing step for inter-method agreement. PMF's higher consensus likely reflects its tendency to produce smoother, more generalized terrain representations that converge with the outputs of other smoothing-prone methods. However, high consensus does not necessarily mean that actual channel features are represented more accurately, as discussed below for the validation results.
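The percentages quoted in this section appear to be relative differences: the gap between two medians divided by the higher median. This is our reading rather than a definition given in the text, but it reproduces both the 5.1% (ground filtering) and the 2.3% (interpolation) figures. A minimal check using the reported medians:

```python
def relative_difference(high: float, low: float) -> float:
    """Difference between two medians, expressed as a fraction of the higher one."""
    return (high - low) / high


# Median IoU values as reported in this section.
pairs = {
    "ground filtering (PMF vs CSF)": (0.900, 0.854),
    "interpolation (IDW vs TIN)": (0.876, 0.856),
    "sink filling (Wang and Liu vs Planchon and Darboux)": (0.890, 0.846),
    "flow routing (D8 vs D-infinity)": (0.860, 0.859),
}
for name, (high, low) in pairs.items():
    print(f"{name}: {100 * relative_difference(high, low):.1f}%")
```

On the same reading, the sink-filling spread is 4.9% and the flow-routing spread is about 0.1%.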
The interpolation methods performed remarkably similarly, with medians ranging from 0.856 to 0.876 and overlapping distributions, suggesting that interpolation choice has minimal effect on inter-method agreement (Figure 6b). The narrow range of only 2.3% between methods, less than half the 5.1% range for ground filtering, likely reflects the relatively uniform spatial distribution of ground points (4.76 points/m²) achieved during UAV-LiDAR acquisition. IDW showed a slightly higher median IoU (0.876) than TIN (0.856), MBA (0.857), and Kriging (0.859), although the overlapping interquartile ranges indicate that these differences are small relative to the within-method variability.
Among the sink-filling algorithms, Wang and Liu (median = 0.890) achieved higher pairwise agreement than Planchon and Darboux (median = 0.846), with comparable variability (IQR = 0.050 versus 0.044) (Figure 6c). Wang and Liu's priority-flood approach fills depressions aggressively to ensure complete drainage, producing more connected, streamlined channel networks that likely agree more closely with the outputs of other aggressive methods than with those of more conservative ones.
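The core priority-flood idea behind Wang and Liu's method can be illustrated with a short sketch. A priority queue is seeded with the grid's edge cells, the lowest cell is repeatedly expanded, and each newly reached neighbor is raised to at least the elevation from which it was reached. This is a simplified illustration, not the exact implementation used in this study: it leaves filled depressions perfectly flat, so a later routing step would still need to resolve those flats. The function name is hypothetical.

```python
import heapq

import numpy as np


def priority_flood_fill(dem: np.ndarray) -> np.ndarray:
    """Fill depressions so every cell can drain to the grid edge (priority-flood sketch)."""
    rows, cols = dem.shape
    filled = dem.astype(float).copy()
    visited = np.zeros((rows, cols), dtype=bool)
    heap: list[tuple[float, int, int]] = []
    # Seed the queue with all edge cells, which can drain off the grid.
    for r in range(rows):
        for c in range(cols):
            if r in (0, rows - 1) or c in (0, cols - 1):
                heapq.heappush(heap, (filled[r, c], r, c))
                visited[r, c] = True
    # Always expand the lowest known cell next (its spill elevation is final).
    while heap:
        z, r, c = heapq.heappop(heap)
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols and not visited[rr, cc]:
                    visited[rr, cc] = True
                    filled[rr, cc] = max(filled[rr, cc], z)  # raise pits to spill level
                    heapq.heappush(heap, (filled[rr, cc], rr, cc))
    return filled


dem = np.array([[5, 5, 5, 5], [5, 1, 2, 3], [5, 5, 5, 5]], dtype=float)
print(priority_flood_fill(dem))  # interior cells rise to the outlet elevation of 3
```

Every enclosed low point is removed regardless of whether it is real, which is the behavior the Discussion links to the loss of genuine wetland depressions.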
For the flow direction methods, D8 (median = 0.860) and D-infinity (median = 0.859) performed nearly identically, with overlapping distributions and a median difference of only 0.001 (about 0.1%), suggesting that the choice of flow routing algorithm has a negligible impact on extraction outcomes in this low-gradient environment (Figure 6d). Because D8 ran 3–5 times faster than D-infinity in our processing tests, it is preferable for operational applications in similar low-gradient settings.
Table 1 summarizes these patterns with statistics for all pairwise comparisons, providing numerical support for the patterns visible in Figure 6. Ground filtering exhibits the largest spread, with a PMF median of 0.900 versus 0.854 for CSF (a 5.1% relative difference), whereas the interpolation methods show minimal variation, with a maximum difference of 2.3% between IDW (0.876) and TIN (0.856). Coefficients of variation across all components range from 0.048 to 0.061, indicating stable, reproducible patterns in pairwise comparison despite the large number of workflow combinations evaluated.
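The summary statistics reported in Table 1 can be sketched as follows for any list of pairwise IoU values. The text does not specify the exact conventions, so two choices here are assumptions: percentiles use NumPy's default linear interpolation, and the coefficient of variation is the sample standard deviation divided by the mean. The function name and input values are illustrative.

```python
import numpy as np


def summarize(iou_values: list[float]) -> dict[str, float]:
    """Median, interquartile range, and coefficient of variation of pairwise IoU values."""
    x = np.asarray(iou_values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    return {
        "median": float(median),
        "iqr": float(q3 - q1),
        "cv": float(x.std(ddof=1) / x.mean()),  # sample SD divided by the mean
    }


print(summarize([0.80, 0.85, 0.90, 0.95, 1.00]))
```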
3.3. Top-Performing Workflows by Statistical Consensus
Table 2 identifies the workflows that achieved the highest pairwise agreement (i.e., statistical consensus across methods). These rankings reflect inter-method agreement rather than plausibility validated against observable features.
We identified certain cases where different workflow configurations produced identical channel rasters. These duplicate outputs were grouped and treated as a single unique solution when interpreting pairwise statistics. Perfect IoU values associated with very small numbers of comparisons, therefore, reflect technical duplication rather than independent methodological confirmation.
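One way to detect such duplicates is to hash the raster contents and group workflows whose digests match. The sketch assumes each output is stored as a boolean channel mask, and the workflow names are placeholders. The text does not state how duplicates were identified in the study, so this is only one possible approach.

```python
import hashlib
from collections import defaultdict

import numpy as np


def group_duplicates(masks: dict[str, np.ndarray]) -> list[list[str]]:
    """Group workflow names whose channel rasters are bit-for-bit identical."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, mask in masks.items():
        arr = np.ascontiguousarray(mask, dtype=bool)
        # Include the shape so identical bytes on differently shaped grids are not merged.
        digest = hashlib.sha256(repr(arr.shape).encode() + arr.tobytes()).hexdigest()
        groups[digest].append(name)
    return [sorted(names) for names in groups.values()]


a = np.array([[1, 0], [0, 1]], dtype=bool)
print(group_duplicates({"A": a, "B": a.copy(), "C": ~a}))  # A and B share a group
```

Keeping a single representative per group before computing pairwise statistics prevents duplicates from producing spurious IoU = 1.000 entries.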
Perfect IoU scores (1.000) indicate outputs identical to those of the compared workflows, but low N values reveal an important nuance. PMF-TIN-Planchon-Dinf and PMF-TIN-Wang-D8 both achieved a perfect median IoU of 1.000, yet each was based on a single comparison (N = 1). This indicates that the two configurations produced duplicate outputs because their processing chains were effectively equivalent.
The CSF-IDW-Wang-D8 workflow performs robustly, with a median IoU of 0.905 across 45 diverse comparisons. This reflects genuine agreement across a wide range of methodologically distinct workflows rather than technical duplication. However, as discussed in the following sections, the workflows with the highest pairwise consensus do not necessarily offer the most plausible representation of channel features when compared with external observations.
3.4. Independent Validation Results
Figure 7 presents side-by-side comparisons of the validated optimal workflows against Sentinel-2 imagery, revealing critical differences in how workflows represent observable channel features despite similar pairwise statistical performance.
Visual comparisons of all 48 workflow outputs against Sentinel-2 true-color imagery showed that the CSF-MBA-Planchon-D8 workflow provided consistently superior correspondence with observable channel features across the study area. Although its median pairwise IoU against the other methods was only 0.857, it aligned more closely with features independently observable at the Sentinel-2 scale. This workflow offered three key advantages over the consensus-leading alternatives:
Main channel alignment. The CSF-MBA-Planchon-D8 workflow accurately followed the sinuous geometry of meandering channels, clearly visible as water surfaces and riparian vegetation corridors in the satellite imagery. Channel centerlines extracted by this workflow showed excellent spatial agreement with observable channel courses, maintaining realistic curvature and avoiding artificial straightening artifacts present in some high-consensus workflows.
Tributary representation. Low-order tributaries visible as linear vegetation patterns, soil moisture signatures, or narrow water surfaces in densely vegetated zones were correctly represented in the CSF-MBA-Planchon-D8 output. These fine-scale features, critical for wetland hydrological connectivity, were systematically underrepresented or missed entirely by many workflows achieving higher pairwise consensus scores.
False positive control. The validated workflow avoided the extraction of channels in transition zones between well-defined channels and surrounding hillslopes where no channel features were observable in satellite imagery. In contrast, several high-consensus workflows systematically over-predicted channels in these ambiguous areas, likely due to terrain smoothing effects that create artificial drainage patterns.
Despite ranking only 18th in the pairwise statistical comparison (median IoU = 0.857), the CSF-MBA-Planchon-D8 workflow consistently outperformed the top-ranked consensus workflows, including PMF-TIN-Planchon-Dinf (median pairwise IoU > 0.900), in its correspondence with observable channel corridors in Sentinel-2 imagery. This finding reveals a fundamental disconnect between inter-method consensus and plausibility with respect to external observations.
The advantages of CSF-MBA-Planchon-D8 were particularly evident in three types of complex zones: densely vegetated areas, where CSF's adaptive ground filtering preserved subtle channel incisions; meandering channel segments, where MBA's hierarchical spline interpolation maintained smooth, realistic curvature; and hydrologically complex areas, where Planchon and Darboux sink filling preserved natural depressions rather than artificially enforcing complete drainage connectivity.
Although this validation is necessarily limited by the 10 m resolution of Sentinel-2 imagery and by the inherent difficulty of establishing absolute ground truth for fine-scale features, the systematic visual comparison and scale-consistent IoU provide critical external evidence that high pairwise consensus does not guarantee accurate representation of actual landscape features. Multiple workflows may converge on similar outputs that collectively deviate from observable reality. This has profound implications for operational applications in which accuracy, rather than reproducibility, determines project success.
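The text does not detail how the scale-consistent IoU was computed. One plausible sketch assumes three things: the 5 m channel mask is coarsened to Sentinel-2's 10 m grid (a factor of 2), a coarse cell counts as channel if any of its fine cells does, and the result is compared with a 10 m reference mask digitized from the imagery. The aggregation rule is an assumption; a majority or area-fraction rule would be equally defensible. Function names are hypothetical.

```python
import numpy as np


def aggregate_any(mask: np.ndarray, factor: int = 2) -> np.ndarray:
    """Coarsen a boolean mask: a coarse cell is channel if any of its fine cells is."""
    rows, cols = mask.shape
    r, c = rows - rows % factor, cols - cols % factor  # drop incomplete edge blocks
    blocks = mask[:r, :c].reshape(r // factor, factor, c // factor, factor)
    return blocks.any(axis=(1, 3))


def iou(a: np.ndarray, b: np.ndarray) -> float:
    """Intersection over union of two boolean masks on the same grid."""
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 1.0


fine = np.zeros((4, 4), dtype=bool)
fine[:, 1] = True                                 # a 5 m channel one cell wide
reference = np.array([[True, True], [True, False]])  # hypothetical 10 m reference
print(iou(aggregate_any(fine), reference))        # 2 shared cells out of 3
```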
4. Discussion
4.1. Inter-Method Consensus Does Not Ensure Accuracy
The most significant finding of this study is the substantial disconnect between inter-method statistical consensus and validated correspondence with observable features. Workflows achieving the highest pairwise agreement (median IoU > 0.900) showed systematic deviations from channel corridors observable in Sentinel-2 imagery at the 10 m scale. Meanwhile, the visually validated optimal workflow CSF-MBA-Planchon-D8, ranking only 18th in the pairwise statistical comparison, provided consistently superior representation of observable channel geometry, tributary connectivity, and spatial extent.
This reveals a fundamental limitation of consensus-based validation frameworks, in which multiple methods may systematically converge on similar but collectively inaccurate results. This phenomenon likely arises because many LiDAR processing algorithms were developed and optimized for high-relief forested or urban environments. In low-gradient wetlands, these algorithms may share common systematic biases such as over-smoothing subtle channel features, misclassifying low shrubs as ground, or failing to preserve gentle elevation transitions that define wetland drainage patterns.
The practical implications for operational applications are substantial. Validation strategies that rely solely on inter-method comparisons can be misleading, because high consensus does not guarantee accuracy. External validation against observable features, using satellite imagery, field surveys, or other independent data sources, is essential wherever accuracy rather than reproducibility determines success. This finding is particularly relevant for wetland restoration, where incorrect channel mapping could lead to restoration plans that fail to achieve their hydrological connectivity objectives or that inadvertently damage existing ecological functions.
IoU-based pairwise comparisons emphasize internal consistency among workflows, effectively favoring solutions that appear reasonable within an ensemble of extraction results. In environments with fine-scale channel networks such as Kushiro Wetland, however, many workflows may treat these subtle channels as noise, leading to systematic collective under-representation. This distinction is particularly critical in low-gradient wetlands, where fine-scale tributary networks provide essential hydrological connectivity for species dispersal, nutrient transport, and seasonal flooding. The consequences of failing to represent these features accurately therefore extend beyond mapping precision to fundamental questions of ecosystem function and restoration effectiveness.
4.2. Component-Level Drivers
The superior validation performance of CSF-based workflows, even though PMF achieves higher pairwise consensus, can be attributed to fundamental algorithmic differences in how these methods handle low-relief terrain. PMF applies a predefined sequence of progressively enlarged morphological windows, iteratively removing non-ground points that exceed elevation-difference thresholds. In low-relief areas, elevation differences between ground and low vegetation approach the method's discrimination threshold, so this approach tends to over-smooth the terrain and lose the subtle topographic undulations that define wetland channels. CSF inverts the point cloud and simulates a cloth settling onto the resulting surface. This produces adaptive terrain-following behavior that better preserves the gentle elevation transitions characteristic of wetland channels. The pairwise comparison results that favored PMF likely reflect PMF's tendency to produce smoother, more generalized terrain representations similar to the outputs of other smoothing-prone methods.
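The threshold mechanism described above can be illustrated with a simplified one-dimensional progressive morphological filter. A grey-scale opening is applied with progressively wider windows, and the elevation-difference threshold grows with window size up to a cap. All parameter values are hypothetical, and real PMF implementations work on 2-D grids. In the toy profile, a 2 m object is rejected, a 0.3 m incision survives, and a 0.05 m low-shrub rise stays classified as "ground" because it sits below the 0.1 m initial threshold.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def opening_1d(z: np.ndarray, w: int) -> np.ndarray:
    """Grey-scale opening (erosion, then dilation) with a flat window of odd width w."""
    pad = w // 2
    eroded = sliding_window_view(np.pad(z, pad, mode="edge"), w).min(axis=1)
    return sliding_window_view(np.pad(eroded, pad, mode="edge"), w).max(axis=1)


def pmf_ground_mask(z, cell=1.0, max_window=9, slope=0.2, dh0=0.1, dh_max=1.0):
    """Simplified 1-D progressive morphological filter; True marks points kept as ground."""
    surface = np.asarray(z, dtype=float).copy()
    ground = np.ones(surface.size, dtype=bool)
    prev_w = 1
    for w in range(3, max_window + 1, 2):  # windows grow progressively: 3, 5, 7, ...
        # The height threshold grows with window size and assumed terrain slope, up to dh_max.
        dh = dh0 if w == 3 else min(slope * (w - prev_w) * cell + dh0, dh_max)
        opened = opening_1d(surface, w)
        ground &= (surface - opened) <= dh  # points well above the opened surface are non-ground
        surface = opened
        prev_w = w
    return ground


z = np.zeros(30)
z[10:12] = 2.0   # 2 m object, two cells wide
z[20] = -0.3     # small channel incision
z[25:27] = 0.05  # low shrub, below the initial threshold
print(np.flatnonzero(~pmf_ground_mask(z)))  # only the 2 m object is removed
```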
Although the pairwise comparisons suggested minimal differences among the interpolation methods (median IoU = 0.856–0.876), the validation results favored MBA for channel delineation within the optimal workflow. MBA's multilevel B-spline approach first fits a coarse control lattice that captures the overall surface. It then refines this with successively finer lattices fitted to the remaining residuals, so broad trends are represented smoothly while local detail is added progressively. This hierarchical framework naturally accommodates the smooth, organically meandering channel geometries characteristic of alluvial wetlands.
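As a rough illustration of this hierarchical idea, here is a one-dimensional sketch of multilevel B-spline approximation. A cubic B-spline control lattice is fitted to the data, and lattices with twice as many intervals are then fitted, level by level, to the remaining residuals. The simplifications are our own: the data are 1-D, the lattice spacing halves at each level, and the levels are summed rather than merged into a single lattice. This is not the 2-D MBA implementation used in the study, and all function names are hypothetical.

```python
import numpy as np


def bspline_weights(s: np.ndarray) -> np.ndarray:
    """Uniform cubic B-spline basis functions B0..B3 at local coordinates s in [0, 1]."""
    return np.stack(
        [(1 - s) ** 3 / 6,
         (3 * s**3 - 6 * s**2 + 4) / 6,
         (-3 * s**3 + 3 * s**2 + 3 * s + 1) / 6,
         s**3 / 6],
        axis=1,
    )


def locate(x: np.ndarray, length: float, m: int):
    """Interval index i and local coordinate s of each x on a lattice of m intervals."""
    t = x / length * m
    i = np.minimum(np.floor(t).astype(int), m - 1)  # x == length falls in the last interval
    return i, t - i


def ba_level(x, z, length, m):
    """One B-spline approximation level; returns m + 3 control values (indices -1..m+1)."""
    i, s = locate(x, length, m)
    w = bspline_weights(s)                           # (n, 4) weights per data point
    phi_c = w * z[:, None] / (w**2).sum(axis=1, keepdims=True)
    idx = i[:, None] + np.arange(4)                  # control points touched by each point
    num, den = np.zeros(m + 3), np.zeros(m + 3)
    np.add.at(num, idx, w**2 * phi_c)
    np.add.at(den, idx, w**2)
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)


def evaluate(phi, x, length, m):
    """Evaluate the cubic B-spline defined by control values phi at positions x."""
    i, s = locate(x, length, m)
    return (bspline_weights(s) * phi[i[:, None] + np.arange(4)]).sum(axis=1)


def mba(x, z, length, levels):
    """Fit residuals on lattices of 2, 4, 8, ... intervals and sum the levels."""
    approx = np.zeros_like(z, dtype=float)
    for k in range(levels):
        m = 2 ** (k + 1)
        phi = ba_level(x, z - approx, length, m)
        approx = approx + evaluate(phi, x, length, m)
    return approx


x = np.linspace(0.0, 10.0, 201)
z = np.sin(x)
for levels in (1, 3, 6):
    print(levels, np.abs(z - mba(x, z, 10.0, levels)).max())  # error shrinks with levels
```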
Pairwise comparisons favored Wang and Liu [16] sink filling (median IoU = 0.890 versus 0.846 for Planchon and Darboux), but the validated optimal workflow employed Planchon and Darboux [15]. This reversal likely reflects the different behavior of the two algorithms in low-gradient terrain, where depressions are often real rather than spurious. Wang and Liu's priority-flood approach aggressively fills depressions to ensure complete drainage. This potentially removes real hydrological features, including wetland pools, backwater areas, and abandoned channels that are topographically disconnected but ecologically significant. In low-gradient wetlands where real depressions are common, this approach may oversimplify hydrological connectivity. Planchon and Darboux [15] employ epsilon-based filling that preserves small depressions below the threshold while ensuring computational efficiency. This conservative approach better maintains realistic hydrological complexity in wetland environments, as evidenced by superior visual correspondence with natural channel patterns observable in satellite imagery.
Both flow direction methods performed nearly identically in the pairwise comparisons (median IoU difference of about 0.001) and in the plausibility-based validation, with the optimal workflow employing D8. This finding contrasts with the expectation that D-infinity's continuous flow direction representation would improve accuracy in complex terrain. The similarity likely reflects the study area's low topographic complexity and relatively well-defined channels. However, this finding should not be over-generalized: in environments with complex microtopography, divergent flow paths, or braided channels, D-infinity's continuous flow representation would likely provide substantial advantages.
4.3. Implications for Wetland Restoration and Management
The validated workflow has several direct applications to ongoing Kushiro Wetland restoration efforts.
Historical channel reconstruction. Processing pre-channelization aerial LiDAR or DEM data with the validated workflow would enable restoration planners to reconstruct historical drainage networks to guide re-meandering design. The workflow's superior preservation of natural channel geometry is particularly valuable in this context.
Hydrological connectivity assessment. Accurate channel networks enable modeling of surface water flow paths, residence times, and inundation dynamics under different restoration scenarios, supporting evidence-based evaluation of interventions before implementation.
Ecological habitat mapping. Channel networks derived through the validated workflow provide base layers for species distribution models, particularly for aquatic taxa that depend on hydrological connectivity. Improved representation of low-order tributaries is critical when assessing habitat availability for species including the endangered red-crowned crane and various wetland-dependent fish and amphibians.
Restoration monitoring. Processing repeated UAV-LiDAR surveys through the standardized workflow makes the monitoring of restoration outcomes more quantitative.
The methodology can be transferred to other wetland environments that require high-resolution hydrological characterization. Although this study was grounded in Kushiro Wetland, its key methodological insight, that inter-method consensus can mask systematic bias, is broadly applicable to low-gradient, vegetated landscapes worldwide, including peatlands, floodplains, and coastal marshes.
4.4. Limitations and Future Research Directions
Several limitations warrant careful consideration when interpreting these results or extending findings to other contexts. This study focused on a single wetland type, specifically a temperate alluvial wetland, in a specific geographic region. Performance rankings may differ substantially in peatlands with organic soils and microtopographic hummock–hollow structure, tidal wetlands where bidirectional flow and intertidal exposure complicate surface definition, tropical wetlands with distinct vegetation phenology, and arid wetlands dominated by ephemeral drainage patterns. The validated workflow identified in this study should therefore not be assumed universally optimal. Site-specific validation is strongly recommended prior to transferring this workflow to substantially different geomorphic or ecological contexts.
The analysis employed single-date LiDAR acquisition during late-autumn baseflow conditions. Performance may vary with the seasonal vegetation phenology, hydrological state, or soil moisture conditions. Multi-temporal assessment would reveal whether validated workflow configurations remain optimal across seasonal variations.
Although a 5 m grid resolution was appropriate for the point density and channel dimensions in this study, finer-scale hydrological features (e.g., sub-meter rills, microtopographic flow paths) are not captured. Studies targeting ephemeral channels or detailed surface roughness would require higher point densities and finer grid resolution. However, the selected resolution is consistent with wetland restoration planning scales and regional hydrological modeling applications.
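As a quick check on the grid choice, the reported ground-point density implies the following average number of ground returns per 5 m cell. This average ignores spatial clustering and data gaps, which is a simplifying assumption. The second term gives the number of 5 m cells that fall within one 10 m Sentinel-2 pixel:

```latex
\[
\bar{n}_{5\,\mathrm{m}} = 4.76\ \mathrm{points\,m^{-2}} \times (5\ \mathrm{m})^{2} \approx 119\ \text{points per cell},
\qquad
\frac{(10\ \mathrm{m})^{2}}{(5\ \mathrm{m})^{2}} = 4\ \text{cells per Sentinel-2 pixel}.
\]
```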
The validation approach employed in this study, although independent of LiDAR processing assumptions, was necessarily limited and subject to several inherent constraints. Sentinel-2's 10 m spatial resolution cannot resolve channels narrower than approximately 10 m, which limits validation primarily to main channels and major tributaries and precludes precise geometric assessment of fine-scale features. The visual comparison approach, although systematic, lacks the quantitative rigor of field-validated reference data, such as differential Global Navigation Satellite System (GNSS) surveys, or of high-resolution aerial photography. Observer subjectivity in identifying channel features from optical imagery may introduce interpretation biases, particularly in densely vegetated areas where channel signatures are subtle. Future research should complement satellite-based plausibility checks with field-collected reference data to establish a more definitive accuracy measure. Nevertheless, the relative performance rankings established through visual validation provide valuable operational guidance even in the absence of absolute accuracy measures.
Although the 48 workflows encompassed substantial methodological diversity, individual algorithm parameters were held constant rather than comprehensively optimized. Full parameter optimization would require orders of magnitude more computational resources, but might reveal additional performance improvements.
This study evaluated widely implemented, general-purpose algorithms rather than specialized, wetland-specific methods. Future research should evaluate advanced methods, including machine learning classification, multi-return LiDAR analysis, multi-sensor fusion, and hydrologically conditioned filtering, against the baseline established here.
5. Conclusions
This study demonstrates that processing methodology introduces substantial uncertainty in wetland channel extraction, even when high-quality UAV–LiDAR data are available. Although several workflows achieved strong inter-method statistical consensus, the externally validated optimal configuration ranked only 18th in pairwise agreement, clearly illustrating that reproducibility does not necessarily imply accuracy. Component-level analysis revealed that ground filtering exerts the strongest influence on inter-method consensus; however, validation results emphasized that preservation of subtle channel morphology may require workflow configurations that do not maximize statistical agreement.
Sink-filling and flow-routing methods exerted comparatively minor influence in this low-gradient environment.
The broader methodological implication is clear: evaluation frameworks for LiDAR-derived hydrological products must extend beyond internal agreement metrics to incorporate independent validation. Without such external checks, methodological convergence may conceal systematic bias.
The proposed dual-validation framework provides practical guidance for wetland restoration planning and establishes a transferable evaluation structure for other low-relief landscapes worldwide.
Future research should replicate this methodological framework across diverse landscape types and temporal conditions to develop comprehensive method selection guidelines based on measurable environmental attributes. Integration of multi-scale validation approaches combining field surveys and satellite imagery would strengthen ground truth quality, while systematic exploration of resolution effects and threshold sensitivity would reveal generalization boundaries for the current findings. Ultimately, advances in geospatial derivative accuracy require an understanding of how landscape characteristics, data properties, and processing methods interact to determine derivative quality.