PDE-Refined Local Fractal Dimension Prior Conditioning and Topology-Aware Refinement for Retinal Vessel Segmentation with a Swin-UNet-Style Backbone

Murgu, Lucian Alexandru; Barbu, Tudor

doi:10.3390/app16115559



by Lucian Alexandru Murgu ¹ and Tudor Barbu ²,*

¹ Doctoral School of Mathematical Sciences and Informatics of SCOSAAR, Romanian Academy, P.O. Box 15-764, 014700 Bucharest, Romania

² Institute of Computer Science, Romanian Academy—Iași Branch, 700481 Iași, Romania

* Author to whom correspondence should be addressed.

Appl. Sci. 2026, 16(11), 5559; https://doi.org/10.3390/app16115559

Submission received: 27 April 2026 / Revised: 23 May 2026 / Accepted: 28 May 2026 / Published: 2 June 2026

(This article belongs to the Special Issue Recent Progress and Challenges of Digital Health and Bioengineering, 2nd Edition)


Abstract

Retinal vessel segmentation remains challenging for thin vessels and low-contrast bifurcations. We evaluate a Swin-UNet-style model family that conditions decoder features with a single-channel local fractal dimension prior, refined by a short learnable anisotropic diffusion partial differential equation (PDE) model and injected through Spatially-Adaptive Normalization (SPADE). On Fundus Image Vessel Segmentations (FIVES), the strongest no-test-time-augmentation result was obtained by OPT-I v2 at 200 epochs, reaching Dice 0.8899, clDice 0.8517, and Area Under the ROC Curve (AUC) 0.9904, compared with 0.8643, 0.8125, and 0.9856 for the matched 200-epoch baseline. In a matched Neural Cellular Automata (NCA)/no-NCA ablation using the same seed, data, 200-epoch budget, and evaluation pipeline (with test-time augmentation applied to both runs), enabling NCA improved the test Dice from 0.8813 to 0.8907 and the test clDice from 0.8325 to 0.8518, with NCA winning on all 80 paired test images for both metrics. The results support PDE-SPADE fractal prior conditioning and NCA topology refinement as ablation-grounded improvements over the tested baseline family, while broader matched external validation requires future work.

Keywords:

retinal vessel segmentation; local fractal dimension; fractal prior; nonlinear anisotropic diffusion model; finite difference-based numerical scheme; SPADE; Swin-UNet; neural cellular automata; Skeleton Recall; topology-aware segmentation; FIVES

1. Introduction

Automated segmentation of retinal vessels from fundus images supports the extraction of vascular measurements for retinal and systemic vascular analysis [1,2,3]. Classical and deep learning pipelines have progressively improved pixel-wise overlap, from matched-filter approaches and U-Net baselines to transformer-based encoder-decoder architectures [4,5,6]. Nevertheless, thin vessels and bifurcations remain difficult to segment because they are low-contrast, spatially sparse, and geometrically self-similar across scale.

Fractal descriptors provide a principled way to expose self-similar vascular geometry to a segmentation model. The fractal feature maps framework showed that explicit fractal priors can improve tubular-structure segmentation [7], while fractal descriptors also provide a compact way to quantify retinal vascular complexity [3,8,9]. The image-level local fractal dimension estimator used in this paper follows differential box counting [10]. In parallel, Swin-based architectures provide an efficient hierarchical backbone for dense prediction [5,6]. Recent retinal-specific models such as FR-UNet [11], SGAT-Net [12], and EFDG-UNet [13] have pushed performance further, whereas careful evaluation studies have shown that protocol choices can materially influence reported rankings [14,15].

This study focuses on an architecture in which a single-channel local fractal dimension (LFD) prior is not used as a static handcrafted channel. Instead, the prior is evolved through a short learnable fractal anisotropic diffusion PDE and then injected into decoder skip features through SPADE-style spatial conditioning. The working hypothesis is that explicit geometric guidance at the reconstruction stage can improve vessel continuity and overlap under a fixed-capacity Swin-UNet-style backbone. Because the backbone capacity is fixed, the reported computational overhead should be interpreted only relative to the tested baseline family, not as a universal compactness claim against the broader retinal vessel literature.

The contributions of this study are organized in three categories.

First, at the architectural level, the paper studies a single-channel LFD -> learnable PDE -> SPADE conditioning pathway within a Swin-UNet-style decoder and a topology-oriented OPT-I extension that adds NCA-based post-decoder refinement.

Second, at the empirical level, the completed FIVES and HRF runs show that learnable SPADE/PDE-SPADE conditioning is more effective than fixed LFD gating in the available ablations, and that a controlled OPT-I v2 no-NCA ablation confirms that the NCA head independently improves Dice, clDice, and vessel connectivity-sensitive metrics under the matched 200-epoch protocol.

Third, at the engineering and reporting level, the paper quantifies the shared conditioning path overhead relative to the tested baseline (+14.4% parameters and +36.2% FLOPs), reports learned-PDE scalar values at epoch 200 for the NCA and no-NCA paired runs, and states explicit boundaries concerning missing cross-dataset evaluation, multi-seed variance, matched external reruns, qualitative panels, and NCA-specific runtime profiling.

2. Related Work

2.1. Retinal Vessel Segmentation with CNN and Transformer Backbones

U-Net established the dominant encoder–decoder paradigm for biomedical segmentation [4], and full-resolution retinal vessel models such as FR-UNet further improved thin vessel recovery [11]. Swin transformer architectures introduced hierarchical shifted-window attention [5], while Swin-UNet adapted that design to medical image segmentation [6]. SGAT-Net is also relevant because it evaluates a stimulus-guided transformer architecture on FIVES, DRIVE, STARE, and CHASEDB1 [12]. More recent retinal-specific architectures continue to refine attention, fusion, and efficiency, but the central challenge remains the same: preserving fine vascular geometry without losing global structure.

2.2. Fractal Priors for Tubular Structure Segmentation

Fractal geometry offers a natural descriptive language for tree-like biological networks. Mandelbrot’s formulation of fractal self-similarity [8] and lacunarity-based descriptors of spatial heterogeneity [9] motivate explicit geometric priors for vascular analysis, while retinal vascular fractal dimension has been linked to cardiovascular and ocular risk factors in population studies [3]. In deep learning, the fractal feature maps (FFM) framework demonstrated that pixel-level fractal priors can improve tubular structure segmentation [7].

The image-level LFD estimator used here follows differential box counting, a standard grayscale-image fractal-dimension estimator [10]. The present work adopts the same general motivation but studies a compact LFD prior paired with learnable PDE refinement, SPADE-style skip conditioning, and a topology-oriented OPT-I extension.

An image-derived grayscale LFD prior was chosen intentionally. A vessel tree or skeleton-derived fractal-dimension estimate would require a vessel mask before the prior could be computed, which would introduce circularity in a model whose purpose is precisely to predict that mask. By contrast, differential box counting can be computed directly from the normalized fundus image and therefore serves as a pre-segmentation structural cue. For the same reason, the LFD map used here should not be interpreted as a clinical retinal vascular biomarker derived from a segmented vascular tree.

2.3. PDE-Based Refinement and Spatially-Adaptive Conditioning

Nonlinear diffusion remains a classical mechanism for smoothing within structures while preserving edges [16,17,18]. Beyond classical PDE formulations, learnable or unrolled PDE-style networks have shown that diffusion dynamics can be embedded in trainable architectures, either to learn task-specific diffusion processes or to approximate evolution operators from data. Representative examples include the trainable nonlinear reaction–diffusion models for image restoration [19] and PDE-Net for learning PDE dynamics from data [20]. Within that landscape, the present work uses a substantially narrower PDE role: not full-image restoration or equation discovery, but short-horizon regularization of a single-channel fractal prior before SPADE-based skip conditioning. SPADE provides a learned spatially adaptive modulation mechanism that conditions internal features using an external map rather than only through early input concatenation [21].

2.4. Topology-Aware Supervision

Retinal vessel segmentation is not judged solely by region overlap. Topology-aware losses such as clDice and Skeleton Recall improve centerline consistency in thin tubular structures [22,23], and iterative refinement mechanisms have also been explored for segmentation error correction [24]. In this study, NCA is used as a lightweight local post-decoder refiner because it applies weight-shared local updates that are conceptually aligned with repairing short discontinuities in thin vessels. This choice should not be interpreted as evidence that NCA is uniquely optimal relative to recurrent decoders, graph-based post-processing, or other refinement strategies; rather, it was selected here as a compact iterative module within the evaluated family.

3. Materials and Methods

This section describes the network architecture, prior construction, conditioning mechanism, topology-aware refinement, loss, and evaluation protocol used in the study. For brevity, the studied family is referred to as FP-Swin-UNet throughout the paper, although the backbone is described more cautiously as Swin-UNet-style. Figure 1 provides an overview of the shared conditioning pathway and the optional NCA refinement used by OPT-I.

For readability, we use the following shorthand throughout the remainder of the paper. Baseline denotes the unconditioned Swin-UNet baseline. OPT-D denotes the core PDE-SPADE fractal prior model. OPT-F denotes the same family with Skeleton Recall enabled. OPT-H denotes the strongest full non-NCA stack. OPT-I denotes the topology-oriented variant that adds NCA refinement and the revised topology-aware training recipe. OPT-I v2 denotes the 200-epoch retraining of OPT-I. The suffix “+D4 TTA” indicates dihedral test-time augmentation applied only at inference.

3.1. Network Architecture

The segmentation backbone is a fixed-capacity hierarchical Swin-UNet-style encoder–decoder with embedding dimension 32 and depth tuple (1, 1, 1), producing a three-stage feature hierarchy with channel widths 32, 64, and 128. These settings were chosen deliberately to keep backbone capacity fixed across all variants in the ablation family. The goal of the present study is therefore methodological isolation of prior conditioning and topology-oriented refinement under a common compact backbone, not a capacity-maximized state-of-the-art search. Evaluation with deeper Swin-style backbones is left to future work. Table 1 summarizes the shared backbone used in the study.
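As a minimal sketch, the shared backbone settings can be held in one frozen configuration object that every variant reuses, which is one way to keep capacity fixed across the ablation family. The class and field names are hypothetical; the stage widths follow the usual Swin convention that each patch-merging step doubles the channel count, which reproduces the 32, 64, and 128 widths stated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackboneConfig:
    """Hypothetical container for the shared Swin-UNet-style backbone settings (Section 3.1)."""
    embed_dim: int = 32
    depths: tuple = (1, 1, 1)    # Swin blocks per encoder stage

    @property
    def stage_channels(self) -> tuple:
        # Swin-style patch merging doubles the channel width at each new stage
        return tuple(self.embed_dim * 2 ** i for i in range(len(self.depths)))

    @property
    def num_blocks(self) -> int:
        # total number of encoder Swin blocks
        return sum(self.depths)


cfg = BackboneConfig()
print(cfg.stage_channels, cfg.num_blocks)  # (32, 64, 128) 3
```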

3.2. Local Fractal Dimension Prior Construction

The model uses a single-channel local fractal dimension (LFD) map, computed by differential box counting (DBC) over box sizes s ∈ {2, 4, 8, 16} within a sliding window of size 32. The map is an image-derived fractal descriptor, not a global clinical vascular FD computed from a binarized vessel tree. In each window, DBC treats the locally normalized grayscale intensity as a surface and estimates how the accumulated box count changes with scale.

n_{ij}(s) = \left\lfloor \frac{M_{ij}(s) - m_{ij}(s)}{s / G} \right\rfloor + 1

(1)

N_{r}(s) = \sum_{i,j} n_{ij}(s)

(2)

D = -\frac{d \log N_{r}(s)}{d \log s}

(3)

Here, M_ij(s) and m_ij(s) are the maximum and minimum intensities inside box (i, j), G = 256 denotes the gray-level count, and N_r(s) is the accumulated box count at scale s. The normalized LFD map is forwarded as the single conditioning channel. Because it is computed on locally normalized image intensities rather than on a segmented vascular skeleton, the LFD map should be interpreted as a discriminative structural prior related to local fractal complexity, not as a direct clinical FD biomarker. Figure 2 illustrates the input image, vessel annotation, and corresponding raw LFD prior for a representative sample. Table 2 summarizes the prior generation and injection pipeline. Appendix A expands the intermediate quantities used in the DBC computation.

This distinction is important when reading the LFD visualization. Local intensity normalization and the finite DBC window can change the numerical direction of the signal relative to raw grayscale or binary-tree FD estimates. The network therefore uses the spatial pattern and normalized ordering of the LFD map rather than its absolute value. Absolute clinical retinal FD values should be estimated from binarized vascular trees using a separately validated biomarker pipeline.

The DBC hyperparameters were fixed at a window size of 32 and scales {2, 4, 8, 16} throughout this study so that the downstream ablation isolates the effect of conditioning rather than mixing it with changes in prior generation. This controlled choice should not be read as proof that these settings are globally optimal; sensitivity to the DBC window and scale set remains an important follow-up experiment. As noted in Section 2.2, a vessel tree or skeleton-derived fractal-dimension estimate would require a vessel mask before the prior could be computed and would therefore introduce circularity into this pre-segmentation conditioning pipeline.
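The window-level computation in Equations (1)–(3) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. It assumes window intensities scaled to [0, 1], approximates the derivative in Equation (3) by a least-squares slope over the four scales, and reads "locally normalized" as per-window min-max scaling. The stride and the final min-max normalization inside the hypothetical helper `local_fd_map` are also assumptions. The text fixes only the window size (32), the scales {2, 4, 8, 16}, and G = 256.

```python
import numpy as np

SCALES = (2, 4, 8, 16)   # box sizes s (Section 3.2)
GRAY_LEVELS = 256        # G


def dbc_dimension(window, scales=SCALES, gray_levels=GRAY_LEVELS):
    """Estimate D for one square window with intensities in [0, 1] using Eqs. (1)-(3)."""
    size = window.shape[0]
    counts = []
    for s in scales:
        k = size // s
        boxes = window[:k * s, :k * s].reshape(k, s, k, s)   # boxes[i, :, j, :] is box (i, j)
        box_max = boxes.max(axis=(1, 3))
        box_min = boxes.min(axis=(1, 3))
        n = np.floor((box_max - box_min) / (s / gray_levels)) + 1.0   # Eq. (1)
        counts.append(n.sum())                                          # Eq. (2)
    # Eq. (3): D is minus the slope of log N_r(s) against log s (least-squares fit, assumed)
    slope, _ = np.polyfit(np.log(scales), np.log(counts), 1)
    return -slope


def local_fd_map(image, window=32, stride=16):
    """Slide a window over a 2-D image and return a min-max normalised LFD map (hypothetical helper)."""
    h, w = image.shape
    rows = []
    for r in range(0, h - window + 1, stride):
        row = []
        for c in range(0, w - window + 1, stride):
            patch = image[r:r + window, c:c + window]
            # per-window min-max scaling: one possible reading of "locally normalized"
            patch = (patch - patch.min()) / (patch.max() - patch.min() + 1e-8)
            row.append(dbc_dimension(patch))
        rows.append(row)
    fd = np.array(rows)
    return (fd - fd.min()) / (fd.max() - fd.min() + 1e-8)
```

A flat window needs one box per cell at every scale, so N_r(s) falls as s⁻² and D = 2. Rougher intensity surfaces need more boxes at fine scales and push D toward 3.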

3.3. Learnable Fractal Anisotropic Diffusion and PDE-SPADE Conditioning

Let F denote the normalized raw LFD prior and X a decoder skip-level feature map. The raw DBC-derived map contains useful information about local branching complexity, but it is also affected by finite-window estimation, local contrast variation, and sparse peripheral structure. Before F is used for skip conditioning, we therefore regularize it with a short learnable anisotropic diffusion evolution. Intuitively, the diffusion term smooths unstable local fluctuations, the fidelity term prevents the evolved field from drifting too far from the input prior, and the curvature-dependent term allows the update to respond differently near sharp transitions and elongated structures. This branch is not an external preprocessing filter applied to the RGB image; it is an internal differentiable regularizer applied only to the single-channel prior. In continuous notation, the reported PDE block can be written as follows:

\frac{\partial u}{\partial t} = \alpha \, \psi_{\theta}\left( |\nabla u| \, |\nabla^{2} u| \right) \nabla \cdot \left[ \phi_{\theta}\left( |\nabla u_{\sigma}|, F \right) \nabla u \right] - \lambda \left( u - u_{0} \right), \qquad u(x, 0) = u_{0}(x) = F(x)

(4)

The fractal-modulated conductance and curvature modulation are parameterized so that the coupling to F(x) makes the diffusivity spatially dependent on local fractal complexity. Consequently, the PDE is not a uniform smoothing layer: it evolves the LFD field under a conductance that can vary across vessel-rich regions, bifurcations, thin peripheral structures, and background tissue before the decoder sees the prior. We use the following conductance and curvature forms:

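\phi_{\theta}(s, x) = \beta \exp\left( -\frac{s^{2}}{\kappa^{2} + \varepsilon} \right) \left( 1 - \omega F(x) \right), \qquad \kappa^{2} = \frac{\xi}{\eta + \varepsilon}, \qquad \psi_{\theta}(r) = \sqrt{\nu r^{3} + \gamma}

(5)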

In Equations (4) and (5), u = u(x, t) denotes the evolving prior field, u_0(x) = F(x) denotes the raw normalized LFD prior, u_σ denotes u smoothed by a Gaussian kernel of scale σ, x is the spatial position, t is the diffusion time, ∇ is the gradient operator, ∇· is the divergence operator, and ∇² denotes the Laplacian. The coefficient α controls the diffusion strength, λ anchors the evolved field to the input prior, σ sets the Gaussian smoothing scale used for robust gradients, β controls the conductance scale, ξ and η set the edge threshold through κ² = ξ/(η + ε), ε is a numerical stabilizer, ν and γ modulate the curvature-dependent term, and ω couples the conductance to the local fractal-dimension signal. The functions ϕ_θ and ψ_θ denote the learned conductance and curvature-related modulation terms. In the implementation, the scalar values are represented by unconstrained trainable variables and mapped to bounded physical ranges through sigmoid reparameterization.
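The sigmoid reparameterization of the PDE scalars can be illustrated with a short sketch. The helper names and the ranges in `BOUNDS` are hypothetical placeholders, because the actual bounds are not listed in this section.

```python
import numpy as np

# Hypothetical ranges, for illustration only.
BOUNDS = {"alpha": (0.0, 2.0), "lambda": (0.0, 1.0), "omega": (0.0, 1.0)}


def to_bounded(raw, lo, hi):
    """Map an unconstrained trainable scalar to the open interval (lo, hi) with a sigmoid."""
    return lo + (hi - lo) / (1.0 + np.exp(-raw))


def to_raw(value, lo, hi):
    """Inverse map (a scaled logit), e.g. to initialise a raw variable at a target value."""
    z = (value - lo) / (hi - lo)
    return np.log(z / (1.0 - z))


# Initialise every raw variable at the midpoint of its range, then map back.
raw = {name: to_raw(0.5 * (lo + hi), lo, hi) for name, (lo, hi) in BOUNDS.items()}
params = {name: float(to_bounded(raw[name], *BOUNDS[name])) for name in BOUNDS}
print(params)  # {'alpha': 1.0, 'lambda': 0.5, 'omega': 0.5}
```

The optimizer updates the raw variable freely, but the value that enters the PDE can never leave its interval. Bounding ω to [0, 1], for example, would keep the factor (1 − ωF(x)) in Equation (5) nonnegative for a prior normalized to [0, 1].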

This PDE-based model is used only as a short-time evolution. It is unrolled for T = 5 explicit Euler steps with Δt = 0.1, which keeps the branch interpretable as a bounded learned regularizer rather than a long-time diffusion process:

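u^{k+1} = u^{k} + \Delta t \, D_{\theta}\left( u^{k}, F \right), \quad k = 0, \dots, T - 1, \qquad \tilde{F} = u^{T}

(6)

where D_θ(u^k, F) denotes the discretized right-hand side of Equation (4) evaluated at u = u^k.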
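A NumPy sketch of the unrolled update in Equation (6), using the forms of Equations (4) and (5), is shown below. It is illustrative only. The scalar values in `PARAMS` are hypothetical, since the learned epoch-200 values are reported in Appendix B. The spatial discretization (forward-difference gradients, backward-difference divergence, Neumann borders) is an assumed choice that may differ from the finite-difference form in Appendix B. No gradients are tracked, so nothing here is trainable.

```python
import numpy as np

EPS = 1e-6

# Hypothetical scalar values (not the learned values reported in Appendix B).
PARAMS = dict(alpha=1.0, lam=0.1, sigma=1.0, beta=1.0, xi=0.05, eta=1.0,
              nu=0.01, gamma=1.0, omega=0.5)


def gaussian_smooth(u, sigma):
    """Separable Gaussian blur with reflect padding; returns u_sigma."""
    radius = max(1, int(round(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    p = np.pad(u, radius, mode="reflect")
    p = np.apply_along_axis(np.convolve, 1, p, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, p, k, mode="valid")


def forward_grad(u):
    """Forward differences, set to zero on the last column/row (Neumann border)."""
    gx = np.zeros_like(u)
    gy = np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy


def backward_div(px, py):
    """Backward-difference divergence; backward_div(*forward_grad(u)) is the Neumann 5-point Laplacian."""
    div = np.zeros_like(px)
    div[:, 0] = px[:, 0]
    div[:, 1:] = px[:, 1:] - px[:, :-1]
    div[0, :] += py[0, :]
    div[1:, :] += py[1:, :] - py[:-1, :]
    return div


def pde_rhs(u, F, p):
    """D_theta(u, F): right-hand side of Eq. (4) with the forms of Eq. (5)."""
    sx, sy = forward_grad(gaussian_smooth(u, p["sigma"]))
    s = np.hypot(sx, sy)                                    # |grad u_sigma|
    kappa2 = p["xi"] / (p["eta"] + EPS)
    phi = p["beta"] * np.exp(-s ** 2 / (kappa2 + EPS)) * (1.0 - p["omega"] * F)
    gx, gy = forward_grad(u)
    div_term = backward_div(phi * gx, phi * gy)             # div[phi * grad u]
    r = np.hypot(gx, gy) * np.abs(backward_div(gx, gy))     # |grad u| * |Laplacian u|
    psi = np.sqrt(p["nu"] * r ** 3 + p["gamma"])
    return p["alpha"] * psi * div_term - p["lam"] * (u - F)


def refine_prior(F, p=PARAMS, T=5, dt=0.1):
    """Eq. (6): T explicit Euler steps of size dt starting from u^0 = F."""
    u = F.copy()
    for _ in range(T):
        u = u + dt * pde_rhs(u, F, p)
    return u
```

With these placeholder values and a prior in [0, 1], each Euler step is a convex combination of neighboring values and the prior, so the refined field stays within the range of F. Larger α, ν, or Δt can break this property and destabilize the explicit scheme, which illustrates why short unrolling and bounded scalars matter.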

The refined prior F̃ is then used as the conditioning signal for SPADE-style spatial modulation. For skip feature X, instance normalization is applied before the learned spatial scale and bias maps generated from F̃. Instance normalization was chosen because the reported training uses batch_size = 1 patch sampling, for which batch normalization statistics would be unreliable. The SPADE-style modulation is written as

\hat{X}_{b,c,i,j} = \frac{X_{b,c,i,j} - \mu_{c}(X)}{\sigma_{c}(X) + \varepsilon}, \qquad Y = \left( 1 + S(\tilde{F}) \right) \odot \hat{X} + B(\tilde{F})

(7)
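Equation (7) can be sketched in NumPy as follows. In SPADE, S and B are produced by small convolutional networks applied to the conditioning map. Here they are replaced by hypothetical per-pixel linear heads (`w_s`, `b_s`, `w_b`, `b_b`) so the example stays self-contained, and the refined prior is assumed to be already resized to the skip resolution.

```python
import numpy as np


def instance_norm(x, eps=1e-5):
    """Normalise each (sample, channel) slice of x (B, C, H, W) over its spatial dims."""
    mu = x.mean(axis=(2, 3), keepdims=True)
    sd = x.std(axis=(2, 3), keepdims=True)
    return (x - mu) / (sd + eps)


def spade_modulate(x, prior, w_s, b_s, w_b, b_b, eps=1e-5):
    """Y = (1 + S(F~)) * X_hat + B(F~), Eq. (7), with hypothetical per-pixel linear heads.

    x:        skip features, shape (B, C, H, W)
    prior:    refined prior F~, shape (B, 1, H, W), already at the skip resolution
    w_*, b_*: arrays of shape (C,) standing in for the learned S and B networks
    """
    def per_channel(v):
        return np.asarray(v).reshape(1, -1, 1, 1)

    scale = per_channel(w_s) * prior + per_channel(b_s)   # S(F~): one map per channel
    bias = per_channel(w_b) * prior + per_channel(b_b)    # B(F~)
    return (1.0 + scale) * instance_norm(x, eps) + bias
```

The statistics are computed per sample and per channel, so the normalization behaves identically at batch size 1. This matches the stated reason for preferring instance normalization over batch normalization.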

The full conditioning path is therefore raw LFD prior -> learnable PDE refinement -> SPADE-style skip modulation -> decoder fusion. The short unrolling, bounded parameterization, fidelity anchoring, and subsequent normalization keep the branch as a bounded learned prior regularizer rather than a long-time diffusion process. We do not claim formal continuous-to-discrete well-posedness [17,18] or fractal dimension preservation for this learned module; its role is empirical and architectural. The completed paired NCA/no-NCA runs log scalar-parameter trajectories for monitoring purposes, and the epoch-200 values are reported in Appendix B. Appendix B gives the finite difference reporting form and the parameter table used to document the PDE block.

For the topology-oriented OPT-I variant, the coarse decoder prediction is followed by an NCA refinement stage. This stage applies learned local updates to the predicted vessel map with the goal of repairing thin-vessel discontinuities and improving centerline consistency. In the revised experiment package, the NCA contribution is isolated by a paired 200-epoch OPT-I v2 ablation in which the NCA-enabled and NCA-disabled models use the same code, seed (42), FIVES data, training budget, and evaluation pipeline; only model.nca.enabled is changed. This paired ablation is reported in Section 4.2.
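This section does not detail the NCA update rule, so the sketch below follows a common generic NCA design rather than the authors' module. Each cell perceives its 3 × 3 neighborhood through identity and Sobel filters, a per-cell two-layer MLP proposes a residual update, and a random subset of cells fires at each step. The channel count, step count, fire rate, and weights (`w1`, `b1`, `w2`) are hypothetical. Channel 0 carries the vessel logit and the remaining channels are hidden state.

```python
import numpy as np

SOBEL_X = np.array([[-1.0, 0.0, 1.0], [-2.0, 0.0, 2.0], [-1.0, 0.0, 1.0]]) / 8.0
SOBEL_Y = SOBEL_X.T


def conv3x3(x, k):
    """Zero-padded 3x3 cross-correlation applied independently to each channel of x (C, H, W)."""
    h, w = x.shape[1:]
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * p[:, i:i + h, j:j + w]
    return out


def nca_step(state, w1, b1, w2, rng, fire_rate=0.5):
    """One generic NCA update: perceive, per-cell MLP, stochastic residual update."""
    perception = np.concatenate([state, conv3x3(state, SOBEL_X), conv3x3(state, SOBEL_Y)])
    hidden = np.maximum(0.0, np.einsum("kc,chw->khw", w1, perception) + b1[:, None, None])
    delta = np.einsum("ck,khw->chw", w2, hidden)
    fire = rng.random(state.shape[1:]) < fire_rate     # cells updated this step
    return state + delta * fire


def nca_refine(prob, w1, b1, w2, steps=8, seed=0):
    """Refine an (H, W) vessel probability map with a hypothetical NCA head."""
    n_channels = w2.shape[0]
    p = np.clip(prob, 1e-6, 1.0 - 1e-6)
    state = np.zeros((n_channels,) + prob.shape)
    state[0] = np.log(p / (1.0 - p))                     # vessel logit in channel 0
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        state = nca_step(state, w1, b1, w2, rng)
    return 1.0 / (1.0 + np.exp(-state[0]))
```

The same small weights are applied at every pixel, and each step mixes information only from immediate neighbors. A few iterations can therefore propagate evidence along a vessel across short gaps, which is the intuition behind using NCA for discontinuity repair.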

3.4. Training Objective

The training objective combines region-overlap, pixel-wise, hard-example, and topology-aware supervision. In compact form, the optimized objective is

L_{\mathrm{total}} = L_{\mathrm{Dice}} + L_{\mathrm{BCE}} + 0.1 \, L_{\mathrm{Focal}} + 0.25 \, L_{\mathrm{clDice}}

(8)

where L_Dice and L_BCE provide the primary overlap and discrimination terms, focal loss [25] emphasizes hard pixels, and clDice [22] promotes centerline continuity. The displayed objective corresponds to the core OPT-D configuration. The focal and clDice weights were set heuristically from preliminary development rather than by exhaustive grid search.
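The composite objective in Equation (8) can be sketched for a single probability map as follows. The weights 0.1 and 0.25 come from Equation (8). Everything else is an assumption: the focal exponent (named `focal_gamma` here to avoid confusion with γ in Equation (5)), the smoothing constants, and the soft-skeleton iteration count. The clDice term follows the soft-skeleton formulation of clDice [22], with soft erosion and dilation implemented as min and max pooling.

```python
import numpy as np

CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]
SQUARE = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1)]


def _neighbours(x, offsets, pad_value):
    """Stack shifted copies of a 2-D map x for the given 3x3 neighbourhood offsets."""
    h, w = x.shape
    p = np.pad(x, 1, constant_values=pad_value)
    return np.stack([p[1 + di:1 + di + h, 1 + dj:1 + dj + w] for di, dj in offsets])


def soft_erode(x):
    return _neighbours(x, CROSS, np.inf).min(axis=0)      # cross-shaped min pooling


def soft_dilate(x):
    return _neighbours(x, SQUARE, -np.inf).max(axis=0)    # 3x3 max pooling


def soft_skeleton(x, iters=10):
    """Iterative soft skeletonisation used by soft clDice."""
    skel = np.maximum(x - soft_dilate(soft_erode(x)), 0.0)
    for _ in range(iters):
        x = soft_erode(x)
        delta = np.maximum(x - soft_dilate(soft_erode(x)), 0.0)
        skel = skel + np.maximum(delta - skel * delta, 0.0)
    return skel


def soft_dice_loss(pred, target, smooth=1.0):
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + smooth) / (pred.sum() + target.sum() + smooth)


def soft_cldice_loss(pred, target, smooth=1.0):
    sp, st = soft_skeleton(pred), soft_skeleton(target)
    tprec = ((sp * target).sum() + smooth) / (sp.sum() + smooth)   # topology precision
    tsens = ((st * pred).sum() + smooth) / (st.sum() + smooth)     # topology sensitivity
    return 1.0 - 2.0 * tprec * tsens / (tprec + tsens)


def bce_and_focal(pred, target, focal_gamma=2.0, eps=1e-7):
    p = np.clip(pred, eps, 1.0 - eps)
    pt = np.where(target > 0.5, p, 1.0 - p)       # probability assigned to the true class
    ce = -np.log(pt)
    return ce.mean(), ((1.0 - pt) ** focal_gamma * ce).mean()


def total_loss(pred, target):
    """Eq. (8): Dice + BCE + 0.1 * focal + 0.25 * clDice for one probability map."""
    bce, focal = bce_and_focal(pred, target)
    return (soft_dice_loss(pred, target) + bce + 0.1 * focal
            + 0.25 * soft_cldice_loss(pred, target))
```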

Later variants additionally enable Skeleton Recall [23], fractal BCE regularization, Hessian cues, and, for OPT-I/OPT-I v2, NCA topology refinement with increased topology-aware emphasis. Table 3 summarizes the base loss terms and later topology/complexity extensions. Exact per-variant loss coefficients beyond the verified core OPT-D values should be read from the resolved configuration files when those files are included with the public code release.

3.5. Datasets, Experimental Protocol, and Reproducibility

3.5.1. Datasets

Experiments are reported on FIVES [26] and HRF [2]. FIVES contains 800 high-resolution multi-disease fundus photographs with pixelwise manual annotations and a recommended 600/200 train-test split, and serves as the primary benchmark. High-Resolution Fundus (HRF), which contains 45 high-resolution images, is used for ablation analysis. Table 4 summarizes the datasets considered in this study.

3.5.2. Configuration Details

Table 5 summarizes the configuration details that could be verified directly from saved configuration files, checkpoints, profiler outputs, evaluation logs, and the completed OPT-I v2 paired NCA-ablation runs. The completed FIVES evaluation includes the baseline, OPT-D, OPT-F, OPT-H, OPT-I, OPT-I v2, and OPT-I v2 + D4 TTA, as well as the controlled OPT-I v2 NCA/no-NCA pair. OPT-D corresponds to the core PDE-SPADE configuration described in Sections 3.1–3.4, OPT-F follows the same family with Skeleton Recall enabled, OPT-H denotes the strongest full non-NCA 200-epoch variant, OPT-I is a 100-epoch topology-oriented variant, and OPT-I v2 is the 200-epoch topology-oriented NCA variant. D4 TTA denotes optional dihedral group test-time augmentation applied only at inference. The controlled no-NCA ablation uses the same 200-epoch budget and seed 42 as the NCA-enabled revision run, with only model.nca.enabled flipped. The HRF ablation covers configurations A–H.

3.5.3. Reproducibility and Reporting Boundaries

Dice, clDice, and AUC are the quantitative metrics emphasized in the Results section. Threshold-dependent metrics were selected on validation outputs only and then applied once to the test set. For older archived runs, the saved package records the selected τ* values but not the full sweep grid; for the controlled NCA/no-NCA pair, thresholds were selected independently on the validation split (τ* = 0.70 with NCA and τ* = 0.65 without NCA). Test images were not used for threshold selection. A deterministic reproduction check matched the OPT-D FIVES metrics to 10 decimal places. For the NCA isolation experiment, per-image test predictions were materialized for both paired runs (80 images each), enabling paired bootstrap confidence intervals and one-sided Wilcoxon signed-rank tests. More details are provided in Appendix C. The available package still does not include finalized qualitative figure panels, inference-time measurements, peak-VRAM logs, or multi-seed summaries. Generative AI assistance was limited to language editing, structural revision, consistency checking, document formatting, and figure layout preparation after the experiments and saved evaluations had already been completed. No GenAI tool was used to generate data, run model training or evaluation, select checkpoints, or determine the reported scientific conclusions.
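The threshold protocol and the paired statistics can be sketched as follows. The candidate grid, the `>=` comparison, the bootstrap size, and the function names are assumptions, since the archived runs do not record the full sweep grid. The one property the sketch enforces is the one stated above: τ* is chosen on validation data and then applied unchanged to the test set.

```python
import numpy as np

THRESHOLD_GRID = np.round(np.arange(0.05, 1.0, 0.05), 2)   # hypothetical candidate grid


def dice_at(prob, mask, tau, eps=1e-8):
    """Hard Dice of the thresholded probability map against a binary mask."""
    pred = prob >= tau
    inter = np.logical_and(pred, mask).sum()
    return (2.0 * inter + eps) / (pred.sum() + mask.sum() + eps)


def select_threshold(val_probs, val_masks, grid=THRESHOLD_GRID):
    """tau* maximises mean validation Dice; test images never enter this function."""
    scores = [np.mean([dice_at(p, m, t) for p, m in zip(val_probs, val_masks)]) for t in grid]
    return float(grid[int(np.argmax(scores))])


def paired_bootstrap_ci(a, b, n_boot=10000, alpha=0.05, seed=0):
    """Mean paired difference a - b, its percentile bootstrap CI, and the number of wins for a."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.integers(0, diff.size, size=(n_boot, diff.size))   # resample images, keep pairs
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return diff.mean(), lo, hi, int((diff > 0).sum())
```

For the one-sided Wilcoxon signed-rank test, `scipy.stats.wilcoxon(a, b, alternative="greater")` can be applied to the same paired per-image arrays.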

4. Results

4.1. Main Quantitative Results on FIVES

Table 6 reports the primary no-TTA FIVES comparison for the matched Swin-UNet baseline and five FP-Swin-UNet variants. The D4-TTA result and the paired NCA-isolation experiment are reported separately, because they answer different questions: Table 6 compares training variants under the archived no-TTA protocol, while Section 4.2 isolates whether the NCA head itself adds value when all other conditions are matched.

Three patterns stand out. First, within this fixed-capacity family, all no-TTA FP-Swin-UNet variants outperform the 200-epoch baseline in Dice, clDice, and AUC, indicating that the added conditioning and topology-oriented recipe is beneficial under the tested setup. This should not be interpreted as pure training efficiency, because the architecture and loss formulation also changed across variants. Second, among the no-TTA models, OPT-I v2 is the strongest completed aggregate result, reaching Dice 0.8899, clDice 0.8517, and AUC 0.9904. Third, the earlier 100-epoch OPT-I run already improves clDice to 0.8419, which is consistent with the intended emphasis on centerline continuity. The recorded validation-selected thresholds were τ* = 0.55 for the baseline and OPT-D and τ* = 0.65 for OPT-H, OPT-I, and the archived OPT-I v2 family.

4.2. Controlled NCA-Isolation Ablation

To isolate the NCA refinement head, we trained and evaluated a paired OPT-I v2 ablation using the same code base, seed 42, FIVES data, 200-epoch budget, and full-image tiled evaluation pipeline. The only architectural switch was model.nca.enabled. Both models were evaluated with D4 TTA and validation-selected thresholds τ*, so the comparison directly tests whether the NCA head improves the otherwise matched topology-oriented recipe.

The controlled ablation described in Table 7 shows that disabling NCA reduces test Dice from 0.8907 to 0.8813 and test clDice from 0.8518 to 0.8325. The clDice gain (+1.93 pp) is approximately twice the Dice gain (+0.94 pp), which is consistent with the intended role of NCA as a local connectivity-refinement module rather than only an overlap optimizer.

NCA wins on all 80 paired test images for both Dice @ τ* and clDice @ τ*, as shown by the results in Table 8. The confidence intervals are comfortably separated from zero, so this ablation removes the earlier ambiguity between NCA refinement and the rest of the topology-oriented loss recipe for this controlled configuration.

As shown in Table 9, the exploratory pathology-stratified analysis suggests that NCA is most useful in diseased eyes. Diabetic retinopathy cases show the largest clDice gain (+2.30 pp), while normal eyes show the smallest gain (+0.83 pp). This is consistent with the hypothesis that iterative refinement helps most where the vasculature is more disrupted, tortuous, or fragmented, but the subgroup results should be interpreted as supportive rather than definitive clinical evidence.

4.3. Test-Time Augmentation

Table 10 reports the optional D4 test-time augmentation reference result for OPT-I v2 from the archived completed evaluation. This row is separated from the primary no-TTA comparison because TTA is an inference-time enhancement and the matched baseline was not re-evaluated under the same procedure. By contrast, the controlled NCA isolation ablation in Section 4.2 uses the same full-image/D4-TTA/validation threshold pipeline for both NCA-enabled and NCA-disabled runs, so TTA is not a confound within that paired comparison.
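D4 test-time augmentation averages predictions over the eight symmetries of the square: four rotations, each with and without a flip. A minimal sketch follows, assuming a generic `predict` callable that returns an (H, W) probability map. Combining predictions by a plain mean of probabilities before thresholding is also an assumption.

```python
import numpy as np


def d4_tta(predict, image):
    """Average predict() over the 8 elements of the dihedral group D4.

    predict: callable mapping an (H, W[, C]) array to an (H, W) probability map
    """
    outputs = []
    for flip in (False, True):
        x = image[:, ::-1] if flip else image
        for k in range(4):
            y = predict(np.rot90(x, k, axes=(0, 1)))
            y = np.rot90(y, -k, axes=(0, 1))              # undo the rotation
            outputs.append(y[:, ::-1] if flip else y)     # undo the flip
    return np.mean(outputs, axis=0)
```

Each prediction is mapped back to the original orientation before averaging, so a model that is already equivariant to these transforms simply returns its plain prediction.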

4.4. Contextual Comparison with Published FIVES Results

To place the internal ablation results in a broader context, Table 11 summarizes published FIVES results from the 2024 benchmark by Fadugba et al. [15]. These rows are contextual rather than directly matched because preprocessing, resizing, loss functions, thresholding, and implementation details differ across studies, and retinal vessel rankings are known to be sensitive to protocol choices [14]. The primary internally matched comparison within the present manuscript is therefore Table 6, which contains no-TTA variants evaluated under the same archived pipeline.

The contextual comparison shows that the strongest benchmarked FIVES Dice values available from the 2024 shared protocol study are 0.9037 for FR-UNet and 0.9015 for U-Net under DiceBCE training [15]. Our strongest completed no-TTA model, OPT-I v2, reaches Dice 0.8899 under a different training and evaluation pipeline. Accordingly, the present revision does not claim state-of-the-art FIVES performance; it claims improvement over the tested Swin-UNet-style baseline and reports broader literature numbers for context only. SGAT-Net is cited because it evaluates FIVES, DRIVE, STARE, and CHASEDB1 [12], but its exact FIVES table values should not be reproduced unless verified directly from the full article’s table. EFDG-UNet is discussed in related work but is not included in the direct FIVES table because the published EFDG-UNet study reports DRIVE, CHASE_DB1, and STARE rather than FIVES [13].

4.5. HRF Ablation

Table 12 summarizes the complete HRF 100-epoch ablation. The sequence A->H covers the baseline, LFD gate, SPADE v1, PDE-SPADE v2, Hessian, Skeleton Recall, fractal BCE, and full non-NCA stack settings. Configurations F, G, and H tie for the highest Dice (0.8290), while G and H tie for the highest AUC (0.9851); D already captures most of the gain over the baseline. Values in parentheses show the relative change compared with configuration A.

The HRF ablation indicates that gating alone does not improve performance relative to the baseline, SPADE provides the first clear gain, and PDE-SPADE yields an additional but modest improvement. Later structural refinements provide smaller increments. Because some numerical differences are small, and the available package does not include variance estimates, these HRF deltas should be interpreted as directional ablation evidence rather than as proof of statistically robust ranking among the later variants.

4.6. Computational Complexity

Table 13 reports parameter counts and FLOPs measured on 384 × 384 input for the shared PDE-SPADE prior-conditioning backbone. Relative to the baseline, this pathway adds 76,889 parameters (+14.4%) and 12.0 G FLOPs (+36.2%). These values should be interpreted only as the overhead of the shared conditioning pathway. The controlled NCA ablation establishes the accuracy contribution of the NCA head, but NCA-specific latency, peak memory, and FLOP measurements were not finalized in the available package, so deployment-level efficiency claims remain bounded.

The computational overhead of the shared conditioning pathway is modest relative to the tested baseline, and the paired ablation now shows that NCA improves the OPT-I v2 topology-oriented result. A complete efficiency claim nevertheless requires profiling the NCA refinement head separately and comparing against external methods under a matched protocol.

5. Discussion

5.1. Interpretation of the Main Result

The main empirical findings are fourfold. First, SPADE conditioning establishes the first clear gain over gating alone, and PDE-SPADE strengthens that gain. Second, the later full non-NCA OPT-H variant remains a strong 200-epoch reference point. Third, the topology-oriented OPT-I and OPT-I v2 variants produce the largest clDice gains within the tested family, with OPT-I v2 becoming the strongest completed no-TTA model on FIVES. Fourth, the controlled NCA/no-NCA paired ablation isolates the NCA head and shows that enabling it improves test Dice by +0.94 pp and test clDice by +1.93 pp under the matched 200-epoch protocol, with NCA winning on all 80 paired test images for both Dice @ τ* and clDice @ τ*. These findings support the hypothesis that geometric guidance is most useful when it modulates decoder features and when topology-aware components target centerline continuity. However, the current evidence still does not prove superiority to all published FIVES-capable methods under a matched external protocol.

5.2. Why PDE-SPADE Conditioning Helps

The fractal and PDE components play complementary roles. DBC exposes local self-similar structure, but the resulting LFD map is window-dependent and can contain local fluctuations caused by illumination variation or sparse vessel pixels. The learnable anisotropic diffusion PDE regularizes this fractal complexity field while preserving coherent transitions, so the decoder receives a smoother but still spatially informative prior. In this sense, the model does not use fractal dimension as a static handcrafted channel; it learns how the fractal prior should evolve before it is used for spatial conditioning.

This use of fractal geometry differs from post hoc biomarker extraction. The LFD map is generated before segmentation and interacts with decoder features during prediction. It summarizes local scale-dependent intensity structure—large trunks, intermediate branches, bifurcations, and sparse peripheral vessels—while the PDE regularizes this field before SPADE uses it as a spatial conditioning signal. In this sense, fractal geometry functions as an internal inductive bias rather than only as a measurement applied after the vessel mask has been produced.

SPADE then injects the refined prior at decoder skip connections, where anatomical detail is reconstructed [21]. The HRF ablation suggests that this combination is more effective than a gating-only mechanism, while the later E–H variants indicate that Hessian cues and additional structure-aware losses offer smaller refinements on top of the main PDE-SPADE effect. The paired NCA ablation further indicates that, when the rest of the topology-oriented recipe is held fixed, NCA refinement disproportionately improves clDice and paired connectivity-sensitive metrics relative to Dice alone.

5.3. Limitations

Several limitations remain. First, the quantitative evaluation covers a completed FIVES run family, a complete HRF ablation, and a controlled NCA/no-NCA paired ablation, but broader validation on DRIVE, CHASE_DB1, STARE, or under formal cross-dataset protocols is still needed. Second, the manuscript provides contextual FIVES comparison values from the 2024 benchmark, but not matched reruns of FR-UNet, U-Net, MA-Net, SA-UNet, W-Net, or SGAT-Net with the present preprocessing and threshold-selection pipeline. Third, the NCA ablation includes per-image paired tests, but multi-seed runs are not yet available, so the statistical evidence is paired within a single seed rather than across seeds. Fourth, although per-image predictions were materialized for the paired NCA/no-NCA analysis, finalized qualitative panels and error maps are not included in the current manuscript, which limits visual interpretation of vessel continuity repair. Fifth, NCA-specific latency, peak VRAM, and FLOP measurements were not finalized, so deployment-level efficiency claims are limited to the shared-backbone profile in Table 13. Sixth, the DBC window size and scale set were fixed throughout the study, and no ω = 0 or non-fractal-conductance ablation was completed; the specific contribution of fractal conductance modulation has therefore not been fully isolated. Finally, the LFD prior should not be equated with physiological global FD biomarkers: it is a local, normalized, image-intensity descriptor used for conditioning, whereas clinical FD biomarkers are normally computed from segmented or skeletonized vascular trees.

6. Conclusions and Future Work

In summary, the FP-Swin-UNet family combines a single-channel local fractal dimension prior with PDE-SPADE skip conditioning in a fixed-capacity Swin-UNet-style backbone, with the OPT-I branch adding topology-oriented NCA refinement. On FIVES, the strongest completed no-TTA result was obtained by OPT-I v2 at 200 epochs (Dice 0.8899, clDice 0.8517, AUC 0.9904), while optional D4 TTA further improved the archived reference run at inference. The controlled NCA isolation ablation shows that, with the same code, seed, data, 200-epoch budget, and evaluation pipeline, enabling NCA improves test Dice from 0.8813 to 0.8907 and test clDice from 0.8325 to 0.8518. NCA wins on all 80 paired test images for Dice @ τ* and clDice @ τ*, with one-sided Wilcoxon p < 1 × 10⁻⁴ and bootstrap confidence intervals excluding zero. Together with the HRF ablation, these findings support explicit fractal prior conditioning and topology-aware refinement as practical directions for retinal vessel segmentation, while broader external validation, qualitative evidence, matched reruns, and multi-seed reporting remain necessary before stronger comparative claims can be made.

Future work should add qualitative error panels and failure cases from the materialized per-image predictions, measure runtime and memory under a fixed hardware budget, profile the NCA head separately, and extend the current paired evaluation to multi-seed runs. DRIVE, CHASE_DB1, and STARE evaluations should be added to assess cross-dataset generalization. A dedicated non-fractal conductance or ω = 0 ablation would also help isolate the role of the fractal modulation in the PDE branch. Other anisotropic diffusion schemes will also be considered [18,27]. Richer fractal priors should be revisited once they are supported by complete experiments. Eye-based biometrics is a potential application domain for this retinal vessel segmentation framework [28]. Public release of code, checkpoints, resolved configurations, manifests, materialized predictions, and evaluation scripts in an archival repository will further strengthen reproducibility and facilitate independent comparison.

Author Contributions

Conceptualization, L.A.M. and T.B.; methodology, L.A.M.; software, L.A.M.; validation, L.A.M.; formal analysis, L.A.M.; investigation, L.A.M.; writing—original draft preparation, L.A.M.; writing—review and editing, L.A.M. and T.B.; supervision, T.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The FIVES dataset is publicly available from the Scientific Data publication and the accompanying Figshare release [26]. The HRF dataset is publicly available at https://www5.cs.fau.de/research/data/fundus-images (URL accessed on 29 May 2026) [2]. The paired NCA/no-NCA ablation produced materialized test predictions and per-image metrics; these artifacts, together with code, resolved configurations, checkpoints, and evaluation scripts, are available from the corresponding author on reasonable request and are intended for deposition in an archival repository with the final submission package.

Acknowledgments

During the preparation of this manuscript, the authors used ChatGPT (OpenAI, GPT-5.5 Pro; accessed May 2026) to assist with language editing, structural revision, consistency checking, document formatting, and figure layout preparation. All AI-assisted outputs were reviewed and edited by the authors, who take full responsibility for the content of the manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Differential Box-Counting Prior Construction

The single-channel local fractal dimension prior is computed by differential box counting. Let I(x,y) denote the normalized intensity image. For each sliding window of size 32 and each box size s ∈ {2, 4, 8, 16}, the local box counts are accumulated from local extrema and then regressed in log–log space according to the reporting form below.

n_{r}(i, j, s) = \left\lfloor \frac{M_{ij} - m_{ij}}{s/G} \right\rfloor + 1

(A1)

N_{r}(s) = \sum_{i, j} n_{r}(i, j, s)

(A2)

D = -\frac{d \log N_{r}(s)}{d \log s}

(A3)

Here, M_ij and m_ij denote the local maximum and minimum intensities within box (i, j), ⌊·⌋ denotes integer flooring of the bracketed ratio, G = 256 is the gray-level count, N_r(s) is the accumulated box count at scale s, and D is the local fractal dimension estimate obtained from the log–log slope relation.

Two practical choices determine the interpretation of this prior. First, DBC is evaluated on the intensity surface rather than on a binary vessel mask. Second, normalization makes the map a relative structural cue within each image/tile. For this reason, the LFD values used by the network are treated as normalized conditioning signals; they should not be compared directly with clinical retinal FD measurements unless a separate vessel-tree FD pipeline is applied.
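The DBC steps in (A1)–(A3) can be sketched for a single window as follows. This is a minimal pure-Python illustration, not the authors' implementation: it assumes a square 32 × 32 window with intensities normalized to [0, 1], non-overlapping s × s boxes that tile the window, and an ordinary least-squares fit of log N_r(s) against log s. The function names `box_count` and `local_fd` are hypothetical, and the sliding-window stride and per-image normalization of the full LFD map are left out.

```python
import math
import random
from typing import List, Sequence

Window = List[List[float]]


def box_count(window: Window, s: int, gray_levels: int = 256) -> int:
    """N_r(s) from (A1)-(A2): sum of n_r over the non-overlapping s x s boxes of the window."""
    size = len(window)
    box_height = s / gray_levels  # the s/G term of (A1), on a [0, 1] intensity scale
    total = 0
    for top in range(0, size, s):
        for left in range(0, size, s):
            values = [window[y][x] for y in range(top, top + s) for x in range(left, left + s)]
            total += math.floor((max(values) - min(values)) / box_height) + 1
    return total


def local_fd(window: Window, box_sizes: Sequence[int] = (2, 4, 8, 16), gray_levels: int = 256) -> float:
    """D from (A3): negated least-squares slope of log N_r(s) against log s."""
    xs = [math.log(s) for s in box_sizes]
    ys = [math.log(box_count(window, s, gray_levels)) for s in box_sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return -num / den


# Usage: a flat 32 x 32 window versus a seeded random-noise window
flat = [[0.5] * 32 for _ in range(32)]
rng = random.Random(0)
noisy = [[rng.random() for _ in range(32)] for _ in range(32)]
print(round(local_fd(flat), 6))  # 2.0
print(local_fd(noisy) > 2.0)     # True
```

A flat window contributes exactly one box per column at every scale, so N_r(s) scales as (32/s)² and D comes out at 2; rougher intensity surfaces push D above 2, which is the local variation the prior is meant to expose.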

Appendix B. Learnable Fractal Anisotropic Diffusion and PDE-SPADE Implementation

The FractalAnisotropicDiffusion block is a short, unrolled PDE-based module applied to the LFD prior. Its continuous form follows Equation (4), with the conductance and curvature terms given by Equation (5). In implementation, the continuous update is discretized by finite differences and trained end-to-end through the segmentation loss.

For readability, the discrete update can be summarized as

u^{k+1}(i, j) = u^{k}(i, j) + \Delta t \left[ D_{E} + D_{W} + D_{N} + D_{S} - \lambda \left( u^{k}(i, j) - u^{0}(i, j) \right) + \psi^{k}(i, j) \right]

Here, D_E, D_W, D_N, and D_S are the east, west, north, and south finite-difference diffusion contributions. Each is formed with a directional conductance that is induced by the learned scalar parameters and the local image/prior structure, not by dense learned parameter maps. The update is applied for T = 5 steps with Δt = 0.1, and the final field u^{T} is denoted by \tilde{F} in the main text. The learnable parameters of this nonlinear anisotropic diffusion model are listed in Table A1, and their learned values are reported in Table A2.
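A minimal sketch of the unrolled explicit update, for a single-channel prior stored as a list of lists. Equation (5) is not reproduced in this appendix, so the conductance below is an assumed Perona–Malik-type edge-stopping function g(d) = β / (1 + d²/κ²) [16], with κ² = ξ/(η + ε) taken from Table A1; the curvature term ψ^k, the Gaussian pre-smoothing σ, and the fractal coupling ω are omitted. Function names and default values are illustrative, not the study's learned values.

```python
from typing import List

Grid = List[List[float]]


def diffusion_step(u: Grid, u0: Grid, dt: float = 0.1, lam: float = 0.04,
                   beta: float = 1.0, xi: float = 3.5, eta: float = 0.5,
                   eps: float = 1e-6) -> Grid:
    """One explicit Euler step with an ASSUMED conductance; psi^k is set to zero."""
    kappa2 = xi / (eta + eps)  # edge threshold kappa^2 = xi / (eta + eps), as in Table A1
    h, w = len(u), len(u[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            centre = u[i][j]
            # East, west, north and south neighbours; clamping the index gives zero flux at borders
            neighbours = ((i, min(j + 1, w - 1)), (i, max(j - 1, 0)),
                          (max(i - 1, 0), j), (min(i + 1, h - 1), j))
            flux = 0.0
            for ni, nj in neighbours:
                diff = u[ni][nj] - centre
                conductance = beta / (1.0 + diff * diff / kappa2)  # assumed edge-stopping function
                flux += conductance * diff  # one directional contribution D_E, D_W, D_N or D_S
            # Fidelity term -lambda * (u^k - u^0) anchors the evolved prior to the raw prior
            out[i][j] = centre + dt * (flux - lam * (centre - u0[i][j]))
    return out


def refine_prior(u0: Grid, steps: int = 5, **params) -> Grid:
    """Unroll T = steps updates starting from the raw prior u^0 (dt defaults to 0.1)."""
    u = [row[:] for row in u0]
    for _ in range(steps):
        u = diffusion_step(u, u0, **params)
    return u


# Usage: an isolated peak is spread to its neighbours and pulled back towards u^0
spike = [[0.0] * 5 for _ in range(5)]
spike[2][2] = 1.0
refined = refine_prior(spike)
```

With this assumed conductance, which never exceeds β, each step is a convex combination of neighbouring values and u^0 whenever Δt(4β + λ) ≤ 1, so the refined prior stays within the range of its input; a different conductance form would need its own step-size check.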

Table A1. Learnable parameters of the fractal anisotropic diffusion model.

Parameter	Role in the PDE Block	Effective Range/Constraint
α	Overall diffusion strength	(0.1, 2.0)
λ	Fidelity weight anchoring the evolved prior to u0	(0.01, 1.0)
σ	Gaussian smoothing scale for robust gradients	(0.5, 5.0)
β	Conductance scale controlling maximum diffusion	(0.1, 10.0)
ξ	Edge-threshold numerator in κ² = ξ/(η + ε)	(0.0, 5.0)
η	Edge-threshold denominator in κ² = ξ/(η + ε)	(0.0, 1.0)
ν	Curvature modulation strength	(0.0, 1.0)
γ	Curvature offset/gain	(0.1, 5.0)
ω	Coupling between LFD prior and diffusion conductance	(0.0, 1.0)
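Table 2 describes the nine scalars as sigmoid-bounded. One common way to realize the open ranges in Table A1 is an affine-sigmoid map from an unconstrained raw parameter; the sketch below assumes that form (the exact parameterization in the authors' code is not given here), and the helper names `bounded` and `raw_for` are hypothetical.

```python
import math

# Ranges copied from Table A1
BOUNDS = {
    "alpha": (0.1, 2.0),
    "lambda": (0.01, 1.0),
    "sigma": (0.5, 5.0),
    "beta": (0.1, 10.0),
    "xi": (0.0, 5.0),
    "eta": (0.0, 1.0),
    "nu": (0.0, 1.0),
    "gamma": (0.1, 5.0),
    "omega": (0.0, 1.0),
}


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def bounded(raw: float, lo: float, hi: float) -> float:
    """Map an unconstrained raw parameter into the open interval (lo, hi)."""
    return lo + (hi - lo) * sigmoid(raw)


def raw_for(value: float, lo: float, hi: float) -> float:
    """Inverse map (logit), e.g. to initialise a raw parameter at a chosen value."""
    p = (value - lo) / (hi - lo)
    return math.log(p / (1.0 - p))


# Usage: a raw value of 0 sits at the midpoint of each range
print(bounded(0.0, *BOUNDS["alpha"]))  # 1.05
```

Gradients flow through the raw parameters, while the sigmoid keeps each physical coefficient inside its range without clipping.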

Table A2. Learned PDE scalar values at epoch 200 for the controlled OPT-I v2 NCA/no-NCA paired ablation. Both runs converge to similar values, suggesting that the learned anisotropic diffusion prior remains stable and largely orthogonal to the downstream NCA switch.

Parameter	Bounds	With NCA	No NCA	Δ
α (diffusion strength)	(0.1, 2.0)	1.3924	1.3721	+0.020
λ (fidelity weight)	(0.01, 1.0)	0.0356	0.0359	−0.000
σ (smoothing)	(0.5, 5.0)	3.4706	3.4306	+0.040
β (conductance scale)	(0.1, 10.0)	3.6803	3.6276	+0.053
ξ (edge-threshold numerator)	(0.0, 5.0)	3.5784	3.5113	+0.067
η (edge-threshold denominator)	(0.0, 1.0)	0.5280	0.5462	−0.018
ν (curvature modulation)	(0.0, 1.0)	0.4726	0.4806	−0.008
γ (curvature gain)	(0.1, 5.0)	3.0305	2.9797	+0.051
ω (fractal coupling)	(0.0, 1.0)	0.5784	0.5960	−0.018

The maximum absolute difference between the paired runs is approximately 0.067, and the learned fractal coupling coefficient remains non-trivial in both cases (ω = 0.5784 with NCA and 0.5960 without NCA). This supports the interpretation that the PDE branch learns a similar prior refinement behavior regardless of the downstream NCA switch.
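The Δ column and the maximum absolute difference quoted above can be recomputed directly from Table A2. The short script below also reports the change relative to the no-NCA run, which is added here for illustration and is not a quantity reported in the study.

```python
# Learned values at epoch 200, copied from Table A2: (with NCA, without NCA)
LEARNED = {
    "alpha": (1.3924, 1.3721),
    "lambda": (0.0356, 0.0359),
    "sigma": (3.4706, 3.4306),
    "beta": (3.6803, 3.6276),
    "xi": (3.5784, 3.5113),
    "eta": (0.5280, 0.5462),
    "nu": (0.4726, 0.4806),
    "gamma": (3.0305, 2.9797),
    "omega": (0.5784, 0.5960),
}

deltas = {name: with_nca - no_nca for name, (with_nca, no_nca) in LEARNED.items()}
relative = {name: deltas[name] / LEARNED[name][1] for name in LEARNED}
largest = max(deltas, key=lambda name: abs(deltas[name]))

for name in LEARNED:
    print(f"{name:>6}: delta {deltas[name]:+.4f}, relative {100 * relative[name]:+.2f}%")
print("largest |delta|:", largest, round(abs(deltas[largest]), 4))  # xi 0.0671
```

In relative terms the largest change is in η (about 3.3%), and all nine parameters differ by less than 3.5% between the paired runs.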

After the PDE refinement, the SPADE-style conditioning equations are

\hat{X} = \frac{X - μ (X)}{σ (X) + ε}, Y = (1 + S (\tilde{F})) ⊙ \hat{X} + B (\tilde{F})

(A4)

In this notation, X is a skip feature map, μ(X) and σ(X) represent normalization statistics (σ(X) is distinct from the PDE smoothing scale σ in Table A1), S(\tilde{F}) and B(\tilde{F}) are learned spatial scale and bias maps, and Y is the conditioned feature map passed to the decoder. The final-layer initialization of the SPADE projections is identity-preserving, so the prior pathway is introduced gradually during training rather than forcing a large modulation at initialization.
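Equation (A4) and the identity-preserving initialization can be illustrated with a deliberately simplified single-channel sketch. SPADE as introduced by Park et al. [21] predicts S and B with small convolutional networks; here they are assumed to be pointwise affine functions of the refined prior, and the normalization statistics are assumed to be taken over all spatial positions of the map. The class `PointwiseSPADE` is hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def normalize(x: Grid, eps: float = 1e-5) -> Grid:
    """X_hat = (X - mu(X)) / (sigma(X) + eps), with statistics over all positions."""
    values = [v for row in x for v in row]
    mu = sum(values) / len(values)
    sd = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
    return [[(v - mu) / (sd + eps) for v in row] for row in x]


class PointwiseSPADE:
    """Hypothetical single-channel SPADE with pointwise projections S(F) and B(F)."""

    def __init__(self) -> None:
        # Zero-initialised final projections, so S(F) = B(F) = 0 and Y = X_hat at the start
        self.s_weight, self.s_bias = 0.0, 0.0
        self.b_weight, self.b_bias = 0.0, 0.0

    def __call__(self, x: Grid, prior: Grid) -> Grid:
        x_hat = normalize(x)
        return [
            [(1.0 + self.s_weight * f + self.s_bias) * xh + (self.b_weight * f + self.b_bias)
             for xh, f in zip(xh_row, f_row)]
            for xh_row, f_row in zip(x_hat, prior)
        ]


# Usage: at initialisation the prior has no effect on the skip features
x = [[1.0, 2.0], [3.0, 4.0]]
prior = [[0.1, 0.9], [0.5, 0.2]]
spade = PointwiseSPADE()
y = spade(x, prior)
```

Because the projection weights start at zero, Y reduces to X̂ at initialization; the prior only begins to modulate the skip features once training moves these weights away from zero.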

Appendix C. Reproducibility and Reporting Boundaries

Reproduction checks matched the OPT-D FIVES metrics to 10 decimal places. The available project snapshot included the baseline family, OPT-D, OPT-F, OPT-H, OPT-I, the full HRF ablation configurations A–H, and the completed paired OPT-I v2 NCA/no-NCA ablation. The paired ablation materialized 80 per-image test predictions for each model and stored probability arrays, binary masks, and per-image metric JSON files, enabling the paired statistical tests reported in Section 4.2. Inference timing, peak-VRAM logs, NCA-specific complexity, and multi-seed summaries were not available.

Appendix C.1. Controlled NCA Ablation Validation and Training Details

The controlled NCA isolation ablation used two 200-epoch OPT-I v2 runs with the same seed (42), FIVES data, and evaluation pipeline. The NCA-enabled run used runs/opt_I_v2_revision/, whereas the matched no-NCA run used runs/opt_I_v2_no_nca_revision/. The validation split contained 80 images and was used to select τ* independently for each model before test evaluation. Table A3 reports the validation split aggregate metrics for the controlled OPT-I v2 NCA/no-NCA ablation.

Table A3. Validation split aggregate metrics for the controlled OPT-I v2 NCA/no-NCA ablation. Metrics are full-image tiled results with D4 TTA and validation-swept thresholds.

Metric	with NCA	No NCA	Δ (pp)
Dice @ τ*	0.9009	0.8926	+0.83
Dice @ 0.5	0.8989	0.8916	+0.74
clDice @ τ*	0.8672	0.8490	+1.82
Sensitivity @ τ*	0.8836	0.8734	+1.02
Specificity @ τ*	0.9937	0.9933	+0.05
Accuracy @ τ*	0.9855	0.9843	+0.12
F1 @ τ*	0.9009	0.8926	+0.83
AUROC	0.9919	0.9910	+0.10
τ*	0.70	0.65	—
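Threshold selection on the validation split can be sketched as a grid sweep that keeps the τ with the highest mean per-image Dice. The 0.05-step grid, the ≥ comparison, and per-image (rather than pooled) Dice averaging are assumptions that are consistent with the reported τ* values of 0.70 and 0.65, not details confirmed by the archive; the function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def dice(pred: Sequence[int], target: Sequence[int], eps: float = 1e-8) -> float:
    """Binary Dice between two flattened masks."""
    inter = sum(p * t for p, t in zip(pred, target))
    return (2.0 * inter + eps) / (sum(pred) + sum(target) + eps)


def select_threshold(probs: List[Sequence[float]], targets: List[Sequence[int]],
                     grid: Optional[List[float]] = None) -> Tuple[float, float]:
    """Return (tau*, mean validation Dice); ties keep the smallest threshold."""
    if grid is None:
        grid = [round(0.05 * k, 2) for k in range(1, 20)]  # assumed grid 0.05, 0.10, ..., 0.95
    best_tau, best_score = grid[0], -1.0
    for tau in grid:
        scores = [dice([1 if p >= tau else 0 for p in prob], target)
                  for prob, target in zip(probs, targets)]
        score = sum(scores) / len(scores)
        if score > best_score:
            best_tau, best_score = tau, score
    return best_tau, best_score


# Usage on two toy validation images (flattened probabilities and masks)
val_probs = [[0.9, 0.8, 0.3, 0.1], [0.75, 0.2, 0.6, 0.05]]
val_masks = [[1, 1, 0, 0], [1, 0, 0, 0]]
tau_star, val_dice = select_threshold(val_probs, val_masks)
```

Selecting τ* on validation data and then freezing it for the test split is what keeps the test-set overlap metrics free of threshold tuning.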

Both paired runs stored 200-line pde_params.jsonl logs and comparable checkpoint sets. The final-batch loss components in Table A4 are consistent with the test-set findings: the NCA-enabled run has lower total, Dice, clDice, Skeleton Recall, BCE, and focal loss values. Table A5 summarizes the stored artifacts for the controlled paired NCA/no-NCA ablation.

Table A4. Final-batch training-loss components for the controlled OPT-I v2 NCA/no-NCA ablation.

Loss/Monitor	with NCA	No NCA
loss_total	0.1474	0.1665
loss_dice	0.0810	0.0888
loss_cldice	0.0469	0.0550
loss_skel_recall	0.0863	0.1072
loss_bce	0.0338	0.0387
loss_focal	0.0090	0.0100

Table A5. Artifact summary for the controlled paired NCA/no-NCA ablation.

Artifact	with NCA	No NCA
Run directory	runs/opt_I_v2_revision/	runs/opt_I_v2_no_nca_revision/
pde_params.jsonl	200 lines	200 lines
Checkpoints	14 .pt files	14 .pt files
Per-image test predictions	80 × {prob.npy, mask.png, metrics.json}	80 × {prob.npy, mask.png, metrics.json}
metrics_train.json	yes	yes
metrics.json	yes	yes
Approximate footprint	1.4 GB	1.4 GB

References

1. Staal, J.; Abramoff, M.D.; Niemeijer, M.; Viergever, M.A.; van Ginneken, B. Ridge-Based Vessel Segmentation in Color Images of the Retina. IEEE Trans. Med. Imaging 2004, 23, 501–509.
2. Budai, A.; Bock, R.; Maier, A.; Hornegger, J.; Michelson, G. Robust Vessel Segmentation in Fundus Images. Int. J. Biomed. Imaging 2013, 2013, 154860.
3. Cheung, C.Y.; Thomas, G.N.; Tay, W.T.; Ikram, M.K.; Hsu, W.; Lee, M.L.; Lau, Q.P.; Wong, T.Y. Retinal Vascular Fractal Dimension and Its Relationship with Cardiovascular and Ocular Risk Factors. Am. J. Ophthalmol. 2012, 154, 663–674.e1.
4. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the MICCAI 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
5. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer Using Shifted Windows. In Proceedings of the ICCV 2021, Montreal, QC, Canada, 10–17 October 2021; pp. 9992–10002.
6. Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-Like Pure Transformer for Medical Image Segmentation. In Proceedings of the ECCV 2022 Workshops; Springer: Cham, Switzerland, 2023; pp. 205–218.
7. Huang, J.; Zhou, Y.; Luo, Y.; Liu, G.; Guo, H.; Yang, G. Representing Topological Self-Similarity Using Fractal Feature Maps for Accurate Segmentation of Tubular Structures. In Proceedings of the ECCV 2024; Springer: Cham, Switzerland, 2025; pp. 143–160.
8. Mandelbrot, B.B. The Fractal Geometry of Nature; W.H. Freeman: San Francisco, CA, USA, 1982.
9. Plotnick, R.E.; Gardner, R.H.; Hargrove, W.W.; Prestegaard, K.; Perlmutter, M. Lacunarity Analysis: A General Technique for the Analysis of Spatial Patterns. Phys. Rev. E 1996, 53, 5461–5468.
10. Sarkar, N.; Chaudhuri, B.B. An Efficient Differential Box-Counting Approach to Compute Fractal Dimension of Image. IEEE Trans. Syst. Man Cybern. 1994, 24, 115–120.
11. Liu, W.; Yang, H.; Tian, T.; Cao, Z.; Pan, X.; Xu, W.; Jin, Y.; Gao, F. Full-Resolution Network and Dual-Threshold Iteration for Retinal Vessel and Coronary Angiograph Segmentation. IEEE J. Biomed. Health Inform. 2022, 26, 4623–4634.
12. Lin, J.; Huang, X.; Zhou, H.; Wang, Y.; Zhang, Q. Stimulus-Guided Adaptive Transformer Network for Retinal Blood Vessel Segmentation in Fundus Images. Med. Image Anal. 2023, 89, 102929.
13. Yang, Y.; Li, Y.; Wang, J.; Zhou, H.; Zhang, W.; Chen, X.; Luan, T.; Liu, W.; Ying, D. Enhanced Feature Dynamic Fusion Gated UNet for Robust Retinal Vessel Segmentation. Sci. Rep. 2026, 16, 3767.
14. Kovács, G.; Fazekas, A. A New Baseline for Retinal Vessel Segmentation: Numerical Identification and Correction of Methodological Inconsistencies Affecting 100+ Papers. Med. Image Anal. 2022, 75, 102300.
15. Fadugba, J.; Köhler, P.; Koch, L.; Manescu, P.; Berens, P. Benchmarking Retinal Blood Vessel Segmentation Models for Cross-Dataset and Cross-Disease Generalization. arXiv 2024, arXiv:2406.14994.
16. Perona, P.; Malik, J. Scale-Space and Edge Detection Using Anisotropic Diffusion. IEEE Trans. Pattern Anal. Mach. Intell. 1990, 12, 629–639.
17. Weickert, J. Coherence-Enhancing Diffusion Filtering. Int. J. Comput. Vis. 1999, 31, 111–127.
18. Barbu, T. Digital Image Processing, Analysis and Computer Vision Using Nonlinear Partial Differential Equations; Springer Nature: Berlin/Heidelberg, Germany, 2025; Volume 1211.
19. Chen, Y.; Pock, T. Trainable Nonlinear Reaction Diffusion: A Flexible Framework for Fast and Effective Image Restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1256–1272.
20. Long, Z.; Lu, Y.; Ma, X.; Dong, B. PDE-Net: Learning PDEs from Data. arXiv 2018, arXiv:1710.09668.
21. Park, T.; Liu, M.-Y.; Wang, T.-C.; Zhu, J.-Y. Semantic Image Synthesis with Spatially-Adaptive Normalization. In Proceedings of the CVPR, Long Beach, CA, USA, 16–20 June 2019; pp. 2332–2341.
22. Shit, S.; Paetzold, J.C.; Sekuboyina, A.; Ezhov, I.; Unger, A.; Zhylka, A.; Pluim, J.P.W.; Bauer, U.; Menze, B.H. clDice - A Novel Topology-Preserving Loss Function for Tubular Structure Segmentation. In Proceedings of the CVPR 2021, Nashville, TN, USA, 20–25 June 2021; pp. 16560–16569.
23. Kirchhoff, Y.; Rokuss, M.R.; Roy, S.; Kovacs, B.; Ulrich, C.; Wald, T.; Zenk, M.; Vollmuth, P.; Kleesiek, J.; Isensee, F.; et al. Skeleton Recall Loss for Connectivity Conserving and Resource Efficient Segmentation of Thin Tubular Structures. In Proceedings of the ECCV 2024; Springer: Cham, Switzerland, 2025; pp. 218–234.
24. Kalkhof, J.; González, C.; Mukhopadhyay, A. Med-NCA: Robust and Lightweight Segmentation with Neural Cellular Automata. In Proceedings of the IPMI 2023; Springer: Cham, Switzerland, 2023; pp. 705–716.
25. Lin, T.-Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal Loss for Dense Object Detection. In Proceedings of the ICCV 2017, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
26. Jin, K.; Huang, X.; Zhou, J.; Li, Y.; Yan, Y.; Sun, Y.; Zhang, Q.; Wang, Y.; Ye, J. FIVES: A Fundus Image Dataset for Artificial Intelligence Based Vessel Segmentation. Sci. Data 2022, 9, 475.
27. Barbu, T. Novel Diffusion-Based Models for Image Restoration and Interpolation; Springer International Publishing: Berlin/Heidelberg, Germany, 2019.
28. Barbu, T.; Ciobanu, A.; Luca, M. Multimodal Biometric Authentication Based on Voice, Face and Iris. In Proceedings of the 2015 E-Health and Bioengineering Conference (EHB), Iasi, Romania, 19–21 November 2015; pp. 1–4.

Figure 1. FP-Swin-UNet family overview. All reported variants use the single-channel LFD prior, learnable anisotropic diffusion, and a SPADE skip conditioning pathway. OPT-I additionally enables an NCA topology refinement stage after the decoder to prioritize centerline connectivity.

Figure 2. Representative example of prior construction. (a) Input fundus image. (b) Ground-truth vessel mask. (c) Raw local fractal dimension (LFD) prior computed by differential box counting using window size 32 and box sizes {2, 4, 8, 16}. The normalized LFD map is the single conditioning channel used in the reported model.

Table 1. Backbone summary of the studied FP-Swin-UNet configuration. The optional NCA topology refinement stage is enabled in OPT-I after the decoder output.

Stage	Input Ch.	Output Ch.	Depth	Resolution
Stem	3	32	—	H × W
Encoder-1	32	32	1	H × W
Encoder-2	32	64	1	H/2 × W/2
Encoder-3	64	128	1	H/4 × W/4
Decoder	128	32	mirrored	H × W

Table 2. Prior generation and injection pipeline used in the studied model.

Stage	Operation	Output
Raw prior	Differential box counting LFD; window 32; scales {2, 4, 8, 16}	One-channel map
Refinement	Learnable fractal anisotropic diffusion PDE; T = 5 explicit Euler steps; Δt = 0.1; nine sigmoid-bounded scalars	PDE-refined prior
Injection	SPADE-style scale/bias modulation at decoder skip connections	Conditioned skip features

Table 3. Base loss terms and later topology/complexity extensions used across the reported experiments.

Component/Loss Term	Weight/Scope	Purpose
Dice	1.0	Region overlap
Binary cross-entropy	1.0	Pixel-wise discrimination
Focal loss [25]	0.1	Hard-example emphasis
clDice [22]	0.25	Topological consistency
Skeleton Recall [23]	later variants	Optional structure-aware recall term
Fractal BCE	later variants	Optional complexity-weighted BCE term
NCA refinement [24]	OPT-I/OPT-I v2	Iterative local topology refinement of the predicted vessel map

Table 4. Datasets used in this study.

Dataset	Images	Resolution	Protocol in This Study
FIVES [26]	800	2048 × 2048	Recommended 600/200 split; primary benchmark
HRF [2]	45	3504 × 2336	Ablation benchmark

Table 5. Configuration values used in the reported experiments.

Parameter	Value Used in Study
Backbone	Swin-UNet-style; embed dim = 32; depths = (1, 1, 1)
Prior channels	1 (LFD prior; refined internally by PDE before SPADE conditioning)
DBC window size	32
DBC box sizes	{2, 4, 8, 16}
PDE refinement	Fractal anisotropic diffusion of LFD prior; 9 sigmoid-bounded scalars; T = 5 explicit Euler steps; Δt = 0.1
Core OPT-D loss	Dice + BCE + 0.1 Focal + 0.25 clDice
Optional later-stage components	Hessian cue; Skeleton Recall; fractal BCE; full non-NCA stack (OPT-H); NCA topology refinement (OPT-I)
FIVES variants reported	Baseline 200 ep; OPT-D 100 ep; OPT-F 100 ep; OPT-H 200 ep; OPT-I 100 ep; OPT-I v2 200 ep; OPT-I v2 + D4 TTA; paired OPT-I v2 NCA/no-NCA ablation (200 ep, seed 42).
HRF ablation family	A–H, each 100 epochs
Metrics	Dice, clDice, and AUC
Threshold for overlap metrics	τ* was selected independently per model on the validation split, with no test-set threshold tuning. Older archived runs store τ* values but not the full sweep grid. The paired NCA ablation used τ* = 0.70 with NCA and τ* = 0.65 without NCA
Profiling input size	384 × 384
Deterministic check	OPT-D FIVES metrics reproduced to 10 decimal places
Available evidence	Saved configs, checkpoints, profiler outputs, evaluation metrics, pde_params.jsonl logs, and materialized per-image test predictions for paired NCA/no-NCA ablation
Unavailable in archive	Final qualitative figure panels, inference-time measurements, peak VRAM, NCA-specific latency/FLOPs, and multi-seed summaries
NCA topology refinement	Enabled in OPT-I/OPT-I v2 after the decoder output. A paired 200-epoch no-NCA ablation with the same seed, data, and evaluation pipeline is reported in Section 4.2

Table 6. Completed no-TTA quantitative evaluation on the FIVES test set. Values in parentheses show the relative improvement over the matched 200-epoch Swin-UNet baseline for each metric.

Method	Epochs	Dice (Δ%)	clDice (Δ%)	AUC (Δ%)
Swin-UNet baseline	200	0.8643 (—)	0.8125 (—)	0.9856 (—)
FP-Swin-UNet OPT-D	100	0.8749 (+1.23%)	0.8266 (+1.74%)	0.9885 (+0.29%)
FP-Swin-UNet OPT-F	100	0.8763 (+1.39%)	0.8284 (+1.96%)	0.9884 (+0.28%)
FP-Swin-UNet OPT-H	200	0.8863 (+2.55%)	0.8367 (+2.98%)	0.9902 (+0.47%)
FP-Swin-UNet OPT-I	100	0.8820 (+2.05%)	0.8419 (+3.62%)	0.9891 (+0.36%)
FP-Swin-UNet OPT-I v2	200	0.8899 (+2.96%)	0.8517 (+4.82%)	0.9904 (+0.49%)
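Table 6 reports relative changes (Δ%) with respect to the baseline, whereas Tables 7–9 report absolute differences in percentage points (pp). The two are easy to conflate: for OPT-I v2, a +2.96% relative Dice gain corresponds to +2.56 pp. A quick check using the Table 6 values (helper names are illustrative):

```python
BASELINE = {"dice": 0.8643, "cldice": 0.8125, "auc": 0.9856}  # Swin-UNet, 200 epochs
OPT_I_V2 = {"dice": 0.8899, "cldice": 0.8517, "auc": 0.9904}  # FP-Swin-UNet OPT-I v2, 200 epochs


def relative_gain_pct(new: float, ref: float) -> float:
    """Relative change in percent, as in the parentheses of Table 6."""
    return 100.0 * (new - ref) / ref


def pp_gain(new: float, ref: float) -> float:
    """Absolute change in percentage points, as in Tables 7-9."""
    return 100.0 * (new - ref)


for metric in BASELINE:
    rel = relative_gain_pct(OPT_I_V2[metric], BASELINE[metric])
    pp = pp_gain(OPT_I_V2[metric], BASELINE[metric])
    print(f"{metric}: {rel:+.2f}% relative, {pp:+.2f} pp")
# dice: +2.96% relative, +2.56 pp
# cldice: +4.82% relative, +3.92 pp
# auc: +0.49% relative, +0.48 pp
```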

Table 7. Controlled OPT-I v2 NCA-isolation ablation on FIVES. Aggregate metrics are full-image tiled results with D4 TTA and validation-swept thresholds. Test split results are the primary reporting values.

Split/Metric	With NCA	No NCA	Δ (pp)
VAL Dice @ τ*	0.9009	0.8926	+0.83
VAL clDice @ τ*	0.8672	0.8490	+1.82
VAL AUROC	0.9919	0.9910	+0.10
VAL τ*	0.70	0.65	—
TEST Dice @ τ*	0.8907	0.8813	+0.94
TEST Dice @ 0.5	0.8899	0.8811	+0.88
TEST clDice @ τ*	0.8518	0.8325	+1.93
TEST Sensitivity @ τ*	0.8711	0.8585	+1.25
TEST Specificity @ τ*	0.9937	0.9934	+0.04
TEST Accuracy @ τ*	0.9852	0.9840	+0.12
TEST F1 @ τ*	0.8910	0.8816	+0.94
TEST AUROC	0.9904	0.9893	+0.12

Table 8. Per-image paired statistics on the FIVES test split (n = 80). Confidence intervals are bootstrap 95% intervals over paired per-image differences using 5000 resamples; p-values are one-sided Wilcoxon signed-rank tests.

Metric	NCA Mean	No-NCA Mean	Δ Mean (pp)	95% CI (pp)	Wins NCA/No/Tie	Wilcoxon p
Dice @ τ*	0.8783	0.8687	+0.96	[+0.83, +1.11]	80/0/0	<1 × 10⁻⁴
Dice @ 0.5	0.8776	0.8685	+0.90	[+0.74, +1.08]	78/2/0	<1 × 10⁻⁴
clDice @ τ*	0.8680	0.8523	+1.58	[+1.35, +1.84]	80/0/0	<1 × 10⁻⁴
clDice @ 0.5	0.8672	0.8490	+1.82	[+1.57, +2.11]	80/0/0	<1 × 10⁻⁴
Sensitivity	0.8608	0.8499	+1.09	[+0.94, +1.26]	79/1/0	<1 × 10⁻⁴
Specificity	0.9937	0.9933	+0.04	[+0.04, +0.06]	72/8/0	<1 × 10⁻⁴
Accuracy	0.9855	0.9843	+0.12	[+0.11, +0.13]	80/0/0	<1 × 10⁻⁴
F1	0.8783	0.8687	+0.96	[+0.83, +1.11]	80/0/0	<1 × 10⁻⁴
AUROC	0.9811	0.9791	+0.20	[+0.14, +0.26]	78/2/0	<1 × 10⁻⁴
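The paired statistics in Table 8 can be outlined from per-image metric pairs. The sketch below assumes a percentile bootstrap over images (the interval method is not stated) and uses the large-sample normal approximation to the one-sided Wilcoxon signed-rank test, with zero differences dropped and a tie correction. SciPy's `scipy.stats.wilcoxon` with `alternative='greater'` is the usual library route and may use an exact null distribution for small samples. The synthetic scores in the usage lines are not the study's data.

```python
import math
import random
from typing import Sequence, Tuple


def paired_bootstrap_ci(a: Sequence[float], b: Sequence[float], n_resamples: int = 5000,
                        alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the mean paired difference a - b (images resampled)."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    rng = random.Random(seed)
    means = sorted(sum(rng.choice(diffs) for _ in range(n)) / n for _ in range(n_resamples))
    return means[int(alpha / 2 * n_resamples)], means[int((1 - alpha / 2) * n_resamples) - 1]


def win_counts(a: Sequence[float], b: Sequence[float]) -> Tuple[int, int, int]:
    wins = sum(x > y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    return wins, losses, len(a) - wins - losses


def wilcoxon_greater(a: Sequence[float], b: Sequence[float]) -> float:
    """One-sided signed-rank p-value for H1: a > b (normal approximation)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks, tie_term, start = [0.0] * n, 0.0, 0
    while start < n:
        end = start
        while end + 1 < n and abs(diffs[order[end + 1]]) == abs(diffs[order[start]]):
            end += 1
        for k in range(start, end + 1):
            ranks[order[k]] = (start + end + 2) / 2.0  # average of the 1-based ranks in a tie block
        t = end - start + 1
        tie_term += t ** 3 - t
        start = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    var = n * (n + 1) * (2 * n + 1) / 24.0 - tie_term / 48.0
    z = (w_plus - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper-tail standard normal probability


# Usage with synthetic paired scores for 80 images
rng = random.Random(1)
no_nca = [rng.uniform(0.80, 0.90) for _ in range(80)]
with_nca = [v + rng.uniform(0.005, 0.015) for v in no_nca]
ci_low, ci_high = paired_bootstrap_ci(with_nca, no_nca)
```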

Table 9. FIVES pathology-stratified paired NCA ablation on the test split. Values are grouped by filename pathology code.

Pathology	n	NCA Dice	No-NCA Dice	Δ Dice (pp)	NCA clDice	No-NCA clDice	Δ clDice (pp)
D = Diabetic Retinopathy	21	0.8527	0.8398	+1.29	0.8309	0.8079	+2.30
G = Glaucoma	22	0.8435	0.8326	+1.09	0.8306	0.8138	+1.68
A = AMD	15	0.8946	0.8845	+1.01	0.8895	0.8744	+1.51
N = Normal	22	0.9265	0.9216	+0.49	0.9263	0.9181	+0.83
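Table 9 groups paired per-image results by the pathology code in each FIVES filename. A minimal grouping sketch, assuming filenames of the form `<index>_<code>.png` (for example, a hypothetical `12_D.png`) and per-image records of (filename, NCA Dice, no-NCA Dice); the toy records in the usage lines are illustrative, not the study's data.

```python
from collections import defaultdict
from pathlib import PurePath
from typing import Dict, Iterable, List, Tuple

CODES = {"A": "AMD", "D": "Diabetic Retinopathy", "G": "Glaucoma", "N": "Normal"}


def pathology_code(filename: str) -> str:
    """Take the code after the last underscore of the file stem (assumed naming scheme)."""
    return PurePath(filename).stem.rsplit("_", 1)[-1].upper()


def stratify(records: Iterable[Tuple[str, float, float]]) -> Dict[str, Tuple[int, float, float, float]]:
    """Return {pathology: (n, mean NCA, mean no-NCA, delta in pp)}."""
    groups: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for name, with_nca, without_nca in records:
        groups[pathology_code(name)].append((with_nca, without_nca))
    summary = {}
    for code, pairs in sorted(groups.items()):
        n = len(pairs)
        mean_nca = sum(p[0] for p in pairs) / n
        mean_no = sum(p[1] for p in pairs) / n
        summary[CODES.get(code, code)] = (n, mean_nca, mean_no, 100.0 * (mean_nca - mean_no))
    return summary


# Usage with toy records
toy = [("1_A.png", 0.90, 0.89), ("2_A.png", 0.88, 0.88), ("3_N.png", 0.93, 0.92)]
table = stratify(toy)
```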

Table 10. Optional D4 test-time augmentation for OPT-I v2. Relative improvements are computed against the no-TTA OPT-I v2 row.

Method	Epochs	Dice	ΔDice	clDice	ΔclDice	AUC	ΔAUC
FP-Swin-UNet OPT-I v2 + D4 TTA	200	0.8926	+0.30%	0.8596	+0.93%	0.9909	+0.05%
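D4 test-time augmentation averages predictions over the eight symmetries of the square (four 90° rotations, each with and without a horizontal flip), mapping every prediction back to the original orientation before averaging. The sketch below assumes a hypothetical `model` callable that returns a probability map of the same shape as its input, and it averages probabilities rather than logits; the study's exact fusion rule is not stated here.

```python
from typing import Callable, List, Optional

Grid = List[List[float]]


def rot90(g: Grid) -> Grid:
    """Rotate a 2D grid by 90 degrees counter-clockwise."""
    return [list(row) for row in zip(*g)][::-1]


def hflip(g: Grid) -> Grid:
    """Mirror a 2D grid left-right."""
    return [row[::-1] for row in g]


def rotate(g: Grid, k: int) -> Grid:
    for _ in range(k % 4):
        g = rot90(g)
    return g


def d4_tta(model: Callable[[Grid], Grid], image: Grid) -> Grid:
    """Average model outputs over the 8 elements of D4, each mapped back to the input frame."""
    total: Optional[Grid] = None
    for flip in (False, True):
        base = hflip(image) if flip else image
        for k in range(4):
            pred = rotate(model(rotate(base, k)), 4 - k)  # predict, then undo the rotation
            if flip:
                pred = hflip(pred)  # undo the flip
            total = pred if total is None else [
                [a + b for a, b in zip(ra, rb)] for ra, rb in zip(total, pred)
            ]
    return [[v / 8.0 for v in row] for row in total]


def identity(g: Grid) -> Grid:
    return [row[:] for row in g]


def column_index(g: Grid) -> Grid:
    """A deliberately orientation-dependent 'model' that outputs each pixel's column index."""
    return [[float(j) for j in range(len(row))] for row in g]


img = [[1.0, 2.0], [3.0, 4.0]]
print(d4_tta(identity, img))      # [[1.0, 2.0], [3.0, 4.0]]
print(d4_tta(column_index, img))  # [[0.5, 0.5], [0.5, 0.5]]
```

An identity model returns the input unchanged, while an orientation-dependent model is symmetrized, which is the intended effect of D4 TTA on orientation-sensitive errors.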

Table 11. Contextual published FIVES comparison. Benchmark values are from Fadugba et al. [15] using their shared FIVES protocol; FP-Swin-UNet values are from the present manuscript and are not protocol-matched to the benchmark rows.

Method	Source/Protocol	FIVES Dice/DSC	Notes
U-Net	Fadugba et al. 2024 [15]	0.9015	DiceBCE setting; official FIVES split
FR-UNet	Fadugba et al. 2024 [15]	0.9037	DiceBCE setting; official FIVES split
MA-Net	Fadugba et al. 2024 [15]	0.8997	DiceBCE setting; 0.9005 under clDice loss
SA-UNet	Fadugba et al. 2024 [15]	0.8655	DiceBCE setting; official FIVES split
W-Net	Fadugba et al. 2024 [15]	0.8565	DiceBCE setting; official FIVES split
FP-Swin-UNet OPT-I v2	This study	0.8899	No-TTA, completed 200-epoch test result

Table 12. Complete HRF ablation across the eight 100-epoch configurations (A–H). Relative changes are computed against configuration A.

Configuration	Description	Dice (Δ%)	AUC (Δ%)
A	Baseline	0.8253 (—)	0.9836 (—)
B	+LFD Gate	0.8251 (−0.02%)	0.9836 (0.00%)
C	+SPADE v1	0.8279 (+0.32%)	0.9844 (+0.08%)
D	+PDE SPADE v2	0.8287 (+0.41%)	0.9845 (+0.09%)
E	+Hessian	0.8283 (+0.36%)	0.9849 (+0.13%)
F	+Skel Recall	0.8290 (+0.45%)	0.9849 (+0.13%)
G	+Fractal BCE	0.8290 (+0.45%)	0.9851 (+0.15%)
H	Full non-NCA stack	0.8290 (+0.45%)	0.9851 (+0.15%)

Table 13. Computational profile of the baseline and the shared FP-Swin-UNet PDE-SPADE conditioning pathway

Method	Params	FLOPs (384 × 384)	ΔParams	ΔFLOPs
Swin-UNet baseline	532,993	33.2 G	—	—
FP-Swin-UNet shared PDE-SPADE backbone	609,882	45.2 G	+14.4%	+36.2%
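The overhead columns of Table 13 are relative changes against the baseline. Recomputing them from the tabulated values gives +14.4% for parameters and about +36.1% for FLOPs; the small gap to the listed +36.2% is consistent with the FLOP counts having been rounded to 0.1 G before tabulation.

```python
def overhead_pct(new: float, ref: float) -> float:
    """Relative overhead in percent against a reference model."""
    return 100.0 * (new - ref) / ref


params_pct = overhead_pct(609_882, 532_993)  # parameter counts from Table 13
flops_pct = overhead_pct(45.2, 33.2)         # GFLOPs at 384 x 384, as rounded in Table 13
print(f"params {params_pct:+.1f}%, FLOPs {flops_pct:+.1f}%")  # params +14.4%, FLOPs +36.1%
```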

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Murgu, L.A.; Barbu, T. PDE-Refined Local Fractal Dimension Prior Conditioning and Topology-Aware Refinement for Retinal Vessel Segmentation with a Swin-UNet-Style Backbone. Appl. Sci. 2026, 16, 5559. https://doi.org/10.3390/app16115559

AMA Style

Murgu LA, Barbu T. PDE-Refined Local Fractal Dimension Prior Conditioning and Topology-Aware Refinement for Retinal Vessel Segmentation with a Swin-UNet-Style Backbone. Applied Sciences. 2026; 16(11):5559. https://doi.org/10.3390/app16115559

Chicago/Turabian Style

Murgu, Lucian Alexandru, and Tudor Barbu. 2026. "PDE-Refined Local Fractal Dimension Prior Conditioning and Topology-Aware Refinement for Retinal Vessel Segmentation with a Swin-UNet-Style Backbone" Applied Sciences 16, no. 11: 5559. https://doi.org/10.3390/app16115559

APA Style

Murgu, L. A., & Barbu, T. (2026). PDE-Refined Local Fractal Dimension Prior Conditioning and Topology-Aware Refinement for Retinal Vessel Segmentation with a Swin-UNet-Style Backbone. Applied Sciences, 16(11), 5559. https://doi.org/10.3390/app16115559
