1. Introduction
Airway segmentation plays a pivotal role in pulmonary interventional procedures by enabling the precise extraction of morphological metrics, such as airway wall thickness and luminal diameter variations, which serve as critical biomarkers for the diagnosis and assessment of disease severity in conditions including chronic obstructive pulmonary disease (COPD), asthma, and bronchiectasis [1, 2, 3]. In the context of surgical planning and real-time navigation, accurate delineation of the bronchial tree allows clinicians to localize anatomical landmarks with confidence, thereby markedly improving procedural success rates and patient safety [4, 5]. Furthermore, the quantitative analysis of airway parameters facilitates the longitudinal monitoring of disease progression and the design of personalized therapeutic regimens, rendering airway segmentation an indispensable tool in modern interventional pulmonology.
ATM’22 (Airway Tree Modeling 2022) was an open challenge held in conjunction with MICCAI 2022 that aimed to provide a large-scale, multi-site, and multi-domain benchmark and a unified evaluation platform for pulmonary airway tree segmentation and modeling [6]. The challenge released 500 expert-annotated chest CT scans, sourced from multiple institutions, including cases with COVID-19-related lesions. The results from the challenge indicate that deep-learning methods that explicitly preserve topology and/or specifically handle small airway branches achieve better performance in detecting and maintaining airway tree continuity. However, the majority of airway segmentation algorithms presented in ATM’22 are limited to the delineation of airways no smaller than the third generation (as defined by the standard tracheobronchial branching scheme), leaving finer and more peripheral bronchi beyond their current reach.
To our knowledge, no segmentation methods have been specifically developed for the small airways. Small airways are defined as those with a diameter of <2 mm, typically beginning at approximately the eighth generation of branching and comprising portions of both the conducting airways (responsible for air transport) and the respiratory bronchioles (involved in gas exchange) [7]. Often referred to as the “quiet zone” of the lungs, these distal airways may exhibit functional impairment in the early stages of disease that remains clinically silent, that is, undetectable through symptoms or conventional pulmonary function tests.
The small size and intricate branching of the small airways render their segmentation on CT images particularly challenging; however, their accurate delineation holds clinical importance in the diagnosis and management of pulmonary disease. In COPD, small-airway involvement, quantified via impulse oscillometry (IOS), has been reported in 60–74% of patients [8]. Likewise, in asthma, small-airway pathology affects approximately 50–60% of individuals with the disease [9]. Beyond COPD and asthma, lesions of the small airways also contribute to the pathogenesis and progression of other respiratory disorders, such as pulmonary fibrosis, bronchiectasis, and idiopathic interstitial pneumonias, and are associated with sustained declines in lung function and poorer clinical outcomes.
Current mainstream algorithms remain incapable of reliably segmenting the fine distal airways. The luminal diameters of these bronchi are comparable to the spatial resolution (slice thickness) of CT imaging, producing pronounced partial volume effects [10], and are further obscured by the elevated noise levels inherent in low-dose CT scans [11]. Additionally, the irregular variability in airway wall thickness [10] and the intricate, highly branched topology of the bronchial tree [11] exacerbate segmentation uncertainty. Challenges such as extreme class imbalance [12], vanishing gradients and feature attenuation, and the failure to preserve airway connectivity (leading to discontinuities or omissions) further hinder fine airway delineation. This inability to accurately segment the small airways can directly result in missed lesions, thereby undermining the accuracy of pulmonary pathology assessment and the precision of subsequent interventional planning.
To address these prevailing challenges in airway segmentation, we propose a synergistic framework for airway delineation, designed to enhance the detection of delicate distal bronchial structures.
The framework focuses on explicit segmentation, in which a multi-decoder architecture enriches the feature representation. Collectively, this approach aims to yield a comprehensive airway model that achieves high segmentation sensitivity.
Building upon this framework, we introduce a Voxel-Selective Supervision (VSS) mechanism to mitigate supervisory uncertainty arising from terminal airway omissions and class imbalance. VSS dynamically quantifies voxel-wise uncertainty via cross-entropy to establish a hierarchy of easy and difficult samples and employs a spatial-distance-based weighting strategy, thereby directing the model to prioritize discriminative features along the delicate bronchial margins. During early training, the network focuses on stable low-uncertainty voxels and progressively shifts attention toward more uncertain regions, effectively guiding the model to “actively explore” potential airway branches—even when such predictions manifest as apparent false positives under incomplete annotations.
We conducted a systematic evaluation of our proposed method on publicly available pulmonary airway segmentation datasets, using current state-of-the-art approaches as baseline comparators. The experiments not only confirm the superiority of our synergistic optimization framework in preserving semantic fidelity and structural continuity, but also underscore the pivotal role of the VSS mechanism in enhancing the recognition of fine bronchial branches.
In summary, the proposed approach delivers a harmonized advancement across architectural design, training strategy, and empirical validation, offering an effective and broadly applicable solution for high-precision airway modeling.
2. Related Works
Accurate CT-based airway segmentation is critically important for the early detection and quantification of airway-centric pathologies, such as COPD, asthma, and bronchiectasis, where involvement of the small airways often precedes overt clinical symptoms and conventional pulmonary function deficits. Consequently, current research efforts have converged on three principal challenges: preserving the topology of the airway tree, reliably detecting fine peripheral bronchi, and ensuring robust performance across diverse pathological presentations. The EXACT’09 Airway Segmentation Challenge (2009) first illuminated these difficulties [13]: among fifteen automated algorithms, the best achieved only 74% completeness, while most averaged around 62%, with distal bronchi particularly prone to omission and fragmentation. This seminal benchmark galvanized the community to pursue methods capable of faithfully reconstructing both major and minor airway branches without sacrificing connectivity.
The ATM’22 Airway Tree Modeling Challenge further propelled advances in the field by benchmarking automated methods on a diverse set of pathological CT scans, thereby driving improvements in both small-airway detection and topological fidelity [6]. The ATM’22 dataset comprised 500 CT volumes encompassing complex pathologies, such as COPD, pulmonary nodules, and fibrosis, and was evaluated using metrics including the tree-length detected rate (TD; total airway length coverage), the branch detected rate (BD), the volumetric Dice coefficient, the false-positive rate (FPR), and the topology error rate. The top-performing DTPDT (Differentiable Topology–Preserved Distance Transform) algorithm achieved approximately 91% TD and 90% BD [14], whereas the average results for the remaining participants were 78% TD, 75% BD, and an FPR of around 10%.
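To make the overlap-style metrics above concrete, the following is a minimal sketch that computes the Dice coefficient, a false-positive rate, and a voxel-level approximation of TD on toy binary masks. The function names, the use of skeleton voxel counts as a proxy for centerline length, and the normalization of FPR by the number of background voxels are assumptions made for illustration; the official challenge evaluation may differ in detail.

```python
import numpy as np

def dice(pred, ref):
    """Volumetric Dice coefficient between two binary masks."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    inter = np.logical_and(pred, ref).sum()
    denom = pred.sum() + ref.sum()
    return 2.0 * inter / denom if denom else 1.0

def false_positive_rate(pred, ref):
    """Assumed definition: FP / (FP + TN) over all voxels."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    fp = np.logical_and(pred, ~ref).sum()
    tn = np.logical_and(~pred, ~ref).sum()
    return fp / (fp + tn) if (fp + tn) else 0.0

def tree_length_detected(pred, ref_skeleton):
    """Approximate TD: fraction of reference skeleton voxels covered by the prediction."""
    pred, ref_skeleton = pred.astype(bool), ref_skeleton.astype(bool)
    total = ref_skeleton.sum()
    return np.logical_and(pred, ref_skeleton).sum() / total if total else 0.0

# Toy example: a straight one-voxel-wide "airway" of 8 voxels.
ref = np.zeros((1, 1, 10), dtype=bool)
ref[..., 0:8] = True
skeleton = ref.copy()          # a one-voxel-wide line is its own skeleton
pred = np.zeros_like(ref)
pred[..., 0:6] = True          # the distal part of the branch is missed
pred[..., 9] = True            # one false-positive voxel

print(dice(pred, ref), false_positive_rate(pred, ref), tree_length_detected(pred, skeleton))
```

In this toy case the prediction misses the last two voxels of the branch, so the TD proxy drops even though most of the volume overlaps, which is the behavior that motivates length-based metrics for thin branches.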
2.1. CNN-Based Airway Segmentation
The advent of deep convolutional neural networks (CNNs) revolutionized medical image segmentation. Ronneberger et al. (2015) introduced the U-Net architecture [15], a symmetric encoder–decoder network with skip connections, which demonstrated precise segmentation even when training data are scarce. Its three-dimensional extensions, including 3D U-Net [16] and V-Net [17], further extended this paradigm to volumetric data, enabling end-to-end learning of complex anatomical structures with high fidelity.
Specifically for airway segmentation, Charbonnier et al. (2017) trained a 3D CNN to detect and correct “leakage” artifacts in baseline airway masks [18]. By teaching the network to recognize and excise false positives that penetrated lung parenchyma, they markedly increased the specificity while preserving the overall branch detection sensitivity.
To enhance the detection of fine bronchial branches, multi-scale and cascade designs have been proposed. In 2019, Qin and colleagues developed AirwayNet, which incorporates voxel-connectivity awareness into a 3D CNN to explicitly encourage the prediction of a continuous airway tree [19]; by penalizing disconnected predictions, AirwayNet achieved improved topology preservation and rescued otherwise overlooked distal branches.
In parallel, Zhao et al. employed a two-stage strategy: a 2D CNN first generated slice-wise airway probability maps, which were then integrated by a 3D CNN and refined through a linear-programming-based path-tracing algorithm to enforce connectivity [20]. This hybrid 2D–3D cascade not only yielded more coherent tree structures but also enabled generation-level classification of bronchial branches, highlighting the power of staged architectures combined with global optimization.
Meanwhile, Wang et al. (2019) introduced a spatial fully connected network that leverages a radial-distance loss to directly learn the tubular geometry of airways. By predicting a distance transform rather than a binary mask, their approach inherently preserves the cylindrical morphology and continuity of airway segments, offering another effective avenue for topology-aware segmentation [21].
In addition, recent studies have begun to focus on AI-generated annotations and pseudo-label learning strategies to alleviate the high cost and inconsistency of pixel-level manual annotation. Previous work has shown that model-generated pseudo labels can significantly reduce reliance on manual labeling while maintaining or even improving segmentation performance, thereby enhancing annotation efficiency and scalability. Such advances are expected to substantially improve the efficiency of image segmentation workflows in future research [22, 23, 24].
2.2. Attention Mechanisms and Feature Recalibration
The integration of attention mechanisms and feature recalibration techniques into convolutional neural networks marked another significant breakthrough in airway segmentation. Qin et al. (2020) proposed AirwayNet-SE, a model that fuses multi-scale contextual information with a squeeze-and-excitation (SE) attention module to enhance the saliency of subtle airway structures during segmentation [25]. In a related work, the same group introduced an attention distillation strategy to impart “fine-bronchus sensitivity” to the network, explicitly guiding it to amplify responses to small-caliber airway pathways [26]. These attention-based approaches recalibrate feature maps and steer the network’s focus toward faint peripheral branches, significantly improving the detection sensitivity under resolution-limited conditions.
Similarly, Zhou et al. (2021) developed a multi-scale context-enhanced U-Net that aggregates coarse-to-fine semantic features, enabling more complete reconstruction of airway trees even in cases with complex pathological alterations [27]. In a complementary direction, Nadeem et al. (2021) proposed a novel “freeze-and-grow” framework that combines deep learning with classical region-growing algorithms [28]. Their method first uses a CNN to “freeze” correctly segmented regions, preventing leakage, and then iteratively grows additional branches in an alternating CNN–propagation manner, striking a compelling balance between false positives and false negatives.
These innovations—spanning 3D network architectures, cascade designs, and attention-enhanced modules—have collectively introduced a qualitative leap in airway segmentation. Modern approaches can now delineate dozens of bronchial generations from standard chest CT scans, far surpassing early algorithms that reliably captured only the third or fourth generations.
2.3. Strategies for Addressing Class Imbalance
With the advancement of deep learning, there has been a growing demand for loss functions tailored to the extreme class imbalance inherent in airway segmentation, wherein small distal airways are vastly outnumbered by the surrounding background. Originally proposed for dense object detection, the focal loss introduced by Lin et al. (2017) [29] was later adapted for segmentation tasks to address this issue. By down-weighting the contribution of well-classified voxels and emphasizing the misclassified ones, focal loss effectively guides the model to focus on under-segmented fine-scale airway structures.
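The down-weighting behavior of focal loss can be seen in a small numerical sketch. The function below follows the standard binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t); the function name is illustrative, and the default values γ = 2 and α = 0.25 are the commonly quoted settings from the original detection paper rather than values used in this study.

```python
import numpy as np

def binary_focal_loss(prob, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """Mean binary focal loss.

    prob:   predicted foreground probabilities in [0, 1]
    target: binary labels (1 = airway, 0 = background)
    """
    prob = np.clip(prob, eps, 1.0 - eps)
    # p_t is the probability assigned to the true class of each voxel.
    p_t = np.where(target == 1, prob, 1.0 - prob)
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma shrinks the loss of well-classified voxels.
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

easy = binary_focal_loss(np.array([0.9]), np.array([1]))   # confident, correct
hard = binary_focal_loss(np.array([0.1]), np.array([1]))   # confident, wrong
print(easy, hard)
```

With γ = 0 and α = 0.5 the expression reduces to half the ordinary binary cross-entropy, which is a convenient sanity check when adapting it to voxel-wise segmentation.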
Building on this principle, Zheng et al. (2020) introduced the General Union Loss (GUL) [12], which dynamically balances gradient contributions from both large and small airways through distance-based weighting. This strategy mitigates the optimizer’s tendency to favor easily segmentable large bronchi and instead promotes learning from smaller, more challenging branches. By eliminating scale-dependent gradient disparities, the GUL improves peripheral airway detection without compromising the accuracy of central airway segmentation.
Overall, loss function design remains an active and critical area of research, as achieving branch-complete segmentation requires the network to learn from extremely sparse positive signals associated with the smallest airways. This challenge necessitates carefully crafted objective functions that can amplify weak supervision signals and ensure that even the most distal branches are faithfully captured during training.
2.4. Topology Preservation of the Bronchial Tree
A long-standing challenge in airway segmentation is preserving the topology of the bronchial tree, ensuring that all predicted branches remain connected to the trunk and to one another in a coherent structure. One prominent approach to address this issue involves the use of centerline prediction or multi-task learning frameworks that jointly infer airway masks and skeletons. By simultaneously predicting the airway centerline and segmentation mask, neural networks are encouraged to produce continuous tubular structures. For instance, Selvan et al. (2020) integrated a mean-field conditional random field with a graph neural network in the post-processing stage to “snap” disconnected components into a connected graph, leveraging learned graph optimization techniques to enhance connectivity [30].
Another effective strategy is the incorporation of topology-preserving loss functions. Shit et al. (2021) proposed the centerline Dice (clDice) loss, which directly rewards topologically correct predictions by measuring the overlap between predicted and ground-truth skeletons [31]. This differentiable skeletal objective ensures that omissions of fine branches contribute to the loss, thereby encouraging the network to pursue even the most distal airways. Similarly, the radial distance loss introduced by Wang et al. (2019) provides a topology-aware supervision signal by regressing the distance from airway boundaries to their centerlines, enabling the model to learn and preserve tubular continuity implicitly [21].
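As an illustration of how a skeleton-based overlap penalizes missed branches, the sketch below computes the non-differentiable (hard) form of clDice from binary masks and their skeletons. It assumes the skeletons have already been extracted by some thinning routine; the function name and the toy data are hypothetical, and the trainable loss in the cited work relies on a differentiable soft skeleton instead.

```python
import numpy as np

def cl_dice(pred, ref, pred_skeleton, ref_skeleton, eps=1e-8):
    """Hard clDice: harmonic mean of topology precision and topology sensitivity."""
    pred, ref = pred.astype(bool), ref.astype(bool)
    sp, sl = pred_skeleton.astype(bool), ref_skeleton.astype(bool)
    # Topology precision: share of the predicted skeleton lying inside the reference mask.
    tprec = (np.logical_and(sp, ref).sum() + eps) / (sp.sum() + eps)
    # Topology sensitivity: share of the reference skeleton lying inside the predicted mask.
    tsens = (np.logical_and(sl, pred).sum() + eps) / (sl.sum() + eps)
    return 2.0 * tprec * tsens / (tprec + tsens)

# Toy example: an 8-voxel branch of which only the first 6 voxels are predicted.
ref = np.zeros((1, 1, 10), dtype=bool)
ref[..., 0:8] = True
pred = np.zeros_like(ref)
pred[..., 0:6] = True
score = cl_dice(pred, ref, pred_skeleton=pred, ref_skeleton=ref)
print(score)
```

Here the topology precision is 1 and the topology sensitivity is 6/8, so the truncated branch lowers the score even though every predicted voxel is correct.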
Zheng et al. [32] further introduced a Local Imbalance-based Weighting scheme and a Backpropagation-based Weight Enhancement strategy to reinforce topological completeness during training. To explicitly suppress disconnection errors, Nan et al. [33] and Yu et al. [34] incorporated discontinuity-sensitive regularization terms that penalize fragmented predictions. Given the global influence of structural diversity on distance maps [35, 36], Zhang et al. [14] and Yu et al. [34] proposed convolutional distance transforms (CDT) and geodesic distance transforms, respectively, to avoid topological fragmentation.
Collectively, state-of-the-art airway segmentation methods now move beyond voxel-wise accuracy to explicitly optimize for anatomical plausibility and structural connectivity through skeleton-guided learning, graph-based optimization, and topology-aware loss formulations, minimizing branch discontinuities and yielding highly coherent airway tree reconstructions.
3. Methods and Materials
In summary, two key challenges persist in pulmonary airway segmentation. First, mainstream medical-imaging datasets typically omit annotations for the finest airways (branching depth beyond the eighth generation, diameter <2 mm) [7], thereby impeding models from learning accurate representations of these delicate anatomical branches. Second, conventional encoder–decoder architectures tend to fracture or distort the topology of minute tubular structures during their downsampling–upsampling operations.
This study proposes a collaboratively optimized framework for pulmonary airway segmentation, designed to address the principal challenges in delineating fine bronchial structures. The segmentation module employs a dual-decoder architecture and introduces a voxel-selection supervision strategy to intensify the model’s sensitivity to minute airway branches. This encourages the exploration of unannotated fine bronchi, which may manifest in the evaluation metrics as controlled false positives. Such false positives do not necessarily reflect erroneous model predictions. Previous studies have reported that airway segmentation algorithms often detect anatomically plausible peripheral branches that are missing in the reference annotations, and thus are penalized as false positives when evaluated against an incomplete ground truth. These observations suggest that such detected but unannotated structures are more likely due to the incompleteness of the annotation set rather than true algorithmic errors [5, 37].
3.1. Segmentation Module
This section presents a detailed account of the segmentation model’s network architecture, the Voxel-Selective Supervision strategy, and the combined loss function. The overall framework is illustrated in Figure 1.
In our segmentation network, the original CT volume $X$ is first cropped into a sub-volume of fixed size, which is then encoded into multi-scale features by a grouped-convolutional encoder and subsequently decoded by two parallel decoders: one with conventional upsampling convolutions and the other with dynamic serpentine convolutions, producing segmentation probability maps and distance maps, respectively:

$$O_1 = \mathrm{Dec}_1(\mathrm{Enc}(X)), \qquad O_2 = \mathrm{Dec}_2(\mathrm{Enc}(X)),$$

where $O_1$ represents the outputs of the upsampling decoder $\mathrm{Dec}_1$, and $O_2$ represents the outputs of the serpentine-convolution decoder $\mathrm{Dec}_2$.
The two decoders share the encoder’s feature representations but follow independent upsampling paths to produce distinct segmentation probability maps and distance-transform maps. A consistency loss $\mathcal{L}_{kl}$ enforces alignment between their feature spaces. The annotated airway mask is skeletonized to derive the ground-truth distance-transform map (DTM), which is then compared against the decoder’s predicted DTM via the regression loss $\mathcal{L}_{dtm}$. Finally, the segmentation probability maps are supervised against the manual labels using the voxel-selective supervision strategy, yielding the loss $\mathcal{L}_{seg}$. The ground-truth DTM is computed as

$$\mathrm{DTM}_{gt} = \mathrm{DT}(\mathrm{Skel}(Y)),$$

where $Y$ denotes the ground-truth annotation, $\mathrm{Skel}(\cdot)$ the skeletonization operator, and $\mathrm{DT}(\cdot)$ the distance-transform operation. The supervision loss $\mathcal{L}_{seg}$ and the regression loss $\mathcal{L}_{dtm}$ are defined in Sections 3.2 and 3.3, respectively.
The challenge of accurately segmenting airways arises in part from gradient erosion and voxel neighborhood dilation during training [12] and is exacerbated by missing annotations in these fine branches. Training with imperfect labels introduces erroneous supervision, leading to suboptimal solutions and limiting the effectiveness of topology-enhancement strategies.
To mitigate the gradient erosion in shallow layers, we supply complementary gradient flows. To address the model’s insensitivity caused by unannotated terminal bronchi, we propose a voxel-selective supervision strategy. By quantifying voxel uncertainty via cross-entropy, we dynamically select the subset of voxels to supervise and adaptively balance easy and difficult samples during training, initially focusing on confidently labeled voxels and then gradually incorporating high-uncertainty (i.e., under-labeled) voxels to correct omitted regions. Concurrently, we apply spatial distance-based weighting to assign higher importance to fine airways adjacent to the background, thereby enhancing the boundary sensitivity. Cross-entropy-based selection of high-uncertainty voxels forces the model to explore potential airway structures—even if these predictions would be considered false positives under existing annotations—while the distance weights emphasize boundary voxels in the loss, encouraging airway extensions. To avoid early-stage noise, difficult samples are introduced later in training, enabling the model to complete missing branches in under-annotated regions. Consequently, the resulting false positives correspond to plausible extensions of the airway tree rather than erroneous predictions, improving the fine-branch segmentation accuracy and preserving topological coherence.
To prevent erroneous predictions on unselected voxels, we employ a dual-decoder architecture to enforce consistency between outputs. Inspired by WingsNet [12], we partition the convolutional blocks of Decoder 1 into distinct groups and apply auxiliary supervision at the group level rather than on individual blocks. Within each group, multi-scale features are aggregated via grouped convolutions and a feature pyramid: the output of each convolutional block is first passed through a lightweight convolution, then upsampled, and finally concatenated to form the group’s combined representation.
In the segmentation of tubular structures, such as blood vessels or road networks, standard deformable convolutions can suffer from excessive offset magnitudes that displace the kernel away from the target. To address this, a mechanism is required to constrain offset learning so that adjustments follow the trajectory of the tubular structure. Accordingly, we integrate Dynamic Snake Convolution (DSConv) [38] into Decoder 2, the serpentine-convolution decoder. DSConv is a specialized convolutional operation for tubular segmentation whose core idea is to iteratively accumulate and refine offsets, deforming the kernel to conform precisely to the target’s geometric contours.
In the dual-decoder architecture, Decoder 1 employs a group-supervision mechanism: the network is partitioned into multiple groups, and a group-wise feature pyramid is constructed, wherein lightweight convolutions fuse multi-scale features. This design mitigates gradient erosion and explosion in shallow layers while preserving the deep network’s representational power, thereby effectively addressing the class imbalance. Decoder 2 integrates DSConv, which iteratively constrains the kernel offsets so that the receptive field extends continuously along the tubular target’s geometric trajectory, enhancing the local adaptability to curved, slender structures.
The proposed encoder–dual-decoder architecture not only guarantees gradient stability and the semantic coherence of multi-scale features at the global level but also precisely captures low-contrast high-curvature details of airway branches at the local level. Built upon an efficient inference paradigm, this framework markedly enhances both the accuracy and topological continuity of fine airway segmentation, providing an effective solution for pulmonary airway delineation.
3.2. Voxel-Selective Supervision
Fully supervised training paradigms can yield strong performance in airway-segmentation tasks; however, missing annotations of peripheral bronchi introduce bias into the supervision signal, undermining the model’s capacity to discern small airway branches and to preserve the topology of the airway tree. To overcome this, we propose a dynamic voxel-selection strategy based on the cross-entropy between prediction probabilities and ground-truth labels, which enables progressive optimization by concentrating on difficult regions. As illustrated in Figure 2, this strategy employs a threefold weighting mechanism to bolster the segmentation of fine branches.
First, the shortest Euclidean distance from each voxel to the background is computed to obtain the first weight (Weight 1), which enhances the model’s sensitivity to fine branches and improves its ability to discriminate airway boundaries. The weight assigned to each voxel is derived from its shortest Euclidean distance to the background, normalized by the maximum distance within the sample. A scaling factor is applied to prevent the weight from becoming zero.
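A distance-based weight of this kind can be sketched as follows. The exact formula is not reproduced here, so the form $w_i = 1 - d_i / (\gamma\, d_{\max})$ with $\gamma > 1$, the value $\gamma = 1.1$, and both function names are assumptions chosen only to satisfy the stated properties: voxels close to the background receive larger weights, and no weight reaches zero. The brute-force distance computation is meant for toy arrays; for real CT volumes a library routine such as `scipy.ndimage.distance_transform_edt` would be the usual choice.

```python
import numpy as np

def distance_to_background(mask):
    """Euclidean distance from each foreground voxel to the nearest background voxel.

    Background voxels get distance 0. Brute force, for small illustrative arrays only.
    """
    mask = mask.astype(bool)
    dist = np.zeros(mask.shape, dtype=float)
    fg = np.argwhere(mask)
    bg = np.argwhere(~mask)
    if len(fg) == 0 or len(bg) == 0:
        return dist
    diff = fg[:, None, :] - bg[None, :, :]
    dist[tuple(fg.T)] = np.sqrt((diff ** 2).sum(axis=-1)).min(axis=1)
    return dist

def boundary_weight(mask, gamma=1.1):
    """Assumed weight: 1 - d / (gamma * d_max); gamma > 1 keeps every weight positive."""
    dist = distance_to_background(mask)
    d_max = dist.max()
    if d_max == 0:
        return np.ones_like(dist)
    return 1.0 - dist / (gamma * d_max)

# Toy 2D cross-section: a 5x5 "lumen" inside a 7x7 patch.
mask = np.zeros((7, 7), dtype=bool)
mask[1:6, 1:6] = True
weights = boundary_weight(mask)
print(weights[1, 1], weights[3, 3])  # boundary voxel vs. central voxel
```

Under this assumed form, a thin branch whose voxels all lie one step from the background keeps a uniformly high weight, whereas the core of a large bronchus is down-weighted.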
Subsequently, the voxels to be supervised are determined based on the cross-entropy values computed between the predicted probability map from the decoder and the ground-truth labels:

$$\mathrm{CE}_i = -\sum_{c=1}^{C} y_{i,c} \log p_{i,c},$$

where $\mathrm{CE}_i$ denotes the cross-entropy value at voxel location $i$, and $C$ represents the total number of classes. $y_{i,c}$ and $p_{i,c}$ denote the ground-truth label and predicted probability for class $c$ at voxel $i$, respectively. The value $\mathrm{CE}_i$ captures the uncertainty between the predicted probabilities and the ground truth. Higher values of $\mathrm{CE}_i$ indicate a larger discrepancy between prediction and annotation, typically corresponding to missing labels in fine branches or ambiguous boundary regions.
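The per-voxel uncertainty described above can be computed directly from a probability map and one-hot labels. The following is a minimal sketch using the standard cross-entropy definition; the function name, the channel-first array layout, and the clipping constant are illustrative assumptions.

```python
import numpy as np

def voxelwise_cross_entropy(prob, onehot, eps=1e-7):
    """Per-voxel cross-entropy.

    prob, onehot: arrays of shape (C, D, H, W) holding class probabilities and
    one-hot ground-truth labels. Returns an array of shape (D, H, W).
    """
    prob = np.clip(prob, eps, 1.0)  # avoid log(0)
    return -(onehot * np.log(prob)).sum(axis=0)

# Two voxels, both annotated as airway (class 1).
prob = np.array([[[[0.1, 0.8]]],    # class 0 (background) probabilities
                 [[[0.9, 0.2]]]])   # class 1 (airway) probabilities
onehot = np.array([[[[0.0, 0.0]]],
                   [[[1.0, 1.0]]]])
ce_map = voxelwise_cross_entropy(prob, onehot)
print(ce_map)  # the second voxel, predicted as background, has the larger value
```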
Based on this, we introduce a voxel-level selection strategy that determines the subset of voxels in the spatial domain $D$ to be supervised by the loss function. During training, a dynamic threshold controls the proportion of hard samples selected in each epoch, allowing the model to focus on confidently labeled voxels in the early stages and progressively incorporate more challenging samples as training advances. The threshold is a function of the current training epoch $t$ and of a parameter that sets the proportion of selected voxel samples. Convolutional neural networks tend to prioritize learning from easily identifiable voxel samples in the early stages of training, while later they may overfit to mislabeled or missing annotations caused by low contrast [39, 40]. To mitigate this, the threshold is initially set close to 1 and gradually decreases as training progresses, so that more difficult samples (e.g., those with missing annotations or low contrast) are introduced progressively, improving robustness to fine details and complex structures. The proportion parameter is chosen as an empirical value between 0 and 1 to control the introduction of harder samples, and we adjust it through validation experiments to ensure balanced learning of easy and hard samples during training.
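One way to realize such an easy-to-hard schedule is sketched below. Since the exact threshold formula is not reproduced here, the sketch makes two loudly stated assumptions: the threshold decays linearly from `tau_start` to `tau_end` over training, and a voxel is selected when the probability assigned to its annotated class, `exp(-CE)`, is at least the current threshold. All names and default values are hypothetical.

```python
import numpy as np

def dynamic_threshold(epoch, total_epochs, tau_start=0.95, tau_end=0.3):
    """Assumed schedule: linear decay from tau_start (near 1) to tau_end."""
    frac = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return tau_start + (tau_end - tau_start) * frac

def select_voxels(ce_map, tau):
    """Assumed rule: keep voxels whose confidence exp(-CE) is at least tau."""
    confidence = np.exp(-ce_map)  # probability given to the annotated class
    return confidence >= tau

# Three voxels with decreasing confidence in their annotated class.
ce_map = -np.log(np.array([0.99, 0.6, 0.2]))
early = select_voxels(ce_map, dynamic_threshold(0, 10))
late = select_voxels(ce_map, dynamic_threshold(9, 10))
print(early, late)
```

With these assumptions, only the most confident voxel is supervised in the first epoch, and a harder voxel joins the supervised set by the last epoch, mirroring the progression described above.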
Finally, for the selected voxel set, supervision is applied using a weighted cross-entropy loss in which $W$ denotes the spatially varying weights assigned to each voxel. The cross-entropy-based voxel-selective supervision strategy enables the model to dynamically adjust its focus throughout training. This not only alleviates the overfitting caused by incomplete annotations but also enhances the model’s sensitivity to fine airway structures through the incorporation of geometric priors. However, because cross-entropy as an uncertainty metric cannot rigorously distinguish between truly missing airways and regions of noise or artifacts, the exploration mechanism of the VSS module relies on empirical assumptions and may result in incorrect selections. In future work, we plan to incorporate more sophisticated uncertainty modeling techniques to enhance this discriminative capability.
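The restriction of the loss to the selected voxels can be illustrated with a short sketch. The normalization by the sum of the selected weights, like the function name, is an assumption made for illustration rather than the normalization used in this work.

```python
import numpy as np

def selective_weighted_ce(ce_map, weight, selected, eps=1e-8):
    """Weighted cross-entropy averaged over the selected voxels only.

    ce_map:   per-voxel cross-entropy values
    weight:   spatially varying weights W (same shape as ce_map)
    selected: boolean mask marking the voxels to supervise
    """
    selected = selected.astype(bool)
    numerator = (weight * ce_map)[selected].sum()
    denominator = weight[selected].sum()
    return float(numerator / (denominator + eps))

ce_map = np.array([1.0, 3.0, 10.0])
weight = np.array([1.0, 0.5, 1.0])
selected = np.array([True, True, False])   # the third voxel is left unsupervised
loss = selective_weighted_ce(ce_map, weight, selected)
print(loss)
```

Because the third voxel is excluded, its large cross-entropy value has no influence on the loss, which is how unselected (possibly mis-annotated) voxels are kept from dominating early training.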
3.3. Combined Loss
Although the cross-entropy-based voxel-selective supervision enhances the prediction accuracy in challenging regions, it supervises only a subset of selected voxels, leaving unsupervised areas potentially susceptible to prediction bias. To address this limitation, we introduce a dual-decoder architecture and impose consistency constraints on the two decoders through feature alignment using the Kullback–Leibler (KL) divergence, thereby improving the overall robustness of the predictions. The divergence is computed between the class-$c$ probability values that the two decoders predict at each voxel location.
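A consistency term of this type can be sketched with the standard KL divergence between two per-voxel class distributions. The direction of the divergence (first decoder relative to second), the averaging over voxels, and the function name are assumptions made for illustration.

```python
import numpy as np

def kl_consistency(p1, p2, eps=1e-7):
    """Mean per-voxel KL(p1 || p2).

    p1, p2: class probabilities of shape (C, ...) predicted by the two decoders.
    """
    p1 = np.clip(p1, eps, 1.0)
    p2 = np.clip(p2, eps, 1.0)
    kl = (p1 * (np.log(p1) - np.log(p2))).sum(axis=0)  # sum over classes
    return float(kl.mean())                            # average over voxels

p_dec1 = np.array([[0.9], [0.1]])   # decoder 1: confident prediction for one voxel
p_dec2 = np.array([[0.5], [0.5]])   # decoder 2: undecided prediction
print(kl_consistency(p_dec1, p_dec1), kl_consistency(p_dec1, p_dec2))
```

The term vanishes when the two decoders agree and grows as their predictions diverge, so it also constrains voxels that the selection step leaves unsupervised.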
The distance-transform map (DTM) quantifies the geometric relationship between each voxel and the airway surface, significantly enhancing the model’s ability to recognize complex airway structures. The Signed Distance Field (SDF) assigns a signed distance value to each voxel $x$ according to whether it lies in the airway exterior $O$, on the airway surface $S$, or in the airway interior $I$: the value is zero on $S$, and elsewhere its magnitude is the distance from $x$ to $S$, with opposite signs in $I$ and $O$. To enhance the model’s sensitivity to airway boundaries, we adopt an exponentially weighted distance regression loss inspired by [41]. Its weighting function includes a scaling factor that causes the weights to concentrate exponentially on the airway boundary regions, i.e., the assigned weight is maximal at the boundary. The value of this scaling factor was determined through empirical tuning on the validation set, with the aim of increasing the model’s focus on the boundary regions of tubular structures by assigning greater weight to errors at these boundaries in the loss function. The distance loss function aggregates the weighted discrepancy between the ground-truth and predicted signed distance values over the domain, where $N$ denotes the total number of voxels. By modulating the weighting function, boundary voxels receive higher gradient weights compared to interior regions, facilitating the model’s ability to address challenges such as blurred boundaries in low-contrast areas and discontinuities in fine branches. This promotes segmentation results that more accurately preserve the tubular anatomical characteristics of the airway.
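The effect of an exponential boundary weighting can be illustrated as follows. The weight $\exp(-\alpha\,|d|)$ applied to the ground-truth signed distance $d$, the use of an absolute (L1) error, the value of $\alpha$, and the function name are all assumptions for this sketch; the loss used in this work may differ in form.

```python
import numpy as np

def boundary_weighted_sdf_loss(sdf_pred, sdf_true, alpha=1.0):
    """Mean absolute SDF error, weighted by exp(-alpha * |sdf_true|).

    The weight equals 1 on the airway surface (sdf_true == 0) and decays
    exponentially with distance from it.
    """
    weight = np.exp(-alpha * np.abs(sdf_true))
    return float(np.mean(weight * np.abs(sdf_pred - sdf_true)))

sdf_true = np.array([0.0, 3.0])            # a surface voxel and a voxel 3 units away
err_at_boundary = boundary_weighted_sdf_loss(sdf_true + np.array([1.0, 0.0]), sdf_true)
err_far_away = boundary_weighted_sdf_loss(sdf_true + np.array([0.0, 1.0]), sdf_true)
print(err_at_boundary, err_far_away)
```

The same unit error costs far more at the surface than three units away from it, which is the mechanism by which boundary voxels receive larger gradients.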
Ultimately, the composite loss function $\mathcal{L}_{total}$ is formulated as a weighted sum of the segmentation loss, the DTM regression loss, and the KL consistency loss, mathematically expressed as

$$\mathcal{L}_{total} = \lambda_{seg}\,\mathcal{L}_{seg} + \lambda_{dtm}\,\mathcal{L}_{dtm} + \lambda_{kl}\,\mathcal{L}_{kl},$$

where $\mathcal{L}_{seg}$, $\mathcal{L}_{dtm}$, and $\mathcal{L}_{kl}$ denote the segmentation loss, the DTM (SDF) regression loss, and the KL consistency loss, and $\lambda_{seg}$, $\lambda_{dtm}$, and $\lambda_{kl}$ are the corresponding balancing hyperparameters. Their specific values were likewise determined through empirical tuning on the validation set to achieve a reasonable trade-off among the individual loss components. Through this design, the model is enabled to actively explore sparsely annotated regions while maintaining the geometric continuity of complex branching structures.
3.4. Implementation Details
The joint operational mechanism and synergy between VSS, the dual decoders, and the SDF constraint during training are presented in Algorithm 1. VSS dynamically selects and weights hard voxel samples to ensure that the model progressively corrects overlooked fine branches throughout the training process. The dual-decoder architecture employs a consistency loss to reinforce agreement between the outputs of the two decoders, thereby enhancing the robustness of the model. Meanwhile, the SDF constraint introduces a weighted distance regression loss that increases the model’s sensitivity to airway boundaries, leading to improved segmentation accuracy for fine airway structures.
Algorithm 1. Joint Training: VSS, Dual-Decoder, and SDF Constraints
Input:
    X: CT volume image
    Y: ground truth label
    D: distance transform
Output:
    Optimized model parameters
Step 1: Input Processing and Encoding
Step 2: Dual-Decoder Decoding
Step 3: Voxel Selection via VSS
    Compute voxel uncertainty based on cross-entropy
    Compute dynamic threshold
    Select voxels based on uncertainty
    Compute spatial weights W (using distance transform)
Step 4: Loss Calculation
    Segmentation loss with VSS-weighting
    SDF regression loss
    Consistency loss
Step 5: Total Loss and Optimization
    Combine all losses
    Backpropagate and update parameters
return: Updated model parameters
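Steps 3 to 5 of Algorithm 1 can be sketched in plain Python on flattened voxel lists. This is an illustration under stated assumptions, not the authors' implementation: using the first decoder's cross-entropy as the uncertainty score, keeping a fixed fraction `keep_ratio` of the hardest voxels as a stand-in for the dynamic threshold, the Bernoulli form of the KL term, and the names `joint_loss` and `lambdas` are all hypothetical.

```python
import math

EPS = 1e-7

def _clip(p):
    """Keep probabilities away from 0 and 1 so the logarithms stay finite."""
    return min(max(p, EPS), 1.0 - EPS)

def bce(p, y):
    """Per-voxel binary cross-entropy, used here as the uncertainty score."""
    p = _clip(p)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))

def kl_bernoulli(p, q):
    """KL(p || q) between two Bernoulli foreground probabilities."""
    p, q = _clip(p), _clip(q)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def joint_loss(p1, p2, y, w, sdf_loss, keep_ratio=0.5, lambdas=(1.0, 1.0, 1.0)):
    """One loss evaluation over flattened voxels.

    p1, p2: foreground probabilities from the two decoders
    y: binary labels; w: positive spatial weights W
    sdf_loss: SDF regression loss, assumed to be computed elsewhere
    keep_ratio: hypothetical fraction of the most uncertain voxels to keep
    """
    # Step 3: uncertainty, threshold, and voxel selection
    unc = [bce(p, t) for p, t in zip(p1, y)]
    k = max(1, int(round(keep_ratio * len(unc))))
    threshold = sorted(unc, reverse=True)[k - 1]
    selected = [u >= threshold for u in unc]
    # Step 4: weighted segmentation loss on selected voxels, plus consistency loss
    num = sum(wi * u for wi, u, s in zip(w, unc, selected) if s)
    den = sum(wi for wi, s in zip(w, selected) if s)
    seg = num / den
    cons = sum(kl_bernoulli(a, b) for a, b in zip(p1, p2)) / len(p1)
    # Step 5: weighted sum of the three terms
    l_seg, l_sdf, l_cons = lambdas
    total = l_seg * seg + l_sdf * sdf_loss + l_cons * cons
    return total, selected

probs = [0.9, 0.6, 0.2, 0.1]
labels = [1, 1, 0, 0]
total, selected = joint_loss(probs, probs, labels, [1.0] * 4, sdf_loss=0.0)
print(selected)
```

In this toy call the two decoders agree, so the consistency term vanishes and only the two least confident voxels contribute to the segmentation term.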
4. Experiments and Results
4.1. Dataset Description
In this study, we utilized an internally collected dataset comprising 53 pig CT scans along with corresponding manual annotations. The dataset was divided into training, validation, and test sets of 41, 6, and 6 scans, respectively, ensuring balanced distributions of anatomical structures and image quality across subsets.
Scanning equipment and parameters: Imaging was performed using an X-ray computed tomography (CT) scanner (SOMATOM go.Fit; Siemens Healthineers, Shanghai, China).
Imaging parameters: Tube voltage (kVp): 110; Slice thickness: 1 mm; Pixel spacing: [0.78471875, 0.78471875]; Image dimensions: .
Sedation and anesthesia protocol: Initial sedation was achieved by intramuscular injection of Zoletil (3 mg/kg) and Atropine Sulfate (0.08 mg/kg). Once the animal was calm, the surgical area was prepared and an intravenous catheter was placed in the marginal ear vein for slow infusion of physiological saline and propofol until the animal reached a state suitable for endotracheal intubation. Depending on the animal’s physiological condition, vasoactive drugs such as epinephrine, norepinephrine, or dopamine were administered as needed to maintain stable physiological parameters. After successful induction and intubation, mechanical ventilation was maintained with set respiratory parameters under general anesthesia. Intraoperative electrocardiographic monitoring was conducted via limb leads, invasive arterial blood pressure was monitored through an arterial line placed via central venous catheterization, and oxygen saturation was continuously measured using a tongue pulse oximeter.
Ventilator: WATO EX-20VET (Shenzhen Mindray Bio-medical Electronics Co., Ltd., Shenzhen, China).
Monitoring equipment: Veterinary monitor model im8vet (EDAN Co., Ltd., Shenzhen, China), used for real-time monitoring of vital signs, oxygen saturation, electrocardiography, and other key physiological metrics.
4.2. Data Preprocessing
During data preprocessing, three cropping strategies were adopted to enhance the diversity of the training samples. First, sliding-window cropping was applied around the trachea region to ensure adequate coverage of the target structure. Second, based on the annotated airway diameters, small airway regions were identified and subjected to additional cropping, thereby increasing the proportion of small-airway samples and improving the model’s sensitivity to such structures. Finally, random cropping was performed over the entire volume to introduce further variability. These three cropping strategies were integrated in the training set with a ratio of 7:2:1.
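The 7:2:1 mixture of cropping strategies can be sketched as a patch budget split across the three strategies. The largest-remainder allocation, the helper names `allocate_patches` and `random_crop_origin`, and the volume and patch shapes in the usage lines are assumptions for illustration; the source states only the ratio.

```python
import random

def allocate_patches(num_patches, ratio=(7, 2, 1)):
    """Split a patch budget across the strategies (trachea sliding window, small airway, random)."""
    total = sum(ratio)
    counts = [num_patches * r // total for r in ratio]
    # Hand out leftover patches in order of largest fractional remainder.
    order = sorted(range(len(ratio)),
                   key=lambda i: (num_patches * ratio[i]) % total,
                   reverse=True)
    for i in order[: num_patches - sum(counts)]:
        counts[i] += 1
    return counts

def random_crop_origin(volume_shape, patch_shape, rng):
    """Corner index of a random patch that lies fully inside the volume."""
    return tuple(rng.randint(0, v - p) for v, p in zip(volume_shape, patch_shape))

counts = allocate_patches(100)
rng = random.Random(0)
origin = random_crop_origin((300, 512, 512), (128, 128, 128), rng)
print(counts)
```

With a budget of 100 patches this yields 70 sliding-window, 20 small-airway, and 10 random crops.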
To comprehensively evaluate the segmentation performance, the following metrics were adopted:
$$\mathrm{Dice} = \frac{2\,|P \cap G|}{|P| + |G|}$$
where $P$ denotes the predicted segmentation, and $G$ denotes the ground truth. A higher Dice score indicates a larger overlap between the prediction and the ground truth.
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
where $TP$ and $FN$ represent the number of true positives and false negatives, respectively. Sensitivity measures the proportion of correctly detected airway voxels relative to all ground-truth airway voxels.
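As a small worked example, the two overlap metrics can be computed from flattened binary masks. The function names are illustrative, and returning 1.0 when both masks are empty is an assumption of this sketch rather than something the text specifies.

```python
def confusion_counts(pred, gt):
    """Count true positives, false positives, and false negatives over paired voxels."""
    tp = sum(1 for p, g in zip(pred, gt) if p and g)
    fp = sum(1 for p, g in zip(pred, gt) if p and not g)
    fn = sum(1 for p, g in zip(pred, gt) if not p and g)
    return tp, fp, fn

def dice_score(pred, gt):
    """Dice = 2|P ∩ G| / (|P| + |G|), written in terms of TP, FP, and FN."""
    tp, fp, fn = confusion_counts(pred, gt)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 1.0

def sensitivity(pred, gt):
    """Sensitivity = TP / (TP + FN)."""
    tp, _, fn = confusion_counts(pred, gt)
    return tp / (tp + fn) if (tp + fn) else 1.0

pred = [1, 1, 1, 0, 0, 0]
gt = [1, 1, 0, 1, 0, 0]
print(dice_score(pred, gt), sensitivity(pred, gt))
```

Here TP = 2, FP = 1, and FN = 1, so both Dice (4/6) and sensitivity (2/3) equal two thirds.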
where
and
denote the coordinates of the
i-th predicted and ground-truth terminal points, respectively, and
N is the total number of terminal points.
where
denotes the
j-th sampled point on the predicted branches,
denotes sampled points on the ground-truth branches, and
M is the total number of sampled points. In addition, TD measures the proportion of the predicted airway centerline length that is correctly detected relative to the total centerline length of the ground truth airway tree, and BD measures the proportion of correctly detected airway branches relative to the total number of branches in the ground truth. A predicted branch is considered correctly detected only if at least 80% of its centerline voxels lie within the corresponding ground truth branch [
6].
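A minimal sketch of TD and BD follows. It assumes that the ground-truth centerline has already been extracted and labeled per branch, that centerline length is approximated by the number of centerline voxels, and that coverage is evaluated along the ground-truth centerline against the predicted mask. The data layout and the name `tree_and_branch_detection` are hypothetical.

```python
def tree_and_branch_detection(gt_branches, pred_voxels, branch_threshold=0.8):
    """Return (TD, BD).

    gt_branches: dict mapping branch id -> list of centerline voxel coordinates (z, y, x)
    pred_voxels: set of foreground voxel coordinates in the predicted mask
    """
    total_len = 0
    detected_len = 0
    detected_branches = 0
    for voxels in gt_branches.values():
        hits = sum(1 for v in voxels if v in pred_voxels)
        total_len += len(voxels)
        detected_len += hits
        # A branch counts as detected when enough of its centerline is covered.
        if voxels and hits / len(voxels) >= branch_threshold:
            detected_branches += 1
    if total_len == 0:
        raise ValueError("ground-truth centerline is empty")
    return detected_len / total_len, detected_branches / len(gt_branches)

gt_branches = {
    "trachea": [(z, 0, 0) for z in range(10)],
    "left": [(10, y, 0) for y in range(1, 6)],
    "right": [(10, 0, x) for x in range(1, 6)],
}
# Prediction covers the trachea, 4 of 5 "left" voxels, and 1 of 5 "right" voxels.
pred_voxels = set(gt_branches["trachea"]) | set(gt_branches["left"][:4]) | {gt_branches["right"][0]}
td, bd = tree_and_branch_detection(gt_branches, pred_voxels)
print(td, bd)
```

In this toy tree 15 of 20 centerline voxels are covered (TD = 0.75), and two of three branches reach the 80% threshold (BD = 2/3).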
where
denotes the total length of non-anatomical false branches predicted by the model. This metric reflects the proportion of spurious airway connections in the segmentation results. Leakage refers to the erroneous extension of the segmentation into adjacent non-airway regions where airway boundaries are blurred, producing predicted airways that do not reflect true anatomical structures and reducing overall segmentation accuracy. This phenomenon is particularly common in low-contrast peripheral bronchi. Clinically, such leakage can negatively impact airway structure visualization, navigation, and quantitative analysis, thereby reducing the reliability of automated airway models in diagnostic and intraoperative planning applications [
6].
4.3. Comparative Evaluation with State-of-the-Art Models
To validate the effectiveness of the proposed AirwaySeekNet model for lung airway segmentation, we compared it with several state-of-the-art 3D medical image segmentation models on the same test set, including WingsNet [
12], UNet3D [
16], VNet3D [
17], VoxResNet [
42], AttentionUNet [
43], CoTAttentionUNet3D [
44], FuzzyAttentionUNet3D [
33], and DSCNet [
38]. The evaluation metrics covered Dice, sensitivity, TD, BD, and leakages. The experimental results are presented in
Table 1. Compared with DSCNet, the proposed model increased the TD metric by 5.55% and the BD metric by 8.14%. The qualitative segmentation results for each model are illustrated in
Figure 3.
To further assess the generalization ability and effectiveness of the proposed method, we conducted comparative experiments on the publicly available Binary Airway Segmentation (BAS) dataset [
25]. Our method was quantitatively compared with several representative approaches, including those of Juarez et al. [
45], AirwayNet [
19], FRNet [
26], and WingsNet [
12]. The BAS dataset comprises 90 CT cases with pixel spacing and slice thickness both below 1 mm, of which 70 cases are from the LIDC [
46] dataset and 20 cases are from the training set of the EXACT’09 Challenge [
13]. The data were randomly split into training, validation, and test sets at a ratio of 5:2:2.
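For the 90 BAS cases, a 5:2:2 ratio corresponds to 50 training, 20 validation, and 20 test cases. A seeded random split along these lines could look as follows; the seed, the case identifiers, and the function name `split_cases` are assumptions for illustration only.

```python
import random

def split_cases(case_ids, ratio=(5, 2, 2), seed=0):
    """Shuffle case ids reproducibly and cut them into train/val/test by the given ratio."""
    ids = list(case_ids)
    random.Random(seed).shuffle(ids)
    total = sum(ratio)
    n_train = len(ids) * ratio[0] // total
    n_val = len(ids) * ratio[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to the test set
    return train, val, test

train, val, test = split_cases(range(90))
print(len(train), len(val), len(test))
```

Fixing the seed keeps the partition identical across runs, which matters when several methods are compared on the same test cases.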
The results in
Table 2 indicate that the proposed method achieves a sensitivity of 93.79%, demonstrating its improved ability to detect airway structures, especially small and peripheral branches. For the topology-oriented metrics reflecting airway tree completeness and connectivity, the tree length detected rate (TD) and branch detected rate (BD) of the proposed method reach 91.97% and 89.88%, respectively, outperforming the other four methods. Although the Dice score is comparable to that of the other approaches, the improvements in sensitivity, TD, and BD indicate that the proposed method provides stable and reliable segmentation performance, supporting its effectiveness and robustness across different data distributions. In addition, the segmentation module achieved an average inference time of 4.2842 s per CT volume.
As shown in the results, most methods perform similarly in terms of Dice and sensitivity. However, AirwaySeekNet outperforms all comparative models in TD and BD, indicating its superiority in capturing complex bronchial bifurcations and maintaining topological connectivity. At the same time, AirwaySeekNet exhibits relatively higher leakages, reflecting the model’s tendency to explore smaller airways, which may lead to some false positives. This is consistent with the presence of mislabeling in the small-branch regions of the dataset, where some predicted airways may be absent from the annotations. Overall, AirwaySeekNet keeps the Dice score at a comparable level while substantially improving topological and branching preservation, making it more suitable for clinical applications.
4.4. Hardware and Software Configuration
A consistent hardware and software environment was maintained throughout all experiments, including the same type of GPU (NVIDIA A100 GPU with 80 GB memory), the same version of the deep learning framework (PyTorch v2.4.0), and the same operating system version (Ubuntu 24.04). Visualization and annotation were carried out on a workstation running Windows 11 with an Intel Core Ultra 7 155H processor and 32 GB of RAM. The Visualization Toolkit (VTK) was employed for three-dimensional rendering and visualization, and the Medical Imaging Interaction Toolkit (MITK) was used for loading, viewing, and annotating medical images.
4.5. Tubular Constraints Based on SDF Loss
To further enhance the structural consistency of small airway regions, we incorporated SDF loss into the Weighted Entropy Select (WES) loss framework. The SDF loss applies topological constraints to the predicted results through the distance field, better preserving the tubular structure. The loss function includes two hyperparameters to balance the weight of different constraint terms.
The experimental results are presented in
Table 3. After introducing the SDF loss (which is governed by two hyperparameters: a parameter within the loss itself and the weight assigned to the SDF loss), the model achieved an improvement of 3.64% in TD and 6.21% in BD, while the Dice score remained essentially stable. The segmentation results became more consistent with the true tubular morphology of the airways. When the weight of the SDF loss was further increased, TD and BD reached their optimal values, with additional improvements of 4.20% and 5.36%, respectively. However, the leakages also rose sharply, by 6.06% compared with the results before introducing the SDF loss. These findings indicate that the SDF loss effectively enhances the structural preservation of fine bronchial branches, although an excessively strong constraint may introduce more false positives. Therefore, the appropriate tuning of hyperparameters is essential to achieve an optimal balance between topology preservation and leakages.
4.6. Dual Decoder Architecture and WES Loss Introduction Strategies
AirwaySeekNet adopts a dual-decoder architecture. Initially, we followed the WingsNet design, but its use of a conventional convolutional decoder proved inadequate for the complex, highly variable tubular geometry encountered in lung airway segmentation, thereby limiting the exploration and recognition of small airways. To remedy this, we replaced one of the decoders with a Snake Convolution decoder to strengthen the network’s capacity for modeling tubular features. During training, we investigated two annealing strategies: inter-epoch annealing, which operates across epochs, and inner-epoch annealing, which operates within the iterations of an epoch. Both strategies use a dynamic threshold as the proportional parameter for annealing. This threshold dynamically modulates the learning trajectory, helping to avoid entrapment in local optima and thereby facilitating more effective model convergence. The experimental results are shown in
Table 4.
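The difference between the two annealing strategies can be sketched as two ways of driving the same schedule. The source does not give the schedule's functional form, so the linear interpolation, the `start` and `end` values, and the function names below are purely hypothetical.

```python
def linear_anneal(progress, start=1.0, end=0.3):
    """Interpolate the threshold from `start` (progress 0) to `end` (progress 1)."""
    progress = min(max(progress, 0.0), 1.0)
    return (1.0 - progress) * start + progress * end

def inter_epoch_threshold(epoch, num_epochs, start=1.0, end=0.3):
    """Inter-epoch annealing: the threshold changes once per epoch, across the whole run."""
    return linear_anneal(epoch / max(1, num_epochs - 1), start, end)

def inner_epoch_threshold(iteration, iters_per_epoch, start=1.0, end=0.3):
    """Inner-epoch annealing: the threshold changes every iteration and restarts each epoch."""
    return linear_anneal(iteration / max(1, iters_per_epoch - 1), start, end)

# Threshold at the first, middle, and last epoch of an 11-epoch run.
print([round(inter_epoch_threshold(e, 11), 2) for e in (0, 5, 10)])
```

Under this reading, inter-epoch annealing holds the threshold fixed for all iterations of an epoch, whereas inner-epoch annealing sweeps it within every epoch.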
As shown in the table, under the WingsNet dual-decoder structure, the model trained with the epoch strategy achieved Dice, TD, BD, and leakages of 95.90%, 85.32%, 80.48%, and 3.61%, respectively. In contrast, the inter strategy achieved 95.91%, 86.95%, 82.11%, and 3.45%, respectively. These results indicate that introducing WES loss using the inter strategy leads to higher TD and BD values and lower leakages compared with the epoch strategy. After incorporating the Snake Convolution, the model using the epoch strategy achieved 94.87%, 90.44%, 88.14%, and 7.91% for the Dice, TD, BD, and leakages, respectively, while the inter strategy yielded 95.04%, 91.37%, 88.69%, and 7.89%. The TD and BD further improved, whereas the Dice slightly decreased, and the leakages increased noticeably compared with the model without Snake Convolution. This suggests that the Snake Convolution enhances the model’s ability to capture fine bronchial branches but also introduces more false positives. Moreover, we observed that the introduction of WES loss not only achieves superior performance but also leads to faster convergence, demonstrating that this strategy provides a more advantageous training approach in practice.
4.7. Summary
Based on the above experimental results, the following conclusions can be drawn. Overall, AirwaySeekNet outperforms existing methods in terms of topological and branching preservation, enabling a more complete recovery of lung airway structures. The introduction of SDF loss further constrains the model’s structural consistency, aligning the results more closely with tubular features, although a reasonable trade-off between topology preservation and false positives must be made when tuning hyperparameters. The dual-decoder architecture combined with Snake Convolution strengthens the model’s ability to capture tubular features, while introducing the WES loss with the inter strategy offers better performance and faster convergence than introducing it with the epoch strategy. In summary, AirwaySeekNet enhances topology preservation while maintaining stable Dice performance, demonstrating its advantages and potential for lung airway segmentation tasks.
5. Discussion and Conclusions
In conclusion, this work presents AirwaySeekNet, a comprehensive solution for fine-grained airway segmentation and completion that specifically targets the distal bronchi beyond the eighth generation. The proposed framework is built on a dual-decoder architecture with a dedicated explicit segmentation branch, yielding an airway model with high segmentation fidelity and full anatomical continuity. The key to this design is the integration of Voxel Selective Supervision, a dynamic, reliability-aware training strategy that addresses class imbalance and incomplete annotations by gradually focusing the learning process on uncertain, hard-to-segment voxels. This mechanism effectively guides the network to “actively explore” potential airway branches that might be missing in the ground truth, without being misled by early false positives. Together, the dual-decoder architecture and VSS strategy constitute the core technical contributions of AirwaySeekNet, collectively geared toward capturing the smallest airway structures with high confidence.
Advancing Topological Continuity in Airway Segmentation: Topology preservation is critically important in airway segmentation due to the complex tree structure of the bronchial network and its relevance in clinical applications. Unlike voxel-level overlap metrics, topological connectivity directly reflects whether the segmented airway tree is continuously reconstructed, which is essential for clinical tasks such as bronchoscopy navigation and quantitative airway assessment. Public benchmarks, such as the Multi-site, Multi-domain Airway Tree Modeling (ATM’22) challenge, establish the tree length detected rate (TD) and branch detected rate (BD) as key metrics for evaluating the completeness and connectivity of airway segmentation, emphasizing the need to maintain structural continuity across different data sources and scanning protocols [
6].
The experimental results show that AirwaySeekNet substantially improves both the tree length detected rate (TD) and the branch detected rate (BD) compared with existing approaches, indicating its enhanced ability to reconstruct a more complete airway network. Importantly, these topological gains do not come at the expense of voxel overlap accuracy: the Dice similarity coefficients remain comparable to those of other methods, as shown in Table 1. This balance is significant in practical terms, demonstrating that the model can extend segmentation into peripheral branches while preserving the structural integrity of central airway segments.
While a modest increase in leakage is observed during topology enhancement, this behavior primarily reflects the network’s aggressive exploration of fine or previously unannotated airway segments. By employing dynamic supervision and carefully tuned loss weighting, AirwaySeekNet achieves a favorable trade-off between retrieving true airway structures and limiting excessive over-segmentation.
Future work: We will focus on integrating segmentation and shape-based optimization within a unified end-to-end training and inference framework to further enhance topological continuity and overall robustness. Additionally, although the current method has been validated on porcine airway data as a preparatory step for surgical robot integration, the next stage involves full deployment on the robotic platform, integration with surgical control software, and in vivo animal experiments to further assess performance in real interventional contexts.
Under the current hardware configuration, the model’s inference time is satisfactory and suitable for preoperative segmentation, as there is sufficient time to complete segmentation once CT images are obtained. An AI workstation equipped with an NVIDIA GeForce RTX 4090 has been deployed alongside our interventional surgical robot system, demonstrating technical feasibility. However, given the high cost and limited suitability of the RTX 4090 for embedded deployment, we will explore more optimized hardware integration solutions. To improve applicability in low-resource environments, we have already carried out research on model inference with lower-performance GPUs and CPUs toward more lightweight and practical implementations [
47,
48].