1. Introduction
Abdominal aortic aneurysms (AAAs) are defined as abnormal dilations of the abdominal aorta measuring 3.0 cm or more in diameter. This condition arises when the aortic wall weakens, leading to a localized bulging that can progressively enlarge over time. The prevalence of AAAs increases with age, particularly affecting individuals over 60 years old, and is more common in men and those with a history of smoking [
1].
The risk of aneurysm rupture increases with size, but even aneurysms below the accepted clinical intervention cut-off of 5.5 cm can pose a significant danger. Aneurysms between 4.5 and 4.9 cm have shown an average expansion rate of 0.7 cm per year, and ruptures have been reported in patients with aneurysms as small as 5.0 to 5.6 cm [
2]. These findings highlight the importance of close monitoring and early intervention, even for aneurysms below the traditional surgical threshold. Traditionally, the maximum transverse diameter (MTD) has been the standard measurement for evaluating AAAs [
3,
4]. However, this linear measurement may not fully capture the complex three-dimensional nature of aneurysms. Relying on a single diameter measurement can overlook shape irregularities, aneurysm length/tortuosity, and other geometric factors. In fact, diameter is considered a “rough and ready” measure and shows poor ability to detect certain shape changes or small growth increments, underscoring its limitations in reflecting the aneurysm’s true three-dimensional expansion [
4,
5]. In addition to diameter, advanced biomechanical criteria, such as stress and strain analysis, have been proposed to enhance AAA evaluation and rupture risk assessment. Peak wall stress (PWS), a biomechanical parameter, quantifies the maximum stress on the aneurysm wall under blood pressure, and studies show that it is a more reliable rupture risk indicator than diameter, with higher PWS observed in ruptured or symptomatic AAAs compared to electively repaired ones [
6,
7]. Finite element analysis (FEA) uses patient-specific 3D geometry from CT scans, incorporating wall thickness, material properties, and pressure to compute stress distributions, revealing high-stress regions prone to rupture [
6,
8]. Strain analysis, measuring deformation, complements this by indicating how the wall stretches under load, with 3D ultrasound and CT-based methods tracking wall motion across the cardiac cycle to produce strain maps that highlight areas of excessive deformation, signaling rupture risk [
9]. The rupture potential index (RPI), a ratio of wall stress to wall strength, integrates 3D geometry from CT scans, FEA-derived stress, and estimated wall strength, showing elevated values in symptomatic and ruptured AAAs, offering a nuanced prediction of rupture risk [
6,
10]. However, these methods face challenges in clinical adoption, requiring accurate 3D reconstructions, precise wall thickness measurements (limited by CT resolution), and patient-specific material properties, which are hard to obtain non-invasively [
8].
Recent studies suggest that volumetric assessments provide a more comprehensive evaluation of aneurysm morphology and may better predict clinical outcomes. For example, a systematic review indicated that volume measurements could offer a more accurate assessment of aneurysm growth compared to diameter measurements [
4]. Large-scale analyses from the M2S database reveal that post-repair sac enlargement occurs in up to 40% of thoracic and abdominal aneurysms, a risk strongly tied to preoperative anatomy that diameter measurements often underestimate [
11]. By accounting for the entire aneurysm sac, volume captures changes along the length and width of the aneurysm that a single diameter cannot. Accordingly, adding volume measurements to diameter has been shown to yield additional information on AAA growth, improving the characterization of expansion over time [
3,
4]. Despite these findings, volumetric analysis has not been widely adopted in clinical practice, partly due to the lack of standardized measurement protocols and the labor-intensive nature of manual volume calculations. Measuring volume often requires manual or semi-automated image segmentation, which demands specialized software and expertise and can take significant time, making it impractical for busy clinics [
4,
12].
Advancements in CT technology (e.g., helical CT and multi-slice scanners) have made imaging both fast and high-resolution. Helical CT permits much quicker scan times (on the order of seconds, often in a single breath-hold) than older CT modalities, virtually eliminating motion artifacts. These CTA techniques provide a combination of speed and detail, producing rapid, high-definition images of the AAA and its surrounding structures. Such 3D CTA images clearly show aneurysm size, shape, and involvement of branch vessels, offering an unprecedented anatomical overview compared to traditional 2D angiography. Notably, CTA is also less invasive than catheter angiography and simultaneously allows for evaluation of other abdominal pathology during the same scan [
13]. However, the interpretation of these images often relies on manual analysis by radiologists, which is time-consuming and subject to inter-observer variability.
To address these challenges, there is growing interest in developing automated methods for aneurysm detection and measurement. Artificial intelligence (AI) and deep learning algorithms have shown promise in healthcare, particularly for accurately analyzing vascular structures [
14,
15,
16,
17,
18,
19,
20,
21]. For instance, recent research demonstrated the use of deep learning techniques to automate the measurement of vascular calcifications, highlighting the potential applicability of such methods to AAA assessment [
22,
23,
24]. Automated approaches can potentially reduce the workload of clinicians, minimize human error, and provide consistent measurements, thereby enhancing the reliability of aneurysm evaluation.
Several recent studies have successfully applied deep learning to the segmentation and volume estimation of abdominal aortic aneurysms, reporting high spatial and volumetric accuracy. For example, the PRAEVAorta framework [25] achieved a Dice score of approximately 0.95 and a Pearson correlation coefficient of 0.90 using a dataset of 100 annotated cases (13,465 slices). Similarly, the authors of [26] developed a 3D U-Net model using 78 annotated samples and reported Dice scores of 0.87 and relative volume errors of approximately 8.6% when compared with manual ground truth. Most recently, a 2024 study [
27] applying nnU-Net to EVAR follow-up data achieved Dice scores up to 0.97 using 220 scans with manual segmentations. Despite these promising results, a major limitation across these studies is their reliance on extensive pixel-level annotations or fully labeled 3D segmentations. These annotations are time-consuming and labor-intensive to create, posing challenges for scalability in real-world clinical workflows. In contrast, our study proposes a fully automated approach for aneurysm boundary detection and volume estimation, without requiring manual segmentations for training. This design significantly reduces the annotation burden while maintaining competitive performance. A broader challenge in the field is the scarcity of large, annotated datasets critical for training robust AI models. This lack of data impedes progress and adoption of AI solutions. Our study addresses this challenge by introducing an alternative that requires no manual segmentations, making it broadly applicable to medical imaging tasks where annotated data are limited or unavailable.
In summary, while traditional aneurysm assessment has relied heavily on manual measurement of the maximum diameter, emerging evidence suggests that volumetric analysis can complement it and further improve the evaluation of aneurysm size and growth. The integration of advanced imaging techniques and automated measurement tools holds promise for improving the detection, monitoring, and treatment planning of AAAs. However, the successful implementation of these technologies in clinical settings will require overcoming challenges related to data availability, standardization of measurement protocols, and validation of AI algorithms. The resources for this paper can be found at
https://github.com/pip-alireza/automated_aneurysm_analysis/ (accessed on 14 May 2025).
2. Materials and Methods
2.1. Data Description and Annotation
The dataset used in this study consists of de-identified computed tomographic angiography (CTA) scans from 60 patients with AAAs, obtained from the M2S Vascular Imaging Database (West Lebanon, NH, USA). M2S provides imaging analysis services and serves as the core lab for multiple aneurysm research studies [
11,
28]. For this work, they supplied fully de-identified DICOM series, including both preoperative and postoperative scans. All study design, analysis, and interpretation were conducted independently of M2S.
The dataset is predominantly composed of male patients (95%), with ages ranging from 55 to 90 years, and a majority around the 65–75-year range. The scans were acquired using systems from multiple manufacturers, primarily GE Medical System (Chicago, IL, USA), Toshiba (Tokyo, Japan), and Siemens (Munich, Germany), which contributed the majority of cases. The slice thickness varied across the dataset, with most values between 0.5 mm and 3 mm, and pixel spacing was mostly within the 0.6–0.9 mm range.
To generate ground truth for aneurysm boundaries, patient scans were manually annotated by an expert in arterial mechanics. The manual annotation process involved identifying the specific slice indices corresponding to the onset and conclusion of the aneurysm, based on morphological changes, primarily variations in cross-sectional diameter, as shown with number 1 and 2 in
Figure 1.
2.2. Data Preprocessing
CTA images were first converted from DICOM to TIFF files and normalized to an 8-bit grayscale format using linear contrast stretching (range: 10–245) and NumPy library version 1.24.13. To be compatible with segmentation models, grayscale TIFF images were duplicated across three channels. For the LSTM model, the segmented pixel counts per slice (output from SAM2) were extracted and formatted into 1D sequences per patient scan. This sequence of information can be used for visual inspection of the aneurysm and as the primary information to be used for the development of an aneurysm detection method.
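The normalization and channel-duplication steps described above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, the 10–245 range is assumed to refer to the output gray levels, and the stretch is assumed to use each slice's own minimum and maximum. DICOM reading and TIFF writing are omitted.

```python
import numpy as np


def stretch_to_uint8(slice_values, out_min=10, out_max=245):
    """Linearly stretch one CT slice into [out_min, out_max] as 8-bit gray.

    Assumption: the slice's own min/max define the input range; the text
    does not state which input window was used.
    """
    data = np.asarray(slice_values, dtype=np.float64)
    lo, hi = data.min(), data.max()
    if hi == lo:
        # Flat image: map everything to the lower bound to avoid dividing by zero.
        return np.full(data.shape, out_min, dtype=np.uint8)
    scaled = (data - lo) / (hi - lo)                  # values now in [0, 1]
    stretched = out_min + scaled * (out_max - out_min)
    return np.round(stretched).astype(np.uint8)


def to_three_channels(gray):
    """Duplicate a 2D grayscale image across three channels, giving (H, W, 3)."""
    return np.stack([gray, gray, gray], axis=-1)


# Synthetic intensities standing in for one CT slice.
demo = np.array([[-1000.0, 0.0], [500.0, 1000.0]])
gray8 = stretch_to_uint8(demo)
rgb = to_three_channels(gray8)
```

Stretching into a range slightly inside 0–255 keeps the extreme intensities away from the saturation limits of the 8-bit format, which is one plausible reason for the 10–245 bounds.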
2.3. Deep Learning Approach for Automated Aneurysm Analysis
We pursued two complementary strategies for aneurysm detection. The first relied on transfer learning from our earlier arterial-system segmentation studies [
18,
19,
22]. By re-deploying a UNet trained solely on normal anatomy, we could screen new scans that lack aneurysm annotations: In normal slices the model segments the vessel as expected, but it stops segmenting whenever the aortic wall becomes irregular, effectively flagging those gaps as potential disease. Interpreting this absence of segmentation as an aneurysm cue, and by applying simple rule-based logic, we could predict the presence of the aneurysm, although this transfer-learning approach does not provide volume measurements.
The second and main strategy consisted of utilizing an ensemble of tools or multi-system approach. This approach utilized the Segment Anything Model 2 (SAM2) developed by Meta (Menlo Park, CA, USA) [
29] in conjunction with our previous UNet model [
19,
22]. This approach starts with UNet providing the initial segmentation of the aorta, then SAM2 tracks the vessel through every slice, even across irregular regions. Boundaries are identified either by a rule-based expert system or by a bidirectional LSTM that learns temporal patterns in the pixel-count signal. Once boundaries are set, slice integration provides aneurysm volume, and linear interpolation between boundary slices estimates the normal baseline for enlargement quantification. In testing, the expert-system variant delivered the most accurate volumes and a strong Dice overlap, while the LSTM version achieved slightly lower but still solid agreement. Compared with the transfer-learning screen, this pipeline adds boundary placement and volumetric measurements. The architecture comprises the following components:
Aorta Localization (UNet):
A pre-trained UNet model with an encoder–decoder structure and a ResNet-34 backbone segments and localizes the aorta from CTA images [
22,
30]. Despite its limitations when encountering pathological variations like AAA, the UNet reliably identifies normal regions of the thoracic aorta, providing initial slice numbers and center coordinates that serve as input prompts for subsequent models.
Aorta Tracking (SAM2):
The Segment Anything Model 2 (SAM2) employs a transformer-based streaming memory mechanism to track the aorta continuously from the prompted region down to the iliac bifurcation. It begins with a prompt provided by the UNet model, which identifies a region of interest within the aorta. SAM2 then refines this segmentation by searching for regions exhibiting similar characteristics to the prompted area, effectively accounting for morphological changes along the vessel. The model outputs both segmentation masks and the count of segmented pixels per slice, facilitating detailed analysis of the aorta’s structure.
Aneurysm Boundary Identification:
Two complementary approaches are used:
LSTM-Based Aneurysm Detection: To identify the beginning and end points of the aneurysm within the aorta, we developed a Long Short-Term Memory (LSTM) model that analyzes sequential changes in aortic cross-sectional area. From the SAM2 outputs, we collected the number of segmented pixels per slice, effectively forming a 1D sequence representing area changes across axial slices. The model architecture comprises:
Two stacked bidirectional LSTM layers, each with 600 hidden units, designed to capture forward and backward temporal dependencies in the sequence.
A series of fully connected layers with ReLU activation, mapping LSTM outputs to slice-wise classification scores across 200 output neurons, covering the region from the thoracic aorta to the iliac bifurcation.
A sigmoid-activated output layer, producing per-slice probabilities between 0 and 1.
A fixed threshold to classify each slice as aneurysmal or normal, enabling start and end slice identification.
Expert Rule-Based System: This component applies a sliding-window analysis to the SAM2 output, computing the average segmented pixel count over a fixed number of preceding slices. If a slice’s value consistently exceeds 120% (for the start) or drops below 80% (for the end) of the running average across the window, it is flagged as an anomaly corresponding to an aneurysm boundary. We performed a grid search to optimize the window size and the upper and lower threshold parameters for best performance.
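A minimal sketch of the sliding-window rule is given below. It is one possible reading of the description rather than the study's implementation: the function name, the default window of four slices, and the choice to require every slice in the look-ahead window to cross the threshold are assumptions made for illustration.

```python
def detect_boundaries(counts, window=4, upper=1.2, lower=0.8):
    """Return (start, end) slice indices of the enlarged region, or None.

    counts: per-slice segmented pixel counts (for example, from SAM2).
    The baseline is the mean of the `window` slices preceding slice i.
    Assumption: a boundary is accepted only if all `window` slices from i
    onward cross the threshold relative to that baseline.
    """
    start = None
    for i in range(window, len(counts) - window + 1):
        baseline = sum(counts[i - window:i]) / window
        ahead = counts[i:i + window]
        if start is None:
            # Sustained rise above 120% of the running average marks the start.
            if all(c > upper * baseline for c in ahead):
                start = i
        elif all(c < lower * baseline for c in ahead):
            # Sustained drop below 80% marks the end; i - 1 is the last enlarged slice.
            return start, i - 1
    return None


# Synthetic profile: normal aorta, an enlarged segment, then normal again.
profile = [100] * 10 + [200] * 10 + [100] * 10
boundaries = detect_boundaries(profile)   # (10, 19)
```

In a grid search such as the one mentioned above, `window`, `upper`, and `lower` would be the parameters being swept.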
2.4. Evaluation Metrics
Because no single statistic captures every aspect of model performance, we evaluated our pipelines with a suite of complementary measures. The UNet and SAM2 components were adopted exactly as trained in prior work, UNet on non-aneurysmal aorta [
19,
22] and SAM2 as distributed by Meta [
29], whereas the LSTM that refines aneurysm boundaries was trained from scratch on our dataset. Five-fold cross-validation ensured generalizability: in each fold, 80% of the patients were used to train the LSTM and 20% to test it, with strict patient-level separation to avoid data leakage. Training relied on the Adam optimizer, learning-rate 0.0003, batch-size 10, and binary-cross-entropy loss combined with a Jaccard term; early stopping on validation loss curtailed over-fitting.
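The patient-level separation used in the five-fold cross-validation can be illustrated with a short sketch. The function name, the fixed seed, and the patient identifiers are hypothetical; the only property it is meant to show is that no patient appears in both the training and the test set of a fold.

```python
import random


def patient_level_folds(patient_ids, n_folds=5, seed=0):
    """Yield (train_ids, test_ids) pairs with no patient shared between them."""
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    rng.shuffle(unique_ids)
    # Deal the shuffled patients into n_folds groups of near-equal size.
    folds = [unique_ids[k::n_folds] for k in range(n_folds)]
    for k in range(n_folds):
        test_ids = folds[k]
        train_ids = [pid for j, fold in enumerate(folds) if j != k for pid in fold]
        yield train_ids, test_ids


# 60 hypothetical patient identifiers, matching the cohort size in the text.
splits = list(patient_level_folds([f"patient_{i:02d}" for i in range(60)]))
```

With 60 patients this yields 48 training and 12 test patients per fold, which corresponds to the 80/20 split described above.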
Overlap quality: The spatial accuracy of each predicted boundary was assessed using the Dice coefficient, as defined in Equation (1). The Dice score, also known as the F1 score, is computed by doubling the intersection of the predicted and ground truth masks and dividing by the sum of their areas. In this context, it evaluates how well the predicted boundary aligns with the annotated boundary. Specifically, the Dice score uses true positives ($TP$), representing the correctly predicted overlapping regions between the predicted and ground truth masks, false positives ($FP$), indicating regions incorrectly predicted as part of the boundary, and false negatives ($FN$), denoting regions missed by the prediction but present in the ground truth. The Dice score ranges from 0 (no overlap) to 1 (perfect match), providing a clear measure of how well the predicted mask aligns with the ground truth.
$$\mathrm{Dice} = \frac{2\,TP}{2\,TP + FP + FN} \qquad (1)$$
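As a worked illustration of the Dice definition, the sketch below computes the score for two slice ranges. It assumes that the overlap is evaluated on inclusive slice-index ranges (predicted versus annotated start and end); the function name and the example indices are hypothetical.

```python
def dice_for_slice_ranges(pred_start, pred_end, true_start, true_end):
    """Dice overlap between two inclusive slice-index ranges."""
    pred = set(range(pred_start, pred_end + 1))
    true = set(range(true_start, true_end + 1))
    tp = len(pred & true)   # slices in both the prediction and the annotation
    fp = len(pred - true)   # predicted slices outside the annotation
    fn = len(true - pred)   # annotated slices missed by the prediction
    denom = 2 * tp + fp + fn
    # The score is undefined when both ranges are empty; 0.0 is returned here.
    return 2 * tp / denom if denom else 0.0


# Predicted slices 12-30 versus annotated slices 10-29:
# TP = 18, FP = 1, FN = 2, so Dice = 36 / 39.
example_dice = dice_for_slice_ranges(12, 30, 10, 29)
```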
Boundary accuracy: Start and end slice predictions were compared with ground truth using the coefficient of determination ($R^2$), mean absolute error (MAE), and mean squared error (MSE) to evaluate per-case deviations. The $R^2$ metric, shown in Equation (2), represents the proportion of variance in the ground-truth values that is explained by the predicted values. It is calculated using the Residual Sum of Squares (RSS), which measures the sum of squared differences between the predicted and actual values, and the Total Sum of Squares (TSS), which quantifies the total variance in the ground truth data. A higher $R^2$ value indicates better predictive accuracy, with 1 representing a perfect fit and 0 indicating no explanatory power.
$$R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} \qquad (2)$$
In addition, Equation (3) defines MAE, which quantifies the average absolute difference between predicted and actual slice indices, and Equation (4) defines MSE, which captures the average squared difference between predicted and true values, penalizing larger errors more strongly:
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert \qquad (3)$$
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \qquad (4)$$
Lower values imply higher quantitative fidelity. In these equations, $y_i$ represents the true value for slice $i$, $\hat{y}_i$ represents the predicted value for slice $i$, and $n$ is the number of samples.
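The three boundary metrics can be computed directly from their definitions, as in the sketch below. The function name and the example slice indices are hypothetical, and the ground-truth values are assumed not to be all identical (otherwise TSS is zero and the coefficient of determination is undefined).

```python
def regression_metrics(y_true, y_pred):
    """Return R^2, MAE, and MSE for paired true and predicted values."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    rss = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))   # residual sum of squares
    tss = sum((t - mean_true) ** 2 for t in y_true)           # total sum of squares
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mse = rss / n
    return {"r2": 1.0 - rss / tss, "mae": mae, "mse": mse}


# Hypothetical annotated and predicted start slices for four scans.
true_starts = [40, 55, 62, 70]
pred_starts = [42, 50, 65, 70]
metrics = regression_metrics(true_starts, pred_starts)
# Absolute errors are 2, 5, 3, 0, so MAE = 2.5 and MSE = 38 / 4 = 9.5.
```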
Volumetric agreement: For cases where SAM2 produced a complete aneurysm mask, the aneurysm volume was computed by integrating the segmented pixel values across slices. Agreement with manual volume was assessed using the R2 metric.
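Training diagnostics: During optimization of the LSTM model, we monitored binary-cross-entropy and tracked the score on a held-out validation subset after each epoch. Once training and validation losses converged and further epochs failed to improve score, the model with the best score was frozen and evaluated on the untouched test folds. Binary cross entropy is expressed by Equation (5), where $y_i$ is the ground-truth label of slice $i$, $\hat{y}_i$ is the predicted probability for that slice, and $n$ is the number of slices.
$$\mathcal{L}_{\mathrm{BCE}} = -\frac{1}{n}\sum_{i=1}^{n}\left[y_i \log \hat{y}_i + (1 - y_i)\log\left(1 - \hat{y}_i\right)\right] \qquad (5)$$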
Volumetric agreement: We treat the sum of segmented pixels within the aneurysm boundaries as a proportional volume surrogate. If the predicted start and end slices are $s$ and $e$, and $N_i$ denotes the number of segmented pixels in slice $i$, the surrogate volume is computed by Equation (6).
$$V_{\mathrm{surrogate}} = \sum_{i=s}^{e} N_i \qquad (6)$$
This unit-less figure scales linearly with true physical volume. Although it does not yield an absolute volume in cubic millimeters and ignores wall thickness, it is sufficient for primary algorithm validation, allowing consistent comparison with pixel totals when aneurysm boundaries are identified manually. Agreement between automated and manual surrogate volumes is reported with $R^2$; a value close to 1 indicates that the model captures nearly all variability in the reference measurements despite the absence of geometric calibration. By combining Dice, MAE, MSE, and $R^2$ for both boundary indices and volumes, we obtained a comprehensive view of each pipeline’s ability to locate, delineate and quantify abdominal aortic aneurysms.
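The surrogate volume and its linear relation to physical volume can be sketched as follows. The conversion to cubic millimeters is not part of the study's evaluation; it is included only to illustrate the proportionality and assumes square pixels and a uniform slice thickness within a scan. Function names and the example values are hypothetical.

```python
def surrogate_volume(pixel_counts, start, end):
    """Sum segmented-pixel counts over the inclusive slice range [start, end]."""
    return sum(pixel_counts[start:end + 1])


def physical_volume_mm3(pixel_counts, start, end, pixel_spacing_mm, slice_thickness_mm):
    """Scale the surrogate by a constant voxel volume.

    Assumption: square pixels and uniform slice thickness, so every
    segmented pixel contributes the same volume.
    """
    voxel_mm3 = pixel_spacing_mm ** 2 * slice_thickness_mm
    return surrogate_volume(pixel_counts, start, end) * voxel_mm3


# Hypothetical per-slice counts with boundaries at slices 2 and 4.
counts = [300, 320, 600, 900, 850, 400, 310]
unitless = surrogate_volume(counts, 2, 4)                 # 600 + 900 + 850 = 2350
in_mm3 = physical_volume_mm3(counts, 2, 4, 0.8, 2.0)      # 2350 * 1.28 mm^3
```

Because the scaling factor is constant within a scan, comparing automated and manual pixel totals for the same scan is unaffected by it.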
2.5. Prior Work and Motivation
Our earlier work established the utility of deep learning for vascular analysis: a UNet-based system achieved an 83.4% Dice score for segmenting the arterial tree from the descending thoracic aorta to the knees, produced automated calcification scores that correlated highly with manual assessments,
= 0.978, and yielded a MAPE of just 9.5% [
22]; a subsequent transformer-based model, TransONet, raised segmentation performance to 93.5% Dice from the thoracic aorta to the iliac bifurcation and maintained 80.64% Dice down to the knees [
18]. These successes motivated us to pursue aneurysm detection and volume quantification using similar approaches.
3. Results and Discussion
3.1. Using UNet Failure for Aneurysm Boundary Identification
We began by applying our previously trained UNet model [
22], trained on normal vascular system data, to identify aneurysms indirectly via segmentation failure. The model failed to segment the aorta in regions with anomalies. In essence, encountering an “unseen” pathological structure led the model to produce no segmentations. However, we observed that the absence of segmentation in anomalous slices effectively acted as a binary marker distinguishing “normal” from “abnormal” vascular systems. We developed an expert rule that interprets segmentation failure as an indicator of pathology. By maintaining a rolling average of segmented pixel counts across four consecutive slices, we established a baseline for normal aortic segmentation. Any slice where the segmentation dropped below 50% or exceeded 140% of this baseline was flagged as anomalous. If these deviations persisted across four consecutive slices, the first flagged slice was marked as the start of the aneurysm, while the last was designated as the end of the aneurysm. These threshold parameters and window size were optimized through a grid search to achieve the best performance.
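The segmentation-failure rule described above might be sketched as follows. The function name is hypothetical, and two details are assumptions not stated in the text: the first `window` slices are taken as normal to seed the baseline, and only unflagged slices update the rolling average.

```python
def screen_for_anomaly(counts, window=4, low=0.5, high=1.4):
    """Return (start, end) of the first persistent anomalous run, or None.

    counts: per-slice segmented pixel counts from the UNet.
    A slice is flagged when its count is below `low` or above `high` times
    the rolling baseline; a run counts only if it spans at least `window`
    consecutive slices.
    """
    baseline_vals = list(counts[:window])   # assumed to be normal slices
    run = []
    for i in range(window, len(counts)):
        baseline = sum(baseline_vals) / window
        c = counts[i]
        if c < low * baseline or c > high * baseline:
            run.append(i)
            continue
        if len(run) >= window:
            return run[0], run[-1]
        run = []
        # Only normal slices update the rolling baseline.
        baseline_vals = baseline_vals[1:] + [c]
    if len(run) >= window:
        return run[0], run[-1]
    return None


# Synthetic example: the UNet returns no pixels for slices 8-13.
failed = [100] * 8 + [0] * 6 + [100] * 6
flagged_region = screen_for_anomaly(failed)   # (8, 13)
```

A scan would be classified as abnormal whenever the function returns a region, and as normal when it returns `None`; a two-slice dropout, for instance, is ignored because it does not persist across the window.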
We tested this rule-based approach on 33 patient scans: 16 non-aneurysm cases sourced from our earlier study [
22] and 17 cases with AAAs. The resulting confusion matrix is shown in
Table 1. The expert system correctly identified 14 out of 16 non-aneurysm cases and 15 out of 17 aneurysm cases, yielding an overall accuracy of 87.9%. Precision, recall, and F1-score were all 88.2%, while specificity was 87.5%. These results demonstrate that segmentation failure in a UNet trained only on normal anatomy can serve as a reliable binary classifier for vascular abnormalities.
Figure 2A shows how the UNet fails to segment the aneurysmal region, while
Figure 2B illustrates successful segmentation when SAM2 is used in conjunction with UNet.
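The screening metrics reported above follow directly from the confusion-matrix counts given in the text (15 of 17 aneurysm cases and 14 of 16 non-aneurysm cases classified correctly). The short sketch below reproduces them; the function name is illustrative.

```python
def binary_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "specificity": tn / (tn + fp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Aneurysm is the positive class: 15 detected, 2 missed; 14 normals correct, 2 false alarms.
screening = binary_metrics(tp=15, fp=2, tn=14, fn=2)
# accuracy = 29/33 (about 87.9%), precision = recall = F1 = 15/17 (about 88.2%),
# specificity = 14/16 = 87.5%.
```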
3.2. Aneurysm Localization and Segmentation
Despite its limitations, the UNet accurately localized normal aortic slices. We extracted the first slice with successful segmentation and established the center coordinates (x, y) of the aorta. These served as point prompts for the SAM2 model. Using point prompts from UNet, the SAM2 model tracked the aorta throughout the entire scan. It adapted its segmentation to changes in aortic shape, including aneurysmal regions. The result is a per-slice segmentation mask and a count of segmented pixels representing cross-sectional area.
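Deriving a point prompt from the UNet output could look like the sketch below. It assumes that the "center coordinates" are the centroid of the first non-empty mask and that masks are stored as a stack of 2D binary arrays; the function name is hypothetical, and the call that passes the prompt to SAM2 is deliberately not shown.

```python
import numpy as np


def first_prompt_from_masks(masks):
    """Return (slice_index, (x, y)) for the first slice with a non-empty mask.

    masks: array of shape (num_slices, height, width) with nonzero aorta pixels.
    Assumption: the prompt is the mask centroid, with x as the column
    coordinate and y as the row coordinate.
    """
    for index, mask in enumerate(masks):
        rows, cols = np.nonzero(mask)
        if rows.size:
            return index, (float(cols.mean()), float(rows.mean()))
    return None


# Synthetic stack: the second slice contains a 3x3 segmented block.
stack = np.zeros((3, 5, 5), dtype=np.uint8)
stack[1, 1:4, 2:5] = 1
prompt = first_prompt_from_masks(stack)   # (1, (3.0, 2.0))
```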
Figure 3 illustrates this full pipeline. Panel A shows the original CTA scan, while B displays the complete aorta mask generated by SAM2 after propagation over the input image. C shows the segmented mask from SAM2 output and finally, panel D highlights the aneurysm region using the manual annotation, with red indicating normal aorta segment and blue denoting the aneurysm segment. This visualization demonstrates how UNet and SAM2 operate in tandem: UNet anchors the aorta in normal regions, and SAM2 tracks it seamlessly across the aneurysm segments for full-volume segmentation.
Figure 4 provides a cross-sectional view at the aneurysm’s start and end boundary, further illustrating the model’s performance.
To enable full automation of aneurysm analysis, including detection, boundary localization, and volume estimation, we developed two downstream boundary detection strategies: a rule-based expert system and a learnable LSTM model. Both operate directly on the pixel-count sequences output by SAM2, allowing automated identification of aneurysm start and end boundaries without the need for manual annotation.
3.2.1. Results of UNet + SAM2 + LSTM (USL) Approach
Training and Validation Loss
To establish a baseline for sequence modeling, we trained a bidirectional LSTM on slice-wise pixel-count sequences using five-fold cross-validation with an 80/20 split between training and validation patients. The model was trained for 1000 epochs.
Both loss curves dropped sharply in the early epochs and then flattened at almost identical values; the validation trajectory never diverged from the training trajectory, indicating minimal over-fitting and good generalizability across folds.
Figure 5 illustrates the training and validation loss for one of the folds. As the figure shows, the model begins to overfit the training data at around 500 epochs, and the validation loss does not improve after that point.
Figure 5 shows LSTM training versus validation loss over 1000 epochs. Both curves drop sharply during the first 50–100 epochs, indicating rapid learning, and then flatten to a stable plateau. Training loss settles near 0.15 and validation loss near 0.22.
Boundary-Prediction Accuracy
Predicted start and end slices were compared with ground truth. Across the full cross-validation the aneurysm segment achieved an average Dice overlap of 65%, confirming that the pipeline provides clinically acceptable boundary localization.
Figure 6 illustrates the performance of USL on predicting the boundary in a subset of testing data. The Dice score reflects the degree of overlap between the red and green regions in
Figure 6, with higher overlap indicating better segmentation accuracy.
Volume-Estimation Accuracy
Integrating the SAM2 mask pixel counts between the LSTM-predicted boundaries produced a surrogate lumen volume that showed moderate agreement with manual measurements, achieving an R2 of 0.57, as shown in Figure 7. This indicates the automated pipeline captures just over half of the variance observed in expert-derived volumes, reflecting useful, but not yet optimal quantitative accuracy.
3.2.2. Results of UNet + SAM2 + Expert (USE) Approach
In an alternative pipeline, we employed an expert system in place of the LSTM for aneurysm boundary detection. The rule consists of scanning through the number of segmented pixels for each cross-sectional slice and detecting significant rises and drops over a sliding window of four slices. Once the largest “abnormality” window is identified, the start and end of that region are deemed to be the aneurysm boundaries. The results are summarized below. We used grid search to find the optimum values for the window size, upper and lower threshold.
Boundary Prediction Accuracy of Expert System
We measured the boundary prediction performance of the expert system against the ground truth for the beginning and end of the aneurysm. While minor discrepancies were observed, the expert system outperformed the USL approach. It achieved an average Dice score of 78%, indicating strong spatial overlap with the region annotated manually.
Figure 8 illustrates the performance of USE on predicting the boundary in a subset of testing data.
Volume Estimation Accuracy of Expert System
The scatter plot shown in
Figure 9 shows a strong correlation in volume estimation, with an R-squared value of 0.92 in segmented pixel counts between the expert system and manual boundary annotations. This indicates that the expert system aligns well with manual annotations, demonstrating a robust capacity for accurate volume measurement.
3.3. Comparison of Methods
Table 2 illustrates a side-by-side look at the start and end boundaries detection performance, using
R2, MAE, and MSE, and the Average Dice Score across two different pipelines:
Although we are unable to report a Dice score for segmentation accuracy due to the absence of manual annotations, the USE pipeline still achieved a volume R2 of 0.916, outperforming the volumetric correlation reported for the PRAEVAorta framework [25] (volume R2 = 0.90), where the model was trained on an annotated dataset. It is important to note that in our study, the UNet + SAM2 segmentation masks were validated only through expert visual inspection and therefore do not constitute formal ground truth.
Figure 10 and
Figure 11 summarize the comparative performance of the two pipelines USL and USE across six sub-panels.
Figure 10A displays the
R2 values for the start boundary, end boundary, and volume, immediately highlighting USE’s superior explanatory power at all three targets.
Figure 10B,C contrast the absolute errors for the start and end indices, where the noticeably shorter USE bars confirm its finer boundary localization.
Figure 10D displays histogram plots of mean absolute errors, while
Figure 10E shows the mean-squared errors; in both, USE has better performance delineating the boundary than USL.
Figure 11 blends a violin plot with an embedded box plot to reveal the full distribution of Dice values; the denser, taller violin for USE shows both a higher median and a tighter clustering of scores. Collectively the six views demonstrate that coupling SAM2 with a rule-based expert system yields the most reliable boundaries and best volumetric agreement, whereas the LSTM variant exhibits larger errors and greater inter-patient variability.
We also conducted an experiment where we selected a random region with the same window size as an aneurysm to evaluate accuracy of our methods compared to the worst-case scenario. Our goal was to determine the accuracy range if boundaries were randomly chosen from this region. Specifically, we analyzed the R-squared if we predicted a random upper aorta region as aneurysm.
Although this does not represent the true worst-case scenario, since the upper thoracic aorta region is naturally larger in diameter than the aneurysm region would be if it were without aneurysm, it serves as a useful neutral reference for assessing the added value of automated boundary detection. The results, illustrated in the accompanying scatter plot, range between
R2 of 0.40 and 0.70.
Figure 12 illustrates one such randomly placed window in the upper aorta. This underscores the advantage of utilizing specialized modeling techniques, such as the expert system, which ensures a more systematic and reliable boundary detection method compared to random selection.
3.4. Volume Change from Normal to Aneurysm
After identifying the aneurysm boundaries, the aneurysm volume was calculated by summing the segmented pixel counts within the aneurysm across all slices identified by the LSTM or expert system. This sum represents the observed volume of the aneurysm derived from the segmented regions.
To provide a baseline for comparison, we used interpolation to estimate the size of a normal aorta in the same region. Interpolation connects the start and end slices of the aneurysm boundaries by assuming a linear relationship between the segmented pixel values at these points. This approach creates a hypothetical baseline for a normal aorta, representing the expected volume if no aneurysm were present.
Figure 13 illustrates a test case, contrasting the observed lumen volume with the interpolated normal baseline.
A comparative analysis of the standard sum representing the actual aneurysm volume, and the interpolated sum representing the estimated normal aorta volume was performed to quantify the extent of aortic enlargement due to the aneurysm for 60 patients, all of whom ultimately underwent endovascular stent placement. The percentage differences between the two sums were analyzed. The results showed a mean difference of 40.74%, a median difference of 40.54%, and a standard deviation of 21.66%. These findings underscore the substantial volumetric expansion typically seen in aneurysms requiring intervention and highlight the value of volume-based metrics in characterizing disease severity prior to stent placement.
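The interpolated baseline and the resulting percentage difference can be sketched as below. The text does not state which sum serves as the denominator of the percentage; this sketch assumes the difference is expressed relative to the interpolated (normal) baseline, and the function names and example counts are hypothetical.

```python
def interpolated_baseline(pixel_counts, start, end):
    """Linearly interpolate per-slice counts between the two boundary slices."""
    n = end - start
    first, last = pixel_counts[start], pixel_counts[end]
    if n == 0:
        return [first]
    return [first + (last - first) * k / n for k in range(n + 1)]


def enlargement_percent(pixel_counts, start, end):
    """Percentage by which the observed sum exceeds the interpolated sum.

    Assumption: the difference is normalized by the interpolated baseline.
    """
    observed = sum(pixel_counts[start:end + 1])
    baseline = sum(interpolated_baseline(pixel_counts, start, end))
    return 100.0 * (observed - baseline) / baseline


# Hypothetical counts with aneurysm boundaries at slices 1 and 5.
counts = [300, 310, 500, 700, 520, 330, 320]
growth = enlargement_percent(counts, 1, 5)
# Observed sum = 2360, interpolated sum = 1600, so the difference is 47.5%.
```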
4. Discussion
The side-by-side metrics confirm the clear advantage of the UNet + SAM2 + Expert (USE) pipeline over the UNet + SAM2 + LSTM (USL) alternative. For boundary localization USE attains R2 = 0.714 for the start and 0.759 for the end, compared with only 0.183 and 0.340 for USL. That statistical edge translates into sharply lower localization errors: the mean-absolute error falls from 17.8 to 7.7 at the start boundary and from 13.0 to 8.3 at the end, while the mean-squared error drops by roughly two-thirds (449 → 157 for the start, 410 → 149 for the end). Spatial overlap follows the same pattern, with the Dice coefficient rising from 0.64 for USL to 0.78 for USE. Most importantly for clinical use, surrogate lumen volume correlation climbs from R2 = 0.572 to 0.916, and the average volume error shrinks from 17% to about 11%. These results confirm that a deterministic sliding-window rule, when fed a high-quality SAM2 mask, can deliver reliable, interpretable boundary placement without the need for additional training data, a critical advantage in data-sparse clinical environments. Moreover, this study provides a basis for future research to determine the clinical significance of volume and Dice differences, guiding the evaluation of the practical utility of these metrics in clinical settings.
The strength of USE is especially relevant given the scarcity of labeled data. Our entire cohort comprises only 60 annotated samples. Such modest sample sizes restrict deep learning models with a high number of parameters, making a deterministic rule set an attractive choice. The LSTM was trained with conservative augmentation and a relatively deep recurrent stack. Its performance therefore represents an early baseline rather than the final word on sequence learning. We anticipate that lighter recurrent blocks, positional embeddings, attention pooling, Dice-augmented loss, and channel stacking (raw signal, derivatives, smoothed counts) could greatly improve USL performance.
5. Conclusions
This study introduces two complementary pipelines that transform a UNet trained only on normal anatomy into a fully automated tool for abdominal aortic aneurysm (AAA) detection and quantification.
Segmentation failure screening. The absence of a UNet mask in anomalous slices was turned into a binary classifier with a simple rolling-window rule. Tested on 33 scans (16 normal, 17 AAA), the method achieves 87.9% accuracy, 88.2% precision and recall, and an F1-score of 0.88, demonstrating that a label-free network can still act as a reliable triage filter.
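As a consistency check on the screening figures, the sketch below recomputes them from a confusion matrix. The counts are not reported in this section; they are inferred here as the only integer counts consistent with 17 AAA scans, 16 normal scans, and the stated precision and recall (15 true positives, 2 false positives, 2 false negatives, 14 true negatives).

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Inferred counts (assumption): 17 AAA scans and 16 normal scans.
metrics = classification_metrics(tp=15, fp=2, fn=2, tn=14)
```

With these counts, accuracy is 29/33 and both precision and recall are 15/17, which round to the 87.9%, 88.2%, and 0.88 reported above.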
Multi-system quantification. Seeding Meta’s SAM2 with the UNet output enabled dense aortic masks through diseased segments. Two boundary-detection strategies were compared:
- The UNet + SAM2 + Expert (USE) pipeline achieved R² of 0.714 for the start and 0.759 for the end boundary, with MAE of 7.7 for the start and 8.3 for the end boundary. Its Dice score was 0.78, and it yielded a strong surrogate volume R² of 0.916.
- The UNet + SAM2 + LSTM (USL) pipeline produced R² of 0.183 for the start and 0.340 for the end boundary, with MAE of 17.8 for the start and 13.0 for the end boundary. Its Dice score was 0.64, and its surrogate volume R² was 0.572.
The deterministic expert rules therefore delivered markedly superior boundary localization and volume agreement while requiring no additional training. A control experiment with randomly selected upper-aorta windows yielded R² values of 0.40–0.70, whereas the USE pipeline achieved a substantially higher R² of 0.916, exceeding the 0.90 correlation reported in [25].
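To illustrate the kind of deterministic rule discussed here, the sketch below applies a sliding window to a per-slice pixel-count signal. It is a hypothetical illustration, not the expert rule set used in this study: the baseline estimate, the enlargement ratio, the window length, and the function name `detect_boundaries` are all assumptions.

```python
from statistics import median


def detect_boundaries(pixel_counts, window=3, ratio=1.5, baseline_slices=5):
    """Return (start, end) slice indices of a sustained enlargement, or None.

    Hypothetical rule: a slice is "enlarged" when its mask pixel count exceeds
    `ratio` times a baseline taken as the median of the first `baseline_slices`
    slices. A boundary is accepted only where `window` consecutive slices are
    all enlarged, so isolated spikes are ignored.
    """
    baseline = median(pixel_counts[:baseline_slices])
    enlarged = [count > ratio * baseline for count in pixel_counts]

    # Start indices of every window in which all slices are enlarged.
    hits = [
        i
        for i in range(len(enlarged) - window + 1)
        if all(enlarged[i:i + window])
    ]
    if not hits:
        return None
    return hits[0], hits[-1] + window - 1


# Illustrative signal: normal calibre, a dilated segment, then normal again.
signal = [100, 102, 98, 101, 99, 105, 160, 180, 210, 220, 190, 170, 110, 100]
boundaries = detect_boundaries(signal)
```

A rule of this form has only a handful of interpretable parameters, which is what makes it usable without additional training data.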
This work yields three key insights. First, when a UNet trained exclusively on normal aorta fails to produce a mask, that absence itself can serve as a marker of pathology. Second, even the sparse masks the UNet does generate are enough to “prompt” SAM2 into producing a continuous, high-quality aortic segmentation that spans the entire scan, aneurysm included. Third, a simple sliding-window rule applied to the SAM2 pixel-count signal can detect aneurysm boundaries reliably, achieving a volume correlation of roughly 0.92. Together, these findings make the USE pipeline an end-to-end, interpretable solution for AAA screening and quantification in data-sparse settings, substantially reducing manual annotation effort and opening a path toward scalable, fully automated vascular analysis.
However, several important limitations remain. First, aneurysm volumes are estimated using pixel-count surrogates rather than calibrated voxel measurements. Second, the current segmentation captures only the contrast-enhanced lumen, excluding the outer vessel wall and any surrounding thrombus. This is due to the nature of SAM2, which propagates masks based on intensity similarity and therefore tracks only regions with characteristics similar to the initial UNet prompt. Future work will integrate slice-thickness metadata, extend segmentation to the full vessel wall (including thrombus), and validate the approach against manually segmented data.
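The step from a pixel-count surrogate to a calibrated volume can be sketched as follows. This is an illustration under stated assumptions, not part of the present pipeline: it assumes contiguous, non-overlapping slices and that in-plane pixel spacing and slice thickness are available from the scan metadata (for example, the DICOM Pixel Spacing and Slice Thickness attributes). The function name and the example values are hypothetical.

```python
def calibrated_volume_ml(pixel_counts, pixel_spacing_mm, slice_thickness_mm):
    """Convert per-slice mask pixel counts to a volume in millilitres.

    Assumptions: slices are contiguous and non-overlapping, and every slice
    shares the same in-plane spacing and thickness.
    """
    row_mm, col_mm = pixel_spacing_mm
    voxel_mm3 = row_mm * col_mm * slice_thickness_mm
    return sum(pixel_counts) * voxel_mm3 / 1000.0  # 1 mL = 1000 mm^3


# Illustrative values only: two slices, 0.5 mm x 0.5 mm pixels, 2 mm thickness.
volume = calibrated_volume_ml([1000, 2000], (0.5, 0.5), 2.0)
```

Because the conversion is a single scale factor per scan, it leaves within-scan comparisons unchanged but makes volumes comparable across scans acquired with different resolutions.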