TASC-SwinMT: Task-Adaptive Synergistic Cross-Task Swin Multi-Task Framework for CT and MRI Image Interpolation and Segmentation
Simple Summary
Abstract
1. Introduction
1. We propose TASC-SwinMT, a unified multi-task framework for joint interpolation and segmentation of CT and MRI, which exploits task spatiotemporal correlation and anatomical complementarity to realize feature sharing and performance improvement.
2. Three customized multi-task collaboration modules are developed: TALA enables task-adaptive modulation via spatial dual-bottleneck and frequency-domain modeling; MSTAF implements cross-level feature alignment; CTCI achieves fine-grained adaptive cross-task interaction. These modules jointly mitigate feature deficiency and distribution mismatch in multi-task learning.
3. A learnable dynamic multi-task loss is designed to balance heterogeneous optimization objectives of interpolation and segmentation, with better flexibility and effectiveness than traditional fixed-weight loss strategies.
4. Experiments on public MSD datasets verify the superiority of TASC-SwinMT over mainstream single-task and multi-task baseline methods. Ablation studies further confirm the effectiveness of each core module embedded in the proposed framework.
2. Related Work
2.1. Medical Image Interpolation
2.2. Medical Image Segmentation
2.3. Multi-Task Learning in Medical Image Analysis
2.4. Swin Transformer for Medical Computer Vision
2.5. Summary of Related Methods
3. Materials and Methods
3.1. Datasets
3.1.1. MSD Task06_Lung Dataset
3.1.2. MSD Task02_Heart Dataset
3.1.3. Unified Data Preprocessing
1. Data format conversion: Read the nii.gz files using the nibabel library and convert them into PyTorch tensors with the dimension order to adapt to the input requirements of deep learning models.
2. Pixel value normalization: Perform per-image normalization to scale pixel values into the range [0, 1]. This step eliminates the impact of differences in scanning devices and imaging parameters, and unifies data distribution characteristics of input CT and MRI images.
3. Training set data augmentation: To mitigate overfitting, enhance model generalization and robustness to imaging noise, we applied a comprehensive data augmentation strategy to the training set. The augmentation operations include random horizontal and vertical flipping with a probability of , random rotation within the range of to , random scaling with factors ranging from to , and additive Gaussian noise injection (mean = 0, standard deviation = 0.01) to simulate real-world medical imaging noise. For the validation and test sets, only format conversion and normalization are performed without any augmentation to reflect the true performance of the model on unseen data.
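The normalization and part of the augmentation steps above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names, the flip probability of 0.5, and the clamping back to [0, 1] after noise injection are assumptions, and rotation and scaling are left out because their ranges are not recoverable from the text.

```python
import random

def minmax_normalize(image):
    """Scale a 2D image (list of rows) to [0, 1] using its own min and max."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0 for _ in row] for row in image]
    return [[(v - lo) / (hi - lo) for v in row] for row in image]

def augment(image, rng, flip_prob=0.5, noise_std=0.01):
    """Random flips plus additive Gaussian noise (mean 0, std noise_std).

    flip_prob=0.5 and the final clamp to [0, 1] are assumptions of this sketch.
    """
    out = [row[:] for row in image]
    if rng.random() < flip_prob:  # horizontal flip: reverse each row
        out = [row[::-1] for row in out]
    if rng.random() < flip_prob:  # vertical flip: reverse the row order
        out = out[::-1]
    return [[min(1.0, max(0.0, v + rng.gauss(0.0, noise_std))) for v in row]
            for row in out]

# Usage on a toy 2 x 2 slice with CT-like intensities.
slice_hu = [[-1000.0, 0.0], [200.0, 1000.0]]
normalized = minmax_normalize(slice_hu)
augmented = augment(normalized, random.Random(0))
```

In a joint interpolation and segmentation setting, the same geometric flips would also have to be applied to the segmentation mask, while the noise would be added to the image only.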
3.2. Proposed Model
1. The Task-Aware Lightweight Adapter (TALA) extracts spatial dual-bottleneck features and global frequency-domain information, and injects task-adaptive features into the shared encoder.
2. Multi-Scale Task Alignment Fusion (MSTAF) uses bidirectional cross-attention, multi-scale spatial extraction, and frequency-domain enhancement to align encoder-decoder skip connections.
3. Cross-Task Collaborative Interaction (CTCI) enables adaptive fine-grained feature interaction between interpolation and segmentation tasks.
4. The shared encoder and dual decoders balance universal feature sharing and task-specific learning to reduce redundancy.
5. A dynamic multi-task loss with learnable weights balances heterogeneous task optimization and avoids single-task dominance.
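To make the idea of learnable loss weights concrete, the sketch below uses one widely used formulation, uncertainty-style weighting with a learnable log-variance per task. The text here does not state the exact form of the proposed loss, so this formulation, the function names, and the toy loss values are assumptions for illustration only.

```python
import math

def dynamic_loss(task_losses, log_vars):
    """Total loss sum_i exp(-s_i) * L_i + s_i, with one learnable s_i per task."""
    return sum(math.exp(-s) * l + s for l, s in zip(task_losses, log_vars))

def log_var_grads(task_losses, log_vars):
    """Analytic gradient of the total loss with respect to each s_i."""
    return [1.0 - math.exp(-s) * l for l, s in zip(task_losses, log_vars)]

def adapt_weights(task_losses, log_vars, lr=0.1, steps=500):
    """Plain gradient descent on s_i while the task losses are held fixed."""
    s = list(log_vars)
    for _ in range(steps):
        grads = log_var_grads(task_losses, s)
        s = [si - lr * g for si, g in zip(s, grads)]
    return s

# Toy values: a small interpolation loss and a larger segmentation loss.
losses = [0.02, 0.6]
s = adapt_weights(losses, [0.0, 0.0])
weights = [math.exp(-si) for si in s]  # effective task weights exp(-s_i)
```

With the task losses held fixed, each s_i settles at log L_i, so the effective weight exp(-s_i) approaches 1 / L_i and every weighted term ends up with a similar magnitude. This is one way a learnable scheme can keep a single task from dominating; in real training the s_i would be updated jointly with the network parameters.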
3.2.1. Task-Aware Lightweight Adapter (TALA)
- Given input feature , a convolution is first used to reduce channel dimension.
- Global structural information is captured by transforming features into the frequency domain using two-dimensional fast Fourier transform (FFT).
- Spatial and frequency features are concatenated and normalized for unified representation.
- A channel attention mechanism generates adaptive weights to emphasize task-relevant features.
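The four steps above can be traced on a tiny tensor with the standard library alone. This is a rough sketch under stated assumptions rather than the TALA implementation: the channel reduction is taken to be a 1 x 1 (pointwise) convolution, the frequency branch keeps only the DFT magnitude, normalization is applied over the whole tensor, and the channel gate is a parameter-free sigmoid of the channel mean instead of a learned attention layer. The spatial dual-bottleneck branch is omitted, and all function names are hypothetical.

```python
import cmath
import math

def conv1x1(feat, weight):
    """Pointwise convolution: feat is [C_in][H][W], weight is [C_out][C_in]."""
    H, W = len(feat[0]), len(feat[0][0])
    return [[[sum(w_row[c] * feat[c][y][x] for c in range(len(feat)))
              for x in range(W)] for y in range(H)] for w_row in weight]

def dft2_magnitude(ch):
    """Magnitude of the 2D discrete Fourier transform of one channel (naive)."""
    H, W = len(ch), len(ch[0])
    out = []
    for u in range(H):
        row = []
        for v in range(W):
            acc = 0j
            for y in range(H):
                for x in range(W):
                    angle = -2.0 * math.pi * (u * y / H + v * x / W)
                    acc += ch[y][x] * cmath.exp(1j * angle)
            row.append(abs(acc))
        out.append(row)
    return out

def normalize(feat, eps=1e-5):
    """Zero-mean, unit-variance normalization over the whole tensor."""
    flat = [v for ch in feat for row in ch for v in row]
    mean = sum(flat) / len(flat)
    var = sum((v - mean) ** 2 for v in flat) / len(flat)
    std = math.sqrt(var + eps)
    return [[[(v - mean) / std for v in row] for row in ch] for ch in feat]

def channel_attention(feat):
    """Scale each channel by a sigmoid gate of its global average."""
    gated = []
    for ch in feat:
        mean = sum(map(sum, ch)) / (len(ch) * len(ch[0]))
        gate = 1.0 / (1.0 + math.exp(-mean))
        gated.append([[gate * v for v in row] for row in ch])
    return gated

def tala_sketch(feat, weight):
    """Reduce channels, add a frequency branch, normalize, then gate channels."""
    reduced = conv1x1(feat, weight)                 # step 1: channel reduction
    freq = [dft2_magnitude(ch) for ch in reduced]   # step 2: frequency domain
    fused = normalize(reduced + freq)               # step 3: concat + normalize
    return channel_attention(fused)                 # step 4: channel weighting
```

The naive transform costs O(H²W²) per channel and is only meant to make the data flow visible; a practical implementation would use a library FFT on tensors.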
3.2.2. Multi-Scale Task Alignment Fusion (MSTAF)
3.2.3. Window Attention in Swin Transformer
1. Window Partition
2. Query-Key-Value Projection
3. MLP Feature Refinement
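The window partition step, together with the cyclic shift that the standard Swin Transformer applies before shifted-window attention, can be sketched on a single-channel grid. The helper names and the 4 x 4 toy grid are illustrative assumptions; the window size used by the proposed model is not stated here.

```python
def window_partition(grid, window_size):
    """Split an H x W grid (list of rows) into non-overlapping square windows."""
    H, W = len(grid), len(grid[0])
    if H % window_size or W % window_size:
        raise ValueError("H and W must be divisible by window_size")
    windows = []
    for top in range(0, H, window_size):
        for left in range(0, W, window_size):
            windows.append([row[left:left + window_size]
                            for row in grid[top:top + window_size]])
    return windows

def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` positions."""
    rows = grid[shift:] + grid[:shift]
    return [row[shift:] + row[:shift] for row in rows]

# A 4 x 4 grid holding the values 0..15 in row-major order.
grid = [[4 * r + c for c in range(4)] for r in range(4)]
regular_windows = window_partition(grid, 2)
shifted_windows = window_partition(cyclic_shift(grid, 1), 2)
```

Attention is then computed independently inside each window, which keeps the cost linear in the number of windows; alternating regular and shifted partitions lets information cross window borders in consecutive blocks.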
3.2.4. SwinTransformerBlock with TALA
3.2.5. Cross-Task Collaborative Interaction (CTCI) Module
3.2.6. Patch Embedding/Merging/Expand Modules
PatchEmbed (Input Projection)
PatchMerging (Downsampling)
PatchExpand (Upsampling)
3.2.7. Shared Encoder
1. Patch embedding: Generate a 96-channel feature map.
2. Four-stage feature extraction: Each stage consists of:
- SwinTransformerBlock with shifted window attention;
- TALA module for task-aware, frequency-aware and modality-aware feature enhancement;
- PatchMerging (except the 4th stage) for spatial downsampling.
3.2.8. Dual Decoders with CTCI
1. Upsampling: PatchExpand module to match the spatial resolution of encoder skip connections.
2. Skip Connection Fusion: MSTAF aligns decoder features and encoder features in the frequency domain, and eliminates feature deviation brought by different imaging modalities.
3. Feature Refinement: SwinTransformerBlock for feature optimization.
4. Cross-Task Feature Interaction: CTCI module fuses interpolation and segmentation features with bidirectional attention guidance, and maintains stable prediction performance on diverse CT and MRI imaging data.
3.2.9. Output Heads with PixelShuffle
Interpolation Output Head
Segmentation Output Head
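As a reference for the PixelShuffle operation named in this subsection, the following sketch rearranges channels into spatial positions using the same index layout as PyTorch's `torch.nn.PixelShuffle`. The upscale factor `r` and the toy input are illustrative assumptions; the factor used by the output heads is not given here.

```python
def pixel_shuffle(feat, r):
    """Rearrange a [C*r*r][H][W] tensor into [C][H*r][W*r]."""
    c_in, H, W = len(feat), len(feat[0]), len(feat[0][0])
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    C = c_in // (r * r)
    out = [[[0.0] * (W * r) for _ in range(H * r)] for _ in range(C)]
    for c in range(C):
        for i in range(r):          # row offset inside each r x r output block
            for j in range(r):      # column offset inside each r x r output block
                src = feat[c * r * r + i * r + j]
                for h in range(H):
                    for w in range(W):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out

# Four 1 x 1 channels become one 2 x 2 channel.
upsampled = pixel_shuffle([[[0.0]], [[1.0]], [[2.0]], [[3.0]]], 2)
# upsampled == [[[0.0, 1.0], [2.0, 3.0]]]
```

Because the upsampled pixels are produced by a preceding convolution rather than by interpolation, this kind of head learns its own upsampling filter.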
Algorithm 1. TASC-SwinMT Forward Propagation
3.3. Training and Evaluation Setup
3.3.1. Loss Functions
Hybrid Interpolation Loss
Segmentation Loss
Mathematical Derivation of Learnable Weight Adaptation
3.3.2. Training Configuration
1. Optimizer: Adam optimizer with dynamically learnable learning rate and weight decay, initialized as and , respectively, and both parameters are automatically optimized during training;
2. Batch size: 2 (constrained by GPU memory);
3. Training epochs: 100 epochs with early stopping (patience = 10);
4. Training stability measures: Mixed precision training (FP16) and gradient clipping;
5. Parameter initialization: He normal initialization for convolutional layers.
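Two of the settings above, early stopping with a patience of 10 and gradient clipping, follow a simple logic that can be sketched without a deep learning framework. The monitored quantity (validation loss), the `min_delta` argument, and the clipping threshold are assumptions of this sketch, since the text does not specify them.

```python
import math

class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = math.inf
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def clip_grad_norm(grads, max_norm):
    """Rescale a flat list of gradients so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]

# Usage: two improving epochs followed by a plateau of ten epochs.
stopper = EarlyStopping(patience=10)
history = [1.0, 0.9] + [0.95] * 10
stop_flags = [stopper.step(v) for v in history]  # only the last flag is True
```

In a PyTorch training loop the clipping step would normally be delegated to `torch.nn.utils.clip_grad_norm_`, applied after unscaling the gradients when mixed precision is used.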
4. Experiments
4.1. Evaluation Metrics
4.1.1. Interpolation Evaluation Metrics
Peak Signal-to-Noise Ratio (PSNR)
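For reference, the standard PSNR definition is 10 log10(MAX² / MSE). The sketch below applies it to images normalized to [0, 1], so the peak value defaults to 1.0; the function name and the list-of-rows image format are assumptions for illustration.

```python
import math

def psnr(pred, target, max_val=1.0):
    """PSNR in dB between two equally sized 2D images (lists of rows)."""
    diffs = [(p - t) ** 2
             for p_row, t_row in zip(pred, target)
             for p, t in zip(p_row, t_row)]
    mse = sum(diffs) / len(diffs)
    if mse == 0.0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_val ** 2 / mse)

# A uniform error of 0.1 on a [0, 1] image gives MSE = 0.01, i.e. about 20 dB.
value = psnr([[0.5, 0.5], [0.5, 0.5]], [[0.6, 0.6], [0.6, 0.6]])
```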
Structural Similarity Index (SSIM)
Edge Structural Similarity Index (EdgeSSIM)
Learned Perceptual Image Patch Similarity (LPIPS)
4.1.2. Segmentation Evaluation Metrics
Dice Coefficient
Intersection over Union (IoU)
Precision and Recall
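Dice, IoU, Precision and Recall can all be derived from the same true-positive, false-positive and false-negative counts. The sketch below uses their standard definitions on binary masks; the function names and the convention of returning 1.0 when a denominator is zero (for example, when both masks are empty) are assumptions of this illustration.

```python
def confusion_counts(pred, target):
    """Count true positives, false positives and false negatives."""
    tp = fp = fn = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn

def _ratio(num, den, empty=1.0):
    """Safe division; `empty` is returned when the denominator is zero."""
    return num / den if den else empty

def seg_metrics(pred, target):
    """Dice, IoU, precision and recall for two binary masks (lists of rows)."""
    tp, fp, fn = confusion_counts(pred, target)
    return {
        "dice": _ratio(2 * tp, 2 * tp + fp + fn),
        "iou": _ratio(tp, tp + fp + fn),
        "precision": _ratio(tp, tp + fp),
        "recall": _ratio(tp, tp + fn),
    }

# Usage: 2 true positives, 1 false positive, 1 false negative.
pred = [[1, 1, 0], [0, 1, 0]]
target = [[1, 0, 0], [0, 1, 1]]
scores = seg_metrics(pred, target)
```

For a single mask pair the two overlap scores are tied by Dice = 2 IoU / (1 + IoU), which is why they rank individual predictions identically even though their averages over a dataset can differ.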
Temporal Consistency Score (TCS)
4.2. Experimental Setup
4.2.1. Hardware/Software Environment
1. Hardware: Intel Core i9-12900K (16 cores/24 threads), 64 GB DDR4 RAM, NVIDIA RTX 4090 (24 GB VRAM), 2 TB NVMe SSD.
2. Software: Ubuntu 20.04 LTS, Python 3.9.18, PyTorch 2.0.1, CUDA 11.7, cuDNN 8.5.0, nibabel 5.1.0, scikit-image 0.21.0, numpy 1.26.0, matplotlib 3.8.0.
4.2.2. Expanded Baseline and Ablated Models
Baseline Models
1. SwinUNet-Interp: SwinUNet for interpolation [26].
2. TransUNet-Interp: TransUNet for interpolation [64].
3. VTN-Interp: Transformer-based multi-scale interpolation model [16].
4. nnU-Net-Seg: nnU-Net for segmentation [22].
5. UNet++-Seg: U-Net++ for segmentation [23].
6. Swin UNETR-Seg: Swin UNETR for segmentation [11].
Ablated Variants (Proposed Model)
1. IndEnc: Baseline configuration adopting independent encoders for interpolation and segmentation tasks, without shared feature learning.
2. +TALA: Base framework embedded only with the Task-Aware Lightweight Adapter module.
3. +CTCI: Base framework embedded only with the Cross-Task Collaborative Interaction module.
4. +MSTAF: Base framework embedded only with the Multi-Scale Task Alignment Fusion module.
5. +TALA+CTCI: Base framework integrated with both TALA and CTCI modules for pairwise component validation.
6. +TALA+MSTAF: Base framework integrated with both TALA and MSTAF modules for pairwise component validation.
7. +CTCI+MSTAF: Base framework integrated with both CTCI and MSTAF modules for pairwise component validation.
8. StaticLoss: Full framework replacing learnable dynamic weights with fixed weighted multi-task loss.
9. Loss0.3: Full framework adopting hybrid interpolation loss with balance parameter set to 0.3.
10. Loss0.5: Full framework adopting hybrid interpolation loss with balance parameter set to 0.5.
4.3. Baseline Comparison
4.3.1. Baseline Comparison Results on Heart Dataset (MSD Task02_Heart)
4.3.2. Baseline Comparison Results on Lung Dataset (MSD Task06_Lung)
4.4. Ablation Studies
4.4.1. Model Variant Ablation
4.4.2. Loss Weight Ablation
4.4.3. Core Component Contribution Analysis
4.5. Comparative Experiments of Multi-Task Learning Methods
- Direct Feature Concatenation
- Shared-Bottom
- Cross-Task Feature Concatenation
- Mixture of Experts (MoE)
4.6. Multi-Task Balancing Strategy Evaluation and Training Dynamics Analysis
4.7. Single-Task vs. Multi-Task Comparison
4.7.1. Heart Dataset (MSD Task02_Heart)
4.7.2. Lung Dataset (MSD Task06_Lung)
4.8. Interpretability Visualization Experiments
5. Discussion
5.1. Result Analysis
1. Each designed module targets inherent bottlenecks in multi-task feature learning and brings steady performance gains over the independent encoder baseline. The MSTAF module delivers the most prominent performance improvement by rectifying feature distribution mismatch via bidirectional cross-attention and frequency enhancement, stabilizing multi-scale feature fusion and achieving maximum increments in PSNR, SSIM and Dice. The TALA module further boosts performance by integrating spatial dual-bottleneck extraction and global frequency modeling, realizing task-oriented feature modulation with lightweight parameter overhead. The CTCI module yields moderate yet consistent metric gains through bidirectional fine-grained cross-task feature interaction, enhancing intrinsic feature correlation via spatial alignment and dynamic gating.
2. Pairwise combinations of the three core modules produce evident performance synergy beyond simple numerical superposition, attributed to their complementary functional attributes. The integration of TALA and MSTAF forms a complete workflow of feature embedding and decoding deviation calibration. The combination of TALA and CTCI alleviates feature aliasing and gradient conflict via task-aware modulation and cross-task interactive optimization. Merging MSTAF and CTCI strengthens multi-scale context fusion and inter-task semantic transmission. Nevertheless, dual-module combinations cannot cover all optimization dimensions, thus failing to reach the accuracy of the full integrated framework.
3. The shared Swin Transformer encoder provides a fundamental architecture for universal feature sharing and computational cost reduction, eliminating redundant feature extraction and lowering parameter occupancy and inference latency. Consistent performance gains on two datasets confirm that shared hierarchical features can satisfy heterogeneous optimization demands of interpolation and segmentation. Integrating the shared encoder with TALA, MSTAF and CTCI establishes a closed-loop learning pipeline covering task adaptive modulation, cross-level feature alignment and fine-grained cross-task interaction. This integrated mechanism enables mutual promotion between anatomical constraints from segmentation and spatiotemporal context from interpolation, achieving optimal performance among all ablation variants.
5.2. Comparison with State of the Art
- SOTA Models
1. ACVTT: Cross-view texture transfer for CT slice interpolation [4].
2. I³Net: Inter-intra-slice network for medical slice synthesis [71].
3. Video Interp Net: Video frame interpolation for 3D tomography [72].
4. SFCLI-Net: Spatial-frequency collaborative CT slice interpolation [73].
5. SegMamba-V2: Mamba-based 3D medical image segmentation [74].
6. HiDiff: Hybrid diffusion medical image segmentation [75].
7. Anatomy-Aware Seg: CT airway tree segmentation with topology guidance [76].
8. SicTTA: Single-image test-time adaptation for segmentation [77].
1. Heart MRI Analysis: Existing state-of-the-art methods show competitive performance under single-task training settings on the cardiac MRI dataset. SFCLI-Net obtains a marginally better SSIM value but shows unstable LPIPS performance, with a large standard deviation, in the interpolation experiments. SegMamba-V2 and HiDiff achieve competitive Dice and Precision scores in left atrium segmentation. All referenced SOTA models are independently optimized for a single task. TASC-SwinMT attains the best PSNR and EdgeSSIM values among all interpolation methods. It achieves the highest IoU score in segmentation and maintains Dice and Precision comparable with advanced SOTA models while surpassing all conventional baseline networks.
2. Lung CT Analysis: The lung CT dataset involves irregular tumor contours, which makes slice reconstruction and lesion segmentation more difficult. Existing SOTA methods retain competitive advantages in their respective single-task domains. SFCLI-Net obtains the highest SSIM value in lung CT slice interpolation. The proposed framework achieves the best PSNR and EdgeSSIM and equivalent LPIPS among all compared methods. It exceeds all SOTA models on IoU and Recall for lung tumor segmentation; its Dice is slightly lower than that of HiDiff yet higher than that of SegMamba-V2, and its Precision is marginally below both models. These results suggest that the proposed framework adapts better to complex irregular lesion structures than the compared models.
3. Comprehensive Multi-Task Advantage: Existing SOTA models only support independent single-task optimization and cannot complete interpolation and segmentation within one forward pass. Simply cascading separate SOTA interpolation and segmentation networks introduces severe computational redundancy: such combined schemes inevitably increase inference delay, total parameter count and GPU memory occupation. The proposed framework employs a shared encoder to extract universal spatial features for both tasks, and it generates the interpolated frame and the segmentation mask simultaneously without repeated feature extraction. The model maintains prediction accuracy competitive with mainstream SOTA methods and achieves superior multi-task performance while reducing computational overhead and accelerating inference. This balance of accuracy and efficiency makes the proposed framework better suited to clinical batch processing and real-time medical image analysis scenarios.
- Computational Efficiency Comparison with SOTA Models
5.3. Performance Superiority Analysis of the Proposed Multi-Task Learning Framework
5.4. Analysis of Multi-Task Loss Strategy Performance and Training Dynamics Differences
5.5. Generalization Evaluation on External Datasets
5.6. Interpretability Visualization Result Analysis
5.7. Limitations
1. Although the proposed framework is compact, with 33.3589 million parameters, and delivers favorable inference efficiency on the NVIDIA RTX 4090 platform (an average inference latency of 58.82 ms for lung CT samples, comparable latency for cardiac MRI samples and slightly lower latency on the external CT datasets), further efficiency optimization is still needed to satisfy the strict requirements of intraoperative real-time guidance and batch processing of large-scale clinical datasets. The main computational cost originates from the deeply stacked Swin Transformer encoder and the dual task-specific decoders deployed for fine-grained anatomical feature extraction, and we will continue to streamline this structure in follow-up research to pursue higher efficiency while maintaining model performance.
2. Although this study has conducted additional generalization experiments on external datasets including 3D liver tumor CT and COVID-19 chest CT datasets, the model is still mainly trained and evaluated on limited public datasets from the Medical Segmentation Decathlon (MSD) with unified imaging protocols and standardized annotation criteria. It has not been tested on more clinically specific and difficult datasets involving complex lesions, rare diseases, severe imaging artifacts, and heterogeneous multi-center clinical data collected from different scanning equipment, diverse patient cohorts and independent manual annotation systems. As a result, the model cannot yet satisfy all the complex and diverse requirements of practical clinical applications, and further research is needed to improve its generalization to more complicated clinical scenarios.
3. This framework does not integrate explicit physiological motion modeling mechanisms to characterize cardiac contraction patterns and lung respiratory movements. The interpolation results may produce anatomically unreasonable tissue structures for image frames captured at typical physiological motion phases.
4. The designed dynamic multi-task loss only optimizes task weight allocation at the global batch level throughout the training process. The loss strategy does not adjust weights independently for each individual sample, which makes it difficult to achieve a finely balanced optimization for samples with variable anatomical morphologies and blurred lesion boundaries.
5. The feasibility of the proposed model architecture has not been verified on multi-center datasets equipped with hierarchical classification diagnosis labels. Current experimental datasets only contain segmentation annotation information and lack graded clinical diagnosis labels corresponding to lesion severity and disease staging. The absence of validation on label-rich multi-center data limits the comprehensive clinical application value of the framework. We will supplement this verification work in future research to further improve the clinical practicability and robustness of the model.
5.8. Future Work
1. Model lightweighting: Subsequent research will adopt knowledge distillation, channel pruning, and INT8 model quantization strategies to compress total parameters below 15 million. Inference time will be reduced to no more than 0.05 s per sample to support real-time clinical deployment while retaining over 95% of the original model performance.
2. Multi-dataset validation: Follow-up work will collect multi-center clinical thoracic imaging data covering diversified scanning parameters, patient groups, and lesion distribution types. Domain adaptation methods including adversarial training and image style transfer will be adopted to alleviate data distribution differences and verify model generalization in practical clinical scenarios.
3. Motion model fusion: Subsequent studies will fuse deformable-registration-based physiological motion models to learn the inherent movement patterns of lung respiration and cardiac contraction. The embedded motion prior will improve the anatomical plausibility and physiological consistency of generated intermediate interpolation frames.
4. Adaptive loss weights: Future research will design reinforcement-learning-based dynamic loss adjustment strategies taking Deep Q-Network as the basic framework. Independent loss weights will be optimized for each sample according to lesion scale, imaging quality, and anatomical complexity to further improve model adaptability on heterogeneous medical data.
5. Clinical system development: An end-to-end intelligent medical analysis platform will be constructed to integrate image import, preprocessing, joint interpolation and segmentation, and quantitative lesion calculation. Automatic measurement of left atrium volume and lung tumor size will be provided through an intuitive graphical interface to assist clinical diagnosis and individualized treatment planning.
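The efficiency figures quoted in the limitations and in the first future-work item can be related by simple arithmetic. The sketch below only reuses numbers stated in the text (58.82 ms latency, 33.3589 million parameters, and the targets of 0.05 s and 15 million parameters); the helper names are hypothetical.

```python
def fps_from_latency_ms(latency_ms):
    """Frames per second for a given per-sample latency in milliseconds."""
    return 1000.0 / latency_ms

def relative_reduction(current, target):
    """Fraction by which `current` must shrink to reach `target`."""
    return 1.0 - target / current

current_latency_ms = 58.82   # reported average latency on lung CT
target_latency_ms = 50.0     # future-work target of 0.05 s per sample
current_params_m = 33.3589   # reported parameter count, in millions
target_params_m = 15.0       # future-work target, in millions

current_fps = fps_from_latency_ms(current_latency_ms)
target_fps = fps_from_latency_ms(target_latency_ms)
latency_cut = relative_reduction(current_latency_ms, target_latency_ms)
params_cut = relative_reduction(current_params_m, target_params_m)
```

The reported latency corresponds to about 17 frames per second and the target to 20, so the latency goal requires a reduction of roughly 15%, whereas the parameter goal requires removing roughly 55% of the weights. The compression target is therefore the more demanding of the two.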
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| TASC-SwinMT | Task-Adaptive Synergistic Cross-Task Swin Multi-Task Framework |
| TALA | Task-Aware Lightweight Adapter |
| MSTAF | Multi-Scale Task Alignment Fusion |
| CTCI | Cross-Task Collaborative Interaction |
| CT | Computed Tomography |
| MRI | Magnetic Resonance Imaging |
| BN | Batch Normalization |
| CNN | Convolutional Neural Network |
| FFT | Fast Fourier Transform |
| iFFT2D | Inverse Two-dimensional Fast Fourier Transform |
| BCE | Binary Cross-Entropy |
| MSE | Mean Squared Error |
| Dice | Dice Coefficient |
| EdgeSSIM | Edge-based Structural Similarity Index Measure |
| LPIPS | Learned Perceptual Image Patch Similarity |
| PSNR | Peak Signal-to-Noise Ratio |
| SSIM | Structural Similarity Index Measure |
| IoU | Intersection over Union |
| TCS | Temporal Consistency Score |
| MSD | Medical Segmentation Decathlon |
| MTL | Multi-Task Learning |
| SOTA | State-of-the-Art |
| MoE | Mixture of Experts |
| GradNorm | Gradient Normalization |
| DWA | Dynamic Weight Average |
| GPU | Graphics Processing Unit |
| FP16 | 16-bit (half-precision) floating point |
| FLOPs | Floating Point Operations |
| FPS | Frames Per Second |
References
- Lam, S.; Bai, C.; Baldwin, D.R.; Chen, Y.; Connolly, C.; de Koning, H.; Heuvelmans, M.A.; Hu, P.; Kazerooni, E.A.; Lancaster, H.L.; et al. Current and Future Perspectives on Computed Tomography Screening for Lung Cancer: A Roadmap From 2023 to 2027 From the International Association for the Study of Lung Cancer. J. Thorac. Oncol. 2024, 19, 36–51.
- Zhang, Z.; Liang, X.; Dong, X.; Xie, Y.; Cao, G. A Sparse-View CT Reconstruction Method Based on Combination of DenseNet and Deconvolution. IEEE Trans. Med. Imaging 2018, 37, 1407–1417.
- Ruan, Z.; Song, C.; Xu, P.; Wang, C.; Zhao, J.; Chen, M.; Li, S.; Su, Q.; Zhuo, X.; Wu, Y.; et al. Multiparametric Ultrasound Breast Tumors Diagnosis Within BI-RADS Category 4 via Feature Disentanglement and Cross-Fusion. IEEE Trans. Med. Imaging 2025, 44, 3064–3075.
- Uhm, K.H.; Cho, H.; Hong, S.H.; Jung, S.W. An Anisotropic Cross-View Texture Transfer with Multi-Reference Non-Local Attention for CT Slice Interpolation. IEEE Trans. Med. Imaging 2026, 45, 336–349.
- Guo, Y.; Bi, L.; Ahn, E.; Feng, D.; Wang, Q.; Kim, J. A Spatiotemporal Volumetric Interpolation Network for 4D Dynamic Medical Image. arXiv 2020, arXiv:2002.12680.
- Wu, Z.; Wei, J.; Wang, J.; Li, R. Slice imputation: Multiple intermediate slices interpolation for anisotropic 3D medical image segmentation. Comput. Biol. Med. 2022, 147, 105667.
- Cheung, W.K.; Pakzad, A.; Mogulkoc, N.; Needleman, S.H.; Rangelov, B.; Gudmundsson, E.; Zhao, A.; Abbas, M.; McLaverty, D.; Asimakopoulos, D.; et al. Interpolation-split: A data-centric deep learning approach with big interpolated data to boost airway segmentation performance. J. Big Data 2024, 11, 104.
- Sarmad, M.; Ruspini, L.C.; Lindseth, F. SIT-SR 3D: Self-supervised slice interpolation via transfer learning for 3D volume super-resolution. Pattern Recognit. Lett. 2023, 166, 97–104.
- Huang, T.C.; Zhang, G.; Guerrero, T.; Starkschall, G.; Lin, K.P.; Forster, K. Semi-automated CT segmentation using optic flow and Fourier interpolation techniques. Comput. Methods Programs Biomed. 2006, 84, 124–134.
- Cai, Y.; Long, Y.; Han, Z.; Liu, M.; Zheng, Y.; Yang, W.; Chen, L. Swin Unet3D: A three-dimensional medical image segmentation network combining vision transformer and convolution. BMC Med. Inform. Decis. Mak. 2023, 23, 33.
- Hatamizadeh, A.; Nath, V.; Tang, Y.; Yang, D.; Roth, H.R.; Xu, D. Swin UNETR: Swin Transformers for Semantic Segmentation of Brain Tumors in MRI Images. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries; Springer: Cham, Switzerland, 2022; pp. 272–284.
- Chen, J.; Mei, J.; Li, X.; Lu, Y.; Yu, Q.; Wei, Q.; Luo, X.; Xie, Y.; Adeli, E.; Wang, Y.; et al. TransUNet: Rethinking the U-Net architecture design for medical image segmentation through the lens of transformers. Med. Image Anal. 2024, 97, 103280.
- Zhang, X.; Wang, X.; Niu, T. CT Image Segmentation Using Frequency Domain Feature-Assisted Selective Long Memory State Space Model. Sens. Imaging 2025, 26, 74.
- Liu, J.; Fu, Y.; Shi, J. DAGU-Net: Cascaded multi-scale aware network based on dual attention grouping module for medical image segmentation. Biomed. Signal Process. Control 2026, 112, 108732.
- Liu, Y.; Deng, H.; Fu, J. DCM-Net: A novel dual-branch CNN–Mamba cross-layer feature fusion network for medical image segmentation. Biomed. Signal Process. Control 2026, 114, 109267.
- Richter, S.R.; Roth, S. Matryoshka Networks: Predicting 3D Geometry via Nested Shape Layers. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2018; pp. 1936–1944.
- Huang, P.S.; Horng, S.J.; Yang, C.J.; Chen, H.; Zhou, W. Applying Deep Learning and Heatmap Techniques for Sinusitis Diagnosis Using Enhanced 3-D Image Segmentation. IEEE Trans. Instrum. Meas. 2025, 74, 2525711.
- Wang, Y.; Zhang, X.; Du, W.; Dai, N.; Lyv, Y.; Wu, K. Deep Learning-Based Fully Automatic Segmentation of the Paranasal Sinuses in Chronic Rhinosinusitis Patients Using Computed Tomographic Images. IEEE Access 2025, 13, 16444–16454.
- Hamzaoui, I.; Benmeziane, H.; Cherif, Z.; El Maghraoui, K. Analog In-Memory Computing with Uncertainty Quantification for Efficient Edge-Based Medical Imaging Segmentation. arXiv 2024, arXiv:2403.08796.
- Sha, X.; Guan, Z.; Wang, Y.; Han, J.; Wang, Y.; Chen, Z. SSC-Net: A multi-task joint learning network for tongue image segmentation and multi-label classification. Digit. Health 2025, 11, 1–14.
- Ali, H.; Xie, J. DFIT-Net: A novel dynamic feature integration transformer for automatic segmentation of multi-organ structures in medical imaging. Displays 2025, 90, 103087.
- Isensee, F.; Jaeger, P.F.; Kohl, S.A.A.; Petersen, J.; Maier-Hein, K.H. nnU-Net: A self-configuring method for deep learning-based biomedical image segmentation. Nat. Methods 2020, 18, 203–211.
- Zhou, Z.; Siddiquee, M.M.R.; Tajbakhsh, N.; Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer: Cham, Switzerland, 2018; Volume 11045, pp. 3–11.
- Siddique, N.; Paheding, S.; Elkin, C.P.; Devabhaktuni, V. U-Net and Its Variants for Medical Image Segmentation: A Review of Theory and Applications. IEEE Access 2021, 9, 82031–82057.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV); IEEE: Piscataway, NJ, USA, 2021; pp. 9992–10002.
- Cao, H.; Wang, Y.; Chen, J.; Jiang, D.; Zhang, X.; Tian, Q.; Wang, M. Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation. In Computer Vision—ECCV 2022 Workshops. ECCV 2022; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2023; Volume 13803, pp. 205–218.
- Hatamizadeh, A.; Yang, D.; Roth, H.R.; Xu, D.; Albarqouni, S.; Spampinato, C.; Yu, Q.; Li, W.; Myronenko, A.; Landman, B.A.; et al. UNETR: Transformers for 3D Medical Image Segmentation. In Proceedings of the 2022 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV); IEEE: Piscataway, NJ, USA, 2022; pp. 574–584.
- Zhang, H.; Liu, B.; Yu, H.; Dong, B. MetaInv-Net: Meta Inversion Network for Sparse View CT Image Reconstruction. IEEE Trans. Med. Imaging 2020, 40, 621–634.
- Wu, W.; Hu, D.; Niu, C.; Yu, H.; Vardhanabhuti, V.; Wang, G. DRONE: Dual-Domain Residual-based Optimization NEtwork for Sparse-View CT Reconstruction. IEEE Trans. Med. Imaging 2021, 40, 3002–3014.
- Deng, Z.; Wang, Z.; Shan, Y.; He, G.; Du, T.; Wang, S. COO-DuDo: Computation overhead optimization methods for dual-domain sparse-view CT reconstruction. Expert Syst. Appl. 2025, 286, 128109.
- Ling, Y.; Wang, Y.; Dai, W.; Yu, J.; Liang, P.; Kong, D. MTANet: Multi-Task Attention Network for Automatic Medical Image Segmentation and Classification. IEEE Trans. Med. Imaging 2023, 43, 674–685.
- He, X.; Wang, Y.; Zhao, S.; Chen, X. Joint segmentation and classification of skin lesions via a multi-task learning convolutional neural network. Expert Syst. Appl. 2023, 230, 120174.
- Graham, S.; Vu, Q.D.; Jahanifar, M.; Raza, S.E.A.; Minhas, F.; Snead, D.; Rajpoot, N. One model is all you need: Multi-task learning enables simultaneous histology image segmentation and classification. Med. Image Anal. 2023, 83, 102685.
- Marullo, G.; Tanzi, L.; Ulrich, L.; Porpiglia, F.; Vezzetti, E. A Multi-Task Convolutional Neural Network for Semantic Segmentation and Event Detection in Laparoscopic Surgery. J. Pers. Med. 2023, 13, 413.
- Aumente-Maestro, C.; Díez, J.; Remeseiro, B. A multi-task framework for breast cancer segmentation and classification in ultrasound imaging. Comput. Methods Programs Biomed. 2024, 260, 108540.
- Zhao, Y.; Wang, X.; Che, T.; Bao, G.; Li, S. Multi-task deep learning for medical image computing and analysis: A review. Comput. Biol. Med. 2023, 153, 106496.
- Dai, G.; Dai, D.; Wang, C.; Tang, Q.; Hamilton, M.; Chen, H.; Zhang, Y. Multi-Task Learning Network for Medical Image Analysis Guided by Lesion Regions and Spatial Relationships of Tissues. In IEEE Transactions on Circuits and Systems for Video Technology; IEEE: Piscataway, NJ, USA, 2025; pp. 1249–1264.
- Chavarrias Solano, P.E.; Bulpitt, A.; Subramanian, V.; Ali, S. Multi-task learning with cross-task consistency for improved depth estimation in colonoscopy. Med. Image Anal. 2025, 99, 103379.
- Yu, P.; Zhang, H.; Wang, D.; Zhang, R.; Deng, M.; Yang, H.; Wu, L.; Liu, X.; Oh, A.S.; Abtin, F.G.; et al. Spatial resolution enhancement using deep learning improves chest disease diagnosis based on thick slice CT. npj Digit. Med. 2024, 7, 335.
- Zhang, W.; Salmi, A.; Jiang, F.; Yang, C.F. Enhancing Pulmonary Nodule Detection Rate Using 3D Convolutional Neural Networks with Optical Flow Frame Insertion Technique. IEEE Access 2024, 12, 112881–112895.
- Chen, C.; Fu, Z.; Ye, S.; Zhao, C.; Golovko, V.; Ye, S.; Bai, Z. Study on high-precision three-dimensional reconstruction of pulmonary lesions and surrounding blood vessels based on CT images. Opt. Express 2023, 32, 1371.
- Pan, X.; Cheng, J.; Hou, F.; Lan, R.; Lu, C.; Li, L.; Feng, Z.; Wang, H.; Liang, C.; Liu, Z.; et al. SMILE: Cost-sensitive multi-task learning for nuclear segmentation and classification with imbalanced annotations. Med. Image Anal. 2023, 88, 102867.
- Liu, Y.; Mu, F.; Shi, Y.; Chen, X. SF-Net: A Multi-Task Model for Brain Tumor Segmentation in Multimodal MRI via Image Fusion. IEEE Signal Process. Lett. 2022, 29, 1799–1803.
- Medical Segmentation Decathlon. Available online: http://medicaldecathlon.com/dataaws/ (accessed on 5 March 2026).
- Bricault, I.; Ferretti, G. A general tool for the evaluation of spiral CT interpolation algorithms revisiting the effect of pitch in multislice CT. IEEE Trans. Med. Imaging 2005, 24, 58–69.
- Schaller, S.; Flohr, T.; Klingenbeck, K.; Krause, J.; Fuchs, T.; Kalender, W.A. Spiral Interpolation Algorithm for Multislice Spiral CT-Part I: Theory. IEEE Trans. Med. Imaging 2000, 19, 822–834.
- Hahn, K.; Schöndube, H.; Stierstorfer, K.; Hornegger, J.; Noo, F. A comparison of linear interpolation models for iterative CT reconstruction. Med. Phys. 2016, 43, 6455–6473.
- Schreibmann, E.; Chen, G.T.Y.; Xing, L. Image interpolation in 4D CT using a BSpline deformable registration model. Int. J. Radiat. Oncol. Biol. Phys. 2006, 64, 1537–1550.
- Montesa, P.; Lauritsch, G. A temporal interpolation approach for dynamic reconstruction in perfusion CT. Med. Phys. 2007, 34, 3077–3092.
- Marcos, L.; Babyn, P.; Alirezaie, J. Pure Vision Transformer (CT-ViT) with Noise2Neighbors Interpolation for Low-Dose CT Image Denoising. J. Imaging Inform. Med. 2024, 37, 2669–2687.
- Bai, H.; Zhou, X.; Zhao, Y.; Zhao, Y.; Han, Q. APFlowNet: An inter-layer interpolation approach for soil CT images based on CNN and bidirectional optical flow. Soil Tillage Res. 2024, 238, 106024.
- Stefaniga, S.A.; Gaianu, M. An Approach of Segmentation Method Using Deep Learning for CT Medical Images. In 2019 21st International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (SYNASC); IEEE: Piscataway, NJ, USA, 2019; pp. 273–279.
- Dong, K.; Hu, P.; Zhu, Y.; Tian, Y.; Li, X.; Zhou, T.; Bai, X.; Liang, T.; Li, J. Attention-enhanced multiscale feature fusion network for pancreas and tumor segmentation. Med. Phys. 2024, 51, 8999–9016.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015; Springer: Cham, Switzerland, 2015; Volume 9351, pp. 234–241.
- Shen, N.; Wang, Z.; Li, J.; Gao, H.; Lu, W.; Hu, P.; Feng, L. Multi-organ segmentation network for abdominal CT images based on spatial attention and deformable convolution. Expert Syst. Appl. 2023, 211, 118625.
- Chen, Y.; Li, L.; Liu, X.; Su, X. A Multi-Task Framework for Infrared Small Target Detection and Segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–9.
- Cheng, J.; Liu, J.; Kuang, H.; Wang, J. A Fully Automated Multimodal MRI-Based Multi-Task Learning for Glioma Segmentation and IDH Genotyping. IEEE Trans. Med. Imaging 2022, 41, 1520–1532.
- Salimi, Y.; Mansouri, Z.; Shiri, I.; Mainta, I.; Zaidi, H. Deep Learning–Powered CT-Less Multitracer Organ Segmentation From PET Images. Clin. Nucl. Med. 2025, 50, 289–300.
- Zhou, L.; Wang, H. Multi-task model of adaptive multi-scale feature fusion and adaptive mixture-of-experts for equipment remaining useful life prediction and fault diagnosis. Expert Syst. Appl. 2025, 272, 126807.
- He, Q.; Yang, Q.; Su, H.; Wang, Y. Multi-task learning for segmentation and classification of breast tumors from ultrasound images. Comput. Biol. Med. 2024, 173, 108319.
- Zhou, L.; Wang, H.; Xu, S. Joint learning strategy of multi-scale multi-task convolutional neural network for aero-engine prognosis. Appl. Soft Comput. 2024, 160, 111726.
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. arXiv 2020, arXiv:2010.11929.
- Yang, J.; Qiu, P.; Zhang, Y.; Marcus, D.S.; Sotiras, A. D-Net: Dynamic large kernel with dynamic feature fusion for volumetric medical image segmentation. Biomed. Signal Process. Control 2026, 113, 108837.
- Chen, J.; Lu, Y.; Yu, Q.; Luo, X.; Adeli, E.; Wang, Y.; Lu, L.; Yuille, A.L.; Zhou, Y. TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation. arXiv 2021, arXiv:2102.04306.
- Caruana, R. Multitask learning. Mach. Learn. 1997, 28, 41–75.
- Ruder, S. An Overview of Multi-Task Learning in Deep Neural Networks. arXiv 2017, arXiv:1706.05098.
- Bui, P.N.; Le, D.T.; Bum, J.; Han, J.C.; Pham, V.N.; Choo, H. Multi-scale feature enhancement in multi-task learning for medical image analysis. Artif. Intell. Med. 2026, 173, 103338.
- Ma, J.; Zhao, Z.; Yi, X.; Chen, J.; Hong, L.; Chi, E.H. Modeling Task Relationships in Multi-task Learning with Multi-gate Mixture-of-Experts. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining; Association for Computing Machinery: New York, NY, USA, 2018; pp. 1930–1939.
- Chen, Z.; Badrinarayanan, V.; Lee, C.Y.; Rabinovich, A. GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Networks. In Proceedings of the International Conference on Learning Representations (ICLR); Available online: https://openreview.net/forum?id=H1bM1fZCW (accessed on 15 May 2026).
- Liu, S.; Johns, E.J.; Davison, A.J. End-To-End Multi-Task Learning with Attention. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2019.
- Song, H.; Mao, X.; Yu, J.; Li, Q.; Wang, Y. I3Net: Inter-Intra-Slice Interpolation Network for Medical Slice Synthesis. IEEE Trans. Med. Imaging 2024, 43, 3306–3318.
- Gambini, L.; Gabbett, C.; Doolan, L.; Jones, L.; Coleman, J.N.; Gilligan, P.; Sanvito, S. Video frame interpolation neural network for 3D tomography across different length scales. Nat. Commun. 2024, 15, 7962.
- Li, W.; Song, H.; Ai, D.; Shi, J.; Fan, J.; Xiao, D.; Fu, T.; Lin, Y.; Wu, W.; Yang, J. SFCLI-Net: Spatial-frequency collaborative learning interpolation network for Computed Tomography slice synthesis. Expert Syst. Appl. 2025, 272, 126602.
- Xing, Z.; Ye, T.; Yang, Y.; Cai, D.; Gai, B.; Wu, X.-J.; Gao, F.; Zhu, L. SegMamba-V2: Long-range Sequential Modeling Mamba For General 3D Medical Image Segmentation. IEEE Trans. Med. Imaging 2025, 45, 4–15.
- Chen, T.; Wang, C.; Chen, Z.; Lei, Y.; Shan, H. HiDiff: Hybrid Diffusion Framework for Medical Image Segmentation. IEEE Trans. Med. Imaging 2024, 43, 3570–3583.
- Wang, P.; Guo, D.; Zheng, D.; Zhang, M.; Yu, H.; Sun, X. Accurate Airway Tree Segmentation in CT Scans via Anatomy-aware Multi-class Segmentation and Topology-guided Iterative Learning. IEEE Trans. Med. Imaging 2024, 43, 4294–4306.
- Wu, J.; Liu, X.; Wang, G.; Zhang, S. SicTTA: Single image continual test time adaptation for medical image segmentation. Med. Image Anal. 2026, 108, 103859.

| Method Category | Core Feature | Main Limitation |
|---|---|---|
| Traditional Interpolation | Linear/B-spline/Fourier reconstruction | Poor anatomical structure fitting, blurry details |
| Deep Interpolation | Motion/texture-based single-task interpolation | No segmentation constraint, lacks frequency modeling |
| CNN Segmentation | Local semantic extraction with U-Net variants | Insufficient long-range dependency modeling |
| Transformer Segmentation | Global sequence feature modeling | Task-agnostic design in multi-task scenarios |
| General MTL | Shared feature learning for correlated tasks | Simple fusion, fixed loss weights, no joint interpolation-segmentation design |
| Swin Medical Vision | Hierarchical window attention feature extraction | Lacks task-adaptive modules and frequency-domain embedding |
| Proposed TASC-SwinMT | Shared encoder + task-adaptive module + cross-task interaction + dynamic loss | Oriented for CT and MRI joint interpolation-segmentation |

| Stage | Input Dimension | Output Dimension | Number of Blocks | Number of Heads |
|---|---|---|---|---|
| 1 | | | 1 | 2 |
| 2 | | | 1 | 4 |
| 3 | | | 1 | 8 |
| 4 | | | 1 | 16 |

| Symbol | Description | Unit/Dimension |
|---|---|---|
| | The i-th input frame (CT for lung, MRI for heart) | |
| | Predicted intermediate frame from image interpolation | |
| | Ground truth mask for medical image segmentation | |
| | Predicted mask from medical image segmentation | |
| TALA | Task-Aware Lightweight Adapter module | - |
| MSTAF | Multi-Scale Task Alignment Fusion module | - |
| CTCI | Cross-Task Collaborative Interaction module | - |
| | Channel attention weight in the TALA module | |
| | Balance parameter in hybrid interpolation loss | - |
| | Dynamic task fusion weights in the MSTAF module | Scalar |
| | Learnable parameters for dynamic multi-task loss balancing | |
| | Learnable weight projection matrix in MSTAF dynamic gating | - |
| B | Batch size in model training | 2 |
| C | Channel dimension of intermediate feature maps | - |
| | Height and width of input CT images | |
| | Number of multi-head attention heads | - |
| | Dimension of each attention head | - |

| Model | PSNR (dB) | SSIM | LPIPS | EdgeSSIM |
|---|---|---|---|---|
| SwinUNet-Interp | 36.28 ± 0.47 *** | 0.952 ± 0.0021 *** | 0.114 ± 0.003 *** | 0.938 ± 0.002 *** |
| TransUNet-Interp | 37.51 ± 0.36 *** | 0.947 ± 0.0011 *** | 0.122 ± 0.002 *** | 0.929 ± 0.003 *** |
| VTN-Interp | 35.84 ± 0.42 *** | 0.958 ± 0.0017 *** | 0.107 ± 0.003 *** | 0.945 ± 0.001 *** |
| Ours | 41.50 ± 0.20 | 0.990 ± 0.0003 | 0.051 ± 0.001 | 0.988 ± 0.002 |

| Model | Dice | IoU | Precision | Recall |
|---|---|---|---|---|
| nnU-Net-Seg | 0.904 ± 0.015 *** | 0.871 ± 0.013 *** | 0.899 ± 0.017 *** | 0.907 ± 0.014 *** |
| UNet++-Seg | 0.887 ± 0.012 *** | 0.852 ± 0.016 *** | 0.882 ± 0.014 *** | 0.891 ± 0.015 *** |
| Swin UNETR-Seg | 0.915 ± 0.016 ** | 0.883 ± 0.012 *** | 0.911 ± 0.013 *** | 0.918 ± 0.016 ** |
| Ours | 0.967 ± 0.005 | 0.940 ± 0.007 | 0.969 ± 0.007 | 0.968 ± 0.005 |

| Model | PSNR (dB) | SSIM | LPIPS | EdgeSSIM |
|---|---|---|---|---|
| SwinUNet-Interp | 35.71 ± 0.47 *** | 0.940 ± 0.0021 *** | 0.119 ± 0.003 *** | 0.926 ± 0.002 *** |
| TransUNet-Interp | 36.94 ± 0.36 *** | 0.935 ± 0.0011 *** | 0.127 ± 0.002 *** | 0.917 ± 0.003 *** |
| VTN-Interp | 35.27 ± 0.42 *** | 0.946 ± 0.0017 *** | 0.112 ± 0.003 *** | 0.933 ± 0.001 *** |
| Ours | 40.76 ± 0.38 | 0.974 ± 0.0019 | 0.058 ± 0.002 | 0.969 ± 0.002 |

| Model | Dice | IoU | Precision | Recall |
|---|---|---|---|---|
| nnU-Net-Seg | 0.874 ± 0.015 *** | 0.821 ± 0.013 *** | 0.870 ± 0.017 *** | 0.877 ± 0.014 *** |
| UNet++-Seg | 0.857 ± 0.012 *** | 0.802 ± 0.016 *** | 0.853 ± 0.014 *** | 0.861 ± 0.015 *** |
| Swin UNETR-Seg | 0.885 ± 0.016 ** | 0.833 ± 0.012 ** | 0.881 ± 0.013 ** | 0.888 ± 0.016 ** |
| Ours | 0.926 ± 0.014 | 0.869 ± 0.013 | 0.925 ± 0.015 | 0.928 ± 0.014 |

| Model | PSNR (dB) | SSIM | Dice | Recall |
|---|---|---|---|---|
| IndEnc | 38.33 ± 0.52 *** | 0.961 ± 0.0041 *** | 0.912 ± 0.024 ** | 0.915 ± 0.019 ** |
| +TALA | 39.46 ± 0.39 *** | 0.970 ± 0.0027 *** | 0.935 ± 0.016 ** | 0.938 ± 0.013 ** |
| +CTCI | 39.12 ± 0.67 *** | 0.968 ± 0.0033 *** | 0.930 ± 0.021 | 0.932 ± 0.017 ** |
| +MSTAF | 39.68 ± 0.34 *** | 0.972 ± 0.0022 *** | 0.941 ± 0.012 ** | 0.943 ± 0.010 ** |
| +TALA+CTCI | 40.25 ± 0.45 ** | 0.979 ± 0.0018 *** | 0.952 ± 0.009 * | 0.954 ± 0.008 * |
| +TALA+MSTAF | 40.61 ± 0.30 *** | 0.983 ± 0.0013 *** | 0.958 ± 0.007 | 0.959 ± 0.006 * |
| +CTCI+MSTAF | 40.42 ± 0.58 * | 0.981 ± 0.0015 *** | 0.955 ± 0.011 | 0.957 ± 0.009 |
| Ours | 41.50 ± 0.20 | 0.990 ± 0.0003 | 0.967 ± 0.005 | 0.968 ± 0.005 |

| Model | PSNR (dB) | SSIM | Dice | Recall |
|---|---|---|---|---|
| IndEnc | 37.76 ± 0.56 *** | 0.949 ± 0.0043 *** | 0.875 ± 0.030 * | 0.879 ± 0.026 ** |
| +TALA | 38.85 ± 0.42 *** | 0.958 ± 0.0031 *** | 0.892 ± 0.022 * | 0.895 ± 0.018 * |
| +CTCI | 38.51 ± 0.61 *** | 0.955 ± 0.0036 *** | 0.888 ± 0.027 * | 0.891 ± 0.023 * |
| +MSTAF | 39.07 ± 0.37 *** | 0.960 ± 0.0026 *** | 0.899 ± 0.019 * | 0.902 ± 0.015 * |
| +TALA+CTCI | 39.64 ± 0.49 ** | 0.965 ± 0.0021 *** | 0.910 ± 0.017 | 0.913 ± 0.014 |
| +TALA+MSTAF | 40.12 ± 0.33 * | 0.969 ± 0.0017 ** | 0.918 ± 0.020 | 0.920 ± 0.016 |
| +CTCI+MSTAF | 39.89 ± 0.53 * | 0.967 ± 0.0024 ** | 0.914 ± 0.018 | 0.916 ± 0.013 |
| Ours | 40.76 ± 0.38 | 0.974 ± 0.0019 | 0.926 ± 0.014 | 0.928 ± 0.014 |

| Weight | PSNR (dB) | Dice | Total Loss |
|---|---|---|---|
| 0.3/0.7 | 40.61 ± 0.42 ** | 0.955 ± 0.009 * | 0.00368 ± 0.0008 ** |
| 0.4/0.6 | 41.12 ± 0.25 * | 0.961 ± 0.006 | 0.00287 ± 0.00010 *** |
| 0.5/0.5 | 41.03 ± 0.31 * | 0.954 ± 0.007 * | 0.00335 ± 0.00011 *** |
| Ours | 41.50 ± 0.20 | 0.967 ± 0.005 | 0.00204 ± 0.00008 |

| Component | PSNR (dB) | SSIM (%) | Dice (%) | Recall (%) |
|---|---|---|---|---|
| TALA | 1.13 | 0.90 | 2.30 | 2.30 |
| CTCI | 0.79 | 0.70 | 1.80 | 1.70 |
| MSTAF | 1.35 | 1.10 | 2.90 | 2.80 |
| TALA+CTCI | 1.92 | 1.80 | 4.00 | 3.90 |
| TALA+MSTAF | 2.28 | 2.20 | 4.60 | 4.40 |
| CTCI+MSTAF | 2.09 | 2.00 | 4.30 | 4.20 |
| Full Framework | 3.17 | 2.90 | 5.50 | 5.30 |

| Component | PSNR (dB) | SSIM (%) | Dice (%) | Recall (%) |
|---|---|---|---|---|
| TALA | 1.09 | 0.90 | 1.70 | 1.60 |
| CTCI | 0.75 | 0.60 | 1.30 | 1.20 |
| MSTAF | 1.31 | 1.10 | 2.40 | 2.30 |
| TALA+CTCI | 1.88 | 1.60 | 3.50 | 3.40 |
| TALA+MSTAF | 2.36 | 2.00 | 4.30 | 4.10 |
| CTCI+MSTAF | 2.13 | 1.80 | 3.90 | 3.80 |
| Full Framework | 3.00 | 2.50 | 5.10 | 4.90 |
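
The gains listed in the two component tables above appear to be plain differences against the IndEnc baseline row of the corresponding ablation table: PSNR gains in dB, and SSIM, Dice and Recall gains in percentage points. The following is a minimal Python sketch under that assumption. It uses the mean values of the first ablation table (IndEnc PSNR 38.33 dB), and the helper name `gains_over_baseline` is hypothetical, introduced only for illustration.

```python
# Mean values copied from the first ablation table: (PSNR dB, SSIM, Dice, Recall).
ABLATION = {
    "IndEnc": (38.33, 0.961, 0.912, 0.915),
    "TALA": (39.46, 0.970, 0.935, 0.938),
    "CTCI": (39.12, 0.968, 0.930, 0.932),
    "MSTAF": (39.68, 0.972, 0.941, 0.943),
    "TALA+CTCI": (40.25, 0.979, 0.952, 0.954),
    "TALA+MSTAF": (40.61, 0.983, 0.958, 0.959),
    "CTCI+MSTAF": (40.42, 0.981, 0.955, 0.957),
    "Full Framework": (41.50, 0.990, 0.967, 0.968),
}


def gains_over_baseline(table, baseline="IndEnc"):
    """Hypothetical helper: gain of every variant over the baseline row.

    The PSNR gain stays in dB; SSIM, Dice and Recall gains are expressed
    in percentage points (difference multiplied by 100).
    """
    base = table[baseline]
    gains = {}
    for name, row in table.items():
        if name == baseline:
            continue
        psnr_gain = round(row[0] - base[0], 2)
        ratio_gains = tuple(
            round((value - ref) * 100, 2) for value, ref in zip(row[1:], base[1:])
        )
        gains[name] = (psnr_gain,) + ratio_gains
    return gains


GAINS = gains_over_baseline(ABLATION)
for name, (psnr, ssim, dice, recall) in GAINS.items():
    print(f"{name:<15} {psnr:.2f} dB  {ssim:.2f}  {dice:.2f}  {recall:.2f}")
```

Under this reading, the TALA row evaluates to 1.13 dB, 0.90, 2.30 and 2.30, and the full framework to 3.17 dB, 2.90, 5.50 and 5.30, which matches the first component table. Swapping in the values of the second ablation table (IndEnc PSNR 37.76 dB) reproduces the second component table in the same way.
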
| Method | PSNR (dB) | SSIM | LPIPS | EdgeSSIM |
|---|---|---|---|---|
| Direct Concatenation | 35.42 ± 0.35 *** | 0.936 ± 0.0026 *** | 0.125 ± 0.002 *** | 0.923 ± 0.003 *** |
| Shared-Bottom | 36.35 ± 0.32 *** | 0.945 ± 0.0015 *** | 0.116 ± 0.003 *** | 0.934 ± 0.004 *** |
| Cross-Task Concatenation | 37.68 ± 0.53 *** | 0.954 ± 0.0018 *** | 0.103 ± 0.002 *** | 0.946 ± 0.003 *** |
| MoE | 37.92 ± 0.41 *** | 0.957 ± 0.0012 *** | 0.098 ± 0.004 *** | 0.949 ± 0.001 *** |
| Ours | 41.50 ± 0.20 | 0.990 ± 0.0003 | 0.051 ± 0.001 | 0.988 ± 0.002 |

| Method | Dice | IoU | Precision | Recall |
|---|---|---|---|---|
| Direct Concatenation | 0.871 ± 0.009 *** | 0.818 ± 0.017 *** | 0.868 ± 0.019 *** | 0.873 ± 0.014 *** |
| Shared-Bottom | 0.883 ± 0.014 *** | 0.830 ± 0.012 *** | 0.880 ± 0.015 *** | 0.885 ± 0.016 *** |
| Cross-Task Concatenation | 0.896 ± 0.011 *** | 0.842 ± 0.015 *** | 0.893 ± 0.012 *** | 0.898 ± 0.023 *** |
| MoE | 0.901 ± 0.018 *** | 0.847 ± 0.010 *** | 0.898 ± 0.010 *** | 0.903 ± 0.009 *** |
| Ours | 0.967 ± 0.005 | 0.940 ± 0.007 | 0.969 ± 0.007 | 0.968 ± 0.005 |

| Strategy | PSNR (dB) | SSIM | Dice | Recall |
|---|---|---|---|---|
| GradNorm | 42.06 ± 0.22 | 0.9909 ± 0.0003 | 0.920 ± 0.007 *** | 0.918 ± 0.005 *** |
| DWA | 42.76 ± 0.14 | 0.9910 ± 0.0002 | 0.932 ± 0.004 *** | 0.929 ± 0.004 *** |
| Ours (Dynamic Loss) | 41.50 ± 0.20 | 0.9904 ± 0.0002 | 0.967 ± 0.005 | 0.968 ± 0.005 |

| Model | PSNR (dB) | SSIM | LPIPS | EdgeSSIM |
|---|---|---|---|---|
| TASC-Interp | 39.25 ± 0.28 *** | 0.971 ± 0.0008 *** | 0.078 ± 0.002 *** | 0.965 ± 0.0005 *** |
| Ours-MT | 41.50 ± 0.20 | 0.990 ± 0.0003 | 0.051 ± 0.001 | 0.988 ± 0.002 |

| Model | Dice | IoU | Precision | Recall | TCS |
|---|---|---|---|---|---|
| TASC-Seg | 0.942 ± 0.008 *** | 0.915 ± 0.009 ** | 0.945 ± 0.006 *** | 0.943 ± 0.007 *** | 0.921 ± 0.004 *** |
| Ours-MT | 0.967 ± 0.005 | 0.940 ± 0.007 | 0.969 ± 0.007 | 0.968 ± 0.005 | 0.976 ± 0.002 |

| Model | PSNR (dB) | SSIM | LPIPS | EdgeSSIM |
|---|---|---|---|---|
| TASC-Interp | 38.42 ± 0.35 *** | 0.953 ± 0.0015 *** | 0.089 ± 0.002 *** | 0.947 ± 0.0018 *** |
| Ours-MT | 40.76 ± 0.38 | 0.974 ± 0.0019 | 0.058 ± 0.002 | 0.969 ± 0.002 |

| Model | Dice | IoU | Precision | Recall | TCS |
|---|---|---|---|---|---|
| TASC-Seg | 0.903 ± 0.011 * | 0.846 ± 0.012 * | 0.901 ± 0.013 * | 0.905 ± 0.012 * | 0.908 ± 0.005 *** |
| Ours-MT | 0.926 ± 0.014 | 0.869 ± 0.013 | 0.925 ± 0.015 | 0.928 ± 0.014 | 0.963 ± 0.003 |

| Model | PSNR (dB) | SSIM | LPIPS | EdgeSSIM |
|---|---|---|---|---|
| ACVTT | 39.42 ± 0.32 *** | 0.976 ± 0.0016 *** | 0.068 ± 0.008 ** | 0.972 ± 0.008 ** |
| I3Net | 39.85 ± 0.28 *** | 0.981 ± 0.0012 *** | 0.062 ± 0.007 * | 0.978 ± 0.006 * |
| Video Interp Net | 40.21 ± 0.34 *** | 0.985 ± 0.0010 *** | 0.058 ± 0.011 | 0.982 ± 0.005 |
| SFCLI-Net | 41.43 ± 0.19 | 0.991 ± 0.0007 | 0.048 ± 0.009 | 0.984 ± 0.010 |
| Ours | 41.50 ± 0.20 | 0.990 ± 0.0003 | 0.051 ± 0.001 | 0.988 ± 0.002 |

| Model | Dice | IoU | Precision | Recall |
|---|---|---|---|---|
| SegMamba-V2 | 0.965 ± 0.014 | 0.936 ± 0.015 | 0.973 ± 0.013 | 0.961 ± 0.013 |
| HiDiff | 0.969 ± 0.011 | 0.938 ± 0.019 | 0.965 ± 0.012 | 0.969 ± 0.017 |
| Anatomy-Aware Seg | 0.958 ± 0.022 | 0.931 ± 0.014 | 0.960 ± 0.020 | 0.959 ± 0.016 |
| SicTTA | 0.955 ± 0.008 * | 0.928 ± 0.010 | 0.957 ± 0.013 | 0.956 ± 0.012 |
| Ours | 0.967 ± 0.005 | 0.940 ± 0.007 | 0.969 ± 0.007 | 0.968 ± 0.005 |

| Model | PSNR (dB) | SSIM | LPIPS | EdgeSSIM |
|---|---|---|---|---|
| ACVTT | 38.65 ± 0.36 *** | 0.959 ± 0.0021 *** | 0.074 ± 0.007 ** | 0.953 ± 0.008 ** |
| I3Net | 39.12 ± 0.31 *** | 0.964 ± 0.0014 *** | 0.069 ± 0.005 ** | 0.958 ± 0.006 * |
| Video Interp Net | 39.58 ± 0.37 ** | 0.968 ± 0.0018 *** | 0.065 ± 0.004 * | 0.963 ± 0.012 |
| SFCLI-Net | 40.71 ± 0.26 | 0.975 ± 0.0013 | 0.058 ± 0.005 | 0.968 ± 0.005 |
| Ours | 40.76 ± 0.38 | 0.974 ± 0.0019 | 0.058 ± 0.002 | 0.969 ± 0.002 |

| Model | Dice | IoU | Precision | Recall |
|---|---|---|---|---|
| SegMamba-V2 | 0.922 ± 0.016 | 0.861 ± 0.019 | 0.933 ± 0.021 | 0.924 ± 0.023 |
| HiDiff | 0.929 ± 0.021 | 0.865 ± 0.021 | 0.930 ± 0.023 | 0.926 ± 0.024 |
| Anatomy-Aware Seg | 0.921 ± 0.023 | 0.864 ± 0.026 | 0.920 ± 0.028 | 0.922 ± 0.018 |
| SicTTA | 0.918 ± 0.029 | 0.861 ± 0.031 | 0.917 ± 0.026 | 0.919 ± 0.032 |
| Ours | 0.926 ± 0.014 | 0.869 ± 0.013 | 0.925 ± 0.015 | 0.928 ± 0.014 |

| Model | Avg_Infer_Time (ms) | Avg_FPS | Parameters (M) |
|---|---|---|---|
| SwinUNet + nnU-Net | 72.4835 ± 0.3625 | 24.8635 ± 0.2947 | 186.7251 ± 0.0000 |
| TransUNet + UNETR | 68.7526 ± 0.2531 | 27.9526 ± 0.2615 | 142.3685 ± 0.0000 |
| VTN + UNet++ | 63.9418 ± 0.1428 | 30.2745 ± 0.3526 | 68.5122 ± 0.0000 |
| ACVTT + SegMamba-V2 | 86.3745 ± 0.4836 | 19.6428 ± 0.2532 | 235.4126 ± 0.0000 |
| I3Net + HiDiff | 81.2634 ± 0.4428 | 21.5836 ± 0.2746 | 218.6573 ± 0.0000 |
| VideoNet + Anatomy-Aware | 75.8427 ± 0.4135 | 23.4752 ± 0.2689 | 195.3248 ± 0.0000 |
| SFCLI-Net + SicTTA | 78.5316 ± 0.4267 | 22.3641 ± 0.2634 | 207.8965 ± 0.0000 |
| Ours | 58.8218 ± 0.2153 | 34.1846 ± 0.1862 | 33.3589 ± 0.0000 |

| Model | Memory (MB) | FLOPs (G) | Model Size (MB) |
|---|---|---|---|
| SwinUNet + nnU-Net | 2896.5327 ± 1.4236 | 489.2243 ± 0.0000 | 512.4637 ± 0.0000 |
| TransUNet + UNETR | 2643.7418 ± 1.2531 | 396.7118 ± 0.0000 | 468.5726 ± 0.0000 |
| VTN + UNet++ | 2487.6235 ± 0.8625 | 187.3359 ± 0.0000 | 426.7418 ± 0.0000 |
| ACVTT + SegMamba-V2 | 3952.8416 ± 1.8624 | 658.3215 ± 0.0000 | 648.7532 ± 0.0000 |
| I3Net + HiDiff | 3726.7325 ± 1.7235 | 612.5894 ± 0.0000 | 601.4627 ± 0.0000 |
| VideoNet + Anatomy-Aware | 3485.6417 ± 1.5362 | 547.2368 ± 0.0000 | 553.8716 ± 0.0000 |
| SFCLI-Net + SicTTA | 3614.5238 ± 1.6428 | 583.4572 ± 0.0000 | 579.6425 ± 0.0000 |
| Ours | 2252.1050 ± 1.3247 | 114.4042 ± 0.0000 | 380.9202 ± 0.0000 |

| Dataset | PSNR (dB) | SSIM | LPIPS | EdgeSSIM |
|---|---|---|---|---|
| Liver CT Dataset | 38.7054 ± 0.25 | 0.9655 ± 0.0025 | 0.0611 ± 0.0018 | 0.8589 ± 0.0032 |
| COVID-19 CT Dataset | 38.1579 ± 0.32 | 0.9588 ± 0.0028 | 0.0468 ± 0.0021 | 0.8824 ± 0.0029 |

| Dataset | Dice | IoU | Precision | Recall |
|---|---|---|---|---|
| Liver CT Dataset | 0.9653 ± 0.0042 | 0.9360 ± 0.0051 | 0.9684 ± 0.0038 | 0.9647 ± 0.0045 |
| COVID-19 CT Dataset | 0.9046 ± 0.0053 | 0.8309 ± 0.0062 | 0.9169 ± 0.0049 | 0.8970 ± 0.0058 |

| Dataset | Parameters (M) | Memory (MB) | FLOPs (G) |
|---|---|---|---|
| Liver CT Dataset | 33.3589 ± 0.0000 | 2252.1050 ± 1.0532 | 114.4042 ± 0.0000 |
| COVID-19 CT Dataset | 33.3589 ± 0.0000 | 2253.6519 ± 0.9849 | 114.4042 ± 0.0000 |

| Dataset | Avg_Infer_Time (ms) | Avg_FPS | Model Size (MB) |
|---|---|---|---|
| Liver CT Dataset | 56.6527 ± 1.25 | 35.3148 ± 0.35 | 380.9202 ± 0.0000 |
| COVID-19 CT Dataset | 56.2045 ± 0.98 | 35.5976 ± 0.42 | 380.9202 ± 0.0000 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Sun, Y.; Yang, Y.; Bao, N. TASC-SwinMT: Task-Adaptive Synergistic Cross-Task Swin Multi-Task Framework for CT and MRI Image Interpolation and Segmentation. Tomography 2026, 12, 80. https://doi.org/10.3390/tomography12060080

