RA-BiMENet: Continuous-Time 4D Medical Image Interpolation via Relation-Aware Bi-Directional Motion Estimation
Abstract
1. Introduction
- We propose RA-BiMENet, a relation-aware bi-directional motion estimation network for 4D medical image interpolation, enabling accurate intermediate-frame prediction at arbitrary time points.
- We develop a spatiotemporal transform MLP (TS-MLP) module for bi-directional motion estimation, in which a relation-aware multi-scale MLP (RAM-MLP) unit captures complex nonlinear motion through local correlation modeling and multi-scale dependency learning.
- We develop a hierarchical spatiotemporal fusion (HSTF) module that fuses cross-frame features via forward warping and self-attention, improving detail reconstruction while preserving temporal consistency.
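- Extensive experiments on the ACDC and 4D-Lung datasets demonstrate that RA-BiMENet generates high-fidelity and temporally continuous interpolated frames, outperforming existing state-of-the-art methods on most metrics; on 4D-Lung, for example, it reaches a PSNR of 35.42 versus 34.02 for the strongest baseline, TTT4MII.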
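
To make the idea of arbitrary-time interpolation from bi-directional motion more concrete, the following is a minimal 1D sketch and not the RA-BiMENet implementation. It assumes linear motion between the two frames (the paper targets nonlinear motion), nearest-neighbour forward warping (splatting), and a simple time-weighted blend; the function names `forward_warp_1d` and `interpolate_frame` are hypothetical.

```python
import numpy as np

def forward_warp_1d(signal, flow):
    """Splat each sample to its displaced position (nearest neighbour).

    Returns the accumulated intensities and the number of samples
    that landed on each target position.
    """
    n = signal.shape[0]
    out = np.zeros(n, dtype=float)
    hits = np.zeros(n, dtype=float)
    targets = np.rint(np.arange(n) + flow).astype(int)
    valid = (targets >= 0) & (targets < n)
    np.add.at(out, targets[valid], signal[valid])
    np.add.at(hits, targets[valid], 1.0)
    return out, hits

def interpolate_frame(frame0, frame1, flow01, flow10, t):
    """Blend both frames warped to time t in [0, 1].

    Assumption for illustration only: motion is linear in time, so the
    flow towards time t is a scaled copy of the full flow.
    """
    warped0, hits0 = forward_warp_1d(frame0, t * flow01)
    warped1, hits1 = forward_warp_1d(frame1, (1.0 - t) * flow10)
    num = (1.0 - t) * warped0 + t * warped1
    den = (1.0 - t) * hits0 + t * hits1
    # Positions that received no sample stay at zero.
    return np.divide(num, den, out=np.zeros_like(num), where=den > 0)

# Toy example: a single bright sample moves from index 2 to index 6.
frame0 = np.zeros(10)
frame0[2] = 1.0
frame1 = np.zeros(10)
frame1[6] = 1.0
flow01 = np.full(10, 4.0)   # displacement from frame 0 to frame 1
flow10 = np.full(10, -4.0)  # displacement from frame 1 to frame 0
middle = interpolate_frame(frame0, frame1, flow01, flow10, 0.5)
```

In this toy case the bright sample lands halfway between its two observed positions. A learned model would replace the linear flow scaling and the fixed blend with predicted, time-conditioned quantities.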
2. Related Works
2.1. Video Frame Interpolation
2.2. Four-Dimensional Medical Image Interpolation
2.3. Multilayer Perceptron (MLP)
2.4. Multi-Scale Feature Fusion
3. Methodology
3.1. Overview
3.2. Spatiotemporal Transform MLP (TS-MLP)
3.3. Temporal Positional Encoding
3.4. Hierarchical Spatiotemporal Fusion (HSTF)
3.5. Loss Function
4. Experiments
4.1. Dataset
4.2. Experimental Setup
4.3. Comparison with Other Methods
4.4. Ablation Study
- Architectural design analysis. As shown in Table 2, the baseline model equipped with RAM-MLP outperforms VoxelMorph [20] and Hire-MLP [38] on all evaluation metrics for both datasets; on 4D-Lung, for example, PSNR rises from 33.48 (VoxelMorph) and 34.35 (Hire-MLP) to 35.42. This improvement mainly stems from the design of RAM-MLP. VoxelMorph relies on convolutional structures and therefore struggles to capture long-range dependencies in high-resolution volumetric data. Although Hire-MLP introduces MLP-based modeling, its single-scale design limits its ability to characterize multi-scale deformations. In contrast, RAM-MLP incorporates a relation-aware multi-scale modeling mechanism, enabling it to capture complex nonlinear motion across spatial levels and achieve a better balance between global consistency and local detail restoration.
- Effect of the correlation modeling module. To evaluate the contribution of correlation modeling, we compared RAM-MLP with and without the correlation computation module. As shown in Table 3, adding correlation modeling improves every metric on both the Cardiac and 4D-Lung datasets; PSNR, for instance, increases from 33.63 to 33.85 on Cardiac and from 35.28 to 35.42 on 4D-Lung. This indicates that local correlation information strengthens the cross-frame motion representation, which improves the accuracy of spatiotemporal transformation estimation and reduces interpolation errors.
- Effect of the multi-branch design. We further analyzed the influence of different branch combinations in RAM-MLP. As shown in Table 4, removing any one of the three scale branches degrades performance on both datasets. The best results are achieved only when all three branches are retained, indicating strong complementarity among the multi-scale representations. Specifically, two of the branches are beneficial for capturing fine-grained and medium-scale deformations, while the remaining branch is more effective for modeling large displacements. This design enables RAM-MLP to achieve a favorable balance between local detail recovery and global motion consistency.
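
The local correlation idea evaluated above can be illustrated with a generic cost-volume computation. This is a sketch under stated assumptions rather than the RAM-MLP correlation module: it works on 2D feature maps for brevity (the paper operates on volumetric data), uses a fixed search radius, and the name `local_correlation` is hypothetical.

```python
import numpy as np

def local_correlation(feat0, feat1, radius=1):
    """Correlate each position of feat0 with a local window of feat1.

    feat0, feat1: arrays of shape (C, H, W).
    Returns an array of shape ((2 * radius + 1) ** 2, H, W) in which
    channel k holds the channel-averaged product for displacement
    (dy, dx), enumerated row by row from (-radius, -radius).
    """
    _, h, w = feat0.shape
    # Zero-pad feat1 so that every displacement stays inside the array.
    padded = np.pad(feat1, ((0, 0), (radius, radius), (radius, radius)))
    maps = []
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # shifted[:, y, x] == feat1[:, y + dy, x + dx] (zero if outside)
            shifted = padded[:, radius + dy:radius + dy + h,
                             radius + dx:radius + dx + w]
            maps.append((feat0 * shifted).mean(axis=0))
    return np.stack(maps)

# Toy check: feat1 is feat0 moved down by one row, so the strongest
# response at interior positions should be at displacement (dy=1, dx=0).
rng = np.random.default_rng(0)
feat0 = rng.standard_normal((8, 6, 6))
feat0 /= np.linalg.norm(feat0, axis=0, keepdims=True)  # unit-norm features
feat1 = np.roll(feat0, shift=1, axis=1)
corr = local_correlation(feat0, feat1, radius=1)
best = int(np.argmax(corr[:, 2, 3]))
```

With `radius=1` the nine displacements are ordered from `(-1, -1)` to `(1, 1)`, so index 7 corresponds to `(dy=1, dx=0)`. A multi-scale variant would repeat the same computation on downsampled features to cover larger displacements.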
5. Discussion
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Pan, T.; Lee, T.Y.; Rietzel, E.; Chen, G.T. 4D-CT imaging of a volume influenced by respiratory motion on multi-slice CT. Med. Phys. 2004, 31, 333–340.
- Sun, X.; Cheng, L.H.; Plein, S.; Garg, P.; van der Geest, R.J. Deep learning based automated left ventricle segmentation and flow quantification in 4D flow cardiac MRI. J. Cardiovasc. Magn. Reson. 2024, 26, 100003.
- Zhang, W.; Brady, J.M.; Becher, H.; Noble, J.A. Spatio-temporal (2D+ T) non-rigid registration of real-time 3D echocardiography and cardiovascular MR image sequences. Phys. Med. Biol. 2011, 56, 1341–1360.
- Canè, F.; Verhegghe, B.; De Beule, M.; Bertrand, P.B.; Van der Geest, R.J.; Segers, P.; De Santis, G. From 4D medical images (CT, MRI, and ultrasound) to 4D structured mesh models of the left ventricular endocardium for patient-specific simulations. BioMed Res. Int. 2018, 2018, 7030718.
- Mory, C.; Janssens, G.; Rit, S. Motion-aware temporal regularization for improved 4D cone-beam computed tomography. Phys. Med. Biol. 2016, 61, 6856–6877.
- Zhang, Y.; Huang, X.; Wang, J. Advanced 4-dimensional cone-beam computed tomography reconstruction by combining motion estimation, motion-compensated reconstruction, biomechanical modeling and deep learning. Vis. Comput. Ind. Biomed. Art 2019, 2, 23.
- Küstner, T.; Pan, J.; Gilliam, C.; Qi, H.; Cruz, G.; Hammernik, K.; Yang, B.; Blu, T.; Rueckert, D.; Botnar, R.; et al. Deep-learning based motion-corrected image reconstruction in 4D magnetic resonance imaging of the body trunk. In Proceedings of the 2020 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC); IEEE: New York, NY, USA, 2020; pp. 976–985.
- Tripathi, V.R.; Tibdewal, M.N.; Mishra, R. A survey on motion artifact correction in magnetic resonance imaging for improved diagnostics. SN Comput. Sci. 2024, 5, 281.
- Werlberger, M.; Pock, T.; Unger, M.; Bischof, H. Optical flow guided TV-L1 video interpolation and restoration. In Proceedings of the International Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition; Springer: Berlin/Heidelberg, Germany, 2011; pp. 273–286.
- Jeong, S.G.; Lee, C.; Kim, C.S. Motion-compensated frame interpolation based on multihypothesis motion estimation and texture optimization. IEEE Trans. Image Process. 2013, 22, 4497–4509.
- Guo, Y.; Bi, L.; Ahn, E.; Feng, D.; Wang, Q.; Kim, J. A spatiotemporal volumetric interpolation network for 4d dynamic medical image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2020; pp. 4726–4735.
- Chen, J.; Frey, E.C.; He, Y.; Segars, W.P.; Li, Y.; Du, Y. Transmorph: Transformer for unsupervised medical image registration. Med. Image Anal. 2022, 82, 102615.
- Liu, R.; Li, Y.; Tao, L.; Liang, D.; Zheng, H.T. Are we ready for a new paradigm shift? A survey on visual deep mlp. Patterns 2022, 3, 100520.
- Wolterink, J.M.; Zwienenberg, J.C.; Brune, C. Implicit neural representations for deformable image registration. In Proceedings of the International Conference on Medical Imaging with Deep Learning; PMLR: New York, NY, USA, 2022; pp. 1349–1359.
- Niklaus, S.; Mai, L.; Liu, F. Video frame interpolation via adaptive separable convolution. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: New York, NY, USA, 2017; pp. 261–270.
- Gui, S.; Wang, C.; Chen, Q.; Tao, D. Featureflow: Robust video interpolation via structure-to-texture generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2020; pp. 14004–14013.
- Liu, Z.; Yeh, R.A.; Tang, X.; Liu, Y.; Agarwala, A. Video frame synthesis using deep voxel flow. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: New York, NY, USA, 2017; pp. 4463–4471.
- Jin, X.; Wu, L.; Shen, G.; Chen, Y.; Chen, J.; Koo, J.; Hahm, C.h. Enhanced bi-directional motion estimation for video frame interpolation. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; IEEE: New York, NY, USA, 2023; pp. 5049–5057.
- Hu, M.; Jiang, K.; Zhong, Z.; Wang, Z.; Zheng, Y. Iq-vfi: Implicit quadratic motion estimation for video frame interpolation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2024; pp. 6410–6419.
- Balakrishnan, G.; Zhao, A.; Sabuncu, M.R.; Guttag, J.; Dalca, A.V. Voxelmorph: A learning framework for deformable medical image registration. IEEE Trans. Med. Imaging 2019, 38, 1788–1800.
- Kim, B.; Ye, J.C. Diffusion deformable model for 4D temporal medical image generation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2022; pp. 539–548.
- Wei, T.T.; Kuo, C.; Tseng, Y.C.; Chen, J.J. Mpvf: 4d medical image inpainting by multi-pyramid voxel flows. IEEE J. Biomed. Health Inform. 2023, 27, 5872–5882.
- Kim, J.; Yoon, H.; Park, G.; Kim, K.; Yang, E. Data-efficient unsupervised interpolation without any intermediate frame for 4D medical images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2024; pp. 11353–11364.
- Kang, J.; Jung, E.C.; Koo, H.J.; Yang, D.H.; Ha, H. Flow-Rate-Constrained Physics-Informed Neural Networks for Flow Field Error Correction in Four-Dimensional Flow Magnetic Resonance Imaging. IEEE Trans. Med. Imaging 2025, 44, 5155–5171.
- Zhang, Y.; Van Rozendaal, T.; Brehmer, J.; Nagel, M.; Cohen, T. Implicit neural video compression. arXiv 2021, arXiv:2112.11312.
- Guo, Z.; Li, W.; Loy, C.C. Generalizable implicit motion modeling for video frame interpolation. Adv. Neural Inf. Process. Syst. 2024, 37, 63747–63770.
- Wang, X.; Wang, S.; Xiong, H.; Xuan, K.; Zhuang, Z.; Liu, M.; Shen, Z.; Zhao, X.; Zhang, L.; Wang, Q. Spatial attention-based implicit neural representation for arbitrary reduction of MRI slice spacing. Med. Image Anal. 2024, 94, 103158.
- Hill, J.; Khokher, M.R.; Nguyen, C.; Adcock, M.; Li, R.; Anderson, S.; Morrell, T.; Diprose, T.; Salvado, O.; Wang, D.; et al. Automated instance segmentation and registration of spinal vertebrae from CT-Scans with an improved 3D U-net neural network and corner point registration. Comput. Biol. Med. 2025, 196, 110663.
- Ying, J.; Cattell, R.; Zhao, T.; Lei, L.; Jiang, Z.; Hussain, S.M.; Gao, Y.; Chow, H.H.S.; Stopeck, A.T.; Thompson, P.A.; et al. Two fully automated data-driven 3D whole-breast segmentation strategies in MRI for MR-based breast density using image registration and U-Net with a focus on reproducibility. Vis. Comput. Ind. Biomed. Art 2022, 5, 25.
- Xie, Y.; Zhang, J.; Shen, C.; Xia, Y. Cotr: Efficiently bridging cnn and transformer for 3d medical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2021; pp. 171–180.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008.
- Bernard, O.; Lalande, A.; Zotti, C.; Cervenansky, F.; Yang, X.; Heng, P.A.; Cetin, I.; Lekadir, K.; Camara, O.; Ballester, M.A.G.; et al. Deep learning techniques for automatic MRI cardiac multi-structures segmentation and diagnosis: Is the problem solved? IEEE Trans. Med. Imaging 2018, 37, 2514–2525.
- Hugo, G.D.; Weiss, E.; Sleeman, W.C.; Balik, S.; Keall, P.J.; Lu, J.; Williamson, J.F. Data from 4D Lung Imaging of NSCLC Patients. 2016. Available online: https://cir.nii.ac.jp/crid/1880302167830331392 (accessed on 1 January 2016).
- Dosselmann, R.; Yang, X.D. Existing and emerging image quality metrics. In Proceedings of the Canadian Conference on Electrical and Computer Engineering; IEEE: New York, NY, USA, 2005; pp. 1906–1913.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2018; pp. 586–595.
- Zhang, Q.; Lei, Y.; Zheng, Z.; Chen, Z.; Xie, Z. Test time training for 4D medical image interpolation. In Proceedings of the 2025 International Joint Conference on Neural Networks (IJCNN); IEEE: New York, NY, USA, 2025; pp. 1–8.
- Guo, J.; Tang, Y.; Han, K.; Chen, X.; Wu, H.; Xu, C.; Xu, C.; Wang, Y. Hire-mlp: Vision mlp via hierarchical rearrangement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2022; pp. 826–836.






Table 1. Comparison with other methods on the Cardiac and 4D-Lung datasets.

| Method | Param (M) | Cardiac PSNR ↑ | Cardiac NCC ↑ | Cardiac SSIM ↑ | Cardiac NMSE ↓ | Cardiac LPIPS ↓ | 4D-Lung PSNR ↑ | 4D-Lung NCC ↑ | 4D-Lung SSIM ↑ | 4D-Lung NMSE ↓ | 4D-Lung LPIPS ↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| SVIN | 8.4 | 32.51 | 0.559 | 0.972 | 2.930 | 1.535 | 30.99 | 0.312 | 0.973 | 0.852 | 2.182 |
| MPVF | 26.45 | 33.15 | 0.562 | 0.971 | 2.847 | 1.458 | 31.18 | 0.310 | 0.972 | 0.761 | 2.554 |
| VM | 0.3 | 31.02 | 0.555 | 0.966 | 4.254 | 1.772 | 32.29 | 0.316 | 0.977 | 0.641 | 2.063 |
| DDM | 10.92 | 29.71 | 0.541 | 0.956 | 5.007 | 2.136 | 30.37 | 0.308 | 0.971 | 0.905 | 2.283 |
| UVI-Net | 0.51 | 33.52 | 0.565 | 0.977 | 2.433 | 1.134 | 34.00 | 0.320 | 0.980 | 0.552 | 1.489 |
| TTT4MII | 14.29 | 33.55 | 0.565 | 0.977 | 2.414 | 1.129 | 34.02 | 0.320 | 0.981 | 0.495 | 1.414 |
| Ours | 7.52 | 33.85 | 0.566 | 0.978 | 2.257 | 1.198 | 35.42 | 0.328 | 0.986 | 0.359 | 1.185 |

Table 2. Architectural design analysis of the TS-MLP motion estimation unit.

| Dataset | TS-MLP | PSNR ↑ | NCC ↑ | SSIM ↑ | NMSE ↓ | LPIPS ↓ |
|---|---|---|---|---|---|---|
| Cardiac | VoxelMorph | 33.17 | 0.562 | 0.971 | 2.847 | 1.458 |
| Cardiac | Hire-MLP | 33.43 | 0.564 | 0.974 | 2.537 | 1.322 |
| Cardiac | RAM-MLP | 33.85 | 0.566 | 0.978 | 2.257 | 1.198 |
| 4D-Lung | VoxelMorph | 33.48 | 0.318 | 0.979 | 0.612 | 1.754 |
| 4D-Lung | Hire-MLP | 34.35 | 0.322 | 0.983 | 0.472 | 1.334 |
| 4D-Lung | RAM-MLP | 35.42 | 0.328 | 0.986 | 0.359 | 1.185 |

Table 3. Effect of the correlation modeling module.

| Dataset | Correlation | PSNR ↑ | NCC ↑ | SSIM ↑ | NMSE ↓ | LPIPS ↓ |
|---|---|---|---|---|---|---|
| Cardiac | ✗ | 33.63 | 0.565 | 0.977 | 2.395 | 1.276 |
| Cardiac | ✓ | 33.85 | 0.566 | 0.978 | 2.257 | 1.198 |
| 4D-Lung | ✗ | 35.28 | 0.326 | 0.985 | 0.397 | 1.226 |
| 4D-Lung | ✓ | 35.42 | 0.328 | 0.986 | 0.359 | 1.185 |

Table 4. Effect of the multi-branch design in RAM-MLP.

| Dataset | RAM-MLP Branch | PSNR ↑ | NCC ↑ | SSIM ↑ | NMSE ↓ | LPIPS ↓ |
|---|---|---|---|---|---|---|
| Cardiac | ✓ ✓ | 33.57 | 0.563 | 0.975 | 2.445 | 1.273 |
| Cardiac | ✓ ✓ | 33.49 | 0.562 | 0.974 | 2.458 | 1.322 |
| Cardiac | ✓ ✓ | 33.53 | 0.563 | 0.975 | 2.451 | 1.288 |
| Cardiac | ✓ ✓ ✓ | 33.85 | 0.566 | 0.978 | 2.257 | 1.198 |
| 4D-Lung | ✓ ✓ | 35.38 | 0.327 | 0.985 | 0.394 | 1.285 |
| 4D-Lung | ✓ ✓ | 35.34 | 0.326 | 0.984 | 0.435 | 1.319 |
| 4D-Lung | ✓ ✓ | 35.36 | 0.327 | 0.985 | 0.409 | 1.296 |
| 4D-Lung | ✓ ✓ ✓ | 35.42 | 0.328 | 0.986 | 0.359 | 1.185 |

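
For readers who want to reproduce the kind of metrics reported in the tables above, the sketch below gives common textbook definitions of PSNR, NMSE, and a global NCC. These are assumptions for illustration: the paper's exact definitions (for example, intensity range, any scaling applied to NMSE, or a windowed rather than global NCC) are not stated here and may differ, so absolute values need not match the tables.

```python
import numpy as np

def psnr(pred, target, data_range=1.0):
    """Peak signal-to-noise ratio, assuming intensities span data_range."""
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def nmse(pred, target):
    """Squared error normalised by the energy of the target."""
    return np.sum((pred - target) ** 2) / np.sum(target ** 2)

def ncc(pred, target, eps=1e-8):
    """Global normalised cross-correlation (Pearson-style), in [-1, 1]."""
    p = pred - pred.mean()
    t = target - target.mean()
    return np.sum(p * t) / (np.sqrt(np.sum(p ** 2) * np.sum(t ** 2)) + eps)

# Toy volume with intensities in [0, 1] and a prediction offset by 0.1.
target = np.linspace(0.0, 1.0, 4 * 4 * 4).reshape(4, 4, 4)
pred = target + 0.1
scores = {
    "psnr": psnr(pred, target),
    "nmse": nmse(pred, target),
    "ncc": ncc(pred, target),
}
```

A constant offset of 0.1 gives a mean squared error of 0.01, hence a PSNR of about 20 for a unit data range, while the global NCC stays close to 1 because it is invariant to constant intensity shifts.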
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Li, L.; Lyu, J. RA-BiMENet: Continuous-Time 4D Medical Image Interpolation via Relation-Aware Bi-Directional Motion Estimation. Sensors 2026, 26, 3034. https://doi.org/10.3390/s26103034

