Multi-Modal Image Registration Problem Integrating Multi-Scale Strategy and Deep Learning
Abstract
1. Introduction
- The integration of the multi-scale strategy into the multi-modal registration network, which yields significant improvements in multi-modal registration accuracy.
- A synergistic mechanism combining dual consistency constraints with the multi-scale strategy. Ablation studies demonstrate that the dual consistency constraints effectively suppress errors induced by the multi-scale strategy.
2. Methods
2.1. Image Registration Framework Based on Deep Learning
2.1.1. Affine Transformation Network—AT-Net
2.1.2. Deformable Transformation Network—DT-Net
2.1.3. Dual Consistency-Constrained Bi-Directional Image Transformation
2.2. Image Registration Problem Integrating Multi-Scale Strategy and Deep Learning
2.2.1. Multi-Scale Strategy
- (1) Coarse Registration (Low Resolution): Low-resolution images are used for coarse registration, which reduces the computational load and speeds up processing.
- (2) Incremental Registration (Increasing Resolution): After coarse registration, the image resolution is progressively increased and the registration is repeated. Each increase in resolution exposes finer detail, which allows more accurate registration. This step can be iterated several times, with each iteration refining the previous result.
- (3) Fine Registration (High Resolution): In the final phase, registration is performed on full-resolution images. Because the images are already roughly aligned at this point, this step only needs to make small adjustments to refine the alignment.
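The three phases can be sketched as a generic coarse-to-fine loop. This is a minimal NumPy sketch under several assumptions:

- The inputs are 2D images whose sides are divisible by 2^(levels−1).
- Downsampling uses 2×2 average pooling.
- The displacement field is upsampled by nearest neighbour. The complexity analysis in Section 3.3.10 mentions bilinear interpolation; nearest neighbour is used here only for brevity.

`register_level` is a hypothetical placeholder for the per-level solver, for example a network forward pass.

```python
import numpy as np

def downsample2(img):
    """Halve both spatial dimensions with 2x2 average pooling."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def build_pyramid(img, levels):
    """Return the images ordered from coarsest to finest (the finest is `img`)."""
    pyramid = [img]
    for _ in range(levels - 1):
        pyramid.append(downsample2(pyramid[-1]))
    return pyramid[::-1]

def upsample_field2(u):
    """Nearest-neighbour 2x upsampling of an (H, W, 2) displacement field.
    Displacements are measured in pixels, so their values are doubled too."""
    return 2.0 * np.repeat(np.repeat(u, 2, axis=0), 2, axis=1)

def coarse_to_fine(fixed, moving, levels, register_level):
    """Coarse (1) -> incremental (2) -> fine (3) registration loop.
    `register_level(f, m, u_init)` is a hypothetical per-level solver that
    refines the displacement field `u_init` at the resolution of `f` and `m`."""
    assert all(s % 2 ** (levels - 1) == 0 for s in fixed.shape), \
        "image sides must be divisible by 2**(levels - 1)"
    fixed_pyr = build_pyramid(fixed, levels)
    moving_pyr = build_pyramid(moving, levels)
    u = np.zeros(fixed_pyr[0].shape + (2,))   # identity transform at the coarsest level
    for level, (f, m) in enumerate(zip(fixed_pyr, moving_pyr)):
        if level > 0:
            u = upsample_field2(u)            # carry the coarser estimate to the finer grid
        u = register_level(f, m, u)
    return u
```

Each level starts from the upsampled estimate of the previous one. The finer levels therefore only have to correct small residual misalignments, which is the intuition behind phases (2) and (3).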
2.2.2. Coarse-to-Fine Multi-Contrast MR Image Registration Framework with Dual Consistency Constraint
2.2.3. Loss Function
2.3. Variational Formulation and Numerical Optimization
2.3.1. Unified Variational Model
- (1) The data fidelity term is a continuous and differentiable approximation of mutual information (MI). Its negative sign converts similarity maximization into a minimization problem.
- (2) The regularization term enforces spatial smoothness and topological preservation.
- (3) Two weighting parameters balance fidelity and regularity.
- (4) The admissible space of transformation fields is typically constrained so that the Jacobian determinant of the transformation is positive almost everywhere. The deformations are then locally invertible and orientation-preserving, which is a necessary condition for a diffeomorphism.
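Items (1)–(4) can be collected into a single energy. The notation below is chosen here for illustration and may differ from the author's symbols:

- $\varphi$ is the transformation, and $F$ and $M$ are the fixed and moving images on the domain $\Omega$.
- $\mathrm{MI}_\sigma$ is the smoothed MI estimator and $\mathcal{R}$ is the regularizer.
- $\alpha$ and $\beta$ are the two weighting parameters.

```latex
\hat{\varphi} \in \arg\min_{\varphi \in \mathcal{A}} E(\varphi),
\qquad
E(\varphi) = -\alpha\, \mathrm{MI}_\sigma(F, M \circ \varphi) + \beta\, \mathcal{R}(\varphi),
\qquad
\mathcal{A} = \{\varphi : \det \nabla\varphi(x) > 0 \ \text{for a.e. } x \in \Omega\}.
```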
- (1) Lower Boundedness: Mutual information is nonnegative. Under our construction (Parzen window with a Gaussian kernel and bounded image intensities), the MI estimator is also uniformly bounded above by a constant. Because the regularization term is nonnegative, the energy is therefore bounded from below by the negative of the fidelity weight times this constant.
- (2) Coercivity: The regularization term is designed to be coercive in the norm of the admissible space. That is, it tends to infinity as the norm of the transformation tends to infinity, and therefore it dominates the bounded fidelity term.
- (3) Weak Lower Semi-continuity: Under standard intensity regularity assumptions, the smoothed MI estimator is weakly continuous on bounded subsets of the underlying function space [27], and convex regularizers are weakly lower semi-continuous. Hence, the energy is weakly lower semi-continuous on the admissible space.
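The boundedness argument in (1) can be made concrete with one possible Parzen-window MI estimator that uses a Gaussian kernel. The bin count (32), the kernel width (one bin width) and the [0, 1] intensity range are assumptions made for illustration, not the paper's settings. Because the soft joint histogram is a proper probability table, the estimate is never negative and never exceeds log(bins). This is the uniform upper bound the argument relies on.

```python
import numpy as np

def parzen_mi(f, m, bins=32, sigma=None):
    """Smoothed mutual information (in nats) between two images with
    intensities in [0, 1]. Each pixel casts a Gaussian-weighted vote into
    every bin (Parzen window), so the estimate is differentiable."""
    f = np.ravel(f).astype(float)
    m = np.ravel(m).astype(float)
    centers = (np.arange(bins) + 0.5) / bins
    sigma = 1.0 / bins if sigma is None else sigma
    # Soft bin memberships, shape (num_pixels, bins); each row sums to 1
    wf = np.exp(-0.5 * ((f[:, None] - centers) / sigma) ** 2)
    wm = np.exp(-0.5 * ((m[:, None] - centers) / sigma) ** 2)
    wf /= wf.sum(axis=1, keepdims=True)
    wm /= wm.sum(axis=1, keepdims=True)
    joint = wf.T @ wm / f.size      # joint probability table, sums to 1
    pf = joint.sum(axis=1)          # marginal of f
    pm = joint.sum(axis=0)          # marginal of m
    nz = joint > 0                  # skip empty cells (0 * log 0 = 0)
    return float(np.sum(joint[nz] * np.log(joint[nz] / np.outer(pf, pm)[nz])))
```

Inverting one image's intensities only permutes the bins, so the MI is unchanged. This is one reason MI suits multi-modal pairs such as T1/T2.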
2.3.2. Multi-Scale Strategy as a Numerical Solver
2.3.3. Discrete Optimization and Convergence
2.3.4. Mathematical Role of Each Loss Term and Regularization Component
- (1) The regularization term constrains the deformation field to be smooth, which prevents physically implausible deformations;
- (2) The joint loss is a multi-objective loss function in which two weighting parameters control the relative importance of its sub-terms;
- (3) The inverse-consistency term minimizes the mean squared error between the forward-warped image and the inverse-warped image. This encourages a topologically valid, diffeomorphic deformation field.
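The sketch below shows one common way to realise these terms in NumPy. Several choices are assumptions for illustration, not the paper's exact definitions or its λ values:

- the squared finite-difference smoothness penalty;
- the plain MSE consistency term;
- the hypothetical weight names `w_smooth` and `w_cons`.

```python
import numpy as np

def smoothness_loss(u):
    """Mean squared forward difference of an (H, W, 2) displacement field."""
    dy = u[1:, :, :] - u[:-1, :, :]   # differences along rows
    dx = u[:, 1:, :] - u[:, :-1, :]   # differences along columns
    return float(np.mean(dy ** 2) + np.mean(dx ** 2))

def consistency_loss(warped_fwd, warped_inv):
    """Mean squared error between the forward-warped and inverse-warped images."""
    diff = np.asarray(warped_fwd, dtype=float) - np.asarray(warped_inv, dtype=float)
    return float(np.mean(diff ** 2))

def total_loss(fidelity, u, warped_fwd, warped_inv, w_smooth, w_cons):
    """Fidelity (e.g. negative MI) plus weighted smoothness and consistency terms.
    The weights are hypothetical placeholders."""
    return (fidelity
            + w_smooth * smoothness_loss(u)
            + w_cons * consistency_loss(warped_fwd, warped_inv))
```

A constant displacement field incurs no smoothness penalty, and identical warped images incur no consistency penalty. In that case only the fidelity term drives the optimization.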
3. Experiments and Results
3.1. Experimental Environment and Dataset
3.2. Evaluation Metric
3.3. Experiments
3.3.1. Effectiveness of the Multi-Scale Strategy
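The marginal-efficiency column (ΔMI/ΔGPU) of the group comparison table can be reproduced from the mean MI and GPU time of adjacent groups. The short Python sketch below does this. The values are copied from that table, and the function name is hypothetical.

```python
# (mean MI, GPU seconds per slice) for groups 1-4, copied from the table
GROUPS = [(1.200, 0.0217), (1.253, 0.0297), (1.329, 0.0316), (1.340, 0.0452)]

def marginal_efficiency(rows):
    """For each group after the first, return (dMI, dGPU, dMI / dGPU)
    relative to the previous group."""
    return [(mi1 - mi0, t1 - t0, (mi1 - mi0) / (t1 - t0))
            for (mi0, t0), (mi1, t1) in zip(rows, rows[1:])]

for d_mi, d_t, eff in marginal_efficiency(GROUPS):
    print(f"dMI={d_mi:+.3f}  dGPU={d_t:+.4f} s  efficiency={eff:.2f}")
```

The first ratio is exactly 0.053 / 0.0080 = 6.625, which the table reports as 6.63. The others are 0.076 / 0.0019 = 40.00 and 0.011 / 0.0136 ≈ 0.81. The sharp drop from group 3 to group 4 is what makes group 3 the default.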
3.3.2. The Influence of Different Learning Rates and Network Widths
3.3.3. Weight Results of Different Loss Functions
3.3.4. Parameter Sensitivity and Robustness Analysis
3.3.5. Time Consumption Analysis of Inverse Transformation
3.3.6. Comparison of Registration Effect with Other Models
3.3.7. Statistical Significance
- (1) Hypothesis Testing and Significance Verification
- (2) Win-Rate Distribution and Advantage Zone Analysis
- (3) Attribution of Underperforming Samples and Boundary Condition Discussion
- (4) Comprehensive Assessment
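The specific test used in (1) is not restated here. The following is therefore a hedged sketch that swaps in an exact two-sided sign test, together with the win rate from (2). Both functions take per-pair scores for the proposed method and a baseline, for example the MI of each of the 32 test pairs. The function names are hypothetical.

```python
from math import comb

def win_rate(ours, baseline):
    """Fraction of paired samples on which `ours` scores strictly higher."""
    wins = sum(a > b for a, b in zip(ours, baseline))
    return wins / len(ours)

def sign_test_p(ours, baseline):
    """Two-sided exact sign test on paired scores; ties are discarded."""
    diffs = [a - b for a, b in zip(ours, baseline) if a != b]
    n = len(diffs)
    k = sum(d > 0 for d in diffs)
    tail = min(k, n - k)
    # P(X <= tail) under Binomial(n, 0.5), doubled for two sides, capped at 1
    p = 2 * sum(comb(n, i) for i in range(tail + 1)) / 2 ** n
    return min(1.0, p)
```

The sign test uses only the direction of each paired difference. It is therefore more conservative than a Wilcoxon signed-rank or paired t-test, but it makes no assumption about how the differences are distributed.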
3.3.8. Ablation Study
- (1) The Multi-Scale Strategy Serves as the Core Engine for Overcoming Baseline Bottlenecks
- (2) Dual Consistency and Affine Pre-Alignment Provide Robust Foundational Alignment Gains
- (3) The Full Model Achieves Synergistic Optimization in Both Accuracy and Stability
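As a quick arithmetic check on (1)–(3), the gains over the baseline can be computed directly from the mean MI values in the ablation table (Exp ID 0–4). The dictionary keys mirror the table's variant names, and the function name is hypothetical.

```python
# Mean MI per ablation variant (Exp ID 0-4), copied from the ablation table
ABLATION_MI = {
    "Baseline": 1.023,
    "+ Multi-scale Strategy": 1.237,
    "C-F-I-R": 1.200,
    "+ Affine Pre-alignment": 1.215,
    "Ours (Full Model)": 1.329,
}

def gains_over_baseline(table, base_key="Baseline"):
    """Return {variant: (absolute MI gain, relative gain in %)} versus the baseline."""
    base = table[base_key]
    return {name: (mi - base, 100.0 * (mi - base) / base)
            for name, mi in table.items() if name != base_key}

for name, (gain, pct) in gains_over_baseline(ABLATION_MI).items():
    print(f"{name:24s} {gain:+.3f} MI ({pct:+.1f}%)")
```

The relative gains are as follows:

- multi-scale strategy only: about +20.9%;
- C-F-I-R: about +17.3%;
- affine pre-alignment only: about +18.8%;
- full model: about +29.9%.

This is consistent with (1), since the multi-scale strategy alone contributes the largest single gain.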
3.3.9. Scalability Analysis
3.3.10. Time Complexity Analysis
- (1) Per-Level Computational Complexity
  - Feature Extraction and Registration Network Forward Pass: Because the network uses only local convolutions, spatial interpolation, and point-wise non-linear activations, its computational cost scales linearly with the number of pixels $N_l$ at level $l$, i.e., $O(N_l)$.
  - Deformation Field Upsampling: Upsampling the deformation field $\sigma_{l-1}$ from the previous level to the current resolution by bilinear interpolation visits every pixel at the current level once, giving a cost of $O(N_l)$.
  - Incremental Deformation Field Composition: At the $l$-th level, the deformation field predicted at the current level must be composed with the coarse deformation field obtained from the previous level (see the preceding text for the specific formulation).
    - This requires one bilinear interpolation per pixel, so its cost is also $O(N_l)$.
    - The constant factor is higher than that of a plain addition because of the coordinate transformations and interpolations involved. However, there are no nested loops or non-linear searches, so the operation remains linear in time.
    - Consequently, the total cost at each level stays $O(N_l)$, which preserves the overall linear-complexity conclusion.
- (2) Multi-Scale Strategy
- (3) Complexity Conclusion
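The per-level operations in (1) can be sketched in NumPy. Three details are assumptions for illustration; the paper's own formulation appears in the preceding text:

- the composition order u(x) = u_inc(x) + u_coarse(x + u_inc(x));
- the channel convention (channel 0 is the row displacement, channel 1 the column displacement);
- the clamping of sample coordinates to the image border.

Each output pixel needs one bilinear lookup, which is why the step costs O(N_l). Assume further that each coarser level halves both image dimensions. Then N_l = N / 4^(L−l), where N is the pixel count at the finest level L. The total work over all levels is proportional to N(1 + 1/4 + 1/16 + ...) < 4N/3, i.e., still O(N).

```python
import numpy as np

def bilinear_sample(field, y, x):
    """Sample an (H, W, C) array at float coordinates (y, x) with bilinear
    interpolation; coordinates are clamped to the image border."""
    h, w = field.shape[:2]
    y = np.clip(y, 0, h - 1)
    x = np.clip(x, 0, w - 1)
    y0 = np.floor(y).astype(int)
    x0 = np.floor(x).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    wy = (y - y0)[..., None]
    wx = (x - x0)[..., None]
    top = (1 - wx) * field[y0, x0] + wx * field[y0, x1]
    bottom = (1 - wx) * field[y1, x0] + wx * field[y1, x1]
    return (1 - wy) * top + wy * bottom

def compose(u_coarse, u_inc):
    """Compose displacement fields: u(x) = u_inc(x) + u_coarse(x + u_inc(x)).
    One bilinear lookup per pixel, so the cost is linear in the pixel count."""
    h, w = u_inc.shape[:2]
    gy, gx = np.meshgrid(np.arange(h, dtype=float), np.arange(w, dtype=float),
                         indexing="ij")
    return u_inc + bilinear_sample(u_coarse, gy + u_inc[..., 0], gx + u_inc[..., 1])

def pyramid_pixel_total(n_finest, levels):
    """Total pixels over a factor-2 pyramid (each coarser level has 1/4 the pixels)."""
    return sum(n_finest / 4 ** k for k in range(levels))
```

For a 256 × 256 finest level with three levels, `pyramid_pixel_total(65536, 3)` gives 65536 + 16384 + 4096 = 86016 pixels. This is below the bound 4/3 × 65536 ≈ 87381.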
3.3.11. Quantitative Comparison Based on Normalized Cross-Correlation
- (1) NCC is one of the most widely adopted and well-understood similarity measures for intensity-based registration, and it directly reflects the local intensity consistency between deformed images;
- (2) In BraTS2020 multi-modal (T1, T1ce, T2, FLAIR) registration tasks, NCC is invariant to linear (gain and offset) intensity differences, which makes it robust to global grayscale shifts and scaling, and it correlates closely with anatomical alignment.
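For reference, global NCC can be computed with the minimal NumPy sketch below. Whether the reported values use a global or a windowed (local) NCC is not stated here, so the global form is an assumption. A local variant applies the same formula within sliding windows and averages the results. The small `eps` guarding against constant images is also an assumption.

```python
import numpy as np

def ncc(a, b, eps=1e-8):
    """Global normalized cross-correlation between two equally shaped images.
    Returns a value in [-1, 1]; unchanged by positive gain and any offset."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    a = a - a.mean()   # remove the offset
    b = b - b.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + eps))
```

A linearly rescaled copy of an image scores close to 1 and an inverted copy close to −1. This illustrates the linear-intensity invariance noted in (2), and also why NCC alone cannot capture non-linear contrast relationships between modalities.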
4. Discussion and Conclusions
Funding
Data Availability Statement
Conflicts of Interest
References
- [1] Menze, B.H.; Jakab, A.; Bauer, S.; Kalpathy-Cramer, J.; Farahani, K.; Kirby, J.; Burren, Y.; Porz, N.; Slotboom, J.; Wiest, R.; et al. The Multimodal Brain Tumor Image Segmentation Benchmark (BRATS). IEEE Trans. Med. Imaging 2015, 34, 1993–2024.
- [2] Bakas, S.; Akbari, H.; Sotiras, A.; Bilello, M.; Rozycki, M.; Kirby, J.S.; Freymann, J.B.; Farahani, K.; Davatzikos, C. Advancing The Cancer Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features. Nat. Sci. Data 2017, 4, 170117.
- [3] Bakas, S.; Reyes, M.; Jakab, A.; Bauer, S.; Rempfler, M.; Crimi, A.; Shinohara, R.T.; Berger, C.; Ha, S.M.; Rozycki, M.; et al. Identifying the Best Machine Learning Algorithms for Brain Tumor Segmentation, Progression Assessment, and Overall Survival Prediction in the BRATS Challenge. arXiv 2018, arXiv:1811.02629.
- [4] Akbari, H.; Sotiras, A.; Bilello, M.; Rozycki, M.; Kirby, J.; Freymann, J.; Farahani, K.; Davatzikos, C. Segmentation Labels and Radiomic Features for the Pre-operative Scans of the TCGA-GBM collection. Cancer Imaging Arch. 2017, 7.
- [5] Bakas, S.; Akbari, H.; Sotiras, A.; Bilello, M.; Rozycki, M.; Kirby, J.; Freymann, J.; Farahani, K.; Davatzikos, C. Segmentation Labels and Radiomic Features for the Pre-operative Scans of the TCGA-LGG collection. Cancer Imaging Arch. 2017.
- [6] Arad, N.; Dyn, N.; Reisfeld, D.; Yeshurun, Y. Image warping by radial basis functions: Application to facial expressions. CVGIP Graph. Models Image Process. 1994, 56, 161–172.
- [7] Yang, X.; Xue, Z.; Liu, X.; Xiong, D. Topology preservation evaluation of compact-support radial basis functions for image registration. Pattern Recognit. Lett. 2011, 32, 1162–1177.
- [8] Kybic, J.; Unser, M. Fast parametric elastic image registration. IEEE Trans. Image Process. 2003, 12, 1427–1442.
- [9] Rueckert, D.; Sonoda, L.I.; Hayes, C.; Hill, D.L.; Leach, M.O.; Hawkes, D.J. Nonrigid registration using free-form deformations: Application to breast MR images. IEEE Trans. Med. Imaging 1999, 18, 712–721.
- [10] Sdika, M. A fast nonrigid image registration with constraints on the Jacobian using large scale constrained optimization. IEEE Trans. Med. Imaging 2008, 27, 271–281.
- [11] Besl, P.J.; McKay, N.D. A method for registration of 3-D shapes. IEEE Trans. Pattern Anal. Mach. Intell. 1992, 14, 239–256.
- [12] Chui, H.; Rangarajan, A. A new point matching algorithm for non-rigid registration. Comput. Vis. Image Underst. 2003, 89, 114–141.
- [13] Yang, T.; Bai, X.; Cui, X.; Gong, Y.; Li, L. TransDIR: Deformable imaging registration network based on transformer to improve the feature extraction ability. Med. Phys. 2022, 49, 952–965.
- [14] Yang, T.; Bai, X.; Cui, X.; Gong, Y.; Li, L. GraformerDIR: Graph convolution transformer for deformable image registration. Comput. Biol. Med. 2022, 147, 105799.
- [15] Jaderberg, M.; Simonyan, K.; Zisserman, A. Spatial transformer networks. Adv. Neural Inf. Process. Syst. 2015, 28, 2017–2025.
- [16] Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 234–241.
- [17] Yang, T.; Bai, X.; Cui, X.; Gong, Y.; Li, L. DAU-Net: An unsupervised 3D brain MRI registration model with dual-attention mechanism. Int. J. Imaging Syst. Technol. 2023, 33, 217–229.
- [18] De Vos, B.D.; Berendsen, F.F.; Viergever, M.A.; Staring, M.; Išgum, I. End-to-end unsupervised deformable image registration with a convolutional neural network. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support; Springer International Publishing: Cham, Switzerland, 2017; pp. 204–212.
- [19] Shan, S.; Yan, W.; Guo, X.; Chang, E.I.; Fan, Y.; Xu, Y. Unsupervised end-to-end learning for deformable medical image registration. arXiv 2017, arXiv:1711.08608.
- [20] Balakrishnan, G.; Zhao, A.; Sabuncu, M.R.; Guttag, J.; Dalca, A.V. Voxelmorph: A learning framework for deformable medical image registration. IEEE Trans. Med. Imaging 2019, 38, 1788–1800.
- [21] Huang, W.; Yang, H.; Liu, X.; Li, C.; Zhang, I.; Wang, R.; Wang, S. A coarse-to-fine deformable transformation framework for unsupervised multi-contrast MR image registration with dual consistency constraint. IEEE Trans. Med. Imaging 2021, 40, 2589–2599.
- [22] Chen, J.Y.; Frey, E.C.; He, Y.F.; Segars, W.P.; Li, Y.; Du, Y. TransMorph: Transformer for Unsupervised Medical Image Registration. Med. Image Anal. 2022, 82, 102615.
- [23] Zhu, J.K.; Zheng, B.Y.; Xiong, B.; Zhang, Y.X.; Cui, M.; Sun, D.Y.; Cai, J.; Xie, Y.Q.; Qin, W.J. SynMSE: A multimodal similarity evaluator for complex distribution discrepancy in unsupervised deformable multimodal medical image registration. Med. Image Anal. 2025, 103, 103620.
- [24] Lara-Hernandez, A.; Rienmüller, T.; Juárez, I.; Pérez, M.; Reyna, F.; Baumgartner, D.; Baumgartner, C. Deep learning-based image registration in dynamic myocardial perfusion CT imaging. IEEE Trans. Med. Imaging 2022, 42, 684–696.
- [25] Yoo, I.; Hildebrand, D.G.; Tobin, W.F.; Lee, W.C.A.; Jeong, W.K. ssEMnet: Serial-section electron microscopy image registration using a spatial transformer network with learned features. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, Proceedings of the Third International Workshop, DLMIA 2017, and 7th International Workshop, ML-CDS 2017, Québec City, QC, Canada, 14 September 2017; Springer International Publishing: Cham, Switzerland, 2017; pp. 249–257.
- [26] Vajda, I. Theory of Statistical Inference and Information; Kluwer Academic Publishers: Dordrecht, The Netherlands, 1989; Volume 11, p. 54.
- [27] Hermosillo, G.; Chefd’Hotel, C.; Faugeras, O. Variational methods for multimodal image matching. Int. J. Comput. Vis. 2002, 50, 329–343.
- [28] Dacorogna, B. Direct Methods in the Calculus of Variations, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2008.
- [29] Bertsekas, D.P. Nonlinear Programming, 2nd ed.; Athena Scientific: Belmont, MA, USA, 1999.
- [30] Dalca, A.V.; Balakrishnan, G.; Guttag, J.; Sabuncu, M.R. Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces. Med. Image Anal. 2019, 57, 226–236.
- [31] Wang, S.; Cao, S.; Wei, D.; Wang, R.; Ma, K.; Wang, L.; Zheng, Y. LT-Net: Label transfer by learning reversible voxel-wise correspondence for one-shot medical image segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9162–9171.







| Category | Item | Specification/Value |
|---|---|---|
| Implementation | Framework | TensorFlow 1.10.0 (Keras Backend) |
| | Operating System | Linux |
| | Hardware | NVIDIA RTX 2080 Ti GPU |
| Dataset | Source | BraTS2020 (Training + Validation) [1,2,3,4,5] |
| | Modality | T2 (Moving) to T1 (Fixed) Registration |
| | Partition | 320 Training/32 Validation/32 Test pairs |
| Training Strategy | Optimizer | Adam (Dynamic momentum adjustment) |
| | Learning Rate | 1 × 10⁻³ |
| | Batch Size | 8 |
| | Epochs | 300 |
| | Augmentation | Random shifts, rotations, scaling, horizontal flips |
| AT-Net | Structure | 5 Downsampling blocks + 2 Fully Connected layers |
| | Block Detail | 2 × (3 × 3 Conv) + 1 × (2 × 2 Max-pooling) |
| | Output | 6 Affine transformation parameters |
| | Parameters | 588 k (Trainable) |
| DT-Net | Structure | Improved U-Net (Encoder–Decoder) |
| | Final Layer | 2 × (3 × 3 Conv) with Linear activation |
| | Output | Dense deformation field |
| | Parameters | 1474 k (Trainable) |

| Group | MI | Sec/Slice (GPU) | Sec/Slice (CPU) |
|---|---|---|---|
| The first group | 1.200 ± 0.111 | 0.0217 | 0.2135 |
| The second group | 1.253 ± 0.113 | 0.0297 | 0.2824 |
| The third group | 1.329 ± 0.122 | 0.0316 | 0.3235 |
| The fourth group | 1.340 ± 0.136 | 0.0452 | 0.5234 |

| Group | MI | GPU Time (s/Slice) | ΔMI vs. Previous | ΔGPU vs. Previous (s) | Marginal Efficiency (ΔMI/ΔGPU) |
|---|---|---|---|---|---|
| 1 (Baseline) | 1.200 ± 0.111 | 0.0217 | - | - | - |
| 2 | 1.253 ± 0.113 | 0.0297 | +0.053 | +0.0080 | 6.63 |
| 3 (Default) | 1.329 ± 0.122 | 0.0316 | +0.076 | +0.0019 | 40.00 |
| 4 | 1.340 ± 0.136 | 0.0452 | +0.011 | +0.0136 | 0.81 |

| Loss Function | λ1 | λ2 | λ3 | λ4 | MI |
|---|---|---|---|---|---|
| Loss_total(F, M) | 1 | 4 | 100 | 100 | 1.098 ± 0.081 |
| | 1 | 20 | 100 | 100 | 0.931 ± 0.061 |
| | 1 | 30 | 100 | 100 | 1.273 ± 0.114 |
| | 1 | 50 | 100 | 100 | 1.329 ± 0.122 |
| | 10 | 50 | 100 | 100 | 1.251 ± 0.115 |
| | 1 | 50 | 500 | 100 | 1.320 ± 0.119 |
| | 1 | 50 | 100 | 500 | 0.908 ± 0.057 |

| Method | Sec/Slice (GPU) | Sec/Slice (CPU) |
|---|---|---|
| VM-diff | 0.0672 | 0.2353 |
| LT-Net | 0.0423 | 0.2752 |
| ours | 0.0235 | 0.2132 |

| Method | MI | Sec/Slice (GPU) | Sec/Slice (CPU) |
|---|---|---|---|
| SyN | 0.962 ± 0.068 | - | 3.2312 |
| VM | 1.163 ± 0.105 | 0.0134 | 0.1934 |
| C-F-I-R | 1.200 ± 0.111 | 0.0217 | 0.2135 |
| ours | 1.329 ± 0.122 | 0.0316 | 0.3235 |
| TransMorph | 1.365 ± 0.098 | 0.124 | 0.4873 |
| DiffuseMorph | 1.392 ± 0.086 | 0.537 | 1.8346 |

(✓ indicates that the component is used; ✕ indicates that it is not.)

| Exp ID | Model Variant | Multi-Scale Strategy | Dual Consistency Constraint | Affine Pre-Alignment | MI |
|---|---|---|---|---|---|
| Exp ID 0 | Baseline | ✕ | ✕ | ✕ | 1.023 ± 0.117 |
| Exp ID 1 | + Multi-scale Strategy | ✓ | ✕ | ✕ | 1.237 ± 0.135 |
| Exp ID 2 | C-F-I-R | ✕ | ✓ | ✓ | 1.200 ± 0.111 |
| Exp ID 3 | + Affine Pre-alignment | ✕ | ✕ | ✓ | 1.215 ± 0.125 |
| Exp ID 4 | Ours (Full Model) | ✓ | ✓ | ✓ | 1.329 ± 0.122 |

| Metric | C-F-I-R | Ours |
|---|---|---|
| Mean ± Std | 0.8169 ± 0.0712 | 0.8217 ± 0.0693 |
| Median [IQR] | 0.8185 [0.7621–0.8705] | 0.8218 [0.7687–0.8726] |
| Min/Max | 0.6724/0.9615 | 0.6935/0.9601 |
| Valid Pairs | 32/32 | 32/32 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, J. Multi-Modal Image Registration Problem Integrating Multi-Scale Strategy and Deep Learning. Mathematics 2026, 14, 2131. https://doi.org/10.3390/math14122131
Zhang, Jiting. 2026. "Multi-Modal Image Registration Problem Integrating Multi-Scale Strategy and Deep Learning" Mathematics 14, no. 12: 2131. https://doi.org/10.3390/math14122131
