1. Introduction
Medical image registration is a fundamental problem in medical image analysis, aiming to establish dense anatomical and semantic correspondences between images. These images may be acquired at different time points, from different subjects, or using different imaging modalities. Accurate registration is a prerequisite for a wide range of downstream clinical and research tasks. It enables clinicians and researchers to track disease progression over time, evaluate the effectiveness of therapeutic interventions, quantify structural or functional changes, and analyze anatomical variability across patient populations.
The primary challenges in medical image registration arise from variability in image appearance and complex nonlinear anatomical deformations. Intensity variations caused by differences in imaging protocols, scanner hardware, or acquisition modalities often make direct voxel-wise comparisons unreliable. In addition, anatomical structures may undergo substantial non-rigid deformations due to inter-subject variability, disease progression, respiration, or surgical interventions. These factors complicate the estimation of accurate spatial correspondences and motivate the development of robust registration frameworks capable of handling both appearance changes and large deformations.
Recent advances in self-supervised contrastive learning have shown strong potential for addressing these challenges in medical imaging [1,2,3,4,5]. Contrastive objectives enable networks to learn semantically meaningful voxel-wise representations that remain robust to appearance variability and anatomical deformations. Existing contrastive registration approaches typically adopt a two-stage training strategy. First, a feature extractor is pretrained using a contrastive objective independently of the registration task. Second, the pretrained encoder is frozen and used to generate features for registration optimization. While effective, this decoupled design does not explicitly align feature learning with the downstream registration objective.
In this work, we introduce CoRe (Contrastive learning for medical image Registration), a framework that jointly optimizes deformable image registration and self-supervised contrastive learning (the code is available at https://github.com/EytanKats/reg-ssl, accessed on 23 May 2026). Building upon the hybrid registration framework of Bigalke et al. [6], CoRe incorporates a self-supervised equivariant contrastive loss into the training objective. This enables online joint optimization of representation learning and deformable registration within a unified training process. In contrast to prior contrastive registration approaches such as SAMConvex [1], which rely on separately pretrained and frozen feature extractors, CoRe does not require a dedicated pretraining stage and instead continuously adapts the learned feature representations to the downstream registration objective throughout training. The differences between the approaches are illustrated in Figure 1.
The primary contributions of this work are as follows:
- We propose a joint optimization strategy that integrates an online self-supervised equivariant contrastive objective directly into a deformable registration framework.
- We show that jointly optimizing contrastive and registration objectives yields improved registration accuracy compared to separate pretraining or registration-only optimization.
- We evaluate the proposed approach on abdominal and thoracic CT registration benchmarks in both inter-patient and intra-patient settings, demonstrating competitive performance against conventional, learning-based, and hybrid registration methods.
2. Related Work
Structural image representations: Traditional structural representation methods [7,8,9,10] aim to extract anatomical descriptors that are more robust to intensity variations than raw image intensities. These hand-crafted representations capture local structural patterns while reducing sensitivity to acquisition differences across modalities or scanners. Registration algorithms subsequently estimate spatial transformations by comparing descriptor similarity rather than raw intensities.
Supervised metric learning: Deep metric learning approaches replace hand-crafted descriptors with learned feature representations optimized to minimize distances between corresponding anatomical locations in aligned image pairs [11]. Such methods can capture complex anatomical characteristics and tissue variability more effectively than manually designed descriptors. However, they require accurately aligned training data, which is expensive and difficult to obtain in medical imaging applications.
Self-supervised contrastive learning in medical imaging: Self-supervised contrastive learning has recently emerged as a powerful paradigm for representation learning in medical imaging [12,13,14,15,16]. By maximizing agreement between augmented views of the same image, contrastive learning enables models to learn semantically meaningful representations without requiring manual annotations. Data augmentation plays a critical role in this process. Intensity augmentations encourage invariance to appearance variations, whereas geometric transformations such as rotations, scaling, and elastic deformations promote robustness to spatial variability. More recent work has incorporated equivariance constraints into contrastive objectives [2,17,18], ensuring that transformations in the input space induce predictable transformations in the embedding space. Such equivariant representations are particularly relevant for deformable registration, where anatomical structures undergo spatial transformations.
Contrastive learning for medical image registration: Recent studies have demonstrated that contrastive learning can generate dense feature representations well suited for deformable registration [1,2,3,4,5]. Existing approaches mainly differ in how feature extraction is integrated with deformation estimation.
Some methods extract features after deformation estimation. Mok et al. [4] pretrain a feature extractor using contrastive learning and subsequently apply a mean squared error loss between features extracted from fixed and warped moving images during registration. Similarly, ContraReg [5] applies a contrastive objective to dense multi-scale feature maps extracted from fixed and warped images using a pretrained autoencoder. In both approaches, the feature extractor is pretrained independently and remains frozen during registration training.
Other methods extract features prior to deformation estimation and use them as inputs to the registration framework. CoMIR [2] employs supervised contrastive learning on aligned multimodal image pairs to map images into a shared latent space, followed by separate registration training. SAMConvex [1] and SAME [3] leverage Self-supervised Anatomical eMbeddings (SAM) [15] for registration. SAMConvex combines SAM embeddings with convex optimization strategies [19], while SAME integrates SAM features into a VoxelMorph-based registration framework [20].
CoRe follows the pre-deformation feature extraction paradigm, where features are extracted independently from the fixed and moving images prior to deformation estimation and subsequently processed by a differentiable optimization module to infer the deformation field. In contrast to previous approaches that rely on independently pretrained embeddings, CoRe jointly optimizes feature learning and deformable registration within a unified framework by integrating a self-supervised equivariant contrastive objective directly into the registration process. This joint optimization enables the learned representations to remain robust to tissue deformations while being specifically tailored for accurate deformation estimation.
3. Materials and Methods
3.1. Problem Definition
We consider pairs of fixed and moving images, and the training dataset consists of such image pairs. The registration framework comprises a trainable feature extractor and a deterministic optimization module. Given a fixed and a moving image, the framework predicts a displacement field u. Ideally, the intensity of the fixed image and the intensity of the warped moving image at the same voxel should correspond to the same anatomical location, where the warped moving image is the moving image resampled under the spatial transformation induced by u. The objective is to train the feature extractor to produce high-quality features, enabling the optimization module to compute an optimal displacement field for accurate image alignment.
In this work, we incorporate equivariance constraints, formulated through a contrastive objective (Section 3.3), directly into the registration framework (Section 3.2). This integration ensures that the internal feature representations corresponding to identical anatomical locations remain robust to tissue deformations.
Figure 2 presents an overview of the proposed joint optimization strategy, highlighting the simultaneous optimization of the feature extractor under both contrastive and registration objectives. Algorithm 1 outlines the pseudo-code for the training procedure, detailing the steps involved in leveraging the synergistic interaction between these two objectives to enhance registration accuracy and robustness.
Algorithm 1: Joint training procedure of CoRe for a single stage t.
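The stage described by Algorithm 1 can be sketched roughly as follows, pieced together from the description in Sections 3.2 to 3.4. This is a minimal illustration under stated assumptions, not the authors' implementation: all names (`train_stage`, `make_pseudo_label`, `augment`, `predict`, `reg_loss`, `con_loss`, `update`, `lam`) are hypothetical placeholders, the round-robin pair sampling is an assumption, and the additive form of the objective (registration loss plus `lam` times contrastive loss) is one plausible reading of the weighted combination.

```python
def train_stage(pairs, make_pseudo_label, augment, predict,
                reg_loss, con_loss, update, lam, num_iters):
    """Run one training stage; every model-specific piece is an injected callable (hypothetical API)."""
    # Pseudo-labels are generated once per stage on the un-augmented pairs.
    pseudo_labels = [make_pseudo_label(fixed, moving) for fixed, moving in pairs]
    losses = []
    for step in range(num_iters):
        index = step % len(pairs)  # assumption: simple round-robin sampling
        fixed, moving = pairs[index]
        # Augment both images and transform the pseudo-label consistently.
        fixed_aug, moving_aug, target = augment(fixed, moving, pseudo_labels[index])
        displacement = predict(fixed_aug, moving_aug)
        # Assumed additive objective: registration loss + lam * contrastive loss.
        loss = reg_loss(displacement, target) + lam * con_loss(fixed_aug, moving_aug)
        update(loss)
        losses.append(loss)
    return losses
```

Passing the components in as callables keeps the sketch independent of any particular network or optimizer; in a real setting `update` would wrap the backward pass and the optimizer step.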
3.2. Registration Framework
We use a hybrid registration pipeline comprising a convolutional feature extractor, a convolutional projection head, and a differentiable optimization module that infers a displacement field from the fixed and moving features. The optimization module employs a differentiable version of the coupled convex-discrete optimization framework [19]. The optimization begins by constructing a 6D correlation volume over a discrete mesh grid of relative displacements, computing the feature similarities between the projected fixed embeddings and the projected moving embeddings. Next, a quadratic penalty term is added along the displacement dimensions of the cost volume to act as a regularizing coupling term. Finally, to enable end-to-end backpropagation, the traditional non-differentiable argmin operator is replaced with a softmin function across the displacement dimensions, followed by a point-wise multiplication with the discrete grid and a summation over the displacement dimensions to compute a continuous expectation of displacements.
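The softmin step can be illustrated for a single voxel with a one-dimensional set of candidate displacements. The sketch below is a toy under stated assumptions: the function name `soft_displacement`, the squared-distance dissimilarity, the penalty weight `alpha`, the coupling `center`, and the `temperature` are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math

def soft_displacement(fixed_feat, moving_feats, displacements,
                      alpha=0.1, center=0.0, temperature=1.0):
    """Soft-argmin over candidate displacements for one voxel (1D toy example)."""
    costs = []
    for d, moving_feat in zip(displacements, moving_feats):
        # Feature dissimilarity between the fixed voxel and the candidate moving voxel.
        dissimilarity = sum((a - b) ** 2 for a, b in zip(fixed_feat, moving_feat))
        # Quadratic penalty along the displacement dimension (coupling term).
        coupling = alpha * (d - center) ** 2
        costs.append(dissimilarity + coupling)
    # Softmin is a softmax over negated costs; subtract the minimum for stability.
    lowest = min(costs)
    weights = [math.exp(-(c - lowest) / temperature) for c in costs]
    total = sum(weights)
    # Multiply the probabilities with the displacement grid and sum: an expectation.
    return sum(w / total * d for w, d in zip(weights, displacements))
```

With a low temperature the result approaches the hard argmin, while a high temperature pulls it toward the mean of the candidate displacements; unlike argmin, the output is differentiable with respect to the features.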
The pipeline begins with the feature extractor computing feature representations of the fixed and moving images. These representations are then passed through the projection head, and the resulting embeddings are processed by the optimizer, which predicts the displacement field u.
We adopt the self-training scheme with pseudo-labels [6] as a strong baseline for deformable image registration. Training proceeds in M stages. At the beginning of each stage t, the registration pipeline generates displacement fields for all image pairs. These fields are refined through an instance optimization process comprising three key steps [6]. First, a forward-backward consistency check is applied by estimating both the forward and the backward displacement fields and subsequently minimizing the discrepancy between them. Second, a double warping procedure is employed, which warps the moving image with the inferred displacement field prior to repeating the registration steps. Third, an instance optimization loop is executed for a fixed number of iterations per image pair to jointly minimize the regularization cost and feature dissimilarity.
Pseudo-label generation is performed on the original image pairs without data augmentation. During training in stage t, affine augmentations are applied to the input images, and the corresponding pseudo-labels are transformed accordingly to obtain augmented displacement fields, ensuring consistency between the supervision signal and the augmented image pairs. At the start of training, pseudo-labels are generated using the randomly initialized feature extractor and projection head.
The training objective minimizes the mean squared error (MSE) loss between the displacement fields predicted at each training step of stage t and the pseudo-labels generated at the beginning of that stage.
To enhance the diversity of transformations during training, augmentation is applied to the pseudo displacement field (Figure 2). Specifically, the fixed and moving images are each transformed using their own independently sampled random affine augmentation. The pseudo displacement field is then adjusted to account for these affine transformations, resulting in the augmented displacement field.
3.3. Equivariance Constraint
The quality of the displacement field u generated by the optimizer relies on the quality of the features extracted by the network. Ideally, the embeddings produced by the feature extractor for the same anatomical location in the moving image and the fixed image should be identical, regardless of geometric deformations between the images. Such consistency in feature embeddings provides the optimizer with a robust initialization, allowing it to generate accurate displacement fields.
While the registration loss (Section 3.2) naturally improves the features extracted by the feature extractor during training, we propose incorporating a contrastive objective to further refine feature quality. Specifically, to address the challenges posed by geometric deformations in tissue, we introduce an equivariance constraint on the image embeddings.
This constraint enforces consistency between embeddings derived from the same image I under geometric transformations. We apply an affine transformation, sampled from a predefined augmentation set, to the image I. The constraint enforces consistency between the transformed features of the original image and the features extracted from the transformed image. Note that the objective enforces geometric equivariance rather than invariance. Consequently, the model is encouraged to produce features that commute with the transformation: extracting features from the transformed image should give the same result as transforming the features of the original image. This ensures that geometric transformations in the input space induce corresponding transformations in the feature space. By enforcing this property, the model learns representations that are robust to tissue deformations, an essential requirement for registration tasks, where features must remain consistent across anatomical distortions.
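The difference between equivariance and invariance can be made concrete with a toy example. The sketch below is only an illustration under simplifying assumptions: the "feature extractor" is a hypothetical circular three-tap average over a 1D signal, and the geometric transformation is a circular shift rather than an affine transformation of a 3D volume.

```python
def features(signal):
    """Toy feature extractor: circular 3-tap local average (shift-equivariant)."""
    n = len(signal)
    return [(signal[(i - 1) % n] + signal[i] + signal[(i + 1) % n]) / 3.0
            for i in range(n)]

def shift(values, k):
    """Toy geometric transformation: circular shift by k positions."""
    n = len(values)
    return [values[(i - k) % n] for i in range(n)]

image = [0.0, 1.0, 4.0, 2.0, 0.0, 3.0]
features_of_transformed = features(shift(image, 2))
transformed_features = shift(features(image), 2)

# Equivariance: transforming then extracting equals extracting then transforming.
equivariant = all(abs(a - b) < 1e-12
                  for a, b in zip(features_of_transformed, transformed_features))
# Invariance would instead require the features to ignore the transformation.
invariant = all(abs(a - b) < 1e-12
                for a, b in zip(features_of_transformed, features(image)))
```

Here `equivariant` holds while `invariant` does not: the features move with the image content, which is the property a registration method needs in order to localize corresponding anatomy.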
We implement the equivariance constraint using an InfoNCE loss [21] applied to feature vectors extracted from corresponding spatial locations in the two feature maps being compared: the transformed features of the original image and the features of the transformed image. Feature vectors are sampled at the same set of spatial locations in both maps. Each feature vector forms one positive pair with the vector at the corresponding location in the other map and negative pairs with the other feature vectors sampled from both maps. The contrastive loss is the InfoNCE objective over these pairs, in which similarity is measured by an inner product and scaled by a temperature factor that controls the sharpness of the similarity distribution. The temperature is set to a fixed value in our experiments.
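A common way to write an InfoNCE loss over corresponding feature vectors is sketched below. It is a generic formulation under stated assumptions, not necessarily the exact variant used here: the vectors are L2-normalized before taking inner products, the loss is averaged symmetrically over both feature sets, every other sampled vector from either set acts as a negative, and the default `temperature=0.1` is a hypothetical value. The names `info_nce`, `feats_a`, and `feats_b` are illustrative.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _unit(v):
    # L2-normalize; fall back to a divisor of 1.0 for an all-zero vector.
    norm = math.sqrt(_dot(v, v)) or 1.0
    return [x / norm for x in v]

def info_nce(feats_a, feats_b, temperature=0.1):
    """InfoNCE over N corresponding vectors: feats_a[j] and feats_b[j] are positives."""
    a = [_unit(v) for v in feats_a]
    b = [_unit(v) for v in feats_b]
    n = len(a)
    pool = a + b
    total = 0.0
    for i, anchor in enumerate(pool):
        positive = (i + n) % (2 * n)  # same location in the other feature set
        # One positive and 2N - 2 negatives: every pooled vector except the anchor.
        logits = [_dot(anchor, other) / temperature
                  for j, other in enumerate(pool) if j != i]
        pos_logit = _dot(anchor, pool[positive]) / temperature
        # Numerically stable log-sum-exp of the logits.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_sum - pos_logit
    return total / (2 * n)
```

The loss is small when each vector is most similar to its counterpart at the same location and large when it is more similar to a vector from a different location.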
During training, the equivariance constraint is applied independently to the fixed and moving images. The total contrastive loss is therefore computed as the sum of the losses evaluated on the feature pairs constructed from the fixed image and those constructed from the moving image.
3.4. Joint Optimization
By jointly minimizing the registration loss (Section 3.2) and the contrastive loss (Section 3.3), the proposed framework encourages the feature representations extracted by the network to be robust to geometric transformations while remaining well-suited to the optimization procedure defined by the optimizer. The registration loss drives the alignment of fixed and moving images, encouraging the feature extractor to generate embeddings that are specifically tailored for consumption by the optimizer. Concurrently, the contrastive loss guides the optimization process by imposing equivariance to geometric distortions, fostering the consistency of feature embeddings for the same anatomical locations in registered images. The joint optimization process integrates the strengths of both losses, improving the robustness and accuracy of the registration framework. The combined loss function is a weighted combination of the registration loss and the contrastive loss, in which a weighting coefficient balances the contributions of the two objectives.
3.5. Implementation Details
The feature extractor consists of four convolutional blocks with convolutions, batch normalization, and ReLU activations. The projection head consists of a single convolutional block with 128 output channels and a stride of 2, followed by a final convolutional layer that projects the feature maps to 16 channels. The framework is trained in stages, with each stage consisting of 1000 iterations and a batch size of 2. Optimization is performed using the Adam optimizer, and the learning rate follows a cosine annealing schedule with warm restarts. The contrastive loss is applied to the output of the final block's convolutional layer, with 1000 feature vectors sampled per image pair. All training and inference experiments were conducted on a single NVIDIA A100 GPU (NVIDIA Corporation, Santa Clara, CA, USA).
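For reference, cosine annealing with warm restarts can be written in a few lines. The sketch below uses the standard closed form with a constant restart period; the function name `cosine_warm_restart_lr` is illustrative, and the learning-rate bounds and restart period in the usage lines are hypothetical values chosen only to make the example concrete.

```python
import math

def cosine_warm_restart_lr(step, lr_max, lr_min, period):
    """Learning rate at a given step for cosine annealing with a fixed restart period."""
    steps_since_restart = step % period
    cosine = math.cos(math.pi * steps_since_restart / period)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + cosine)

# Hypothetical settings: restart every 1000 iterations, i.e. once per stage.
schedule = [cosine_warm_restart_lr(s, lr_max=1e-3, lr_min=1e-5, period=1000)
            for s in range(3000)]
```

Within each period the rate decays smoothly from `lr_max` toward `lr_min` and then jumps back to `lr_max` at the restart, which pairs naturally with stage-wise training in which fresh pseudo-labels are introduced at each stage.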
3.6. Datasets
We evaluate the performance of the proposed method on the challenging inter-patient abdominal CT registration dataset [22]. This dataset comprises 30 3D abdominal CT scans from different patients, with 13 manually labeled anatomical structures: spleen, right kidney, left kidney, gall bladder, esophagus, liver, stomach, aorta, inferior vena cava, portal and splenic vein, pancreas, left adrenal gland, and right adrenal gland. All images are resampled to a uniform voxel resolution of 2 mm and standardized to common spatial dimensions. We use the training-test split defined in the Learn2Reg challenge [23], which is widely adopted in the medical image registration community and facilitates direct comparison with prior work. Specifically, the training set includes 20 scans (190 image pairs), while the test set consists of 10 scans (45 image pairs).
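The pair counts quoted for the abdominal dataset are consistent with forming every unordered pair of distinct scans within each split, which the short check below makes explicit. The helper name `unordered_pairs` is illustrative.

```python
from math import comb

def unordered_pairs(num_scans):
    """Number of unordered pairs of distinct scans: n * (n - 1) / 2."""
    return comb(num_scans, 2)

train_pairs = unordered_pairs(20)  # 20 training scans
test_pairs = unordered_pairs(10)   # 10 test scans
```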
To evaluate performance in the intra-patient setting, we utilize the RAD-ChestCT dataset [24]. In this dataset, we identified 371 longitudinal scan pairs. We split the data into 300 pairs for training and 71 pairs for testing. The CT images are resampled to a consistent voxel resolution of 1.5 mm and common spatial dimensions. Since the RAD-ChestCT dataset does not include manual segmentation labels, we employ the TotalSegmentator tool [25] to segment the CT scans. Using the resulting segmentations, we calculate registration accuracy across 22 anatomical structures: 5 lung lobes, vertebrae from T1 to T12, heart myocardium, left and right heart ventricles, and left and right atria.
4. Results and Discussion
To assess registration accuracy, we compute the average Dice similarity coefficient (DSC) using the available segmented structures. The plausibility of the deformation fields is evaluated using the standard deviation of the logarithm of the Jacobian determinant (SDLogJ). Additionally, we report inference run-time across methods.
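Both metrics are simple to compute once segmentations and displacement fields are available. The sketch below shows a minimal version under simplifying assumptions: segmentations are flat lists of integer labels, the displacement field is two-dimensional and given in voxel units, derivatives use forward differences, and non-positive Jacobian determinants are clamped to a small constant before taking the logarithm. The function names and the clamping threshold are illustrative choices, not taken from the evaluation code.

```python
import math

def dice(seg_a, seg_b, label):
    """Dice overlap of one label between two segmentations of equal length."""
    in_a = [v == label for v in seg_a]
    in_b = [v == label for v in seg_b]
    intersection = sum(1 for x, y in zip(in_a, in_b) if x and y)
    size = sum(in_a) + sum(in_b)
    return 2.0 * intersection / size if size else float("nan")

def sd_log_jacobian(ux, uy):
    """Std. dev. of log det of the Jacobian of x + u(x) for a 2D displacement field."""
    rows, cols = len(ux), len(ux[0])
    logs = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            dux_dx = ux[r][c + 1] - ux[r][c]
            dux_dy = ux[r + 1][c] - ux[r][c]
            duy_dx = uy[r][c + 1] - uy[r][c]
            duy_dy = uy[r + 1][c] - uy[r][c]
            det = (1.0 + dux_dx) * (1.0 + duy_dy) - dux_dy * duy_dx
            logs.append(math.log(max(det, 1e-9)))  # clamp folded voxels (assumption)
    mean = sum(logs) / len(logs)
    return math.sqrt(sum((v - mean) ** 2 for v in logs) / len(logs))
```

A Dice value of 1 means perfect overlap, and an SDLogJ of 0 means the local volume change is uniform across the field; larger values indicate less regular deformations.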
4.1. Registration Results
We compare our method with conventional registration approaches (NiftyReg [26] and DEEDS [27]), learning-based methods (VoxelMorph [20], LapIRN [28] and uniGradICON [29]), and two hybrid approaches (Bigalke et al. [6] and SAMConvex [1]) (Table 1). NiftyReg uses multi-resolution optimization with mutual information, while DEEDS relies on edge-based similarity with B-spline deformation. VoxelMorph and LapIRN directly regress dense displacement fields using convolutional neural networks, with LapIRN incorporating multi-scale refinement. uniGradICON improves robustness via gradient inverse consistency (GradICON [30]) and is trained on a diverse collection of data. During inference, we employ the instance-specific optimization option provided by uniGradICON, which fine-tunes the pretrained model weights for each image pair to achieve improved performance. Bigalke et al. and SAMConvex are hybrid approaches that leverage CNNs for feature extraction from image pairs and classical optimization techniques for displacement field estimation. SAMConvex uses a pretrained SAM model [15] for feature extraction, while Bigalke et al. optimize the feature extractor with a differentiable optimizer and registration loss (Figure 1).
VoxelMorph and LapIRN are computationally efficient; however, they often underperform compared to traditional and hybrid methods. uniGradICON achieves strong results on both datasets but relies on instance-specific optimization during inference, which leads to longer inference times. DEEDS achieves strong results on the RadChestCT dataset, ranking as the second-best method. This performance is expected given its focus on optimizing edge similarity, which is highly effective for intra-patient thoracic datasets where edges in image pairs align closely. However, on the AbdomenCT dataset, where deformations between image pairs are more complex, DEEDS is less accurate than the hybrid methods. Hybrid approaches combine deep learning’s ability to extract robust features with the precision and reliability of classical optimization techniques for displacement field estimation. This combination enables hybrid methods to achieve state-of-the-art performance on the challenging inter-patient AbdomenCT dataset while maintaining competitive results on RadChestCT. Our proposed CoRe method achieves the best performance on both datasets, delivering the highest Dice scores (DSC) while keeping the smoothness of the predicted displacement fields (SDLogJ) comparable to that of competing methods. These results support the effectiveness of incorporating an equivariance-based contrastive objective directly into the registration framework.
Figure 3 presents qualitative registration results of the proposed method on the AbdomenCT and RadChestCT datasets.
4.2. Ablation Study
To assess the effectiveness of the proposed method, we trained the feature extractor using the registration loss and the contrastive loss independently and compared the results with the proposed joint optimization approach (Table 2). For the contrastive loss, we first pretrained the feature extractor using only the contrastive objective and subsequently trained the registration framework with the feature extractor frozen, using only the registration objective. The joint optimization approach demonstrates superior performance, underscoring the benefit of combining these objectives. This strategy facilitates the extraction of more discriminative and spatially coherent features, enhancing registration accuracy across datasets.
Along with the equivariance constraint described in Section 3.3, self-supervised contrastive learning methods commonly employ non-linear intensity augmentations during pretraining to promote feature invariance to appearance changes while preserving spatial encoding. To evaluate their impact, we train our framework with geometric equivariance and appearance invariance constraints independently and jointly (Table 3). Interestingly, the results reveal that within the proposed framework, non-linear intensity augmentations do not provide additional benefits over training solely with the geometric equivariance constraint. For the AbdomenCT dataset, training with intensity augmentations for contrastive loss even results in inferior performance compared to using the registration objective alone. We hypothesize that this is due to the mono-modal nature of the CT datasets used in our evaluation. The standardized intensity values in the CT datasets may limit the effectiveness of intensity augmentations, as they do not enhance the discriminative capacity of the learned features. Future work may explore the utility of intensity augmentations in multi-modal settings or datasets with greater intensity variability, where these augmentations could play a more significant role in improving registration performance.
We further evaluate the performance of the proposed joint contrastive-registration framework with respect to different values of the weighting coefficient (Figure 4a), which controls the contribution of the contrastive loss in the total objective (Equation (5)). Incorporating the contrastive component with a small weight already yields a measurable improvement, increasing the Dice score by 0.93% compared to the baseline trained without contrastive supervision. As the coefficient increases, we observe a gradual improvement in performance, suggesting that a stronger emphasis on the contrastive objective encourages the learning of more robust and deformation-consistent feature representations. The best performance, a Dice score of 52.59%, is achieved at an intermediate value of the coefficient, indicating a favorable balance between registration accuracy and representation learning. Increasing the coefficient beyond this value leads to a decline in performance, which may indicate that excessive weighting of the contrastive objective can interfere with the optimization of the registration task. Nevertheless, even at higher values of the coefficient, the proposed framework consistently outperforms the baseline, supporting the effectiveness of the joint optimization strategy.
Figure 4b illustrates the effect of the contrastive loss across different stages of training. The most pronounced improvement over the baseline is observed during the early training phases, highlighting the impact of contrastive supervision in guiding the optimization process. The contrastive objective provides an informative learning signal at the beginning of training, enabling the model to converge more rapidly toward meaningful feature representations that are beneficial for registration. This is reflected in a performance gap of 6.12% Dice after the first 1000 iterations. After only 2000 iterations, corresponding to one quarter of the total training, the proposed joint optimization strategy already achieves a Dice score of 51.26%, surpassing the final performance of the baseline, 51.1% Dice. Although the performance gap decreases as training progresses, it remains significant throughout the optimization and persists until convergence. These results suggest that integrating contrastive learning not only improves final performance but also contributes to faster convergence.
Negative samples in the contrastive objective act as a regularization mechanism that prevents trivial solutions and promotes the formation of a well-structured latent feature space [21]. In the proposed joint optimization framework, the additional registration objective already constrains the optimization process, reducing the likelihood of convergence to a trivial solution even when no negative samples are used. Nevertheless, training with only a cosine similarity objective, without negative samples, results in inferior performance compared to optimization using only the registration loss (50.2% versus 51.1% Dice on AbdomenCT), indicating insufficient feature discrimination and potential feature collapse. This setting corresponds to the case of zero negative samples in Figure 4c. Furthermore, the number of negative samples has a positive impact on registration accuracy (Figure 4c). Increasing the number of negative samples improves the discriminative capacity of the learned representations, leading to progressively better registration performance. However, these improvements gradually saturate as the number of negative samples increases, while the associated memory consumption grows substantially. This observation suggests that a moderate number of negative samples provides a reasonable trade-off between registration accuracy and computational efficiency.
5. Conclusions
We introduced CoRe, a hybrid image registration framework that integrates contrastive learning into the registration pipeline. We demonstrated that jointly optimizing the feature extractor under both contrastive and registration objectives facilitates the learning of semantically coherent and discriminative features, tailored to the requirements of classical optimization procedures. Our findings emphasize the important role of equivariant geometric constraints, implemented through contrastive loss, in enabling the extraction of robust features. These features are particularly effective in handling tissue deformations, thereby improving registration performance. In addition, our analysis shows that the inclusion of contrastive supervision accelerates convergence, especially during the early stages of training, where the model benefits from a stronger and more informative learning signal.
A key limitation of the current study is its exclusive evaluation on mono-modal thoracic and abdominal CT datasets. As indicated by our experimental findings, standard intensity augmentations did not yield significant performance gains, likely due to the standardized Hounsfield Units (HU) in CT imaging, which inherently simplifies intensity mapping. Future work will focus on adapting CoRe to different imaging modalities as well as multi-modal scenarios. This will necessitate investigating contrastive loss strategies capable of accommodating non-linear intensity relationships present when aligning features across different imaging modalities. Furthermore, while offering superior alignment accuracy, CoRe exhibits a higher inference time compared to pure learning-based networks due to the instance optimization step executed after feature extraction. A promising direction for future research involves designing a joint optimization framework that embeds contrastive loss into fully learning-based architectures. This would allow the model to preserve rapid, single-pass inference speeds while potentially boosting overall registration accuracy.
In summary, CoRe indicates that combining contrastive learning with registration objectives offers a promising direction for medical image alignment. By integrating transformation-equivariant feature representations into a registration pipeline, the proposed framework demonstrates an improvement in alignment accuracy for mono-modal CT data. These findings highlight the utility of joint optimization strategies in contributing to more robust and consistent registration workflows.
Author Contributions
Conceptualization, E.K. and M.P.H.; methodology, E.K.; software, E.K.; validation, E.K., C.G., Z.A.-H.H., F.F. and W.H.; formal analysis, E.K.; investigation, E.K.; resources, M.P.H.; data curation, E.K.; writing—original draft preparation, E.K.; writing—review and editing, E.K. and M.P.H.; visualization, E.K.; supervision, M.P.H.; project administration, M.P.H.; funding acquisition, M.P.H. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by German Research Foundation: DFG, HE 7364/10-1, project number 500498869.
Data Availability Statement
Conflicts of Interest
The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.
References
- Li, Z.; Tian, L.; Mok, T.C.; Bai, X.; Wang, P.; Ge, J.; Zhou, J.; Lu, L.; Ye, X.; Yan, K.; et al. Samconvex: Fast discrete optimization for ct registration using self-supervised anatomical embedding and correlation pyramid. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2023; pp. 559–569.
- Pielawski, N.; Wetzer, E.; Öfverstedt, J.; Lu, J.; Wählby, C.; Lindblad, J.; Sladoje, N. CoMIR: Contrastive multimodal image representation for registration. Adv. Neural Inf. Process. Syst. 2020, 33, 18433–18444.
- Liu, F.; Yan, K.; Harrison, A.P.; Guo, D.; Lu, L.; Yuille, A.L.; Huang, L.; Xie, G.; Xiao, J.; Ye, X.; et al. SAME: Deformable image registration based on self-supervised anatomical embeddings. In Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2021; pp. 87–97.
- Mok, T.C.; Li, Z.; Bai, Y.; Zhang, J.; Liu, W.; Zhou, Y.J.; Yan, K.; Jin, D.; Shi, Y.; Yin, X.; et al. Modality-Agnostic Structural Image Representation Learning for Deformable Multi-Modality Medical Image Registration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 18–22 June 2024; pp. 11215–11225.
- Dey, N.; Schlemper, J.; Salehi, S.S.M.; Zhou, B.; Gerig, G.; Sofka, M. Contrareg: Contrastive learning of multi-modality unsupervised deformable image registration. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2022; pp. 66–77.
- Bigalke, A.; Hansen, L.; Mok, T.C.; Heinrich, M.P. Unsupervised 3d registration through optimization-guided cyclical self-training. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2023; pp. 677–687.
- Borvornvitchotikarn, T.; Kurutach, W. mirid: Multi-modal image registration using modality-independent and rotation-invariant descriptor. Symmetry 2020, 12, 2078.
- Heinrich, M.P.; Jenkinson, M.; Bhushan, M.; Matin, T.; Gleeson, F.V.; Brady, M.; Schnabel, J.A. MIND: Modality independent neighbourhood descriptor for multi-modal deformable registration. Med. Image Anal. 2012, 16, 1423–1435.
- Jiang, D.; Shi, Y.; Yao, D.; Wang, M.; Song, Z. miLBP: A robust and fast modality-independent 3D LBP for multimodal deformable registration. Int. J. Comput. Assist. Radiol. Surg. 2016, 11, 997–1005.
- Jaouen, V.; Conze, P.H.; Dardenne, G.; Bert, J.; Visvikis, D. Regularized directional representations for medical image registration. arXiv 2021, arXiv:2111.15509.
- Simonovsky, M.; Gutiérrez-Becker, B.; Mateus, D.; Navab, N.; Komodakis, N. A deep metric for multimodal registration. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2016; pp. 10–18.
- Wang, X.; Zhang, R.; Shen, C.; Kong, T.; Li, L. Dense contrastive learning for self-supervised visual pre-training. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 3024–3033.
- Chaitanya, K.; Erdil, E.; Karani, N.; Konukoglu, E. Contrastive learning of global and local features for medical image segmentation with limited annotations. Adv. Neural Inf. Process. Syst. 2020, 33, 12546–12558.
- Goncharov, M.; Soboleva, V.; Kurmukov, A.; Pisov, M.; Belyaev, M. vox2vec: A framework for self-supervised contrastive learning of voxel-level representations in medical images. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2023; pp. 605–614.
- Yan, K.; Cai, J.; Jin, D.; Miao, S.; Guo, D.; Harrison, A.P.; Tang, Y.; Xiao, J.; Lu, J.; Lu, L. SAM: Self-supervised learning of pixel-wise anatomical embeddings in radiological images. IEEE Trans. Med. Imaging 2022, 41, 2658–2669.
- Bai, X.; Bai, F.; Huo, X.; Ge, J.; Lu, J.; Ye, X.; Yan, K.; Xia, Y. SAMv2: A Unified Framework for Learning Appearance, Semantic and Cross-Modality Anatomical Embeddings. arXiv 2023, arXiv:2311.15111.
- Seince, M.; Folgoc, L.L.; de Souza, L.A.F.; Angelini, E. Dense Self-Supervised Learning for Medical Image Segmentation. arXiv 2024, arXiv:2407.20395.
- Santhirasekaram, A.; Winkler, M.; Rockall, A.; Glocker, B. A geometric approach to robust medical image segmentation. Med. Image Anal. 2024, 97, 103260.
- Siebert, H.; Heinrich, M.P. Learn to fuse input features for large-deformation registration with differentiable convex-discrete optimisation. In Proceedings of the International Workshop on Biomedical Image Registration; Springer: Berlin/Heidelberg, Germany, 2022; pp. 119–123.
- Balakrishnan, G.; Zhao, A.; Sabuncu, M.R.; Guttag, J.; Dalca, A.V. Voxelmorph: A learning framework for deformable medical image registration. IEEE Trans. Med. Imaging 2019, 38, 1788–1800.
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 1597–1607.
- Xu, Z.; Lee, C.P.; Heinrich, M.P.; Modat, M.; Rueckert, D.; Ourselin, S.; Abramson, R.G.; Landman, B.A. Evaluation of six registration methods for the human abdomen on clinically acquired CT. IEEE Trans. Biomed. Eng. 2016, 63, 1563–1572.
- Hering, A.; Hansen, L.; Mok, T.C.; Chung, A.C.; Siebert, H.; Häger, S.; Lange, A.; Kuckertz, S.; Heldmann, S.; Shao, W.; et al. Learn2Reg: Comprehensive multi-task medical image registration challenge, dataset and evaluation in the era of deep learning. IEEE Trans. Med. Imaging 2022, 42, 697–712.
- Draelos, R.L.; Dov, D.; Mazurowski, M.A.; Lo, J.Y.; Henao, R.; Rubin, G.D.; Carin, L. Machine-learning-based multiple abnormality prediction with large-scale chest computed tomography volumes. Med. Image Anal. 2021, 67, 101857.
- Wasserthal, J.; Breit, H.C.; Meyer, M.T.; Pradella, M.; Hinck, D.; Sauter, A.W.; Heye, T.; Boll, D.T.; Cyriac, J.; Yang, S.; et al. TotalSegmentator: Robust segmentation of 104 anatomic structures in CT images. Radiol. Artif. Intell. 2023, 5, e230024.
- Modat, M.; Ridgway, G.R.; Taylor, Z.A.; Lehmann, M.; Barnes, J.; Hawkes, D.J.; Fox, N.C.; Ourselin, S. Fast free-form deformation using graphics processing units. Comput. Methods Programs Biomed. 2010, 98, 278–284.
- Heinrich, M.P.; Jenkinson, M.; Brady, M.; Schnabel, J.A. MRF-based deformable registration and ventilation estimation of lung CT. IEEE Trans. Med. Imaging 2013, 32, 1239–1248.
- Mok, T.C.; Chung, A.C. Large deformation diffeomorphic image registration with laplacian pyramid networks. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2020; pp. 211–221.
- Tian, L.; Greer, H.; Kwitt, R.; Vialard, F.X.; San José Estépar, R.; Bouix, S.; Rushmore, R.; Niethammer, M. unigradicon: A foundation model for medical image registration. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2024; pp. 749–760.
- Tian, L.; Greer, H.; Vialard, F.X.; Kwitt, R.; Estépar, R.S.J.; Rushmore, R.J.; Makris, N.; Bouix, S.; Niethammer, M. Gradicon: Approximate diffeomorphisms via gradient inverse consistency. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 18084–18094.