Ex-Vivo Hippocampus Segmentation Using Diffusion-Weighted MRI

The hippocampus is a crucial brain structure involved in memory formation, spatial navigation, emotional regulation, and learning. Accurate MRI segmentation of the human hippocampus plays an important role in neuro-imaging research and clinical practice, for example in diagnosing neurological diseases and guiding surgical interventions. While most hippocampus segmentation studies use T1-weighted or T2-weighted MRI scans, we explore the use of diffusion-weighted MRI (dMRI), which offers unique insights into the microstructural properties of the hippocampus. In particular, we use several scalar measures derived from dMRI (fractional anisotropy, mean diffusivity, axial diffusivity, and radial diffusivity) as the inputs of a multi-contrast deep learning approach to hippocampus segmentation. To exploit the complementary information offered by these dMRI contrasts, we introduce a multimodal deep learning architecture that integrates cross-attention mechanisms. Our framework comprises a multi-head encoder that maps each dMRI contrast into a distinct latent space, generating separate image feature maps. A gated cross-attention unit following the encoder then computes attention maps between every pair of image contrasts. These attention maps enrich the feature maps, making them more effective for the segmentation task. Finally, a decoder produces segmentation predictions from the attention-enhanced feature maps. The experimental results demonstrate the efficacy of our framework for hippocampus segmentation and highlight the benefits of multi-contrast over single-contrast images in diffusion MRI segmentation.


Introduction
The hippocampus, nestled within the temporal lobe of the human brain, is a vital structure central to neuroscience and clinical research. Its seahorse-shaped form plays a crucial role in cognitive functions such as memory formation and spatial navigation [1]. The landmark case of patient H.M. underscored its importance, revealing that memory depends on the hippocampus [2]. Furthermore, the discovery of 'place cells' elucidated its role in spatial orientation [3]. Neuro-imaging techniques such as magnetic resonance imaging (MRI) have deepened our understanding, revealing its vulnerability in neurodegenerative disorders [4] and its activation patterns during memory tasks [5]. The hippocampus remains a focal point in neuroscience, with ongoing studies unraveling its complexities and implications in various disorders [6][7][8][9]. MRI hippocampus segmentation is crucial in neuro-imaging and clinical practice because of the hippocampus's pivotal role in cognitive function and its susceptibility to neurodegenerative diseases. It involves delineating the hippocampus from other brain structures, enabling precise analysis of its size, shape, and volume. This accuracy is essential for several reasons:

• Diagnosis and Disease Monitoring: Accurate segmentation aids in diagnosing and monitoring neurological diseases such as Alzheimer's disease [10,11], epilepsy, and dementia. It helps quantify hippocampal atrophy, a key biomarker in Alzheimer's disease.
• Surgical Intervention and Therapy Planning: In epilepsy surgery, precise segmentation helps minimize the impact on healthy brain tissue [12]. During radiation therapy, sparing the hippocampus from exposure reduces the risk of cognitive impairment [13].
• Cognitive Neuroscience Research: Segmentation supports research on memory and spatial navigation, deepening our understanding of the neural mechanisms underlying cognition [14].
• Personalized Medicine: Variations in hippocampal structure affect disease susceptibility and treatment responses. Accurate segmentation enables treatment plans tailored to individual neuroanatomy [15].
Most recent hippocampus segmentation studies use T1-weighted (T1w) or T2-weighted (T2w) MRI scans because these data are convenient to collect and acquire [16][17][18][19][20]. For example, [16] compared different segmentation methods on T1w and T2w multispectral MRI data, highlighting the complexity of hippocampal structure and the benefit of the better contrast properties of high-resolution T2w images for subfield delineation. It also examined the reliability of different MRI sequences and their combinations for accurate hippocampal segmentation. Manjón et al. [17] introduced a deep learning-based hippocampus subfield segmentation method built on a variant of the U-Net architecture [21], which combines T1w and T2w images to improve segmentation performance. However, very few studies have used diffusion-weighted MRI (dMRI) for hippocampus segmentation, despite the unique insights dMRI provides into the microstructural properties of brain tissue. Firstly, dMRI characterizes the microstructural environment of the hippocampus, including the orientation and integrity of white matter tracts, and can reveal subtle changes in hippocampal tissue that are not visible on conventional MRI [22]. Secondly, dMRI performs well in detecting early microstructural changes in the hippocampus associated with neurodegenerative diseases such as Alzheimer's disease [23]. Thirdly, the hippocampus is a hub in the brain's memory network, and its connections to other regions are crucial for its function. Hippocampus segmentation on dMRI scans facilitates the assessment of these connections through techniques such as tractography, providing a more comprehensive view of hippocampal connectivity and its alterations in various conditions [24]. Achieving mesoscale resolution for diffusion MRI and tractography further improves the delineation of hippocampal substructures and reveals intra-regional grey matter connectivity in greater detail [25,26]. Additionally, dMRI can offer some insight into the functional aspects of the hippocampus, linking its structural properties to cognitive functions such as memory and learning. This is particularly relevant for understanding how structural changes affect function in diseases involving the hippocampus [27][28][29].
In dMRI studies, several key metrics quantify the diffusion of water molecules in brain tissue, each reflecting a different aspect of tissue microstructure; these metrics can therefore be used to derive multi-contrast dMRI images. The most important and widely used metrics are Fractional Anisotropy (FA), Mean Diffusivity (MD), Axial Diffusivity (AD) and Radial Diffusivity (RD). FA measures the degree of directional anisotropy of water diffusion in tissue. FA values tend to be high where water diffusion is directionally restricted or oriented (e.g., white matter tracts) and low where diffusion is more isotropic (e.g., gray matter or cerebrospinal fluid). MD measures the average rate of water diffusion within a tissue and is derived from the diffusion coefficients along all measured directions. It reflects the overall mobility of water molecules, with higher values indicating less restricted diffusion. In brain tissue, increased MD can be a sign of tissue degeneration or loss, as it suggests that water moves more easily, potentially due to the loss of barriers such as cell membranes. AD measures the rate of water diffusion along the primary axis of white matter fibers and reflects axonal integrity; low AD values may indicate axonal injury or degeneration. RD measures the rate of water diffusion perpendicular to the primary axis of white matter fibers and is particularly sensitive to changes in the myelin sheath that surrounds axons. Because each metric captures a different aspect of the integrity and microstructure of hippocampal tissue (for example, AD and RD provide information about axonal integrity and myelination, respectively), all of them are relevant to accurate hippocampus segmentation, which motivates multi-contrast segmentation studies. Compared with segmenting each single-contrast scan, combining different metrics can lead to more accurate hippocampus segmentation, since multi-contrast dMRI scans provide complementary information (e.g., sufficient contrast) to distinguish different semantic features [30].
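As a concrete reference for these definitions, the standard diffusion tensor formulas express all four metrics through the tensor eigenvalues $\lambda_1 \ge \lambda_2 \ge \lambda_3$: MD $= (\lambda_1+\lambda_2+\lambda_3)/3$, AD $= \lambda_1$, RD $= (\lambda_2+\lambda_3)/2$, and FA $= \sqrt{3/2}\,\lVert \lambda - \mathrm{MD} \rVert / \lVert \lambda \rVert$. The NumPy sketch below computes them from a 3×3 tensor (or an array of tensors). It illustrates these textbook definitions only; it is not the DSI Studio/GQI pipeline used in this work, and the function name `dti_scalars` is hypothetical.

```python
import numpy as np

def dti_scalars(tensor):
    """Return (FA, MD, AD, RD) for a symmetric 3x3 diffusion tensor or a stack (..., 3, 3)."""
    # eigvalsh returns eigenvalues in ascending order; reverse so l1 >= l2 >= l3.
    evals = np.linalg.eigvalsh(tensor)[..., ::-1]
    l1, l2, l3 = evals[..., 0], evals[..., 1], evals[..., 2]
    md = (l1 + l2 + l3) / 3.0          # mean diffusivity
    ad = l1                             # axial diffusivity: principal eigenvalue
    rd = (l2 + l3) / 2.0                # radial diffusivity: mean of the two minor eigenvalues
    num = np.sqrt((l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2)
    den = np.sqrt(l1 ** 2 + l2 ** 2 + l3 ** 2)
    # Guard against zero tensors (e.g., background voxels): FA is set to 0 there.
    fa = np.sqrt(1.5) * np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    return fa, md, ad, rd

fa, md, ad, rd = dti_scalars(np.diag([1.7e-3, 0.3e-3, 0.3e-3]))
print(f"FA={float(fa):.3f}  MD={float(md):.2e}  AD={float(ad):.2e}  RD={float(rd):.2e}")
```

For an elongated (prolate) tensor with eigenvalues (1.7, 0.3, 0.3) × 10⁻³ mm²/s this gives FA ≈ 0.80, whereas an isotropic tensor gives FA = 0 with the same MD formula still applying.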
The integration of deep learning methods into multi-contrast MRI segmentation has been a significant advance in neuro-imaging. These methods can analyze complex datasets, offering more accurate and detailed insights into the structures of the human brain and its subregions (e.g., the hippocampus). One of the most pivotal developments in this field is the convolutional neural network (CNN), which has become a cornerstone of medical image analysis [31][32][33][34]. CNNs are particularly well suited to image data because they automatically and adaptively learn spatial hierarchies of features. In multi-contrast MRI segmentation, CNNs have been used to combine information from different MRI contrasts (e.g., T1w, T2w, or diffusion MRI scans) to improve segmentation performance, which matters because different image contrasts provide complementary information about brain anatomy [35]. The U-Net, a CNN-based architecture, has shown remarkable effectiveness in medical image segmentation [21]. It effectively captures contextual information while enabling precise localization, making it suitable for complex tasks such as segmenting regional brain substructures. However, U-Net is designed for single-modal input, which may limit its ability to learn relationships across contrasts when applied to multi-contrast segmentation. We therefore propose to address this issue with cross-attention mechanisms, which guide the network to focus on the most relevant parts of an image and thereby improve multi-contrast dMRI hippocampus segmentation. Cross-attention extends the concept of self-attention [36] to interactions between two or more inputs, here obtained from different contrasts [37]. It can model contextual relationships and combine the useful information across multi-contrast MRI images for hippocampus segmentation.
To sum up, our contributions are as follows: (1) We use mesoscale diffusion MRI data to compute multiple contrast images for segmenting the human hippocampus from its surrounding temporal lobe structures. (2) We build a multimodal deep learning framework with a cross-attention mechanism to improve hippocampus segmentation performance. (3) We compare the segmentation performance of each single-contrast diffusion image with that of different combinations of multi-contrast images.

Semantic Segmentation on Hippocampus
Accurate segmentation of the hippocampus is a vital image-processing step for studying the hippocampus and neurological disorders related to its impairment. Early hippocampus segmentation techniques relied primarily on manual delineation, which is time-consuming and prone to inter-rater variability [38][39][40]. The complex shape of the hippocampus and its variable appearance across individuals posed additional challenges. To address these limitations, semi-automated segmentation methods were developed; these typically required user intervention to initialize or correct the results, striking a balance between automation and accuracy [41][42][43]. Later advances led to fully automated techniques, which were essential for handling large datasets in studies such as Alzheimer's disease research. These methods used various computational strategies, such as region-growing algorithms, atlas-based approaches and machine learning algorithms [44,45]. Machine learning and deep learning marked a significant leap in hippocampus segmentation: Convolutional Neural Networks (CNNs) and CNN-based segmentation architectures (e.g., U-Net) have demonstrated high accuracy and efficiency in segmenting the hippocampus from MRI scans [17,20,21,46-50]. These methods can learn complex patterns from diverse datasets, which yields robust segmentation results. Beyond 2D segmentation, [51] introduced a 3D convolutional model named DeepHipp, which integrates dense blocks and attention mechanisms for T1w hippocampus segmentation. Its dense blocks improve the efficiency of feature usage by reusing the features learned at each layer, while its attention mechanisms allow the model to focus on the segmentation target and suppress irrelevant regions of the input image, enhancing segmentation accuracy.

Deep Neural Networks for Multimodal MRI Hippocampus Segmentation
Deep learning for multimodal hippocampus MRI segmentation has seen notable advances, with a variety of approaches explored to improve segmentation accuracy and efficiency. Manjón et al. [17] proposed a variant of the U-Net architecture that incorporates multiple resolution levels and deep supervision to capture detailed hippocampal structures from T1w and T2w MRI scans. Additionally, deep CNNs have been employed for hippocampus segmentation and classification in Alzheimer's disease, with promising results in automating segmentation and potentially aiding the early diagnosis of Alzheimer's disease [20]. Most deep learning methods for multimodal MRI hippocampus segmentation are based on T1w and T2w neuro-imaging data [52,53].

Proposed Segmentation Framework
The proposed segmentation framework (see Figure 1) integrates cross-attention mechanisms to improve MRI segmentation using multi-contrast MRI data. This section details the proposed multi-contrast segmentation framework.

Framework Overview
As shown in Figure 1, the proposed segmentation framework consists of a multi-contrast encoder with $K$ branches that embed $K$ different contrast maps, and a shared decoder that reconstructs the segmentation predictions from the latent space. A $K^2$ gated cross-attention unit is inserted between the encoder and the decoder to capture the cross relationships among the contrast maps. We adopt the encoder and decoder of the U-Net [21] architecture as the segmentation backbone, duplicating the encoder into $K$ branches, one per contrast map. Each encoder branch takes a contrast map $X$ as input and generates latent image feature maps $F$ layer by layer. After the $K$ feature maps ($F_1, F_2, \dots, F_K$) are generated, the $K^2$ gated cross-attention unit computes the gated cross-attention matrix $A^g$ from them. Finally, the $K$ feature maps are weighted by the gated cross-attention matrix to produce attention-enhanced feature maps ($F^a_1, F^a_2, \dots, F^a_K$), which are forwarded to the next encoder layer and to the decoder layer at the same level for segmentation prediction.
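The data flow described above can be sketched in PyTorch as follows. This is a deliberately small, hypothetical two-level network, not the authors' implementation: each of the $K$ contrasts gets its own encoder branch (same architecture, separate weights), the per-branch features at each level are fused by summation (mirroring $F^a = \sum_k F^a_k$), and a shared decoder with bilinear upsampling produces the logits. The cross-attention unit is omitted, so the sketch corresponds to the framework without cross-attention; class names and channel widths are assumptions.

```python
import torch
import torch.nn as nn

def conv_block(c_in, c_out):
    # U-Net style "double conv": two 3x3 conv + BatchNorm + ReLU layers.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class ContrastEncoder(nn.Module):
    """Hypothetical two-level encoder for a single contrast map."""
    def __init__(self, c1=16, c2=32):
        super().__init__()
        self.level1 = conv_block(1, c1)
        self.level2 = nn.Sequential(nn.MaxPool2d(2), conv_block(c1, c2))

    def forward(self, x):
        f1 = self.level1(x)      # full resolution
        f2 = self.level2(f1)     # 1/2 resolution
        return f1, f2

class MultiContrastNet(nn.Module):
    """K encoder branches, summation fusion, one shared decoder (no cross-attention)."""
    def __init__(self, num_contrasts=4, c1=16, c2=32):
        super().__init__()
        # Separate instances, so each contrast gets its own weights.
        self.encoders = nn.ModuleList([ContrastEncoder(c1, c2) for _ in range(num_contrasts)])
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=True)
        self.decode = conv_block(c2 + c1, c1)
        self.head = nn.Conv2d(c1, 1, kernel_size=1)

    def forward(self, x):
        # x: (B, K, H, W), one channel per dMRI contrast (e.g., FA, MD, AD, RD).
        feats = [enc(x[:, k:k + 1]) for k, enc in enumerate(self.encoders)]
        # Fuse each level across branches by summation; an attention unit would act here.
        f1 = torch.stack([f[0] for f in feats]).sum(dim=0)
        f2 = torch.stack([f[1] for f in feats]).sum(dim=0)
        y = self.decode(torch.cat([self.up(f2), f1], dim=1))
        return self.head(y)      # segmentation logits, (B, 1, H, W)

model = MultiContrastNet(num_contrasts=4)
print(model(torch.randn(2, 4, 32, 32)).shape)
```

Keeping separate encoder weights per contrast lets each branch specialize to the intensity statistics of its own map, which is the motivation for duplicating rather than sharing the encoder.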

K² Gated Cross-Attention Unit
Assume that the generated feature map is $F \in \mathbb{R}^{H \times W \times C}$, where $H$ and $W$ are the spatial dimensions of the feature map and $C$ is the number of channels. We first reshape $F$ into a feature matrix $H \in \mathbb{R}^{N \times C}$, where $N = H \times W$. A linear layer is then applied to $H$ to reduce the number of channels from $C$ to $c$, giving $\hat{H} \in \mathbb{R}^{N \times c}$. As shown in Figure 2, the gated cross-attention unit uses $\hat{H}$ to compute the gated cross-attention matrix. Let $\hat{H}_k$ be the $k$-th feature matrix, where $k = 1, 2, \dots, K$. The cross-attention matrix between the $i$-th and $j$-th feature matrices can be computed by: where $i, j \in \{1, 2, \dots, K\}$ and $A_{i,j} \in \mathbb{R}^{N \times N}$. There are thus $K^2$ cross-attention matrices (when $i = j$, the cross-attention matrix reduces to a self-attention matrix). Based on these cross-attention matrices, the gated cross-attention matrix can be computed by: where $\otimes$ denotes element-wise multiplication and $\sigma$ is a nonlinear activation function (i.e., softmax). After the gated cross-attention matrix is generated, we compute the attention-enhanced feature matrices by: where $i = 1, 2, \dots, K$ and $W$ denotes the trainable parameters of a linear layer that maps the feature dimension back to its original size (i.e., from $c$ to $C$). We then reshape these attention-enhanced feature matrices ($\hat{H}^a_i \in \mathbb{R}^{N \times C}$) back into attention-enhanced feature maps ($F^a_i \in \mathbb{R}^{H \times W \times C}$) and feed them forward to the next encoder layer. Meanwhile, we combine the attention-enhanced feature maps as $F^a = \sum_{k=1}^{K} F^a_k$ and feed $F^a$ forward to the segmentation decoder.
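Because the unit's equations are not reproduced in this text, the PyTorch sketch below shows only one plausible reading of the description, under labeled assumptions: $A_{i,j}$ is taken to be the scaled dot product $\hat{H}_i \hat{H}_j^\top / \sqrt{c}$; the gate is a learnable scalar per contrast pair, passed through a sigmoid and multiplied element-wise with the softmax-normalized $A_{i,j}$; and the enhanced matrix for contrast $i$ aggregates the gated attention applied to every $\hat{H}_j$ before the output projection $W$. The class and parameter names are hypothetical.

```python
import math
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    """Hypothetical sketch of a K^2 gated cross-attention unit (one possible reading)."""

    def __init__(self, num_contrasts, channels, reduced):
        super().__init__()
        self.num_contrasts = num_contrasts
        self.proj_in = nn.Linear(channels, reduced)    # C -> c, applied at every position
        self.proj_out = nn.Linear(reduced, channels)   # W: c -> C
        # ASSUMPTION: one learnable gate logit per (i, j) contrast pair.
        self.gate_logits = nn.Parameter(torch.zeros(num_contrasts, num_contrasts))
        self.scale = 1.0 / math.sqrt(reduced)

    def forward(self, feats):
        # feats: list of K feature maps, each (B, C, H, W).
        b, ch, hgt, wid = feats[0].shape
        # Reshape each map to (B, N, C) with N = H * W, then reduce channels to c.
        hs = [self.proj_in(f.flatten(2).transpose(1, 2)) for f in feats]
        gates = torch.sigmoid(self.gate_logits)
        enhanced = []
        for i in range(self.num_contrasts):
            acc = torch.zeros_like(hs[i])
            for j in range(self.num_contrasts):
                # A_ij has shape (B, N, N); i == j gives ordinary self-attention.
                a = torch.softmax(hs[i] @ hs[j].transpose(1, 2) * self.scale, dim=-1)
                acc = acc + gates[i, j] * (a @ hs[j])   # gated attention applied to H_j
            out = self.proj_out(acc)                     # back to (B, N, C)
            enhanced.append(out.transpose(1, 2).reshape(b, ch, hgt, wid))
        fused = torch.stack(enhanced).sum(dim=0)         # F^a = sum_k F^a_k, for the decoder
        return enhanced, fused

unit = GatedCrossAttention(num_contrasts=4, channels=8, reduced=4)
maps, fused = unit([torch.randn(2, 8, 5, 6) for _ in range(4)])
print(len(maps), fused.shape)
```

Each of the $K^2$ attention matrices is $N \times N$, so memory grows quadratically with the number of spatial positions; this is one plausible practical reason to restrict such a unit to low-resolution encoder layers.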

Hippocampus Segmentation with the Proposed Framework
In this study, we use four contrast maps obtained from diffusion MRI (i.e., $K = 4$) for hippocampus segmentation. For the U-Net-based segmentation backbone, we adopt all default configurations of the implementation at https://github.com/milesial/Pytorch-UNet (accessed on 15 March 2024), except that we replace the transposed convolution in the decoder with bilinear interpolation. We deploy the gated cross-attention unit in the third and fourth encoder layers, where the feature maps are 1/8 and 1/16 of the input image size, respectively. The segmentation loss $L_{seg}$ is the sum of the binary cross-entropy (BCE) loss and the Dice loss, defined as follows: where $\hat{y}$ and $y$ are the predicted segmentation and the ground truth, respectively, and $\lambda$ is a weight parameter.
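Since the loss equation itself is not reproduced here, the sketch below assumes the form $L_{seg} = L_{BCE}(\hat{y}, y) + \lambda\, L_{Dice}(\hat{y}, y)$, with a soft Dice loss computed on sigmoid probabilities. A convex combination $\lambda L_{BCE} + (1-\lambda) L_{Dice}$ would be an equally plausible reading, given that λ is searched over [0, 1] in the parameter analysis. The function names and the smoothing constant `eps` are assumptions.

```python
import torch
import torch.nn.functional as F

def soft_dice_loss(logits, target, eps=1e-6):
    """1 - soft Dice coefficient, averaged over the batch; logits/target are (B, 1, H, W)."""
    prob = torch.sigmoid(logits)
    dims = tuple(range(1, prob.dim()))            # sum over all non-batch dimensions
    inter = (prob * target).sum(dims)
    denom = prob.sum(dims) + target.sum(dims)
    return (1 - (2 * inter + eps) / (denom + eps)).mean()

def segmentation_loss(logits, target, lam=0.4):
    # ASSUMED form: BCE + lam * Dice (the source equation is not available here).
    bce = F.binary_cross_entropy_with_logits(logits, target)
    return bce + lam * soft_dice_loss(logits, target)

target = torch.zeros(1, 1, 4, 4)
target[..., :2, :] = 1.0
print(segmentation_loss(torch.randn(1, 1, 4, 4), target).item())
```

With `lam=0.0` the loss reduces to plain BCE, which corresponds to the λ = 0 end of the grid search under this assumed form.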

Datasets
Collection and use of human temporal lobes was approved by the Committee for Oversight of Research and Clinical Training Involving Decedents (CORID No. 1063). Temporal lobes were obtained post-mortem from subjects who died of causes unrelated to the brain (e.g., septic shock or pancreatic ductal adenocarcinoma). Donor ages were as follows: males, mean 58 years (range 45-75 years); females, mean 50 years (range 21-72 years). The time to fixation after death was <42 h. Whole temporal lobes were immersed in 10% buffered formalin (equivalent to 4% formaldehyde, CH2O) for 4 weeks at 4 °C before transfer to PBS for 4 weeks.
Diffusion MR images were acquired on a 9.4.7T/30cm Bruker AV3 HD microimaging scanner equipped with a B-GA12S HP gradient set capable of 660 mT/m maximum gradient strength and a 40 mm quadrature resonator running Paravision 6.0.1 (Bruker Biospin, Billerica, MA, USA).
Multi-shell diffusion MR images were acquired with a 3D diffusion-weighted multi-shot spin-echo EPI sequence with the following parameters: TR = 500 ms, TE = 0.96 ms, diffusion time = 14 ms, diffusion gradient duration δ = 6.5 ms, diffusion gradient separation Δ = 13 ms, 30 EPI segments, 1.2 partial Fourier acceleration in PE1, and a zero-filling acceleration factor of 1.2 in the read and PE2 dimensions, for a final isotropic resolution of 0.250 mm. A total of 94 images were collected with diffusion-weighted shells of b = 1000, 2000, 4000 and 6000 s/mm² with 20, 30, 40 and 60 directions, respectively (∼68 h total scanning time), as described in detail in [26]. Image acquisition was performed at room temperature (21 °C) to provide a high SNR as well as good water diffusion [25]. Diffusion MR images were processed using DSI Studio (https://dsi-studio.labsolver.org/, accessed on 17 March 2024) [54]. The diffusion tensor images (DTI) were reconstructed by performing an eigenvector analysis on the calculated tensor [55,56]. Multi-shell diffusion MRI scans were reconstructed using Generalized Q-sampling Imaging (GQI) [57] with a diffusion sampling length ratio of 0.6, yielding maps of mean diffusivity (MD), radial diffusivity (RD), axial diffusivity (AD), and fractional anisotropy (FA) [26].
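As a rough consistency check on these acquisition parameters, the Stejskal-Tanner relation for rectangular gradient pulses, $b = \gamma^2 G^2 \delta^2 (\Delta - \delta/3)$, gives the gradient amplitude $G$ each shell requires. The sketch assumes ideal rectangular pulses and ignores ramp times and imaging-gradient contributions, so it only approximates what the scanner actually applies; the function names are hypothetical.

```python
import math

GAMMA = 2.675e8  # proton gyromagnetic ratio (rad s^-1 T^-1)

def b_value(G, delta, Delta):
    """b-value in s/m^2 for rectangular pulses: amplitude G (T/m), duration delta, separation Delta (s)."""
    return (GAMMA * G * delta) ** 2 * (Delta - delta / 3.0)

def gradient_for_b(b_s_per_mm2, delta, Delta):
    """Gradient amplitude (T/m) needed to reach a b-value given in s/mm^2."""
    b = b_s_per_mm2 * 1e6  # 1 s/mm^2 = 1e6 s/m^2
    return math.sqrt(b / (GAMMA ** 2 * delta ** 2 * (Delta - delta / 3.0)))

for shell in (1000, 2000, 4000, 6000):
    G = gradient_for_b(shell, delta=6.5e-3, Delta=13e-3)
    print(f"b = {shell} s/mm^2 -> G = {G * 1e3:.0f} mT/m")
```

With δ = 6.5 ms and Δ = 13 ms, the b = 6000 s/mm² shell needs roughly 430 mT/m under these idealizations, within the 660 mT/m maximum of the gradient set.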

Implementation Details
Since our dataset consists of 10 3D multi-contrast diffusion MRI scans, in each fold we held out 2 scans as the validation set and used the other 8 scans for training. We repeated this process 5 times to conduct five-fold cross-validation for all experiments, and report the mean performance and standard deviation (std). After partitioning the data, we sliced the 3D diffusion MRI scans into 2D image slices along the Z-axis. During training, we applied on-the-fly data augmentation to reduce potential overfitting, including random scaling (0.8 to 1.2), random rotation (±15°), random intensity shift (±0.1), and intensity scaling (0.9 to 1.1). Since image sizes differ across samples, we cropped or padded each image to 512 × 512. Training ran for 300 epochs, with a linear warmup over the first 5 epochs. We trained the model using the Adam optimizer with a batch size of 32 and synchronized batch normalization. The initial learning rate was set to 1 × 10⁻³ and decayed by a factor of $(1 - \text{current\_epoch}/\text{max\_epoch})^{0.9}$. We also regularized training with an ℓ2 weight decay of 1 × 10⁻⁵.
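The cross-validation split and learning-rate schedule described above can be sketched in plain Python. The grouping of scans into validation pairs (consecutive IDs here) and the exact warmup formula (a linear ramp up to the initial rate over the first 5 epochs, followed by the polynomial decay) are assumptions; the function names are hypothetical.

```python
def five_fold_splits(scan_ids, val_size=2):
    """Split scans into folds whose validation sets are disjoint groups of `val_size` scans."""
    folds = []
    for start in range(0, len(scan_ids), val_size):
        val = scan_ids[start:start + val_size]
        train = [s for s in scan_ids if s not in val]
        folds.append((train, val))
    return folds

def learning_rate(epoch, base_lr=1e-3, max_epoch=300, warmup_epochs=5, power=0.9):
    """Linear warmup, then poly decay: base_lr * (1 - epoch / max_epoch) ** power."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    return base_lr * (1 - epoch / max_epoch) ** power

# 10 scans -> 5 folds of 8 training / 2 validation scans.
for train, val in five_fold_splits(list(range(10))):
    print("val:", val, "train size:", len(train))
print([round(learning_rate(e), 6) for e in (0, 4, 5, 150, 299)])
```

Splitting by scan before slicing, as the text describes, keeps 2D slices from the same specimen out of both the training and validation sets.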
In the inference phase, we only applied padding to an input image when its size was not divisible by the downsampling rate of the model. All experiments were conducted with Python 3.11.5 and PyTorch 1.7.1 on a server with 2 NVIDIA A100 GPUs.
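The inference-time padding can be sketched with `torch.nn.functional.pad`. The sketch assumes a total downsampling factor of 16 (the deepest encoder feature maps are 1/16 of the input size), zero padding on the bottom and right edges, and cropping the prediction back to the original size afterwards; the helper names are hypothetical.

```python
import torch
import torch.nn.functional as F

def pad_to_multiple(x, multiple=16):
    """Zero-pad the H and W of a (B, C, H, W) tensor up to a multiple; return the padding used."""
    h, w = x.shape[-2:]
    pad_h = (multiple - h % multiple) % multiple
    pad_w = (multiple - w % multiple) % multiple
    # F.pad pads the last dimension first: (left, right, top, bottom).
    return F.pad(x, (0, pad_w, 0, pad_h)), (pad_h, pad_w)

def crop_back(y, pads):
    """Remove the bottom/right padding added by pad_to_multiple."""
    pad_h, pad_w = pads
    h, w = y.shape[-2:]
    return y[..., : h - pad_h, : w - pad_w]

x = torch.randn(1, 4, 250, 301)
xp, pads = pad_to_multiple(x)
print(xp.shape, pads)
```

Images whose sizes are already divisible by 16 pass through unchanged, which matches the text's note that padding is applied only when needed.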

Baselines and Evaluation Metrics
We compared our approach with six segmentation baselines: U-Net [21], U2Net [58], DeepLabv3+ [59], Attention U-Net [60], NNU-Net [61] and IVD-Net [62]. U-Net is a convolutional neural network originally developed for biomedical image segmentation. Its architecture combines a contracting path that captures context with a symmetric expanding path for precise localization, making it particularly effective for medical imaging tasks. U2Net uses a nested U-structure that enhances the learning of local and global contextual information. Attention U-Net extends the traditional U-Net architecture with attention gates that help the network focus on specific areas of interest. DeepLabv3+ builds upon the previous DeepLab frameworks by introducing an encoder-decoder structure that uses atrous convolutions to efficiently capture multi-scale contextual information, together with an improved Atrous Spatial Pyramid Pooling module, which allows it to segment objects at multiple scales with sharper boundaries. IVD-Net is a variant of U-Net for multi-contrast MRI segmentation. We adopted three metrics to assess segmentation performance: mean intersection over union (mIoU), the Dice similarity coefficient (DSC), and the Hausdorff distance (HD). mIoU and DSC are overlap-based metrics that range from 0 to 1, with larger values indicating better performance. HD is a shape distance-based metric that measures the dissimilarity between the surfaces/boundaries of the segmentation result and the ground truth; lower HD values indicate better segmentation.
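For reference, the three metrics can be computed on binary masks as below, with DSC $= 2|P \cap G| / (|P| + |G|)$ and IoU $= |P \cap G| / |P \cup G|$; mIoU is then the mean IoU over the evaluated samples or classes (the text does not specify which). The Hausdorff distance here is the symmetric version over all foreground pixel coordinates, scaled by an assumed isotropic pixel spacing (e.g., 0.25 mm for these data). Boundary-only and percentile (HD95) variants are also common, and the brute-force pairwise distance is only practical for small masks. The function names are hypothetical.

```python
import numpy as np

def dice(pred, gt):
    """Dice similarity coefficient of two binary masks (1.0 if both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(pred, gt).sum() / denom

def iou(pred, gt):
    """Intersection over union of two binary masks (1.0 if both are empty)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    return 1.0 if union == 0 else np.logical_and(pred, gt).sum() / union

def hausdorff(pred, gt, spacing=1.0):
    """Symmetric Hausdorff distance between foreground pixels; both masks must be non-empty."""
    p = np.argwhere(pred) * spacing
    g = np.argwhere(gt) * spacing
    d = np.sqrt(((p[:, None, :] - g[None, :, :]) ** 2).sum(axis=-1))  # all pairwise distances
    return max(d.min(axis=1).max(), d.min(axis=0).max())

a = np.zeros((6, 6), dtype=np.uint8); a[1:4, 1:4] = 1
b = np.zeros((6, 6), dtype=np.uint8); b[2:5, 1:4] = 1   # same square shifted by one row
print(dice(a, b), iou(a, b), hausdorff(a, b, spacing=0.25))
```

In this toy example a one-pixel shift of a 3×3 square leaves DSC = 2/3 and IoU = 0.5 while HD equals exactly one pixel spacing, illustrating why overlap and distance metrics are reported together.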

Comparative Experimental Results
Table 1 reports the performance of our framework and six competing baselines, U-Net [21], DeepLabv3+ [59], U2Net [58], Attention U-Net [60], NNU-Net [61], and IVD-Net [62], on our multi-contrast diffusion MRI data for hippocampus segmentation. Since several baselines (U-Net, DeepLabv3+, U2Net, Attention U-Net and NNU-Net) are designed for single-contrast input, we trained them on a pooled dataset containing image slices of all four contrasts. Our framework substantially and consistently outperforms all competing methods in terms of DSC and mIoU, indicating its advantage for hippocampus segmentation. In particular, compared with the multi-contrast baseline IVD-Net, our model achieves better segmentation results, which suggests that cross-attention is effective for hippocampus segmentation based on multi-contrast diffusion MRI images. Figure 3 visualizes qualitative results for three multi-contrast image slices with their ground-truth segmentations, together with the predictions of our method and two classic baselines (NNU-Net and U-Net).

Analytical Experimental Results
We conducted four sets of analytical experiments. Firstly, we compare hippocampus segmentation results obtained from single-contrast and multi-contrast MRI scans to validate the importance of multi-contrast representations. Secondly, we compare the segmentation results of our framework with and without cross-attention to assess the contribution of the attention mechanism. Thirdly, we report the segmentation results of the four single diffusion MRI contrasts (FA, MD, AD, RD) to compare the performance provided by each contrast. Finally, we perform a grid search on the loss weight parameter λ.

Comparisons: Single-Contrast vs. Multi-Contrast
In this experiment, we adopt the U-Net architecture as the segmentation backbone, using fractional anisotropy (FA), mean diffusivity (MD), axial diffusivity (AD), and radial diffusivity (RD) as individual input contrasts. Because U-Net has no attention mechanism, we deploy our proposed framework without attention mechanisms for this comparison, using the four contrasts as multi-contrast input. Comparing these two settings shows clear advantages of multi-contrast over single-contrast data: Table 2 reports notable improvements in hippocampus segmentation when multiple contrasts are integrated. This improvement indicates that the contrasts provide complementary information that collectively enriches the feature representation and improves segmentation accuracy. More generally, multi-contrast imaging holds significant promise in neuro-imaging research and clinical applications [63][64][65]. By leveraging the diverse information captured by different contrasts, we can gain a more comprehensive understanding of the underlying tissue microstructure and pathology [34].

Segmentation with Attention Mechanisms
To evaluate the contribution of attention mechanisms, we compare the conventional U-Net with a modified version incorporating self-attention (U-Net + Self-attention), in which self-attention is deployed in the third and fourth layers of the U-Net encoder. We also examine attention in the multi-contrast setting by comparing the segmentation results of our proposed framework with and without cross-attention. The results of these comparisons are shown in Figure 4a. Across all comparisons, segmentation backbones equipped with attention mechanisms consistently outperform those without, which indicates that attention improves segmentation accuracy and robustness in this task [36]. These findings suggest that attention mechanisms can be a powerful tool for improving segmentation algorithms in medical image analysis.

Comparisons of Single Contrast of Diffusion MRI
We select three baselines, U-Net, DeepLabv3+ and Attention U-Net, to compare the segmentation results obtained from each of the four diffusion MRI contrasts (Figure 5). FA images consistently yield better hippocampus segmentation than the other contrasts (MD, AD, and RD). However, the relative ranking of MD, AD, and RD cannot be conclusively determined from our findings, because their performance varies with the deep learning framework employed. The relative efficacy of MD, AD, and RD therefore warrants further investigation across different methods to establish their respective strengths and limitations for hippocampus segmentation.

Parameter Analysis
A grid search was performed to determine the optimal value of the loss weight λ over the search space {0, 0.2, 0.4, 0.6, 0.8, 1.0}. Figure 4b shows that the best loss weight is 0.4 and that the segmentation performance of our framework remains stable across different loss weights.

Conclusions
This paper presents a novel multimodal deep learning framework with cross-attention mechanisms tailored to hippocampus segmentation. Using mesoscale diffusion MRI data, we compute a set of contrast images for segmenting the human hippocampus from surrounding brain structures. Our experimental evaluation demonstrates the efficacy of the framework for this segmentation task. Moreover, our findings provide compelling evidence that multi-contrast images outperform their single-contrast counterparts in diffusion MRI segmentation. Our results also highlight the advantages of the fractional anisotropy (FA) contrast of dMRI: FA consistently performs better than the other contrasts (MD, AD and RD) for hippocampus segmentation. These findings may inform future developments in diffusion MRI analysis for hippocampus segmentation.

Figure 1. Diagram of the proposed multi-contrast segmentation framework with the gated cross-attention unit.

Figure 2. The computation within the gated cross-attention unit.

Figure 3. Visualization of hippocampus segmentation results on multi-contrast MRI images produced by our model, NNU-Net and U-Net. GT denotes ground-truth annotations.

Figure 4. (a) Result comparisons with and without attention mechanisms. U-Net and U-Net with self-attention are compared on each of four data contrasts (FA, MD, AD, and RD); our proposed method is compared with its variant without cross-attention on the multi-contrast data. wo/Attention and w/Attention denote without and with attention mechanisms, respectively. (b) Loss weight analysis.

Figure 5. Result comparisons: four different contrasts deployed on U-Net, DeepLabv3+ and Attention U-Net.

Table 1. Quantitative results of different methods on the hippocampus dataset. The best results are shown in bold. DSC and mIoU are given in percent; HD is given in mm.

Table 2. Hippocampus segmentation results based on single-contrast and multi-contrast diffusion MRI. wo/Attention denotes without the attention mechanism. The best results are shown in bold.