Cascaded Dual Domain Hybrid Attention Network

Cai, Yujia; Dong, Qingyu; Qiu, Cheng; Wang, Lubin; Yu, Qiang

doi:10.3390/sym17071020

by Yujia Cai ¹, Qingyu Dong ¹, Cheng Qiu ², Lubin Wang ¹,* and Qiang Yu ³,*

¹ Guilin Institute of Information Technology, Guilin 541004, China
² Guangxi Zhuang Autonomous Region Institute of Metrology and Test, Nanning 530200, China
³ National Space Science Center, Chinese Academy of Sciences, Beijing 100190, China
* Authors to whom correspondence should be addressed.

Symmetry 2025, 17(7), 1020; https://doi.org/10.3390/sym17071020

Submission received: 15 April 2025 / Revised: 27 May 2025 / Accepted: 30 May 2025 / Published: 28 June 2025

(This article belongs to the Section Computer)

Abstract

High-quality reconstruction of magnetic resonance imaging (MRI) data from undersampled k-space remains a significant challenge in medical imaging. While the integration of compressed sensing and deep learning has notably improved the performance of MRI reconstruction, existing convolutional neural network-based methods are limited by their small receptive fields, which hinders the exploration of global image features. Meanwhile, Swin-Transformer-based approaches struggle with inter-window information interaction and global feature extraction and perform poorly when dealing with complex repetitive structures and similar texture features under undersampling conditions, resulting in suboptimal reconstruction quality. To address these issues, we propose a Symmetry-based Cascaded Dual-Domain Hybrid Attention Network (SCDDHAN). Leveraging the inherent symmetry of medical images, the network combines channel and self-attention to improve global context modeling and local detail restoration. The overlapping window self-attention module is designed with symmetry in mind to improve cross-window information interaction by overlapping adjacent windows and directly linking neighboring regions. This facilitates more accurate detail recovery. The concept of symmetry is deeply embedded in the network design, guiding the model to better capture regular patterns and balanced structures within MRI images. Experimental results demonstrate that under 5× and 10× undersampling conditions, SCDDHAN outperforms existing methods in artifact suppression, achieving more natural edge transitions, clearer complex textures and superior overall performance. This study highlights the potential of integrating symmetry concepts into hybrid attention modules for accelerating MRI reconstruction and offers an efficient, innovative solution for future research in this area.

Keywords:

deep learning; hybrid attention network; medical imaging; MRI reconstruction

1. Introduction

Magnetic resonance imaging (MRI), a non-invasive imaging technique based on the principle of magnetic resonance, is widely used in clinical diagnosis [1,2,3]. Compared with other imaging modalities, MRI offers unique advantages such as the absence of ionizing radiation, high soft-tissue contrast, and multidirectional imaging, and it holds an irreplaceable position in neuroimaging and oncology [4,5]. However, long MRI scanning times cause patient discomfort and involuntary movement, both of which reduce image quality and raise the cost of the examination, making MRI difficult to use in emergency medicine and rapid screening. Shortening the scanning time while preserving image clarity is therefore the main challenge for MRI technology. Early acceleration methods relied mainly on hardware improvements [6,7,8], such as parallel imaging combined with fast sequences; these shorten acquisition time, but the equipment is expensive and the benefit diminishes at high acceleration factors, which limits their adoption. Compressed sensing (CS) exploits the sparsity of images to accelerate imaging and has been widely used in MRI reconstruction [9,10], usually by sparsifying the data in a transform domain such as the wavelet, total variation, or cosine transform domain. However, CS is computationally intensive, and its removal of ringing artifacts remains unsatisfactory at high acceleration rates [11,12,13,14,15].

In contrast to traditional methods, deep learning dramatically streamlines MRI reconstruction by learning features end-to-end rather than relying on hand-crafted regularization. It can process large datasets, reduce computational overhead, and better suppress artifacts and detail loss at high acceleration factors [16,17]. For example, Schlemper et al. proposed a cascaded reconstruction network with a data-consistency layer in each block, yielding more stable, high-quality results [18], while KIKI-Net fuses physical priors with deep learning to optimize image–k-space interactions, enhancing accuracy and robustness under rapid acquisition conditions [19].

While deep learning brings these gains, Transformer-based architectures, celebrated in NLP for their global dependency modeling, have also been adapted to vision tasks [20,21]. Unfortunately, the self-attention of the original vision Transformer has quadratic complexity with respect to image size, making it impractical for high-resolution MRI reconstruction. Swin-Transformer addresses this by computing attention within local windows, which gives linear computational scaling, and by introducing shifted windows and relative position encoding to capture both local and global features efficiently [22]. Restormer further combines multi-head attention with an efficient feed-forward network to model long-range pixel interactions in large images [23]. Subsequent models have achieved notable gains in natural image and MRI restoration: SwinIR uses residual Swin blocks and multi-scale feature aggregation [24]; SwinMR adds a combinatorial loss for improved texture retention [25]; SwinGAN uses adversarial training for artifact suppression [26]; HAT, a super-resolution model built on a hybrid attention mechanism, enhances the information exchange between windows [27]; and CDSCU-Net applies multi-scale pixel-shuffle Swin learning for high-resolution detail recovery [28]. However, these designs still rely on stacking Swin Transformer blocks (STBs) and thus do not resolve the information-blocking issue between adjacent windows.
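The scaling argument can be made concrete with the complexity estimates reported for the Swin Transformer [22]: for a feature map of h × w tokens with C channels, global multi-head self-attention costs roughly 4hwC² + 2(hw)²C operations, whereas attention restricted to M × M windows costs roughly 4hwC² + 2M²hwC. The sketch below simply evaluates both expressions. The grid size, channel width, and window size are illustrative assumptions chosen to resemble sizes mentioned later in this paper; they are not measurements of any model.

```python
def msa_cost(h: int, w: int, c: int) -> int:
    """Approximate operation count of global self-attention over h*w tokens."""
    tokens = h * w
    return 4 * tokens * c**2 + 2 * tokens**2 * c


def window_msa_cost(h: int, w: int, c: int, m: int) -> int:
    """Approximate operation count of self-attention inside m x m windows."""
    tokens = h * w
    return 4 * tokens * c**2 + 2 * m**2 * tokens * c


if __name__ == "__main__":
    h = w = 128  # illustrative token grid (assumption)
    c = 128      # illustrative channel width (assumption)
    m = 16       # illustrative window size (assumption)
    full = msa_cost(h, w, c)
    windowed = window_msa_cost(h, w, c, m)
    print(f"global MSA : {full:.3e}")
    print(f"window MSA : {windowed:.3e}")
    print(f"ratio      : {full / windowed:.1f}x")
```

Doubling the number of tokens doubles the windowed cost, while the second term of the global cost grows fourfold, which is the quadratic-versus-linear behavior described above.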

To address these limitations, we propose a cascaded dual-domain hybrid attention network (CDDHAN). CDDHAN fuses spatial- and frequency-domain information through a cascade structure, improving the expressive power of the network. Its dual-domain hybrid attention mechanism captures image details in local regions and refines global information through cross-domain feature fusion, which improves the accuracy and stability of image reconstruction. In addition, the model uses the originally acquired k-space data for data consistency (DC) correction, mines frequency-domain features through the hybrid attention mechanism, and converts the data back to the spatial domain to achieve high-quality image reconstruction. The sub-models are cascaded, so information is transmitted and fused between the spatial and frequency domains stage by stage, which avoids the local optima that a single-domain model may fall into. The experimental results show that CDDHAN has significant performance advantages over traditional MRI reconstruction methods and classical deep learning methods at high acceleration factors.

Overall, our main contributions are four-fold:

A novel cascaded hybrid attention network for fast MRI reconstruction is proposed, as shown in Figure 1a.
An efficient overlapping attention module is proposed that addresses the information-blocking problem of existing STBs.
A new hybrid attention module is proposed that distinguishes between different textures and artifacts, allowing image features to be learned more effectively.
Comparative experiments with different undersampling trajectories are conducted to verify the effectiveness of the proposed CDDHAN.

Section 1 has reviewed the current state of research on classical MR image reconstruction algorithms; Section 2 describes our proposed model architecture and its components; Section 3 presents the design of the comparison and ablation experiments, the related parameter settings, and the results; Section 4 discusses the experimental results; and Section 5 summarizes the paper and outlines directions for future work.

2. Methods

In Section 2, we first introduce the overall network architecture (Section 2.1), then detail the subnetwork structures (Section 2.2), describe the hybrid attention modules within each subnetwork (Section 2.3), explain the overlapping window attention mechanism (Section 2.4), and, finally, present the data consistency block (Section 2.5).

2.1. Network Structure of CDDHAN

The CDDHAN model proposed in this paper consists of a spatial-domain residual hybrid attention subnetwork (RHAN-I), a frequency-domain residual hybrid attention subnetwork (RHAN-K), and a DC module; the overall architecture is shown in Figure 1a. RHAN-I and RHAN-K share the same RHAN structure, which comprises four convolutional layers, three hybrid Swin-Transformer attention blocks (HSTBs), and one efficient overlapping attention block (EOAB), as shown in Figure 1b. Each RHAN is built upon a residual-in-residual design, which facilitates multi-scale feature extraction and enhances the model’s generalization. By alternating RHAN modules between the spatial and frequency domains and enforcing physical constraints through the DC module, the network iteratively refines and fuses feature representations to yield high-fidelity MRI reconstructions.
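The alternation between domains can be sketched on a toy 1-D signal. Everything concrete below is an assumption made for illustration: the stage order (image-domain subnetwork, DC, k-space subnetwork, DC), the number of stages, the naive 1-D DFT standing in for the 2-D Fourier transform, the hard replacement rule used for DC, and the `image_net` / `kspace_net` callables that stand in for RHAN-I and RHAN-K. The exact wiring of CDDHAN is the one given in Figure 1a.

```python
import cmath
from typing import Callable, List, Sequence

Signal = List[complex]


def dft(x: Sequence[complex], inverse: bool = False) -> Signal:
    """Naive orthonormal 1-D DFT; adequate for a toy example only."""
    n = len(x)
    sign = 1 if inverse else -1
    scale = n ** -0.5
    return [
        scale * sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def hard_dc(k_pred: Signal, k_meas: Signal, mask: Sequence[bool]) -> Signal:
    """Keep measured samples where they exist, predictions elsewhere."""
    return [y if sampled else p for p, y, sampled in zip(k_pred, k_meas, mask)]


def cascade(
    k_meas: Signal,
    mask: Sequence[bool],
    image_net: Callable[[Signal], Signal],   # hypothetical stand-in for RHAN-I
    kspace_net: Callable[[Signal], Signal],  # hypothetical stand-in for RHAN-K
    n_stages: int = 2,                       # assumed, not taken from the paper
) -> Signal:
    image = dft(k_meas, inverse=True)        # zero-filled starting point
    for _ in range(n_stages):
        image = image_net(image)                        # refine in the spatial domain
        k = hard_dc(dft(image), k_meas, mask)           # enforce measured samples
        k = kspace_net(k)                               # refine in the frequency domain
        k = hard_dc(k, k_meas, mask)                    # enforce measured samples again
        image = dft(k, inverse=True)                    # back to the spatial domain
    return image
```

Because each stage ends with a DC step followed by the inverse transform, the sampled k-space entries of the output always equal the measurements, whatever the stand-in subnetworks do.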

2.2. Residual Hybrid Attention Network (RHAN)

As illustrated in Figure 2, the RHAN comprises shallow feature extraction, deep feature extraction, and feature fusion modules, with its convolutional layer stack configured as four 3 × 3 layers (stride = 1, padding = 1), each followed by instance normalization and a Leaky ReLU activation (α = 0.2). The first layer extracts shallow features by mapping low-quality input images ($I_{MR} \in \mathbb{R}^{H \times W \times C_{in}}$) to high-dimensional pixel embeddings with 64 output channels, while intermediate layers maintain 128 channels to ensure feature dimensional consistency; the final layer adjusts feature dimensions to 128 for seamless integration with subsequent modules. This hierarchical design, which combines shallow feature extraction, deep contextual refinement via HSTBs and EOAB, and cross-level fusion, enables efficient integration of low-level structural details and high-level semantic features for robust representation learning. Shallow feature acquisition is expressed as:

F_{0} = H_{conv1} (I_{MR})

(1)

where $H_{conv1}(\cdot)$ is a 3 × 3 convolutional layer. Extracting features from the low-quality image at this stage helps the model learn useful information. To obtain richer image features, deep features are extracted using the following function:

H_{D} (\cdot) = {Conv}_{3 \times 3} (EOAB (\underbrace{HSTB (\cdots HSTB (\cdot) \cdots)}_{N - 1 \text{ times}}))

(2)

where $H_{D}(\cdot)$ is the deep feature extraction module, consisting of $N - 1$ hybrid attention modules, an overlapping attention module, and a 3 × 3 convolutional layer. The intermediate features $F_{1}, F_{2}, \dots, F_{N-1}$ and the output deep features $F_{N}$ are extracted block by block as follows:

F_{s} = H_{HSTB_{s}} (F_{s - 1}), \quad s = 1, 2, \dots, N - 1

(3)

F_{N} = H_{EOAB} (F_{N - 1})

(4)

F_{D} = H_{Conv2} (F_{N})

(5)

where $H_{HSTB_{s}}(\cdot)$ denotes the $s$th hybrid attention module (HSTB) and $H_{EOAB}(\cdot)$ denotes the overlapping attention module (EOAB). At the end of the feature extraction phase, we introduce a convolutional layer, $H_{Conv2}(\cdot)$, to integrate the deep feature information by exploiting the inductive bias of the convolutional operation [29]. Additionally, we introduce a global residual connection, which fuses shallow features with deep high-frequency features, enabling the network to focus on learning deep incremental representations rather than re-encoding low-level information, thus further enhancing its expressive capacity.

Feature fusion:

We use two 3 × 3 convolutional layers to fuse shallow features with deep incremental features to improve the expressiveness of the model. As shown in Figure 2, the fusion results are expressed as follows:

I_{HQ} = H_{Ite} (F_{0} + F_{D})

(6)

where $H_{Ite}(\cdot)$ denotes the feature fusion module and $I_{HQ}$ denotes the resulting high-quality output.
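The data flow of Eqs. (1) to (6) amounts to a chain of function applications with one global residual connection. The following minimal sketch shows only that wiring; the function name `rhan_forward` and all of the callables passed to it are hypothetical placeholders rather than the authors' layers, and plain numbers stand in for feature maps.

```python
from typing import Callable, Sequence

Block = Callable[[float], float]


def rhan_forward(
    i_mr: float,
    conv1: Block,
    hstbs: Sequence[Block],
    eoab: Block,
    conv2: Block,
    fuse: Block,
) -> float:
    """Wire the blocks as in Eqs. (1)-(6); any object supporting `+` would work."""
    f0 = conv1(i_mr)         # Eq. (1): shallow features F_0
    f = f0
    for hstb in hstbs:       # Eq. (3): F_s = H_HSTB_s(F_{s-1})
        f = hstb(f)
    f_n = eoab(f)            # Eq. (4): overlapping attention block
    f_d = conv2(f_n)         # Eq. (5): deep features F_D
    return fuse(f0 + f_d)    # Eq. (6): global residual, then fusion


if __name__ == "__main__":
    # Toy stand-ins so the data flow can be traced by hand.
    result = rhan_forward(
        1.0,
        conv1=lambda v: v * 2,
        hstbs=[lambda v: v + 1] * 3,
        eoab=lambda v: v * 3,
        conv2=lambda v: v - 1,
        fuse=lambda v: v,
    )
    print(result)
```

With the toy stand-ins, F_0 = 2, the three HSTB placeholders give 5, the EOAB placeholder gives 15, F_D = 14, and the residual sum passed to the fusion step is 16.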

2.3. Hybrid Swin-Transformer Attention Block (HSTB)

The HSTB consists of four Swin channel attention blocks (SCABs) and one convolutional layer; its structure is shown in Figure 3a. Each SCAB runs a channel attention block (CAB) in parallel with window-based or shifted-window multi-head self-attention, (S)W-MSA. Enlarging the attention window and introducing the channel attention mechanism improves the model’s ability to capture both the local details and the global semantics of the image, so each SCAB captures more contextual information, which in turn improves the reconstruction of image details and overall structure.

A CAB consists of a channel-compressed block (CCB) and pooling branches, as shown in Figure 3b. The channel attention mechanism computes channel weights by integrating global information, which, we believe, helps address the overlap and confusion of similar textures and local features in medical images, especially in complex backgrounds caused by undersampling, where distinguishing between textures and artifacts becomes challenging. Channel attention enhances image clarity and realism by processing similar regions differently, thereby recovering details accurately. This approach has also been validated in the field of computer vision [30,31]. However, Transformer-based structures typically require numerous channels for token embedding, and the direct use of constant-width convolution may lead to high computational costs. Therefore, we employ the CCB to compress the number of channels in the convolutional layer, which reduces computational overhead, and adaptively combine global and local features to further enhance channel information capture. The pooling branch consists of two components: global average pooling (GAP), which captures the global feature distribution by extracting overall statistical information, and global maximum pooling (GMP), which emphasizes the most significant local features to highlight important image details. The combination of GMP and GAP enables channel attention to consider the global context while preserving locally salient features, thereby assigning appropriate weights to different channels. Given an input feature map $x \in \mathbb{R}^{H \times W \times C}$, the computational flow of SCAB is expressed as follows:

X_{n} = LN (x)

(7)

X_{c} = GAP (X_{n}) + GMP (X_{n})

(8)

X_{M} = (S)W\text{-}MSA (X_{n}) + \alpha X_{c} + x

(9)

Y = MLP (LN (X_{M})) + X_{M}

(10)

where $X_{n}$, $X_{c}$, and $X_{M}$ denote the intermediate features, $Y$ denotes the output of the SCAB, $LN(\cdot)$ denotes the layer normalization operation, MLP denotes the multilayer perceptron, and $\alpha$ is a scalar weight on the channel-attention term (set to 0.01 in the experiments; see Section 3.1). (S)W-MSA denotes the standard or shifted window-based multi-head self-attention module. Given an input feature of size H × W × C, it is first divided into $\frac{HW}{M^{2}}$ local windows of size $M \times M$, and self-attention is then computed within each of these windows. For a local window feature $X_{W} \in \mathbb{R}^{M^{2} \times C}$, the query, key, and value matrices Q, K, and V are obtained by linear projection. The window-based self-attention is then formulated as follows:

Attention (Q, K, V) = SoftMax (\frac{Q K^{T}}{\sqrt{d}} + B) V

(11)

where $d$ denotes the dimension of the query/key and $B$ denotes the relative position encoding. To establish connections between neighboring non-overlapping windows, we use shifted-window partitioning and set the shift size to half of the window size.
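The partitioning into HW/M² windows and the attention of Eq. (11) can be traced on a small example. The sketch below is a dependency-free illustration under simplifying assumptions: a single attention head, H and W divisible by M, a cyclic shift to emulate shifted-window partitioning, and no masking of wrapped-around pixels. The function names are hypothetical and do not come from the paper.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def window_partition(h: int, w: int, m: int, shift: int = 0) -> List[List[Tuple[int, int]]]:
    """Pixel coordinates of each m x m window; shift > 0 applies a cyclic shift."""
    assert h % m == 0 and w % m == 0, "toy example assumes divisible sizes"
    windows = []
    for top in range(0, h, m):
        for left in range(0, w, m):
            windows.append([((top + r + shift) % h, (left + c + shift) % w)
                            for r in range(m) for c in range(m)])
    return windows


def softmax(row: List[float]) -> List[float]:
    peak = max(row)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def window_attention(q: Matrix, k: Matrix, v: Matrix, bias: Matrix) -> Matrix:
    """SoftMax(Q K^T / sqrt(d) + B) V for the tokens of one window."""
    d = len(q[0])
    out = []
    for i, q_i in enumerate(q):
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d) + bias[i][j]
                  for j, k_j in enumerate(k)]
        weights = softmax(scores)
        out.append([sum(wt * v_j[c] for wt, v_j in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


if __name__ == "__main__":
    m = 4
    regular = window_partition(8, 8, m)
    shifted = window_partition(8, 8, m, shift=m // 2)  # shift = half the window size
    print(len(regular), len(shifted))                  # both equal H*W / M^2
```

For an 8 × 8 map with M = 4 both partitions contain 8·8/4² = 4 windows of 16 pixels each; the shifted variant groups pixels that belonged to different windows in the regular partition.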

2.4. Efficient Overlapping Attention Block (EOAB)

The OCAB of HAT can overcome the information blocking present in existing STB computations; however, because its overlapping attention uses a larger overlapping window, it significantly increases computation. We therefore propose the EOAB, which, unlike the HAT design, consists of an efficient overlapping attention (EOA) layer and an MLP (shown in Figure 4). In the EOA, a convolution first reduces the number of channels of the input feature maps through a linear transformation, making the computation more efficient, and a GELU activation introduces nonlinearity that helps the network learn complex patterns and enhances its expressive ability. The EOA also differs from the STB in how windows are partitioned, as shown in Figure 5: it uses overlapping windows for feature projection. Whereas the standard partition uses a window size of M and a stride of M, our partition uses a window size of $M + \frac{M}{2}$ with a stride of M. Zero padding is used to keep the overlapping windows consistent in size. The formula for overlapping window attention is as follows:

Attention (Q_{1}, K_{1}, V_{1}) = SoftMax (\frac{Q_{1} {K_{1}}^{T}}{\sqrt{d}} + B) V_{1}

(12)

where $Q_{1}$, $K_{1}$, and $V_{1}$ are the query, key, and value matrices of the overlapping window attention and $B \in \mathbb{R}^{M \times (M + \frac{M}{2})}$ is the relative position bias. Our EOAB is thus fundamentally different from existing STBs. A standard STB operates within a single window at each computation, which restricts inter-window interactions and causes information blocking. Specifically, STBs derive their queries, keys, and values from features within the same window, which limits their ability to capture global information. By contrast, the EOAB achieves interaction between different windows through its overlapping window design, which alleviates information blocking and lets the model integrate more useful information across windows, improving the accuracy and comprehensiveness of each query. In addition, the EOAB uses spectral or pixel tokens to compute the cross-attention within each window feature, which further enhances the model’s ability to perceive detailed features.
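The overlapping partition can be illustrated by listing which pixels each enlarged window covers. This is a sketch under stated assumptions: the zero padding is taken to be symmetric, (M/2)/2 pixels per side, which the text above does not specify, and M is assumed divisible by 4 so that the padding is an integer. The function name `overlapping_windows` is hypothetical; padded positions are reported as `None`.

```python
from typing import List, Optional, Tuple

Coord = Optional[Tuple[int, int]]


def overlapping_windows(h: int, w: int, m: int) -> List[List[Coord]]:
    """Pixels covered by each window of size m + m/2 placed with stride m."""
    assert h % m == 0 and w % m == 0 and m % 4 == 0, "toy example assumptions"
    size = m + m // 2            # enlarged window size M + M/2
    pad = (size - m) // 2        # assumed symmetric zero padding per side
    windows = []
    for top in range(0, h, m):   # stride M, so the window count stays H*W / M^2
        for left in range(0, w, m):
            win: List[Coord] = []
            for r in range(top - pad, top - pad + size):
                for c in range(left - pad, left - pad + size):
                    inside = 0 <= r < h and 0 <= c < w
                    win.append((r, c) if inside else None)  # None marks zero padding
            windows.append(win)
    return windows


if __name__ == "__main__":
    wins = overlapping_windows(8, 8, 4)
    print(len(wins), len(wins[0]))   # number of windows, pixels per window
    print((0, 4) in wins[0])         # window 0 reaches into its right neighbor
```

With M = 4 on an 8 × 8 map there are still four windows, but each spans 6 × 6 positions, so the first window includes column 4, which belongs to its right-hand neighbor in the non-overlapping partition.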

2.5. Data Consistency Module

The data consistency module ensures network stability and consistency under different data distributions or sampling patterns, thereby enabling collaborative learning across domains, which is crucial for cascaded networks. In this study, a DC module is introduced after each RHAN subnetwork. It takes two forms, image data consistency (IDC) and k-space data consistency (KDC), which are applied in the respective domains of the RHAN after the Fourier transform and inverse Fourier transform, as shown in Figure 6. The constraint is formulated as follows:

s_{rec} (k) = \begin{cases} \frac{y (k) + \lambda \, output_{RHAN} (k)}{1 + \lambda}, & k \in \Omega \\ output_{RHAN} (k), & k \notin \Omega \end{cases}

(13)

where $s_{rec}(k)$ denotes the k-space data at location $k$ after the data constraint is applied, $y$ denotes the originally acquired (undersampled) k-space measurements, $output_{RHAN}$ denotes the output of the preceding RHAN subnetwork expressed in k-space, $\lambda$ is a coefficient that weights the network output against the measurement, and Ω denotes the index set of the sampled k-space locations. The data consistency block ensures that the image reconstructed by the network is consistent with the original measurement data and improves the reconstruction quality by applying the constraint at the sampled k-space locations.
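Eq. (13) is an element-wise rule over k-space locations, so it can be written in a few lines. The sketch below applies it to flat lists of complex samples; the function name `data_consistency` and its argument names are hypothetical, and a boolean list stands in for the index set Ω.

```python
from typing import List, Sequence


def data_consistency(
    pred: Sequence[complex],      # network output in k-space
    measured: Sequence[complex],  # acquired k-space samples y
    sampled: Sequence[bool],      # True where the location belongs to Omega
    lam: float,                   # weighting coefficient lambda >= 0
) -> List[complex]:
    """Apply Eq. (13) to every k-space location."""
    if lam < 0:
        raise ValueError("lambda must be non-negative")
    out = []
    for p, y, in_omega in zip(pred, measured, sampled):
        if in_omega:
            out.append((y + lam * p) / (1 + lam))  # blend measurement and prediction
        else:
            out.append(p)                          # no measurement: keep the prediction
    return out


if __name__ == "__main__":
    pred = [1 + 1j, 2 + 0j, 3 + 0j]
    measured = [0j, 4 + 0j, 0j]
    sampled = [False, True, False]
    print(data_consistency(pred, measured, sampled, lam=0.0))
    print(data_consistency(pred, measured, sampled, lam=1.0))
```

With λ = 0 the sampled location is replaced by the measurement exactly; larger λ moves it toward the network output, while unsampled locations are never changed.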

3. Experimental Details

In Section 3, we first describe the IXI dataset (source, composition, and preprocessing), then outline the eight retrained reconstruction methods, summarize the evaluation results under various acceleration factors, and finally present ablation studies isolating the contributions of the submodules: HSTB, EOAB, SCAB pooling, and DC components.

3.1. Dataset and Parameter Settings

Dataset: We conducted the experiments on the public IXI dataset [30]. We used T1-weighted scans from 25 IXI subjects (Hammersmith and Guy’s Hospitals) (see Table 1). After excluding incomplete volumes, the training set contained 2012 images (20 subjects) and the validation set contained 503 images (5 subjects). All images were resized to 128 × 128 pixels before input.

Acquisition and training details: Scanner and sequence settings are summarized in Table 2. Key training hyperparameters (learning rate = 1 × 10⁻⁴, epochs = 200, CAB window = 16 with α = 0.01, overlap = 0.5, batch size = 4) and hardware (PyTorch 2.1.0 on NVIDIA A100) appear in Table 3.
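For reference, the hyperparameters listed above can be collected in a small configuration object. The class and field names below are hypothetical, and two readings are assumptions: "window = 16" is taken as the attention window size M, and α = 0.01 as the channel-attention weight of Eq. (9). The derived window size follows the M + M/2 rule of Section 2.4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Values quoted in Section 3.1; the field names are illustrative only."""
    learning_rate: float = 1e-4
    epochs: int = 200
    batch_size: int = 4
    image_size: int = 128                   # images are resized to 128 x 128
    window_size: int = 16                   # assumed to be the window size M
    channel_attention_weight: float = 0.01  # assumed to be alpha in Eq. (9)
    overlap_ratio: float = 0.5

    @property
    def overlapping_window_size(self) -> int:
        """Enlarged window size M + overlap * M used by the overlapping attention."""
        return int(self.window_size * (1 + self.overlap_ratio))

    @property
    def windows_per_image(self) -> int:
        """Number of non-overlapping M x M windows, H*W / M^2."""
        return (self.image_size // self.window_size) ** 2


if __name__ == "__main__":
    cfg = TrainConfig()
    print(cfg.overlapping_window_size, cfg.windows_per_image)
```

Under these readings, a 16-pixel window with an overlap of 0.5 gives 24-pixel overlapping windows, and a 128 × 128 input is split into 64 windows.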

3.2. Comparative Methods

To comprehensively verify the effectiveness of the proposed CDDHAN method in image reconstruction tasks, eight representative reconstruction methods were selected for comparative analysis. These methods cover traditional compressed sensing techniques, classical deep learning models, and the latest Transformer-based models, including Restormer [23], SwinIR [24], SwinMR [25], HAT [27], CDSCU-Net [28], NAFNet [31], CS [32], and UNet [33]. To ensure a fair comparison, all comparison models were retrained on the same dataset using the hyperparameter settings and training details of their original papers. In addition, two commonly used evaluation metrics, the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM), were chosen to assess the reconstruction performance of the models. They are defined as follows:

PSNR = 10 \times \log_{10} (\frac{{MAX}^{2}}{\frac{1}{m n} \sum_{i = 1}^{m} \sum_{j = 1}^{n} {[x (i, j) - y (i, j)]}^{2}})

(14)

SSIM (x, y) = \frac{(2 μ_{x} μ_{y} + C_{1}) (2 σ_{x y} + C_{2})}{(μ_{x}^{2} + μ_{y}^{2} + C_{1}) (σ_{x}^{2} + σ_{y}^{2} + C_{2})}

(15)
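In Eq. (14), x and y are the reference and reconstructed images of size m × n and MAX is the largest possible pixel value; in Eq. (15), μ and σ² are the means and variances of the two images, σ_xy is their covariance, and C₁ and C₂ are small stabilizing constants. The sketch below evaluates both formulas on flattened pixel lists. Two simplifications are assumptions of this sketch and not statements about the evaluation protocol used here: the constants follow the common choice C₁ = (0.01·MAX)² and C₂ = (0.03·MAX)², and Eq. (15) is evaluated once over the whole image, whereas practical implementations usually average it over local windows.

```python
import math
from typing import Sequence


def psnr(x: Sequence[float], y: Sequence[float], max_val: float = 1.0) -> float:
    """Eq. (14) on flattened images; returns inf for identical inputs."""
    mse = sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)
    return math.inf if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


def ssim_global(
    x: Sequence[float],
    y: Sequence[float],
    max_val: float = 1.0,
    k1: float = 0.01,  # assumed default for C1 = (k1 * MAX)^2
    k2: float = 0.03,  # assumed default for C2 = (k2 * MAX)^2
) -> float:
    """Eq. (15) computed from whole-image statistics (no sliding window)."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * max_val) ** 2
    c2 = (k2 * max_val) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


if __name__ == "__main__":
    reference = [0.1, 0.5, 0.9, 0.3]
    reconstruction = [0.2, 0.5, 0.8, 0.3]
    print(psnr(reference, reconstruction))
    print(ssim_global(reference, reconstruction))
```

Identical inputs give an infinite PSNR and an SSIM of 1; any reconstruction error lowers both values.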

3.3. Results

Table 4 reports the performance of the different methods on the accelerated MRI reconstruction task, including the traditional CS method, the classical deep learning method (U-Net), and the Transformer-based methods. While CS and U-Net recover part of the image quality at low acceleration factors, their PSNR and SSIM decrease significantly at high acceleration factors, and they struggle to recover complex textures and high-frequency details. By contrast, the Transformer-based methods achieve better reconstruction performance by introducing an attention mechanism. Under radial 5× conditions, their reconstruction results closely match the ground truth; however, at 10× acceleration, these methods still struggle to accurately reconstruct the white region. This limitation likely stems from the inability of a single attention mechanism to distinguish between similar textures and artifacts. In contrast, CDDHAN achieves a significant lead in both PSNR and SSIM, which we attribute to the parallel use of channel attention and shifted-window attention and to the introduction of overlapping attention, which together enable fuller learning of image features. To compare the algorithms more intuitively, the PSNR and SSIM results are plotted in Figure 7, which shows the performance of each method under the different sampling settings (Cartesian 5×, Cartesian 10×, radial 5×, radial 10×); the left and right panels show the PSNR and SSIM values, respectively.

Figure 8 shows the reconstructions and error maps of the different methods on the IXI dataset. At the lower acceleration factor, all methods achieve good reconstruction results. However, at high acceleration factors, the zero-filled reconstruction suffers from severe aliasing artifacts, and even compressed sensing methods can no longer produce high-quality images. U-Net still exhibits obvious artifacts at the edges and in high-frequency regions. The Transformer-based models reduce the artifacts to some extent but still fall short in detail recovery. In this case, CDSCU-Net performs considerably better in terms of detail recovery and artifact elimination than HAT, Restormer, and NAFNet, and its overall image is very close to the fully sampled reference image. Although its reconstruction performance is close to that of CDDHAN, CDDHAN achieves superior accuracy in edge regions (e.g., gray-white matter junctions) and complex textures (e.g., intricate brain structures), yielding more natural edge transitions, complete preservation of fine details, and overall better reconstructions compared to other methods. These improvements are also evident in the error maps.

In summary, the proposed CDDHAN method performs well under both Cartesian and radial mask conditions, effectively suppresses artifacts, recovers image details, and significantly outperforms other existing methods in the reconstruction task.

3.4. Ablation Study

In this section, we evaluate the effectiveness of each component in the CDDHAN model under radial-5× conditions. This study analyzes the contributions of each proposed module and demonstrates how they improve the overall model performance.

We first evaluated the effectiveness of the network components, which include three sub-models (one of which contains the DC module), HSTB, and EOAB. The results are shown in Table 5, where “✓” indicates that the corresponding module is included and a blank indicates that it is not. As the number of sub-models increased, reconstruction quality improved markedly, demonstrating the effectiveness of cascaded learning. The experiments also show that the HSTB module significantly enhances model performance, primarily because the CAB both captures global channel statistics and highlights salient features, enabling the model to differentiate between similar regions during detail reconstruction. Moreover, introducing the EOAB module yields a noticeable performance gain while reducing the total number of parameters.

Effects of different pooling methods on SCAB: We experimentally verify the effects of different pooling methods on SCAB, as shown in Table 6. The results confirm that “GAP (Global Average Pooling)” is more suitable for the task than “GMP (Global Max Pooling)”. We also tested the combination of GAP and GMP (GAP + GMP) pooling and found that it outperforms the two separate methods by improving the PSNR by about 0.10–0.15 dB.
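The three pooling variants compared here differ only in the per-channel descriptor they compute. The toy sketch below shows the descriptors themselves and nothing about reconstruction quality; the function name `channel_descriptor` is hypothetical, and the channel compression and weighting steps that follow the pooling inside the CAB are omitted.

```python
from typing import List

Channel = List[List[float]]


def channel_descriptor(feature: List[Channel], mode: str = "gap+gmp") -> List[float]:
    """One scalar per channel: the mean (GAP), the maximum (GMP), or their sum."""
    descriptors = []
    for channel in feature:
        values = [v for row in channel for v in row]
        gap = sum(values) / len(values)   # global average pooling
        gmp = max(values)                 # global maximum pooling
        if mode == "gap":
            descriptors.append(gap)
        elif mode == "gmp":
            descriptors.append(gmp)
        elif mode == "gap+gmp":
            descriptors.append(gap + gmp)  # combination as in Eq. (8)
        else:
            raise ValueError(f"unknown pooling mode: {mode}")
    return descriptors


if __name__ == "__main__":
    flat = [[0.5, 0.5], [0.5, 0.5]]    # uniform channel
    peaked = [[0.0, 0.0], [0.0, 2.0]]  # same mean, one strong response
    for mode in ("gap", "gmp", "gap+gmp"):
        print(mode, channel_descriptor([flat, peaked], mode))
```

The two toy channels have the same mean, so GAP alone cannot tell them apart, whereas GMP and the combined descriptor can; this is one intuition for why combining the two statistics may help.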

Effects of DC: We also examined the impact of the DC module on overall network performance. As shown in Table 7, adding the DC module to the existing model resulted in a significant improvement in PSNR. By constraining the network output to be consistent with the sampled data during reconstruction, the data consistency module helps the model recover key features more accurately, thus effectively improving the reconstruction performance.

4. Discussion

In this paper, we propose CDDHAN for accelerated MRI reconstruction. By integrating a hybrid attention mechanism with dual-domain information fusion, CDDHAN addresses two key limitations of Swin-Transformer-based methods: limited information exchange caused by shifted-window attention and inadequate recovery of complex background textures under heavy undersampling. The experimental results in Figures 7 and 8 show that CDDHAN performs well in suppressing artifacts and recovering complex details, particularly in critical regions such as the brain sulci and the junction of gray and white matter. The reconstruction results are close to the fully sampled reference images, with PSNR and SSIM superior to those of the existing methods, which verifies its performance in accelerated MRI reconstruction. CDDHAN also has limitations. It is designed mainly for single-coil MRI data, whereas multi-coil systems are widely used in real clinical scenarios, which limits the direct application of the model. In addition, the scalability and computational efficiency of the model for high-dimensional 3D MRI data must be further optimized.

5. Conclusions

In this paper, we present CDDHAN, a hybrid-attention network for accelerated MRI reconstruction. CDDHAN leverages an alternating spatial- and frequency-domain attention mechanism to accurately capture complex texture features, and incorporates an overlapping-window self-attention module to enhance information exchange and aggregation across windows. Extensive undersampling experiments demonstrate that CDDHAN outperforms state-of-the-art methods in both PSNR and SSIM. Ablation studies further confirm the effectiveness of each component: the hybrid attention module alone yields a 0.36 dB PSNR gain; the addition of a data consistency module provides a further 0.17 dB improvement; and the overlapping-window mechanism, while reducing parameter count, contributes an additional 0.04 dB increase. Overall, these results show that the hybrid attention framework has great potential for accelerated MRI reconstruction. However, the direct applicability of CDDHAN to 3D or multi-coil MRI data still requires further validation, and more work is needed to optimize the balance between computational efficiency and robustness. Therefore, in future research, we will focus on multi-coil reconstruction tasks, fully exploiting the advantages of multi-coil data acquisition to further shorten scan times and improve image SNR. At the same time, we will optimize the model architecture to reduce computational costs and promote the real-time clinical deployment of CDDHAN.

Author Contributions

Conceptualization, Y.C.; methodology, Y.C. and Q.D.; software, Y.C. and Q.D.; validation, Y.C. and Q.Y.; formal analysis, Y.C., L.W., and C.Q.; investigation, Y.C. and Q.D.; resources, L.W. and Q.Y.; data curation, Q.D., L.W., and C.Q.; writing—original draft preparation, Y.C.; writing—review and editing, Y.C., Q.D., L.W., C.Q., and Q.Y.; funding acquisition, C.Q. and Q.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the Deep Learning-based MRI Image Reconstruction Algorithm Research Project of the Basic Ability Enhancement Program for Young and Middle-aged Teachers (Research) in Guangxi Universities (2024KY1733), the Scientific Research Project of Guilin Institute of Information Technology (XJ202310), the Talent Introduction and Research Start-up Fund of Guilin University of Information Technology (XJ2024014), the Innovative and Applied Curriculum Comprehensive Reform Program of Guilin College of Information Science and Technology (2024ZCRH05), the Research Project on Key Technology Research of Automatic Detection of Book Ladder Marks Based on Machine Vision under the Program of Enhancing the Scientific Research Basic Ability of Young and Middle-aged Teachers in Guangxi Universities (2025KY1072), the Guilin Major Special Project (20220103-1), the Guangxi Science and Technology Base and Talent Special Project (Gui Ke AD24010012), and the Guangxi Key Research and Development Plan (Gui Ke AB23026105).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Overview of CDDHAN. (a) Overall CDDHAN structure; (b) RHAN sub-model structure.

Figure 2. Residual hybrid attention network (RHAN).

Figure 3. Structure of the HSTB. (a) Hybrid Swin-Transformer attention block (HSTB); (b) displacement channel attention block (SCAB).

Figure 4. Efficient overlapping attention block (EOAB) structure.

Figure 5. OSTB window division.

Figure 6. Data consistency module.
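
As a rough illustration of what a data consistency step does, the sketch below applies the widely used hard (noise-free) replacement rule in k-space: wherever the sampling mask is 1 the measured coefficient is kept, and elsewhere the network's prediction is kept. Whether CDDHAN uses this exact rule or a noise-weighted variant is an assumption here, and the function name `data_consistency` and the toy 2 × 2 arrays are hypothetical.

```python
def data_consistency(k_pred, k_meas, mask):
    """Hard data consistency in k-space (assumed noise-free form).

    k_pred: predicted k-space (nested lists of complex numbers)
    k_meas: measured, undersampled k-space of the same shape
    mask:   1 where a k-space location was sampled, 0 elsewhere
    """
    return [
        [m * y + (1 - m) * p for p, y, m in zip(row_p, row_y, row_m)]
        for row_p, row_y, row_m in zip(k_pred, k_meas, mask)
    ]


# Toy example (hypothetical values): only the first column was acquired.
k_pred = [[1 + 1j, 2 + 0j], [3 - 1j, 4 + 2j]]
k_meas = [[5 + 0j, 0j], [6 + 1j, 0j]]
mask = [[1, 0], [1, 0]]

k_dc = data_consistency(k_pred, k_meas, mask)
```

In a cascaded network, a step of this kind would typically be applied after each sub-network so that the acquired samples are never overwritten by the prediction.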

Figure 7. Performance comparison of different methods. (a) PSNR; (b) SSIM.

Figure 8. Qualitative comparison of the different methods on the IXI dataset under different undersampling masks. (a) Cartesian mask, R = 5; (b) Cartesian mask, R = 10; (c) radial mask, R = 5; (d) radial mask, R = 10. The second column of each subfigure shows the corresponding error map, and the red boxes highlight regions where CDDHAN improves on the competing methods.

Table 1. Dataset composition.

Split	Subjects	Valid Images	Remarks
Training set	20	2012	Randomly selected from Hammersmith and Guy’s hospitals
Validation set	5	503	Randomly selected from Hammersmith and Guy’s hospitals
Total	25	2515	–

Table 2. Acquisition parameters.

Hospital	Field Strength and System	TR/TE (ms)	Remarks
Hammersmith Hospital	Philips 3 T	9.6/4.6	Used in training and validation
Guy’s Hospital	Philips 1.5 T	9.813/4.603	Used in training and validation
Institute of Psychiatry (London)	GE 1.5 T	–/–	Scan parameters unavailable

Table 3. Training parameters and hardware environment.

Parameter	Value	Unit	Notes
Learning rate	1 × 10⁻⁴	–	Base optimizer learning rate
Number of epochs	200	epochs	Full training cycles
Window size	16	pixels	Size of sliding window in CAB module
CAB module weight (α)	0.01	–	Controls attention branch contribution
Overlap rate	0.5	fraction	Fractional overlap between windows
Batch size	4	images	Images per GPU batch
Training time	40	h	Total time for 200 training epochs
Framework and hardware	PyTorch on NVIDIA A100	–	Training environment
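
To make the window size and overlap rate above more concrete, the following sketch lists the index ranges of overlapping windows along one image axis. It assumes a HAT-style layout, in which query windows stay non-overlapping and each key/value window is enlarged to (1 + overlap) × window; the EOAB may partition windows differently, so this is an interpretation rather than the authors' implementation. The function name `overlapping_windows` and the axis length of 64 are hypothetical.

```python
def overlapping_windows(length, window, overlap):
    """Return (start, end) index pairs of key/value windows along one axis.

    Assumed HAT-style layout: queries use non-overlapping windows of size
    `window`; each key/value window is enlarged to (1 + overlap) * window and
    centred on its query window. Indices outside [0, length) denote padding.
    """
    if length % window != 0:
        raise ValueError("length must be a multiple of window")
    kv_size = int((1 + overlap) * window)  # enlarged key/value window
    pad = (kv_size - window) // 2          # extra pixels on each side
    return [(s - pad, s - pad + kv_size) for s in range(0, length, window)]


# Table 3 values: window size 16, overlap rate 0.5; length 64 is hypothetical.
spans = overlapping_windows(length=64, window=16, overlap=0.5)
shared = spans[0][1] - spans[1][0]  # pixels shared by adjacent windows
```

Under these assumptions each key/value window spans 24 pixels, so neighbouring windows see part of each other's content directly instead of relying on a later shifted-window layer.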

Table 4. Quantitative results of different methods on the IXI dataset.

Method	Cartesian 5× PSNR (dB)	Cartesian 5× SSIM	Cartesian 10× PSNR (dB)	Cartesian 10× SSIM	Radial 5× PSNR (dB)	Radial 5× SSIM	Radial 10× PSNR (dB)	Radial 10× SSIM
Zero-Filling	25.64	0.43	27.56	0.31	27.65	0.56	26.57	0.41
CSMRI	29.54	0.73	28.36	0.62	32.38	0.91	30.17	0.82
U-Net	31.63	0.86	30.40	0.78	34.16	0.93	32.33	0.89
SwinIR	34.54	0.94	32.98	0.91	37.24	0.97	32.97	0.93
SwinMR	34.68	0.94	33.00	0.92	37.43	0.97	33.18	0.93
HAT	34.93	0.95	33.48	0.93	38.73	0.97	34.58	0.94
CDSCU-Net	35.32	0.95	33.16	0.93	38.15	0.97	35.15	0.96
Restormer	35.36	0.95	33.82	0.92	38.39	0.97	35.06	0.95
NafNet	35.48	0.96	34.04	0.94	38.01	0.97	35.63	0.96
CDDHAN	37.94	0.97	34.28	0.94	39.27	0.98	36.19	0.97
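
For readers who want to reproduce the PSNR column, a minimal sketch of the standard definition, PSNR = 10 log10(MAX² / MSE), is given below. The intensity range (`data_range=1.0`), the function name `psnr`, and the toy images are assumptions; the normalisation actually used in the experiments is not stated in this section.

```python
import math


def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized 2D images."""
    diffs = [
        (r - e) ** 2
        for row_r, row_e in zip(reference, estimate)
        for r, e in zip(row_r, row_e)
    ]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(data_range ** 2 / mse)


# Toy 2 x 2 images with intensities in [0, 1]; one pixel differs by 0.5.
reference = [[0.0, 1.0], [1.0, 0.0]]
estimate = [[0.0, 1.0], [1.0, 0.5]]
value = psnr(reference, estimate)  # MSE = 0.0625
```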

Table 5. Ablation study of the proposed modules in CDDHAN.

Model1	Model2	Model3	HSTB	EOAB	PSNR (dB)	Params (M)
✓					38.28	24.8
✓	✓				38.57	32.5
✓	✓	✓			38.91	38.9
✓	✓	✓	✓		39.23	43.6
✓	✓	✓	✓	✓	39.27	40.5

Table 6. Ablation study of the SCAB pooling method (GAP: global average pooling; GMP: global max pooling).

Structure	Null	GAP	GMP	GAP + GMP
PSNR (dB)	38.91	39.17	39.12	39.27

Table 7. Ablation study of the data consistency (DC) module in CDDHAN.

Structure	RHAN	RHAN + DC
PSNR (dB)	38.54	38.71
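
The ablation gains quoted in the conclusions can be recomputed from Tables 5–7 with a few subtractions, as sketched below. The dictionary keys are hypothetical labels, and the 0.36 dB hybrid attention gain is assumed to correspond to the Null and GAP + GMP columns of Table 6 (the same difference also separates the Model3 row and the full model in Table 5). The parameter counts are assumed to be in millions.

```python
# PSNR values (dB) copied from Tables 5-7.
table5 = {"with_hstb": 39.23, "with_hstb_and_eoab": 39.27}
table6 = {"null": 38.91, "gap_plus_gmp": 39.27}
table7 = {"rhan": 38.54, "rhan_plus_dc": 38.71}

gains = {
    "hybrid_attention": round(table6["gap_plus_gmp"] - table6["null"], 2),
    "data_consistency": round(table7["rhan_plus_dc"] - table7["rhan"], 2),
    "overlapping_window": round(
        table5["with_hstb_and_eoab"] - table5["with_hstb"], 2
    ),
}

# Parameter change (assumed millions) when EOAB is added, from Table 5.
param_change = round(40.5 - 43.6, 1)
```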

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
