Article

Super-Resolution Reconstruction Approach for MRI Images Based on Transformer Network

Xin Liu, Chuangxin Huang, Jianli Meng, Qi Chen, Wuzheng Ji and Qiuliang Wang
1 Ganjiang Innovation Academy, Chinese Academy of Sciences, Ganzhou 341000, China
2 College of Rare Earth Elements, University of Science and Technology of China, Hefei 230026, China
3 Institute of Electrical Engineering, Chinese Academy of Sciences, Beijing 100190, China
* Author to whom correspondence should be addressed.
AI 2025, 6(11), 291; https://doi.org/10.3390/ai6110291
Submission received: 21 October 2025 / Revised: 8 November 2025 / Accepted: 12 November 2025 / Published: 14 November 2025

Abstract

Magnetic Resonance Imaging (MRI) serves as a pivotal medical diagnostic technique widely deployed in clinical practice, yet high-resolution reconstruction frequently introduces motion artifacts and degrades signal-to-noise ratios. To enhance imaging efficiency and improve reconstruction quality, this study proposes a Transformer network-based super-resolution framework for MRI images. The methodology integrates Nonuniform Fast Fourier Transform (NUFFT) with a hybrid-attention Transformer network to achieve high-fidelity reconstruction. The embedded NUFFT module adaptively applies density compensation to k-space data based on sampling trajectories, while the Mixed Attention Block (MAB) activates broader pixel engagement to amplify feature extraction capabilities. The Interactive Attention Block (IAB) facilitates cross-window information fusion via overlapping windows, effectively suppressing artifacts. Evaluated on the fastMRI dataset under 4× radial undersampling, the network demonstrates 3.52 dB higher PSNR and 0.21 SSIM improvement over baselines, outperforming state-of-the-art methods across quantitative metrics. Visual assessments further confirm superior detail preservation and artifact suppression. This work establishes an effective pipeline for high-quality radial MRI reconstruction, providing a novel technical pathway for low-field MRI systems with significant research and application value.

1. Introduction

Conventional MRI reconstruction methodologies struggle with computational burden and prolonged processing times when handling complex non-Cartesian sampled data, failing to meet real-time imaging demands [1,2,3]. Furthermore, reconstruction quality is constrained by hardware limitations and sampling strategies—particularly under low-SNR conditions where resolution deteriorates—making simultaneous acceleration and quality preservation a critical challenge [4,5]. Recent advances in artificial intelligence have established non-Cartesian sampling coupled with deep learning as a pivotal research direction in MRI reconstruction [6,7]. Deep learning techniques autonomously extract features from data, demonstrating superior adaptability for irregular data patterns [8]. Unlike traditional approaches reliant on handcrafted sparsity constraints, deep learning achieves optimal reconstruction through end-to-end training strategies [9]. More significantly, deep learning methods substantially enhance image quality in low-field MRI reconstruction, automatically restoring fine details, mitigating ghosting artifacts, and boosting resolution within low-SNR environments [10].
In recent years, differentiable NUFFT layers, physics-guided unrolled networks, and Transformer-augmented non-Cartesian reconstruction frameworks have emerged as significant advances in MRI image reconstruction [11,12,13]. These approaches aim to bridge the gap between traditional signal processing techniques and deep learning models, enabling more accurate and efficient reconstruction. For instance, differentiable NUFFT layers have been widely adopted to integrate the MRI data acquisition process into end-to-end deep learning models [14]. Khatami et al. [15] proposed a differentiable framework that enables gradient-based optimization of the reconstruction process, significantly improving the quality of undersampled images. Similarly, physics-guided unrolled networks have gained attention for their ability to incorporate domain-specific knowledge into deep learning models. For example, Zumbo et al. [16] introduced a physics-guided unrolled network that leverages the forward model to iteratively refine image estimates, achieving state-of-the-art performance in MR-based reconstruction.
Transformer-based architectures have also been explored for MRI reconstruction, particularly for handling non-Cartesian data [17,18]. The Transformer's architectural flexibility enables seamless integration with complementary frameworks to enhance reconstruction quality [19]. Feng et al. [20] unified MRI reconstruction and super-resolution within a multi-modal Transformer for multi-contrast MRI imaging. However, standard Transformers process images via isolated patches, preventing edge pixels from accessing contextual inter-patch information. Addressing this, Chen et al. [28] developed the Hybrid Attention Transformer (HAT), enhancing SwinIR [21] through a dual-attention mechanism that combines channel attention and self-attention, delivering significant performance gains. NC-PDNet (Tanabene et al.) [22] and ReconFormer (Guo et al.) [23] are notable examples that integrate Transformer networks into the reconstruction pipeline to exploit long-range dependencies and improve image quality. Additionally, non-Cartesian self-supervised learning approaches, such as those proposed by Xu et al. [24], have demonstrated the potential of learning-based methods that do not rely on paired training data, further advancing the field of MRI reconstruction. While these methods have shown impressive results, they often struggle to balance computational efficiency and reconstruction accuracy, particularly for high-resolution MRI images. Our work addresses these limitations by introducing adaptive compensation mechanisms and dynamic attention designs, enabling more flexible and efficient handling of diverse MRI reconstruction tasks while maintaining high fidelity.
Building upon this foundation, this study proposes a Transformer-based hybrid-attention network for MRI super-resolution reconstruction. To address texture degradation and edge blurring in low-field MRI, a dual-phase enhancement architecture integrating traditional NUFFT with the Transformer framework is developed, leveraging NUFFT-derived priors to guide network learning. The Transformer incorporates a Mixed Attention Block (MAB) fusing localized features and an Interactive Attention Block (IAB) enabling cross-window communication. MAB activates broader pixel engagement to strengthen feature extraction capabilities, while IAB facilitates inter-window interaction for contextual refinement. Quantitative and qualitative evaluations on the fastMRI dataset demonstrate the model’s superiority over state-of-the-art methods in detail preservation, artifact suppression, and all assessment metrics, achieving significant enhancements in resolution and structural fidelity.

2. Theoretical Basis

2.1. Principle of NUFFT Image Reconstruction

As illustrated in Figure 1, radial sampling generates nonuniform k-space data that is incompatible with direct IFFT-based MRI reconstruction, necessitating the determination of Cartesian grid values from adjacent radial sampling points. To convert this nonuniform data into a format amenable to Fourier transformation, convolutional interpolation with specialized kernels is applied, followed by resampling onto Cartesian grids, a process efficiently implemented via the Nonuniform Fast Fourier Transform (NUFFT).
NUFFT constitutes an efficient methodology for processing non-Cartesian sampled data by performing Fourier transforms directly on nonuniform sampling points, thereby eliminating error propagation inherent in the interpolation steps of conventional gridding methods. The mathematical formulation of this interpolation procedure is specified in Equation (1).
$$f(x) = \sum_{i} \omega(x - k_i)\, e^{i 2\pi k_i x} \qquad (1)$$
Here, $f(x)$ denotes the reconstructed value in the image domain, $k_i$ represents the k-space coordinates of the nonuniform sampling points, $\omega(\cdot)$ is the interpolation kernel function, and $x$ indicates the spatial coordinate in the image domain. Upon interpolating these nonuniform sampling points, NUFFT leverages the FFT algorithm to compute the reconstructed image. While traditional Fourier transform approaches become computationally prohibitive for nonuniform data, NUFFT's interpolation step harnesses the computational efficiency of the FFT, enabling rapid processing of large-scale non-Cartesian sampled datasets within feasible timeframes.
Integrating NUFFT gradients into end-to-end training requires careful implementation to support backpropagation through the NUFFT operator. This is typically achieved by leveraging libraries such as torchkbnufft or by designing custom differentiable kernels that compute the forward and adjoint NUFFT operations while maintaining the ability to propagate gradients. For instance, torchkbnufft provides an efficient and differentiable implementation of NUFFT and its adjoint, enabling seamless integration into neural network architectures.
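As a concrete illustration, the following minimal PyTorch sketch wires up a differentiable NUFFT forward/adjoint pair with torchkbnufft; the operator classes and the density-compensation helper follow the library's documented interface, while the trajectory and image sizes are illustrative values taken from this study's 640 × 400 k-space setting.

```python
# Minimal sketch: differentiable NUFFT forward/adjoint with torchkbnufft.
import math
import torch
import torchkbnufft as tkbn

im_size = (640, 400)                               # k-space grid used in this study
nufft_op = tkbn.KbNufft(im_size=im_size)           # forward NUFFT operator
adj_op = tkbn.KbNufftAdjoint(im_size=im_size)      # adjoint NUFFT operator

# Non-uniform trajectory in radians, shape (2, n_samples), values in [-pi, pi].
ktraj = torch.rand(2, 64000) * 2 * math.pi - math.pi

# Complex image batch: (batch, coils, H, W); gradients will flow through NUFFT.
image = torch.randn(1, 1, *im_size, dtype=torch.complex64, requires_grad=True)

kdata = nufft_op(image, ktraj)                     # emulate non-uniform k-space data
dcomp = tkbn.calc_density_compensation_function(ktraj=ktraj, im_size=im_size)
recon = adj_op(kdata * dcomp, ktraj)               # density-compensated adjoint recon

recon.abs().sum().backward()                       # backpropagation through the NUFFT
```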

2.2. Principle of Transformer Network

Transformer leverages its self-attention mechanism to effectively model global dependencies among elements within input sequences, enhancing feature representation expressiveness and contextual awareness. The self-attention mechanism enables models to incorporate global context by considering all elements when processing any position in a sequence. Computationally, input sequences undergo linear transformations into Query (Q), Key (K), and Value (V) matrices. Unlike RNNs/LSTMs where long-range information propagation is inherently inefficient, self-attention directly models global dependencies to dramatically enhance computational efficiency, with its mathematical procedure formulated in Equation (2).
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right) V \qquad (2)$$
Here, $d_k$ denotes the dimensionality of the keys. The core computation of self-attention involves calculating similarity scores between query and key vectors, applying softmax normalization to derive attention weights, and performing a weighted summation of the value vectors. This allows the model to dynamically adjust attention distributions across input positions, thereby effectively capturing global patterns within complex data structures.
Given identical sets of queries, keys, and values, the model independently learns distinct behavioral patterns through the same attention mechanism and integrates these patterns as composite knowledge to capture dependencies across varying ranges within sequences. Each set of projection matrices $(W^Q, W^K, W^V)$ constitutes an attention head, with each Transformer layer incorporating multiple heads. The multi-head attention mechanism, a pivotal innovation in Transformers, splits the projections into subspaces for independent self-attention computation, enabling each head to attend to a distinct semantic subspace and thereby capture richer contextual information. The concatenated outputs from all heads undergo a final linear transformation to yield the integrated result (Figure 2), as formalized in Equation (3).
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)\, W^{O} \qquad (3)$$
Here, $h$ denotes the number of heads, $\mathrm{head}_i = \mathrm{Attention}(QW_i^Q, KW_i^K, VW_i^V)$ represents the attention output of each head, and $W^O$ is the output weight matrix. The multi-head attention mechanism enhances model performance by enabling parallelized learning across distinct subspaces, thereby significantly boosting the model's representation capacity.
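To make Equations (2) and (3) concrete, the sketch below implements multi-head scaled dot-product attention in PyTorch. The projection matrices are passed in explicitly to mirror the notation $(W^Q, W^K, W^V, W^O)$; this is an illustrative reference implementation, not the network code used in this study.

```python
import torch
import torch.nn.functional as F

def multi_head_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """Equations (2)-(3): project, split into heads, apply scaled
    dot-product attention per head, concatenate, and project with W^O."""
    B, L, C = x.shape
    d = C // n_heads                                   # per-head dimension d_k
    # Linear projections, reshaped to (B, heads, L, d).
    q = (x @ w_q).view(B, L, n_heads, d).transpose(1, 2)
    k = (x @ w_k).view(B, L, n_heads, d).transpose(1, 2)
    v = (x @ w_v).view(B, L, n_heads, d).transpose(1, 2)
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    attn = F.softmax(q @ k.transpose(-2, -1) / d**0.5, dim=-1)
    out = (attn @ v).transpose(1, 2).reshape(B, L, C)  # Concat(head_1..head_h)
    return out @ w_o                                   # final projection W^O

# Usage: x of shape (batch, sequence, channels); weights are (C, C) tensors.
x = torch.randn(2, 64, 96)
w = [torch.randn(96, 96) for _ in range(4)]
y = multi_head_attention(x, *w, n_heads=6)
```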

3. Design of the Image Super-Resolution Reconstruction Network

3.1. Overall Network Structure Design

Building on the theoretical foundations established in Section 2, this work proposes a Transformer-based hybrid-attention network for image super-resolution reconstruction, with its architectural framework schematically illustrated in Figure 3.
As illustrated in Figure 3, this radial-sampling MRI reconstruction framework comprises two core components: (1) an Adaptive Density Compensation NUFFT module, and (2) an Image Super-Resolution Network integrating shallow feature extraction, deep feature extraction, and image reconstruction modules. The workflow first applies NUFFT to the preprocessed k-space data for initial reconstruction, then performs shallow and deep feature extraction via the Transformer network, fuses features through a global residual connection, and finally generates high-resolution outputs via the reconstruction module. The end-to-end super-resolution procedure is detailed as follows.
Initially, the radially sampled k-space data $k_{nc}$ is processed by a density-compensated NUFFT module to yield the low-resolution input image $I_{lr} \in \mathbb{R}^{H \times W \times C_{in}}$ for the network. Subsequent shallow feature extraction employs a convolutional layer $H_{Conv}$ with kernel size $3 \times 3$, stride 1, and padding 1 to derive shallow features $F_0 \in \mathbb{R}^{H \times W \times C}$, as formalized in Equation (4).
$$F_0 = H_{Conv}(I_{lr}) \qquad (4)$$
Here, $C_{in}$ and $C$ denote the channel counts of the input and intermediate features, respectively. The shallow feature extraction module learns low-level image signals, mapping input features from low-dimensional to high-dimensional spaces. This is followed by deep feature extraction comprising Residual Multi-Attention Groups (RMAGs) and a convolutional layer $H_{Conv}(\cdot)$ with kernel size $3 \times 3$ to derive the deep features $F_D \in \mathbb{R}^{H \times W \times C}$, as expressed in Equation (5).
$$F_i = H_{RMAG_i}(F_{i-1}), \quad i = 1, 2, \ldots, N; \qquad F_D = H_{Conv}(F_N) \qquad (5)$$
Here, $H_{RMAG_i}(\cdot)$ denotes the $i$-th RMAG module, where each RMAG's output serves as input to the subsequent RMAG, with input and output denoted $F_{i-1}$ and $F_i$, respectively. Every RMAG integrates three core components: a cascaded Mixed Attention Block (MAB) capturing long-range dependencies, an Interactive Attention Block (IAB) enabling cross-window feature fusion, and a residual-connected convolutional layer that aggregates the output deep features $F_i$ for enhanced representation learning.
Subsequently, a global residual connection fuses the shallow features $F_0$ with the deep features $F_D$, enabling the image reconstruction module to map the fused features to a high-resolution image via sub-pixel convolution, as formulated in Equation (6).
$$I_{hr} = H_{Rec}(F_0 + F_D) \qquad (6)$$
Here, $H_{Rec}$ denotes the image reconstruction module, which employs two convolutional layers with kernel size $3 \times 3$ and a Pixel Shuffle layer to upsample the fused features. During network training, parameters are optimized using an L1 loss function, with each input $k_{nc}$ and its corresponding ground-truth reconstruction $I_{gt}$ forming a training pair $(k_{nc}, I_{gt})$. Iterative multi-epoch training minimizes the discrepancy between the reconstructed high-resolution images $I_{hr}$ and the ground-truth images $I_{gt}$; Section 3.2 details the structural designs and functionalities of the core modules.
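For orientation, a minimal structural sketch of Equations (4)-(6) follows. The RMAG internals (MAB and IAB) are abstracted behind a placeholder block, so the skeleton only fixes the data flow: shallow convolution, a chain of RMAGs, a trailing convolution, a global residual connection, and a Pixel Shuffle reconstruction head. Channel counts and depths are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

class SRSkeleton(nn.Module):
    """Data flow of Equations (4)-(6); rmag_block is a stand-in for the
    real RMAG (cascaded MABs + IAB + residual convolution)."""
    def __init__(self, c_in=1, c=64, n_rmag=2, scale=4, rmag_block=None):
        super().__init__()
        make = rmag_block or (lambda: nn.Conv2d(c, c, 3, padding=1))
        self.shallow = nn.Conv2d(c_in, c, 3, stride=1, padding=1)    # Eq. (4)
        self.rmags = nn.ModuleList([make() for _ in range(n_rmag)])  # Eq. (5)
        self.conv_after = nn.Conv2d(c, c, 3, padding=1)
        self.reconstruct = nn.Sequential(                            # Eq. (6)
            nn.Conv2d(c, c * scale**2, 3, padding=1),
            nn.PixelShuffle(scale),                  # sub-pixel upsampling
            nn.Conv2d(c, c_in, 3, padding=1),
        )

    def forward(self, i_lr):
        f = f0 = self.shallow(i_lr)
        for rmag in self.rmags:
            f = rmag(f)                              # F_i = H_RMAG_i(F_{i-1})
        f_d = self.conv_after(f)                     # F_D = H_Conv(F_N)
        return self.reconstruct(f0 + f_d)            # global residual + upsample
```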

3.2. The Main Modules of the Network

3.2.1. Adaptive Density Compensated NUFFT Module

Unlike Cartesian sampling reconstruction, where the FFT has an exact inverse (the IFFT), the adjoint of the NUFFT is not necessarily its inverse; in fact, the NUFFT generally lacks an invertible operator. Consequently, applying the NUFFT adjoint to k-space data necessitates density compensation, which weights the samples at different k-space positions to ensure their uniform contribution during the adjoint NUFFT.
The non-uniform fast Fourier transform (NUFFT) algorithm employed in this work comprises two principal operators: a forward NUFFT operator (nufft_op) that emulates non-uniform k-space data from uniformly sampled Cartesian measurements, and an adjoint NUFFT operator (adj_op) that reconstructs images by regridding non-uniform data onto regular grids. For radial sampling trajectories, traditional methods suffer from excessive central k-space sampling density: when computing the adjoint, this overrepresented high-energy region is assigned disproportionate weight, amplifying reconstruction artifacts in the image domain.
This module’s density compensation design optimizes data distribution through an adaptive weighting strategy, effectively reducing computational complexity in subsequent convolutional interpolation while enhancing reconstruction accuracy. The technique’s core innovation lies in partitioned k-space compensation: in densely sampled central regions, measurements are multiplied by weights < 1 to alleviate computational burden from data redundancy; in sparsely sampled peripheral regions, weights > 1 amplify high-frequency signal components. This spatially variant weighting strategy dynamically modulates compensation coefficients based on the actual sampling trajectory’s density profile, enabling adaptability to diverse non-uniform undersampling patterns—with the entire process formally captured by Equation (7).
$$d_0 = 1, \qquad d_{n+1} = \frac{d_n}{F_\Omega F_\Omega^{H} d_n} \qquad (7)$$
Here, $d_0$ denotes the initial compensation factor and $d_{n+1}$ its updated counterpart at the $(n+1)$-th iteration, while $F_\Omega$ represents the forward NUFFT operator and $F_\Omega^H$ the adjoint NUFFT operator for radial MRI reconstruction. Crucially, the density compensation factor $d$ is obtained by iteratively applying the operators $F_\Omega^H$ and $F_\Omega$ starting from unity, which dynamically refines the radial trajectory compensation.
Equation (7) demonstrates that the density compensation in this module is sampling-density-dependent, constituting an adaptive scaling mechanism distinct from traditional fixed compensation schemes, with detailed NUFFT workflow outlined in Figure 4.
From Figure 4, the NUFFT layer in our proposed framework is learnable, meaning that its parameters, including the density compensation factors, can be optimized during training. This adaptive mechanism is implemented through a combination of precomputed density compensation weights and learned adjustments, ensuring that the NUFFT layer remains flexible and capable of handling diverse MRI datasets.
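The fixed-point update of Equation (7) can be prototyped in a few lines. The sketch below assumes torchkbnufft operators stand in for $F_\Omega$ and $F_\Omega^H$, initializes the weights from unity, and guards the elementwise division for numerical stability; the iteration count is an illustrative choice.

```python
import torch
import torchkbnufft as tkbn

def adaptive_density_compensation(ktraj, im_size, n_iter=10):
    """Fixed-point iteration of Equation (7):
    d_{n+1} = d_n / (F_Omega F_Omega^H d_n), starting from d_0 = 1."""
    nufft_op = tkbn.KbNufft(im_size=im_size)        # F_Omega
    adj_op = tkbn.KbNufftAdjoint(im_size=im_size)   # F_Omega^H
    d = torch.ones(1, 1, ktraj.shape[-1], dtype=torch.complex64)
    for _ in range(n_iter):
        # Regrid the weighted samples to the image grid, then resample
        # them back onto the radial trajectory.
        resampled = nufft_op(adj_op(d, ktraj), ktraj)
        d = d / resampled.abs().clamp_min(1e-10)    # elementwise update
    return d
```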

3.2.2. Mixed Attention Module MAB

The Residual Multi-Attention Group (RMAG) integrates a cascaded Mixed Attention Block (MAB) and an Interactive Attention Block (IAB); the following elucidation centers on the MAB module, whose internal architecture is detailed in Figure 5.
The Mixed Attention Block (MAB) integrates a Channel-Focused Block (CFB) for spatial-channel refinement, a (Shifted) Window-based Multi-head Self-Attention ((S)W-MSA) module capturing local window dependencies, Layer Normalization (LN), a Multilayer Perceptron (MLP), and residual connections. As depicted in Figure 5, the MAB augments the standard Transformer module by incorporating the convolutional CFB to enhance representational capacity: the CFB operates in parallel with (S)W-MSA after the first LN layer within the Swin Transformer framework, with (S)W-MSA strengthening local window attention features. To mitigate optimization conflicts between the CFB and (S)W-MSA, a scaling coefficient $\alpha$ adjusts the CFB's output features.
Figure 6 details the Channel-Focused Block (CFB) design, which employs a dual cascaded convolutional structure integrated with a channel attention mechanism for feature refinement. The processing flow comprises: (1) feature compression and expansion, where input features with $C$ channels pass through two convolutional layers that compress and then recover the channel dimension; (2) adaptive channel scaling via a Channel Attention (CA) module using global max pooling and convolutional layers; (3) calibrated output of the channel-weighted features. For input features $F$, two residual connections in the MAB integrate the CFB/(S)W-MSA outputs with $F$ and fuse the intermediate features $F_M$ (from the first residual) with the MLP output $Y$, effectively mitigating gradient vanishing. The MLP further enhances nonlinear representational capacity, with the complete MAB computation formalized in Equation (8).
$$\begin{aligned} F_N &= \mathrm{LN}(F) \\ F_M &= \mathrm{(S)W\text{-}MSA}(F_N) + \alpha \cdot \mathrm{CFB}(F_N) + F \\ Y &= \mathrm{MLP}(\mathrm{LN}(F_M)) + F_M \end{aligned} \qquad (8)$$
Here, $F_N$ denotes the intermediate features after the first LayerNorm, $F_M$ represents the intermediate features from the initial residual connection, and $Y$ signifies the final MAB output via the Multilayer Perceptron (MLP). Window size critically governs self-attention efficacy: appropriately enlarging it substantially enhances Transformer performance. Conventional approaches confine self-attention to small windows, relying on shifted window mechanisms to progressively expand the receptive field, which reduces computational cost but compromises attention capacity. To stimulate more input pixels for stronger image reconstruction, our MAB employs large-window self-attention ($16 \times 16$), transcending the perceptual limitations of traditional small-window ($8 \times 8$) designs.
To compute self-attention in the (S)W-MSA module, an input feature map of size $H \times W \times C$ is partitioned into non-overlapping local windows of dimension $M \times M$, yielding $HW/M^2$ distinct windows. Within each window, self-attention operates on features $F_W \in \mathbb{R}^{M^2 \times C}$ by deriving query (Q), key (K), and value (V) matrices through three linear transformations, as computed in Equation (9).
$$\mathrm{Attention}(Q, K, V) = \mathrm{Softmax}\!\left(QK^{T}/\sqrt{d} + B\right) V \qquad (9)$$
Here, $d$ denotes the query/key dimension, while $B$ serves as a relative positional bias encoding spatial position. To enhance information exchange between adjacent non-overlapping windows, a shifted window strategy applies half-window displacements, achieving dual advantages: preserving the computational efficiency of standard window partitioning while enabling cross-window feature fusion.
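For reference, a minimal sketch of the (shifted) window partitioning that feeds Equation (9) is given below; it assumes a channel-last feature map whose height and width are divisible by the window size $M$, and it is illustrative rather than the authors' implementation.

```python
import torch

def window_partition(x, m):
    """Split a (B, H, W, C) map into non-overlapping m x m windows,
    yielding HW/m^2 windows of shape (m*m, C) for (S)W-MSA."""
    B, H, W, C = x.shape
    x = x.view(B, H // m, m, W // m, m, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, m * m, C)

def shifted_window_partition(x, m):
    """Shifted variant: roll by half a window before partitioning so that
    pixels near old window borders can attend across those boundaries."""
    shifted = torch.roll(x, shifts=(-m // 2, -m // 2), dims=(1, 2))
    return window_partition(shifted, m)
```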

3.2.3. Cross Attention Module IAB

The Interactive Attention Block (IAB) architecture, analogous to standard Swin Transformer modules, incorporates two Layer Normalization (LN) layers, an Overlapping Cross-Attention (OCA) layer, an MLP, and dual residual connections, as structurally detailed in Figure 7. This module directly establishes cross-window interactions to enhance channel-wise attention representation. Since conventional non-overlapping window partitioning restricts adjacent-window communication, the OCA layer segments the IAB's input features using heterogeneous window strategies: the query features are partitioned into $HW/M^2$ non-overlapping windows of size $M \times M$ via standard division, while the key and value features are unfolded into $HW/M^2$ overlapping windows of size $M_o \times M_o$ through expanded window dimensions, with the overlapping size defined by Equation (10).
$$M_o = (1 + \gamma) \times M \qquad (10)$$
Here, $\gamma$ is a tunable constant governing the window overlap size. By leveraging this cross-window self-attention mechanism, the IAB activates substantially more pixels for reconstruction, expanding the effective receptive field and enhancing image recovery fidelity.
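A hedged sketch of how the overlapping key/value windows implied by Equation (10) could be extracted with torch unfold follows; keeping the stride at $M$ makes the number of overlapping windows match the non-overlapping query partition, as OCA requires. The function is illustrative, not the authors' code.

```python
import torch
import torch.nn.functional as F

def overlapping_windows(x, m, gamma=0.5):
    """Unfold a (B, C, H, W) map into overlapping windows of size
    M_o = (1 + gamma) * M (Equation (10)), stride M, zero-padded so the
    window count equals the HW/M^2 non-overlapping query windows."""
    m_o = int((1 + gamma) * m)
    pad = (m_o - m) // 2
    patches = F.unfold(x, kernel_size=m_o, stride=m, padding=pad)
    B, _, n = patches.shape                     # (B, C*M_o*M_o, n_windows)
    patches = patches.view(B, x.shape[1], m_o * m_o, n)
    return patches.permute(0, 3, 2, 1).reshape(-1, m_o * m_o, x.shape[1])

# Example: 16 x 16 base windows with 50% overlap -> 24 x 24 windows.
kv = overlapping_windows(torch.randn(1, 64, 80, 80), m=16, gamma=0.5)
```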

3.2.4. Image Reconstruction Module and Loss Function

The image reconstruction module transforms extracted features into high-resolution MRI images through a lightweight, efficient architecture employing upsampling convolutional blocks. This design utilizes two 3 × 3 convolutional layers for feature fusion and channel adjustment to match target image dimensions, interleaved with a Pixel Shuffle layer for progressive resolution enhancement. Pixel Shuffle operates via sub-pixel convolution to reassemble low-resolution inputs into high-resolution outputs, delivering superior computational efficiency and reduced interpolation artifacts compared to bilinear interpolation or transposed convolution—significantly improving reconstruction fidelity.
To optimize model training for generating super-resolution MR images that closely approximate ground-truth high-resolution references, an L1 loss function is adopted for the reconstruction module. For a given set of training image pairs $\{I_{lr}^i, I_{gt}^i\}_{i=1}^{N}$, the optimization objective $L_{Rec}$ minimizes the absolute pixel-wise discrepancy, as defined in Equation (11).
$$L_{Rec} = \frac{1}{N} \sum_{i=1}^{N} \left\| H_{SR}(I_{lr}^i) - I_{gt}^i \right\|_1 \qquad (11)$$
Here, $I_{hr} = H_{SR}(I_{lr})$ denotes the reconstructed high-resolution MR image, while $I_{gt}$ represents the ground-truth reference. The adoption of L1 loss over L2 loss critically preserves structural integrity and suppresses over-smoothing artifacts, yielding outputs with sharper edges and enhanced textural fidelity. Moreover, L1's reduced sensitivity to outliers significantly improves training stability by mitigating undue influence from extreme errors. This optimization strategy ensures robust recovery of high-resolution anatomical structures and fine-grained details critical for diagnostic reliability.

4. Experimental Research

4.1. Datasets and Evaluation Metrics

4.1.1. Experimental Dataset

This study utilizes the publicly accessible fastMRI dataset [25] as the foundation for model development and validation. Recognized as one of the most authoritative open databases in medical imaging, fastMRI was jointly released by NYU Langone Health and Facebook AI Research (FAIR) to advance deep learning research in MRI reconstruction. The dataset comprises fully-sampled k-space data from 1.5T and 3T MRI scanners, covering knee and brain MRI acquisitions. Knee data is further categorized into single-coil and multi-coil subsets, with comprehensive statistics detailed in Table 1.
Table 1 delineates the composition of the fastMRI dataset. This study primarily utilizes the single-coil knee subset for experiments; ground-truth images are reconstructed as single-channel complex-valued data via the Inverse Fast Fourier Transform (IFFT) applied to raw multi-channel k-space acquisitions. The subset comprises 973 volumetric 2D knee scans for training and 199 volumes for validation, all stored in HDF5 format, where each volume contains over 30 axial slices per knee. Slice attributes include kspace (raw k-space data), reconstruction_rss (root-sum-of-squares reconstruction), and reconstruction_esc (emulated single-coil reconstruction), enabling comprehensive knee image reconstruction.
Following rigorous curation, the training set comprises 34,742 slices and the test set 7135 slices. Raw k-space dimensions (variable across slices) are standardized to (640, 400) experimentally, while the reconstruction_esc IFFT-reconstructed images maintain a uniform 320 × 320 resolution. Figure 8 visualizes full-slice reconstructions from volume file1000001.h5, demonstrating that initial slices (e.g., #1–15) consistently exhibit partial anatomical coverage, such as isolated distal femur or proximal tibia structures, rather than complete knee cross-sections. This pattern is pervasive across all volumes, necessitating systematic exclusion of invalid slices during preprocessing to address fastMRI's inherent data distribution characteristics.
Table 2 shows the specific properties of the single-coil knee data.
As delineated in Table 2, the fastMRI dataset provides Cartesian grid-based emulated single-coil k-space data subjected to sparse sampling via predefined undersampling masks. However, these predetermined sampling patterns fundamentally conflict with our non-Cartesian radial acquisition protocol in both sampling trajectory and k-space distribution characteristics, and their direct adoption would induce experimental bias and inconsistent methodological evaluation. A tailored preprocessing pipeline is therefore used to generate compatible training/test datasets through the following steps:
  • Raw Data Extraction: Retrieve fully-sampled k-space data from .h5 files and discard all invalid slices dominated by noise or devoid of meaningful signals;
  • Data Normalization: Normalize each slice’s k-space magnitude spectrum to a maximum amplitude of 1;
  • Radial Sampling Emulation: Generate radial undersampling trajectories via golden-angle non-uniform spacing and resample the fully sampled k-space data accordingly (see the sketch after this list);
  • Data Augmentation: Enhance model robustness and generalization by applying random ±10° rotations and ±10% scaling transformations to images during training;
  • Low-Resolution Synthesis: Crop the 320 × 320 ground-truth images into overlapping patches through downsampling for network training.
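As referenced in the radial sampling emulation step above, a golden-angle radial trajectory can be generated as in the sketch below. The golden-angle increment of roughly 111.25° is the standard choice for full-spoke radial MRI (some implementations use 137.51° instead), and the spoke and sample counts reproduce the AF = 4 setting of Table 3; the helper name is ours.

```python
import numpy as np

def golden_angle_radial_traj(n_spokes, n_samples, k_max=np.pi):
    """Each spoke is a diameter through the k-space center; successive
    spokes are rotated by the golden angle for near-uniform coverage."""
    golden = np.pi * (np.sqrt(5) - 1) / 2          # ~111.246 degrees
    angles = np.arange(n_spokes) * golden
    radii = np.linspace(-k_max, k_max, n_samples)  # samples along one spoke
    kx = radii[None, :] * np.cos(angles[:, None])
    ky = radii[None, :] * np.sin(angles[:, None])
    # Stack as (2, n_spokes * n_samples) for use with a NUFFT operator.
    return np.stack([kx.ravel(), ky.ravel()])

# AF = 4 setting from Table 3: 100 spokes x 640 samples = 64,000 points.
ktraj = golden_angle_radial_traj(n_spokes=100, n_samples=640)
```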
To rigorously assess algorithm performance in authentic clinical environments, an external validation cohort comprising 50 additional knee MRI scans acquired from third-party multi-vendor scanners was established. These datasets used protocol-matched parameters (e.g., field strength, TR/TE) comparable to fastMRI but implemented standard radial undersampling trajectories, forming a dual-validation strategy (internal simulated plus external clinical) that ensures comprehensive generalizability assessment and cross-platform reliability verification for real-world deployment.
To emulate authentic MRI acquisition constraints while improving scanning efficiency, this study leverages k-space redundancy to perform undersampling, using a forward Non-Uniform Fast Fourier Transform (NUFFT) operator to apply radial trajectory undersampling to fully sampled k-space data. Table 3 details the sampling patterns of test slices (640 × 400 k-space) under varying acceleration factors and radial schemes, where the final sampled point counts are jointly determined by trajectory design and acceleration rate; radial parameterization is uniform across all experiments. Figure 9 visually contrasts fully sampled k-space with representative radial sampling trajectories.

4.1.2. Evaluation Metrics

While conventional MRI reconstruction metrics such as Normalized Mean Squared Error (NMSE) and Signal-to-Noise Ratio (SNR) offer computational efficiency, their pixel-wise evaluation often overlooks holistic structural integrity. We therefore adopt Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index (SSIM) as the core performance assessment framework: PSNR quantifies pixel-level accuracy via the logarithmic ratio of peak signal intensity to mean squared error, and SSIM holistically evaluates perceptual quality through human vision-inspired comparisons of luminance, contrast, and structure.
As a quantitatively precise metric derived from grayscale statistical analysis, the Peak Signal-to-Noise Ratio (PSNR) outperforms human subjective perception in objective assessment. For image super-resolution tasks, PSNR is mathematically defined via the logarithmic relationship between an image’s maximum pixel intensity and its Mean Squared Error (MSE), as formalized in Equation (12).
$$\mathrm{PSNR}(v, \hat{v}) = 10 \log_{10} \frac{\max(\hat{v})^2}{\mathrm{MSE}(v, \hat{v})} \qquad (12)$$
Here, $v$ is the reconstructed image, $\hat{v}$ is the real image, and $\mathrm{MSE}(v, \hat{v})$ is the mean squared error between them, defined as $\frac{1}{n}\|v - \hat{v}\|_2^2$. In practical reconstruction evaluations this metric serves merely as a supplementary quantitative reference, since its exclusive focus on pixel-level mean squared error fundamentally neglects visual perceptual coherence, a critical determinant of diagnostic utility.
The Structural Similarity Index (SSIM) quantifies inter-image similarity by modeling spatial dependencies among neighboring pixels, with its mathematical formulation between given images m and m formalized in Equation (13).
$$\mathrm{SSIM}(m, \hat{m}) = \frac{(2\mu_m \mu_{\hat{m}} + c_1)(2\sigma_{m\hat{m}} + c_2)}{(\mu_m^2 + \mu_{\hat{m}}^2 + c_1)(\sigma_m^2 + \sigma_{\hat{m}}^2 + c_2)} \qquad (13)$$
where $\mu_m$ and $\mu_{\hat{m}}$ represent the mean values of the two images $m$ and $\hat{m}$, $\sigma_m^2$ and $\sigma_{\hat{m}}^2$ denote their variances, and $\sigma_{m\hat{m}}$ indicates their covariance. The two constants are defined as $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$, where $L$ is the dynamic range of the pixel values; in this experiment, $k_1 = 0.01$ and $k_2 = 0.03$. SSIM evaluates image reconstruction quality based on the human visual system, where values closer to 1 indicate higher inter-image similarity and superior reconstruction quality.
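In practice, PSNR and SSIM can be computed per Equations (12) and (13) with scikit-image, as in the hedged sketch below. Note that skimage parameterizes the peak term through data_range rather than $\max(\hat{v})^2$, and its SSIM defaults already use $k_1 = 0.01$ and $k_2 = 0.03$, matching the constants reported here.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(recon: np.ndarray, gt: np.ndarray):
    """PSNR (Equation (12)) and SSIM (Equation (13)) for one slice pair."""
    data_range = float(gt.max() - gt.min())
    psnr = peak_signal_noise_ratio(gt, recon, data_range=data_range)
    ssim = structural_similarity(gt, recon, data_range=data_range)
    return psnr, ssim
```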

4.2. Experimental Results and Analysis

The specific experimental workflow is illustrated in Figure 10. First, necessary preprocessing is applied to the fastMRI dataset; next, the k-space data undergo initial reconstruction via the NUFFT module; finally, the low-resolution images paired with ground-truth images serve as input pairs fed into the network for both super-resolution processing and quantitative evaluation.

4.2.1. Model Parameters and Training Settings

Within the NUFFT module, to evaluate the performance of the designed NUFFT operator on the fastMRI dataset, the module is first utilized to generate preliminary reconstructed images from the fastMRI training set, with detailed parameter configurations specified in Table 4.
In terms of the super-resolution network architecture, to benchmark against Swin Transformer-based networks, this study employs controlled experimentation to systematically evaluate the impact of varying network depths, specifically different quantities of RMAGs and MABs, on reconstruction performance. All comparative experiments maintain consistent core parameters: the attention mechanism uses 6 attention heads for (S)W-MSA in the MAB modules and a window size of 16 for OCA in the IAB modules. Hyperparameters for the proposed modules include an output weight coefficient $\alpha = 0.01$ and a feature channel scaling factor $\beta = 3$ in the CFB, along with an OCA cross-window overlap ratio $\gamma = 0.5$.
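For readability, the reported module hyperparameters can be collected in a single configuration object, as in this illustrative sketch; the field names are ours, not identifiers from the authors' code.

```python
from dataclasses import dataclass

@dataclass
class ModuleConfig:
    n_heads: int = 6        # attention heads for (S)W-MSA in each MAB
    window_size: int = 16   # window size, also used by OCA in the IAB
    alpha: float = 0.01     # CFB output weight coefficient in MAB fusion
    beta: float = 3.0       # CFB feature channel scaling factor
    gamma: float = 0.5      # OCA cross-window overlap ratio

cfg = ModuleConfig()        # defaults reproduce the settings above
```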
Primary experiments and ablation studies for the image super-resolution task were conducted using the fastMRI dataset for both training and validation, with quantitative evaluation metrics employing PSNR and SSIM calculated on the luminance (Y) channel. Given the ground-truth image size of 320 × 320 in the dataset, input images were cropped to 160 × 160 for 2× super-resolution (×2 SR) and 80 × 80 for 4× super-resolution (×4 SR). For ×4 SR tasks, models were initialized with pre-trained weights from ×2 SR, while halving both the iteration interval for learning rate decay and the total training iterations.
The deep learning model proposed in this paper was developed in a PyTorch 1.7.1 virtual environment on a computational platform configured as follows: two NVIDIA A40 GPUs with 48 GB VRAM each, paired with a 32-core AMD EPYC 7543 processor and 80 GB of system memory, running Ubuntu 16. During model training, a mini-batch gradient descent strategy was employed with a batch size of 8 per iteration for 500,000 iterations, using the Adam optimizer (β1 = 0.9, β2 = 0.99) with an initial learning rate of 2 × 10−4 and a step decay schedule that halved the learning rate at the 250K, 400K, 450K, and 475K iteration marks. This integrated hardware configuration and training strategy ensured convergence stability while maximizing training efficiency.
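The reported optimization setup maps directly onto standard PyTorch components, as in the hedged sketch below; model and loader stand in for the super-resolution network and the fastMRI data pipeline, which are assumed here rather than defined.

```python
import torch

# Adam with beta1 = 0.9, beta2 = 0.99 and initial learning rate 2e-4,
# halved at the 250K/400K/450K/475K iteration marks (Section 4.2.1).
optimizer = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.99))
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[250_000, 400_000, 450_000, 475_000], gamma=0.5)
criterion = torch.nn.L1Loss()                    # Equation (11)

for step, (i_lr, i_gt) in enumerate(loader):     # batch size 8 per iteration
    optimizer.zero_grad()
    loss = criterion(model(i_lr), i_gt)
    loss.backward()
    optimizer.step()
    scheduler.step()                             # per-iteration decay schedule
    if step >= 500_000:
        break
```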

4.2.2. Ablation Experiments

This section employs ablation studies to validate the effectiveness and necessity of distinct module designs within the network, including the impact of window size dimensions, the essentiality of the CFB and IAB modules, the efficacy of parametric configurations, and the influence of factors such as network scale on magnetic resonance image super-resolution outcomes.
(1)
The Effect of Window Size in MAB
In window-based self-attention mechanisms, expanding the window size activates more input pixels for the network, thereby enhancing performance. To isolate the impact of newly introduced blocks, experiments were conducted directly on the preliminary version of SwinIR to examine how window dimensions affect Transformer performance in image super-resolution. While prior studies only investigated window sizes below 12 × 12 on natural image datasets, this work further compares four distinct window dimensions (8 × 8, 16 × 16, 24 × 24, and 32 × 32) for their influence on magnetic resonance image reconstruction capability. The results are shown in Table 5.
As demonstrated in Table 5, increasing the window size from 8 × 8 to 16 × 16 elevates PSNR on the fastMRI dataset from 27.53 dB to 27.68 dB, a measurable gain attributable to the 16 × 16 window model accessing broader contextual information than the 8 × 8 variant. However, further enlargement to 24 × 24 and 32 × 32 yields diminishing returns in PSNR. Crucially, larger windows sharply increase computational overhead; the 32 × 32 window necessitates explicit VRAM optimization to mitigate memory pressure. The experiments confirm that the 16 × 16 configuration achieves the best balance between reconstruction fidelity and computational efficiency.
Results indicate that models employing the large 16 × 16 window size exhibit superior quantitative evaluation metrics, providing empirical evidence that expanding the window dimension effectively enhances Transformer performance. Based on this conclusion, all subsequent experiments adopt 16 × 16 as the default window configuration.
(2)
Effectiveness of CFB and IAB Modules
To validate the efficacy of the designed Channel-Focused Block (CFB) and Interactive Attention Block (IAB) within the network architecture, ablation studies for 2× super-resolution were conducted on the Set5 and Set14 datasets, with results presented in Table 6. Compared to the baseline, both modules enhance super-resolution performance: the CFB enhances feature representation through channel-wise computation, while the IAB strengthens cross-window feature interaction. Their synergistic integration delivers a performance gain of 0.11 dB.
(3)
The influence of CFB weighting factor and IAB overlap rate
The two critical hyperparameters $\alpha$ and $\gamma$ significantly impact network performance. The weighting coefficient $\alpha$ controls the contribution of the CFB module within MAB feature fusion: higher $\alpha$ amplifies the CFB-extracted features, while $\alpha = 0$ disables the CFB entirely. The constant $\gamma$ regulates the window overlap size in the IAB modules, with $\gamma = 0$ indicating the standard Transformer configuration. To investigate their effects, experiments evaluated discrete values from 0 to 1 for $\alpha$ and from 0 to 0.75 for $\gamma$, with detailed designs and performance outcomes presented in Table 7.
As evidenced in Table 7, the model achieves optimal performance at $\alpha = 0.01$ and $\gamma = 0.5$, indicating that the CFB and self-attention mechanisms synergize most effectively when the CFB branch's influence is strategically modulated. Conversely, setting $\gamma = 0.25$ or $0.75$ yields no significant improvement and can even degrade performance, demonstrating that inappropriate overlap sizes fail to enhance cross-window interaction, while excessive overlap introduces computational redundancy that compromises efficiency. Consequently, these empirically optimized settings are adopted for all subsequent experiments.
(4)
Comparison of Model Sizes
To achieve optimal image reconstruction performance, experiments further compared models with varying numbers of RMAG layers and MAB modules, sweeping the RMAG count from 1 to 3 with 1 to 3 MAB modules each; quantitative results are presented in Table 8, while convergence behavior is illustrated in Figure 12.
Table 8 demonstrates that configuring three MAB modules per RMAG layer yields optimal PSNR/SSIM metrics, attributable to enhanced fusion of global-local features through hybrid attention mechanisms, enabling deeper hierarchical MRI feature learning. Conversely, the optimal RMAG layer count is two, as excessive layers induce overfitting, degrading generalization across test sets including diverse MRI modalities.
Figure 12 shows the variation trend of the PSNR and SSIM indexes on the test set. The network exhibits significant performance improvements during early training, with gains gradually converging as iterations progress. Beyond 25 epochs, the performance curve enters a plateau where continued training yields only marginal enhancement, indicating full convergence by epoch 25; further training wastes computational resources. Analysis confirms that the proposed hybrid attention mechanism achieves rapid convergence, with the performance ceiling primarily constrained by the physical limitations of radial sampling rather than network capacity. All comparative results were obtained on test sets, validating the method's efficacy.

4.2.3. Comparative Experiments

To evaluate the performance of the proposed radially undersampled MRI super-resolution network, the experiments compare the following schemes:
(1)
Traditional NUFFT without deep learning (implemented via tfkbnufft [26]);
(2)
U-Net-based single-coil baseline model from fastMRI [25];
(3)
Non-Cartesian Primal-Dual Network (NCPDNet) with density compensation [27];
(4)
Transformer-based HAT super-resolution network [28];
(5)
The proposed method (ours).
For the comparative experiments, 2D single-coil knee data from the fastMRI dataset underwent standardized processing, with fully sampled k-space volumes uniformly resampled to 640 readouts × 400 projections. Cartesian-sampled data were reconstructed via IFFT into 320 × 320 high-resolution reference images (a). During preprocessing, radial undersampling with golden-angle trajectories was applied to k-space at acceleration factors AF = 4 and AF = 6. The NUFFT-reconstructed low-resolution images (b) were then downsampled and cropped at 2× and 4× super-resolution ratios to generate the input-target image pairs (c), with the complete workflow illustrated in Figure 11.
The single-coil U-Net baseline model provided with the fastMRI dataset comprises two deep convolutional paths: a downsampling path and an upsampling path. The downsampling path consists of two 3 × 3 convolutional blocks, each containing a normalization layer and ReLU activation, interleaved with downsampling operations implemented through stride-2 max pooling layers that halve the spatial dimensions. The upsampling path mirrors this structure but concatenates skip connections to the first convolutional layer of each module. Finally, a series of 1 × 1 convolutions at the end of the upsampling path reduces the channel depth to one while preserving spatial resolution.
Figure 13 presents quantitative comparisons between the proposed method and alternative schemes under 4× and 6× accelerated radial sampling trajectories, evaluated across the entire test set. Results indicate that the traditional tfkbnufft reconstruction prioritizes computational efficiency with fast processing, yet fundamentally relies on convolutional interpolation—a non-exact solution inducing inherent approximation errors. Consequently, it manifests severe aliasing artifacts and image blurring in undersampled regimes, yielding significantly lower PSNR/SSIM metrics.
Compared to NUFFT, the U-Net baseline improves the PSNR/SSIM metrics but fails to incorporate k-space density compensation, resulting in insufficient local detail recovery and suboptimal quantitative performance. NCPDNet integrates non-uniform sampling compensation by using density compensation factors to homogenize k-space data and optimizes reconstruction via deep neural networks. This dual strategy yields superior quantitative results versus U-Net. However, its generalization across undersampling patterns is limited, with suboptimal performance on the fastMRI dataset. HAT, a Transformer-based super-resolution network, leverages hybrid attention mechanisms to enhance detail recovery. While outperforming CNNs in capturing long-range pixel dependencies and in SR tasks, its deep architecture and large parameter count prolong training time, and its generalization for MRI reconstruction is limited.
Experimental results demonstrate that the proposed super-resolution network—integrating NUFFT foundations, density compensation, and hybrid attention mechanisms—outperforms baselines and competing methods in radial undersampled MRI super-resolution. Specifically, on the fastMRI test set at 4× acceleration, it achieves +3.52 dB PSNR and +0.21 SSIM versus the U-Net baseline. Under 6× acceleration, gains reach +1.8 dB PSNR and +0.17 SSIM. It also surpasses NCPDNet and HAT in both metrics. Furthermore, compared to HAT, our model exhibits 48% fewer parameters, 68% shorter training time, and comparable inference speed while maintaining superior PSNR/SSIM and rapid reconstruction capabilities.
To visually compare reconstruction efficacy across methods, Figure 14 and Figure 15 present results under 4× and 6× radial undersampling: the ground truth rows display fully sampled raw knee MRI references, the tfkbnufft rows show NUFFT-reconstructed low-resolution images, the U-Net rows exhibit baseline outputs, followed by NCPDNet and HAT (two state-of-the-art deep learning reconstructions on fastMRI) and our proposed method's results. Each volume in the test set covers an entire knee, and the figures showcase representative slices.
Figure 16 and Figure 17 present local detail comparisons of reconstruction results under 4×/6× radial acceleration for representative fastMRI test slices.
From Figure 16 and Figure 17, all methods exhibit pronounced radial streaking artifacts versus ground truth, attributable to high acceleration factors trading off reconstruction quality for speed. NUFFT reconstructions suffer from severe radial artifacts and blurring due to k-space interpolation limitations, resulting in critical edge degradation and low resolution. While U-Net reduces some artifacts through standard convolutions, it demonstrates limited adaptability to non-uniform sampling, manifesting as residual edge blurring and texture loss, particularly in high-frequency soft-tissue regions. NCPDNet’s density compensation improves artifact suppression over U-Net but retains measurable resolution loss. Although HAT excels in edge recovery for generic SR tasks, its direct processing of low-resolution images without NUFFT-based k-space conditioning causes structural hallucination errors in complex anatomical regions.
The above results are consistent with the quantitative results in Figure 13, and the following conclusions can be drawn. The proposed network integrates NUFFT foundations, adaptive density compensation, and Transformer-based hybrid attention, achieving superior edge acuity and texture detail recovery versus the comparators while maintaining high computational efficiency. Specifically, the Transformer leverages hybrid attention for enhanced spatial feature extraction, coupled with Pixel Shuffle for resolution upscaling. Visual results confirm that NUFFT reconstructions exhibit substantial artifacts and low resolution under radial undersampling, whereas the deep learning methods (U-Net/NCPDNet/HAT) show moderate improvements. Our methodology, synthesizing non-uniform Fourier transforms, density compensation, and hybrid attention, demonstrates optimal efficacy in reconstructing radially undersampled MRI data, effectively suppressing artifacts while recovering high-resolution anatomical details.

5. Conclusions

This study addresses the challenge of low-resolution reconstruction in radial undersampled MRI by proposing a novel Transformer-based super-resolution framework with hybrid attention mechanisms, systematically evaluating its performance through comprehensive experiments. The core innovation lies in integrating NUFFT’s proficiency in radial data processing with Transformer networks to enhance reconstruction quality. Experimental results demonstrate superior nuanced feature extraction capabilities across radial acceleration factors, effectively suppressing artifacts while outperforming existing methods in PSNR/SSIM metrics. The approach significantly improves resolution and detail fidelity in MRI reconstruction, offering new pathways for advanced MRI reconstruction techniques.
Future work will extend this methodology to multi-coil MRI reconstruction and explore its generalization capabilities across diverse anatomical imaging tasks. Furthermore, we will validate the proposed method’s applicability across broader clinical scenarios, including diverse MRI field strengths (e.g., 1.5T/3T/7T) and scanning protocols for heterogeneous pathologies (e.g., neurodegenerative disorders, oncology, musculoskeletal injuries).

Author Contributions

Conceptualization, X.L. and Q.W.; methodology, X.L.; software, X.L. and W.J.; validation, J.M. and X.L.; formal analysis, Q.W. and X.L.; investigation, Q.W. and X.L.; resources, X.L.; data curation, X.L. and C.H.; writing—original draft preparation, X.L.; writing—review and editing, X.L. and Q.C.; visualization, X.L.; supervision, Q.W.; project administration, Q.W.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not Applicable.

Informed Consent Statement

Not Applicable.

Data Availability Statement

All data generated or analyzed during this study are included in this published article.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yu, X.; Zhou, N.; Zheng, J.; Liang, M.; Qiu, L.; Xu, Q. Mixture density knowledge distillation in super-resolution reconstruction of mri medical images. Med. Eng. Phys. 2025, 139, 104330. [Google Scholar] [CrossRef]
  2. Rahman, T.; Bilgin, A.; Cabrera, S.D. Multi-channel MRI reconstruction using cascaded Swinμ transformers with overlapped attention. Phys. Med. Biol. 2025, 70, 075002. [Google Scholar] [CrossRef]
  3. Cui, C.; Jung, K.J.; Al-Masni, M.A.; Kim, J.-H.; Kim, S.-Y.; Park, M.; Huang, S.Y.; Chun, S.Y.; Kim, D.-H. Deep Network Regularization for Phase-Based Magnetic Resonance Electrical Properties Tomography With Stein’s Unbiased Risk Estimator. IEEE Trans. Biomed. Eng. 2025, 72, 43–55. [Google Scholar] [CrossRef] [PubMed]
  4. Yadav, S.; Kumar, U. A Hybrid Dehazing and Illumination Based Approach for Preprocessing, Enhancement and Segmentation of Lung Images Using Deep Learning. Trait. Du Signal 2025, 42, 1161. [Google Scholar] [CrossRef]
  5. Wu, T.; Liu, S.; Zhang, H.; Zeng, T. Estimation-Denoising Integration Network Architecture With Updated Parameter for MRI Reconstruction. IEEE Trans. Comput. Imaging 2025, 11, 142–153. [Google Scholar] [CrossRef]
  6. Zhang, D.; Lin, L.; Deng, C. Advanced Imaging Strategies Based on Intelligent Micro/Nanomotors. Cyborg Bionic Syst. 2025, 6, 0384. [Google Scholar] [CrossRef]
  7. Plummer, J.W.; Hussain, R.; Bdaiwi, A.S.; Soderlund, S.A.; Hoyos, X.; Lanier, J.M.; Garrison, W.J.; Parra-Robles, J.; Willmering, M.M.; Niedbalski, P.J. A decay-modeled compressed sensing reconstruction approach for non-Cartesian hyperpolarized 129Xe MRI. Magn. Reson. Med. 2024, 92, 13. [Google Scholar] [CrossRef] [PubMed]
  8. Jurgita, I.; Justina, I.; Farnaz, L.S. A New Methodology for Evaluation of Large Vestibular Aqueduct in CT and MRI Images. Otol. Neurotol. 2024, 45, 440–446. [Google Scholar] [CrossRef]
  9. Cai, X.; Hou, X.; Sun, S.N.S. Accelerating image reconstruction for multi-contrast MRI based on Y-Net3+. J. X-Ray Sci. Technol. 2023, 31, 797–810. [Google Scholar] [CrossRef]
  10. Almansour, H.; Herrmann, J.; Gassenmaier, S.; Afat, S.; Jacoby, J.; Koerzdoerfer, G.; Nickel, D.; Mostapha, M.; Nadar, M.; Othman, A.E. Deep Learning Reconstruction for Accelerated Spine MRI: Prospective Analysis of Interchangeability. Radiology 2023, 306, 8. [Google Scholar] [CrossRef]
  11. Dane, B.; Bagga, B.; Bansal, B.; Beier, S.; Kim, S.; Reddy, A.; Fenty, F.; Keerthivasan, M.; Chandarana, H. Accelerated T2-weighted MRI of the Bowel at 3T Using a Single-shot Technique with Deep Learning-based Image Reconstruction: Impact on Image Quality and Disease Detection. Acad. Radiol. 2025, 32, 210–217. [Google Scholar] [CrossRef]
  12. Li, J.; Xia, Y.; Zhou, T.; Dong, Q.; Lin, X.; Gu, L.; Jiang, S.; Xu, M.; Wan, X.; Duan, G.; et al. Accelerated Spine MRI with Deep Learning Based Image Reconstruction: A Prospective Comparison with Standard MRI. Acad. Radiol. 2025, 32, 2121–2132. [Google Scholar] [CrossRef]
  13. Choi, Y.; Ko, J.S.; Park, J.E.; Jeong, G.; Seo, M.; Jun, Y.; Fujita, S.; Bilgic, B. Beyond the Conventional Structural MRI: Clinical Application of Deep Learning Image Reconstruction and Synthetic MRI of the Brain. Investig. Radiol. 2025, 60, 16. [Google Scholar] [CrossRef]
  14. Li, Y.; Qi, H.; Hu, Z.; Sun, H.; Li, G.; Zhang, Z.; Liu, Y.; Guo, H.; Chen, H. Deep Convolutional Neural Network Enhanced Non-uniform Fast Fourier Transform for Undersampled MRI Reconstruction. Int. J. Comput. Vis. 2025, 133, 4158–4176. [Google Scholar] [CrossRef]
  15. Khatami, H.R.; Riahi, M.A.; Abedi, M.M.; Dehkhargani, A.A. A comparative study over improved fast iterative shrinkage-thresholding algorithms: An application to seismic data reconstruction. Stud. Geophys. Geod. 2024, 68, 61. [Google Scholar] [CrossRef]
  16. Zumbo, S.; Mandija, S.; Meliadò, E.F.; Stijnman, P.; Meerbothe, T.G.; Berg, C.A.v.D.; Isernia, T.; Bevacqua, M.T. Unrolled Optimization via Physics-Assisted Convolutional Neural Network for MR-Based Electrical Properties Tomography: A Numerical Investigation. IEEE Open J. Eng. Med. Biol. 2024, 5, 505–513. [Google Scholar] [CrossRef] [PubMed]
  17. Pham, T.V.; Vu, T.N.; Le, H.M.Q.; Pham, V.-T.; Tran, T.-T. CapNet: An Automatic Attention-Based with Mixer Model for Cardiovascular Magnetic Resonance Image Segmentation. J. Digit. Imaging 2025, 38, 94–123. [Google Scholar] [CrossRef]
  18. Davarani, M.N.; Darestani, A.A.; Caas, V.G.; Harirchian, M.H.; Zarei, A.; Havadaragh, S.H.; Hashemi, H. Enhanced Segmentation of Active and Nonactive Multiple Sclerosis Plaques in T1 and FLAIR MRI Images Using Transformer-Based Encoders. Int. J. Imaging Syst. Technol. 2025, 35, 1–14. [Google Scholar] [CrossRef]
  19. Touvron, H.; Cord, M.; Douze, M.; Massa, F.; Sablayrolles, A.; Jegou, H. Training data-efficient image transformers & distillation through attention. In Proceedings of the 38th International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; Volume 139, pp. 10347–10357. [Google Scholar]
  20. Feng, C.M.; Yan, Y.; Fu, H.; Chen, L.; Xu, Y. Task transformer network for joint MRI reconstruction and super-resolution. In Proceedings of the Medical Image Computing and Computer Assisted Intervention–MICCAI 2021: 24th International Conference, Strasbourg, France, 27 September–1 October 2021; Part VI 24. Springer International Publishing: Cham, Switzerland, 2021; pp. 307–317. [Google Scholar]
  21. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. Swinir: Image restoration using swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, BC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar]
  22. Tanabene, A.; Radhakrishna, C.G.; Massire, A.; Nadar, M.S.; Ciuciu, P. Benchmarking 3D multi-coil NC-PDNet MRI reconstruction. In Proceedings of the IEEE 22nd International Symposium on Biomedical Imaging (ISBI), Houston, TX, USA, 14–17 April 2025. [Google Scholar]
  23. Guo, P.; Mei, Y.; Zhou, J.; Jiang, S.; Patel, V.M. ReconFormer: Accelerated MRI Reconstruction Using Recurrent Transformer. IEEE Trans. Med. Imaging 2024, 43, 12. [Google Scholar] [CrossRef]
  24. Xu, S.; Früh Marcel Hammernik, K.; Lingg, A.; Kübler, J.; Krumm, P.; Rueckert, D.; Gatidis, S.; Küstner, T. Self-supervised feature learning for cardiac Cine MR image reconstruction. IEEE Trans. Med. Imaging 2025, 44, 3858–3869. [Google Scholar] [CrossRef]
  25. Zbontar, J.; Knoll, F.; Sriram, A.; Murrell, T.; Huang, Z.; Muckley, M.J.; Defazio, A.; Stern, R.; Johnson, P.; Bruno, M. fastMRI: An open dataset and benchmarks for accelerated MRI. arXiv 2018, arXiv:1811.08839. [Google Scholar]
  26. Muckley, M.J.; Stern, R.; Murrell, T.; Knoll, F. TorchKbNufft: A high-level, hardware-agnostic non-uniform fast Fourier transform. In Proceedings of the ISMRM Workshop on Data Sampling & Image Reconstruction, Sedona, AZ, USA, 26–29 January 2020; p. 22. [Google Scholar]
  27. Ramzi, Z.; Ciuciu, P.; Starck, J.L. XPDNet for MRI reconstruction: An application to the fastMRI 2020 brain challenge. arXiv 2020, arXiv:2010.07290. [Google Scholar]
  28. Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 22367–22377. [Google Scholar]
Figure 1. Schematic diagram of grid interpolation.
Figure 2. Multi-head attention.
Figure 3. Image super-resolution network structure based on the Transformer hybrid attention mechanism.
Figure 4. Adaptive density-compensated NUFFT process.
Figure 5. The structure of the MAB.
Figure 6. The structure of the CFB.
Figure 7. The structure of the IAB.
Figure 8. All slices in volume file1000001.h5.
Figure 9. Fully sampled k-space data and radial undersampling.
Figure 10. Experimental procedure for super-resolution reconstruction of radially sampled data.
Figure 11. Process of generating input images for the network.
Figure 12. Quantitative comparison of evaluation indicators across iterations.
Figure 13. Comparison of quantitative results of different reconstruction schemes.
Figure 14. Reconstruction results of different schemes at AF = 4.
Figure 15. Reconstruction results of different schemes at AF = 6.
Figure 16. Comparison of different reconstruction schemes at AF = 4.
Figure 17. Comparison of different reconstruction schemes at AF = 6.
Table 1. Composition of the fastMRI dataset.

| Dataset | Volumes (Multi-Coil) | Volumes (Single-Coil) | Slices (Multi-Coil) | Slices (Single-Coil) |
|---|---|---|---|---|
| training | 973 | 973 | 34,742 | 34,742 |
| validation | 199 | 199 | 7135 | 7135 |
| test | 118 | 108 | 4092 | 3903 |
Table 2. Properties of each volume of data in fastMRI.

| Dataset | Property | Size | Description |
|---|---|---|---|
| Training sets | kspace | (number of slices, height, width) | Emulated single-coil k-space data |
| | reconstruction_rss | (number of slices, 320, 320) | Root-sum-of-squares reconstruction of the multi-coil k-space, cropped to the central region |
| | reconstruction_esc | (number of slices, 320, 320) | Ground-truth emulated single-coil reconstruction, cropped to the central region |
| Validation sets | kspace | (number of slices, height, width) | Emulated single-coil k-space data |
| | reconstruction_rss | (number of slices, 320, 320) | Root-sum-of-squares reconstruction of the multi-coil k-space, cropped to the central region |
| | reconstruction_esc | (number of slices, 320, 320) | Ground-truth emulated single-coil reconstruction, cropped to the central region |
| Test sets | kspace | (number of slices, height, width) | Emulated single-coil k-space data |
| | mask | (width, 1) | Cartesian undersampling mask |
Table 3. Radial sampling settings.

| Radial Sampling | Number of Sampling Points | Sampling Length L | Number of Sampling Lines N |
|---|---|---|---|
| Accelerating factor AF = 4 | 64,000 | 640 | 100 |
| Accelerating factor AF = 6 | 42,240 | 640 | 66 |
Table 4. The parameter settings for the NUFFT module.

| Parameter | Value |
|---|---|
| scale_factor | 10^6 |
| oversampling_factor | 2^10 |
| image_size | (640, 400) |
| width | 2.34 |
Table 5. Effect of window size on the self-attention module in MAB, PSNR (dB).

| Window Size | Set5 Dataset | Set14 Dataset | fastMRI Dataset |
|---|---|---|---|
| 8 × 8 | 32.88 | 29.09 | 27.53 |
| 16 × 16 | 32.97 | 29.12 | 27.68 |
| 24 × 24 | 33.04 | 29.16 | 27.71 |
| 32 × 32 | 33.11 | 29.18 | 27.66 |
Table 6. Evaluation of the effectiveness of the CFB and IAB modules.

| Module | Baseline | CFB only | IAB only | CFB + IAB |
|---|---|---|---|---|
| CFB | – | ✓ | – | ✓ |
| IAB | – | – | ✓ | ✓ |
| PSNR (dB) | 34.14 | 34.19 | 34.21 | 34.25 |
Table 7. The effect of the CFB weighting factor and IAB overlap rate.

| α | 0 | 0.01 | 0.1 | 1 |
|---|---|---|---|---|
| PSNR (dB) | 27.81 | 27.97 | 27.90 | 27.86 |

| γ | 0 | 0.25 | 0.50 | 0.75 |
|---|---|---|---|---|
| PSNR (dB) | 27.85 | 27.81 | 27.91 | 27.86 |
Table 8. Quantitative comparison of performance of different network sizes.

| RMAGs | MABs | PSNR (dB) | SSIM |
|---|---|---|---|
| 1 | 1 | 32.1254 | 0.8512 |
| 1 | 2 | 32.3182 | 0.8526 |
| 1 | 3 | 32.5846 | 0.8557 |
| 2 | 1 | 32.4690 | 0.8555 |
| 2 | 2 | 32.7404 | 0.8566 |
| 2 | 3 | 32.7832 | 0.8579 |
| 3 | 1 | 32.6585 | 0.8559 |
| 3 | 2 | 32.6597 | 0.8563 |
| 3 | 3 | 32.7128 | 0.8574 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
