A Hybrid Residual Attention Convolutional Neural Network for Compressed Sensing Magnetic Resonance Image Reconstruction

We propose a dual-domain deep learning technique for accelerating compressed sensing magnetic resonance image reconstruction. An advanced convolutional neural network with residual connectivity and an attention mechanism was developed for the frequency and image domains. First, the sensor domain subnetwork estimates the unmeasured frequencies of k-space to reduce aliasing artifacts. Second, the image domain subnetwork performs a pixel-wise operation to remove blur and noise artifacts. The skip connections efficiently concatenate the feature maps to alleviate the vanishing gradient problem. An attention gate in each decoder layer enhances network generalizability and speeds up image reconstruction by eliminating irrelevant activations. From sparsely sampled k-spaces, the proposed technique reconstructs real-valued clinical images that are identical to the reference images. The performance of this novel approach was compared with state-of-the-art direct mapping, single-domain, and multi-domain methods. With acceleration factors (AFs) of 4 and 5, our method improved the mean peak signal-to-noise ratio (PSNR) by 8.67 and 9.23 dB, respectively, compared with the single-domain Unet model; similarly, our approach increased the average PSNR by 3.72 and 4.61 dB, respectively, compared with the multi-domain W-net. Remarkably, using an AF of 6, it enhanced the PSNR by 9.87 ± 1.55 and 6.60 ± 0.38 dB compared with Unet and W-net, respectively.


Introduction
Magnetic resonance imaging (MRI) is a noninvasive and sophisticated medical imaging method that reveals the anatomical structure and function of the human body and brain [1] by generating high-quality images. Each tissue's unique properties can be recognized using a novel MRI acquisition method. Based on the different tissue signal diversities, MRI reconstructs quantitative images that are very important for the early diagnosis of disease or physical changes. MRI does not entail exposure to harmful radiation [2], unlike X-rays, photoacoustic tomography, and computed tomography. However, the long image acquisition time [3] makes it challenging to use MRI in time-sensitive situations, such as in cases of stroke, although acquisition time can be reduced given that MRI systems allow for comprehensive control of data acquisition. Acquired frequencies in MRI are stored in k-space instead of image space. K-space is a matrix the same size as the reconstructed image that stores complex (real and imaginary) raw MRI data. Every point in this matrix holds a portion of the data needed to create the entire image. The periphery of k-space holds the high spatial frequencies, which carry information about image edges, details, and sharp transitions. The central area of k-space, on the other hand, holds the low spatial frequencies, which determine the overall contrast and brightness of the image. Fully sampled k-space is essential for obtaining high-resolution images but increases acquisition time. Acquiring only a subset of the frequencies is one of the most popular approaches to rapid MRI reconstruction. However, due to undersampling, tissue structures are often distorted, and aliasing artifacts appear in the reconstructed images.
A diagram of non-residual and residual CNNs is shown in Figure 1. Traditional CNNs have a severe flaw in that they must learn the total feature map, which requires a large number of parameters. As a result, they are costly to train and slow to run. Residual networks (ResNets) are a type of neural network developed as an enhancement over regular CNNs. A ResNet is a form of CNN in which the previous layer's input is added to the current layer's output. This skip connection facilitates network learning and leads to improved performance. The ResNet architecture has demonstrated strong performance in various tasks, including image classification, object detection, and semantic segmentation.
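The difference between a plain block and a residual block can be illustrated with a toy NumPy sketch. The function names and the single dense layer standing in for a convolution are illustrative assumptions, not the layers used in the paper; the point is only that the skip connection lets the input pass through even when the learned weights contribute nothing.

```python
import numpy as np

def plain_block(x, w):
    """Toy non-residual block: a linear map followed by ReLU (assumed stand-in for a conv layer)."""
    return np.maximum(x @ w, 0.0)

def residual_block(x, w):
    """Toy residual block: the block's input is added to the block's output (skip connection)."""
    return plain_block(x, w) + x

x = np.array([[1.0, -2.0, 3.0]])
w_zero = np.zeros((3, 3))  # weights that contribute nothing

plain_out = plain_block(x, w_zero)        # the signal is lost
residual_out = residual_block(x, w_zero)  # the input survives via the skip path
```

With zero weights the plain block outputs zeros, whereas the residual block returns its input unchanged, which is why a residual block only has to learn a correction to the identity rather than the total feature map.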
Here, we introduce an advanced dual-domain MRI reconstruction approach that uses a complex-valued residual attention convolutional neural network (RA-CNN) to modify the multi-domain W-net architecture through residual connectivity and attention mechanisms for the frequency and image domains. Recently, the attention mechanism has been effectively used for computer-based medical diagnosis. CNNs can easily incorporate this mechanism to automatically highlight salient elements. The proposed complex-valued RA-CNN comprises two subnetworks: one each for the sensor and image domains. First, the sensor domain subnetwork predicts the unmeasured frequencies of k-space to reduce aliasing artifacts. Second, the image domain subnetwork performs a pixel-wise operation to remove blur and noise artifacts. Skip connections efficiently concatenate the feature maps to alleviate the vanishing gradient problem that occurs as network depth increases, thus preventing interruption of the network training procedure. Integration and communication between the two subnetworks promote data consistency and more efficient learning of the features from both domains. At the same time, the attention gate (AG) in each decoder layer enhances network generalizability and speeds up image reconstruction by eliminating irrelevant activations, without an iteration process. Therefore, the proposed technique reconstructs, from sparsely sampled k-space, real-valued clinical images that are identical to the fully sampled k-space images. These images appear to be of higher quality than those produced using other dual-domain cascade networks or single-domain methods. Complex-valued data augmentation (DA) is also applied to overcome data scarcity issues.

Proposed Methodology
According to the Nyquist-Shannon theorem [39], discrete Fourier methods can only recover the full information when the sampling rate is at least double the bandwidth of the recorded continuous-time signal. MRI relies heavily on this Nyquist rate. If relevant prior information is available, reconstruction techniques other than Fourier analysis can be used to retrieve valuable information from data sampled sparsely, below this rate. Hence, we developed an RA-CNN technique that learns intrinsic correlations between fully sampled and undersampled k-spaces, along with their reconstructions. The RA-CNN consists of two neural subnetworks: sensor domain and image domain networks. These subnetworks are trained end to end on sparsely sampled complex-valued MRI data. Residual CNNs with skip connections perform better than traditional CNNs and execute much faster. The novel attention gates added at each skip connection help to extract the more important features from both the signals and images. The k-spaces and images reconstructed by these subnetworks are similar to fully sampled k-spaces and images. Figure 2 shows the workflow of our proposed multi-domain MRI reconstruction method, i.e., the RA-CNN, which uses two RA Unet blocks connected by an IFFT. The undersampled k-space ($K_u$) is generated by elementwise multiplication between the entire k-space ($K$) and the sub-sampling mask ($U$). First, the sensor/frequency network $S_{net}$ tries to estimate the unmeasured frequencies by reducing the frequency loss between the fully sampled and reconstructed k-space. The initial image $\hat{I}_0$ is then reconstructed by IFFT from the output of the first network. Finally, blurring and noise artifacts are removed by the image/spatial domain network $I_{net}$, which is achieved by reducing the pixel disparity between the reconstructed final output ($R$) and the fully sampled reference image ($T$).
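The elementwise masking step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the helper `gaussian_mask`, its density width `sigma`, the centred k-space layout, and the random test image are all hypothetical choices, not the sampling code used in the paper.

```python
import numpy as np

def gaussian_mask(shape, accel, rng, sigma=0.35):
    """Binary mask keeping 1/accel of the points, drawn with a 2D Gaussian density
    centred on k-space (sigma is an assumed width, in normalized coordinates)."""
    h, w = shape
    y, x = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    density = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    n_keep = int(round(h * w / accel))
    idx = rng.choice(h * w, size=n_keep, replace=False,
                     p=(density / density.sum()).ravel())
    mask = np.zeros(h * w, dtype=bool)
    mask[idx] = True
    return mask.reshape(h, w)

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64))       # stand-in for a fully sampled slice
K = np.fft.fftshift(np.fft.fft2(image))     # fully sampled, centred k-space
U = gaussian_mask(K.shape, accel=4, rng=rng)
K_u = K * U                                 # zero-filled undersampled k-space
```

With `accel=4`, one quarter of the k-space points are retained and the rest are set to zero, which corresponds to the zero-filled network input for a four-fold AF.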

Deep Learning
DL methods accelerate MRI reconstruction and improve image quality by projecting the signal into the unmeasured regions of k-space. This interpolation removes aliasing artifacts by satisfying the Nyquist rate. Here, a DL model reconstructs a unique, suitable image by fitting the sparsely sampled data. Notably, our approach does not generalize the signal to indeterminable regions, unlike conventional band-limited signal extrapolation techniques [40] and low-rank modeling of local k-space neighborhoods [41].
The proposed method is described below in terms of the sensor domain network, IFFT operation, and image domain network.

Sensor Domain Network
The sensor domain network, $S_{net}$, attempts to fully recover the normalized k-space, $\hat{K}_{norm}$, from the undersampled k-space, $K_u$. Mathematically, this is represented as

$$\hat{K}_{norm} = S_{net}\left(K_{u,norm}\right),$$

where $K_{u,norm}$ represents the normalized undersampled k-space and is expressed by

$$K_{u,norm} = \frac{K_u - \mu_{K_{u,train}}}{\sigma_{K_{u,train}}},$$

where $\mu_{K_{u,train}}$ and $\sigma_{K_{u,train}}$ represent the mean and standard deviation (SD) of the given undersampled k-spaces, respectively. This network takes complex, normalized, two-channel (real and imaginary), sparsely sampled raw MRI data as input and outputs the complex k-space.

Inverse Fast Fourier Transform
The reconstructed k-space of the sensor domain network is denormalized before the IFFT. The normalization is reversed by

$$\hat{K} = \hat{K}_{norm}\,\sigma_{K_{u,train}} + \mu_{K_{u,train}}.$$

After denormalization of the reconstructed k-space to yield $\hat{K}$, it is transformed into an image using the IFFT ($F^{-1}$) operation:

$$\hat{I}_0 = F^{-1}\left(\hat{K}\right),$$

where $\hat{I}_0$ indicates the initial reconstructed image. There are no trainable parameters in this section.

Image Domain Network
The image domain network $I_{net}$ takes the abovementioned initial reconstructed image as input, which is renormalized to increase the convergence speed of the network:

$$\hat{I}_{0,norm} = \frac{\hat{I}_0 - \mu_{I_{0,train}}}{\sigma_{I_{0,train}}},$$

where $\mu_{I_{0,train}}$ and $\sigma_{I_{0,train}}$ represent the mean and SD of the initial reconstructed images, respectively. This normalized image then traverses the network to generate the final output image, $R$:

$$R = I_{net}\left(\hat{I}_{0,norm}\right).$$

This network also uses a residual attention Unet architecture through concatenation with the undersampled input k-space.
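The data flow through the three stages described above (sensor domain network, IFFT, image domain network) can be sketched as below. This is only an outline under assumptions: `s_net` and `i_net` are placeholders for the trained subnetworks, the helper names are hypothetical, the k-space is assumed to be stored uncentred, and taking the magnitude after the IFFT is an assumption made here to obtain a real-valued image.

```python
import numpy as np

def to_channels(k):
    """Complex (H, W) array -> real two-channel (H, W, 2) array."""
    return np.stack([k.real, k.imag], axis=-1)

def to_complex(c):
    """Real two-channel (H, W, 2) array -> complex (H, W) array."""
    return c[..., 0] + 1j * c[..., 1]

def ra_cnn_forward(k_u, stats, s_net, i_net):
    """Normalize k-space, apply s_net, denormalize, IFFT, normalize image, apply i_net."""
    mu_k, sigma_k, mu_i, sigma_i = stats           # training-set statistics
    k_u_norm = (to_channels(k_u) - mu_k) / sigma_k
    k_hat_norm = s_net(k_u_norm)                   # sensor domain network
    k_hat = to_complex(k_hat_norm * sigma_k + mu_k)
    i0 = np.abs(np.fft.ifft2(k_hat))               # magnitude image (assumption)
    i0_norm = (i0 - mu_i) / sigma_i
    return i_net(i0_norm)                          # image domain network

# Usage with identity functions standing in for the two subnetworks.
rng = np.random.default_rng(0)
image = rng.random((8, 8))                         # non-negative test image
k_full = np.fft.fft2(image)
identity = lambda a: a
R = ra_cnn_forward(k_full, (0.5, 2.0, 0.0, 1.0), identity, identity)
```

With identity placeholders and fully sampled input, the pipeline returns the original image, which is a quick check that the normalization and denormalization steps cancel as intended.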

Proposed Network Architecture
Our advanced RA-CNN was designed based on residual connectivity and a modification of the AG [42]. Batch normalization [43] speeds up the training process compared with the baseline Unet. Unet possesses 23 layers, whereas the RA-CNN possesses 81 convolutional and deconvolutional layers. In such a deep network, gradient information must pass through many layers and can dissipate before it reaches the earlier layers, which causes the vanishing gradient problem. Residual connectivity lessens the likelihood of a vanishing gradient and simplifies network training.
The sensor domain network structure shown in Figure 3 (left side) consists of two main subdivisions: the down-sampling and up-sampling sections. The down-sampling section consists of four consecutive convolutional blocks (CBs) used for extracting k-space features. Every CB contains two 3 × 3 convolutional layers with a rectified linear unit (ReLU) [44] activation function and padding = 1. The first CB is applied to the 256 × 256 normalized undersampled k-space with 48 kernels. The channel numbers then gradually increase to 64, 128, and 256. After each CB, with the exception of the last CB of the down-sampling section, a max-pooling operation is performed. At each step, this process doubles the feature number and halves the input dimension. Up-sampling involves 2 × 2 deconvolution (upscaling), an AG, and CBs. The up-sampling portion restores the size of the features and mirrors the symmetric form of the encoding portion. This symmetric form reduces the information lost during encoding and decoding, and allows features to be reused by concatenating them in the corresponding layer. The features of both layers flow through the AG before concatenation. The AG can determine the correlations between frequencies and capture long-range dependencies to access important information. The last layer of this section executes a linear 1 × 1 convolutional operation using two filters. Finally, this network generates the complex k-space by concatenating the last layer's output with the normalized undersampled k-space through a residual connection.
Diagnostics 2023, 13, 1306
The image domain network structure shown in Figure 3 (right side) also has two main sections: the down-sampling and up-sampling sections. In the down-sampling section, the first CB is applied to the 256 × 256 normalized initial reconstructed images with 48 filters. The filter numbers then gradually increase to 64, 128, and 256. After each CB, with the exception of the last CB of the down-sampling section, a max-pooling process is executed to extract more specific image features. The up-sampling step, in contrast, mirrors the symmetric shape of the encoding block and restores the dimensions of the feature maps. At every skip connection, the features of the upper and lower layers flow through the AG. The AG assembles the essential features of various types of spatial information. The last layer of this part performs a linear 1 × 1 convolutional operation with a single filter. The final image is reconstructed by concatenating the output of the last layer of the network with the undersampled input k-space.
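The feature-map sizes implied by the description above can be traced with a small helper. The function names are hypothetical; the sketch assumes 2 × 2 max-pooling after every CB except the last one in the down-sampling path and a mirrored up-sampling path, with padding that preserves the spatial size inside each CB.

```python
def encoder_shapes(size=256, channels=(48, 64, 128, 256)):
    """(height, width, channels) at the output of each down-sampling CB."""
    shapes = []
    for i, c in enumerate(channels):
        shapes.append((size, size, c))
        if i < len(channels) - 1:   # no pooling after the last CB
            size //= 2
    return shapes

def decoder_shapes(enc):
    """Mirror of the encoder: each up-sampling step restores the previous resolution."""
    return list(reversed(enc[:-1]))

enc = encoder_shapes()
dec = decoder_shapes(enc)
for stage, shape in enumerate(enc + dec, start=1):
    print(f"stage {stage}: {shape}")
```

Under these assumptions the bottleneck operates on 32 × 32 maps with 256 channels, and the last decoder CB returns to 256 × 256 before the final 1 × 1 convolution.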

Attention Gate
The attention mechanism [45] for medical image analysis automatically learns to focus on target structures of varying size and shape. Models with AGs implicitly discover the important hidden elements of an input image for a given task. To increase model sensitivity and prediction accuracy, AGs may be readily attached to popular CNNs such as Unet, without any increase in computing complexity. In an encoder-decoder-based approach, various low-level features are extracted in the first few layers. The redundant features are reduced through active suppression using AGs at the skip connections. Two inputs, g and x, are required for every AG. The next lower layer of the network provides the gating signal, g; since it comes from a deeper part of the network, it represents more useful features. The input feature, x, comes from the skip connections that arise from the early phases and, notably, provides better spatial information.
As presented in Figure 4, the input features $x_i^l$ pass through a 1 × 1 convolution with a stride of 2 × 2 to decrease their size (H × W) by half, whereas the gating signals $g_i^{l+1}$ pass through a 1 × 1 convolution with a stride of 1 × 1. Consequently, the gating signals and updated input features have the same spatial geometry. They are summed elementwise and activated by the ReLU before being mapped by $W_{int}^{T}$ into a lower-dimensional space for the gating operation. The sigmoid function then scales the resulting vector to [0, 1], with coefficients closer to 1 indicating more important features. The dimension of the attention weighting matrix $\alpha_i^l$ is then restored to match the spatial resolution of the provided input features using a trilinear up-sampler. The attention weighting matrix $\alpha_i^l$ and input features $x_i^l$ are multiplied elementwise to produce the output $\hat{x}_i^l$ of the AG, which is then sent to the regular CBs.
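The gating steps above can be written out as a short NumPy sketch. This is an illustrative approximation, not the authors' layer: biases are omitted, the weight shapes and names (`w_x`, `w_g`, `w_int`) are hypothetical, a 1 × 1 convolution is expressed as a per-pixel channel mixing, and nearest-neighbour repetition is swapped in for the interpolating up-sampler.

```python
import numpy as np

def conv1x1(x, w, stride=1):
    """1x1 convolution on an (H, W, C_in) array: per-pixel channel mixing, optionally strided."""
    return x[::stride, ::stride] @ w

def attention_gate(x, g, w_x, w_g, w_int):
    theta_x = conv1x1(x, w_x, stride=2)            # halve H and W of the skip features
    phi_g = conv1x1(g, w_g, stride=1)              # gating signal keeps its size
    f = np.maximum(theta_x + phi_g, 0.0)           # elementwise sum, then ReLU
    alpha = 1.0 / (1.0 + np.exp(-(f @ w_int)))     # sigmoid -> weights in [0, 1]
    # Restore the resolution of x (nearest-neighbour here, an assumed simplification).
    alpha_up = alpha.repeat(2, axis=0).repeat(2, axis=1)
    return x * alpha_up, alpha_up                  # gated features and attention map

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 4))     # skip-connection features
g = rng.standard_normal((4, 4, 6))     # gating signal from the coarser layer
w_x = rng.standard_normal((4, 3))
w_g = rng.standard_normal((6, 3))
w_int = rng.standard_normal((3, 1))
x_hat, alpha_map = attention_gate(x, g, w_x, w_g, w_int)
```

Because every attention coefficient lies in [0, 1], the gate can only attenuate the skip features, which is how irrelevant activations are suppressed before concatenation.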

Dataset and Undersampling Mask
In this simulation, we used publicly available Calgary-Campinas [46,47] T1-weighted, fully sampled brain MRI k-space data. The dataset was acquired using a magnetic resonance scanner (Discovery MR750; General Electric Healthcare, Waukesha, WI, USA) via a collaboration between researchers at the University of Calgary's Vascular Imaging Lab (Calgary, AB, Canada) and the Medical Image Computing Lab at the University of Campinas (São Paulo, Brazil). A total of 45 k-spaces were used, where every k-space had 170 sagittal cross-sectional T1-weighted MRI sequence images (256 × 256). In total, 25 k-spaces (4250 slices) were used during training, 10 k-spaces (1700 slices) were used for validation, and another 10 k-spaces (1700 slices) were used only for testing. In the training, validation, and testing phases, there were no repeat object slices. During training, the zero-filled undersampled k-space was used for network input. The fully sampled k-space and corresponding fully sampled images were used as the network target outputs. To train and test each network, two-dimensional Gaussian undersampling sequences were employed. Simulation analyses were conducted using four-, five-, and six-fold acceleration factors (AFs).

Data Augmentation
DL models require large amounts of training data to avoid overfitting and underfitting issues. However, it can be expensive and time-consuming to gather well-annotated medical data. DA [48] is frequently used in DL to enhance the size and heterogeneity of the training set and thus significantly improve efficiency while decreasing generalization errors in DL-based models. DA strategies are less popular for image reconstruction applications and are also considerably more challenging due to the extensive measurements (i.e., complex data) involved [49]. By providing fully sampled complex MRI scans, augmented training data can be generated that include the undersampled k-space along with reference images and the associated k-space. Several augmentation processes were applied with the proposed RA-CNN method; their visual effects on the augmented k-spaces and corresponding images are shown in Figure 5. Notably, the field strengths of the MRI scanners used for acquisition may have differed. DA affects the ability to generalize to new MRI scanner models (i.e., those not available during training). On unknown scanner sequences, DA can enhance reconstruction quality.

Evaluation Metrics
Network performance was evaluated using three parameters: the structural similarity index measure (SSIM) [50], normalized root mean square error (NRMSE), and peak signal-to-noise ratio (PSNR). The SSIM perceptual index measures the similarity between two images by comparing the reciprocal dependencies between neighboring pixels with respect to contrast, structural characteristics, and brightness. The SSIM between the desired image ($T$) and reconstructed image ($R$) is given by

$$\mathrm{SSIM}(T, R) = \frac{\left(2\mu_T\mu_R + c_1\right)\left(2\sigma_{TR} + c_2\right)}{\left(\mu_T^2 + \mu_R^2 + c_1\right)\left(\sigma_T^2 + \sigma_R^2 + c_2\right)},$$

where $\mu_T$ and $\mu_R$ are the average values of $T$ and $R$, respectively, $\sigma_T^2$ and $\sigma_R^2$ represent the respective pixel variance values, and $\sigma_{TR}$ is the covariance value. $c_1$ and $c_2$ are constants that stabilize the division. In the standard SSIM definition [50], they are

$$c_1 = (k_1 L)^2, \qquad c_2 = (k_2 L)^2,$$

where $L$ is the dynamic range of the pixel values, with $k_1 = 0.01$ and $k_2 = 0.03$ by default. NRMSE compares pixel disparities between the network predictions and reference values; it is the root mean square error, $\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(T_i - R_i\right)^2}$, normalized by the scale of the reference image, where $T_i$ and $R_i$ represent the fully sampled and reconstructed images, respectively, and $N$ represents the size of the images. PSNR measures the relationship between the signal's peak potential power and the noise that degrades fidelity:

$$\mathrm{PSNR} = 10\log_{10}\frac{\max(T)^2}{\frac{1}{N}\sum_{i=1}^{N}\left(T_i - R_i\right)^2}.$$

These metrics were chosen because they are frequently applied for image reconstruction assessment. Higher SSIM and PSNR values denote a better outcome, whereas smaller NRMSE values indicate better images.
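For reference, the three metrics can be computed with a few lines of NumPy. This is a hedged sketch rather than the evaluation code used in the study: the SSIM below is the single-window (global) form of the formula, whereas library implementations usually average it over local windows; the NRMSE normalization by the root mean square of the reference and the constants `k1 = 0.01`, `k2 = 0.03` are assumptions based on common defaults.

```python
import numpy as np

def ssim_global(t, r, k1=0.01, k2=0.03):
    """SSIM evaluated once over the whole image (no sliding window)."""
    L = t.max() - t.min()                      # dynamic range of the reference
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_t, mu_r = t.mean(), r.mean()
    cov = ((t - mu_t) * (r - mu_r)).mean()
    num = (2 * mu_t * mu_r + c1) * (2 * cov + c2)
    den = (mu_t**2 + mu_r**2 + c1) * (t.var() + r.var() + c2)
    return num / den

def nrmse(t, r):
    """RMSE divided by the RMS of the reference (assumed normalization)."""
    return np.sqrt(np.mean((t - r) ** 2)) / np.sqrt(np.mean(t ** 2))

def psnr(t, r):
    """Peak signal-to-noise ratio in dB, using the reference maximum as the peak."""
    return 10 * np.log10(t.max() ** 2 / np.mean((t - r) ** 2))

t = np.array([[0.0, 1.0], [0.5, 0.25]])   # toy reference image with peak value 1
r = t + 0.1                                # toy reconstruction with a constant error
scores = (ssim_global(t, r), nrmse(t, r), psnr(t, r))
```

In this toy case the mean squared error is 0.01 and the peak is 1, so the PSNR is approximately 20 dB; an identical pair of images gives an SSIM of 1 and an NRMSE of 0.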

Loss Function
The loss function assesses the difference between the fully sampled and reconstructed data. The main objective of the proposed method is to reduce the value of this loss function; a smaller difference between the fully sampled and reconstructed images indicates more efficient reconstruction. Reconstruction performance can be enhanced using an appropriate loss function that provides precise gradient data for network training. The weighted sum of the NRMSE in each domain, averaged over the training instances, was used as the loss function to compute frequency and pixel-wise disparities. In this loss, $K_i$ and $\hat{K}_i$ represent the target and reconstructed k-spaces in the sensor domain, respectively; $T_i$ and $R_i$ denote the target and reconstructed images in the spatial domain of the $i$-th instance in the training dataset, respectively; and $M$ is the number of training instances. With our method, the initial weight ($\omega$) was 0.001; this value was updated continuously to minimize loss.
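One possible form of this dual-domain loss is sketched below. The exact weighting used in the paper is not recoverable from the text, so the convex combination `omega * k-space term + (1 - omega) * image term` and the norm-based NRMSE are assumptions made purely for illustration; the function names are hypothetical.

```python
import numpy as np

def nrmse(target, pred):
    """Error norm divided by the target norm (assumed normalization; works for complex data)."""
    return np.linalg.norm(target - pred) / np.linalg.norm(target)

def dual_domain_loss(K, K_hat, T, R, omega=0.001):
    """Mean over M instances of a weighted sum of k-space and image NRMSE (assumed weighting)."""
    M = len(K)
    total = 0.0
    for i in range(M):
        total += omega * nrmse(K[i], K_hat[i]) + (1.0 - omega) * nrmse(T[i], R[i])
    return total / M

# Toy batch: perfect k-space estimate, completely wrong image estimate.
K = [np.ones((2, 2), dtype=complex)]
K_hat = [np.ones((2, 2), dtype=complex)]
T = [np.ones((2, 2))]
R = [np.zeros((2, 2))]
loss = dual_domain_loss(K, K_hat, T, R)
```

With `omega = 0.001`, the image term dominates, so in this toy batch the loss is 0.999 even though the k-space estimate is exact.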

Experimental Setup
A Windows 10 Pro 64-bit computer with an Intel Core i7-9800X 3.80 GHz processor, 128 GB of RAM, and an NVIDIA GeForce RTX 2080 Ti graphics processing unit was used for model training and testing. The TensorFlow v2.4 and Keras v2.4 open-source libraries were utilized to implement the models in Python 3.8 and PyCharm environments. In the RA-CNN, the loss function was minimized with the Adam optimizer using momentum parameters of $\beta_1 = 0.9$ and $\beta_2 = 0.999$. The initial learning rate was $10^{-3}$ but was subsequently reduced to $10^{-7}$. A small batch size (n = 16) was used, and 1500 epochs were employed for training the proposed RA-CNN. The proposed network's training and validation losses are illustrated in Figure 6; the measurements show that the normalizing effect of the residual connection reduces the probability of overfitting.

Results and Discussion
The effectiveness of our proposed RA-CNN was compared with direct mapping and single- and multi-domain networks. Unet [36], the de-aliasing generative adversarial network (DAGAN) [13], RefineGAN [14], the projection-based cascade Unet (PBCU) [51], and the fully dense attention (FDA)-CNN [52] are single-domain techniques, while the DC-CNN [34], KIKI-net [29], W-net [35], the hybrid cascade [30], and the dual-encoder Unet [53] are multi-domain approaches. The GAN-based DAGAN uses a residual Unet architecture for the generator and combines adversarial and innovative content losses. RefineGAN measures cyclic loss using a residual Wasserstein GAN [54]. PBCU uses five consecutive Unet blocks, and the FDA-CNN employs attention mechanisms with a densely connected CNN. The DC-CNN utilizes multiple cascading CNNs for MRI image enhancement. KIKI-net uses four Unet blocks, which sequentially operate in k-space, image space, k-space, and image space. W-net uses two Unet blocks, and the hybrid cascade uses six Unet blocks. The dual-encoder Unet applies decomposing automated transform by manifold approximation (dAUTOMAP) [28] and two encoders within the Unet framework. The implementation and hyperparameters of each approach were based on its original publication. Simulations were conducted to compare the performance of the proposed RA-CNN against the abovementioned state-of-the-art techniques; SSIM, NRMSE, and PSNR values were evaluated through numerical analysis as performance metrics.
An AF of 4 was used to train the networks, which were evaluated using four-, five-, and six-fold AFs (AFs 4-6, respectively). Visual assessments of the tested k-spaces and reconstructed images for AFs 4 and 5 are shown in Figures 7 and 8, respectively. In the figures, the first row shows the fully sampled reference image (a), undersampled k-space (b), and reconstructed undersampled image (c). The second row shows the images reconstructed by the Unet (d), W-net (e), and RA-CNN (f) networks using the undersampled k-space. The undersampled image was generated using an inverse Fourier transform (IFT) of the zero-filled k-space; the image exhibited unnatural, inconsistent artifacts and blurred edges. The single-domain network Unet improved the quality of the initial zero-filling image. The hybrid W-net performed better than Unet and improved the image quality and quantitative values. Moreover, the RA-CNN reconstructed the target image from the undersampled k-space more accurately than the single- and multi-domain networks. Specifically, the RA-CNN focused on the features essential for diagnosis during image reconstruction and performed better than the other techniques in terms of the elimination of artifacts. The RA-CNN produced superior quantitative values for a specific slice (No. 100) of the reconstructed image compared to the single- and multi-domain networks.
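The zero-filling baseline in panel (c) can be sketched in a few lines of NumPy. The sampling pattern used here is an assumption and not necessarily the one used in the experiments: whole k-space rows are retained, a small block of central rows is always kept, and the remaining rows are drawn at random so that 1/AF of all rows are sampled. The function names and the `center_lines` parameter are hypothetical.

```python
import numpy as np

def undersample_kspace(image, af=4, center_lines=8, seed=0):
    """Return (undersampled k-space, mask) for a 2D image.

    Assumed pattern: random Cartesian row sampling with a fully sampled
    central band; rows that are not sampled are set to zero.
    """
    rng = np.random.default_rng(seed)
    kspace = np.fft.fftshift(np.fft.fft2(image))  # low frequencies at center
    n_rows = image.shape[0]
    keep = np.zeros(n_rows, dtype=bool)
    c = n_rows // 2
    keep[c - center_lines // 2 : c + center_lines // 2] = True
    n_extra = max(n_rows // af - center_lines, 0)
    candidates = np.flatnonzero(~keep)
    keep[rng.choice(candidates, size=n_extra, replace=False)] = True
    mask = np.broadcast_to(keep[:, None], image.shape)
    return kspace * mask, mask

def zero_filled_recon(kspace_us):
    """Magnitude image from the inverse FFT of zero-filled k-space."""
    return np.abs(np.fft.ifft2(np.fft.ifftshift(kspace_us)))

# Synthetic stand-in for a fully sampled slice.
image = np.random.default_rng(1).random((64, 64))
for af in (4, 5, 6):
    k_us, mask = undersample_kspace(image, af=af)
    recon = zero_filled_recon(k_us)
    err = np.sqrt(np.mean((image - recon) ** 2))
    print(f"AF {af}: sampled fraction {mask.mean():.3f}, RMSE {err:.4f}")
```

The zero-filled image serves as the network input in the image domain and as the baseline that the learned reconstructions are expected to improve on.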
The mean and standard deviation of the SSIM, NRMSE, and PSNR values obtained using the various state-of-the-art methods for AFs 4 and 5 are shown in Table 1. Along with these quantitative results, clinical parameters such as edge sharpness, motion fidelity, artifacts, image distortion, and diagnostic score are essential for accurately diagnosing the reconstructed images. To assess statistically significant changes, we utilized one-way analysis of variance (ANOVA) and post hoc paired t-tests. Statistical significance was defined as p < 0.01. The one-way ANOVA revealed statistically significant differences (p < 0.01) among all measurements and acceleration parameters. Multi-domain networks generated more accurate quantitative results than single-image-domain networks. These observations and numerical analyses showed that the proposed RA-CNN generated the best SSIM and PSNR values, although that was not the case for the NRMSE values. The hybrid cascade approach yielded better NRMSE values under both sampling rates, although the difference in performance from the RA-CNN was very small in this respect. The paired t-tests revealed that the RA-CNN outperformed the other approaches in these assessments.
Figure 9 depicts a fully sampled slice (No. 100) (a) and its corresponding undersampled k-space (b) for AF 6. The zero-filling image (c) contains severe noise and blur artifacts. Unet (d) showed a slight improvement in image quality, and the hybrid W-net (e) performed better than Unet. The output image of the RA-CNN was of higher quality than the images of the other models, and the details were better restored. Such artifact-free, high-resolution reconstruction is essential for a number of medical image post-processing activities such as classification and segmentation.
The mean and standard deviation of the SSIM, NRMSE, and PSNR values obtained using the various state-of-the-art methods for AF 6 are shown in Table 2, along with the number of parameters for each network. According to these data, the RA-CNN outperformed all of the other networks, followed by the FDA-CNN with the AG. The cascade CNN showed higher values for the measured metrics. Compared to the standard Unet and cascade CNN models, the proposed RA-CNN improved the mean PSNR value by 9.87 and 6.11 dB, respectively. The paired t-tests revealed that the RA-CNN outperformed the other approaches at this acceleration factor. The direct-mapping dAUTOMAP has only 0.16 million (M) parameters. On the other hand, the FDA-CNN and dual-encoder Unet have almost 1 M trainable parameters. Attention mechanism-based models require fewer parameters than the more sophisticated Unet-based cascade networks; specifically, Unet, W-net, and PBCU have approximately 3.13 M, 1.13 M, and 3.15 M parameters, respectively, whereas the RA-CNN has 0.68 M parameters. Every MR examination involves a compromise between scan time and image quality, so the examination procedure and its sequence parameters must be optimized for the organ and disease in question.
The results demonstrate that the RA-CNN was superior in terms of generating images free of aliasing artifacts and blur. Our multi-domain approach uses two residual Unet architectures with AGs and maintains symmetric encoders and decoders on either side of the network. The first advantage of this framework is that it establishes long-range connections between the encoder and equivalent decoder parts, thus allowing for the merging of various pieces of hierarchical information from the encoder and decoder and thereby increasing the network's precision and scalability. Unet, W-net, and PBCU have 23, 43, and 74 convolutional layers, respectively, whereas the RA-CNN has 81 convolution and deconvolution layers. The vanishing gradient issue in a deep neural network such as the RA-CNN poses a challenge with respect to the training results obtained during backpropagation; however, the shallow residual connection addresses this issue. Model performance is enhanced by the residual units, which directly convey features from the early to the end stage of convolution. As demonstrated in Figure 6, the regularization effect of residual connections decreases the risk of overfitting the training data. However, if the steps used for obtaining consistent cascade data are applied, computing time and costs increase. AGs merge lower and higher spatial data to identify meaningful features during a single computation. Consequently, the RA-CNN model requires fewer parameters than the single- and dual-domain networks. The single-domain models require 0.6-1.0 s for each slice reconstruction. The cascade dual-domain methods require > 1 s per image reconstruction. The proposed RA-CNN generates better images within an average of 0.6 s. Therefore, computation time and cost are reduced. Notably, the results showed that the AG-based methods performed better than the other single and cascade networks at higher AFs. Although the method has so far been tested only on brain data at different sampling rates, and not on other types of MRI datasets involving areas such as the knee or abdomen, the results are still significant.

Conclusions
The proposed dual-domain RA-CNN reconstructs MRI images from sparsely sampled k-space data using two neural networks. The first CNN, in the sensor domain, predicts unacquired frequencies; the second CNN, in the image domain, is then applied for image enhancement. Each network thus has a distinct impact on MRI reconstruction. Edge content and geometry are restored more effectively from undersampled k-space using this multi-domain CNN. As a CNN is used directly to retrieve the sensor data, some lower frequencies might be recoverable. Consequently, this method is capable of extracting realistic visual features and reconstructing images that closely match the reference images. Since the visual characteristics are preserved, radiologists can interpret data accurately and rapidly. Residual connections significantly enhance feature reuse and network data flow. Moreover, AGs mix lower and higher spatial data to identify valuable features while using fewer parameters than the other sophisticated Unet-based methods. Although network training takes a long time, images can be generated rapidly after training.
We show that the aggregation of two domains has an impact on MRI reconstruction performance. In end-to-end reconstruction based on residual and attention mechanisms, the RA-CNN performed better than several alternative single- and multi-domain networks, as reflected in the PSNR and SSIM values under various sampling rates. In future research, we will apply our strategy to interactive temperature-based MRI reconstruction for real-time diagnostics and therapy.