# Dual-Domain Reconstruction Network Incorporating Multi-Level Wavelet Transform and Recurrent Convolution for Sparse View Computed Tomography Imaging


## Abstract


## 1. Introduction

- (1) A new CT reconstruction model is proposed that integrates multi-level wavelet transform and recurrent convolution units. To accurately correct the errors in each frequency component of the interpolated sinograms and remove the directional artifacts in each frequency component of the CT images, the multi-level wavelet transform decomposes the sinograms and CT images into distinct frequency components, which are then recovered individually through separate network branches.
- (2) To capture the global redundant texture information of sinograms and the global artifact features of CT images in distinct frequency components, a recurrent convolution unit (RCU) embedded with convolutional long short-term memory (Conv-LSTM) is proposed. The RCU consists of a basic feature extraction module, "3 × 3 convolution-batch normalization-ReLU activation" (CBR), and a Conv-LSTM. The CBR integrates the output features of the previous layer and adjusts the number of channels, while the Conv-LSTM weights the hidden and memory state features output by the previous RCU layer into the output of the current layer through forget gate, input gate, and output gate operations, modeling long-distance dependencies between feature maps in different layers as well as the flow of contextual texture information across distinct frequency dimensions.
- (3) In the high-frequency component recovery branch, an improved multi-level frequency feature normalization fusion (MFNF) block is designed to assist the recovery of high-frequency components by aggregating low-frequency components through a self-attention-based normalization strategy. The recovery of high-frequency feature information is further enhanced by an adaptive channel soft thresholding function (ACSTF) that filters out noise and useless features along the channel dimension.
- (4) In the image-domain loss function, an additional edge loss regularization term based on the Laplacian of Gaussian (LoG) is designed to improve the fidelity and authenticity of high-frequency edge details and mitigate the structural blurring caused by the mean squared error (MSE) loss function.

## 2. Materials and Methods

#### 2.1. Principles of CT Reconstruction

According to the Beer–Lambert law, when an X-ray beam with initial photon intensity $I_{0}$ penetrates the object, the final detected photon intensity $I$ can be expressed as

$$I = I_{0}\exp\left(-\int_{L}\mu(l)\,\mathrm{d}l\right),$$

where $\mu$ is the linear attenuation coefficient of the object and $L$ is the ray path. CT reconstruction aims to recover the attenuation map $\mu$ from the measured line integrals (the sinogram).
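To make the forward model concrete, the following sketch simulates sparse-view projections of a phantom and reconstructs them with FBP. It uses scikit-image (the `filter_name` argument assumes version ≥ 0.19) and is an illustration only, not part of the authors' pipeline.

```python
import numpy as np
from skimage.data import shepp_logan_phantom
from skimage.transform import radon, iradon, resize

# Discretized line integrals of the attenuation map mu (the sinogram).
image = resize(shepp_logan_phantom(), (256, 256))
theta = np.linspace(0.0, 180.0, 64, endpoint=False)  # 64 sparse projection views
sinogram = radon(image, theta=theta)

# Filtered back-projection; with only 64 views, streak artifacts appear.
fbp_recon = iradon(sinogram, theta=theta, filter_name='ramp')
```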

#### 2.2. Network Structure

#### 2.2.1. Discrete Wavelet Transform (DWT)

In the one-level DWT, four filters ($f_{LL}$, $f_{LH}$, $f_{HL}$, $f_{HH}$) are convolved with the image $x$ and the results are downsampled by a factor of two to obtain four sub-band components, which are defined as [17]: the low-frequency component $x_{LL}^{1} = (f_{LL} \otimes x){\downarrow}_{2}$, the horizontal high-frequency component $x_{LH}^{1} = (f_{LH} \otimes x){\downarrow}_{2}$, the vertical high-frequency component $x_{HL}^{1} = (f_{HL} \otimes x){\downarrow}_{2}$, and the diagonal high-frequency component $x_{HH}^{1} = (f_{HH} \otimes x){\downarrow}_{2}$. The two-level wavelet transform continues by convolving the four filters with the low-frequency component $x_{LL}^{1}$ and downsampling the results to obtain the corresponding wavelet sub-band components. In this work, a two-level wavelet transform is used to decompose the images, and the differentiable DWT and IDWT modules are implemented in PyTorch through the "pytorch_wavelets" library [36]. As illustrated in Figure 1, the inputs to the three branches of the network are the two-level low-frequency component image $x_{LL}^{2}$ (LV2-freq$_{L}$), the two-level high-frequency component images {$x_{LH}^{2}$, $x_{HL}^{2}$, $x_{HH}^{2}$} (LV2-freq$_{H}$), and the one-level high-frequency component images {$x_{LH}^{1}$, $x_{HL}^{1}$, $x_{HH}^{1}$} (LV1-freq$_{H}$), respectively.
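A minimal sketch of this two-level decomposition with the cited "pytorch_wavelets" library [36]; tensor shapes are written out for a single-channel 256 × 256 input.

```python
import torch
from pytorch_wavelets import DWTForward, DWTInverse

xfm = DWTForward(J=2, wave='haar', mode='zero')   # two-level Haar DWT
ifm = DWTInverse(wave='haar', mode='zero')

x = torch.randn(1, 1, 256, 256)
yl, yh = xfm(x)
# yl:    (1, 1, 64, 64)        two-level low-frequency component (LV2-freq_L)
# yh[0]: (1, 1, 3, 128, 128)   one-level LH/HL/HH components (LV1-freq_H)
# yh[1]: (1, 1, 3, 64, 64)     two-level LH/HL/HH components (LV2-freq_H)
x_rec = ifm((yl, yh))          # differentiable IDWT reconstructs the input
```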

#### 2.2.2. Recurrent Convolution Unit (RCU)

The $n_{i}$ and $n_{o}$ in Figure 2a are multiples of the number of input and output channels, respectively, and the number of output channels of the RCU is adjusted by a 3 × 3 convolution (Conv). The specific values of $n_{i}$ and $n_{o}$ for the RCU in different layers depend on the multiples of the number of input and output channels (C) for the RCU shown in Figure 1 and Figure 2b. To reduce the number of parameters in the Conv-LSTM, we replace the original 3 × 3 Conv with the combination of a 1 × 1 Conv and a 3 × 3 depthwise separable convolution (DConv). Assuming that the output feature after the CBR operation is $x_{t-1}$ and the hidden and memory states output by the RCU in the previous layer are $h_{t-1}$ and $c_{t-1}$, we first concatenate $h_{t-1}$ with $x_{t-1}$ along the channel dimension to obtain $(h_{t-1}, x_{t-1})$. Subsequently, the forget gate $f_{t}$, input gate $i_{t}$, and output gate $o_{t}$ operations in the Conv-LSTM can be expressed as

$$
\begin{aligned}
f_{t} &= \sigma\big(W_{f} \otimes (h_{t-1}, x_{t-1}) + b_{f}\big),\\
i_{t} &= \sigma\big(W_{i} \otimes (h_{t-1}, x_{t-1}) + b_{i}\big),\\
o_{t} &= \sigma\big(W_{o} \otimes (h_{t-1}, x_{t-1}) + b_{o}\big),\\
c_{t} &= f_{t} \odot c_{t-1} + i_{t} \odot \tanh\big(W_{c} \otimes (h_{t-1}, x_{t-1}) + b_{c}\big),\\
h_{t} &= o_{t} \odot \tanh(c_{t}),
\end{aligned}
$$

where $h_{t}$ is the output feature ($x_{t}$) and hidden state of the RCU at the current moment. The final weighted memory state $c_{t}$, combining information from both the previous and current moments, exerts a proportional influence on the current moment's output through the control of $o_{t}$. As the output of $o_{t}$ approaches 1, more information about past features is preserved in the output of the RCU. This mechanism allows the RCU to model long-distance dependencies between feature maps in different layers. Specifically, given that the initial input sinogram or CT image of the network is a single-frame image rather than a continuous time series, we use only one Conv-LSTM cell in the RCU, corresponding to a time step of 1. This also prevents the excessive growth in model parameters and running time that would result from stacking too many Conv-LSTM cells in a single RCU. Further, the structure of the RCU-Att-Resblock is shown in Figure 2b; it includes four RCU blocks and a self-attention block [38], whose outputs are connected by residual learning to enhance this branch's ability to extract low-frequency features.
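To make the gate operations above concrete, here is a minimal, self-contained Conv-LSTM cell in PyTorch, with the gate convolution factored into a 1 × 1 Conv followed by a 3 × 3 depthwise convolution as described in the text. Channel sizes and initialization are illustrative assumptions rather than the authors' exact RCU configuration.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal Conv-LSTM cell; the gate convolutions are factored into a
    1x1 Conv followed by a 3x3 depthwise convolution to save parameters."""
    def __init__(self, in_ch, hid_ch):
        super().__init__()
        gates = 4 * hid_ch
        self.gates = nn.Sequential(
            nn.Conv2d(in_ch + hid_ch, gates, kernel_size=1),
            nn.Conv2d(gates, gates, kernel_size=3, padding=1, groups=gates),
        )
        self.hid_ch = hid_ch

    def forward(self, x, state=None):
        if state is None:                       # no previous RCU layer
            h = x.new_zeros(x.size(0), self.hid_ch, *x.shape[2:])
            c = torch.zeros_like(h)
        else:
            h, c = state                        # (h_{t-1}, c_{t-1})
        z = self.gates(torch.cat([h, x], dim=1))   # acts on (h_{t-1}, x_{t-1})
        i, f, o, g = z.chunk(4, dim=1)
        i, f, o = torch.sigmoid(i), torch.sigmoid(f), torch.sigmoid(o)
        c = f * c + i * torch.tanh(g)              # memory state c_t
        h = o * torch.tanh(c)                      # hidden state / output h_t
        return h, (h, c)
```

With a single cell and a time step of 1, the previous RCU layer's $(h_{t-1}, c_{t-1})$ is passed in as `state`, matching the description above.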

#### 2.2.3. Multi-Level Frequency Feature Normalization Fusion (MFNF) Block

The $n_{i}$ and $n_{o}$ in Figure 3a are the multiples of the number of input and output channels, respectively. The specific value of $n_{i}$ depends on the multiple of the number of input channels (C) for the low-frequency features in the MFNF block shown in Figure 1, and $n_{o}$ depends on the multiple of the number of output channels (C) in the MFNF block. The $n$ in Figure 3b,c is the multiple of the number of channels of the low-frequency feature ($n_{i}$ in Figure 3a) or of the residual fusion features of the low-frequency features and high-frequency components ($n_{o}$ in Figure 3a).

The output of the SNFM can be expressed as

$$x_{H} = \gamma_{H} \odot \frac{x_{L} - \mu_{L}}{\sigma_{L}} + \beta_{H},$$

where $x_{H}$ is the output of the SNFM, $x_{L}$ is the normalized low-frequency feature of the input, and $\mu_{L}$, $\sigma_{L}$ are the mean and standard deviation of $x_{L}$ along the channel dimension. $\gamma_{H}$, $\beta_{H}$ are the normalized modulation parameters corresponding to the outputs of the two 3 × 3 Conv.
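A minimal sketch of this modulation: the low-frequency feature is positionally normalized over the channel dimension [42] and re-modulated by $\gamma_{H}$ and $\beta_{H}$ maps produced by two 3 × 3 convolutions. The tensor `feat` fed to those convolutions, and the omission of the self-attention part of the SNFM, are simplifying assumptions.

```python
import torch
import torch.nn as nn

class SNFMSketch(nn.Module):
    """Positional normalization of the low-frequency feature (mean/std over
    the channel dimension) followed by gamma/beta modulation predicted by
    two 3x3 convolutions; the self-attention part is omitted here."""
    def __init__(self, ch):
        super().__init__()
        self.to_gamma = nn.Conv2d(ch, ch, 3, padding=1)
        self.to_beta = nn.Conv2d(ch, ch, 3, padding=1)

    def forward(self, x_low, feat):
        mu = x_low.mean(dim=1, keepdim=True)            # mu_L over channels
        sigma = x_low.std(dim=1, keepdim=True) + 1e-5   # sigma_L
        x_norm = (x_low - mu) / sigma
        gamma, beta = self.to_gamma(feat), self.to_beta(feat)
        return gamma * x_norm + beta                    # x_H
```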

#### 2.2.4. Loss Function

where $s$ and $s_{ref}$ denote the sinograms restored by the SDNet and the full-view reference sinograms, respectively. In the IDNet, the recovery of CT images is constrained using the MSE loss function

$$\mathcal{L}_{MSE} = \frac{1}{N}\sum_{j=1}^{N}\left(x_{j} - x_{ref,j}\right)^{2},$$

where $x$ and $x_{ref}$ denote the CT images reconstructed by the IDNet and the reference CT images, respectively. Further, to alleviate the tendency of the MSE loss function to over-smooth the reconstructed structures, an additional edge loss based on the Laplacian of Gaussian (LoG) [43] and the MAE loss is designed as a regularization term to improve the fidelity and authenticity of high-frequency component details, which is defined as

$$\mathcal{L}_{edge} = \left\| \left({\nabla}^{2}{G}_{\sigma}\right) \otimes x - \left({\nabla}^{2}{G}_{\sigma}\right) \otimes x_{ref} \right\|_{1},$$

where $x$ and $x_{ref}$ denote the reconstructed image and the reference image, respectively, ⊗ is the convolution operation, and ${\nabla}^{2}{G}_{\sigma}$ represents the second-order derivative of the Gaussian kernel. The second-order derivative of a 2D Gaussian distribution function with mean 0 and standard deviation σ can be expressed as

$${\nabla}^{2}{G}_{\sigma}(u,v) = -\frac{1}{\pi\sigma^{4}}\left(1-\frac{u^{2}+v^{2}}{2\sigma^{2}}\right)e^{-\frac{u^{2}+v^{2}}{2\sigma^{2}}}.$$
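A hedged PyTorch sketch of this edge loss: `log_kernel` discretizes the LoG expression above, and the loss is the MAE between the LoG responses of the reconstruction and the reference. The kernel size, σ, and the zero-mean correction are illustrative choices, not values given in the paper.

```python
import math
import torch
import torch.nn.functional as F

def log_kernel(sigma=1.0, size=7):
    """Discretized Laplacian-of-Gaussian kernel."""
    ax = torch.arange(size, dtype=torch.float32) - size // 2
    xx, yy = torch.meshgrid(ax, ax, indexing='ij')
    r2 = xx ** 2 + yy ** 2
    k = -(1.0 / (math.pi * sigma ** 4)) * (1 - r2 / (2 * sigma ** 2)) \
        * torch.exp(-r2 / (2 * sigma ** 2))
    # Zero-meaning the discretized kernel (a common practical adjustment).
    return (k - k.mean()).view(1, 1, size, size)

def edge_loss(x, x_ref, sigma=1.0):
    """MAE between LoG responses; x and x_ref are (N, 1, H, W) tensors."""
    k = log_kernel(sigma).to(x.device)
    pad = k.shape[-1] // 2
    return F.l1_loss(F.conv2d(x, k, padding=pad),
                     F.conv2d(x_ref, k, padding=pad))
```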

## 3. Results and Analysis

#### 3.1. Dataset and Simulation Setup

where G(m, var) is the Gaussian noise with mean m = 0 and variance var = 0.05, and P(s) is the Poisson noise with average photon count $I_{0} = 5 \times 10^{6}$. This noise simulation aims to replicate the noise inherent in the sensor-generated data during the acquisition of projection data.
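Under the assumption that the clean sinogram stores line integrals that map to photon counts through the Beer–Lambert relation, the mixed-noise simulation can be sketched as follows; the authors' exact normalization may differ.

```python
import numpy as np

def add_mixed_noise(sino, I0=5e6, gauss_var=0.05, rng=np.random.default_rng(0)):
    """Poisson noise on the detected photon counts plus additive Gaussian noise."""
    counts = I0 * np.exp(-sino)                        # expected photon counts
    noisy_counts = rng.poisson(counts).clip(min=1)     # Poisson measurement noise
    noisy_sino = -np.log(noisy_counts / I0)            # back to line integrals
    noisy_sino += rng.normal(0.0, np.sqrt(gauss_var), sino.shape)  # Gaussian noise
    return noisy_sino
```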

The model was trained with the Adam optimizer, with $\beta_{1}$ and $\beta_{2}$ set to 0.9 and 0.999, respectively. The initial learning rate was set to 0.001 and gradually reduced to 0.0005 using a multi-step decay strategy. Training ran for 25 epochs with a mini-batch size of 1. All experiments were conducted on a server with a 24 GB NVIDIA RTX 3090 GPU, an Intel(R) Xeon(R) Gold 5218 CPU @ 2.10 GHz, and 256 GB of RAM.
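In PyTorch, the stated configuration corresponds to something like the sketch below; the decay milestone is an assumption, since the text gives only the initial and final learning rates, and the network is replaced by a stand-in module.

```python
import torch
import torch.nn as nn

model = nn.Conv2d(1, 1, 3, padding=1)   # stand-in for the dual-domain network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999))
# Multi-step decay from 1e-3 toward 5e-4; the milestone epoch is assumed.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[15], gamma=0.5)

for epoch in range(25):                 # 25 epochs, mini-batch size 1 per the text
    # ... one pass over the training set goes here ...
    scheduler.step()
```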

#### 3.2. Qualitative Evaluation

The relative root mean squared error (rRMSE) is defined as

$$\mathrm{rRMSE} = \sqrt{\frac{\sum_{j}\left(x_{j}-x_{ref,j}\right)^{2}}{\sum_{j}x_{ref,j}^{2}}},$$

where $x$ and $x_{ref}$ denote the reconstructed image and the reference image, respectively. A smaller rRMSE indicates a smaller error between the reconstructed CT image and the reference image. Figure 7 illustrates the absolute difference images of the reconstructions obtained using different methods relative to the reference CT images in Figure 4(i)–(iii), Figure 5(i)–(iii), and Figure 6(i)–(iii), respectively. With fewer views, the absolute difference images of every method degrade, but our method produces the smallest differences, so its reconstructions are closest to the original images. For both 128 views and 64 views, our method achieved the lowest rRMSE values on the different human tissues.

#### 3.3. Quantitative Evaluation

The SSIM between the reconstructed image and the reference image is defined as

$$\mathrm{SSIM}(x, x_{ref}) = \frac{\left(2\mu_{x}\mu_{x_{ref}} + c_{1}\right)\left(2\sigma_{x x_{ref}} + c_{2}\right)}{\left(\mu_{x}^{2} + \mu_{x_{ref}}^{2} + c_{1}\right)\left(\sigma_{x}^{2} + \sigma_{x_{ref}}^{2} + c_{2}\right)},$$

where $x$ and $x_{ref}$ represent the reconstructed image and the reference image, respectively, $\mu$ and $\sigma$ are the mean and standard deviation of the image, and $\sigma_{x x_{ref}}$ is the covariance between the two images. $c_{1} = (0.01 \times R)^{2}$ and $c_{2} = (0.03 \times R)^{2}$ are two constant terms that prevent the denominator from equaling 0, and $R$ is the range of pixel values of the image. An SSIM closer to 1 means that the reconstructed image is of better quality. The RMSE directly reflects the pixel-wise distance error between the reconstructed image and the reference image,

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{j=1}^{N}\left(x_{j} - x_{ref,j}\right)^{2}},$$

and the PSNR is defined as

$$\mathrm{PSNR} = 20\log_{10}\frac{\mathrm{MAX}(x, x_{ref})}{\mathrm{RMSE}},$$

where $\mathrm{MAX}(x, x_{ref})$ is the maximum value between the image and the reference image. A higher PSNR means that the image has less noise and is of higher quality.
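For reference, here are minimal NumPy implementations of the metrics as defined above (including the rRMSE of Section 3.2). This SSIM is a single-window, whole-image version; standard toolkits such as `skimage.metrics.structural_similarity` average over local windows instead.

```python
import numpy as np

def rrmse(x, ref):
    # Relative root mean squared error (Section 3.2).
    return np.sqrt(np.sum((x - ref) ** 2) / np.sum(ref ** 2))

def rmse(x, ref):
    return np.sqrt(np.mean((x - ref) ** 2))

def psnr(x, ref):
    # MAX(x, x_ref): maximum value between the image and the reference.
    return 20 * np.log10(max(x.max(), ref.max()) / rmse(x, ref))

def ssim_global(x, ref):
    R = ref.max() - ref.min()                    # range of pixel values
    c1, c2 = (0.01 * R) ** 2, (0.03 * R) ** 2    # stabilizing constants
    mu_x, mu_r = x.mean(), ref.mean()
    var_x, var_r = x.var(), ref.var()
    cov = np.mean((x - mu_x) * (ref - mu_r))
    return ((2 * mu_x * mu_r + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_r ** 2 + c1) * (var_x + var_r + c2))
```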

#### 3.4. Robustness of Noise

To evaluate robustness to noise, Poisson noise with average photon counts $I_{0} = 1 \times 10^{6}$, $5 \times 10^{5}$, and $1 \times 10^{5}$ and Gaussian noise with mean m = 0 and variance var = 0.05 are added to the sinograms, respectively. Subsequently, without retraining the model, the previously trained models are used directly to reconstruct the sinograms with the varying intensity levels of mixed noise. Figure 9 illustrates the reconstruction results from the various methods for different intensity levels of mixed noise with 64 projection views, including reconstructed CT images and enlarged ROIs.

When the Poisson noise level increases to $1 \times 10^{5}$, the area of structural blur increases for most methods, while both DDPTransformer and our method recover clearer structures. The enlarged ROI results in Figure 9(iv)–(vi) show that DuDoTrans produces many tiny spurious structures at a Poisson noise level of $1 \times 10^{5}$. MIST recovers better results at Poisson noise levels of $1 \times 10^{6}$ and $5 \times 10^{5}$, but its recovery performance decreases rapidly when the Poisson noise level increases to $1 \times 10^{5}$. In contrast, our method, DDPTransformer, and RegFormer still recover better results, and our method maintains the best qualitative results in all cases. Figure 10 shows the absolute difference images and rRMSE values of the reconstruction results obtained using the different methods. The results in Figure 10(iii) further highlight that the images reconstructed by our method lose less structural information and achieve the best rRMSE values, even at a Poisson noise level of $1 \times 10^{5}$.

Table 2 reports the quantitative evaluation results; the metrics of most methods degrade when mixed noise with a Poisson noise level of $1 \times 10^{5}$ is added. Although DDPTransformer and RegFormer exhibit lower reconstruction performance at Poisson noise levels of $1 \times 10^{6}$ and $5 \times 10^{5}$, they maintain more stable reconstruction results than the other DL-based methods, resulting in better quantitative evaluation metrics than MIST when reconstructing at a Poisson noise level of $1 \times 10^{5}$. Notably, our method consistently achieves the best quantitative results in all noise experiments with different intensity levels.

#### 3.5. Ablation Study

#### 3.6. Computational Cost

## 4. Discussion

## 5. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

- Brenner, D.J.; Hall, E.J. Computed tomography—An increasing source of radiation exposure. N. Engl. J. Med.
**2007**, 357, 2277–2284. [Google Scholar] [CrossRef] [PubMed] - Wang, G.; Yu, H.; De Man, B. An outlook on X-ray CT research and development. Med. Phys.
**2008**, 35, 1051–1064. [Google Scholar] [CrossRef] - Bevelacqua, J.J. Practical and Effective ALARA. Health Phys.
**2010**, 98, S39–S47. [Google Scholar] [CrossRef] [PubMed] - Abdoli, M.; Dierckx, R.A.; Zaidi, H. Metal artifact reduction strategies for improved attenuation correction in hybrid PET/CT imaging. Med. Phys.
**2012**, 39, 3343–3360. [Google Scholar] [CrossRef] [PubMed] - Pan, X.; Sidky, E.Y.; Vannier, M. Why do commercial CT scanners still employ traditional, filtered back-projection for image reconstruction? Inverse Probl.
**2009**, 25, 123009. [Google Scholar] [CrossRef] [PubMed] - Donoho, D.L. Compressed sensing. IEEE Trans. Inf. Theory
**2006**, 52, 1289–1306. [Google Scholar] [CrossRef] - Elad, M.; Aharon, M. Image Denoising via Sparse and Redundant Representations over Learned Dictionaries. IEEE Trans. Image Process.
**2006**, 15, 3736–3745. [Google Scholar] [CrossRef] [PubMed] - Rudin, L.I.; Osher, S. Total variation based image restoration with free local constraints. In Proceedings of the 1st IEEE International Conference on Image Processing (ICIP 1994), Austin, TX, USA, 13–16 November 1994; Volume 1, pp. 31–35. [Google Scholar]
- Sidky, E.Y.; Pan, X. Image reconstruction in circular cone-beam computed tomography by constrained, total-variation minimization. Phys. Med. Biol.
**2008**, 53, 4777–4807. [Google Scholar] [CrossRef] - Liu, Y.; Ma, J.; Fan, Y.; Liang, Z. Adaptive-weighted total variation minimization for sparse data toward low-dose X-ray computed tomography image reconstruction. Phys. Med. Biol.
**2012**, 57, 7923–7956. [Google Scholar] [CrossRef] - Zhou, X.; Li, C.; Rahaman, M.M.; Yao, Y.; Ai, S.; Sun, C.; Wang, Q.; Zhang, Y.; Li, M.; Li, X.; et al. A Comprehensive Review for Breast Histopathology Image Analysis Using Classical and Deep Neural Networks. IEEE Access
**2020**, 8, 90931–90956. [Google Scholar] [CrossRef] - Boveiri, H.R.; Khayami, R.; Javidan, R.; Mehdizadeh, A. Medical image registration using deep neural networks: A comprehensive review. Comput. Electr. Eng.
**2020**, 87, 106767. [Google Scholar] [CrossRef] - Akcay, S.; Breckon, T. Towards automatic threat detection: A survey of advances of deep learning within X-ray security imaging. Pattern Recognit.
**2022**, 122, 108245. [Google Scholar] [CrossRef] - Jin, K.H.; McCann, M.T.; Froustey, E.; Unser, M. Deep Convolutional Neural Network for Inverse Problems in Imaging. IEEE Image Process.
**2017**, 26, 4509–4522. [Google Scholar] [CrossRef] [PubMed] - Huang, G.; Liu, Z.; Maaten, L.V.D.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017; Volume 1, pp. 2261–2269. [Google Scholar]
- Zhang, Z.; Liang, X.; Dong, X.; Xie, Y.; Cao, G. A Sparse-View CT Reconstruction Method Based on Combination of DenseNet and Deconvolution. IEEE Trans. Med. Imaging
**2018**, 37, 1407–1417. [Google Scholar] [CrossRef] [PubMed] - Stanković, R.S.; Falkowski, B.J. The Haar wavelet transform: Its status and achievements. Comput. Electr. Eng.
**2003**, 29, 25–44. [Google Scholar] [CrossRef] - Lee, M.; Kim, H.; Kim, H.J. Sparse-view CT reconstruction based on multi-level wavelet convolution neural network. Phys. Med.
**2020**, 80, 352–362. [Google Scholar] [CrossRef] - Sun, C.; Liu, Y.; Yang, H. Degradation-Aware Deep Learning Framework for Sparse-View CT Reconstruction. Tomography
**2021**, 7, 932–949. [Google Scholar] [CrossRef] - Lee, H.; Lee, J.; Kim, H.; Cho, B.; Cho, S. Deep-Neural-Network-Based Sinogram Synthesis for Sparse-View CT Image Reconstruction. IEEE Trans. Radiat. Plasma Med. Sci.
**2019**, 3, 109–119. [Google Scholar] [CrossRef] - Li, Z.; Cai, A.; Wang, L.; Zhang, W.; Tang, C.; Li, L.; Liang, N.; Yan, B. Promising Generative Adversarial Network Based Sinogram Inpainting Method for Ultra-Limited-Angle Computed Tomography Imaging. Sensors
**2019**, 19, 3941. [Google Scholar] [CrossRef] - Zhu, B.; Liu, J.Z.; Cauley, S.F.; Rosen, B.R.; Rosen, M.S. Image reconstruction by domain-transform manifold learning. Nature
**2018**, 555, 487–492. [Google Scholar] [CrossRef] - Ma, G.; Zhu, Y.; Zhao, X. Learning Image from Projection: A Full-Automatic Reconstruction (FAR) Net for Computed Tomography. IEEE Access
**2020**, 8, 219400–219414. [Google Scholar] [CrossRef] - Chen, H.; Zhang, Y.; Chen, Y.; Zhang, J.; Zhang, W.; Sun, H.; Lv, Y.; Liao, P.; Zhou, J.; Wang, G. LEARN: Learned Experts′ Assessment-Based Reconstruction Network for Sparse-Data CT. IEEE Trans. Med. Imaging
**2018**, 37, 1333–1347. [Google Scholar] [CrossRef] [PubMed] - Wang, J.; Zeng, L.; Wang, C.; Guo, Y. ADMM-based deep reconstruction for limited-angle CT. Phys. Med. Biol.
**2019**, 64, 115011. [Google Scholar] [CrossRef] - Zhou, B.; Chen, X.; Zhou, S.K.; Duncan, J.S.; Liu, C. DuDoDR-Net: Dual-domain data consistent recurrent network for simultaneous sparse view and metal artifact reduction in computed tomography. Med. Image Anal.
**2022**, 75, 102289. [Google Scholar] [CrossRef] [PubMed] - Hu, D.; Liu, J.; Lv, T.; Zhao, Q.; Zhang, Y.; Quan, G.; Feng, J.; Chen, Y.; Luo, L. Hybrid-Domain Neural Network Processing for Sparse-View CT Reconstruction. IEEE Trans. Radiat. Plasma Med. Sci.
**2021**, 5, 88–98. [Google Scholar] [CrossRef] - Bai, J.; Liu, Y.; Yang, H. Sparse-View CT Reconstruction Based on a Hybrid Domain Model with Multi-Level Wavelet Transform. Sensors
**2022**, 22, 3228. [Google Scholar] [CrossRef] [PubMed] - Han, K.; Wang, Y.; Chen, H.; Chen, X.; Guo, J.; Liu, Z.; Tang, Y.; Xiao, A.; Xu, C.; Xu, Y.; et al. A Survey on Vision Transformer. IEEE Trans. Pattern Anal. Mach. Intell.
**2023**, 45, 87–110. [Google Scholar] [CrossRef] - Wang, C.; Shang, K.; Zhang, H.; Li, Q.; Zhou, S.K. DuDoTrans: Dual-Domain Transformer for Sparse-View CT Reconstruction. In Proceedings of the Machine Learning for Medical Image Reconstruction (MLMIR 2022), Singapore, 22 September 2022; Volume 13587, pp. 84–94. [Google Scholar]
- Li, R.; Li, Q.; Wang, H.; Li, S.; Zhao, J.; Yan, Q.; Wang, L. DDPTransformer: Dual-Domain with Parallel Transformer Network for Sparse View CT Image Reconstruction. IEEE Trans. Comput. Imaging
**2022**, 8, 1101–1116. [Google Scholar] [CrossRef] - Pan, J.; Zhang, H.; Wu, W.; Gao, Z.; Wu, W. Multi-domain integrative Swin transformer network for sparse-view tomographic reconstruction. Patterns
**2022**, 3, 100498. [Google Scholar] [CrossRef] - Xia, W.; Yang, Z.; Zhou, Q.; Lu, Z.; Wang, Z.; Zhang, Y. A Transformer-Based Iterative Reconstruction Model for Sparse-View CT Reconstruction. In Proceedings of the Medical Image Computing and Computer Assisted Intervention (MICCAI 2022), Singapore, 18-22 September 2022; Volume 13436, pp. 790–800. Available online: https://conferences.miccai.org/2022/papers/021-Paper0893.html (accessed on 10 December 2023).
- Yu, Y.; Zhan, F.; Lu, S.; Ma, F.; Xie, X.; Miao, C. WaveFill: A Wavelet-based Generation Network for Image Inpainting. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), Montreal, QC, Canada, 10–17 October 2021; Volume 1, pp. 14094–14103. [Google Scholar]
- Ma, J.; Liang, Z.; Fan, Y.; Liu, Y.; Huang, J.; Chen, W.; Lu, H. Variance analysis of X-ray CT sinograms in the presence of electronic noise background. Med. Phys.
**2012**, 39, 4051–4065. [Google Scholar] [CrossRef] - Cotter, F. 2D Wavelet Transforms in Pytorch. Software. Available online: https://github.com/fbcotter/pytorch_wavelets (accessed on 30 July 2022).
- Shi, X.; Chen, Z.; Wang, H.; Yeung, D.Y. Convolutional LSTM Network: A machine learning approach for precipitation nowcasting. In Proceedings of the 28th International Conference on Neural Information Processing Systems (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015; Volume 1, pp. 802–810. [Google Scholar]
- Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Shahbaz, F.; Khan, S.; Yang, M. Restormer: Efficient Transformer for High-Resolution Image Restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), New Orleans, LA, USA, 18–24 June 2022; Volume 1, pp. 5728–5739. [Google Scholar]
- Zhao, M.; Zhong, S.; Fu, X.; Tang, B.; Pecht, M. Deep Residual Shrinkage Networks for Fault Diagnosis. IEEE Trans. Ind. Inform.
**2020**, 16, 4681–4690. [Google Scholar] [CrossRef] - Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR 2021), Vienna, Austria, 4 May 2021. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV 2021), Montreal, QC, Canada, 10–17 October 2021; Volume 1, pp. 10012–10022. [Google Scholar]
- Li, B.; Wu, F.; Weinberger, K.Q.; Belongie, S. Positional Normalization. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 9–14 December 2019; Volume 1, pp. 1622–1634. [Google Scholar]
- Marr, D.; Hildreth, E. Theory of edge detection. Proc. R. Soc. Lond. Ser. B Biol. Sci.
**1980**, 207, 187–217. [Google Scholar] [CrossRef] - Kingma, D.P.; Ba, L.J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- McCollough, C.; Chen, B.; Holmes, D.R., III; Duan, X.; Yu, Z.; Yu, L.; Leng, S.; Fletcher, J. Low Dose CT Image and Projection Data (LDCT-and-Projection-Data) (Version 6) [Dataset]. The Cancer Imaging Archive. Available online: https://www.cancerimagingarchive.net/collection/ldct-and-projection-data (accessed on 3 April 2023).
- Xia, W.; Lu, Z.; Huang, Y.; Shi, Z.; Liu, Y.; Chen, H.; Chen, Y.; Zhou, J.; Zhang, Y. MAGIC: Manifold and Graph Integrative Convolutional Network for Low-Dose CT Reconstruction. IEEE Trans. Med. Imaging
**2021**, 40, 3459–3472. [Google Scholar] [CrossRef] [PubMed] - Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process.
**2004**, 13, 600–612. [Google Scholar] [CrossRef] - Asamoah, F. Discrete Wavelet Analysis of Two-Dimensional Signals. Int. J. Electr.
**2002**, 39, 162–174. [Google Scholar] [CrossRef] - Cai, G.; Feng, H.; Xu, Z.; Li, Q.; Chen, Y. Detail preserved fusion of visible and infrared images using regional saliency extraction and multi-scale image decomposition. Opt. Commun.
**2015**, 341, 199–209. [Google Scholar] [CrossRef] - Ma, N.; Zhang, X.; Zheng, H.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the 15th European Conference on Computer Vision (ECCV 2018), Munich, Germany, 8–14 September 2018; Volume 11218, pp. 122–138. [Google Scholar]
- Mostafavi, S.M. COVID19-CT-Dataset: An Open-Access Chest CT Image Repository of 1000+ Patients with Confirmed COVID-19 Diagnosis (Version 1) [Dataset]. Harvard Dataverse. Available online: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/6ACUZJ (accessed on 10 December 2023).

**Figure 1.** Architecture of the proposed dual-domain reconstruction network incorporating multi-level wavelet transform and recurrent convolution.

**Figure 3.** (**a**) Architecture of the MFNF block; (**b**) architecture of the SNFM; (**c**) architecture of the ACSTF.

**Figure 4.** The SVCT reconstruction results from different methods with 128 projection views. (**a**) The reference image; (**b**) FBP; (**c**) MWNet; (**d**) DuDoTrans; (**e**) DDPTransformer; (**f**) MIST; (**g**) RegFormer; (**h**) Ours. (**i**)–(**iii**) are the chest CT, abdominal CT, and head CT, respectively, and (**iv**)–(**vi**) correspond to the enlarged ROIs marked by the red boxes in the reference images above. The display window is [−160, 240] HU.

**Figure 5.** The SVCT reconstruction results from different methods with 64 projection views. (**a**) The reference image; (**b**) FBP; (**c**) MWNet; (**d**) DuDoTrans; (**e**) DDPTransformer; (**f**) MIST; (**g**) RegFormer; (**h**) Ours. (**i**)–(**iii**) are the chest CT, abdominal CT, and head CT, respectively, and (**iv**)–(**vi**) correspond to the enlarged ROIs marked by the red boxes in the reference images above. The display window is [−160, 240] HU.

**Figure 6.** The SVCT reconstruction results from different methods with 32 projection views. (**a**) The reference image; (**b**) FBP; (**c**) MWNet; (**d**) DuDoTrans; (**e**) DDPTransformer; (**f**) MIST; (**g**) RegFormer; (**h**) Ours. (**i**)–(**iii**) are the chest CT, abdominal CT, and head CT, respectively, and (**iv**)–(**vi**) correspond to the enlarged ROIs marked by the red boxes in the reference images above. The display window is [−160, 240] HU.

**Figure 7.** Absolute difference images relative to the reference images with different projection views. (**a**) The reference image; (**b**) FBP; (**c**) MWNet; (**d**) DuDoTrans; (**e**) DDPTransformer; (**f**) MIST; (**g**) RegFormer; (**h**) Ours. (**i**)–(**iii**) correspond to the results with 128 projection views, (**iv**)–(**vi**) correspond to the results with 64 projection views, and (**vii**)–(**ix**) correspond to the results with 32 projection views. The display window for reference images is [−1000, 800] HU, and the display window for absolute difference images is [−180, 180] HU.

**Figure 8.** Intensity profile results for different methods corresponding to the red vertical line with different projection views. (**a**) 128 projection views; (**b**) 64 projection views; (**c**) 32 projection views.

**Figure 9.** The SVCT reconstruction results from different methods for different intensity levels of mixed noise with 64 projection views. (**a**) The reference image; (**b**) FBP; (**c**) MWNet; (**d**) DuDoTrans; (**e**) DDPTransformer; (**f**) MIST; (**g**) RegFormer; (**h**) Ours. (**i**)–(**iii**) are the reconstructed CT images with Poisson noise levels of 1 × 10^{6}, 5 × 10^{5}, and 1 × 10^{5}, respectively, and (**iv**)–(**vi**) correspond to the enlarged ROIs marked by the red boxes in the reference images above. The display window is [−160, 240] HU.

**Figure 10.** Absolute difference images relative to the reference image with different noise levels. (**a**) The reference image; (**b**) FBP; (**c**) MWNet; (**d**) DuDoTrans; (**e**) DDPTransformer; (**f**) MIST; (**g**) RegFormer; (**h**) Ours. (**i**)–(**iii**) correspond to the results with Poisson noise levels of 1 × 10^{6}, 5 × 10^{5}, and 1 × 10^{5}, respectively. The display window is [−180, 180] HU.

**Figure 11.** Qualitative comparative results of different ablation models. (**a**) The reference image; (**b**) baseline; (**c**) Model 2; (**d**) Model 3; (**e**) Model 4; (**f**) Model 5; (**g**) Model 6; (**h**) Model 7; (**i**) Model 8; (**j**) Ours. (**i**)–(**iii**) are reconstructed CT images, enlarged ROIs, and absolute difference images, respectively. The display window for the reconstructed CT images and enlarged ROIs is [−160, 240] HU; the display window for absolute difference images is [−180, 180] HU.

**Figure 12.** Ablation experiments for edge loss with different weight parameters. (**a**) Line graph of PSNR value; (**b**) line graph of SSIM value; (**c**) line graph of RMSE value; (**d**) line graph of AG value; (**e**) line graph of SF value.

**Figure 13.** Qualitative comparative results of different models on the COVID-19-CT dataset with 64 projection views. (**a**) The reference image; (**b**) FBP; (**c**) MWNet; (**d**) DuDoTrans; (**e**) DDPTransformer; (**f**) MIST; (**g**) RegFormer; (**h**) Ours. (**i**)–(**iii**) are reconstructed CT images, enlarged ROIs, and absolute difference images, respectively. The display window for the reconstructed CT images and enlarged ROIs is [−1200, 300] HU; the display window for absolute difference images is [−180, 180] HU.

**Table 1.** Quantitative evaluation results of different methods. The best results are marked in bold, and the second-best results are underlined.

| Method | PSNR (128 Views) | SSIM (128 Views) | RMSE (128 Views) | PSNR (64 Views) | SSIM (64 Views) | RMSE (64 Views) | PSNR (32 Views) | SSIM (32 Views) | RMSE (32 Views) |
|---|---|---|---|---|---|---|---|---|---|
| FBP | 17.0616 ± 1.6025 | 0.4801 ± 0.0486 | 0.1424 ± 0.0223 | 14.0979 ± 1.2326 | 0.3835 ± 0.0480 | 0.1991 ± 0.0253 | 11.8461 ± 1.1073 | 0.3189 ± 0.0473 | 0.2577 ± 0.0031 |
| MWNet | 35.3025 ± 2.0618 | 0.8671 ± 0.0542 | 0.0177 ± 0.0041 | 32.3812 ± 1.7980 | 0.8195 ± 0.0480 | 0.0246 ± 0.0060 | 28.6675 ± 2.1581 | 0.7816 ± 0.0563 | 0.0379 ± 0.0085 |
| DuDoTrans | 37.5434 ± 2.9086 | 0.8968 ± 0.0407 | 0.0138 ± 0.0034 | 33.5129 ± 2.1423 | 0.8489 ± 0.0443 | 0.0216 ± 0.0042 | 29.3935 ± 1.9850 | 0.7978 ± 0.0493 | 0.0347 ± 0.0071 |
| DDPTrans | 37.9856 ± 1.2154 | 0.9034 ± 0.0081 | 0.0138 ± 0.0037 | 33.3891 ± 2.1731 | 0.8443 ± 0.0511 | 0.0216 ± 0.0047 | 30.6977 ± 1.8773 | 0.8175 ± 0.0607 | 0.0298 ± 0.0056 |
| MIST | 36.7952 ± 1.9551 | 0.9059 ± 0.0374 | 0.0148 ± 0.0029 | 34.6944 ± 2.0515 | 0.8768 ± 0.0482 | 0.0189 ± 0.0040 | 30.9089 ± 1.5734 | 0.8286 ± 0.0489 | 0.0289 ± 0.0050 |
| RegFormer | 37.3118 ± 2.1226 | 0.8897 ± 0.0418 | 0.0141 ± 0.0031 | 33.5659 ± 1.6660 | 0.8471 ± 0.0463 | 0.0214 ± 0.0038 | 30.7162 ± 1.6209 | 0.8059 ± 0.0535 | 0.0296 ± 0.0050 |
| Ours | 39.7360 ± 3.1812 | 0.9279 ± 0.0356 | 0.0108 ± 0.0034 | 36.4065 ± 2.5655 | 0.8922 ± 0.0468 | 0.0157 ± 0.0040 | 31.9482 ± 1.8559 | 0.8438 ± 0.0514 | 0.0258 ± 0.0052 |

**Table 2.** Quantitative evaluation results of different methods with different intensity noise levels (Noise-L1: 1 × 10^{6}; Noise-L2: 5 × 10^{5}; Noise-L3: 1 × 10^{5}). The best results are marked in bold, and the second-best results are underlined.

| Method | PSNR (Noise-L1) | SSIM (Noise-L1) | RMSE (Noise-L1) | PSNR (Noise-L2) | SSIM (Noise-L2) | RMSE (Noise-L2) | PSNR (Noise-L3) | SSIM (Noise-L3) | RMSE (Noise-L3) |
|---|---|---|---|---|---|---|---|---|---|
| FBP | 14.0077 ± 1.1170 | 0.3427 ± 0.0348 | 0.2009 ± 0.0239 | 13.9236 ± 1.0353 | 0.3082 ± 0.0281 | 0.2027 ± 0.0229 | 13.2971 ± 0.7244 | 0.2017 ± 0.0228 | 0.2171 ± 0.0180 |
| MWNet | 32.1469 ± 1.3300 | 0.8141 ± 0.0491 | 0.0250 ± 0.0038 | 31.6998 ± 1.1159 | 0.7972 ± 0.0469 | 0.0262 ± 0.0033 | 28.4697 ± 0.8459 | 0.6143 ± 0.0371 | 0.0379 ± 0.0040 |
| DuDoTrans | 33.3070 ± 2.9086 | 0.8410 ± 0.0428 | 0.0221 ± 0.0042 | 33.0065 ± 1.8279 | 0.8275 ± 0.0401 | 0.0228 ± 0.0041 | 30.6072 ± 1.2689 | 0.6887 ± 0.0377 | 0.0298 ± 0.0036 |
| DDPTrans | 33.3621 ± 2.1675 | 0.8441 ± 0.0596 | 0.0220 ± 0.0047 | 33.3207 ± 2.1778 | 0.8437 ± 0.0584 | 0.0221 ± 0.0047 | 32.8704 ± 1.8773 | 0.8405 ± 0.0590 | 0.0232 ± 0.0056 |
| MIST | 34.4409 ± 1.9623 | 0.8699 ± 0.0473 | 0.0194 ± 0.0039 | 34.0791 ± 2.0515 | 0.8575 ± 0.0448 | 0.0202 ± 0.0038 | 32.1646 ± 1.3665 | 0.7811 ± 0.0366 | 0.0250 ± 0.0035 |
| RegFormer | 33.4319 ± 1.8588 | 0.8465 ± 0.0463 | 0.0218 ± 0.0044 | 33.3905 ± 1.8442 | 0.8451 ± 0.0463 | 0.0219 ± 0.0044 | 33.0737 ± 1.6209 | 0.8339 ± 0.0535 | 0.0227 ± 0.0050 |
| Ours | 36.2235 ± 2.4382 | 0.8915 ± 0.0356 | 0.0159 ± 0.0039 | 35.7969 ± 2.5655 | 0.8822 ± 0.0436 | 0.0167 ± 0.0038 | 34.3397 ± 2.0856 | 0.8436 ± 0.0359 | 0.0197 ± 0.0041 |

**Table 3.** Quantitative evaluation results for different ablation models. The best results are marked in bold, and the second-best results are underlined.

| Network Structure / Metric | Baseline | Model 2 | Model 3 | Model 4 | Model 5 | Model 6 | Model 7 | Model 8 | Ours |
|---|---|---|---|---|---|---|---|---|---|
| Conv-LSTM | CBR | CBR | √ | 3 × 3 Conv-based | √ | √ | √ | √ | √ |
| ACSTF | × | √ | × | √ | √ | √ | √ | √ | √ |
| SNFM | √ | √ | √ | √ | SwinT | × | × | √ | √ |
| Summation fusion | × | × | × | × | × | √ | × | × | × |
| Concatenation fusion | × | × | × | × | × | × | √ | × | × |
| Haar wavelet transform | √ | √ | √ | √ | √ | √ | √ | Daubechies wavelet transform | √ |
| PSNR | 29.8588 ± 1.2407 | 31.0563 ± 1.5195 | 35.9344 ± 2.1782 | 36.2666 ± 2.3966 | 35.4864 ± 2.1166 | 36.1024 ± 2.2599 | 33.2770 ± 1.9848 | 36.2853 ± 2.5442 | 36.4065 ± 2.5655 |
| SSIM | 0.7239 ± 0.0437 | 0.7429 ± 0.0307 | 0.8733 ± 0.0467 | 0.8883 ± 0.0467 | 0.8727 ± 0.0495 | 0.8864 ± 0.0430 | 0.8586 ± 0.262 | 0.8918 ± 0.0472 | 0.8922 ± 0.0468 |
| RMSE | 0.0325 ± 0.0045 | 0.0285 ± 0.0057 | 0.0164 ± 0.0037 | 0.0159 ± 0.0038 | 0.0173 ± 0.0038 | 0.0162 ± 0.0038 | 0.0222 ± 0.0024 | 0.0159 ± 0.0047 | 0.0157 ± 0.0040 |
| Param (M) | 16.68 | 17.13 | 20.03 | 54.01 | 20.90 | 14.26 | 22.29 | 20.47 | 20.47 |

**Table 4.** Quantitative evaluation results for different sub-networks. The best results are marked in bold, and the second-best results are underlined.

| Method | PSNR | SSIM | RMSE |
|---|---|---|---|
| SDNet | 34.3578 ± 2.6077 | 0.8661 ± 0.0499 | 0.0199 ± 0.0049 |
| IDNet | 32.9327 ± 1.7221 | 0.8496 ± 0.0512 | 0.0230 ± 0.0044 |
| Ours | 36.4065 ± 2.5655 | 0.8922 ± 0.0468 | 0.0157 ± 0.0040 |

**Table 5.** Computational costs of different DL-based reconstruction methods. The best results are marked in bold, and the second-best results are underlined.

| | MWNet | DuDoTrans | DDPTrans | MIST | RegFormer | Ours |
|---|---|---|---|---|---|---|
| Param (M) | 31.21 | 0.62 | 0.95 | 27.30 | 2.41 | 20.47 |
| FLOPs (G) | 87.52 | 125.51 | 182.24 | 615.13 | 631.70 | 751.59 |
| Time (ms) | 17.53 | 159.86 | 60.95 | 262.82 | 334.70 | 284.77 |

**Table 6.** Quantitative evaluation results for different models on the COVID-19-CT dataset with 64 projection views. The best results are marked in bold, and the second-best results are underlined.

| | FBP | MWNet | DuDoTrans | DDPTrans | MIST | RegFormer | Ours |
|---|---|---|---|---|---|---|---|
| PSNR | 20.0494 ± 1.0089 | 28.8013 ± 1.5426 | 30.2586 ± 1.4001 | 29.8582 ± 1.3986 | 30.3650 ± 1.3708 | 30.4801 ± 1.9848 | 31.9899 ± 1.8273 |
| SSIM | 0.3179 ± 0.0312 | 0.7356 ± 0.0747 | 0.7608 ± 0.0671 | 0.7325 ± 0.0916 | 0.7683 ± 0.0729 | 0.7584 ± 0.0764 | 0.7918 ± 0.724 |
| RMSE | 0.1001 ± 0.0012 | 0.0369 ± 0.0090 | 0.0311 ± 0.0031 | 0.0325 ± 0.0057 | 0.0307 ± 0.0053 | 0.0303 ± 0.0031 | 0.0257 ± 0.0072 |


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
