Deep Learning Driven Noise Reduction for Reduced Flux Computed Tomography

Deep neural networks have received considerable attention in clinical imaging, particularly with respect to the reduction of radiation risk. Lowering the radiation dose by reducing the photon flux inevitably results in the degradation of the scanned image quality. Thus, researchers have sought to exploit deep convolutional neural networks (DCNNs) to map low-quality, low-dose images to higher-dose, higher-quality images, thereby minimizing the associated radiation hazard. Conversely, computed tomography (CT) measurements of geomaterials are not limited by the radiation dose. In contrast to the human body, however, geomaterials may be comprised of high-density constituents causing increased attenuation of the X-rays. Consequently, higher-dose images are required to obtain an acceptable scan quality. The problem of prolonged acquisition times is particularly severe for micro-CT based scanning technologies. Depending on the sample size and exposure time settings, a single scan may require several hours to complete. This is of particular concern if phenomena with an exponential temperature dependency are to be elucidated. A process may happen too fast to be adequately captured by CT scanning. To address the aforementioned issues, we apply DCNNs to improve the quality of rock CT images and reduce exposure times by more than 60%, simultaneously. We highlight current results based on micro-CT derived datasets and apply transfer learning to improve DCNN results without increasing training time. The approach is applicable to any computed tomography technology. Furthermore, we contrast the performance of the DCNN trained by minimizing different loss functions such as mean squared error and structural similarity index.


Introduction
Computed tomography has been recognized as an indispensable technology not only in the healthcare domain but also with respect to industrial applications such as reverse engineering [1,2], flaw detection [3] and meteorology to name a few [4,5]. The non-destructive nature of CT scanning has also proven to be tremendously valuable in the case of geomaterials, allowing to elucidate transport phenomena in porous media; visualize deformation and strain localization in soils, rocks or sediments; or perform fracture and damage assessment in asphalt, cement and concrete [6]. The three-dimensional data obtained helps to better inform numerical models improving their predictive power and enables delineation of physical properties of the specimen under investigation. Both qualities are particularly valued in the area of digital rock physics [7,8].
The technology has, however, shortcomings, in particular with respect to monitoring dynamic processes. The limitation of prolonged acquisition times is distinctively more including results not only for synthetic data created using the ASTRA toolbox [35] but also for actual micro-CT data generated by a FEI Heliscan microCT operating with a cone-beam. Importantly, we not only aim to reduce scanning time but also seek to improve the quality of the reconstructed images compared to the high dose training images, simultaneously.
In short, in this paper, we study two deep learning architectures for improving degraded rock images resulting from reduced exposure time micro-CT scans. Furthermore, we investigate the applicability of transfer learning to minimize the number of training images needed. In addition, we explore the impact of mean-squared error (MSE) and structural similarity index measure (SSIM) loss functions on the reconstructed image quality. While both loss functions are capable of considerably improving the respective quality metrics, PSNR and SSIM, they tend to emphasize different structural features. These findings are crucial for improving digital rock physics applications where rock properties, such as porosity and permeability, are to be estimated from computed tomography data only.

Methods
Convolutional neural networks (CNNs) constitute a subset of artificial neural networks (ANNs), heavily relying on digital filter operations (kernel/convolution matrix), where the weights of the filters are informed during the training process to minimize a particular loss metric between the predictions and the training (true) samples. CNNs are especially suitable for computer vision applications given that the filters can capture the spatial relation between individual pixels or image elements.
The main components of CNNs are convolution layers and activation functions. Convolution layers consist of filters that slide across the input feature map (e.g., image). Each element of a filter is multiplied by the overlapping element of the input feature map and subsequently summed to yield one element of the output feature map. This operation is followed by activation functions to add non-linearity, empowering a CNN to learn the complex relationship between inputs and their corresponding labels. A common activation function is the rectified linear unit (ReLU), designed to set negative values to zero and linearly map inputs to outputs in case of positive values.

Details of Network Architectures Investigated
For the transfer learning aspect of this work, we take advantage of the Very Deep Super Resolution (VDSR) architecture and the associated pretrained VDSR by Kim et al. [36]. Both were obtained from the MathWorks website [37]. The VDSR architecture consists of 20 weighted convolution layers followed by a ReLU. Each convolution layer, except the final and input layers, accommodates 64 filters of size 3 × 3. Figure 1 shows the network architecture. The second architecture investigated constitutes a deep convolutional neural network (DCNN) based on a residual encoder-decoder architecture, known as the U-Net network. The encoder or feature extractor component consists of three blocks, with each block comprising three consecutive convolution layers where all layers are followed by a ReLU activation function. At the end of each block, a sample-based discretization process in the form of a max pooling operation is executed to reduce the size of the feature map. Similarly, the decoder incorporates a transposed convolution layer followed by three consecutive convolution layers to be terminated by ReLU activation functions. In addition, skip connections between each encoder block and its corresponding decoder block are included to concatenate the output of the transposed convolution layers with the feature map from each encoder block. All convolution layers, except the last layer, are of size 3 × 3. In the encoder part, the number of filters increases for each block (32, 64 and 128, sequentially). Equivalently, in the decoder part, the number of filters decreases for each block (128, 64, and 32, sequentially). The network configuration is outlined in Figure 2.

Loss Functions
The MSE is a commonly used loss function for image restoration tasks and is defined as follows: where p constitutes the index of the pixel in patch P, t(p) is the pixel value in the trained patch, r(p) corresponds to the pixel value of the reference image and N is the number of pixels in a given patch [38].
The structural similarity index measure (SSIM), however, is often regarded as a more pragmatic metric for evaluating image quality, particularly with respect to human visual perception [39]. For a pixel p, the SSIM is given as follows: where µ and σ 2 reflect the average and variance of the training and reference patch, respectively, and σ tr is the associated covariance. c 1 and c 2 are constants required to avoid division with a weak denominator and partially depend on the dynamic range of the pixels. Given that the SSIM range is bounded, SSIM ≤ 1, with 1 being indicative that the training image is identical to the reference image, the loss function needs to be written as follows: For training of the VDSR and the pretrained VDSR, an image patch size of 41 × 41 with 128 patches per image was utilized. The Adam optimizer was configured for a learning rate of 0.0001, 5 epochs and a mini-batch size of 32. For this case we only applied the MSE loss function as defined in Equation (1).
Both loss functions were exploited to train the DCNN (U-Net), employing an image patch size of 512 × 512, 1600 training images, 400 testing images, the Adam optimizer configured for a learning rate of 0.0001, 50 epochs and a mini-batch size of 8.
Naturally, for both architectures, the low-quality/low-exposure images served as input to be trained on the corresponding high-quality/high-exposure scans.

Data Acquisition Details
Using a FEI Heliscan microCT, configured to perform 1800 projections per revolution at a tube voltage of 85 kV and a current of 72 mA, two datasets were acquired at an exposure time of 0.5 and 1.4 a, respectively. Consequently, for 1800 projections per revolution, the reduction in exposure time reduces acquisition time per revolution from 42 to 15 min, a decrease of more than 60%.
As mentioned above, an increased exposure time translates to a greater image quality given that more photons are collected at the detector, thereby decreasing the noise. Henceforth, we refer to the scans collected at 1.4 s as high quality images and 0.5 s data as low quality images. The scanned specimen was a carbonate of unknown origin measuring 1.5 inches in diameter and about 2 inches in length. The sample geometry dictated a minimum voxel size of about 14 µm. The sample was chosen because the pore space is sufficiently large to be adequately resolved by the micro-CT.
During all scans, an approximately 100 µm thick aluminum sheet was mounted at the tungsten target window to soften the X-rays in an effort to minimize beam hardening artifacts. The amorphous-silicon, large-area, digital flat-panel detector with 3072 × 3072 pixels, is capable of supporting a pixel array of nine mega-pixels with a dynamic range of 16 bits. The effective scan resolution was 2884 × 2884 pixels. To accommodate the network architecture, the individual slices needed to be split into tiles of size 512 × 512 pixels and the gray-scale values were normalized to a range between zero and one. In a first-order approximation, the gray values may be interpreted as density values where brighter areas indicate larger density values and darker areas suggest lower density values. Hence, pore space is represented by shades of black (see, e.g., Figure 3).

ASTRA Toolbox
As shown below, the images predicted by the network are of significantly greater quality than the training images collected at an exposure time of 1.4 s. Consequently, it became necessary to create an artificial case based on the images predicted by the network to verify that the architecture is indeed predicting the ground truth.
The ASTRA toolbox is an open-source software for tomographic projections and reconstruction, available for MATLAB© and Python [35,40]. Throughout this study, MAT-LAB© 2020a in combination with the ASTRA toolbox V1.9 was utilized. The cone beam projection module in the ASTRA toolbox offers the following three reconstruction algorithms: FDK by Feldkamp et al. [41], simultaneous iterative reconstruction technique (SIRT) by Gilbert [42] and conjugate gradient least squares (CGLS) by Frommer and Maass [43].
The toolbox allowed us to simulate artificial low and high exposure time images based on images predicted by the VDSR network utilizing a cone beam geometry with 1800 projections per revolution. Details explaining the toolbox are provided elsewhere [44][45][46][47].
Given that the toolbox does not model noise sources and, effectively, assumes a perfect detector, Poisson noise mimicking the signal-to-noise ratio (SNR) values of the CT images was added to the projections to simulate the physical process at the detector. Subsequently, the sinograms were reconstructed by means of the FDK algorithm to yield artificial low and high exposure time scans. To summarize, at this point, the following image series are available:

1.
A 0.5 s exposure time series obtained from the micro-CT, constituting the low quality data (see, e.g., right-hand-side in Figure 3) 2.
A 1.4 s exposure time series obtained from the micro-CT, representing the high quality data (see e.g., left-hand-side in Figure 3 The images predicted by the network that are of greater quality (see, e.g., Figure 4) compared to the 1.4 s exposure training images (labels) (these images subsequently serve as the ground truth from which artificial low and high exposure time series were obtained) 4.
An artificial low quality image series derived from predicted images using the ASTRA toolbox mimicking the results obtained from the micro-CT in terms of SNR at an exposure time of 0.5 s 5.
An artificial higher quality image series, delineated from predicted images using the ASTRA toolbox resembling the results obtained from the micro-CT in terms of SNR at an exposure time of 1.4 s The artificially created series was used to validate the predictive power of the trained network, as detailed in Section 3.4.

Results and Discussion
In this section, we benchmark the proposed DCNNs to restore low quality micro-CT images as a result of reduced exposure times. We begin by highlighting the problem and its adverse consequences on the scanned image quality. Next, we explore the applicability of transfer learning to help expedite the training of the PVDSR network.
Exploiting a pre-trained VDSR network, we substantiate that optimal performance can be obtained more quickly than relying on a randomly initialized VDSR network. In addition, we also compare the reconstruction performance of different loss functions, including MSE and SSIM. Finally, we prove the predictive power of the DCNNs by testing them against simulated low-and high-exposure images from the ASTRA toolbox [35].

Reduced Exposure
In the context of rock imaging or imaging of materials in general, the reduction of exposure time offers three main advantages.
Firstly, micro-CT scanning, if offered as a commercial service, from experience, is charged on an hourly basis ranging from hundreds to thousands of dollars per hour. Evidently, high quality scans necessitate a longer exposure time, consequently being more costly. Thus, a decrease in scan time while maintaining image quality is beneficial to both parties: it allows the provider to offer the service to the client at a reduced cost and, at the same time, increase the throughput.
Secondly, any reduction in exposure time will result in a more economic use of the filament life time. Generally, a single filament costs about 700 to 1000 dollars and is rated for about 300 working hours. Assuming the particular scan time reduction achieved in this work, roughly 60%, the filament lifetime may be increased up to 480 working hours. In addition, as shown below, the network also performs exceedingly well at denoising the image without the need for user intervention.
Thirdly, and most importantly, a reduction in exposure time renders the technology available to elucidate processes previously out of reach due to the associated dynamics.
As mentioned above, a reduction in exposure time increases the noise level due to photon starvation at the detector. For example, Figure 3 illustrates how reduced exposure time CT data (0.5 s) yields a considerably noisier image compared to a 1.4 s exposure time scan.
The noise present in the low exposure, and, noticeably, also in the high-exposure image, is problematic if rock properties such as porosity and permeability are to be estimated. Accurate porosity values strongly rely on the ability to distinguish between the solid phase and the pore space, precisely. With respect to Figure 3, this can constitute a daunting task, especially for the low exposure time case. Commonly, a median or smoothing filter followed by, for example, a histogram or watershed based segmentation is applied [48].
Estimation of permeability is significantly more involved as it necessitates a computational fluid dynamics study on the segmented data [49]. In addition, permeability is particularly dependent on fine scale features and mineralogy in the case of wetting fluids.
To address the aforementioned challenges, we sought to train DCNNs to denoise low-exposure micro-CT images without the need for expert knowledge with respect to filter design.

Transfer Learning
While DCNNs have shown remarkable performance for myriad scientific problems, they are well-known for being data-and resource-intensive due to a large number of trainable parameters. Lack of training samples or computational resources may hurt the performance of these networks in either of these situations. A pragmatic approach to address this issue is to take advantage of transfer learning, a machine learning technique seeking to apply previously gained knowledge to speed up finding the solution to a different yet related problem.
In this particular study, we explore the applicability of transfer learning using the VDSR network, as illustrated in Figure 1, and compare the reconstruction performance of the VDSR network initialized as per [50] with a pre-trained VDSR network by minimizing the MSE as defined in Equation (1). We train both the pre-trained VDSR network and the VDSR network for a range of the number of training images, starting with 50 training images up to a maximum of 300 training images. For each particular number of training images, we measure the reconstruction performance of the two networks by comparing the average SSIM and peak signal-to-noise ratio (PSNR) values for the pre-trained and for the randomly initialized network) derived from 400 test images. Figures 5 and 6 show that the pre-trained VDSR network always yields a better overall reconstruction performance for a given number of training images. This holds true for both considered metrics to quantify reconstruction quality (SSIM and PSNR), demonstrating the inherent advantage of transfer learning. Figure 7 exemplifies the reconstruction quality achieved by the pre-trained VDSR network, based on 300 training images, improving average SSIM and PSNR values of the low-exposure image from 0.54 ± 0.02 and 23 ± 0.57 dB to 0.78 ± 0.02 and 34 ± 0.57 dB, respectively.
Examining the predicted image in Figure 7, it should be noted that it is indeed of greater quality (less noisy) than the high quality reference image or training label. The noise is greatly reduced and the grain boundaries show a sharper delineation. This is somewhat surprising given that, from a conventional signal processing point of view, both edges and noise constitute high frequency content. Often, filters designed to remove high frequency content are often found to smear out edges and subtle details [51]. Granted, median filters, or filters utilizing local statistics, in general, perform well in preserving them, yet it is remarkable that the network learned to differentiate between discontinuities in the form of edges and noise. This particular aspect is addressed in more detail in Section 3.4.

Loss Functions
Proper selection of the loss, objective or fitness function is crucial in guiding the learning process of the network. The MSE loss function, as defined in Equation (1), is a preferred metric to optimize the weights owing to its simplicity and well-behavedness with respect to gradient calculations. Notably, the minimization of the MSE indirectly maximizes the PSNR.
Given the particular problem of image prediction, we seek to compare the impact of the MSE loss function against the SSIM loss function, as defined in Equation (2), on the image quality metrics PSNR and SSIM, respectively. For this purpose, we trained the U-net-derived DCNN on 1600 training images for each metric. The trained networks were benchmarked using 400 test images. In general, we obtain considerable improvements for both the SSIM and PSNR values of the reconstructed images, as shown in Figures 8-11. For the MSE optimized network, the PSNR increased, on average, from about 22.6 to 34.5 dB (see Figure 8) and the SSIM from 0.56 to 0.79 (see Figure 9). In the case of the SSIM optimized network, the PSNR increased, on average, to 34.6 dB (see Figure 10) and the SSIM to 0.79 (see Figure 11).  Both loss functions perform remarkably well in restoring fine scale features, as exemplified in Figure 4, and yield similar image quality improvements. With respect to Figure 12, however, it seems they tend to emphasize different features of the data. The MSE optimized network predicts coarser grain textures and boundaries and seems to be more sensitive to fine scale pore space. Conversely, the SSIM optimized network suggests smoother textures, sharper grain boundaries and appears to be less sensitive to fine scale pore space. Plotting the cross sections indicated in form of white lines in Figure 4, Figure 13 evidences the remarkable ability of the network to remove noise.
Surprisingly, the quality of the predicted image is clearly superior to the quality of the long exposure time image. As mentioned above, the network is seemingly able to distinguish between high frequency noise and discontinuities in the form of edges. At this point, it became necessary to verify the predictive power of the networks, and it was decided to create artificial cases where the ground truth is known. The approach is detailed in the next section. Figure 10. Histogram of the PSNR values obtained for the 400 test images. "Before filtering" refers to the low exposure scans. "After filtering" refers to the DCNN (U-Net) denoised scans where the network was optimized with respect to the SSIM loss function.   Figure 13. Intensity profiles taken along the white lines shown in Figure 12 that demonstrate the remarkable ability of the network to remove noise: (a) the profiles taken along the cross sections depicted in Figure 12a that represent the high-and low-exposure images respectively; and (b) the corresponding profiles that represent the MSE reconstructed (left) and SSIM reconstructed (right) images.

ASTRA Toolbox
As elaborated in the previous section, the images predicted by the DCNNs (images on the right of Figures 7 and 12b) are not only of superior quality compared to the lowexposure images but also exhibit less noise than their corresponding high-exposure images or the training labels. As discussed in the Introduction and substantiated by Figure 12b, noise can be reduced by increasing the exposure time or flux in general. In addition, the choice of the reconstruction algorithm is also critical. Iterative reconstruction algorithms such as Simultaneous Iterative Reconstructive Technique (SIRT) or Conjugate Gradient Least Squares (CGLS) are well known to suppress noise compared to classic filtered backprojection (FDP) via Feldkamp-type (FDK) reconstruction algorithms [52,53]. The particular algorithm employed by the FEI Heliscan microCT is proprietary.
Given the surprising results, we sought to verify them by creating an artificial dataset for which the ground truth is known. For this purpose, the VDSR network's denoised images were fed into the ASTRA toolbox to create noisy projections mimicking low and high exposure time images. Next, the projections were reconstructed using FDK, SIRT and CGLS. As summarized in Figure 14, SIRT and CGLS performed well in removing the noise, whereas FDK failed to do so. Hence, we decided to solely focus on FDK for the creation of the artificial dataset. Subsequently, the artificial dataset was tested utilizing the trained networks. Figure 15 shows results for the VDSR network trained on the artificial datasets, i.e., it was trained to map the artificial low exposure to its corresponding artificial high exposure (training example/label). The average SSIM and PSNR values of the predicted images from the network (SSIM = 0.86, PSNR = 25 dB) are better than the artificial high-(SSIM = 0.28, PSNR = 18 dB) and low-exposure images (SSIM = 0.17, PSNR = 14 dB) according to 200 test images, where the reference images (ground truth) were used to calculate these values. It is, again, surprising that the output images of the network yield greater quality results compared to the training examples (artificial high-exposure images). This substantiates, however, the results reported in the previous section where the ground truth was unknown.

Conclusions
In this work, we successfully demonstrate the value of DCNN to improve the quality of micro-CT scans of a carbonate rock sample. The proposed method has the potential to reduce the exposure time without compromising the scan quality. Assuming 1800 projections per revolution, the acquisition time per revolution reduces from 42 to 15 min, a decrease of more than 60%. We found that the networks are able to predict images of superior quality compared to the long exposure time training images (labels). In particular, the networks are seemingly able to distinguish between unwanted high frequency content such as noise and actual high frequency features of the data such as discontinuities in the form of grain boundaries. Crucially, we verified the predictive power of the networks by creating a synthetic dataset, allowing us to compare against a known ground truth.
Given the substantial time requirements for training the networks, we also investigated the applicability of transfer learning. Using a pre-trained VDSR network, we found that high quality images can be obtained for a smaller number of training epochs compared to training from scratch.
Additionally, we highlighted the impact of MSE and SSIM based loss functions on the DCNN predictions. Both yield similar improvements with respect to PSNR and SSIM. They tend to, however, emphasize different structural aspects of the specimen. The MSE optimized network predicts coarser grain textures and boundaries and seems to be more sensitive to fine-scale pore space. Conversely, the SSIM optimized network suggests smoother textures, sharper grain boundaries and appears to be less sensitive to fine scale pore space.
To summarize, the proposed method enables substantial savings in acquisition time while simultaneously improving the scan quality. The reduction in scan time is an important aspect if dynamic processes are to be elucidated or higher sample throughput is required. Importantly, the approach is applicable to any computed tomography technology (medical CT, micro-CT and industrial CT). The vast improvement in image quality, without the need for expert intervention, is crucial for digital rock physics applications where rock properties such as porosity and permeability are estimated solely from computed tomography data. Inevitably, the accuracy of the estimation is dictated, in part, by the scan quality.
Nevertheless, we have to be cognizant of the limitations of the approach. The networks constitute, to a great extent, black boxes, making it challenging to implement improvements.
In addition, we cannot discount the fact that a substantial amount of data is required to train the network even though transfer learning can help to reduce the volume required.