Low-Dose Computed Tomography Image Super-Resolution Reconstruction via Random Forests

Aiming to reduce computed tomography (CT) scan radiation while ensuring CT image quality, a new low-dose CT super-resolution reconstruction method combining a random forest with coupled dictionary learning is proposed. The random forest classifier finds the optimal solution of the mapping relationship between low-dose CT (LDCT) images and high-dose CT (HDCT) images, and CT image reconstruction is then completed by coupled dictionary learning. An iterative scheme is developed to improve robustness, the important parameters of the tree structure are discussed and the optimal settings are reported. The proposed method is further compared with a traditional interpolation method. The results show that the proposed algorithm obtains a higher peak signal-to-noise ratio (PSNR) and structural similarity index measurement (SSIM) and better suppresses noise and artifacts. This method can be applied to many different medical imaging fields in the future and the addition of computer multithreaded computing can reduce its time consumption.


Introduction
Computed tomography (CT) uses precisely collimated X-rays, gamma rays, ultrasonic waves, or other types of beams, in concert with highly sensitive detectors, to sequentially scan individual sections of the human body. CT has a fast scan time and produces clear images; thus, it is used in examinations for a variety of diseases. CT scanners are among the most commonly installed types of medical imaging diagnostic equipment and are widely used in various clinical fields. However, the radiation used in CT can damage the patient's body; for example, irradiation of the head may lead to headaches or insomnia [1]. Therefore, the radiation dose used in medical applications should be minimized [2]. Many methods currently exist for reducing radiation doses, such as reducing the tube voltage, the tube current, or the clinical scanning time. However, these approaches increase noise and granularity and introduce serious artifacts in the resulting CT images, which can result in misdiagnoses [3]. Many methods for mitigating these disadvantages of low-dose CT images have emerged in the super-resolution field in recent years [4][5][6].
Super-resolution (SR) reconstruction is a classical image recovery technique usually divided into three categories. The first category comprises traditional interpolation methods [7][8][9]. Simple interpolation methods such as bicubic interpolation can produce a smoother image, achieving a certain denoising effect and preserving edges in the zoomed image, but they cannot remove artifacts. When dealing with visually complex real images (such as CT images), the effect of traditional interpolation is limited and it can even generate artifacts. The second category comprises model-based methods [10][11][12][13]. Model-based techniques perform image reconstruction by modelling the degradation process that produced the observed image. When a priori knowledge of the image model is effectively applied, these techniques can guarantee the quality of the reconstructed image [10,13]. However, when no a priori knowledge is available, they tend to result in an ill-posed problem because of an insufficient number of low-resolution images. Conversely, using an excessive number of images in training leads to long runtimes and heavy computation.
The third category of SR reconstruction is based on machine learning [14]. Machine learning algorithms learn a nonlinear mapping of a training database consisting of low-resolution (LR) and high-resolution (HR) image pairs to obtain connections between the LR images and HR images [4,[15][16][17][18][19][20][21]. In recent years, the academic community has become increasingly interested in implementing SR based on sparse representation methods because this approach robustly preserves image features and suppresses noise and artifacts [15,18,21]. For example, Dong et al. [22] used adaptive sparse domain selection and adaptive regularization to cluster the training data and create a compact dictionary. This approach obtained a good SR result. Yang et al. [15] proposed a novel coupled dictionary training method for SR based on patchwise sparse recovery. Jiang et al. [18] proposed a single CT image SR reconstruction scheme. However, these methods require sparse coding in both the training and inference phases; therefore, their processing speeds are slow. To solve the above problems, Timofte et al. [23,24] proposed an instance-based neighbourhood regression SR algorithm and Samuel et al. [25] proposed a fast and accurate SR method based on a random forest classification mapping relationship.
Random forest (RF) is suitable for the problem framework of local linear multiple regression [26][27][28]. RF has highly nonlinear learning characteristics, is usually very fast during training and evaluation and can easily adapt to inputs consisting of noisy low-resolution images; thus, RF is widely applied in the computer vision field. Inspired by coupled dictionary learning and RF, a similar method is proposed here to solve the SR problem for low-dose CT (LDCT) and obtain reconstructed CT images with quality similar to that of high-dose CT (HDCT) images. In addition, during the SR imaging process, a series of iterations is added to improve the quality of the final reconstructed image. The proposed method is also compared with the traditional interpolation method and important indicators are evaluated. This paper is organized as follows. Section 2 provides background information concerning the related sparse representation and dictionary learning techniques. In Section 3, a random forest-based solution for SR is proposed. Section 4 presents the experimental results. Finally, Section 5 provides discussion and future work and concludes the paper.

Sparse Representation
According to the principles of compressed sensing [29,30] and sparse representation [31], an image vector x can be represented as a sparse linear combination over a dictionary D, expressed mathematically as

x = Dα, with ||α||_0 ≪ K, (1)

where α is the sparse representation coefficient, K is the dimension of x and x represents an image block. The matrix D is a dictionary with K × n dimensions. An overcomplete dictionary, that is, one in which the number of atoms n is larger than the dimension K of the image block, is often used for sparse representation; the sparse coefficient α can then be obtained by minimizing a cost function, generally expressed as

α̂ = arg min_α ||x − Dα||_2^2 + λ||α||_1, (2)

where λ is a constant parameter. The sparse representation is extended to the SR problem via

y = Hx = HDα, (3)

where the vector y is the LR image block and H is the sampling matrix, which models the degradation of the LR image y by geometric shift, blur, or a down-sampling operator. The corresponding cost function is minimized as

α̂ = arg min_α ||y − HDα||_2^2 + λ||α||_1. (4)

When solving the optimization problem in Equation (4), how the dictionary is established is highly important for mapping between the LR and HR images.
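Cost functions of this form are commonly minimized with the iterative shrinkage-thresholding algorithm (ISTA) after relaxing the ℓ0 sparsity constraint to an ℓ1 penalty. The sketch below is an illustrative solver under that relaxation, not the solver used in the paper:

```python
import numpy as np

def ista_sparse_code(x, D, lam=0.1, n_iter=200):
    """Approximate argmin_a 0.5*||x - D a||_2^2 + lam*||a||_1 via ISTA.

    The l1 penalty is the usual convex relaxation of the l0 sparsity
    constraint; D is a K x n (possibly overcomplete) dictionary.
    """
    L = np.linalg.norm(D, 2) ** 2            # Lipschitz constant of the smooth term
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)             # gradient of 0.5*||x - D a||^2
        z = a - grad / L                     # gradient step
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return a
```

For an orthonormal D this converges in a single step to the soft-thresholded coefficients; overcomplete dictionaries require more iterations.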

Coupled Dictionary Learning
The main approach to dictionary-based single-image super-resolution is coupled dictionary learning; the most effective method was proposed by Yang et al. [15,16]. N samples taken from the LR and HR images are denoted X_L ∈ R^(D_L × N) and X_H ∈ R^(D_H × N), respectively. The symbols X_L and X_H represent the LR and HR data matrices, respectively, and each column represents a sample x_L and x_H. The coupled dictionary learning method can be defined as

min_{D_L, D_H, E} ||X_L − D_L E||_F^2 + ||X_H − D_H E||_F^2 + Γ(E), (5)

where D_L ∈ R^(D_L × B) represents the LR dictionary and D_H ∈ R^(D_H × B) represents the HR dictionary. The sparse code matrix connecting these two dictionaries is E ∈ R^(B × N). The regularization term Γ(E) is usually a sparsity constraint on E using the ℓ0-norm or ℓ1-norm. In the coupled dictionary learning of Equation (5), the mapping relationship between the LR and HR images is critical; it is defined as

X_H = W(X_L). (6)

Equation (6) shows that dictionary training can be performed only when the mapping relation function W(X_L) is known. Using a random forest, the method of learning this mapping is discussed below.
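With the sparse code matrix E held fixed, the two data-fit terms in Equation (5) decouple and each dictionary admits a closed-form least-squares update that shares the same codes E. The sketch below shows only this dictionary-update half of an alternating-minimization loop; the code-update (sparse coding) step is omitted and `eps` is an illustrative numerical stabilizer, not a parameter from the paper:

```python
import numpy as np

def update_dictionaries(X_L, X_H, E, eps=1e-8):
    """Closed-form dictionary update for the coupled objective with E fixed.

    Each dictionary is the least-squares solution D = X E^T (E E^T + eps I)^{-1};
    both updates share the SAME code matrix E, which is what couples them.
    """
    G = E @ E.T + eps * np.eye(E.shape[0])   # Gram matrix of the codes
    D_L = np.linalg.solve(G, E @ X_L.T).T    # equals X_L E^T G^{-1}
    D_H = np.linalg.solve(G, E @ X_H.T).T    # equals X_H E^T G^{-1}
    return D_L, D_H
```

In a full alternating scheme this step would be interleaved with a sparse coding step for E until the objective stops decreasing.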

Mapping Relation Function Learning
This section discusses learning the mapping relation function W(X_L). First, consider an ℓ2-norm objective function:

min_W Σ_n ||x_H^n − W ψ(x_L^n)||_2^2. (7)

Depending on the choice of basis functions ψ(x), Equation (7) is converted to

min_{W_j} Σ_n ||x_H^n − Σ_{j=0}^{γ} W_j ψ_j(x_L^n)||_2^2. (8)

The goal of this paper is to find the regression matrix W_j for each of the γ + 1 basis functions. One option is a linear basis function, such as ψ_j(x) = x; a polynomial function, such as ψ_j(x) = x^j, can also be chosen. Different settings have different effects. In either case, the target linear and nonlinear parameters can be learned through their data dependencies.
This paper uses random forests to create the data dependence. A random forest is an ensemble of binary trees and multivariate regression is performed over the dimensions of the dictionary D_H; that is, each tree independently partitions the data space and determines its leaf nodes, the partitions of multiple trees and multiple forests overlap and each leaf node l learns a linear model:

m_l(x_L) = Σ_{j=0}^{γ} W_j^l ψ_j(x_L). (9)

To find all the matrices W_j^l, a regularized least-squares problem must be solved, with the closed-form solution

W^l = X_H Ψ(X_L)^T (Ψ(X_L) Ψ(X_L)^T + η I)^{-1},

where all the leaf data are stacked into the matrices Ψ(X_L) and X_H and the user specifies the regularization parameter η. Because all the binary trees are used for prediction during the inference process, the data dependency matrix W(X_L) can be described as

W(x_L) = (1/T) Σ_{t=1}^{T} W^{l(t)} ψ(x_L), (10)

where l(t) is the leaf node of tree t reached by the sampling point x_L and T is the number of trees.
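The regularized least-squares leaf model and the averaging over trees in Equation (10) can be sketched as follows; the routing of a sample to its leaf l(t) is assumed to be given (represented here by one callable per tree that returns the leaf's matrix), since only the algebra is illustrated:

```python
import numpy as np

def fit_leaf_model(Psi, X_H, eta=0.01):
    """Ridge regression for one leaf: argmin_W ||X_H - W Psi||_F^2 + eta*||W||_F^2.

    Psi stacks the basis-expanded LR samples of the leaf column-wise,
    X_H the matching HR samples; eta is the regularization parameter.
    """
    A = Psi @ Psi.T + eta * np.eye(Psi.shape[0])
    return np.linalg.solve(A, Psi @ X_H.T).T   # equals X_H Psi^T A^{-1}

def forest_predict(x_L, leaf_lookups, psi=lambda x: x):
    """Average the leaf-model predictions over all T trees.

    `leaf_lookups` maps a sample to the W matrix of the leaf it reaches
    in each tree; the tree-routing logic itself is assumed, not shown.
    """
    return sum(lookup(x_L) @ psi(x_L) for lookup in leaf_lookups) / len(leaf_lookups)
```

With a tiny eta the leaf fit recovers an exact linear relation when one exists, which is a convenient sanity check for the algebra.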

Tree Structure Learning
We obtain the leaf node model using Equation (9) and then train the tree to find the optimal solution of the mapping relation function. N samples (x_L^n, x_H^n) ∈ X × Y are taken, where X and Y represent the LR and HR image spaces, respectively. A single random tree is trained by finding a split function and recursively segmenting the training data into disjoint subsets. The split function is

δ(x_L, Θ) = 1 if x_L[θ_1] > θ_2 and 0 otherwise, (11)

where Θ = (θ_1, θ_2) selects a feature dimension and a threshold. For all internal tree nodes, splitting starts at the root node and continues down the tree in a greedy manner until it reaches the maximum depth ξ_max, at which point the leaf nodes are created.
To find a good parameter Θ for the split function δ(x_L, Θ), the general method is to sample a random group of candidate values Θ_k, score each with a quality metric and choose the best one. The quality of the split δ(x_L, Θ) is defined as

Q(δ, Θ) = Σ_{c ∈ {Left, Right}} |X_c| · E(X_L^c, X_H^c), (12)

where Left and Right are the left and right sub-nodes, respectively, and |·| is the cardinality operator. According to the split function in Equation (11), the two new domains are defined as

Left = {x_L : δ(x_L, Θ) = 0}, (13)
Right = {x_L : δ(x_L, Θ) = 1}. (14)

The function E(X_L, X_H) is used to measure the purity of the data, causing similar data to fall into the same leaf node and thereby achieving the random forest classification goal.
A new regularization expression is thus defined:

E(X_L, X_H) = Σ_n [ ||x_H^n − m(x_L^n)||_2^2 + k · ||x_L^n − x̄_L||_2^2 ], (15)

where m(x_L^n) is the prediction for sample x_L^n, x̄_L is the mean of the samples x_L^n and k is a hyperparameter. Here, ||x_H^n − m(x_L^n)||_2^2 operates in the label space and k·||x_L^n − x̄_L||_2^2 operates in the data space (different k values produce different results, as discussed in the next section). This regularization, the E(X_L, X_H) in Equation (15), simplifies the calculation of the linear regression model m(x_L^n). After the data in the current node are split and forwarded to the left and right child nodes, respectively, the tree continues to grow until the last leaf node has been created. Finally, classification is accomplished through voting to determine the optimal solution.
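The purity measure E(X_L, X_H) of Equation (15) and the greedy search over candidate split parameters can be sketched as follows, with one simplification: the prediction m(x) is replaced by the node mean rather than a full linear model, so this illustrates the splitting logic rather than reproducing the paper's exact criterion:

```python
import numpy as np

def reg_error(X_L, X_H, k):
    """Regularized node error: label-space residual around the node's mean
    prediction plus k times the data-space spread (a stand-in for E(X_L, X_H))."""
    m = X_H.mean(axis=1, keepdims=True)   # simplified label-space prediction
    c = X_L.mean(axis=1, keepdims=True)   # data-space centre
    return np.sum((X_H - m) ** 2) + k * np.sum((X_L - c) ** 2)

def best_threshold(X_L, X_H, dim, thresholds, k=1.0):
    """Greedy node splitting: route samples on feature `dim` against each
    candidate threshold and keep the one minimizing the summed child errors."""
    best = (None, np.inf)
    for th in thresholds:
        right = X_L[dim] > th
        if right.all() or (~right).all():
            continue                       # degenerate split, skip
        q = reg_error(X_L[:, ~right], X_H[:, ~right], k) + \
            reg_error(X_L[:, right], X_H[:, right], k)
        if q < best[1]:
            best = (th, q)
    return best
```

The data-space term pulls samples with similar LR appearance into the same child, which is the clustering effect the regularization is meant to produce.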

The Method Scheme
This section provides a brief description of the logic of the proposed algorithm, covering both the basic scheme for SR and the tree-structure construction algorithm for the random forest. These schemes are summarized in Table 1 and Figure 1 for clarity.

Table 1. The basic scheme of the proposed method.
1 Input: an LDCT image x
2 Output: the final processed image y
3 Sample N points (x_L^n, x_H^n) from the LDCT and HDCT images in the training set
4 Train individual random forest trees and then combine the trained trees into a random forest
5 Obtain the dependence matrix function W(x_L) by Equation (10)
6 Compute the mapping relationship function W(X_L) using Equations (7) and (8)
7 Obtain the relationship between the LR data matrix X_L and the HR data matrix X_H by Equation (6)
8 Complete the coupled dictionary learning of the LR dictionary D_L and the HR dictionary D_H by Equation (5)
9 Invert the image down-sampling by Equation (4) and obtain the final image y by Equation (3)

The first stage is the training stage (the red block in Figure 1). In this module, using the LDCT images and the corresponding HDCT images as a training set, a decision tree is generated from the training set and a random forest is trained, as described in Section 3, to find the mapping relationship W(X_L) between the two image types. The second stage is the test stage (the blue block). A non-training-set LDCT image is used as the input image and, using the learned mapping function and the LDCT image matrix X_L, the new image matrix X_H is reconstructed. Finally, the coupled dictionary learning of D_L and D_H is performed according to Equation (5) and the inverse process of image down-sampling is performed according to Equation (4) to obtain the final reconstructed image.
Steps 3 and 4 of Table 1 involve training individual trees and combining them into a random forest. Table 2 provides the algorithm for generating the random forest.

Table 2. The random forest generation algorithm.
1 for k = 1 to K
2   Randomly extract N samples to construct a feature vector set
3   while (the tree depth is below the maximum)
      (1) randomly select n eigenvectors from the set of feature vectors
      (2) select the optimal vector and the optimal split point from the feature vectors
      (3) split the optimal split point into left and right child nodes
      (4) update the tree depth
4   end while
5   create a tree T_k(x)
6 end for
7 return the collection of trees {T_k(x)}, k = 1, ..., K
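A minimal runnable rendering of the Table 2 control flow (bootstrap sampling, depth-limited greedy splitting, one tree per outer iteration) might look as follows; the leaf model is simplified to the leaf mean and the split selection to a random dimension with a median threshold, so this is a structural sketch rather than the paper's training procedure:

```python
import numpy as np

def grow_tree(X_L, X_H, depth, max_depth, rng):
    """Recursive tree growth: split until the maximum depth is reached."""
    if depth == max_depth or X_L.shape[1] < 2:
        return {"leaf": X_H.mean(axis=1)}          # simplified leaf model
    dim = rng.integers(X_L.shape[0])               # random feature choice
    th = float(np.median(X_L[dim]))                # simple split point
    right = X_L[dim] > th
    if right.all() or (~right).all():
        return {"leaf": X_H.mean(axis=1)}          # degenerate split: stop
    return {"dim": dim, "th": th,
            "L": grow_tree(X_L[:, ~right], X_H[:, ~right], depth + 1, max_depth, rng),
            "R": grow_tree(X_L[:, right], X_H[:, right], depth + 1, max_depth, rng)}

def predict(tree, x):
    """Route a sample down to its leaf and return the leaf prediction."""
    while "leaf" not in tree:
        tree = tree["R"] if x[tree["dim"]] > tree["th"] else tree["L"]
    return tree["leaf"]

def grow_forest(X_L, X_H, T=5, max_depth=4, seed=0):
    """Grow T trees, each on a bootstrap resample, as in the Table 2 loop."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(T):
        idx = rng.integers(0, X_L.shape[1], X_L.shape[1])  # bootstrap sample
        forest.append(grow_tree(X_L[:, idx], X_H[:, idx], 0, max_depth, rng))
    return forest
```

Averaging `predict` over the trees of the forest then gives the ensemble output, mirroring the voting step described above.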

Experiments and Results
In this section, experiments based on clinical data are performed using the proposed random forest solution for SR. All the experiments were executed in MATLAB 2016a on an Ubuntu 18.04 operating system with an Intel® Core™ i5-7500 CPU @ 3.40 GHz and 64.0 GB of RAM.
All the CT images in the following experimental sections were provided by the United Imaging company. For this experiment, 100 LDCT images and the corresponding HDCT images are selected as the low-resolution and high-resolution training sets, respectively, and the mapping relationship is determined from them. This step constitutes the training phase. Here, HDCT denotes a full-dose CT image and LDCT denotes a quarter-dose CT image. In the testing phase, a non-training-set LDCT image is used as the input image, combined with the trained mapping relationship, and a new CT image is then obtained by reconstruction via coupled dictionary learning. Finally, the CT image reconstructed by the proposed method is compared with the input LDCT image, the original HDCT image and the image reconstructed by the conventional interpolation method. The findings demonstrate that the proposed method is robust in reducing noise and artifacts.

Experimental Parameters and Evaluation Function
In the experiment, the main parameters include the number of trees T in the system, the maximum tree depth ξ_max, the regularization parameter η for the linear regression in the leaf nodes and the regularization parameter k of the splitting target. When no special values are provided, the above parameters are set to T = 10, ξ_max = 15, η = 0.01 and k = 1.
The resulting reconstructed image was evaluated using the peak signal-to-noise ratio (PSNR) and the structural similarity index measurement (SSIM) of the image as evaluation criteria.
The definition of the PSNR is as follows:

PSNR = 10 · log10(255^2 / MSE),
MSE = (1 / (height × width)) Σ_{i=1}^{height} Σ_{j=1}^{width} [I_orig(i, j) − I_tar(i, j)]^2,

where MSE is the mean square error, height and width are the height and width of the image, respectively, I_orig is the source image and I_tar is the image to be evaluated. The PSNR reflects the loss of high-frequency components from the image: higher PSNR values indicate smaller loss and a better reconstruction effect. The SSIM is defined as follows [32]:

SSIM(x, y) = [(2 u_x u_y + C_1)(2 σ_xy + C_2)] / [(u_x^2 + u_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2)],

where u_x, u_y and σ_x, σ_y are the means and standard deviations of the images x and y, respectively; σ_xy is the covariance of x and y; and C_1 and C_2 are constants (set to 1 in the experiment). The SSIM is the structural similarity index of the images x and y and is used to measure the similarity between two images. The SSIM is closer to the human eye's evaluation of image quality; its value ranges from 0 to 1 and the closer the SSIM value is to 1, the more similar the two images are.
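Both metrics follow directly from their definitions. Note that the SSIM below is computed over the whole image in a single window for brevity, whereas standard implementations average it over local windows:

```python
import numpy as np

def psnr(orig, tar, peak=255.0):
    """PSNR from the MSE definition; higher values mean less loss."""
    mse = np.mean((orig.astype(np.float64) - tar.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(x, y, C1=1.0, C2=1.0):
    """Single-window (global) SSIM with C1 = C2 = 1, as in the experiment."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    ux, uy = x.mean(), y.mean()
    sx, sy = x.var(), y.var()                       # variances sigma_x^2, sigma_y^2
    sxy = ((x - ux) * (y - uy)).mean()              # covariance sigma_xy
    return ((2 * ux * uy + C1) * (2 * sxy + C2)) / \
           ((ux ** 2 + uy ** 2 + C1) * (sx + sy + C2))
```

Identical images yield an infinite PSNR and an SSIM of exactly 1, which is a useful sanity check.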

Clinical Data Experiments
In this experiment, clinical data were used for validation and to test the performance and robustness of the proposed method. Taking a low-dose CT image from the non-training set as input, the method proposed in this paper and the bicubic interpolation method were applied to reconstruct the input CT image. Figure 2 compares the image quality of the two methods based on the PSNR and SSIM metrics described in the previous section. Figure 2a shows the original image, a high-dose CT (HDCT) image, for reference and Figure 2b shows the corresponding input image, a low-dose CT (LDCT) image. Figure 2c,d show the corresponding reconstruction results. The profile and residual images are also compared in Figures 3 and 4. It can be concluded that the effect and performance of the proposed method in image reconstruction are superior to those of the traditional bicubic interpolation method. Different numbers of iterations were employed in the proposed method and the reconstructed images obtained in Figure 3 were compared. To be more convincing, three representative parts were selected for comparison in Figures 5 and 6 and the related data are shown in Tables 3-5. As shown in Figure 7a,b and Tables 3-5, the image quality clearly changes with the number of iterations. The image quality parameters PSNR and SSIM are optimal after two iterations, after which they trend downward.

Parameter Evaluation
According to the analysis in Section 2, factors that affect the random forest include the objective function for evaluating the potential segmentation function and the inherent randomness. Therefore, during the statistical analysis of the reconstruction results, two factors are considered here: the number of trees T in the random forest and the maximum depth of each tree structure ξ max .
To control the variables and ensure authenticity during this experiment, all the following experiments involve only one iteration.
Random forest classifiers function similarly to voting. The construction of random forest classifiers [27] involves first generating a decision tree; then, multiple decision trees form the random forest. Each decision tree functions as a ballot; all the trees vote to yield the final result.
A larger number of trees tends to produce a better final result but increases the time required to reach a final decision; therefore, an optimal solution must be found. Here, ξ max = 15 is set as the default. Figure 8a shows the effect of the parameter T on the experiment. The PSNR value increases steadily and eventually becomes saturated as T increases. As shown in Figure 8a, the PSNR is saturated when T = 10. Figure 8b shows the relationship between the number of trees T and the total calculation time.
According to the graphs in Figure 8, it can be concluded that T = 10 is optimal; that is, the algorithm achieves good results and completes in a reasonable amount of time when 10 trees are used.

After determining the optimal number of trees (T = 10), the maximum depth of each decision tree can be discussed. Decision tree classification starts from the root node, classifies new subnodes according to their characteristics and then treats the subnodes as new root nodes; consequently, the subclasses are sorted down to the maximum depth to obtain the final result. The principle for the maximum depth is the same as that for the number of trees: greater depth provides a better classification effect but requires more time to generate the tree. Therefore, finding the best setting for the tree depth is also crucial. Figure 9a shows the relationship between the maximum tree depth ξ_max and the experimental outcome. Tree depth has a strong influence on the training. Figure 9a shows that a steady state is reached at depth ξ_max = 15, that is, the selected sample image is saturated. This relationship is reflected by Equation (15), which directly affects the training on LDCT and HDCT images. Figure 9b shows the relationship between the maximum tree depth ξ_max and the training time. It is concluded that the optimal maximum depth of the tree is ξ_max = 15.

The regularization parameter η of the linear regression in the leaf nodes mentioned in Section 2.2 and the regularization parameter k of the splitting target mentioned in Equation (15) also have a certain influence on the final random forest result, but their influences are not as obvious as those of the first two factors; consequently, comparisons are provided herein but detailed explanations are omitted. As shown in Figure 10a, when η > 10^−2, the PSNR declines noticeably and, as shown in Figure 10b, a k value between 0.5 and 1 is most appropriate; that is, the PSNR value remains highest within this interval.

Conclusions
In this paper, a new method for low-dose CT image SR reconstruction is proposed that avoids using sparse coding dictionaries to learn the mapping from LR images to HR images, as in the general sparse representation of compressed sensing. Instead, the problem of mapping LDCT image blocks to HDCT image blocks is solved by a random forest combined with coupled dictionary learning to complete LDCT image reconstruction. CT images acquired from various parts of the human body have similar features; therefore, CT images of different body parts are included in the training set. To obtain a better reconstruction effect for a specific part under test, CT images of that specific body part can be used as the training set. An iterative scheme is also incorporated to improve the robustness of the method. Compared with traditional interpolation methods, the proposed method greatly reduces noise and artifacts; it improves the resolution of noisy images and produces larger PSNR and SSIM values. The proposed method can be applied in different CT fields, such as dual-source CT (DSCT), and can also be applied to other medical imaging fields, such as positron emission tomography (PET). In the training process, computer multithreaded computing is used to reduce the training time. Compared with deep learning-based CT super-resolution reconstruction methods, which are of great interest in the academic community, this method has a substantial advantage in running time but cannot handle large training sets because of CPU and memory limitations. In the future, the proposed method will be combined with deep learning in the field of super-resolution imaging and a larger database will be used for training to improve the reconstruction effect.