3D Tensor Based Nonlocal Low Rank Approximation in Dynamic PET Reconstruction

Reconstructing images from multi-view projections is a crucial task both in the computer vision community and in the medical imaging community, and dynamic positron emission tomography (PET) is no exception. Unfortunately, image quality is inevitably degraded by the limitations of photon emissions and the trade-off between temporal and spatial resolution. In this paper, we develop a novel tensor based nonlocal low-rank framework for dynamic PET reconstruction. Spatial structures are effectively enhanced not only by nonlocal and sparse features, but momentarily by tensor-formed low-rank approximations in the temporal realm. Moreover, the total variation is well regularized as a complementation for denoising. These regularizations are efficiently combined into a Poisson PET model and jointly solved by distributed optimization. The experiments demonstrated in this paper validate the excellent performance of the proposed method in dynamic PET.


Introduction
Positron emission tomography (PET) seeks to obtain radioactivity distributions by collecting numerous photons emi ed by the annihilations of positrons that come from the isotope-labeled tracer injected in living tissue. Correspondently, considering the unique traits of physiological and molecular imaging, PET is universally and irreplaceably required in clinic imaging, especially in cancer diagnoses [1], lesion detection [2], and functional supervision in vivo. However, despite its predominance for functional imaging, PET is dwarfed in resolution due to limitations either from acquisition time or the injection dose when compared with other structural medical imaging systems such as magnetic resonance imaging (MRI) and computed tomography (CT). Moreover, in dynamic PET imaging, where the time-varying radioactivity concentration at each spatial location is obtained, the structural information and denoising performance are more critical, along with increasing demands for time activity curves (TAC) for different regions of interest (ROI). Under this framework, reliable algorithms for dynamic PET reconstruction have been discussed and debated for decades.
Initial a empts at PET reconstruction included the famous analytic filtered back-projected (FBP) [3] based algorithms, least squares (LS) [4], and the maximum likelihood-expectation maximization (ML-EM) [5] method. Although milestones, problematic conditions [6] always accompany the optimization of the mentioned algorithms (i.e., the solution might be sensitive to trivial fluctuation and thus consequently increase noise along the iterations). Thus, much effort has The rank distribution for the matrices which is formulated from given images. Up: the monarch; Down: Positron emission tomography (PET) scan of an Alzheimer's patient's brain.

Dynamic PET Imaging Model
Typically, during the detection procedure, the PET scanner collects the emitted photons and then The main contributions are listed as follows: (1) An innovative form of a nonlocal low rank tensor constraint is adopted in the Poisson's model, which captures data correlation in multiple dimensions in dynamic PET, beyond just spatiotemporal correlation. For one thing, without any additional information, Poisson's model exploits the temporal information among the frames themselves, effectively complementing the structures in low-active frames and recovering severely corrupted data. For the other, it exploits the spatial information from nonlocal self-similarities within each frame, thereby enhancing the structured sparsity for each image. (2) As an additional regularization, the total variation (TV) constraint is employed to extract local structure and further complement the denoising function. On the other hand, the expectation-maximization method is employed as a fidelity term to incorporate hidden data in the objective function and thus increase efficiency in optimization. (3) In the optimization procedure, we develop a distributed optimization framework inspired by the alternative direction method of multipliers (ADMM) [30]. In this way, the mentioned terms can be explicitly organized in a united objective function and effectively handled as three subproblems during the iterations.

Dynamic PET Imaging Model
Typically, during the detection procedure, the PET scanner collects the emi ed photons and then pre-processes them into so-called sinogram data as the input of the reconstruction. For dynamic PET, let the sinogram matrix Y = [ y 1 , y 2 , … , y , … , y ] ∈ R × denote the collection of sinogram data vectors, where = 1, 2, … , denotes the index of the frames; and sinogram vector y = { , = 1, … , } ∈ R denotes the sum of the photons collected in the t-th frame, where q represents the index of the total M pairs of detectors. Correspondently, X = [ x 1 , x 2 , … , x , … x ] ∈ R × denotes the collection of the images that are supposed to be recovered, where vector x ∈ R represents the t-th frame. Since Poisson distribution uses inherited PET systems, the reconstruction of each frame can be successfully modeled by the affine transformation: where y is the expectation of y ; G ∈ R × is the system matrix; and r and s represent the random coincidence and sca er coincidence, which inevitably contain heavy noise. In this way, we can obtain the likelihood function of y as Instead of maximizing Label (2), we estimate x by minimizing the negative log-likelihood version for the convenience of optimization: where the constant term log( !) is left out. Therefore, in the scale of dynamic reconstruction, Equation (3) can be transformed into min X P(X) = min X ∑ ∑ − log( ) . . y = Gx + r + s .
However, optimization merely by Equation (4) is ill-conditioning-i.e., the reconstruction is vulnerable to accumulated iterative noise accompanying the iteration. The predominant solution to this shortcoming is to include the regularization terms in Equation (4) as the image prior. Thus, the object function of PET reconstruction can be wri en as where P(X) denotes the fidelity term defined in Equation (4), and R(X) denotes the regularization term.

Tensor Decomposition
Just like the matrix, the low-rank approximation of the tensor is also inevitably based on tensor decomposition.
At present, the strategies for tensor decomposition mainly fall into three groups: the CANDECOMP/PARAFAC [31] (CP) decomposition methods, Tucker decomposition [32,33] methods, and tensor-singular value decomposition [34][35][36] (t-SVD) methods. Because of the stability and efficiency inherited in their optimization, t-SVD has aroused increasing interest among the community.
However, optimization merely by Equation (4) is ill-conditioning-i.e., the reconstruction is vulnerable to accumulated iterative noise accompanying the iteration. The predominant solution to this shortcoming is to include the regularization terms in Equation (4) as the image prior. Thus, the object function of PET reconstruction can be written as where P( ) denotes the fidelity term defined in Equation (4), and R( ) denotes the regularization term.

Tensor Decomposition
Just like the matrix, the low-rank approximation of the tensor is also inevitably based on tensor decomposition. At present, the strategies for tensor decomposition mainly fall into three groups: the CANDECOMP/PARAFAC [31] (CP) decomposition methods, Tucker decomposition [32,33] methods, and tensor-singular value decomposition [34][35][36] (t-SVD) methods. Because of the stability and efficiency inherited in their optimization, t-SVD has aroused increasing interest among the community.
As shown in Figure 2, for a three-way tensor ∈ ℝ × × , t-SVD shares the same form as the matrix SVD: where ∈ ℝ × × and ∈ ℝ × × are orthogonal tensors [37]; represents the conjugate transpose [34] of ; ∈ ℝ × × is a f-diagonal tensor in which each frontal slice ( ) is a diagonal matrix [35]; and * denotes the t-product [37,38]. In this paper, we proposed a nonlocal tensor low-rank framework for dynamic PET reconstruction, where the tensor low-rank approximation is efficiently and effectively conducted by a t-SVD based method. The details of this method will be illustrated in Section 3.

Method
In this paper, we proposed a novel reconstruction framework that can jointly recover, denoise, and (mostly) critically complete the structures in the dynamic PET imaging system. The overall procedure is illustrated in Algorithm 1. In this paper, we proposed a nonlocal tensor low-rank framework for dynamic PET reconstruction, where the tensor low-rank approximation is efficiently and effectively conducted by a t-SVD based method. The details of this method will be illustrated in Section 3.

Method
In this paper, we proposed a novel reconstruction framework that can jointly recover, denoise, and (mostly) critically complete the structures in the dynamic PET imaging system. The overall procedure is illustrated in Algorithm 1.

Nonlocal Low Rank Tensor Approximation
The nonlocal tensor regularization consists of two parts: forming the nonlocal tensor within the recovered frames and formulating the low-rank property in the formed tensors.

Tensor Formulation
During the optimizing procedure, a temporary estimated image sequence X = [ x 1 , x 2 , … , x , … x ] ∈ R × will be obtained after each iteration, where x ∈ R represents the t-th frame of image vector as mentioned in Section 2.1. Similar to Figure 1, numerous W × W sized overlapping patchesx ∈ R (n = W × W) can be extracted from each x , where i denotes the index of the image patch. According to the nonlocal self-similar properties, within the image, there are plentiful patches that share the same structure with eachx . Based on the Euclidean distance, we choose the m nearest patches for eachx : where is the index set of similar patches for the i-th positioned patchx ; and is the threshold value defined by the distance betweenx and its m-th nearest patch. Thus, for each exemplarx , we formulate a matrix: As shown in Figure 1, due to the nonlocal self-similarity, it is fairly assumed that each X is low-rank.
From a dynamic recovery perspective, sets for = 1, 2, … , are identical, since the structure in dynamic PET is unchanged. Therefore, based on the patch position i, we can construct a 3D tensor ∈ R × × , whose frontal slices are calculated as (:, :, ) = X , = 1, 2, … , . Figure 3 illustrates the overall procedure for forming a nonlocal tensor. Within each iteration, numerous s for various i positions will be constructed and then approximated by a low-rank property.
Sensors 2019, 19, x 6 of 21 Figure 3. Illustration of tensor construction. The first row represents the temporary recovered image sequence. For the randomly chosen patch position i, similar patches can be found within its own frame and then grouped into matrices in the second row. By stacking the matrices along the frames, we managed to construct a feature tensor ∈ ℝ × × for position i.

Low Rank Tensor Approximation
The next step is the low rank approximation for constructed tensors , ∀ . Traditionally, the primal model of rank regularization searches for the tensor , where = arg ( ) s.t. = .
Nevertheless, this model is non-deterministic polynomial-time (NP) hard, and its direct optimization is nonconvex. Taking this situation into account, we choose a surrogate form of Equation (10) and turn it into a convex optimization issue: Here, ‖⋅‖ * denotes the tensor nuclear norm [39] and ‖⋅‖ denotes the Frobenius norm. Under this circumstance, it is feasible to get a closed-form solution by adopting tensor singular value thresholding (t-SVT) [39]: with = * * representing the mentioned t-SVD of and where ( ) = ( , 0); ( ) ( ) denotes the fast Inverse Fast Fourier Transform (IFFT) of X across dimension 3; and = ( ) ( ) denotes the Fast Fourier Transform (FFT) of across dimension 3.

Total Variation Regularization in Dynamic PET
Apart from nonlocal low-rank tensor regularization, we also incorporate the total variation [40] (TV) as a complementary constraint into the dynamic PET reconstruction framework. Unlike the nonlocal tensor, the adopted TV regularization focuses on inherited local information within each frame. Moreover, this pixel-based regularization compromises patch-based regularization and thus improves the denoising performance of the proposed algorithm.
For each image frame ∈ ℝ in the estimated sequence = [ , , . . . , , . . . ] ∈ ℝ × , we adopt an l2 formed TV regularization, which is properly formulated in accordance with augmented Figure 3. Illustration of tensor construction. The first row represents the temporary recovered image sequence. For the randomly chosen patch position i, similar patches can be found within its own frame and then grouped into matrices in the second row. By stacking the matrices along the frames, we managed to construct a feature tensor ∈ R × × for position i.

Low Rank Tensor Approximation
The next step is the low rank approximation for constructed tensors , ∀ . Traditionally, the primal model of rank regularization searches for the tensor , where Nevertheless, this model is non-deterministic polynomial-time (NP) hard, and its direct optimization is nonconvex. Taking this situation into account, we choose a surrogate form of Equation (10) and turn it into a convex optimization issue: Here, ‖ ⋅ ‖ * denotes the tensor nuclear norm [39] and ‖ ⋅ ‖ denotes the Frobenius norm. Under this circumstance, it is feasible to get a closed-form solution by adopting tensor singular value thresholding (t-SVT) [39]: with = * * representing the mentioned t-SVD of and where ( ) + = max( , 0); (3) ( ) denotes the fast Inverse Fast Fourier Transform (IFFT) of X across dimension 3; and = (3)( ) denotes the Fast Fourier Transform (FFT) of across dimension 3.

Total Variation Regularization in Dynamic PET
Apart from nonlocal low-rank tensor regularization, we also incorporate the total variation [40] (TV) as a complementary constraint into the dynamic PET reconstruction framework. Unlike the nonlocal tensor, the adopted TV regularization focuses on inherited local information within each frame. Moreover, this pixel-based regularization compromises patch-based regularization and thus improves the denoising performance of the proposed algorithm.
For each image frame x ∈ R in the estimated sequence X = [ x 1 , x 2 , … , x , … x ] ∈ R × , we adopt an l 2 formed TV regularization, which is properly formulated in accordance with augmented Lagrangian optimization [41,42]: where = x denotes the discrete gradient of x at position j; and denotes the j-th element of the corresponding differential operator D. Correspondingly, the augmented Lagrangian function can be wri en as: where Ω = [ 1 , 2 , … , ] ∈ R × represents the collection of discrete gradient vectors Ω = Dx for each recovered frame; represents the tunable weighting parameters of the quadratic term; and represents the updatable multiplier vector. Consequently, this method guarantees the convexity of TV regularization, and hence equips the proposed method with global convergence.

Expectation Maximization for Fidelity Term
Indisputably, solving the fidelity term Equation (4), to a large extent, is an essential mission in estimating the reconstruction images . Regardless of regularization, Equation (4) can be further wri en as: where denotes the j-th pixel in image x ; is the qj-th entry in system matrix G, representing the contribution of the j-th pixel given to the q-th detector; and the hidden variable represents the photon count from the j-th pixel to the q-th detector pair in the t-th frame.
The main challenge lies in the solving procedure, especially handling the hidden variable . In this work, we adopt the well-known expectation maximization (EM) [5,43], which introduces 'complete data' into the model and thus facilitates optimization. In order to solve Equation (18), there are two essential steps: • E-step: This step employs the expectation̂= ( |x , ) as the substitute for the hidden variable in Equation (18):̂= • M-step: This step maximizes the likelihood by zeroing the derivative of Equation (19): The EM algorithm, which will be illustrated in greater detail in the next section, makes our proposed framework readily and efficiently solvable.

The Overall Optimization Framework
Based on Equation (5), the overall reconstruction model can be represented as where P(X) is the fidelity term in the reconstruction model Equation (16); TNL(X) represents the tensor formed nonlocal low rank constraint in Section 3.1; DTV(X) represents the dynamic PET adopted total variation term in Section 3.2; and and denote the weighting parameters. By taking the mentioned terms into account, the objective function can be formulated as: (20) is a complex process. For this process, we refer to the alternative direction method of multipliers (ADMM) [30] and divide the model into three subproblems of ℒ, and X in a distributed optimization way.
(1) ℒ-subproblem. Let ( ) ( ) denote the updated variable after the k-th iteration; e.g., X ( ) represents the computed image sequence after the k-th iteration. In the (k + 1)-th iteration, nonlocal low-rank approximation is first implemented. After the formulation of nonlocal feature tensors ( +1) , ∀ , as illustrated in Section 3.1, we can obtain the function related to each low rank tensor: (2) -subproblem. Unlike updating ℒ , the discrete gradient is updated frame by frame. As shown in Equation (15), we update Ω = [ 1 , 2 , … , ] ∈ R × by employing the shrinkage operator [44]: Correspondingly, the multiplier vector updates by: (3) X-subproblem. After the update of ℒ and Ω, the last critical process is to update X = In addition to the fidelity term in Equation (18), the former mentioned regularizations must be considered. In this procedure, we adopt the EM algorithm and hence reform Equation (20) into a joint function relevant to X (or ): . .̂= ( ) Here, ℒ ( ) (or ℒ (:, :, )) denotes the t-th frontal slice of tensor ℒ ; Γ denotes the contribution weight for pixel x to tensor ; denotes the j-th element of Ω , and denotes the j-th column of the differential operator D; and is the j-th element of multiplier v .
According to the M-step in Equation (18), we can harvest a unitary quadratic equation: Therefore, ( +1) can be readily solved as the positive root of (25): In this X sub-problem, given that we do not have a close-formed solution for the PET reconstruction model [5], the X-subproblem is not fully solved by the employment of EM. However, due to the implementation of ADMM based optimization, the overall framework will converge, even if its sub-problems are not carried out exactly [30].
Algorithm 1 demonstrates the overall procedure of our proposed algorithm. It is noteworthy that, in the initialization step, we employed the filtered backward projection (FBP) [3] method to produce a 'warm start'. By doing so, as we will show in the next section, iterative performance is notably improved.

Experiments and Results
In this section, we perform various experiments, in order to validate the qualitative and quantitative measures of the proposed method. Data on diversified photon counts, sizes, tracers, and structures are recovered by our proposed method and compared with the results of representative and state-of-the-art algorithms.

Evaluation Criteria
For the qualitative evaluation, randomly selected reconstructions are shown in this part, where the structural detail, noise level, and so on will be intuitively illustrated. For the quantitative evaluation, other than the peak signal-to-noise ratio (PSNR)), we also employ the relative bias and variance [45] as the indicators of the resolution and smoothness, respectively: wherêdenotes the ground truth in the j-th pixel; denotes the mean value of the ROI; and denotes the total number of pixels in the given ROI. Unlike the PSNR, the smaller the relative bias and variance are, the be er the reconstruction is.
We also conduct the multiple simulations experiment. In this study, we employ the contrast recovery coefficient (CRC) and the standard deviation (STD) [16,46]: Here, = 50 represents the number of realizations in the simulation. In Equation (29), represents the mean value of the ROI in r-th realization and represents the mean value of the background region in r-th realization. In Equation (30), denotes the total number of pixels in the background region; = (1/ ) ∑ =1 , denotes the mean value of j-th pixel in background region across R realizations.
For the real data, we adopt the contrast to noise ratio (CNR) [47]: where the and represent the intensity of the ROI and background region respectively, and is the standard deviation of the background region.

Dataset
As shown in Figure 4, we mainly adopt a 64 × 64 sized Zubal brain phantom as the template and employ the 11C-dihydrotetrabenazine (denoted as DTBZ) as the tracer. The scanning procedure in simulated into 18 frames for a duration of 20 min with the corresponding TAC presented in Figure 1. Moreover, to validate the performance of the algorithms under a diversified tracer dose, the data are generated in 3 × 10 6 , 10 7 and 3 × 10 7 total photon counts over 18 frames. Furthermore, in the multiple realizations experiment, we generate total 100 realizations for 3 × 10 6 and 3 × 10 7 counted data (R = 50 for each simulation). In addition, all the simulated se ings correspond to real cases. As shown in Figure 4, we mainly adopt a 64 × 64 sized Zubal brain phantom as the template and employ the 11C-dihydrotetrabenazine (denoted as DTBZ) as the tracer. The scanning procedure in simulated into 18 frames for a duration of 20 min with the corresponding TAC presented in Figure  1. Moreover, to validate the performance of the algorithms under a diversified tracer dose, the data are generated in 3 × 10 6 , 10 7 and 3 × 10 7 total photon counts over 18 frames. Furthermore, in the multiple realizations experiment, we generate total 100 realizations for 3 × 10 6 and 3 × 10 7 counted data (R = 50 for each simulation). In addition, all the simulated settings correspond to real cases. Furthermore, 111 × 111 sized Zubal head phantoms [16] are recovered to validate the effectiveness of the proposed method on different tracers (18F-FDG) and image sizes. Moreover, real cardiac data are tested in this section. The data are scanned over 60 min by a Hamamatsu SHR-22000 (Hamamatsu Photonics K.K., Hamamatsu City, Japan). There are, overall, 19 frames, and each are scanned by 130 detector pairs from 192 angles.

Comparative Algorithms
To evaluate the performance of the proposed algorithm, we introduce five representative algorithms in comparison (the maximum likelihood-expectation maximization (ML-EM) algorithm [5], the penalized weighted least square (PWLS) method [8], the total variation optimized by augmented Lagrangian (TV-AL) method [48], the penalized likelihood incremental optimization method regularized by hyperbolic potential function [45,49] (denoted as PLH-IO), and the spatialtemporal total variation (ST-TV) method [50]) proposed for dynamic PET reconstruction.

Parameters Setting
After deliberate examination, we set the weighting parameters as follows: the nonlocal tensor weight parameter = 1.7; TV weight parameter = 0.9; tensor thresholding parameter = 2.5; Furthermore, 111 × 111 sized Zubal head phantoms [16] are recovered to validate the effectiveness of the proposed method on different tracers (18F-FDG) and image sizes. Moreover, real cardiac data are tested in this section. The data are scanned over 60 min by a Hamamatsu SHR-22000 (Hamamatsu Photonics K.K., Hamamatsu City, Japan). There are, overall, 19 frames, and each are scanned by 130 detector pairs from 192 angles.

Comparative Algorithms
To evaluate the performance of the proposed algorithm, we introduce five representative algorithms in comparison (the maximum likelihood-expectation maximization (ML-EM) algorithm [5], the penalized weighted least square (PWLS) method [8], the total variation optimized by augmented Lagrangian (TV-AL) method [48], the penalized likelihood incremental optimization method regularized by hyperbolic potential function [45,49] (denoted as PLH-IO), and the spatial-temporal total variation (ST-TV) method [50]) proposed for dynamic PET reconstruction.

Parameters Se ing
After deliberate examination, we set the weighting parameters as follows: the nonlocal tensor weight parameter = 1.7; TV weight parameter = 0.9; tensor thresholding parameter = 2.5; and and parameter = 50. Another critical issue is the tensors' sizes. As shown in Figure 5, the optimal patch size is 3 × 3. Thus, if the number of feature patches is set to 10 in the 18-frame sequence, each selected tensor's size is 9 × 10 × 18. We set the maximum iteration to 500 for all methods in comparison.

Experiment Description
We evaluate our method both qualitatively and quantitatively on simulation and real PET data. We firstly compare the reconstructions for 64 × 64 sized Zubal brain data under 3 × 10 7 total photon counts, and demonstrate 5th, 11th, and 17th frame in Figure 6. In Figure 7, we further compare the 11th reconstructed frames under lower-counted sequences: 3 × 10 6 and 1 × 10 7 photon counts. In addition, we recover the 111 × 111 sized Zubal head phantom to validate the performance under different sizes and TACs, as shown in Figure 8. For the real patient study, we demonstrate the first frame of dynamic cardiac PET reconstructions in Figure 9, and compute the CNR for each method.

Experiment Description
We evaluate our method both qualitatively and quantitatively on simulation and real PET data. We firstly compare the reconstructions for 64 × 64 sized Zubal brain data under 3 × 10 7 total photon counts, and demonstrate 5th, 11th, and 17th frame in Figure 6. In Figure 7, we further compare the 11th reconstructed frames under lower-counted sequences: 3 × 10 6 and 1 × 10 7 photon counts. In addition, we recover the 111 × 111 sized Zubal head phantom to validate the performance under different sizes and TACs, as shown in Figure 8. For the real patient study, we demonstrate the first frame of dynamic cardiac PET reconstructions in Figure 9, and compute the CNR for each method.

Experiment Description
We evaluate our method both qualitatively and quantitatively on simulation and real PET data. We firstly compare the reconstructions for 64 × 64 sized Zubal brain data under 3 × 10 7 total photon counts, and demonstrate 5th, 11th, and 17th frame in Figure 6. In Figure 7, we further compare the 11th reconstructed frames under lower-counted sequences: 3 × 10 6 and 1 × 10 7 photon counts. In addition, we recover the 111 × 111 sized Zubal head phantom to validate the performance under different sizes and TACs, as shown in Figure 8. For the real patient study, we demonstrate the first frame of dynamic cardiac PET reconstructions in Figure 9, and compute the CNR for each method.      In the quantitative evaluations, we firstly present the PSNR, relative bias and variance for data under 3 × 10 7 , 3 × 10 6 and 1 × 10 7 photon counts in Table 1; furthermore, we compute the relative bias and variance in each ROI for 3 × 10 6 counted data and demonstrate them in Table 2. In Figure 10a-c, we compute the PSNR, relative bias, and variance for each frame in 1 × 10 7 counted data. In Figure 10d, we demonstrate the performance of the trade-off between resolution and smoothness, by altering the parameters in each algorithm and plo ing the bias and variance in each reconstruction. In the quantitative evaluations, we firstly present the PSNR, relative bias and variance for data under 3 × 10 7 , 3 × 10 6 and 1 × 10 7 photon counts in Table 1; furthermore, we compute the relative bias and variance in each ROI for 3 × 10 6 counted data and demonstrate them in Table 2. In Figure  10a-c, we compute the PSNR, relative bias, and variance for each frame in 1 × 10 7 counted data. In Figure 10d, we demonstrate the performance of the trade-off between resolution and smoothness, by altering the parameters in each algorithm and plotting the bias and variance in each reconstruction.  Moreover, we conduct the experiment on multiple simulations. In our experiment, we run 100 realizations for 3 × 10 7 and 3 × 10 6 counted data, and draw the CRC-STD curves, as shown in   To better validate the proposed method, we further implement the experiments on multiple realizations. As demonstrated in Figure 11a-b, we run the experiments under high-count and low-count scenarios (3 × 10 7 and 3 × 10 6 counted data) and draw the correspondent CRC-STD curves, which illustrate the performance of compared methods. In these curves, each point corresponds to a certain setting for parameters in the relative method. According to the figures, the proposed method manages to recover higher CRC while keeping the background STD in a low level, which validates the stability of our method in reducing the noise while keeping the contrast distinctive. Moreover, we analyze the statistical performance of the Wilcoxon rank sum test on the multiple realizations data. As shown in Figure 11c-d, we compute the p-value between the proposed method and compared methods, in terms of PSNR on multi-simulation. In this study, the proposed method significantly outperforms the ST-TV in 3 × 10 6 dataset, at p-value < 0.01; more distinctively, the proposed method outperforms other methods in 3 × 10 6 dataset and all methods in 3 × 10 7 dataset, at a p-value < 0.001. Moreover, we conduct the experiment on multiple simulations. In our experiment, we run 100 realizations for 3 × 10 7 and 3 × 10 6 counted data, and draw the CRC-STD curves, as shown in Figure 11a-b. Accordingly, the CRC is computed from ROI 2, and the background STD is computed from ROI 4, which represents the white ma er. In addition, we run the Wilcoxon rank sum test on the multiple realization test, as demonstrated in Figure 11c-d. In addition, we further examine the universality of the proposed method by reconstructing data under wider range of photon counts, as demonstrated in Figure 11a, from 1 × 10 6 to 1 × 10 8 . Further discussions on convergence are illustrated in Figure 11b and the discussion section.

Robustness and Convergence Analysis
The robustness and convergence experiments are demonstrated in this section. As we can see from Figure 12a, the proposed method presents better performance and robustness than other methods under a broad range of photon counts.
For the convergence, Figure 12b demonstrates the PSNR for iterations of all tested methods. Specifically, we individually test our methods with and without a warmstart, which is mentioned in Section 3. Our warmstart-equipped method surpasses other methods in its convergence performance.

Qualitative Evaluation
The initial experiment focuses on the resolution and the denoising performance throughout the temporal dimension. According to the TAC in Figure 4, the distinction of activity between different ROIs is inconspicuous in the early imaging stage, which consequently hampers the recovery of the structural information in the corresponding image frames. As demonstrated in Figure 6, the reconstructions of ML-EM and PWLS suffer from severe iterative noise and fail to recover a clear boundary between regions. On the other hand, the TV-AL and PLH-IO show more acceptable results for the 17th frame. However, when it comes to former frames, neither of these two methods are able to recover clear structures. Moreover, the TV-AL suffers from the staircase effect and artifacts, and PLH-IO tends to over-smooth the image. Although ST-TV improves the resolution by incorporating temporal information, it is still limited by recovering more detailed structures. In contrast, our proposed method manages to recover more detailed structures and less noise in reconstructing the brain phantom sequence under 3 × 10 7 photon counts. This contrast is more distinctive in simulated low-dose images. As we can see in Figure 7, when recovering early frames in the low-count data, our proposed method is able to recover substantially clearer structures than those of other methods under similar noise levels.
Meanwhile, our method also shows its universality in recovering sequences under different sizes and TACs. In this experiment, 111 × 111 sized Zubal head phantom data were tested in an 18F-FDG environment. In Figure 8, the 15th frame is randomly selected out of 24 frames. In addition, real patient data are tested. In Figure 9, the second sequence is shown, and the photon counts are around 2.2 × 10 5 . Obviously, our proposed method yields clearer boundaries and more conspicuous contrast between ROIs and the background.

Quantitative Evaluation
The quantitative measurements are also meticulously implemented. Table 1 demonstrates the average statistical values for the dynamic image sequences under diversified photon counts. According to the table, the proposed method enjoys a higher PSNR and lower relative bias and variance, revealing solid and substantial merits in structural enhancement, resolution improvement, and image denoising. Table 2 provides more detailed measurements for each ROI reconstruction. As shown in the table, the proposed method reconstructs images at a lower bias and variance in each ROI to vary the count level, which demonstrates be er resolution and smoothness than those of comparable methods.
Other than spatial information, temporal trends are also considered in Figure 10, where Figure 10a-c presents the PSNR, bias, and variance for each frame. It can be easily observed that the proposed method, overall, has be er results for each frame, which will facilitate the exploitation of temporal information. In addition, given the fact that regular reconstruction methods are largely based on the trade-off between the resolution and the noise level, we also implemented an experiment of this trade-off, correspondingly represented by the relative bias and the relative variance in Figure 10d. As we can see, apart from the relatively poorer performances of ML-EM and PWLS, both TV-AL and PLH-IO show a negative correlation between these two indexes. In contrast, the marks of our proposed method are densely concentrated in the bo om-left of this graph, which shows a be er image quality and be er compromise for the mentioned trade-off.
To be er validate the proposed method, we further implement the experiments on multiple realizations. As demonstrated in Figure 11a-b, we run the experiments under high-count and low-count scenarios (3 × 10 7 and 3 × 10 6 counted data) and draw the correspondent CRC-STD curves, which illustrate the performance of compared methods. In these curves, each point corresponds to a certain se ing for parameters in the relative method. According to the figures, the proposed method manages to recover higher CRC while keeping the background STD in a low level, which validates the stability of our method in reducing the noise while keeping the contrast distinctive. Moreover, we analyze the statistical performance of the Wilcoxon rank sum test on the multiple realizations data. As shown in Figure 11c-d, we compute the p-value between the proposed method and compared methods, in terms of PSNR on multi-simulation. In this study, the proposed method significantly outperforms the ST-TV in 3 × 10 6 dataset, at p-value < 0.01; more distinctively, the proposed method outperforms other methods in 3 × 10 6 dataset and all methods in 3 × 10 7 dataset, at a p-value < 0.001.

Robustness and Convergence Analysis
The robustness and convergence experiments are demonstrated in this section. As we can see from Figure 12a, the proposed method presents be er performance and robustness than other methods under a broad range of photon counts.

Discussion
Other than merely denoising, the proposed method simultaneously provides enhancement and completion of structural sparsity by introducing a 3D tensor based nonlocal low-rank constraint. Unlike the tracer kinetics based dynamic reconstruction method, the proposed method spontaneously exploits the inner temporal correlation within the sequence without the need for tracer information and model fitting. In fact, our proposed method not only managed to suppress noise while recovering at a high resolution, it also enhanced, and even completed, the structural information in the reconstruction sequence.
This result is attributed to the tensor based low rank approximation. As presented in Figure 3, the third dimension of the ∈ ℝ × × exists alongside the temporal information. By implementing the tensor based low rank constraint on each , the spatial information is spontaneously infiltrated from high-counted frames to low-counted frames, while keeping the voxels' relative intensities and edge arrangements fixed. Considering this feature, a dynamic PET sequence is considered ideal for this framework, given the unchanged boundary and structures along the dynamic sequences. On the other hand, since noise is randomly and sparsely arranged in the PET sequence, it is ruled out as the sparse component in the low rank approximation.
Furthermore, we demonstrate the contribution of different regularization components in Figure 13 by setting various hyper-parameters. According to the figure, tensor constraints can successfully recover detailed structures but are slightly limited in smoothing an image under low counts. Fortunately, the employment of a TV constraint compensates for this issue, as demonstrated in Figure 13c.
Nevertheless, there are still several concerns in this study. Firstly, for the data in unwilling but conspicuous motion, the proposed method is limited in harvesting temporal correlation, though other dynamic PET algorithms also suffer from this issue, to the best of our knowledge. Currently, the optimal solution is to conduct motion correction before reconstruction. Secondly, since 3D structural nonlocal features are not well proven in the computer vision community, this method is not guaranteed to function well in 3D PET reconstruction. However, we still provide two solutions for applying the proposed method in 3D data at the current stage: (1) Conduct the method slice by slice; and (2) rebin the data into a 2D form before reconstruction. For the convergence, Figure 12b demonstrates the PSNR for iterations of all tested methods. Specifically, we individually test our methods with and without a warmstart, which is mentioned in Section 3. Our warmstart-equipped method surpasses other methods in its convergence performance.

Discussion
Other than merely denoising, the proposed method simultaneously provides enhancement and completion of structural sparsity by introducing a 3D tensor based nonlocal low-rank constraint. Unlike the tracer kinetics based dynamic reconstruction method, the proposed method spontaneously exploits the inner temporal correlation within the sequence without the need for tracer information and model fi ing. In fact, our proposed method not only managed to suppress noise while recovering at a high resolution, it also enhanced, and even completed, the structural information in the reconstruction sequence.
This result is a ributed to the tensor based low rank approximation. As presented in Figure 3, the third dimension of the ∈ R × × exists alongside the temporal information. By implementing the tensor based low rank constraint on each , the spatial information is spontaneously infiltrated from high-counted frames to low-counted frames, while keeping the voxels' relative intensities and edge arrangements fixed. Considering this feature, a dynamic PET sequence is considered ideal for this framework, given the unchanged boundary and structures along the dynamic sequences. On the other hand, since noise is randomly and sparsely arranged in the PET sequence, it is ruled out as the sparse component in the low rank approximation.
Furthermore, we demonstrate the contribution of different regularization components in Figure 13 by se ing various hyper-parameters. According to the figure, tensor constraints can successfully recover detailed structures but are slightly limited in smoothing an image under low counts. Fortunately, the employment of a TV constraint compensates for this issue, as demonstrated in Figure 13c. The last issue focuses on the computational cost. For simulated data in Figure 4, the computational time for each method is demonstrated in Table 3. Here, the computational experiments are implemented under Matlab R2014a (Mathworks, Natick, MA., USA), on the same desktop with an Intel Core i7-4720HQ CPU (Santa Clara, CA, USA) @2.60 GHz and 8 GB RAM. We have to concede that, compared with the traditional pixel-based algorithms, the computational cost is inevitable in the proposed method, due to the multiple decompositions for feature tensors generated by feature cubes (or patches in 2D situation [23,24]). To address this issue, we will continue optimizing the proposed algorithm and employing other algorithms, such as TCTF [51].
In addition, the choice for the tensor decomposition model is still open in our future work. The T-SVD based method is proved effective in our work and [34][35][36], yet, strictly speaking, its tubal based rank is the analogous rank extended from SVD. In our future work, we will further analyze the data-structures, explore the feasibilities of other potential models, i.e., CP and Tucker rank [31], and testify the applicabilities of the latest proposed CP rank based methods [52,53] as well as Tucker rank based methods [54,55]).

Conclusions
In this paper, we provide a novel tensor based nonlocal low-rank framework for dynamic PET reconstruction. By introducing a nonlocal featured tensor and applying the t-SVT in low-rank tensor approximation, the image structures are efficiently enhanced while effectively depressing the noise. More significantly, structural information is further completed by other frames in an interactive way, thereby compromising the conflict between spatial and temporal resolution. On the other hand, accompanied by the TV term denoising (from a local and pixel-based perspective), the regularizations are firstly integrated in the Poisson reconstruction model and efficaciously optimized in a distributed framework.  Nevertheless, there are still several concerns in this study. Firstly, for the data in unwilling but conspicuous motion, the proposed method is limited in harvesting temporal correlation, though other dynamic PET algorithms also suffer from this issue, to the best of our knowledge. Currently, the optimal solution is to conduct motion correction before reconstruction. Secondly, since 3D structural nonlocal features are not well proven in the computer vision community, this method is not guaranteed to function well in 3D PET reconstruction. However, we still provide two solutions for applying the proposed method in 3D data at the current stage: (1) Conduct the method slice by slice; and (2) rebin the data into a 2D form before reconstruction.
The last issue focuses on the computational cost. For simulated data in Figure 4, the computational time for each method is demonstrated in Table 3. Here, the computational experiments are implemented under Matlab R2014a (Mathworks, Natick, MA., USA), on the same desktop with an Intel Core i7-4720HQ CPU (Santa Clara, CA, USA) @2.60 GHz and 8 GB RAM. We have to concede that, compared with the traditional pixel-based algorithms, the computational cost is inevitable in the proposed method, due to the multiple decompositions for feature tensors generated by feature cubes (or patches in 2D situation [23,24]). To address this issue, we will continue optimizing the proposed algorithm and employing other algorithms, such as TCTF [51]. In addition, the choice for the tensor decomposition model is still open in our future work. The T-SVD based method is proved effective in our work and [34][35][36], yet, strictly speaking, its tubal based rank is the analogous rank extended from SVD. In our future work, we will further analyze the data-structures, explore the feasibilities of other potential models, i.e., CP and Tucker rank [31], and testify the applicabilities of the latest proposed CP rank based methods [52,53] as well as Tucker rank based methods [54,55]).

Conclusions
In this paper, we provide a novel tensor based nonlocal low-rank framework for dynamic PET reconstruction. By introducing a nonlocal featured tensor and applying the t-SVT in low-rank tensor approximation, the image structures are efficiently enhanced while effectively depressing the noise. More significantly, structural information is further completed by other frames in an interactive way, thereby compromising the conflict between spatial and temporal resolution. On the other hand, accompanied by the TV term denoising (from a local and pixel-based perspective), the regularizations are firstly integrated in the Poisson reconstruction model and efficaciously optimized in a distributed framework.