Similarity-Driven Fine-Tuning Methods for Regularization Parameter Optimization in PET Image Reconstruction

We present an adaptive method for fine-tuning hyperparameters in edge-preserving regularization for PET image reconstruction. For edge-preserving regularization, in addition to the smoothing parameter that balances data fidelity and regularization, one or more control parameters are typically incorporated to adjust the sensitivity of edge preservation by modifying the shape of the penalty function. Although there have been efforts to develop automated methods for tuning the hyperparameters in regularized PET reconstruction, the majority of these methods focus primarily on the smoothing parameter. However, it is difficult to obtain high-quality images without appropriately selecting the control parameters that adjust the edge-preservation sensitivity. In this work, we propose a method to precisely tune the hyperparameters, which are initially set to a fixed value for the entire image, either manually or using an automated approach. Our core strategy is to adaptively adjust the control parameter at each pixel, taking into account the degree of patch similarity, calculated from the previous iteration, within the neighborhood of the pixel being updated. This approach allows our new method to integrate with a wide range of existing parameter-tuning techniques for edge-preserving regularization. Experimental results demonstrate that our proposed method effectively enhances the overall reconstruction accuracy across multiple image quality metrics, including peak signal-to-noise ratio, structural similarity, visual information fidelity, mean absolute error, root-mean-square error, and mean percentage error.


Introduction
Positron emission tomography (PET) is a non-invasive imaging technique that enables the visualization of biochemical processes in the patient's body by using a radioactive substance known as a radiotracer [1,2]. The patient undergoing the PET scan is injected with the radiotracer, which travels through the body and is absorbed by the targeted organ or tissue. Once the radiotracer is in the body, it begins to emit positrons. When a positron collides with an electron in the surrounding tissue, the pair annihilates and produces two gamma rays traveling in opposite directions. These gamma rays are detected by a ring of detectors surrounding the patient. The aim of image reconstruction in this case is to accurately map the distribution of the radioactive material in the patient's body, which can provide valuable information about various physiological and biochemical processes. However, PET images are often characterized by low spatial resolution and high noise, which can limit their diagnostic accuracy. To address these limitations, various image reconstruction methods have been developed over the past decades, which aim to improve the spatial resolution and signal-to-noise ratio of PET images and reduce the amount of radiation exposure required for accurate imaging [3].
Among the various reconstruction methods, the penalized-likelihood (PL) approach, also known as model-based iterative reconstruction, has been shown to be effective in suppressing noise while preserving image detail. Our work is inspired by the well-known non-local means approach [25], which has been widely used for image denoising [25][26][27][28][29] and restoration/reconstruction [30][31][32][33][34] by exploiting a measure of similarity between image patches. The non-local means approach is based on the idea that, in an image, pixels that are similar to each other tend to have similar values. Therefore, instead of averaging the values of neighboring pixels to estimate the value of a particular pixel, the non-local means approach takes into account the similarity between the patches centered on each pixel in the image. A weighted average of the patch values is then used to estimate the value of the pixel of interest. While our work is inspired by the non-local means approach, our method for fine-tuning the control parameter differs from non-local means denoising. Instead of using the similarity measure between patches to compute a weighted average for edge-preserving smoothing, our approach applies the similarity measure to calculate the optimal value of the control parameter for each pixel. The experimental results demonstrate that our proposed method enables adaptive selection of the optimal control parameter for each pixel, leading to enhanced image quality in the reconstruction process.
The remainder of this paper is organized as follows: Section 2 first describes the PL approach to PET image reconstruction and illustrates the two representative edge-preserving convex non-quadratic (CNQ) penalty functions, which involve the hyperparameters controlling the sensitivity of edge preservation. The details about our main idea of using the similarity-driven method for hyperparameter tuning are then described. The optimization method for the PL reconstruction algorithm with the CNQ penalty functions is also derived. Section 3 shows our experimental results using both digital and physical phantoms, where our proposed method effectively enhances the overall reconstruction accuracy across multiple image quality metrics. Finally, Section 4 draws a conclusion.

Penalized Likelihood Approach
The PL approach to PET image reconstruction seeks the estimate f̂ of the underlying source image f from the emission measurement g by the following minimization:

f̂ = arg min_{f ≥ 0} { −L(g|f) + λ R(f) },    (1)

where L(g|f) is the log-likelihood term represented by the log of a Poisson distribution, R(f) is the regularization term that penalizes the image roughness, and λ is the smoothing parameter that controls the balance between the two terms. The regularization term is usually defined so that it penalizes the roughness of the estimate through the intensity differences between neighboring pixels:

R(f) = Σ_j Σ_{j'∈N_j} ϕ(f_j − f_{j'}),    (2)

where ϕ(·) is the penalty function, f_j is the j-th pixel in the image, f_{j'} is a neighbor of f_j, and N_j is the neighborhood system of the pixel f_j. In this work, among many different convex non-quadratic (CNQ) penalty functions, we consider the following two most popular CNQ functions, proposed by Lange [35] (denoted as LN hereafter) and Huber [36] (denoted as HB hereafter):

ϕ_LN(ξ) = δ² ( |ξ/δ| − log(1 + |ξ/δ|) ),    (3)

ϕ_HB(ξ) = ξ² for |ξ| ≤ σ, and 2σ|ξ| − σ² for |ξ| > σ,    (4)

where δ and σ are the positive hyperparameters that control the sensitivity of edge preservation by modifying the shape of the penalty functions ϕ_LN(·) and ϕ_HB(·), respectively. The typical shapes of the LN and HB penalty functions are shown in Figure 1, where they are compared with the quadratic (QD) penalty function ϕ_QD(ξ) = ξ².
Based on Figure 1a, it can be seen that the CNQ penalty functions exhibit lower penalization than the QD penalty for significant intensity differences between adjacent pixels. This characteristic enables the CNQ penalties to effectively preserve edges. The first-order derivative of the penalty function in Figure 1b indicates the strength of smoothing. Compared to the QD penalty function, whose derivative increases linearly in magnitude with increasing intensity difference, the LN penalty demonstrates a slower increase beyond a large value of the intensity difference. Furthermore, the derivative of the HB penalty remains constant once the intensity difference reaches a large value. Therefore, both the LN and HB penalty functions satisfy the necessary condition for a CNQ penalty function to preserve edges, which is summarized as

lim_{|ξ|→∞} ϕ'(ξ) = K,

where ϕ'(·) is the first-order derivative of the penalty function and K is a positive constant [37]. For a given intensity difference between adjacent pixels, as the hyperparameter δ (or σ) decreases, K also decreases, which results in more edges, and vice versa. To effectively preserve edges while suppressing noise, selecting an appropriate value for the hyperparameter is crucial. In this work, we assume that all hyperparameters (λ, δ, and σ) are preselected for the entire image before the reconstruction process begins. We aim to refine the value of δ (or σ) for each pixel during the reconstruction process by using the patch similarities within the neighborhood of the pixel being updated. This approach enables us to fine-tune the hyperparameter value on a per-pixel basis, optimizing edge preservation in the reconstructed image.
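To make the behavior of the two CNQ penalties concrete, the following minimal numerical sketch evaluates them against the QD penalty; the function names are ours, not from the original implementation.

```python
import numpy as np

def phi_qd(xi):
    """Quadratic (QD) penalty."""
    return np.asarray(xi, dtype=float) ** 2

def phi_ln(xi, delta):
    """Lange (LN) penalty: nearly quadratic near zero, nearly linear for |xi| >> delta."""
    a = np.abs(np.asarray(xi, dtype=float)) / delta
    return delta ** 2 * (a - np.log1p(a))

def phi_hb(xi, sigma):
    """Huber (HB) penalty: quadratic up to sigma, linear beyond."""
    xi = np.asarray(xi, dtype=float)
    return np.where(np.abs(xi) <= sigma, xi ** 2, 2.0 * sigma * np.abs(xi) - sigma ** 2)
```

For large |ξ|, both CNQ penalties grow far more slowly than the QD penalty, which is exactly what weakens the smoothing across strong edges.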


Similarity-Driven Hyperparameter Tuning
In this work, inspired by the well-known non-local means (NLM) approach [25], which has shown great potential in removing noise while preserving image details such as edges and textures by exploiting the redundancy and self-similarity of the image structure, we propose a new method of fine-tuning the hyperparameter δ (or σ) by using the self-similarity of the underlying image structure. The NLM approach is based on the idea that, in an image, pixels that are similar to each other tend to have similar values. Therefore, instead of averaging the values of neighboring pixels to estimate the value of a particular pixel, the NLM approach takes into account the similarity between the patches centered on each pixel and computes a weighted average of those patches. In the NLM approach, the similarity between two patches is defined by [25]

W_{jj'} = exp( −ρ_{jj'} / h² ),    (5)

where ρ_{jj'} is the patch difference and h is a positive parameter. The patch difference ρ_{jj'} is defined as

ρ_{jj'} = (1/P) Σ_{p=1}^{P} ( f_{j(p)} − f_{j'(p)} )²,    (6)

where ρ(N_j) and ρ(N_{j'}) are the patches centered at the pixels j and j', respectively, P is the total number of pixels in a patch, and f_{j(p)} and f_{j'(p)} are the p-th pixels in the patches ρ(N_j) and ρ(N_{j'}), respectively. For a 3 × 3 patch window, ρ_{jj'} defined in (6) can be calculated by visiting each of the 9 pixels (p = 1, 2, . . . , 9). Figure 2 shows how the similarity matrix W_j is calculated when the neighborhood system N_j consists of four neighbors (north, south, east, and west) and one (W_{jj} = 1) in the center.
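The patch difference and similarity weight in (5) and (6) can be sketched as follows for 3 × 3 patches; the helper names and the default value of h are our illustrative choices.

```python
import numpy as np

def patch_difference(img, j, jp, radius=1):
    """Mean squared difference rho_jj' between the (2r+1)x(2r+1) patches
    centered at pixels j and jp (row, col tuples); P = 9 when radius = 1."""
    r = radius
    pj = img[j[0] - r:j[0] + r + 1, j[1] - r:j[1] + r + 1]
    pk = img[jp[0] - r:jp[0] + r + 1, jp[1] - r:jp[1] + r + 1]
    return float(np.mean((pj - pk) ** 2))

def patch_similarity(img, j, jp, h=0.5):
    """Similarity weight W_jj' = exp(-rho_jj' / h^2), as in non-local means."""
    return float(np.exp(-patch_difference(img, j, jp) / h ** 2))
```

On a perfectly flat image every weight equals 1, while across a strong edge the weight drops toward 0, which is the behavior the fine-tuning strategy below exploits.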
Note that the NLM approach in image denoising uses the similarity between two patches defined by (5) for weighted smoothing, which can be expressed as

f̂_j = Σ_{j'∈N_j} W_{jj'} f_{j'} / Σ_{j'∈N_j} W_{jj'}.    (7)

In contrast, our method uses the similarity in (5) to adjust the initially tuned value of δ (or σ). The basic strategy for refining the initially tuned parameter δ = δ_0 is that the value of δ at a pixel may be increased or decreased depending on the degree of patch similarity. To incorporate the patch similarity W_{jj'} into the adjustment of δ, we use the formula in (8), where δ_{jj'} is the fine-tuned value of δ using the patch similarity between the two patches centered at the pixels j and j', w̄ is the mean of W_{jj'} evaluated for all pixels in the estimated image obtained from the previous iteration of the PL reconstruction process, and α_j ∈ [−1, 1] is also determined from the previous iteration by measuring the degree of roughness within the neighborhood of the pixel j. (The value of α_j approaches −1 when the pixel roughness is very low, whereas it approaches 1 when the roughness is very high.) In (8), a negative value of α_j decreases δ_{jj'}, whereas a positive value of α_j increases δ_{jj'}. In an extreme case, where α_j = −1 due to irregular edges with relatively low similarities, the value of δ_{jj'} can be smaller than δ_0. On the other hand, when α_j = −1 due to regular edges with high similarities, δ_{jj'} can be close to δ_0. When α_j = 1 in a flat region, δ_{jj'} is larger than δ_0. In summary, δ_{jj'} adaptively varies around δ_0 depending on the patch similarities in the neighborhood of the pixel j.
To avoid a sudden change of the sign of δ_{jj'}, we define α_j using the modified Butterworth polynomial in (9), where t is the turning point of the r-th order polynomial and z_j is the j-th pixel in the image z standing for the pixel-wise roughness in the estimated image obtained from the previous iteration of the reconstruction process. Various pixel-wise roughness measures may be used for z. In this work, we compare three different roughness measures: gradient (GR), standard deviation (SD), and mean of patch similarity (PS). The GR of an image is a vector field that represents the magnitude and direction of the change in intensity at each pixel; to measure the pixel-wise roughness, only the magnitude of the GR is used. The pixel-wise SD image is calculated as

s_j = sqrt( (1/L) Σ_{l∈N_j} ( f_l − f̄_j )² ),

where s_j is the j-th pixel in the SD image calculated within the 3 × 3 neighborhood system N_j of the pixel f_j, f̄_j is the mean intensity over N_j, and L = 9 in this case. The mean of patch similarity for the j-th pixel is defined as the average of the patch similarities W_{jj'} within the neighborhood N_j. Figure 3 shows the shapes of α_j in (9) as a function of the roughness z_j for the polynomial orders r = 2 and r = 100.
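The mapping from roughness to α can be sketched as follows. The SD roughness measure follows the definition above; the explicit Butterworth-style expression is our illustrative stand-in for (9), chosen only to match the described limits (α → −1 for low roughness, α = 0 at the turning point t, α → +1 for high roughness), and the paper's exact modified polynomial may differ.

```python
import numpy as np

def sd_roughness(img):
    """Pixel-wise standard deviation over each 3x3 neighborhood (the SD measure)."""
    H, W = img.shape
    out = np.zeros((H, W))
    for y in range(H):
        for x in range(W):
            out[y, x] = img[max(0, y - 1):y + 2, max(0, x - 1):x + 2].std()
    return out

def alpha_map(z, t=0.1, r=2):
    """Butterworth-style mapping of roughness z to alpha in [-1, 1]:
    alpha -> -1 for z << t, alpha = 0 at z = t, alpha -> +1 for z >> t.
    This explicit form is our illustrative stand-in for Eq. (9)."""
    z = np.asarray(z, dtype=float)
    return 1.0 - 2.0 / (1.0 + (z / t) ** (2 * r))
```

Large r makes the transition at t sharp (approaching a step), while small r gives a gradual transition, consistent with the r = 2 and r = 100 curves shown in Figure 3.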

Derivation of PL Reconstruction Algorithm
To derive a PL reconstruction algorithm that employs the similarity-driven fine-tuning method for hyperparameter optimization, we first use an accelerated version of the maximum likelihood (ML) algorithm, namely, the complete data ordered subsets expectation-maximization (COSEM) algorithm [38], and extend it to a PL algorithm that includes the regularization term. The well-known ordered subsets expectation-maximization (OSEM) algorithm [39] accelerates the original expectation-maximization (EM) algorithm [40] by subdividing the projection data into several subsets (or blocks) and then progressively processing each subset by performing projection and back-projection operations in each iteration; however, it is not provably convergent because it lacks an underlying objective function. In contrast, the COSEM algorithm is fast and convergent with an objective function.
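For reference, the baseline EM-ML update and its ordered-subsets acceleration can be sketched on a toy system matrix as follows. This is a minimal sketch only: COSEM's complete-data bookkeeping and the penalty term are omitted, and the system matrix is a hypothetical stand-in.

```python
import numpy as np

def mlem(A, g, n_iter=500):
    """EM-ML (MLEM) update: f <- f * A^T(g / (A f)) / (A^T 1)."""
    f = np.ones(A.shape[1])
    sens = A.T @ np.ones(A.shape[0])            # sensitivity image A^T 1
    for _ in range(n_iter):
        proj = A @ f                            # forward projection
        ratio = np.divide(g, proj, out=np.zeros_like(g), where=proj > 0)
        f = f * (A.T @ ratio) / sens            # back-project and rescale
    return f

def osem(A, g, n_subsets=2, n_iter=200):
    """OSEM: the same multiplicative update applied subset-by-subset."""
    m, n = A.shape
    f = np.ones(n)
    for _ in range(n_iter):
        for q in range(n_subsets):
            idx = np.arange(q, m, n_subsets)    # q-th ordered subset of rows
            As, gs = A[idx], g[idx]
            proj = As @ f
            ratio = np.divide(gs, proj, out=np.zeros_like(gs), where=proj > 0)
            f = f * (As.T @ ratio) / (As.T @ np.ones(len(idx)))
    return f
```

With noiseless, consistent data both iterations converge to the true source, while OSEM reaches a comparable estimate in roughly 1/n_subsets as many passes over the full data.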
The COSEM algorithm applies the idea of ordered subsets used in the OSEM algorithm to the "complete data" C rather than to the projection data g. The complete data C, whose elements are denoted as C_{ij}, represents the number of coincidence events that originated at the j-th pixel in the underlying source and were recorded by the i-th detector pair, so that the following relationship holds: g_i = Σ_j C_{ij}. The COSEM-ML algorithm can be extended to the COSEM-PL algorithm by including the regularization term. For our COSEM-PL algorithm, if C is fixed to C = C^{(n)} at the n-th iteration in an alternating updating procedure, the overall energy function with the regularizer in (2) can be expressed as in (11), where S_q, q = 1, . . . , Q, is the q-th subset of the detector pairs, and C_{ij}^{(n,q)} denotes the update of C_{ij} at outer iteration n and subset iteration q. When the regularization term in (11) takes a CNQ form as described by (3) or (4), a closed-form solution is not available. Therefore, we employ the method of optimization transfer using paraboloidal surrogates [41][42][43], which can efficiently find the global minimum of a convex function, by using the surrogate function in (12) for the penalty term [42], where ϕ'(ξ) is the first-order derivative, ξ^{(n−1)} denotes the value of ξ at the (n − 1)-th iteration, and ψ(ξ) = ϕ'(ξ)/ξ. By dropping the terms that are independent of the variable ξ, (12) can be written as (13). To avoid the coupling of f_j and f_{j'} when ξ is substituted with f_j − f_{j'} in the quadratic term in (13), the regularization term is modified by using the separable paraboloidal surrogate (SPS) function [44,45], as given in (14). By replacing the regularization term in (11) with R̂(f_j; f^{(n−1)}), the overall energy function for each f_j is expressed as in (15), where f_j^{(n,q)} denotes the update of f_j at outer iteration n and subset iteration q. Note that after the completion of the subset iterations at the n-th iteration, f^{(n,Q)} is assigned to f^{(n+1)}.
By setting the derivative of (15) to zero and solving for the positive root of the resulting quadratic equation, the final update equation is given by (16), where a, b, and c are given in (17). In the COSEM-PL algorithm, the C-update is the same as in the COSEM-ML algorithm. Therefore, the COSEM-PL algorithm proceeds by alternately updating C_{ij}^{(n,q)} and f_j^{(n,q)} at outer iteration n and subset iteration q. Figure 4 shows the schematic diagram of the COSEM-PL algorithm with our parameter fine-tuning method applied. Note that the control parameter δ (or σ) is updated using one of the three roughness measures (GR, SD, and PS) calculated from the image reconstructed in the previous iteration, and that the initial values of the hyperparameters (λ and δ or σ) are preset, either manually or using an automated method, before the COSEM-PL iteration begins.
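The pixel update in (16) amounts to selecting the nonnegative root of a per-pixel quadratic. The coefficients a, b, and c are algorithm-specific and given by (17), so the sketch below illustrates only the root-selection step, with hypothetical coefficients.

```python
import math

def positive_root(a, b, c):
    """Nonnegative root of a*f^2 + b*f + c = 0 (hypothetical coefficients).

    SPS-type updates arrange the coefficients so that exactly one root is
    nonnegative; we take the larger root and clip at zero for robustness."""
    if a == 0.0:
        return max(0.0, -c / b)
    disc = b * b - 4.0 * a * c
    return max(0.0, (-b + math.sqrt(max(disc, 0.0))) / (2.0 * a))
```

Clipping at zero enforces the nonnegativity constraint f ≥ 0 that appears in the PL minimization (1).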

Numerical Studies Using Digital Phantom
To test our idea, we first performed numerical studies using a 128 × 128 digital Hoffman brain phantom slice shown in Figure 5a. The activity ratio of the phantom is 4:1:0 in gray matter, white matter, and cerebrospinal fluid (CSF), respectively. For projection data, we used 128 projection angles over 180° with 128 detector pairs. To generate projection data with noise, we first scaled the phantom so that the total counts of its projection data could be approximately 500,000, and then added independent Poisson noise to the noiseless projection data obtained from the scaled phantom. Figure 5b provides a qualitative representation of the typical noise level observed in the 40th iteration of the EM-ML (or the COSEM-ML with a single subset) reconstruction from a noisy sinogram with approximately 500,000 photon counts.
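The noise-generation step above can be sketched as follows. The sinogram used here is a toy stand-in; the actual 128-angle projection of the Hoffman phantom is not reproduced, and the function name and seed are our choices.

```python
import numpy as np

def noisy_sinogram(clean_sino, total_counts=500_000, seed=0):
    """Scale a noiseless sinogram so its total is total_counts, then add
    independent Poisson noise, mirroring the data-generation step above."""
    scaled = clean_sino * (total_counts / clean_sino.sum())
    rng = np.random.default_rng(seed)
    return rng.poisson(scaled).astype(float), scaled
```

Because the noisy total is a Poisson variable with mean 500,000, its relative fluctuation is only about 0.14%, so the realized count stays close to the target.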
For PL reconstruction, we compared two different methods: the standard PL method, which uses fixed hyperparameter values for all pixels in the entire image, and the similarity-driven PL (SDPL) method, which employs our proposed method of parameter fine-tuning on a per-pixel basis. To ensure convergence, we used 4 subsets and 80 iterations, which effectively corresponds to 320 iterations for a single subset. To assess the effectiveness of the SDPL algorithm across diverse hyperparameter configurations, we employed two distinct (high and low) levels of initial parameter values for both the smoothing parameter λ and the control parameter δ (or σ). Note that our approach can seamlessly integrate with a wide range of existing parameter-tuning methods, thereby eliminating the need for a specific criterion in selecting the initial parameter values.

Figure 6 shows the anecdotal PL and SDPL reconstructions using the LN penalty function. The figure comprises four groups of results, each corresponding to a different parameter setting: Figure 6a-d with high λ and high δ, Figure 6e-h with high λ and low δ, Figure 6i-l with low λ and high δ, and Figure 6m-p with low λ and low δ. Within each row, the reconstruction methods are displayed from left to right as PL-LN and SDPL-LN (with the GR, SD, and PS roughness measures), respectively. A qualitative comparison of the results in Figure 6 clearly reveals that the SDPL method better preserves fine details than the standard PL method.
To elaborate further, when both λ and δ are excessively large (Figure 6a-d), the PL result in Figure 6a appears over-smoothed, whereas the SDPL results in Figure 6b-d exhibit enhanced detail. By reducing the value of δ while keeping λ fixed, the SDPL result in Figure 6e becomes sharper than its PL counterpart in Figure 6a. Similarly, the SDPL results in Figure 6f-h, like those in Figure 6b-d, demonstrate superior preservation of fine details compared to the result in Figure 6e. Based on the observations from Figure 6a-h, we tentatively conclude that the SDPL method effectively mitigates the over-smoothing issue of the PL method for relatively high λ values. As expected, when the smoothing parameter λ is decreased, the results become sharper and exhibit more details. However, even in this case, the SDPL method further enhances reconstruction accuracy by better preserving fine details, as evident in Figure 6i-p. In an extreme case, where the values of both λ and δ are very small, the results become noisy, a phenomenon that is not specific to the SDPL method but holds true for any regularization method. In conclusion, the SDPL method surpasses the standard PL method in effectively preserving fine details when the hyperparameters are chosen to be sufficiently large, ensuring effective noise suppression.
To evaluate and compare, in an ensemble sense, the quantitative performance of the reconstruction algorithms with the parameter settings used for Figure 6, we generated 50 independent noise realizations of projection data for the phantom shown in Figure 5a.
Table 1 presents a quantitative performance comparison between PL-LN and SDPL-LN in terms of six image quality assessments (IQAs): peak signal-to-noise ratio (PSNR), structural similarity (SSIM), visual information fidelity (VIF), mean absolute error (MAE), root-mean-square error (RMSE), and mean percentage error (MPE). All IQA metrics used in this work were evaluated over 50 independent Poisson noise trials. For example, the MPE is computed from f̂_j^k, the j-th pixel value of the reconstructed image for the k-th noise trial, f_j, the j-th pixel value of the noiseless phantom, and K = 50, the total number of noise trials. The PSNR [46] measures the ratio between the maximum possible peak of the signal and the noise. The SSIM [46,47] measures the structural similarity between the reconstructed image and the phantom. The VIF [48] evaluates the image quality based on natural scene statistics and the image information extracted by the human visual system. The MAE [49] calculates the mean absolute error between the reconstructed image and the phantom. In Table 1, the best results, obtained from the SDPL-LN method using the SD roughness measure, are highlighted in bold. Figure 7 visualizes the quantitative results for the six IQAs presented in Table 1 through bar graphs, with each IQA depicted individually. The abscissa indexes the group number (1 to 4) of the parameter settings (two distinct levels of initial values for both λ and δ) used in Table 1. It is evident that the SDPL-LN methods clearly outperform the PL-LN method in all IQAs.
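The distortion-based IQAs above follow standard definitions and can be sketched directly; SSIM and VIF are omitted here because they require perceptual models beyond a short sketch. Ensemble evaluation over the K = 50 trials is simply the mean of the per-trial values.

```python
import numpy as np

def psnr(ref, est):
    """Peak signal-to-noise ratio (dB), using the reference peak value."""
    mse = np.mean((ref - est) ** 2)
    return 10.0 * np.log10(ref.max() ** 2 / mse)

def mae(ref, est):
    """Mean absolute error between reconstruction and phantom."""
    return float(np.mean(np.abs(ref - est)))

def rmse(ref, est):
    """Root-mean-square error between reconstruction and phantom."""
    return float(np.sqrt(np.mean((ref - est) ** 2)))
```

Note that PSNR increases as reconstruction accuracy improves, whereas MAE and RMSE decrease, which is why the bar graphs in Figure 7 must be read with opposite polarities for the two metric families.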
Figure 8 presents the anecdotal reconstructions using the HB penalty function, following the same layout as Figure 6 for the LN penalty function. Similar to the findings in Figure 6, the SDPL reconstructions consistently exhibit superior preservation of details compared to the standard PL reconstructions across all hyperparameter settings. Table 2 presents a performance comparison between the PL-HB and SDPL-HB methods based on the six IQAs. Again, the SDPL methods demonstrate the best outcomes. Although the best results are distributed across the three different roughness measures, the differences among them are practically negligible. Figure 9 presents bar graphs visualizing the quantitative results in Table 2. The results clearly demonstrate that the SDPL-HB methods outperform the PL-HB method across all IQAs.
To evaluate the regional performance of our method, we first selected regions of interest (ROIs) as shown in Figure 10 and performed regional studies using the PL-LN and SDPL-LN reconstructions obtained with the same initial values of λ and δ, respectively. Figure 11 shows the five zoomed-in rectangular regions R1-R5 in Figure 10a, where the images in Figure 11a are zoomed-in regions of the phantom, Figure 11b of the PL-LN reconstructions, Figure 11c of the SDPL-LN-GR reconstructions (with the GR roughness measure), Figure 11d of SDPL-LN-SD, and Figure 11e of SDPL-LN-PS. As already seen in Figure 6, the SDPL-based methods clearly outperform the standard PL method, which is also verified in terms of the regional MPEs represented by the bar graphs shown in Figure 12a. Figure 10b shows the three circular ROIs and one circular background region used for calculating the contrast recovery coefficient (CRC). The CRC is a metric that evaluates how well the algorithm restores the contrast of an ROI with respect to its background. The regional CRC is defined as

CRC_R = CR_R / CR_{R0},

where CR_R = (Â_R − Â_Bg)/Â_Bg, Â_R = (1/T) Σ_{j∈R} f̂_j denotes the mean activity in each ROI with T pixels, Â_Bg is the mean activity in the background region, and CR_{R0} is the true contrast in the phantom. Note that the value of the CRC indicates the performance of the algorithm, with values closer to one indicating better performance.
Figure 12b presents the regional mean CRC (MCRC) calculated over K = 50 independent noise trials, which is defined as
$$\mathrm{MCRC}_R = \frac{1}{K}\sum_{k=1}^{K} \mathrm{CRC}_R^k,$$
where $\mathrm{CRC}_R^k$ stands for the regional CRC calculated from the k-th noise realization. It is evident that the SDPL-based methods with GR and SD roughness measures remarkably outperform the standard PL method in terms of the MCRC in all three ROIs.
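Given binary ROI and background masks, the CRC and its mean over noise realizations follow directly from the definitions above. A minimal sketch (the mask shapes and the list-of-reconstructions interface are assumptions for illustration):

```python
import numpy as np

def crc(recon, roi_mask, bg_mask, true_contrast):
    # Regional contrast recovery coefficient: measured contrast / true contrast.
    a_roi = recon[roi_mask].mean()  # mean activity in the ROI (A_R)
    a_bg = recon[bg_mask].mean()    # mean activity in the background (A_Bg)
    return ((a_roi - a_bg) / a_bg) / true_contrast

def mcrc(recons, roi_mask, bg_mask, true_contrast):
    # Mean CRC over K independent noise realizations.
    return np.mean([crc(r, roi_mask, bg_mask, true_contrast) for r in recons])
```

A reconstruction that perfectly restores the phantom contrast yields a CRC of exactly one.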

Qualitative Validation Using Physically Acquired Data
To observe qualitatively the efficacies of our SDPL methods, we acquired physical data using a GE Advance PET scanner, which contains 18 detector rings yielding 35 slices at 4.25 mm center-to-center slice separation. We acquired 2D data from the physical Hoffman brain phantom using the scanner's high-sensitivity mode with septa in. The sinogram dimension was 145 detector pairs and 168 angles. The projection data were acquired for 10 min from an 18F-FDG scan. The corresponding number of detected coincident counts was approximately 1,000,000. Figure 13 shows the typical noise level observed in the EM-ML reconstruction with 40 iterations, obtained from the physical PET data. Since no ground-truth data are available for this experiment, the efficacies of the COSEM-PL and COSEM-SDPL methods can be observed qualitatively by comparing their results with the EM-ML reconstruction. It is important to note that, compared to the reconstructions shown in Figures 6 and 8, which were obtained from the digital phantom, the resolution of the EM-ML reconstruction for the real PET data is significantly lower, which may limit our qualitative observation of the efficacies of the SDPL methods in the real-data experiments.

Figure 14 shows the reconstructions obtained by COSEM-PL and COSEM-SDPL with the LN penalty (Figure 14a-f) and the HB penalty (Figure 14g-l), respectively. For the LN penalty, the smoothing parameter values were set to 40, 20, and 10 for Figure 14a,b, Figure 14c,d, and Figure 14e,f, respectively. For the HB penalty, the smoothing parameter values were set to 20, 10, and 5 for Figure 14g,h, Figure 14i,j, and Figure 14k,l, respectively. For each value of λ, a value of δ (or σ) was chosen for the standard PL first, and it was used as an initial value of δ (or σ) for the SDPL.
For each value of λ, a close inspection reveals that, as already observed in Figures 6 and 8 using the digital phantom, the SDPL method further improves the reconstruction of fine details. However, the visual improvement from the standard PL reconstruction to the SDPL reconstruction in Figure 14 is not as pronounced as that in Figures 6 and 8. This is presumably because the physical factors that affect the quality of reconstruction were not modeled in our reconstruction algorithms. While the attenuation correction was done by a conventional method that uses the ratio of the measurements in the blank and transmission scans, the factors modeling scattered and random coincidences were not included in our reconstruction algorithms. In this case, the measurement is not strictly Poisson. (Our future work includes modeling the physical factors in the likelihood term and expanding accordingly the overall energy function described in (11).)


Summary and Conclusions
We have presented similarity-driven hyperparameter fine-tuning methods for penalized-likelihood image reconstruction in PET. Our proposed method aims to optimize the regularization parameter by leveraging similarity information between neighboring patches, leading to improved image quality and quantitative accuracy.
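The core idea of the per-pixel adjustment can be illustrated schematically. The sketch below is illustrative only: the patch size, neighborhood size, Gaussian similarity kernel with bandwidth h, and the mapping from mean patch similarity to the control parameter δ are all assumptions for exposition, not the paper's actual update rule.

```python
import numpy as np

def adjust_delta(image, delta0, patch=3, nbhd=5, h=0.1):
    # Illustrative similarity-driven refinement of the edge-control parameter:
    # where the patches around a pixel are mutually similar (flat regions),
    # delta stays near its initial value delta0 (stronger smoothing); where
    # they differ (edges), delta shrinks to preserve detail. The image would
    # be the reconstruction from the previous iteration.
    pad = nbhd // 2 + patch // 2
    padded = np.pad(image, pad, mode="reflect")
    pr = patch // 2
    delta = np.empty_like(image)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            ci, cj = i + pad, j + pad
            center = padded[ci - pr:ci + pr + 1, cj - pr:cj + pr + 1]
            sims = []
            for di in range(-(nbhd // 2), nbhd // 2 + 1):
                for dj in range(-(nbhd // 2), nbhd // 2 + 1):
                    if di == 0 and dj == 0:
                        continue
                    nb = padded[ci + di - pr:ci + di + pr + 1,
                                cj + dj - pr:cj + dj + pr + 1]
                    # Patch similarity in (0, 1]: 1 for identical patches.
                    sims.append(np.exp(-np.mean((center - nb) ** 2) / h))
            delta[i, j] = delta0 * np.mean(sims)
    return delta
```

On a perfectly flat image all patch similarities equal one, so δ remains at δ₀ everywhere; near an intensity step the mean similarity drops and δ is reduced locally.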
The experimental results obtained from the digital phantom studies demonstrated the effectiveness of the proposed method in achieving superior image reconstruction performance compared to the conventional PL method with fixed hyperparameters. By incorporating similarity information into the hyperparameter optimization process, the proposed method effectively balanced the trade-off between noise reduction and preservation of fine details, resulting in visually enhanced images with reduced noise. Our numerical studies supported the visual comparison by showing better quantitative performance of the proposed method across multiple image quality metrics. Finally, the additional results from the physical experiments using the real PET data also supported the good performance of the proposed method. However, to fully evaluate the clinical potential and generalizability of the proposed method, more comprehensive investigations that incorporate the physical factors, such as attenuation, scattered coincidences, and random coincidences, into reconstruction algorithms for real PET scans are needed.
We acknowledge here that, besides the regularization approach employing CNQ penalties discussed in this study, there exist several other types of regularization methods used in PET reconstruction, which encompass total variation regularization [50][51][52], sparse coding-based regularization [53][54][55], and low-rank/sparse decomposition-based regularization [56]. These regularization methods also involve hyperparameters that significantly impact the quality of the reconstructed image. Since our proposed method specifically focuses on CNQ penalties, further investigation is required to determine the feasibility of integrating it with these diverse regularization methods.
We also note that, as the proposed method requires initially tuned hyperparameters for the entire image, it is not fully automated. For our future work, we will continue to seek a more advanced approach to optimizing the regularization parameter to fully automate the tuning process. One possible approach may be to use our method in conjunction with machine learning-based parameter tuning methods [21][22][23] so that the parameters initially tuned by machine learning for the entire image can be refined by our method for further improvements in reconstruction accuracy. However, we acknowledge the inherent challenge for machine learning methods to incorporate the additional control parameters responsible for adjusting edge preservation sensitivity by modifying the shape of the penalty function. Despite this challenge, we expect that our proposed method, in conjunction with more advanced machine learning-based approaches that can handle the control parameters, will substantially reduce the dependence on subjective trial-and-error hyperparameter tuning in regularized PET reconstruction.

Data Availability Statement:
The data presented in this study are available upon request. Please contact the corresponding author.