Image Compressive Sensing via Hybrid Nonlocal Sparsity Regularization

This paper focuses on image compressive sensing (CS). As intrinsic properties of natural images, nonlocal self-similarity and sparse representation have been widely used in various image processing tasks. Most existing image CS methods apply either a self-adaptive dictionary (e.g., the principal component analysis (PCA) dictionary and the singular value decomposition (SVD) dictionary) or a fixed dictionary (e.g., the discrete cosine transform (DCT), discrete wavelet transform (DWT), and Curvelet) as the sparse basis, but a single dictionary cannot fully exploit the sparsity of images. In this paper, a Hybrid NonLocal Sparsity Regularization (HNLSR) is developed and applied to image compressive sensing. The proposed HNLSR measures nonlocal sparsity in the 2D and 3D transform domains simultaneously, utilizing both a self-adaptive singular value decomposition (SVD) dictionary and a fixed 3D transform. We use an efficient alternating minimization method to solve the optimization problem. Experimental results demonstrate that the proposed method outperforms existing methods in both objective evaluation and visual quality.


Introduction
As a joint framework of sampling and compression, compressive sensing (CS) [1,2] shows that if a signal is sparse in some domain, it can be perfectly reconstructed from fewer samples than the Nyquist rate requires. This characteristic demonstrates two great potentials in signal acquisition and processing. First, as the number of samples is greatly reduced, it becomes possible for devices with limited sensor size to obtain high-definition information using low-definition sensors. Figure 1 shows the architecture of the single-pixel camera [3]. With a sensor of only one pixel, this system can acquire a complete image. Second, the CS framework transfers the computational burden to the decoding side. For some energy-limited applications, such as wireless sensor networks, this advantage can greatly extend the life cycle of the nodes. As the encoding side is simplified, the performance of the system depends largely on the performance of the decoding side, namely, the "Recovery method" part in Figure 1. This paper focuses on the recovery method of image CS. Due to the advantages mentioned above, CS has been applied in many fields, such as digital imaging [3], background subtraction [4], medical imaging [5], and remote sensing [6].
In the framework of compressive sensing, a one-dimensional sparse signal can be reconstructed by solving an L0-norm minimization problem. Since L0-norm minimization is non-convex and NP-hard, the L0-norm is often replaced by the L1-norm. It has been proved that these two norms are equivalent in most cases [2], and many CS recovery methods have been proposed, such as the iterative thresholding algorithm [7], orthogonal matching pursuit [8], and the split Bregman algorithm [9].
For image compressive sensing, the key issue is how to exploit the intrinsic prior information of images. As the model of prior knowledge has a significant impact on the performance of image compressive sensing algorithms, many kinds of regularizations have been developed. Conventional regularization terms, such as the Mumford-Shah (MS) model [10] and total variation (TV) [7,11-13], are established under the assumption that images are locally smooth. For example, Li et al. [13] proposed a TV-based CS algorithm and developed an efficient augmented Lagrangian method to solve it. Candes et al. [11] enhanced the sparsity of the TV norm via a weighted strategy. However, these regularizations only consider the local smoothness of images and cannot restore details and textures well. The TV norm also favors piecewise-constant solutions, resulting in over-smoothing. To overcome this problem and improve performance, many compressive sensing methods utilize the prior information of transform coefficients [14-16]. Kim et al. [15] modeled the statistical dependencies between transform coefficients with a Gaussian Scale Mixture (GSM) and achieved better reconstruction performance. In the past few years, sparse representation has emerged and demonstrated good performance in various image processing tasks [17-21]. The purpose of sparse representation is to represent a signal with as few atoms as possible in a learned over-complete dictionary. Compared with a fixed dictionary, a learned dictionary can better express the sparsity of images. However, dictionaries are generally learned from external clean images, and learning may suffer from high computational complexity.
Recently, inspired by nonlocal means (NLM) [22], many algorithms based on nonlocal self-similarity have been proposed [23-29]. Dabov et al. proposed a Block-Matching and 3D filtering (BM3D) algorithm for image denoising [23]. In BM3D, similar patches in a degraded image are grouped into 3D arrays, and collaborative filtering is performed in the 3D transform domain. Egiazarian et al. extended BM3D to compressive sensing and proposed BM3D-CS. Zhang et al. [26] proposed a structural group sparsity representation (SGSR) model to enforce image sparsity in an adaptive SVD domain. Dong et al. [28] proposed a nonlocal low-rank regularization (NLR) to exploit self-similarity and applied it to the reconstruction of photographic and MRI images. In [29], Zha et al. incorporated a non-convex penalty function into group sparse representation and obtained state-of-the-art reconstruction performance. Gao et al. [30] proposed to use Z-score standardization to improve the sparse representation ability of patch groups. Keshavarzian et al. [31] proposed to utilize principal component analysis (PCA) to learn a dictionary for each group and introduced non-convex Lp-norm regularization to better promote the sparsity of the patch-group coefficients. In [32], an internal self-adaptive dictionary and an external learned dictionary were used to encode a patch group alternately, achieving better performance than a single dictionary.
Another idea is to exploit both local sparsity and nonlocal self-similarity [33-37]. For example, Zhang et al. [33] combined local anisotropic total variation with nonlocal 3D sparsity and named the result the Collaborative Sparsity Measure (CoSM). Different from the work in [33], Eslahi et al. [37] used the curvelet transform to enforce local patterns. In [34], Dong et al. utilized local patch-based sparsity and a nonlocal self-similarity constraint to balance the trade-off between adaptation and robustness. Zhou et al. [38] proposed a data-adaptive kernel regressor to extract local structure and used a nonlocal means filter to enforce nonlocal information.
With the development of deep learning, many convolutional neural network (CNN) based image compressive sensing algorithms have been proposed. For example, Kulkarni et al. [39] proposed a non-iterative and parallelizable CNN architecture to obtain an initial recovery and fed it into an off-the-shelf denoiser to obtain the final image. Zhang et al. [40] cast the Iterative Shrinkage-Thresholding Algorithm (ISTA) into a CNN framework and developed an effective strategy to solve it. In [41], low-rank tensor factor analysis was utilized to capture nonlocal correlation, and a deep convolutional architecture was adopted to accelerate the matrix inversion in CS. DR2-Net [42] utilized a linear mapping to reconstruct a preliminary image and used residual learning to further promote the reconstruction quality. Yang et al. [43] unrolled the Alternating Direction Method of Multipliers (ADMM) into a deep architecture and proposed ADMM-CSNet. Zhang et al. [44] proposed an optimization-inspired explicable deep network, OPINE-Net, in which all parameters were learned end-to-end using back-propagation.
In this paper, we propose a Hybrid NonLocal Sparsity Regularization (HNLSR) for image compressive sensing. First, different from the methods mentioned above, two nonlocal self-similarity constraints are applied simultaneously to exploit the intrinsic sparsity of images. Second, fixed dictionaries are universal, while learned dictionaries adapt better to the image itself; to take advantage of both, a fixed 3D transform and a 2D self-adaptive dictionary are utilized together. Finally, for the non-convex HNLSR model, we use split Bregman iteration to divide it into several subproblems, making it easier and more efficient to solve. The flowchart is illustrated in Figure 2. Experimental results show that, compared with both model-based algorithms and deep learning-based algorithms, the proposed HNLSR-CS demonstrates superior performance. The remainder of this paper is organized as follows. Section 2 introduces the related works. In Section 3, we present the proposed method. The experiments and analysis are elaborated in Section 4. Section 5 concludes the paper.

Compressive Sensing
For an n-dimensional signal x ∈ R^n, its CS measurements can be expressed as

y = Φx,   (1)

where y ∈ R^m and Φ ∈ R^{m×n} (m ≪ n). Φ is the measurement matrix, which meets the restricted isometry property (RIP) [1]. If x is sparse in a transform domain Ψ ∈ R^{n×n}, namely, x = Ψα, the reconstruction of x can be formulated as

α̂ = arg min_α ‖α‖_0   s.t.   y = ΦΨα,   (2)

where ‖α‖_0 is the L0-norm that counts the nonzero elements in α.
The unconstrained Lagrangian form of Equation (2) is

α̂ = arg min_α (1/2)‖y − ΦΨα‖_2^2 + λ‖α‖_0,   (3)

where λ is the regularization parameter. After getting the solution of Equation (3), x can be restored by x̂ = Ψα̂. For image compressive sensing, the optimization problem can be written as

x̂ = arg min_x (1/2)‖y − Φx‖_2^2 + λR(x),   (4)

where x stands for an image, Φ is the measurement matrix, and R(x) is the regularization term that exploits the intrinsic prior information of images.
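To make the recovery step concrete, here is a minimal sketch of iterative soft-thresholding for the L1-relaxed Lagrangian form, assuming for simplicity that Ψ is the identity so the signal is sparse in the canonical basis (an illustration, not the paper's algorithm):

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of the L1 norm."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(Phi, y, lam=0.01, n_iter=2000):
    """ISTA for min_a 0.5||y - Phi a||_2^2 + lam ||a||_1."""
    step = 1.0 / np.linalg.norm(Phi, 2) ** 2   # 1/L, L = Lipschitz constant of the gradient
    a = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        grad = Phi.T @ (Phi @ a - y)           # gradient of the data-fidelity term
        a = soft_threshold(a - step * grad, step * lam)
    return a

# Recover a k-sparse signal from m < n random Gaussian measurements.
rng = np.random.default_rng(0)
n, m, k = 100, 50, 5
x = np.zeros(n)
x[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x
x_hat = ista(Phi, y)
```

With far fewer measurements than signal dimensions, the L1 solution closely matches the sparse ground truth, which is the behavior the RIP condition guarantees.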

Sparse Representation and Group-Based Sparsity
An image x ∈ R^N can be divided into many overlapped patches. Consider a patch x_i of size √n × √n at location i, i = 1, 2, . . . , N; sparse representation means that this patch can be represented over a redundant dictionary D_i:

x_i ≈ D_i α_i.   (5)

Nonlocal self-similarity means that a patch has many similar patches at other positions [18,22,23]. We search for its (m − 1) best-matched patches and form them into a data matrix x_{G_i} ∈ R^{n×m}, where each column of x_{G_i} denotes a similar patch, so we have

x_{G_i} = R_{G_i}(x),   (6)

where the subscript G_i is the number of the group, R_{G_i} is an operator that extracts all the similar patches, and x_{G_i} is a patch group. Given a proper dictionary D_{G_i}, this group can be expressed as

x_{G_i} = D_{G_i} α_{G_i},   (7)

where α_{G_i} are the sparse coefficients. After getting α_{G_i}, the whole image can be reconstructed via [45]

x ≈ (Σ_i R_{G_i}^T (D_{G_i} α_{G_i})) ⊘ (Σ_i R_{G_i}^T 1_{(n×m)}),   (8)

where 1_{(n×m)} is a matrix of size n × m with all elements being 1 and ⊘ denotes element-wise division. Equation (8) means that we can restore the image by putting patches back to their original locations and averaging them on a pixel-by-pixel basis.
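The extraction, grouping, and aggregation steps above can be sketched as follows; the helper names and the exact patch/stride/group sizes are illustrative, not the paper's implementation:

```python
import numpy as np

def extract_patches(img, p=8, step=4):
    """All overlapping p x p patches at a given stride, vectorized as columns."""
    H, W = img.shape
    pos = [(i, j) for i in range(0, H - p + 1, step)
                  for j in range(0, W - p + 1, step)]
    patches = np.stack([img[i:i+p, j:j+p].ravel() for i, j in pos], axis=1)
    return patches, pos

def group_similar(patches, ref_idx, m=8):
    """Indices of the m patches closest (in L2) to a reference patch (R_{G_i})."""
    d = np.sum((patches - patches[:, [ref_idx]]) ** 2, axis=0)
    return np.argsort(d)[:m]          # the reference patch itself is included

def aggregate(patches, pos, shape, p=8):
    """Put patches back and average overlaps pixel-by-pixel, as in Eq. (8)."""
    acc = np.zeros(shape)
    cnt = np.zeros(shape)
    for col, (i, j) in enumerate(pos):
        acc[i:i+p, j:j+p] += patches[:, col].reshape(p, p)
        cnt[i:i+p, j:j+p] += 1.0
    return acc / np.maximum(cnt, 1.0)

rng = np.random.default_rng(0)
img = rng.random((32, 32))
P, pos = extract_patches(img)
idx = group_similar(P, ref_idx=0, m=8)   # one patch group x_{G_i}
rec = aggregate(P, pos, img.shape)       # unmodified patches reproduce the image
```

Aggregating unmodified patches reproduces the image exactly, which confirms that the counting matrix in the denominator implements the pixel-wise averaging of Equation (8).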

Nonlocal Self-Similarity in 3D Transform Domain
Dabov et al. proposed the well-known BM3D [23] for image denoising, and self-similarity in the 3D transform domain has attracted great attention since then [24,33,37]. For a patch x_i of size √n × √n, after searching for its (m − 1) similar patches, they are stacked into a 3D array Z of size √n × √n × m. Next, a 3D transform is performed to get the transform coefficients

Θ = T_3D(Z),   (9)

where T_3D(·) is a transform operator and Θ are the coefficients. Since these coefficients are considered sparse, they are shrunk by some filter (e.g., soft-thresholding or hard-thresholding). Then, the sparse coefficients are inverse-transformed to generate the estimated group, and these estimates are returned to their original positions. Nonlocal 3D sparsity can explore a high degree of sparsity in images and can well preserve both details and the differences between patches.
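A minimal sketch of this 3D collaborative shrinkage, using a separable 3D DCT (via SciPy) as a stand-in for BM3D's exact 2D + 1D transform pair, with a fixed hard threshold chosen for illustration:

```python
import numpy as np
from scipy.fft import dctn, idctn

def shrink_group_3d(Z, thresh):
    """Hard-threshold a patch group in a separable 3D DCT domain.

    Z: sqrt(n) x sqrt(n) x m stack of similar patches. The 3D DCT here
    stands in for the separable transform pair used in BM3D.
    """
    Theta = dctn(Z, norm="ortho")          # Theta = T_3D(Z)
    Theta[np.abs(Theta) < thresh] = 0.0    # hard thresholding of small coefficients
    return idctn(Theta, norm="ortho")      # inverse transform back to the group

rng = np.random.default_rng(0)
base = rng.random((8, 8))
# A group of nearly identical patches is highly sparse along the third axis.
Z = np.stack([base + 0.01 * rng.standard_normal((8, 8)) for _ in range(8)], axis=2)
Z_hat = shrink_group_3d(Z, thresh=0.02)
```

Because the similar patches agree up to small perturbations, almost all energy concentrates in a few 3D-DCT coefficients, so thresholding suppresses the perturbations while barely changing the group.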

Split Bregman Iteration
The split Bregman iteration (SBI) [9] was proposed to solve various optimization problems. Consider a constrained problem:

min_{x,z} H(x) + G(z)   s.t.   x = Kz.   (10)

This optimization problem can be efficiently solved by Algorithm 1. According to the SBI framework, since x and z are coupled only through the constraint, the optimization problem can be split into two subproblems (namely, Step 3 and Step 4). The rationale is that in Step 3 and Step 4, only one variable is solved at a time, making each subproblem much easier than the original problem.
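As a self-contained instance of the SBI scheme, the sketch below solves an L1-regularized least-squares problem with K = I (a deliberate simplification of the general x = Kz constraint): the x-step is a linear solve, the z-step is soft-thresholding, and b is the Bregman variable updated from the constraint residual:

```python
import numpy as np

def split_bregman_l1(A, y, lam=0.01, mu=1.0, n_iter=200):
    """Split Bregman for min 0.5||y - A x||^2 + lam ||z||_1  s.t.  z = x."""
    n = A.shape[1]
    x = np.zeros(n); z = np.zeros(n); b = np.zeros(n)
    AtA, Aty = A.T @ A, A.T @ y
    M = AtA + mu * np.eye(n)
    for _ in range(n_iter):
        # x-subproblem (Step 3): quadratic in x only -> linear solve.
        x = np.linalg.solve(M, Aty + mu * (z - b))
        # z-subproblem (Step 4): separable, solved by soft-thresholding.
        v = x + b
        z = np.sign(v) * np.maximum(np.abs(v) - lam / mu, 0.0)
        # Bregman update from the constraint residual x - z.
        b = b + x - z
    return z

rng = np.random.default_rng(0)
n, m = 60, 30
x_true = np.zeros(n)
x_true[rng.choice(n, 4, replace=False)] = 1.0
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_hat = split_bregman_l1(A, A @ x_true)
```

Each iteration touches only one variable at a time, which is exactly the rationale stated above: two easy subproblems replace one hard coupled problem.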

Hybrid NonLocal Sparsity Regularization (HNLSR)
Integrating two different kinds of nonlocal regularizations, we propose a Hybrid NonLocal Sparsity Regularization (HNLSR), which can be expressed as

R_HNLSR(x) = λ‖α‖_0 + τ‖Θ‖_1,   (11)

where α are the coefficients under a certain 2D sparse dictionary, Θ = T_3D(Z(x)) are the coefficients in the 3D transform domain, Z(x) is the 3D form of x, and λ and τ are regularization parameters. The proposed regularization has two advantages:
1. It constrains sparsity in both the 2D and 3D domains, which means that it can better explore the intrinsic nonlocal similarity of images.
2. We use a self-adaptive dictionary as the 2D sparse basis and a fixed 3D transform to measure sparsity in high-dimensional space. Two different kinds of dictionaries improve the robustness of the regularization.
Next, we will apply the proposed HNLSR to image compressive sensing and show how to solve the optimization problem.

Image CS via HNLSR
Incorporating Equation (11) into Equation (4), the proposed optimization problem for image CS is expressed as

x̂ = arg min_x (1/2)‖y − Φx‖_2^2 + λ‖α‖_0 + τ‖Θ‖_1,   (12)

where λ and τ are regularization parameters. We use the SBI framework to solve this optimization problem. Define H(x) = (1/2)‖y − Φx‖_2^2 and G(z) = λ‖α‖_0 + τ‖Θ‖_1, where z collects the coefficients (α, Θ), so the corresponding K are the sparse dictionaries. Invoking Line 3 in Algorithm 1, we obtain

x^(k+1) = arg min_x (1/2)‖y − Φx‖_2^2 + (μ/2)‖x − Kz^(k) − b^(k)‖_2^2.   (13)

Splitting the second term in Equation (13) according to the two representations, we have

x^(k+1) = arg min_x (1/2)‖y − Φx‖_2^2 + (μ_1/2)‖x − D_2D α^(k) − b^(k)‖_2^2 + (μ_2/2)‖x − T_3D^{-1}(Θ^(k)) − c^(k)‖_2^2.   (14)

Then we apply Line 4, and Equation (12) is transformed into

(α^(k+1), Θ^(k+1)) = arg min_{α,Θ} λ‖α‖_0 + τ‖Θ‖_1 + (μ_1/2)‖x^(k+1) − D_2D α − b^(k)‖_2^2 + (μ_2/2)‖x^(k+1) − T_3D^{-1}(Θ) − c^(k)‖_2^2.   (15)

Finally, b and c can be calculated by

b^(k+1) = b^(k) − (x^(k+1) − D_2D α^(k+1)),   (16)
c^(k+1) = c^(k) − (x^(k+1) − T_3D^{-1}(Θ^(k+1))).   (17)

Therefore, the minimization problem of Equation (12) is divided into several subproblems, and the solution to each subproblem is discussed below.

x-Subproblem
Given α, Θ, b, and c, Equation (14) is a convex quadratic optimization problem, and we can use the gradient descent method to solve it efficiently:

x^(k+1) = x^(k) − η^(k) g^(k),   (18)

where g^(k) is the gradient direction of Equation (14),

g = Φ^T(Φx^(k) − y) + μ_1(x^(k) − D_2D α^(k) − b^(k)) + μ_2(x^(k) − T_3D^{-1}(Θ^(k)) − c^(k)),   (19)

and η^(k) is the optimal step size, calculated via

η = (g^T g) / (g^T Φ^T Φ g + (μ_1 + μ_2) g^T g).   (20)

The superscript k of g is omitted for conciseness.
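A sketch of this steepest-descent update with exact line search; here u and v are hypothetical stand-ins for the two regularization targets (D_2D α + b and T_3D^{-1}(Θ) + c), and the step size is the exact minimizer along the negative gradient for this quadratic objective:

```python
import numpy as np

def x_step(Phi, y, u, v, mu1, mu2, x0, n_iter=200):
    """Steepest descent with exact line search for the quadratic x-subproblem
    0.5||y - Phi x||^2 + mu1/2 ||x - u||^2 + mu2/2 ||x - v||^2."""
    x = x0.copy()
    for _ in range(n_iter):
        # Gradient of the three quadratic terms.
        g = Phi.T @ (Phi @ x - y) + mu1 * (x - u) + mu2 * (x - v)
        # Exact step size: g^T g / (g^T H g), H = Phi^T Phi + (mu1 + mu2) I.
        denom = np.sum((Phi @ g) ** 2) + (mu1 + mu2) * np.sum(g ** 2)
        if denom == 0.0:
            break
        eta = np.sum(g ** 2) / denom
        x = x - eta * g
    return x

rng = np.random.default_rng(0)
n, m = 40, 20
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = rng.standard_normal(n)
# With u = v = x_true and y = Phi x_true, the unique minimizer is x_true itself.
x_hat = x_step(Phi, Phi @ x_true, u=x_true, v=x_true, mu1=0.1, mu2=0.1,
               x0=np.zeros(n))
```

Because the objective is strictly convex (the μ terms regularize the rank-deficient Φ^TΦ), the iteration converges to the unique minimizer from any starting point.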

z-Subproblem
Given x, b, and c, the z-subproblem Equation (15) can be divided into two formulas:

α^(k+1) = arg min_α λ‖α‖_0 + (μ_1/2)‖x^(k+1) − D_2D α − b^(k)‖_2^2,   (21)

Θ^(k+1) = arg min_Θ τ‖Θ‖_1 + (μ_2/2)‖x^(k+1) − T_3D^{-1}(Θ) − c^(k)‖_2^2.   (22)

Let us define x̃^(k+1) = x^(k+1) − b^(k), where x̃^(k+1) can be seen as a noisy observation of the latent image. Therefore, Equation (21) can be rewritten as

α^(k+1) = arg min_α (1/2)‖x̃^(k+1) − D_2D α‖_2^2 + (λ/μ_1)‖α‖_0.   (23)

As the patch group is the basic unit of sparse coding, this problem can be split into several subproblems, and for each subproblem the coefficients of one group are the variables to be solved. Therefore, Equation (23) can be solved group by group via

α_{G_m}^(k+1) = arg min_{α_{G_m}} (1/2)‖x̃_{G_m} − D_{G_m} α_{G_m}‖_2^2 + θ‖α_{G_m}‖_0,   (24)

where θ = λ/μ_1, x̃_{G_m} is an image patch group, and D_{G_m} and α_{G_m} are the corresponding dictionary and sparse coefficients. For every group, we adopt the singular value decomposition (SVD) to generate the 2D dictionary. Applying the SVD to a group x̃_{G_m}, we have

x̃_{G_m} = U_{G_m} Σ_{G_m} V_{G_m}^T = Σ_i σ_{G_m,i} (u_{G_m,i} v_{G_m,i}^T),   (25)

where Σ_{G_m} is a diagonal matrix formed by the singular values σ_{G_m,i}. Moreover, the dictionary is defined as the set of rank-one atoms

d_{G_m,i} = u_{G_m,i} v_{G_m,i}^T.   (26)

Therefore, every optimization problem in Equation (24) has a closed-form solution

α_{G_m}^(k+1) = hard(σ_{G_m}, √(2θ)) = σ_{G_m} ⊙ 1(|σ_{G_m}| ≥ √(2θ)),   (27)

where hard(·) is the hard-thresholding function and ⊙ stands for the element-wise product operator. Similar to the α-subproblem, we define x̄^(k+1) = x^(k+1) − c^(k) and, considering the fact that every overlapped image patch appears with equal probability, we can solve Equation (22) group by group:

Θ_{G_m}^(k+1) = arg min_{Θ_{G_m}} (μ_2/2)‖Z_{G_m} − T_3D^{-1}(Θ_{G_m})‖_2^2 + τ‖Θ_{G_m}‖_1,   (28)

where Z_{G_m} is a 3D patch array extracted from x̄^(k+1). This problem can be seen as a filtering problem in the transform domain. Invoking the Bayesian framework [21], the maximum a posteriori (MAP) estimation of Θ_{G_m} from its noisy observation Θ̃_{G_m} = T_3D(Z_{G_m}) is

Θ_{G_m}^(k+1) = arg max_{Θ_{G_m}} log P(Θ_{G_m} | Θ̃_{G_m}) = arg min_{Θ_{G_m}} (1/(2σ_n^2))‖Θ̃_{G_m} − Θ_{G_m}‖_2^2 − log P(Θ_{G_m}).   (29)

Assuming that x̄_{G_m}^(k+1) is disturbed by Gaussian noise with standard deviation σ_n and that Θ_{G_m} follows an i.i.d. Laplacian distribution,

P(Θ_{G_m}) = Π_i (1/(√2 σ_i)) exp(−(√2/σ_i)|Θ_{G_m}^i|),   (30)

where σ_i is the standard deviation of Θ_{G_m}^i. Substituting Equation (30) into Equation (29), we obtain

Θ_{G_m}^(k+1) = arg min_{Θ_{G_m}} ‖Θ̃_{G_m} − Θ_{G_m}‖_2^2 + Σ_i τ_i |Θ_{G_m}^i|,   (31)

where τ_i = 2√2 σ_n^2/σ_i. From the above analysis, Equation (31) can be solved element-wise by the soft-thresholding function

Θ_{G_m}^{i,(k+1)} = soft(Θ̃_{G_m}^i, τ_i/2) = sign(Θ̃_{G_m}^i) · max(|Θ̃_{G_m}^i| − τ_i/2, 0).   (32)

The proposed method for image compressive sensing is summarized in Algorithm 2.
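The per-group SVD coding with hard thresholding admits a compact sketch; the threshold and group construction below are illustrative, and the key point is that in the SVD dictionary the singular values play the role of the sparse coefficients:

```python
import numpy as np

def svd_hard_threshold(XG, thresh):
    """Code a patch group in its own SVD dictionary and hard-threshold.

    XG: n x m patch group. The rank-one outer products of singular vectors
    form the self-adaptive 2D dictionary; the singular values are the
    corresponding coefficients, so thresholding them sparsifies the code.
    """
    U, s, Vt = np.linalg.svd(XG, full_matrices=False)
    s_hat = np.where(np.abs(s) >= thresh, s, 0.0)   # hard(s, thresh)
    return U @ np.diag(s_hat) @ Vt

rng = np.random.default_rng(0)
# A group of similar patches is close to rank-one, so only a few
# coefficients survive the threshold.
base = rng.random(64)
XG = np.outer(base, np.ones(16)) + 0.01 * rng.standard_normal((64, 16))
XG_hat = svd_hard_threshold(XG, thresh=0.5)
```

Because the group is nearly rank-one, a single coefficient survives, and the reconstruction stays close to the input: the dictionary adapts to each group instead of being fixed in advance.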

Implementation Details
This section presents the performance of the proposed HNLSR method. In our experiments, eight commonly used images are used to test the reconstruction performance of the algorithms (shown in Figure 3). Their size is 256 × 256. In the measurement phase, an image is divided into blocks of size 32 × 32, and a Gaussian matrix is applied to generate measurements for each block. In the reconstruction phase, the size of the overlapping patches is 8 × 8.
The step size, i.e., the distance between two adjacent image patches in the horizontal or vertical direction, is set to 4. For every image patch, we search for its 59 most similar patches in a 20 × 20 window. (µ_1, µ_2) are set to (0.0025, 0.0025), (0.0025, 0.00025), and (0.0025, 0.0001) when the sampling rates are 0.1, 0.2, and 0.3, respectively. The 3D dictionary is composed of a 2D DCT and a 1D Haar wavelet. The maximum iteration number is 120. We use peak signal-to-noise ratio (PSNR) and feature similarity (FSIM) [46] as the performance evaluation indices. All experiments are performed in Matlab R2017a on a computer with an Intel Core i5-6500 CPU at 3.2 GHz, 8 GB of memory, and the Windows 10 operating system.
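For reference, PSNR as reported in the tables follows the standard definition (here assuming 8-bit images with peak value 255):

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a reconstruction."""
    mse = np.mean((ref.astype(np.float64) - rec.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

ref = np.zeros((8, 8))
rec = ref + 10.0          # a constant error of 10 gray levels
val = psnr(ref, rec)      # 10 * log10(255^2 / 100) dB
```

FSIM [46] is a perceptual metric based on phase congruency and gradient magnitude; unlike PSNR, it has no one-line closed form, so we omit a sketch of it here.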

Comparison with State-of-the-Art Methods
We compare our method with six representative methods: MH-BCS [47], RCoS [33], ALSB [27], GSR [45], JASR [37], and GSR-NCR [29]. MH-BCS uses residuals in the measurement domain and multihypothesis predictions to improve reconstruction quality; RCoS utilizes nonlocal 3D sparsity and local 2D sparsity (namely, total variation (TV)) to explore the intrinsic properties of images; ALSB is a patch-based sparse representation method; JASR employs the discrete curvelet transform (DCuT) to constrain local sparsity and combines it with nonlocal 3D sparsity; GSR is an extended version of SGSR [26]. Both GSR and GSR-NCR are group-based methods; the difference is that GSR uses the L0-norm to constrain the sparse coefficients, while GSR-NCR uses a non-convex Lp-norm. GSR and GSR-NCR are regarded as state-of-the-art methods. The PSNR and FSIM results are shown in Tables 1 and 2, respectively, and the best result for each sampling rate is marked in bold.
We can see that, compared with MH-BCS, methods based on nonlocal self-similarity have obvious advantages in performance. As a patch-based algorithm, ALSB is inferior to the other methods in most cases. JASR performs better than RCoS since DCuT is better than TV at depicting local characteristics. Compared with methods using fixed dictionaries (namely, RCoS and JASR), methods using self-adaptive dictionaries have better performance in general. The proposed method combines a fixed dictionary with a self-adaptive dictionary and achieves the best performance in most cases. Some visual comparisons are illustrated in Figures 4-7. In Figure 4, it is obvious that MH-BCS generates the worst result. ALSB, GSR, and GSR-NCR suffer from some artifacts in the water surface area. RCoS and JASR have better results, but the edge of the tripod is a little blurry. In Figure 5, the other methods produce some undesirable traces in the blank area, while the proposed method is not only clean in the blank area but also has relatively sharp leaf edges. MH-BCS, RCoS, and ALSB produce some unexpected noise in the white area around the eyes in Figure 6, and the pattern around the eyes produced by the proposed method is the clearest. It is evident that, in terms of visual quality, the proposed method outperforms the other methods.
We also compare HNLSR-CS with three representative deep learning methods: ReconNet [39], ISTA-Net+ [40], and DR2-Net [42]. We use pretrained models for testing, and the PSNR and FSIM results are reported in Tables 3 and 4, with the best results highlighted in bold. The proposed method obtains the best result in most cases. Some visual comparisons are shown in Figures 8 and 9. In Figure 8, ReconNet, ISTA-Net+, and DR2-Net all suffer from block artifacts, and the proposed method preserves the best details. In Figure 9, ReconNet and DR2-Net still have some block artifacts; ISTA-Net+ has the best PSNR, but it produces some undesirable artifacts, resulting in a worse FSIM than ours.
These results also prove the superiority of the proposed method.

Effect of the Number of Similar Patches
In this section, we discuss how the number of similar patches affects the performance of the method. With other variables fixed, we vary the number of similar patches from 30 to 90 at intervals of 10. The comparisons are shown in Figure 10. We can see from the figure that all three curves are relatively stable, which means that the performance is not sensitive to the number of similar patches. Considering the performance and complexity of the method, we set the number of similar patches to 60.

Convergence
As Equation (12) is non-convex, it is difficult to give a theoretical proof of the convergence of the proposed method, so we only show its stability through empirical evidence. Figure 11 shows the curves of PSNR versus iteration number for four images at sampling rates of 0.2 and 0.3, respectively. We can see from the figure that, as the iteration number increases, PSNR changes drastically at the beginning and then gradually becomes stable. This illustrates the good convergence behavior of the proposed method.

Conclusions and Future Work
This paper proposes a Hybrid Nonlocal Sparsity Regularization (HNLSR) method for image compressive sensing. Different from existing methods, the proposed HNLSR does not rely on the local sparsity of images but instead uses two dictionaries to explore nonlocal self-similarity. The 2D dictionary is self-generated, and the 3D dictionary is fixed, which combines the advantages of adaptability and versatility from the two kinds of dictionaries. An effective framework based on SBI is presented to solve the optimization problem. The convergence and stability of the proposed method have also been demonstrated empirically. Experimental results show that, compared with methods based on combined local and nonlocal regularizations or on a single nonlocal regularization, the proposed method performs better in both objective quality assessment and visual quality.
As multiple dictionaries can improve performance, several research directions are worth considering. For example, different dictionaries could be learned for different areas of an image (e.g., smooth areas and textured areas). Another direction is to learn multi-scale dictionaries and select among them adaptively. Our future work includes extending the proposed method to other image processing tasks (e.g., denoising, deblocking, and deblurring) and to high-dimensional data (e.g., videos and multispectral images). For high-dimensional or multi-frame data, how to collect similar patches (intra- or inter-frame) is also a problem to be solved.