A General Rate-Distortion Optimization Method for Block Compressed Sensing of Images

Block compressed sensing (BCS) is a promising technology for image sampling and compression in resource-constrained applications, but it must balance the sampling rate and the quantization bit-depth under a bit-rate constraint. In this paper, we summarize the commonly used CS quantization frameworks into a unified framework, and we propose a new bit-rate model and an optimal bit-depth model for this unified CS framework. The proposed bit-rate model reveals the relationship between the bit-rate, sampling rate, and bit-depth based on the information entropy of the generalized Gaussian distribution. The optimal bit-depth model can predict the optimal bit-depth of CS measurements at a given bit-rate. We then propose a general algorithm for choosing the sampling rate and bit-depth based on the proposed models. Experimental results show that the proposed algorithm achieves near-optimal rate-distortion performance for both the uniform quantization framework and the predictive quantization framework in BCS.


Introduction
Compressed sensing (CS) is a signal acquisition framework [1][2][3] that acquires the signal's measurements by linear projection at the sub-Nyquist rate. Unlike traditional image coding methods with high computational complexity, the CS-based image coding methods are suitable for resource-constrained application scenarios through simultaneous data acquisition and compression [4][5][6][7].
When CS is applied to an image, the large measurement matrix imposes an enormous computational and memory burden on the codec. Gan [8] has proposed a block compressed sensing (BCS) method to decrease the measurement matrix's size for images. BCS uses the same measurement matrix to measure each image block's raster-scan vector, significantly reducing the sensor's calculation and transmission cost [9]. BCS processes each image block independently and supports parallel encoding, so the image measurements can be obtained quickly. However, the real-valued CS measurements need to be combined with quantization and an entropy encoder to output bitstreams for transmission or storage [10].
Although uniform scalar quantization (SQ) is the most straightforward solution for quantizing CS measurements, it is inefficient in rate-distortion performance [6,11]. Therefore, some researchers have proposed different quantization schemes for CS measurements to enhance the rate-distortion performance. For example, Mun et al. [12] have combined differential pulse-code modulation (DPCM) with uniform scalar quantization (DPCM-plus-SQ) for BCS measurements. The CS-based imaging system with DPCM-plus-SQ and smoothed projected Landweber (SPL) reconstruction can compete with JPEG in some cases. Wang et al. [13] have proposed a progressive quantization framework for CS measurements, which is slightly better than JPEG in rate-distortion performance. Chen et al. [14] have proposed a progressive non-uniform quantization framework for CS measurements using a partial Hadamard matrix together with a patch-based recovery algorithm, which can reach the rate-distortion performance of CCSDS-IDC (the image data compression standard of the Consultative Committee for Space Data Systems).

Problem Statement
CS theory states that a sparse signal can be recovered from its measurements obtained by linear random projection. Many natural images have a sparse representation in a wavelet transform domain or discrete cosine transform domain [24,25], so they can be acquired by CS. Suppose x ∈ R^{N×1} denotes the raster-scanned vector of an image block. The CS measurement vector y ∈ R^{M×1} of x can be acquired by the following expression:

y = Φx, (1)

where Φ ∈ R^{M×N} (M ≪ N) is a measurement matrix, and the sampling rate or measurement rate is m = M/N.
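As an illustration, BCS sampling with a single shared measurement matrix can be sketched as follows (a minimal Python sketch; the function name, the Gaussian Φ, and the 1/√M scaling are our assumptions, not the paper's implementation):

```python
import numpy as np

def block_cs_sample(image, block_size=32, m=0.25, seed=0):
    """Measure every raster-scanned block with one shared matrix: y = Phi @ x."""
    rng = np.random.default_rng(seed)
    N = block_size * block_size
    M = round(m * N)                                # sampling rate m = M / N
    Phi = rng.standard_normal((M, N)) / np.sqrt(M)  # shared Gaussian measurement matrix
    h, w = image.shape
    Y = []
    for i in range(0, h, block_size):
        for j in range(0, w, block_size):
            x = image[i:i + block_size, j:j + block_size].reshape(-1)  # raster-scan vector
            Y.append(Phi @ x)                       # M measurements per block
    return np.stack(Y), Phi
```

Because Φ is shared across blocks, only one M×N matrix needs to be stored, which is the memory saving BCS is designed for.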
Since CS measurements are real-valued, they need to be discretized by quantization before entropy encoding. Based on the most commonly used quantization schemes for CS measurements, the CS sampling model with quantization can be unified into the following expression:

y_Q = Q_b[f(y)], (2)

where y_Q is the quantized measurement vector, which is also the input of the entropy encoder. Q_b : R → Q denotes a b-bit uniform SQ operation (applied element-wise in (2)), which maps f(y) to the discrete alphabet Q with |Q| = 2^b. In this paper, we define ∆ = (f_max(y) − f_min(y))/2^b as the uniform quantization step size, where f_max(y) and f_min(y) represent the maximum and minimum of f(y), respectively. f(·) represents a reversible transform, which is used to change the distribution type of y. When the CS measurements are quantized by uniform SQ, f(·) is the identity transformation. When the CS measurements are quantized by A-law or µ-law non-uniform quantization, f(·) is the law function [26]. When the CS measurements are quantized by prediction with uniform SQ, f(·) is the prediction function [12,17]. For example, in the DPCM-plus-SQ framework, f(y^(j)) = y^(j+1) − y^(j), where y^(j) represents the measurement vector of the j-th image block. The progressive quantization methods [13,14] are also prediction frameworks combined with uniform SQ. In a progressive quantization method, the CS measurements are divided into a basic layer and a refinement layer for transmission after B-bit uniform SQ. In the basic layer, all B significant bits of the quantization indexes are transmitted, so the prediction function is equivalent to the identity transformation. In the refinement layer, only the B_1 < B least significant bits of the quantization index are transmitted, so the dropped B − B_1 most significant bits are equivalent to the predicted value, and the retained B_1 least significant bits are equivalent to the prediction residual.
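The unified model y_Q = Q_b[f(y)] can be illustrated with two instances of f(·): the identity (uniform SQ) and a block-difference predictor standing in for DPCM (a simplified open-loop sketch; real DPCM predicts from dequantized values, and all names here are ours):

```python
import numpy as np

def uniform_sq(v, b):
    """b-bit uniform scalar quantizer over the range of v, as in Eq. (2)."""
    vmax, vmin = v.max(), v.min()
    delta = (vmax - vmin) / (2 ** b)                       # quantization step size
    idx = np.clip(np.floor((v - vmin) / delta), 0, 2 ** b - 1)
    return idx.astype(int), vmin, delta

def dequantize(idx, vmin, delta):
    """Mid-point reconstruction of the quantization cell."""
    return vmin + (idx + 0.5) * delta

def dpcm_plus_sq(blocks, b):
    """f(y) = differences between adjacent block measurement vectors, then SQ."""
    residuals = np.empty_like(blocks)
    residuals[0] = blocks[0]                 # first block has no predictor
    residuals[1:] = blocks[1:] - blocks[:-1] # adjacent-block prediction residuals
    return uniform_sq(residuals, b)
```

Any reversible f(·) (law function, predictor, layered bit split) slots into the same pipeline before the shared quantizer.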
The CS-based image coding system is composed of CS sampling, quantization, and entropy encoder [15]. The bitstream of the encoded image is used for transmission or storage. The decoder restores the bitstream to an image through the corresponding entropy decoder, dequantization, and CS reconstruction algorithm. Figure 1 shows the flow chart of the CS-based imaging system [10].
The average number of bits per pixel [21] of the encoded image can be calculated by the following expression:

R = m × L, (3)

where L is the average codeword length of the quantized CS measurements y_Q after entropy encoding. There is a positive correlation between the average codeword length and the quantization bit-depth. When the bit-rate is constrained, the sampling rate and the quantization bit-depth therefore compete with each other. We can minimize the distortion to optimize the sampling rate and bit-depth for a given bit-rate R_goal, i.e.,

(m*, b*) = arg min_{m,b} D(m, b, X) s.t. R(m, b, X) ≤ R_goal, (4)

where R(m, b, X) and D(m, b, X), respectively, represent the bit-rate and distortion of the image X at the sampling rate m and the bit-depth b. The bit-rate R(m, b, X) is the average number of bits per pixel of the encoded image, which can be obtained according to (3). Distortion refers to the dissimilarity between the reconstructed image X̂ and the original image X. The distortion measures mainly include the mean square error (MSE), the peak signal-to-noise ratio (PSNR), and the structural similarity index measure (SSIM) [27]. The PSNR between the reconstructed image X̂ and the original image X is used as the measure of distortion in this paper.
The mathematical definition of PSNR is PSNR = 10 × log_10(255²/MSE(X, X̂)), where MSE(X, X̂) is the mean square error between the reconstructed image X̂ and the original image X. Computing the distortion and the bit-rate exactly requires both the original image and the decoded image, and obtaining the decoded image is computationally very expensive.
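For concreteness, the distortion measure can be computed with a small helper (assuming 8-bit images, i.e., peak value 255, with the peak exposed as a parameter):

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    """PSNR = 10 * log10(peak^2 / MSE) between original x and reconstruction x_hat."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    mse = np.mean((x - x_hat) ** 2)  # mean square error
    return 10.0 * np.log10(peak ** 2 / mse)
```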
To avoid calculating the bit-rates and distortions, we first propose a new bit-rate model and an optimal bit-depth model. Then, we propose a general method to optimize the sampling rate and bit-depth for CS-based image coding. Figure 2 is the CS-based encoding system with RDO [21,23]. Our CS framework contains two CS processes. The first one is partial sampling, which aims to extract image features by a few CS measurements for RDO. The second one is to increase the number of CS measurements to achieve optimal sampling and compression by using the sampling rate optimized by RDO.



Bit-Rate Model
Based on (3), the bit-rate R depends on the average codeword length of the quantized CS measurements y_Q after entropy encoding. The average codeword length can be approximated by the information entropy of y_Q before entropy encoding [28]. The information entropy is closely related to the distribution characteristics of the CS measurements, so we extract the distribution characteristics from the CS measurements of the first sampling to estimate the information entropy. However, the information entropy is only a lower bound on the average codeword length, and there is an error between the average codeword length and the information entropy estimated from a few measurements. Therefore, we modify the coefficients of the information entropy estimation model by fitting offline data of the average codeword length and then take it as the average codeword length model.

Generalized Gaussian Distribution Model of the Quantized CS Measurements
According to (2), the quantized CS measurements can be considered to be obtained by f(·) and Q_b[·]. Q_b[·] does not change the distribution type, while f(·) determines how the distribution type of the CS measurements is changed. CS measurements obtained with a random Gaussian matrix obey a Gaussian distribution [13]. When the structurally random matrix (SRM) is used for CS measurement, the CS measurements corresponding to the first row of the SRM are uniformly distributed, and the remaining CS measurements are Laplacian distributed with zero mean [10]. The distribution of DPCM predictive errors without conditioning on contexts is very close to a Laplace distribution [29]. The experiments of [30] show that the prediction errors of DPCM-plus-SQ satisfy the generalized Gaussian distribution. The Gaussian distribution, the uniform distribution, and the Laplace distribution all belong to the generalized Gaussian family with specific shape parameters. To describe the distribution of CS measurements with more generality, we use the generalized Gaussian distribution to describe f(y) and y_Q. The generalized Gaussian density function with zero mean can be expressed as:

p(x) = (β · α(σ, β))/(2Γ(1/β)) · exp(−(α(σ, β)|x|)^β), (5)

where α(σ, β) = σ^{−1} [Γ(3/β)/Γ(1/β)]^{1/2}. σ is the standard deviation. β is the shape parameter, which determines the attenuation rate of the density function. Γ(t) = ∫_0^∞ e^{−u} u^{t−1} du is the gamma function. The Laplace distribution and the Gaussian distribution correspond to the generalized Gaussian distribution with β = 1 and β = 2, respectively. Based on the generalized Gaussian distribution, the information entropy [31] of f(y) can be estimated as

H(f(y)) = log_2(2Γ(1/β)/(β · α(σ_f, β))) + (1/β) log_2 e, (6)

where σ_f and β are the standard deviation and distribution shape parameter of f(y), respectively. In Equation (2), y_Q is the discretization of f(y) quantized with step size ∆, so the information entropy [31] of y_Q can be estimated by:

H(y_Q) ≈ H(f(y)) − log_2 ∆. (7)
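The entropy model can be evaluated numerically with the standard closed forms (a sketch using only the Python standard library; the expressions are the textbook generalized Gaussian differential entropy and the high-rate approximation H(y_Q) ≈ H(f(y)) − log2 ∆, stated here as assumptions insofar as the paper's exact Equations (5)-(7) are not reproduced verbatim):

```python
import math

def ggd_alpha(sigma, beta):
    """alpha(sigma, beta) = sigma^-1 * sqrt(Gamma(3/beta) / Gamma(1/beta))."""
    return math.exp(0.5 * (math.lgamma(3.0 / beta) - math.lgamma(1.0 / beta))) / sigma

def ggd_entropy_bits(sigma, beta):
    """Differential entropy (bits) of a zero-mean generalized Gaussian."""
    alpha = ggd_alpha(sigma, beta)
    gamma_1b = math.exp(math.lgamma(1.0 / beta))
    return (1.0 / beta) * math.log2(math.e) + math.log2(2.0 * gamma_1b / (beta * alpha))

def quantized_entropy_bits(sigma, beta, delta):
    """High-rate approximation: H(y_Q) ≈ h(f(y)) - log2(Delta)."""
    return ggd_entropy_bits(sigma, beta) - math.log2(delta)
```

Setting β = 2 recovers the Gaussian entropy 0.5·log2(2πeσ²), and β = 1 recovers the Laplace entropy, which is a quick sanity check on the closed form.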

Average Codeword Length Estimation Model
In Equation (7), σ_f and β are the keys to estimating the information entropy H. However, σ_f and β cannot be calculated directly, because the CS measurements are unknown before the sampling rate and bit-depth are assigned. Since the number of CS measurements required for a high-quality reconstructed signal must satisfy a lower limit, the number of CS measurements used for compression will exceed this lower limit regardless of the target bit-rate. Therefore, we can acquire a small number of CS measurements by a first sampling and then extract features for RDO.
The CS measurements with different sampling rates are subsets of the measurement population for the same image, so a small number of measurements can be used to estimate the features of measurements with a higher sampling rate. In this paper, m 0 represents the sampling rate of the first sampling.
The entropy-matching method is usually used to estimate the shape parameter of the generalized Gaussian distribution [31]. To simplify the estimation, we assume that the shape-dependent term of the entropy in (6) is proportional to the measured entropy, with an undetermined parameter c (Formula (8)), where H_0 and β_0 represent the information entropy and shape parameter of f(y) at sampling rate m_0 and bit-depth b. Combined with Formula (8), the information entropy of y_Q can be estimated by Formula (9), where σ_f0 is the standard deviation of f(y) for the measurements obtained by the first partial sampling, and y_0 is the measurement vector obtained by the first sampling.

In statistical theory, the statistic s² = (M/(M − 1))σ² of the sample variance is an unbiased estimator of the population's variance.

Entropy 2021, 23, 1354

Since the CS measurements with different sampling rates have the same population, we assume that the unbiased variance estimates of the CS measurements at different measurement rates are approximately equal, that is:

(M/(M − 1)) σ_f² ≈ (M_0/(M_0 − 1)) σ_f0², (10)

where M = round(mN²) and M_0 = round(m_0 N²). Expression (10) can be converted into an expression for log_2 σ_f² in terms of log_2 σ_f0² (Formula (11)), from which we obtain Formula (12). Since −m_0/(m_0 N² − 1) ≈ −1/N² is very small, and the range of the sampling rate m is limited, the range of −m_0/(m_0 N² − 1) × 1/m is also very limited. Therefore, we use a simple linear function, with undetermined parameters c′ and c″, to estimate it (Formula (13)). Moreover, we substitute the expression for the quantization step size into (11) to obtain Formula (14). We use the maximum f_max(y_0) and minimum f_min(y_0) of the first-pass CS measurements to predict the maximum f_max(y) and minimum f_min(y) of the CS measurements at sampling rate m.
Therefore, based on (14), we replace (1 − c), c′, c, (1/4 − c/4), (c − 1), and c + log_2 2 with c_1, c_2, c_3, c_4, c_5, and c_6, respectively, and establish a model for estimating the average codeword length (Formula (15)). To improve the estimation accuracy of the average codeword length, we utilize model coefficients c_1∼c_6 learned from offline data. Combining (3) with (15), we establish the bit-rate model (Formula (16)).

Optimal Bit-Depth Model
If we first predict the optimal bit-depth b*, the sampling rate can be estimated from the bit-rate model (16) (Formula (17)), where R_goal is the target bit-rate and C = c_3 H_0 + c_4 log_2 σ_f0² + c_5 log_2(f_max(y_0) − f_min(y_0)) + c_6 represents the feature of X at bit-depth b*. In this section, we propose an optimal bit-depth model, which can directly predict the optimal bit-depth for a given bit-rate.
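If the bit-rate model takes the separable form R = m(c_1 b* + C) + c_2, which is one reading of (16)-(17) consistent with the boundary updates in the sampling rate modification section, the sampling rate follows by inversion (a sketch; the coefficients below are placeholders, not the fitted values):

```python
import math

def predict_sampling_rate(R_goal, b_star, H0, sigma_f0_sq, f_range,
                          c=(1.0, 0.05, 0.8, 0.1, 0.1, 0.0)):
    """Invert an assumed bit-rate model R = m*(c1*b* + C) + c2 for m.

    C collects the image features at bit-depth b*; c1..c6 are placeholders
    standing in for the coefficients fitted offline in the paper.
    """
    c1, c2, c3, c4, c5, c6 = c
    C = c3 * H0 + c4 * math.log2(sigma_f0_sq) + c5 * math.log2(f_range) + c6
    return (R_goal - c2) / (c1 * b_star + C)
```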

Function Mapping Relationship between Optimal Bit-Depth and Bit-Rate
Chen et al. [15] tested the reconstruction performance of some images at different quantization bit-depths. They show that low quantization bit-depths yield high PSNRs at low bit-rates, and high quantization bit-depths yield high PSNRs at high bit-rates. However, they only give a fixed selection of quantization bit-depths for some bit-rates of all images, and they do not give a method for selecting the optimal bit-depth. To find the relationship between the quantization bit-depths and the PSNRs, we ran simulations on eight test images, as shown in Figure 3. We obtain the optimal bit-depths of the eight test images by traversing different sampling rates (m ∈ [0.05, 0.06, . . .]) and bit-depths.

[Figures 4 and 5: the optimal bit-depths versus bit-rate (bpp) for the eight test images.]

It can be found from Figures 4 and 5 that the rate-distortion performance of the DPCM-plus-SQ framework (the CS-based coding system with DPCM-plus-SQ) is better than that of the uniform SQ framework (the CS-based coding system with uniform SQ), which indicates that the quantization scheme has a significant influence on the rate-distortion performance. However, current rate-distortion optimization methods for CS are only suitable for a single uniform SQ framework. As far as we know, little attention has been paid to rate-distortion optimization methods suitable for the prediction framework.
Although the optimal bit-depth differs between quantization frameworks, Figures 4 and 5 share the following characteristics: (1) low bit-depths have high PSNRs at low bit-rates, and high bit-depths have high PSNRs at high bit-rates; (2) the optimal bit-depth of almost all images is 4 when the bit-rate is around 0.1 bpp; (3) as the bit-rate increases, the optimal bit-depth is nondecreasing; (4) the optimal bit-depth is constant over a bit-rate range, but the range differs between images. There is a functional relationship between the optimal bit-depth and the bit-rate: b_best is piecewise constant over bit-rate ranges whose endpoints are r_1∼r_6 (Formula (18)). It can be found that the bit-rate range widens as b_best increases. The model (18) is equivalent to the following model:

b_best = [g(R)], (19)

where [·] represents the rounding operation, and g(R) represents a continuous function of the bit-rate. Since the optimal bit-depth increases with the bit-rate, the first-order derivative of g(R) is required to be no less than 0. The growth of the optimal bit-depth slows as the bit-rate increases, so the second-order derivative of g(R) is required to be less than 0, that is,

g′(R) ≥ 0, g″(R) < 0. (20)

Based on the above discussion, we set g(R) = k_1 ln(R) + k_2. The model of the optimal bit-depth is established as follows:

b* = [k_1 ln(R) + k_2], (21)

where k_1 and k_2 are the model parameters, which are learned by a neural network in Section 4.2. In order to collect offline data samples of k_1 and k_2 for training the proposed neural network, we establish the optimization problem (22), where i is the sample index of the offline data, b_best^(i) represents the actual optimal bit-depth of the i-th sample, and ω_i represents the weight, which is the difference between the PSNR quantized with b_best^(i) and the PSNR quantized with g(R^(i)) at the same bit-rate.
In order to obtain the PSNRs at the same bit-rate, we perform linear interpolation on the sample data. The regularization term ∑_i ‖b_best^(i) − g(R^(i))‖_2² guarantees the uniqueness of the solution. λ is a constant coefficient, which is set to 0.01 in this work. We take q = 10, which avoids errors of more than 2 bits between the predicted value and the actual value.
In (22), the first term ensures the accuracy of the optimal bit-depth model, and the second term ensures the uniqueness of the model coefficients. Since it is difficult to handle the gradient of the rounding operation, (22) cannot be solved by traditional gradient-based optimization methods. We use the particle swarm optimization algorithm [32,33] to solve problem (22). The swarm contains 100 particles and is iterated 300 times. In each iteration, 30 particles in the population are randomly regenerated within the [−0.5, 0.5] range of the current optimal point. Figures 6 and 7 show the fitted results of the model (21) for the uniform SQ framework and the DPCM-plus-SQ framework, respectively. It can be seen that the fitted bit-depths are in good agreement with the actual bit-depths. The errors between the predicted value and the actual value are at most one bit. The one-bit errors are mainly concentrated between two adjacent optimal bit-depths, where the two bit-depths differ little in PSNR.
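The resulting predictor is essentially a one-liner (the b_min/b_max clipping is our addition to keep the output in a sensible range; k_1, k_2 are free parameters here):

```python
import math

def optimal_bit_depth(R, k1, k2, b_min=1, b_max=12):
    """b* = [k1*ln(R) + k2], rounded and clipped to admissible bit-depths."""
    b = round(k1 * math.log(R) + k2)
    return max(b_min, min(b_max, b))
```

With k_1 ≥ 0 the predictor is nondecreasing in R, matching characteristic (3) above, and the logarithm makes its growth slow down at high bit-rates, matching the concavity requirement (20).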

Model Parameter Estimation Based on Neural Network
It is challenging to design by hand a function that estimates the model parameters accurately. Therefore, we use a four-layer feed-forward neural network [34,35] to learn the mapping between the model parameters and image features rather than designing the functional relationship manually [36,37]. The model (21) would be most useful if its parameters could be predicted from content features derived from the compressively sampled image. As model (21) is closely related to the bit-rate, we directly use the image features of the proposed bit-rate model as the characteristics for estimating the parameters. The image features of the proposed bit-rate model are σ_0², H_0, f_max(y_0), and f_min(y_0). A finite set of real numbers usually needs to be quantized before calculating the information entropy; since the optimal bit-depth of many images is low when the bit-rate is low, we choose the information entropy H_{0,bit=4} with a quantization bit-depth of 4 as a feature. Since the CS measurements of the image are sampled block by block, we treat the image block like a video frame and design two image features according to the video features in reference [23]: the block difference (BD), i.e., the mean and standard deviation of the difference between the measurements of adjacent blocks, µ_BD and σ_BD. We also take the mean of the measurements y_0 as a feature.
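A plausible implementation of the seven features is sketched below (our reading of the feature list; the exact quantizer and block-difference conventions may differ from the authors'):

```python
import numpy as np

def extract_features(Y0):
    """Seven features from first-pass measurements Y0 (shape: blocks x M0)."""
    y = Y0.reshape(-1)
    var0 = y.var()                                   # sigma_0^2
    ymax, ymin = y.max(), y.min()                    # f_max(y0), f_min(y0)
    # information entropy after 4-bit uniform quantization: H_{0,bit=4}
    delta = (ymax - ymin) / 16 or 1.0
    idx = np.clip(np.floor((y - ymin) / delta), 0, 15).astype(int)
    p = np.bincount(idx, minlength=16) / idx.size
    H0 = -(p[p > 0] * np.log2(p[p > 0])).sum()
    # block-difference (BD) features between adjacent blocks, cf. [23]
    bd = Y0[1:] - Y0[:-1]
    return np.array([var0, H0, ymax, ymin, bd.mean(), bd.std(), y.mean()])
```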

Model Parameter Estimation Based on Neural Network
It is challenging to design a function for estimating the model parameters accurately. Therefore, we use a four-layer feed-forward neural network [34,35] to learn the mapping relationship between the model parameters and image features rather than designing the function relationship by hand [36,37]. We can imagine that the model (21) would be The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit)  The optimal bit-depth (bit)

Actual value Predicted value g(R)
The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit)

Model Parameter Estimation Based on Neural Network
It is challenging to design a function for estimating the model parameters accurately. Therefore, we use a four-layer feed-forward neural network [34,35] to learn the mapping relationship between the model parameters and image features rather than designing the function relationship by hand [36,37]. We can imagine that the model (21) would be The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit)

Actual value Predicted value g(R)
The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) The optimal bit-depth (bit) We designed a network including an input layer of seven neurons and an output layer of two neurons to estimate the model parameters [k 1 , k 2 ], as shown in Formula (23) and Figure 8.
u_{j+1} = g(W_j u_j + d_j), (23)

where g(v) is the sigmoid activation function, u_j is the input variable vector at the j-th layer, and F = [k_1, k_2] is the output of the final layer, i.e., the parameter vector. W_j and d_j are the network parameters learned from offline data. We take the mean square error (MSE) as the loss function.
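The forward pass of such a 7-4-3-2 network can be sketched as follows (a linear output layer is our assumption; the paper specifies only the sigmoid activation and the layer sizes):

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(features, weights, biases):
    """Feed-forward pass u_{j+1} = g(W_j u_j + d_j); returns F = [k1, k2]."""
    u = features
    for W, d in zip(weights[:-1], biases[:-1]):
        u = sigmoid(W @ u + d)              # hidden layers: 7 -> 4 -> 3
    return weights[-1] @ u + biases[-1]     # output layer: 3 -> 2
```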

Sampling Rate Modification
The model (16) obtains the model parameters by minimizing the mean square error of all training samples. Although the total error is the smallest, there are still some samples with significant errors. To prevent excessive errors in predicting sampling rate, we propose the average codeword length boundary and sampling rate boundary.

Average Codeword Length Boundary
When the optimal bit-depth is determined, the average codeword length usually decreases with the sampling rate increase. Although the average codeword length of different images varies with the sampling rate, the variation is finite. Therefore, we design an average codeword length boundary.

As the information entropy H_0 is an input of the sampling rate optimization and is very close to the average codeword length L_0 at the sampling rate m_0, we take H_0 as the reference for estimating the variation of the average codeword length, expressed as L − H_0. We take only the bit-depth and the sampling rate as factors influencing the upper and lower bounds. According to model (16), we establish the upper and lower bound model of the average codeword length as follows:

L_u = a_1 b* + a_2/m + a_3 + H_0, L_l = a_4 b* + a_5/m + a_6 + H_0, (24)

where L_u and L_l describe the upper and lower bounds of the average codeword length, respectively, and a_1∼a_6 are model coefficients fitted on offline samples. According to (17), we first estimate the sampling rate as m^(1) (Formula (25)). The corresponding average codeword length is L = R_goal/m^(1). Then, we calculate the upper bound L_u = a_1 b* + a_2/m^(1) + a_3 + H_0 and the lower bound L_l = a_4 b* + a_5/m^(1) + a_6 + H_0 based on (24). L > L_u means that the sampling rate is too low, so we should increase it: taking the bit-rate model as R = mL_u, the sampling rate is updated to m_u = (R_goal − a_2)/(H_0 + a_1 b* + a_3). If L < L_l, we take the bit-rate model as R = mL_l, and the sampling rate is updated to m_l = (R_goal − a_5)/(H_0 + a_4 b* + a_6). This is summarized as:

m = m_u if L > L_u; m = m_l if L < L_l; m = m^(1) otherwise. (26)
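The correction rule can be written compactly (a sketch; a_1∼a_6 are fitted offline in the paper and are free parameters here):

```python
def modify_sampling_rate(R_goal, m1, b_star, H0, a):
    """Correct the predicted rate m1 using codeword-length bounds, cf. (24)-(26)."""
    a1, a2, a3, a4, a5, a6 = a
    L = R_goal / m1                          # implied average codeword length
    L_u = a1 * b_star + a2 / m1 + a3 + H0    # upper bound
    L_l = a4 * b_star + a5 / m1 + a6 + H0    # lower bound
    if L > L_u:   # codeword length too long: sampling rate was too low
        return (R_goal - a2) / (H0 + a1 * b_star + a3)
    if L < L_l:   # codeword length too short: sampling rate was too high
        return (R_goal - a5) / (H0 + a4 * b_star + a6)
    return m1
```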

Sampling Rate Boundary
The average codeword length boundary uses the information entropy of the partial measurements to restrict the estimated average codeword length, thereby correcting a sampling rate that is too large or too small. To correct the sampling rate more directly, we also establish a linear boundary model of the sampling rate for different bit-depths:

m_u = a_7 R + a_8, m_l = a_9 R + a_10, (27)

where R is the bit-rate and a_7~a_10 are model coefficients fitted from offline samples. When the assigned sampling rate exceeds the boundaries in (27), it is clamped to the nearest boundary.
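The clamping against the linear boundaries (27) can be sketched as follows; `clamp_sampling_rate` is a hypothetical helper, and the coefficient values in the usage example are illustrative:

```python
def clamp_sampling_rate(m, R, a7, a8, a9, a10):
    """Clamp a sampling rate m to the bit-rate-dependent boundaries of
    Eq. (27): m_u = a7*R + a8 (upper), m_l = a9*R + a10 (lower)."""
    m_u = a7 * R + a8
    m_l = a9 * R + a10
    return min(max(m, m_l), m_u)
```

For example, `clamp_sampling_rate(0.5, 0.3, 0.49164, -0.0071258, 0.1, 0.0)` pulls an over-large rate of 0.5 down to the upper boundary a_7 R + a_8.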

Rate-Distortion Optimization Algorithm
Based on the proposed bit-rate model and optimal bit-depth model, we propose an algorithm that assigns the bit-depth and sampling rate for a given target bit-rate R_goal, as follows.
(1) Partial sampling. The partial CS measurements are sampled at the sampling rate m_0.
(2) Feature extraction. The features of the partial measurements are extracted.
(3) Optimal bit-depth prediction. The optimal bit-depth b* is predicted by the optimal bit-depth model.
(4) Entropy calculation. The partial measurements are quantized with bit-depth b*, and then the information entropy H_0 is calculated.
(5) Optimal sampling rate prediction. The optimal sampling rate is estimated by Formula (25).
(6) Sampling rate modification. The estimated sampling rate is corrected by the average codeword length boundary and the sampling rate boundary.
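The steps above can be sketched at a high level; all helper callables (`sample`, `predict_bit_depth`, `estimate_rate`, `correct_rate`) are hypothetical stand-ins for partial BCS sampling, the trained bit-depth model, Formula (25), and the boundary corrections:

```python
def assign_parameters(image, R_goal, m0, sample, predict_bit_depth,
                      estimate_rate, correct_rate):
    """Return (sampling rate m, bit-depth b*) for a target bit-rate R_goal."""
    y0 = sample(image, m0)                   # partial measurements at rate m0
    b_star, H0 = predict_bit_depth(y0)       # features -> optimal b*, entropy H0
    m = estimate_rate(R_goal, b_star, H0)    # optimal sampling rate, Formula (25)
    m = correct_rate(m, R_goal, b_star, H0)  # boundary corrections
    return max(m, m0), b_star                # never below the partial rate m0
```

The encoder then samples the remaining measurements at rate m and quantizes them with bit-depth b*.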

Computational Complexity Analysis
The extra calculation of the sampling rate and quantization bit-depth optimization comes from three processes, namely feature extraction, the optimal bit-depth prediction, and the sampling rate estimation.
In feature extraction, the extra computation mainly comes from σ_0², the mean ȳ_0, f_max(y_0), f_min(y_0), µ_BD, σ_BD, and H_0,bit=4 of the measurements with sampling rate m_0. The number of measurements is m_0 × N². We assume that one subtraction costs the same as one addition. The calculation of ȳ_0 needs m_0 × N² − 1 additions and one multiplication. The calculation of σ_0² needs m_0 × N² × 2 − 1 additions and m_0 × N² + 1 multiplications. f_max(y_0) and f_min(y_0) need (m_0 × N² − 1) × 2 judgment operations. The calculation of block errors needs m_0 × ( multiplications. The extra computation of H_0,bit=4 comes from quantization with bit-depth 4, which requires m_0 × N² multiplications; the remaining computation mainly comes from counting the symbol occurrences and calculating the entropy. Counting the symbols requires m_0 × N² judgments and m_0 × N² additions. As the maximum number of symbols is 2^(4+1) + 1 = 33, the entropy calculation needs at most 66 multiplications, 33 logarithms, and 33 additions.

In the optimal bit-depth prediction, the computation mainly comes from the neural network model, which has seven neurons in the input layer, four in the first hidden layer, three in the second hidden layer, and two in the output layer. When the activation functions are not considered, the network requires 7 × 4 + 4 × 3 + 3 × 2 = 46 multiplications and 6 × 4 + 4 + 3 × 3 + 3 + 2 × 2 + 2 = 46 additions. In the sampling rate estimation, the computation mainly comes from evaluating (25) and (26).
Compared with the computation of the CS measurements, a fixed number of operations can be ignored. The extra computation includes m_0 × N² × 8 additions, m_0 × N² × 3 multiplications, and m_0 × N² × 3 judgments. Assuming that one judgment operation costs two additions, the total additional computation amounts to m_0 × N² × 14 additions and m_0 × N² × 3 multiplications.
The computation of all CS measurements requires m × N² × B² multiplications and m × N² × (B² − 1) additions, where B is the size of the image block, which is at least 16. When B = 16, the optimization process adds 3/B² × (m_0/m) ≤ 3/256 ≈ 1.17% multiplications and 14/(B² − 1) × (m_0/m) ≤ 14/255 ≈ 5.49% additions. In computer arithmetic, an addition costs at least ten times less than a multiplication, so the computation of the rate-distortion optimization will not exceed 2% of that of the partial measurements. Furthermore, as the image block size or the sampling rate m increases, the percentage of computation spent on the optimization process decreases further.
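These worst-case overhead ratios can be checked by direct arithmetic; the sketch below simply reproduces them under the stated assumptions (B = 16, one judgment counted as two additions):

```python
B = 16            # image block size
extra_mults = 3   # extra multiplications per measurement (optimization)
extra_adds = 14   # 8 additions + 3 judgments * 2 additions each

# Relative to one measurement's cost: B^2 mults and B^2 - 1 adds.
mult_overhead = extra_mults / B ** 2
add_overhead = extra_adds / (B ** 2 - 1)
print(f"{mult_overhead:.2%} multiplications, {add_overhead:.2%} additions")
```

Running this prints 1.17% multiplications and 5.49% additions, matching the bounds stated above.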

Experimental Results
The proposed method is tested for the DPCM-plus-SQ framework and the uniform SQ framework, respectively. The model parameters are obtained by offline training on images from the BSDS500 database [38]. Eight test images (shown in Figure 3) and the BSD68 dataset [39] are used in our simulations. We take 100 images randomly selected from the BSDS500 database as the training set and the BSD68 dataset (68 images) as the test set. Since the sizes of the images vary, each image was cropped to 256 × 256 from the center. All numerical experiments are performed in MATLAB (R2018b) on a Windows 10 (64-bit) platform with an Intel Core i5-8300H 2.30 GHz processor and 16 GB of RAM.

Model Parameters Estimation
To obtain the parameters of the proposed bit-rate model and optimal bit-depth model, we take 100 images from the BSDS500 database [38] to collect training samples. The training data are collected by traversing bit-depths and sampling rates: the bit-depths are {3, 4, . . . , 10}, and the sampling rates comprise 37 values in {0.04, 0.05, . . . , 0.4} and 7 values in {3/256, 4/256, . . . , 9/256}. If the average codeword length after entropy encoding is greater than the quantization bit-depth, we set the average codeword length equal to the quantization bit-depth. Each image thus yields 352 samples of the average codeword length and PSNR. The image block size adopts the optimal size of the corresponding quantization method: the DPCM quantization framework uses 16 × 16 blocks and uniform quantization uses 32 × 32 blocks. The orthogonal random Gaussian matrix is used for BCS sampling in this work. The entropy encoder adopts arithmetic coding [40]. At the decoder, the SPL-DWT algorithm [41] is used for image reconstruction. We take the first partial sampling rate m_0 = 0.05.
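As a sanity check on the sample count, the traversal grid above can be enumerated as follows (a sketch, not the actual collection code):

```python
bit_depths = list(range(3, 11))                         # {3, 4, ..., 10}
rates = [round(0.04 + 0.01 * i, 2) for i in range(37)]  # 0.04, 0.05, ..., 0.40
rates += [k / 256 for k in range(3, 10)]                # 3/256, ..., 9/256
samples_per_image = len(bit_depths) * len(rates)
print(samples_per_image)  # 8 bit-depths x 44 sampling rates = 352
```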
We use the least-squares method to fit model (15). Table 1 shows the trained parameters for the DPCM-plus-SQ framework and the uniform SQ framework. To quantify the fitting accuracy, we calculate the mean square error (MSE) and the Pearson correlation coefficient (PCC) [42] between the actual and predicted values; the closer the PCC is to 1 and the MSE is to 0, the better the fit. For the DPCM-plus-SQ framework, the MSE and PCC are 0.022 and 0.995, respectively; for the uniform SQ framework, they are 0.027 and 0.996. These results show that model (15) describes well the relationship between the average codeword length L, the sampling rate, and the bit-depth.

The optimal bit-depth model depends on the parameters estimated by the proposed neural network. The samples of the model parameters are obtained by solving problem (22) and are then used to train the network. Because the network parameters are initialized randomly, different trained networks have different prediction performance; the best of several trained networks is chosen to estimate the parameters of the optimal bit-depth model. Table 2 shows the prediction performance of the optimal bit-depth model on the training set and the test set. As shown in Table 2, for the DPCM-plus-SQ framework, the accuracies of predicting the optimal bit-depth are 80.7% on the training set (BSDS500) and 70.7% on the test set (BSD68); for the uniform SQ framework, they are 76.5% and 70.4%, respectively. In the training set, the differences between the optimal bit-depth and the predicted bit-depth are no more than one bit.
In the test set, 99.7% of the samples have a difference of no more than one bit between the optimal bit-depth and the predicted bit-depth. In most cases, the influence of a 1-bit error on the PSNR is limited, so it is effective to use a neural network to learn the parameters of the optimal bit-depth model.
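The two fit metrics (MSE and PCC) are standard; a minimal sketch of how they can be computed with NumPy, where the numeric values in the example are illustrative rather than the paper's data:

```python
import numpy as np

def fit_metrics(actual, predicted):
    """Return (MSE, PCC) between actual values and model predictions."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    mse = float(np.mean((actual - predicted) ** 2))
    pcc = float(np.corrcoef(actual, predicted)[0, 1])
    return mse, pcc
```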
In the training set, we take the maximum and minimum sampling rates corresponding to the given bit-rates as the training samples of model (27), and the parameters are obtained offline by the least-squares method. Through experiments, we found that the optimized sampling rates beyond the boundary mainly occur near the low bit-rates of 0.1-0.3 bpp, where the corresponding optimal bit-depths are mostly 4 or 5 bit. We therefore impose boundary constraints on the sampling rates only when the optimal bit-depth is 4 or 5. For the DPCM-plus-SQ framework with bit-depth 4, we obtain a_7 = 4.9164 × 10^−1 and a_8 = −7.1258 × 10^−3; with bit-depth 5, a_7 = 3.4874 × 10^−1 and a_8 = −6.1371 × 10^−3. For the uniform SQ framework with bit-depth 4, we obtain a_7 = 3.3181 × 10^−1 and a_8 = −1.3050 × 10^−3; with bit-depth 5, a_7 = 2.3433 × 10^−1 and a_8 = 2.3347 × 10^−3.

Rate-Distortion Optimization Performance
To verify the accuracy of the bit-rate model, we test the BSD68 dataset and the eight images in Figure 3. We first use the proposed algorithm to assign the sampling rate and bit-depth for target bit-rates of 0.1, . . . , 1 bit per pixel (bpp), and then calculate the actual bit-rates and the PSNRs of the images reconstructed with the estimated sampling rate and bit-depth. Tables 3 and 4 show the optimized bit-rates of BSD68 for the uniform SQ framework and the DPCM-plus-SQ framework, respectively. The absolute error percentage denotes the percentage of the absolute error in the target bit-rate, where the absolute error is the absolute value of the difference between the target and actual bit-rates. As shown in Table 3, the average absolute error percentages of BSD68 are between 1.65% and 3.23%, which indicates that the proposed bit-rate model is effective for uniform SQ. As shown in Table 4, the average absolute error percentages of BSD68 are between 2.09% and 3.17%, which indicates that the model is also effective for DPCM-plus-SQ. Tables 5 and 6 show the actual bit-rates of the eight test images for uniform SQ and DPCM-plus-SQ, respectively; the actual bit-rates are very close to the target bit-rates.

To test the validity of the optimal bit-depth model, we compare the predicted optimal bit-depth with the best bit-depth found by traversing different bit-depths and sampling rates, as shown in Tables 7 and 8. In these tables, the optimal percentage is the percentage of images whose predicted bit-depth coincides with the actual best bit-depth, and the one-bit error percentage is the percentage of images with a one-bit error between the predicted and the actual best bit-depth. We encode and decode the images with the predicted parameters (sampling rate and bit-depth) and calculate the bit-rate and PSNR.
The PSNR error is the PSNR minus the maximum PSNR, where the PSNRs at a given bit-rate are obtained by nearest-neighbor interpolation.
In Table 7, the sum of the optimal and one-bit error bit-depth percentages is between 98.53% and 100% for the SQ framework: it is 100% for target bit-rates of 0.1~0.8 bpp and 98.53% for 0.9 and 1 bpp. Since the PSNR difference between bit-depths is small at high target bit-rates, some bit-depth estimation errors occur there. Although only 54.41% to 91.18% of the predicted bit-depths coincide with the optimal bit-depth, the average PSNR errors are between 0.013 dB and 0.04 dB, which shows that the error of the predicted bit-depth has little influence on the reconstruction performance.
In Table 8, the sum of the optimal and one-bit error bit-depth percentages is between 98.53% and 100% for the DPCM-plus-SQ framework: it is 100% for target bit-rates of 0.1~0.7 bpp and 98.53% for 0.8~1 bpp. Although only 55.88% to 82.35% of the predicted bit-depths coincide with the optimal bit-depth, the average PSNR errors are between 0.029 dB and 0.07 dB, which shows that the error of the predicted bit-depth has little influence on the reconstruction performance.
To demonstrate the performance of the proposed method in detail, we give the optimized rate-distortion curves of the eight test images, as shown in Figures 9 and 10. We first encode each image at the given bit-rates with the optimized sampling rates and bit-depths, then calculate the PSNRs of the reconstructed images to obtain the rate-distortion curve. All bit-rates and PSNRs obtained by traversing different sampling rates and bit-depths are also shown in Figures 9 and 10. As far as we know, the optimization of sampling rate and bit-depth in CS-based coding systems has mainly focused on the uniform SQ framework, so we compare the proposed method with the latest method [21] for the uniform SQ framework, as shown in Figure 9.
Figure 9 shows the proposed algorithm's rate-distortion curves of the eight test images encoded by the CS-based coding system with uniform SQ. The rate-distortion curve of the proposed algorithm is very close to the optimal one; the PSNRs of the proposed algorithm are slightly worse than the optimal PSNRs only at a few bit-rates. At 0.3 bpp for Monarch, the predicted optimal bit-depth is 5 bit while the actual optimal bit-depth is 4 bit, and the PSNR of the proposed algorithm is 0.52 dB below the optimum. At 0.3 and 0.8 bpp for Parrots, the predicted optimal bit-depths are 4 and 6 bit while the actual optimal bit-depth is 5 bit, and the PSNRs are about 0.25 and 0.3 dB below the optimum. At 0.3 and 0.9 bpp for Cameraman, the predicted optimal bit-depths are 5 and 6 bit while the actual optimal bit-depths are 4 and 5 bit, and the PSNRs are about 0.48 and 0.12 dB below the optimum. At 0.4 bpp for Foreman, the predicted optimal bit-depth is 5 bit while the actual optimal bit-depth is 6 bit, and the PSNR is 0.41 dB below the optimum. These deviations are mainly caused by the optimal bit-depth model, and the maximum deviation is 1 bit.
The proposed algorithm's rate-distortion curves are very close to the results of [21] on Barbara, Boats, House, and Lena, and are better than [21] on Monarch, Parrots, Cameraman, and Foreman. As can be seen from Figure 9a, the optimal bit-depth is 7 at bit-rates of 0.7 and 0.8 bpp, and the proposed algorithm predicts it accurately, whereas the bit-depth predicted by [21] is 6, one bit less than the optimum. In Figure 9b, the optimal bit-depth is 4 at 0.2 bpp and 5 at 0.6 and 0.7 bpp; compared with [21], the bit-depths predicted by the proposed algorithm are more accurate. Similar situations occur in Figure 9e,f.

Figure 10 shows the proposed algorithm's rate-distortion curves of the eight test images encoded by the CS-based coding system with DPCM-plus-SQ. The rate-distortion curve of the proposed algorithm is very close to the optimal one; the PSNRs are slightly worse than the optimal PSNRs only at a few bit-rates. At 0.5 bpp for Parrots, the predicted optimal bit-depth is 6 bit while the actual optimal bit-depth is 5 bit, and the PSNR of the proposed algorithm is 0.25 dB below the optimum. At 0.2 and 0.8 bpp for Boats, the predicted optimal bit-depths are 4 and 5 bit while the actual optimal bit-depths are 5 and 6 bit.
The PSNRs of the proposed algorithm are about 0.5 dB and 0.4 dB less than the optimal PSNRs at bit-rates of 0.2 and 0.8 bpp. At 0.7 bpp for Cameraman, the predicted optimal bit-depth is 6 bit while the actual optimal bit-depth is 5 bit, and the PSNR is 0.45 dB below the optimum. At 0.8 bpp for Foreman, the predicted optimal bit-depth is 6 bit while the actual optimal bit-depth is 7 bit, and the PSNR is about 0.29 dB below the optimum.
As Figures 9 and 10 show, the prediction deviation of the optimal bit-depth is at most 1 bit; it mainly occurs at the junction between two optimal bit-depths and has little effect on the PSNR. The rate-distortion curves of the proposed algorithm almost coincide with the optimal curves for both the DPCM-plus-SQ framework and the SQ framework. Although the rate-distortion performance of the proposed algorithm is not optimal at some bit-rates, the gap is small.

Conclusions
The CS-based coding system needs to assign the sampling rate and quantization bit-depth for a given bit-rate before encoding an image. In this work, we first propose a bit-rate model and an optimal bit-depth model for the CS-based coding system. Both models have simple mathematical forms, and their parameters are effectively estimated from offline training data. We then propose a general rate-distortion optimization method that assigns the sampling rate and quantization bit-depth based on these two models. The proposed method only needs to extract a few features from a small number of measurements, so its computational cost is low. Compared with the first sampling computation of the CS measurements (with 16 × 16 blocks), the additions and multiplications of the optimization process are about 5.49% and 1.17% of those of the sampling process, respectively, and these percentages decrease as the block size increases. The disadvantage of the proposed method is that a large amount of offline data must be collected to train the model parameters, which is usually acceptable. We test the uniform SQ framework and the DPCM-plus-SQ framework, respectively. Experimental results show that the optimized rate-distortion performance and bit-rate of the proposed algorithm are very close to the optimal rate-distortion performance and the target bit-rate.