A Convolutional Neural Network-Based Quantization Method for Block Compressed Sensing of Images

Block compressed sensing (BCS) is a promising method for resource-constrained image/video coding applications. However, the quantization of BCS measurements has posed a challenge, leading to significant quantization errors and encoding redundancy. In this paper, we propose a quantization method for BCS measurements using convolutional neural networks (CNN). The quantization process maps measurements to quantized data that follow a uniform distribution based on the measurements’ distribution, which aims to maximize the amount of information carried by the quantized data. The dequantization process restores the quantized data to data that conform to the measurements’ distribution. The restored data are then modified by the correlation information of the measurements drawn from the quantized data, with the goal of minimizing the quantization errors. The proposed method uses CNNs to construct quantization and dequantization processes, and the networks are trained jointly. The distribution parameters of each block are used as side information, which is quantized with 1 bit by the same method. Extensive experiments on four public datasets showed that, compared with uniform quantization and entropy coding, the proposed method can improve the PSNR by an average of 0.48 dB without using entropy coding when the compression bit rate is 0.1 bpp.

Research on quantized compressed sensing (QCS) primarily focuses on optimizing the encoder or decoder to enhance the quality of signal reconstruction for the commonly used quantization methods. Advanced compression techniques for block-based compressed sensing (BCS) have been a focal point of recent research. One such advancement is the application of differential pulse code modulation (DPCM) to quantized block-based compressed sensing of images. This strategy uses DPCM to exploit spatial correlations within image blocks and minimize quantization errors, improving bit-rate efficiency without sacrificing reconstruction quality [9]. However, the quantized measurements must undergo entropy coding to attain ideal performance. Ref. [10] proposes a progressive quantization method, which essentially improves the encoding and decoding strategies while still utilizing uniform scalar quantization. Reconstruction algorithms [11][12][13][14] have been a primary focus in CS for optimizing the decoder.
In order to improve the quantization performance, vector quantization has been used to quantize the CS measurements [15,16]. Subsequently, to further improve the efficiency of the vector quantization technique, ref. [17] has leveraged deep neural networks. Due to the high computational complexity of vector quantization, scalar quantization is more suitable for CS measurements. In data compression theory, the scalar quantizer that performs entropy coding on quantized output data is usually called an entropy-constrained quantizer or entropy-coded quantizer [18]. When quantization error is used as the distortion measure, the uniform scalar quantizer is the optimal entropy-coded quantizer in rate-distortion performance [19,20]. In other words, for the BCS measurements, the rate-distortion performance of the uniform or non-uniform scalar quantization methods alone will be inferior to the joint performance of uniform quantization and entropy coding. Therefore, in current research on CS for images, the CS measurements are usually quantized using the uniform scalar quantization method [21], and the quantized measurements are encoded by entropy coding to improve the compression performance [22,23]. However, since the computational cost involved in entropy coding is usually high [23,24], using entropy coding will reduce the low-complexity advantage of the CS encoder.
Entropy 2024, 26, 468
There are two main ways to improve rate-distortion performance. The first way is to reduce the bitrate while keeping the distortion constant, while the second way is to reduce the distortion while keeping the bitrate constant. Using entropy coding on quantized data is the first way. Without considering entropy coding, the second way is the only method to improve rate-distortion performance. Moreover, there are strategies aimed at enhancing the compression efficiency of CS. For example, ref. [25] introduces a novel application of Zadoff-Chu sequences, renowned for their excellent autocorrelation properties. By utilizing this sequence in the measurement matrix, sparsity in the compressive domain is enhanced, leading to improved recovery accuracy for quantized CS data. Ref. [26] explores the use of Discrete Fourier Transform (DFT) for measurement, enabling parallel processing capabilities and enhancing the efficiency of block compressive sensing. This strategy capitalizes on the computational advantages of FFT to accelerate the CS process while maintaining reconstruction fidelity. While many approaches can improve compression efficiency at the encoding stage [25][26][27][28], they are not necessarily applicable or effective for QCS.
While the field of block compressed sensing (BCS) has witnessed significant advancements in recent years, traditional quantization techniques employed in image/video coding applications continue to face pivotal challenges. Specifically, uniform quantization, a common choice due to its simplicity and compatibility with entropy coding, often incurs substantial quantization errors, compromising the fidelity of the reconstructed images. Moreover, it tends to overlook the inherent structure and correlations within the data, leading to encoding redundancy and inefficiency. On the other hand, Lloyd-Max quantization, although theoretically optimal for minimizing mean squared error, necessitates computationally intensive offline training and struggles to adapt dynamically to varying image characteristics. Furthermore, the reliance on entropy coding as a supplementary step to mitigate the loss from quantization adds to the computational burden and complexity of the encoding process.
In light of these challenges, our work introduces a novel convolutional neural network (CNN)-based quantization method specifically designed to overcome the drawbacks of traditional approaches. By leveraging the power of deep learning, our method transcends the uniformity imposed by classic quantizers, achieving a more nuanced mapping of measurements that closely follows their underlying distribution. This adaptive quantization, coupled with a sophisticated dequantization mechanism that harnesses correlation from quantized data, significantly reduces quantization errors without resorting to additional entropy coding.
In this paper, we propose a quantization method for BCS measurements that can reduce distortion while maintaining the bitrate. At the encoding end, the proposed method models the measurements' distribution of each image block. Subsequently, it quantizes the measurements to data that conform to a uniform distribution based on the distribution model. The proposed method uses the distribution parameters of each image block as side information of the encoder and adopts the same strategy to quantize the side information with 1 bit. At the decoding end, the proposed method first restores the quantized data to data that conform to the distribution of the original measurements, then extracts the correlation information of the measurements from the quantized data of adjacent blocks to correct the measurements. Before the dequantization of the measurements, the same strategy is used to dequantize the side information. All quantization and dequantization processes are implemented as convolutional neural networks (CNN), and all networks are jointly learned offline. The CS coding structure based on the proposed method is shown in Figure 1.
The main contributions of this article are as follows:

• A quantization method of BCS measurement based on CNN is proposed. The proposed method constructs and jointly trains the CNN of the quantization and dequantization processes, which aims to maximize the coding output entropy and minimize the quantization error.
• A quantization process based on measurements' distribution is proposed. Based on the properties of the cumulative distribution function (CDF), a neural network model was constructed to map measurements to the quantized data following a uniform distribution, which maximizes the amount of information carried in the quantized data. An activation function with a constrained output range was designed to reduce the computational complexity of the network's activation function.
• A dequantization process based on the neighborhood information of measurements is proposed. The inverse process of quantization is used as a module to restore the quantized data to data that conform to the distribution of the original measurements. Furthermore, an information correction module is introduced to extract correlation information from multiple quantized values for correcting the measurements. The two modules are used to improve the quality of the dequantized measurements through residual connection.
• The distribution parameters of the block measurements are used as side information, which is quantized with 1 bit by the same quantization process.
While conventional approaches such as uniform quantization and Lloyd-Max quantization with entropy coding have been widely employed, they often introduce significant quantization errors and encoding inefficiencies. Our work diverges from these methodologies by introducing a CNN-based quantization strategy that not only maps measurements to a uniformly distributed quantized space to maximize the information content but also incorporates a novel dequantization process that leverages correlations from the quantized data to minimize reconstruction errors. This innovative method bypasses the need for entropy coding, offering a more efficient and adaptive solution for BCS applications.
In comparison to uniform quantization, which assigns equal intervals to the entire dynamic range, and Lloyd-Max quantization, known for minimizing the mean squared error but requiring complex optimization, our CNN-based approach dynamically adapts to the underlying data distribution. Unlike entropy coding methods, which reduce redundancy at the expense of computational complexity, our method directly optimizes the quantization process through learning, achieving superior performance without additional encoding steps.
The core of our method lies in the design of the CNN architecture, which jointly learns the quantization and dequantization processes. This contrasts with traditional quantization techniques that rely on predetermined, static decision boundaries. Our network specifically tailors the quantization levels based on the input data's statistical properties, ensuring a closer match to the original signal characteristics. Additionally, the utilization of correlation information from the quantized data for post-processing further distinguishes our method, leading to reduced quantization artifacts.
The rest of this paper is organized as follows. Section 2 introduces the proposed method, which mainly includes the BCS quantization process, parameter estimation, parameter quantization and dequantization, and the BCS dequantization process. Section 3 presents the experimental results. The conclusion is given in Section 4.

Proposed Method
In this paper, we aim to improve the amount of information of quantized data and the information extraction ability of the dequantization process.The proposed method mainly includes the following aspects: (1) BCS quantization process, (2) parameter estimation, (3) parameter quantization and dequantization, and (4) BCS dequantization process.

BCS Quantization Process
Ref. [29] demonstrates that a random variable's CDF can transform its data distribution into a uniform distribution. Random variables are usually modeled by probability density functions (PDF) to describe their distributions. It is difficult to obtain a closed-form expression for the CDF due to the need for integral calculations involving the PDF. Due to the strong function approximation ability of feedforward neural networks, we propose utilizing a feedforward neural network to model the CDF of the CS measurements.
Assuming that $P(y)$ and $F(y)$ are the PDF and CDF of the measurement variable $y$, respectively, $P(y)$ and $F(y)$ need to satisfy the following conditions:
$$P(y) \ge 0, \qquad F(y) = \int_{-\infty}^{y} P(t)\,dt, \qquad 0 \le F(y) \le 1. \tag{1}$$
Compared with a three-layer feedforward neural network, a four-layer feedforward neural network can accurately establish the relationship between input and target variables with fewer hidden neurons [30]. When employing a four-layer feedforward neural network to build the CDF of the measurements, it can be represented as:
$$F(y) = g_3\big(W_3\, g_2\big(W_2\, g_1(W_1 y + b_1) + b_2\big) + b_3\big), \tag{2}$$
where $g_1$, $g_2$, $g_3$ represent the activation functions, and $W_1 \in \mathbb{R}^{L_1 \times 1}$, $b_1 \in \mathbb{R}^{L_1}$, $W_2 \in \mathbb{R}^{L_2 \times L_1}$, $b_2 \in \mathbb{R}^{L_2}$, $W_3 \in \mathbb{R}^{1 \times L_2}$, $b_3 \in \mathbb{R}$ are the weights and biases, with $L_1$ and $L_2$ the numbers of neurons in the two hidden layers.
Based on Equation (1), it is necessary to ensure that the output values of $F(y)$ fall within [0, 1]. In model (2), the activation function $g_3$ determines the output range of $F(y)$. Among the commonly used activation functions, the sigmoid function can ensure that the output values fall within [0, 1]. However, the sigmoid is a computationally costly nonlinear activation function. We propose a rectified linear activation function with a limited output range, which can be expressed as:
$$G(x) = \begin{cases} 0, & x < 0 \\ x/\alpha, & 0 \le x \le \alpha \\ 1, & x > \alpha \end{cases} \tag{3}$$
where $\alpha$ is a finite constant greater than 0. The curve graph and gradient curve of $G(x)$ are shown in Figure 2.
To improve the adaptiveness of the activation function, we take $\alpha$ as a learnable parameter of the CDF model. Because activation functions usually do not come with trainable parameters, we transform Equation (3) as:
$$g_3(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases} \tag{4}$$
By comparing Equations (3) and (4), we obtain $G(x) = g_3(x/\alpha)$. Let $\beta = 1/\alpha$, and then the neural network model of CDF can be expressed as:
$$F(y) = g_3\big(\beta\,\big(W_3\, g_2\big(W_2\, g_1(W_1 y + b_1) + b_2\big) + b_3\big)\big), \tag{5}$$
where $W_k$ and $b_k$ denote the weights and biases of the $k$-th layer, $\beta \in \mathbb{R}$, and $g_1$, $g_2$ are the LeakyReLU activation function [31].
In BCS, all measurements are usually stored in the form of a matrix, which can be expressed as:
$$Y = [y_1, y_2, \ldots, y_{N_B}] = \Phi\,[x_1, x_2, \ldots, x_{N_B}], \tag{6}$$
where $Y \in \mathbb{R}^{M \times N_B}$ represents the measurements' matrix, $\Phi$ represents the measurement matrix, $M$ represents the number of measurements for each image block, $N_B$ represents the number of image blocks, $x_i$ represents the pixel value vector of the i-th block, and $y_i$ represents the measurements' vector of the i-th block.
When the input becomes a matrix, we can convert the feedforward neural network to a CNN. The number of convolutional layers in the CNN equals the number of parameter layers in the feedforward neural network. The number of channels in the convolutional network is equal to the number of neurons in the hidden layer. Since the feedforward neural network model has a single input and output, the kernel size of the convolutional network is 1 × 1. Considering both quality and complexity, we set the output feature of the intermediate convolutional layer to six channels. The network structure diagram of CNN based on model (5) is shown in Figure 3.
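The equivalence between the scalar network and a stack of 1 × 1 convolutions can be illustrated with a small NumPy sketch that evaluates the same three-layer mapping at every entry of a measurements' matrix. Several things here are assumptions rather than details from the paper: the weights are random and untrained, the LeakyReLU slope is 0.01, the bounded activation is taken to be a clip to [0, 1] applied after scaling by `beta`, and the names `cdf_net`, `leaky_relu`, `g3`, and `params` are hypothetical. The output is therefore only guaranteed to lie in [0, 1]; it is not a fitted CDF.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # LeakyReLU; the slope value is an assumption.
    return np.where(x >= 0, x, slope * x)

def g3(x):
    # Bounded rectified linear activation: 0 below 0, identity on [0, 1], 1 above 1.
    return np.clip(x, 0.0, 1.0)

def cdf_net(Y, params, beta):
    """Apply one scalar network to every entry of Y (same effect as 1x1 convolutions)."""
    W1, b1, W2, b2, W3, b3 = params
    h = Y[..., None]                    # (M, N_B, 1): one input channel per entry
    h = leaky_relu(h @ W1.T + b1)       # (M, N_B, L1)
    h = leaky_relu(h @ W2.T + b2)       # (M, N_B, L2)
    out = h @ W3.T + b3                 # (M, N_B, 1)
    return g3(beta * out[..., 0])       # (M, N_B), values in [0, 1]

rng = np.random.default_rng(0)
L1 = L2 = 6                             # six channels in the hidden layers
params = (rng.normal(size=(L1, 1)), rng.normal(size=L1),
          rng.normal(size=(L2, L1)), rng.normal(size=L2),
          rng.normal(size=(1, L2)), rng.normal(size=1))
Y = rng.normal(size=(4, 5))             # toy measurements' matrix, M=4, N_B=5
U = cdf_net(Y, params, beta=0.5)
```

Because every entry is processed independently with shared weights, the parameter count does not depend on the size of the measurements' matrix.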
In Figure 3, "Conv" denotes convolutions, the numbers above "Conv" indicate the kernel size, "LeakyReLU" and "$g_3$" represent the activation functions used in the current convolutional layer, and the rectangular boxes represent the output feature maps of the convolutional layer. The numbers below the rectangular boxes indicate the number of channels. To balance the computational cost and fitting performance of model (5), we adopt $L_1 = L_2 = 6$ in this paper.
Due to variations between image blocks, the distribution parameters and CDF of different block measurements also differ. However, model (5) does not account for the influence of distribution parameters, which limits its adaptability.
When using a Gaussian measurement matrix, the measurements of the same image block approximately follow a Gaussian distribution. Assuming that the measurement variable $y_j$ of the j-th image block follows a Gaussian distribution $N(y_j; \mu_j, \sigma_j)$, it can be transformed into a standard normal distribution $N(z; 0, 1)$ through Equation (7), which can be expressed as:
$$z = \frac{y_j - \mu_j}{\sigma_j}, \tag{7}$$
where $\mu_j$ and $\sigma_j$ represent the location parameter (mean) and scale parameter (standard deviation), respectively. For the measurements' matrix $Y$, Equation (7) can be expressed as:
$$Z = (Y - U)\; ./\; \Lambda, \tag{8}$$
where $U = [\mu_j]$ represents the matrix of position parameters, $\Lambda = [\sigma_j]$ represents the matrix of scale parameters, and $./$ represents the element-wise division of matrices.
After removing the impact of distribution parameters through Equation (8), the same CNN can transform the measurements of all blocks into a uniform distribution.
In order to reduce the computational burden of parameter extraction, the matrix $Y_0 \in \mathbb{R}^{M_0 \times N_B}$ of a small number of measurements is used for extracting the distribution parameters. The proposed BCS quantization process is shown in Figure 4. In Figure 4, "Repmat" represents repeat copies of an array, which is used to copy the parameters into the same size as the measurements' matrix. Because the measurements of an image block have the same position and scale parameters, the size of the position and scale parameters is $1 \times N_B$.
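The per-block standardization and the "Repmat" step can be sketched in NumPy as follows. This is only an illustration under stated assumptions: plain sample statistics of the first `M0` rows stand in for the CNN-based parameter estimators described later, and the function name `standardize` and the toy sizes are hypothetical.

```python
import numpy as np

def standardize(Y, mu, sigma):
    """Remove per-block location and scale: Z = (Y - U) ./ Lambda."""
    M = Y.shape[0]
    U = np.tile(mu, (M, 1))        # "Repmat": copy the 1 x N_B means to M x N_B
    Lam = np.tile(sigma, (M, 1))   # same for the scale parameters
    return (Y - U) / Lam           # element-wise division

rng = np.random.default_rng(1)
M, M0, N_B = 8, 4, 3               # toy sizes; M0 < M partial measurements
# Each column (block) gets its own mean and standard deviation.
Y = rng.normal(loc=[[2.0, -1.0, 0.5]], scale=[[1.0, 3.0, 0.2]], size=(M, N_B))
Y0 = Y[:M0]                                  # small subset used for estimation
mu = Y0.mean(axis=0, keepdims=True)          # shape (1, N_B)
sigma = Y0.std(axis=0, keepdims=True)        # shape (1, N_B)
Z = standardize(Y, mu, sigma)
```

NumPy broadcasting would make the explicit `np.tile` unnecessary; it is kept here only to mirror the "Repmat" block in Figure 4.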

Parameter Estimation
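In general, the expressions of the mean (position parameter) $\mu$ and variance (scale parameter) $\sigma^2$ are as follows:
$$\mu = \frac{1}{N}\sum_{i=1}^{N} y_i, \qquad \sigma^2 = \frac{1}{N}\sum_{i=1}^{N} (y_i - \mu)^2. \tag{9}$$
Since using partial measurements to estimate distribution parameters may introduce some errors, we use a neural network to estimate the distribution parameters. According to the definition of convolution, Equation (9) can be expressed as:
$$\mu = W_\mu * y, \qquad \sigma^2 = W_\mu * \big((y - \mu)\; .\!\times\; (y - \mu)\big), \tag{10}$$
where $y = [y_1, \ldots, y_N]$ collects the measurements, $*$ denotes the convolution operation, $.\times$ represents the element-wise multiplication of matrices, and $W_\mu = \left[\frac{1}{N}, \ldots, \frac{1}{N}\right]$ denotes the convolution kernel used for computing the mean.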
Equation (10) shows that the mean and variance can be estimated using convolutions. Based on Equation (10), we construct CNNs to estimate the position and scale parameters, as shown in Figures 5 and 6, respectively. Since the mean and standard deviation functions are relatively simple in form, three channels are used for the middle convolution layer.
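The claim that the mean and variance reduce to convolutions with a uniform kernel can be checked numerically. The sketch below uses `np.convolve` in "valid" mode with a kernel as long as the input, so the result is a single value; the function name `mean_var_by_conv` is hypothetical, and the learned three-channel CNNs of Figures 5 and 6 are not reproduced.

```python
import numpy as np

def mean_var_by_conv(y):
    """Mean and variance of a 1-D vector via convolution with a uniform kernel."""
    N = y.size
    w_mu = np.full(N, 1.0 / N)                       # kernel [1/N, ..., 1/N]
    # Equal-length inputs in "valid" mode yield exactly one output sample.
    mu = np.convolve(y, w_mu, mode="valid")[0]
    centered_sq = (y - mu) * (y - mu)                # element-wise multiplication
    var = np.convolve(centered_sq, w_mu, mode="valid")[0]
    return mu, var

y = np.array([1.0, 2.0, 3.0, 4.0])
mu, var = mean_var_by_conv(y)
```

Since the kernel is uniform, the flip that a true convolution applies to the kernel has no effect on the result.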

BCS Dequantization Process
The information of the quantized data is not only present in a single quantized value but also exists among multiple quantized values. Typically, the dequantization process at the decoding end is the inverse of the quantization process at the encoding end. However, the proposed quantization process and its inverse process only operate on individual quantized values. If dequantization only utilizes the inverse process of the proposed quantization process, it cannot use the correlated information among the multiple quantized values. Therefore, we propose adding a measurements' information correction module to extract the measurement correction from the multiple neighboring quantized values. The dequantization process is shown in Figure 7.
The quantization process is built based on the CDF, and its inverse process can be constructed with the inverse function of the CDF. We adopt the same network architecture for the inverse process module. Since the similarity between adjacent image blocks is significant, we use convolution kernels of 1 × 3 to extract the compensation information from the neighborhood measurements. In order to improve the information extraction, five convolution layers are used in the measurement compensation information module and the inverse process of the proposed quantization. The network structure of the inverse process of the proposed quantization is shown in Figure 8. The network structure of the measurement compensation information module is illustrated in Figure 9.
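The residual combination of the two modules can be sketched as follows. This is a deliberately reduced illustration, not the five-layer networks of Figures 8 and 9: the correction module is a single linear 1 × 3 filter with zero padding along the block axis, the inverse module is passed in as an arbitrary function, and the names `conv1x3` and `dequantize` are hypothetical.

```python
import numpy as np

def conv1x3(X, kernel):
    """Zero-padded 1x3 filter along the block axis (columns) of X.

    Implemented as cross-correlation, as is usual for CNN layers.
    """
    Xp = np.pad(X, ((0, 0), (1, 1)))                 # pad one column on each side
    return (kernel[0] * Xp[:, :-2]                   # left neighbor block
            + kernel[1] * Xp[:, 1:-1]                # current block
            + kernel[2] * Xp[:, 2:])                 # right neighbor block

def dequantize(Q, inverse_fn, kernel):
    """Residual structure: element-wise inverse plus neighborhood correction."""
    restored = inverse_fn(Q)                         # acts on single values only
    correction = conv1x3(Q, kernel)                  # uses adjacent blocks
    return restored + correction

Q = np.arange(6.0).reshape(2, 3)                     # toy quantized data, 2 x 3
out = dequantize(Q, inverse_fn=lambda q: q, kernel=np.array([0.25, -0.5, 0.25]))
```

With an all-zero kernel the correction vanishes and the output is the plain inverse, which is why the correction branch can only add information on top of the element-wise restoration.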

Parameter Quantization and Dequantization
Due to the high similarity between adjacent image blocks, the distribution parameters exhibit some similarity. Therefore, we use the same quantization and dequantization processes to quantize and dequantize the distribution parameters. To reduce the extra bits of the side information, we use 1 bit to quantize the side information. The network structures of quantization and dequantization for the parameters are shown in Figures 10 and 11, respectively.

Local Normalization and Loss Function

Local Normalization of Measurements' Matrix
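The element in the i-th row and j-th column of the measurements' matrix can be expressed as:
$$Y_{i,j} = \Phi_i x_j = \sum_{k} \Phi_{i,k}\, x_{k,j}, \tag{11}$$
where $\Phi_i$ represents the i-th row of the measurement matrix $\Phi$. Since the grayscale values of the images are typically represented by 256 levels, the pixel values satisfy the following:
$$0 \le x_{k,j} \le 255. \tag{12}$$
Combining Equations (11) and (12), we have:
$$255 \sum_{k} \min(\Phi_{i,k}, 0) \le Y_{i,j} \le 255 \sum_{k} \max(\Phi_{i,k}, 0). \tag{13}$$
Based on Equation (13), we take these two bounds as the minimum and maximum values of $Y_{i,j}$, which can be expressed as:
$$Y_i^{\min} = 255 \sum_{k} \min(\Phi_{i,k}, 0), \qquad Y_i^{\max} = 255 \sum_{k} \max(\Phi_{i,k}, 0). \tag{14}$$
Based on Equation (14), each measurement can be normalized, which can be expressed as:
$$\tilde{Y}_{i,j} = \frac{Y_{i,j} - Y_i^{\min}}{Y_i^{\max} - Y_i^{\min}}. \tag{15}$$
According to Equation (14), the same row of $Y$ shares the same maximum and minimum values. Therefore, Equation (15) is referred to as the local normalization method for the measurements' matrix.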
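The row-wise bounds follow from pixel values lying in [0, 255]: each row of the measurement matrix attains its largest possible output when pixels are 255 wherever the entry is positive and 0 elsewhere, and its smallest output in the opposite case. A minimal NumPy sketch of this normalization, with hypothetical function names and toy sizes, is:

```python
import numpy as np

def row_bounds(Phi, peak=255.0):
    """Per-row lower and upper bounds of Phi @ x for pixel values in [0, peak]."""
    y_min = peak * np.minimum(Phi, 0.0).sum(axis=1, keepdims=True)  # negative entries
    y_max = peak * np.maximum(Phi, 0.0).sum(axis=1, keepdims=True)  # positive entries
    return y_min, y_max

def local_normalize(Y, Phi):
    y_min, y_max = row_bounds(Phi)
    return (Y - y_min) / (y_max - y_min)       # each row mapped into [0, 1]

def local_denormalize(Yn, Phi):
    y_min, y_max = row_bounds(Phi)
    return Yn * (y_max - y_min) + y_min

rng = np.random.default_rng(2)
Phi = rng.normal(size=(4, 16))                           # Gaussian measurement matrix
X = rng.integers(0, 256, size=(16, 5)).astype(float)     # five blocks of 16 pixels
Y = Phi @ X
Yn = local_normalize(Y, Phi)
```

The bounds depend only on `Phi`, so a decoder holding the same measurement matrix can recompute them without any transmitted side information.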
Since Equation (14) only requires computation based on the measurement matrix, the row normalization method does not require transmitting the maximum and minimum values. Equation (15) transforms all measurements into real numbers between 0 and 1, and the inputs and outputs of the CNNs in the quantization and dequantization processes are also real numbers between 0 and 1. Therefore, the quantization of the measurements' matrix (Equation (16)) applies $F_{q\text{-}net}$, the CNN in the quantization process, to the normalized measurements at a quantization bit-depth of b. At the decoding end, the output of $F_{dq\text{-}net}$, the CNN in the dequantization process, needs to be denormalized with the same row-wise maximum and minimum values (Equation (17)).
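The row-wise normalization and its inverse can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the helper names (`row_bounds`, `measure`, `normalize`, `denormalize`) and the toy matrices are hypothetical, and the quantization and dequantization networks that sit between the two steps are omitted.

```python
def row_bounds(phi_row, peak=255.0):
    """Smallest and largest measurement one row of Phi can produce for pixels in [0, peak]."""
    lo = peak * sum(min(w, 0.0) for w in phi_row)
    hi = peak * sum(max(w, 0.0) for w in phi_row)
    return lo, hi


def measure(Phi, X):
    """Y = Phi X, with one vectorized block per column of X."""
    n_blocks = len(X[0])
    return [[sum(w * X[k][j] for k, w in enumerate(row)) for j in range(n_blocks)]
            for row in Phi]


def normalize(Y, Phi):
    """Row-wise min-max normalization; the bounds come from Phi alone."""
    out = []
    for y_row, phi_row in zip(Y, Phi):
        lo, hi = row_bounds(phi_row)  # assumes the row is not all zeros (hi > lo)
        out.append([(y - lo) / (hi - lo) for y in y_row])
    return out


def denormalize(Y_norm, Phi):
    """Inverse of normalize, computable at the decoder without side information."""
    out = []
    for y_row, phi_row in zip(Y_norm, Phi):
        lo, hi = row_bounds(phi_row)
        out.append([y * (hi - lo) + lo for y in y_row])
    return out


# Toy example (hypothetical values): 2 measurements per block, 3 pixels per block, 2 blocks.
Phi = [[0.5, -0.25, 0.25],
       [-0.5, 0.5, 0.0]]
X = [[0.0, 255.0],
     [255.0, 0.0],
     [128.0, 64.0]]
Y = measure(Phi, X)
Y_norm = normalize(Y, Phi)
Y_back = denormalize(Y_norm, Phi)
```

Because `row_bounds` needs only the measurement matrix, the decoder can recompute the same bounds, which matches the statement that the maximum and minimum values need not be transmitted.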

Loss Function
The main objective of the proposed method is to maximize the amount of information retained in the quantized data while minimizing the quantization error. The average information of data is usually measured by information entropy. However, because the information entropy is computed from symbol statistics, it is not differentiable and cannot propagate the gradient. In this paper, we propose a continuous function to estimate the information entropy when training the network.
The entropy of the quantized measurements' matrix $\overset{\frown}{Y}$ is the Shannon entropy $-\sum_k P(s_k)\log_2 P(s_k)$ (Equation (18)), where $P(s_k)$ represents the probability of the symbol $s_k$ in $\overset{\frown}{Y}$; computing it requires counting the number of occurrences of symbol $s_k$.
We can count the numbers greater than or equal to $s_k$ in the measurements' matrix by using a step function $h(x)$ (Equation (19)), applied to the difference between each element and $s_k$ and summed over all elements (Equation (20)). The step function $h(x)$ is not differentiable at 0. To ensure differentiability, we use the sigmoid activation function to approximate $h(x)$ (Equation (21)); the approximation becomes exact as $\eta \to +\infty$ (Equation (22)). Here $0 < \varepsilon < 1$ is used to ensure that the function maps to 1 when $\overset{\frown}{Y}_{i,j} = s_k$, and we set $\varepsilon = 0.5$. The parameter $\eta$ adjusts the steepness of the sigmoid function. Based on practical experience, setting $\eta = 64$ makes the output values of the sigmoid function approach 0 or 1 as closely as possible.
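One way to write Equations (19) to (22) that agrees with the description above is given below. It is a hedged reconstruction: the names $N(s_k)$ and $\tilde{h}$, the placement of $\varepsilon$ inside the sigmoid, and the assumption that the symbols are integers spaced one apart are not confirmed by the text.

```latex
\begin{align}
h(x) &= \begin{cases} 1, & x \ge 0,\\ 0, & x < 0, \end{cases} \tag{19}\\
N(s_k) &= \sum_{i}\sum_{j} h\big(\overset{\frown}{Y}_{i,j} - s_k\big), \tag{20}\\
\tilde{h}(x) &= \frac{1}{1 + e^{-\eta\,(x + \varepsilon)}}, \tag{21}\\
h\big(\overset{\frown}{Y}_{i,j} - s_k\big) &= \lim_{\eta \to +\infty} \tilde{h}\big(\overset{\frown}{Y}_{i,j} - s_k\big). \tag{22}
\end{align}
```

With integer-valued differences, $x + \varepsilon$ is at least 0.5 whenever $x \ge 0$ and at most $-0.5$ whenever $x \le -1$, so for $\eta = 64$ the sigmoid argument has magnitude at least 32 and the output is within about $10^{-14}$ of 0 or 1.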
Based on Equation (22), the probability that an element of the measurements' matrix is greater than or equal to $s_k$ can be estimated by averaging the sigmoid outputs over all elements (Equation (23)). Assuming that $s_{k+1} > s_k$, the probability of the symbol $s_k$ is the difference between the estimates for $s_k$ and $s_{k+1}$ (Equation (24)). Substituting these probabilities into Equation (18) gives a differentiable estimate of the information entropy (Equation (25)).
To simultaneously minimize the quantization error and maximize the information entropy, the objective function is composed of two parts. The first part is the mean square error (MSE) between the CS measurements before and after quantization, and the second part is the information entropy of the quantized measurements. In the loss function (Equation (26)), $\lambda > 0$ is a parameter that controls the importance of the information entropy of the quantized data, and $\hat{Y}$ represents the dequantized measurements.
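The entropy estimate and the two-part objective can be illustrated numerically with the sketch below. It makes several assumptions that the text does not confirm: symbols are the integers 0 to 2^b - 1, the loss is written as MSE minus λ times the estimated entropy, and the function names are hypothetical. In actual training the same arithmetic would be expressed with differentiable tensor operations; plain Python is used here only to show the computation.

```python
import math


def soft_step(x, eta=64.0, eps=0.5):
    """Sigmoid stand-in for the step function, written to avoid overflow in exp."""
    z = eta * (x + eps)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # underflows harmlessly to 0.0 for very negative z
    return e / (1.0 + e)


def soft_entropy(symbols, bit_depth, eta=64.0, eps=0.5):
    """Entropy estimate (in bits) built from soft counts of values >= each symbol."""
    levels = 2 ** bit_depth
    n = len(symbols)
    # tail[k] estimates P(value >= k); one extra zero entry closes the last bin.
    tail = [sum(soft_step(v - k, eta, eps) for v in symbols) / n for k in range(levels)]
    tail.append(0.0)
    entropy = 0.0
    for k in range(levels):
        p = tail[k] - tail[k + 1]  # estimated probability of symbol k
        if p > 1e-12:
            entropy -= p * math.log2(p)
    return entropy


def loss(original, dequantized, symbols, bit_depth, lam=0.05):
    """Assumed form of the objective: MSE minus lam times the estimated entropy."""
    mse = sum((a - b) ** 2 for a, b in zip(original, dequantized)) / len(original)
    return mse - lam * soft_entropy(symbols, bit_depth)


uniform_symbols = [0, 1, 2, 3] * 4   # every 2-bit symbol equally likely
constant_symbols = [3] * 8           # a single symbol only
h_uniform = soft_entropy(uniform_symbols, bit_depth=2)
h_constant = soft_entropy(constant_symbols, bit_depth=2)
```

For the equally likely symbols the estimate is close to the bit-depth, and for the constant sequence it is close to zero, which is the behaviour the entropy term is meant to reward and penalize, respectively.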

Results
In this section, we present experimental results that validate the performance of our method. The proposed method is primarily implemented through CNNs, which requires the collection of a training dataset for network training. The training dataset comprises 200 training images taken from the BSDS500 dataset [32]. Each image was cropped into grayscale images of size 256 × 256 with a stride of 60 pixels. The block size used in BCS is 16 × 16. For each image, both the sample and the label in the training dataset are the matrix of BCS measurements. The matrix used to collect the measurements has a sampling rate of 0.8, so the trained network can be applied to any measurements' matrix with a sampling rate lower than 0.8. Each block typically requires at least ten measurements to reconstruct an image from the BCS measurements efficiently, so we take $M_0 = 10$ as the number of the partial measurements.
All CNNs were implemented using the PyTorch framework. We trained the CNNs of the quantization and dequantization processes together. The batch size was set to 32, and optimization was performed using the Adam algorithm, initialized with a learning rate of 0.001. After the initial training of 10,000 epochs, the learning rate was reduced by a factor of 10, and all networks were trained for an additional 20,000 epochs. The training process was conducted on a server powered by an Intel Xeon CPU, an NVIDIA RTX 2080Ti GPU with 11 GB of memory, and 128 GB of DDR4 RAM. The test images consisted of an APC, aerial, airplane, airport, building, moon surface, tank, and truck, as illustrated in Figure 12. Publicly available datasets such as Set5 [33] (5 images), Set11 [34] (11 images), Set14 [35] (14 images), and BSD68 [36] (68 images) were also employed. All experiments were conducted on an Intel Core i5-8300H CPU @ 2.30 GHz, and the proposed method's performance was measured using the PSNR of the reconstructed images.

Analysis of Measurement Reconstruction Results
In this section, we analyzed the number of reconstruction levels and the quantization errors of the quantization method.
Current studies on improving CS quantization methods typically focus on sparse signals, but these methods are not suitable for images, which have a large number of elements. To keep the encoder complexity low, the BCS encoder usually uses uniform quantization and entropy coding to process the BCS measurements. In addition, the uniform quantizer is considered the optimal quantizer for entropy-coded quantization in data compression theory [23,24], which is why BCS encoders tend to use uniform quantization and entropy coding. Currently, the most advanced quantization techniques for BCS of images are believed to be the prediction quantization method [9] and the progressive quantization method [10]. However, both essentially improve the coding strategy while still employing uniform quantization, so the same strategies could also be explored with other commonly used quantization methods and with the proposed method. To simplify the experimental process, this paper only compares the quantization techniques, using entropy-coded uniform quantization, µ-law quantization [37,38], and Lloyd-Max quantization [39,40] as benchmarks. The entropy-coded uniform quantization method refers to the use of entropy coding after performing uniform quantization on the measurements. The codebook for Lloyd-Max quantization was obtained through offline training.
In scalar quantization methods, the number of reconstruction levels is typically equal to the number of quantization levels, as shown in Table 1. Table 2 shows the number of reconstruction levels of the proposed method for the eight test images at a measurement rate of 0.2. A comparison of Tables 1 and 2 shows that the proposed method produces more distinct values in the dequantized measurements. This is mainly because the proposed method uses the information of multiple quantized values for the dequantization, which gives it the advantage of many-to-many mapping. Moreover, each row of the measurements' matrix adopts different maximum and minimum values for local denormalization, so the number of reconstruction levels also grows as the measurement rate increases. The greater the number of reconstruction levels of the dequantized measurements, the more the quantization error can be reduced.
Tables 3-5 display the MSE of the various quantization methods for the measurements quantized with 3, 6, and 8 bits, respectively. Table 3 shows that with 3-bit quantization, the proposed method reduces the MSE by 788.07, 670.48, and 585.35 compared with uniform quantization, µ-law quantization, and Lloyd-Max quantization, respectively. Similarly, in Table 4, the proposed method reduces the MSE by 8.25, 6.77, and 10.66 when 6-bit quantization is employed. Table 5 reveals that with 8-bit quantization, the proposed method reduces the MSE by 0.4665, 0.3765, and 1.8162 compared with uniform quantization, µ-law quantization, and Lloyd-Max quantization, respectively. Tables 3-5 demonstrate that the proposed method has a significantly lower MSE than the other quantization methods.
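To make the baseline behaviour concrete, the sketch below implements a uniform scalar quantizer and a µ-law companding quantizer on values in [0, 1] and reports their reconstruction-level counts and MSE. It is an illustrative stand-in, not the experimental code: the function names, the midpoint reconstruction rule, µ = 255, and the synthetic evenly spaced samples are all assumptions, and the MSE values are on the normalized scale rather than the measurement scale of Tables 3-5.

```python
import math


def uniform_quantize(x, bits):
    """Map x in [0, 1] to an integer index in [0, 2**bits - 1]."""
    levels = 2 ** bits
    return min(int(x * levels), levels - 1)


def uniform_dequantize(q, bits):
    """Reconstruct at the midpoint of the q-th interval."""
    return (q + 0.5) / (2 ** bits)


def mu_law_quantize(x, bits, mu=255.0):
    """Compress with the mu-law curve, then quantize uniformly."""
    return uniform_quantize(math.log1p(mu * x) / math.log1p(mu), bits)


def mu_law_dequantize(q, bits, mu=255.0):
    """Dequantize uniformly, then expand with the inverse mu-law curve."""
    return math.expm1(uniform_dequantize(q, bits) * math.log1p(mu)) / mu


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


samples = [i / 999 for i in range(1000)]  # synthetic, evenly spaced in [0, 1]
report = {}
for bits in (3, 6, 8):
    rec_u = [uniform_dequantize(uniform_quantize(x, bits), bits) for x in samples]
    rec_m = [mu_law_dequantize(mu_law_quantize(x, bits), bits) for x in samples]
    report[bits] = {
        "levels_uniform": len(set(rec_u)),
        "levels_mu_law": len(set(rec_m)),
        "mse_uniform": mse(samples, rec_u),
        "mse_mu_law": mse(samples, rec_m),
    }
```

Both scalar quantizers map each index to exactly one reconstruction value, so the number of distinct outputs can never exceed 2^b; this is the one-to-one limitation that the many-to-many mapping described above is meant to overcome.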

Analysis of the Impact of Entropy Loss Constraints
In this section, we analyzed the effect of the entropy constraint. The parameter λ in the loss function determines the degree of the entropy constraint. To find an appropriate value of λ, we trained the CNNs of the quantization and dequantization processes with only a few common values. Table 6 shows the MSE of the dequantized measurements and the information entropy of the quantized measurements when 8 bits are used to quantize the measurements of the BSDS500 dataset. It can be seen from Table 6 that the entropy constraint increases the information entropy of the quantized measurements but has only a slight effect on the MSE of the dequantized measurements. When λ = 0.05, the MSE of the dequantized measurements is the smallest. Therefore, we use λ = 0.05 when training the proposed network.
In addition, it can be observed that the entropies of the quantized measurements are very close to the bit-depth. With such data, some images may not be compressed at all when a fixed code table is used for entropy coding. In other words, the measurements quantized by the proposed method do not need entropy coding.
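The link between entropy and the usefulness of entropy coding can be checked with a short calculation. The sketch below computes the empirical entropy of a symbol sequence and the largest fraction of bits that an ideal entropy coder could save relative to fixed-length codes; the function names and the example sequences are hypothetical.

```python
import math
from collections import Counter


def empirical_entropy(symbols):
    """Shannon entropy in bits per symbol, from observed symbol frequencies."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def max_entropy_coding_gain(symbols, bit_depth):
    """Fraction of bits an ideal entropy coder could save over fixed-length codes."""
    return 1.0 - empirical_entropy(symbols) / bit_depth


flat = list(range(256))          # every 8-bit symbol appears once
skewed = [0] * 192 + [1] * 64    # two symbols with probabilities 3/4 and 1/4

gain_flat = max_entropy_coding_gain(flat, bit_depth=8)
gain_skewed = max_entropy_coding_gain(skewed, bit_depth=8)
```

When the entropy equals the bit-depth, as for `flat`, the possible saving is zero, which is why quantized data whose entropy is very close to the bit-depth gain essentially nothing from an entropy coder.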

Analysis of the Impact of the Measurement Information Correction Module
In this section, we analyzed the impact of the information correction module on the dequantization process. Table 7 shows the quantization performance on the BSDS500 dataset when different information correction modules are used in the proposed method. In Table 7, Contrast Scheme 1 did not use a measurement information correction module. Contrast Scheme 2 used a measurement information correction module composed of three convolutional layers. Contrast Scheme 3 used a measurement information correction module composed of six convolutional layers. Table 7 reveals that the proposed method has significant advantages over the µ-law quantization method. Compared with Contrast Scheme 1, the MSE of Contrast Scheme 2 is reduced by 0.0271, while its entropy is increased by 0.067. Similarly, the MSE of Contrast Scheme 3 is reduced by 0.0457 and the entropy is increased by 0.1616. These results illustrate that the information correction module effectively improves the quantization performance. Furthermore, the correction capability of the module becomes stronger as the number of convolutional layers increases.

Rate-Distortion Performance Comparison
In this section, we compared the rate-distortion performance of the proposed method with entropy-coded uniform quantization, µ-law quantization, and Lloyd-Max quantization. The entropy-coded uniform quantization method refers to the use of entropy coding after performing uniform quantization on the measurements, which is denoted by "uniform quantization + entropy coding" in this paper. To draw the rate-distortion curve, we traverse multiple quantization bit-depths and sampling rates to encode and decode the test images. Then, we choose the optimum Bitrate-PSNR points and connect them with a line. The bit-depth takes seven values in {2, 3, 4, ..., 8}, and the sampling rate takes 77 values in {0.04, 0.05, 0.06, ..., 0.8}. The image reconstruction algorithm used is the BCS-SPL-DCT algorithm [41]. When calculating the bitrate of "uniform quantization + entropy coding," the average codeword length of entropy coding is replaced by the information entropy. Figure 13 shows the PSNR curves of the eight test images.
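The point-selection step can be sketched as follows. Two assumptions are made that the text does not spell out: the bitrate of a fixed-length scheme is taken as sampling rate times bit-depth plus any side-information bits per pixel, and "optimum" points are interpreted as those not dominated by a cheaper point with equal or higher PSNR. The function names and the sample points are hypothetical.

```python
def bitrate_bpp(sampling_rate, bit_depth, side_bits_per_block=0, block_pixels=256):
    """Assumed bitrate model: measurements per pixel times bits, plus side information."""
    return sampling_rate * bit_depth + side_bits_per_block / block_pixels


def pareto_front(points):
    """Keep (bitrate, psnr) points that beat every point at a lower or equal bitrate."""
    front = []
    best = float("-inf")
    # Sort by bitrate; at equal bitrate, visit the higher PSNR first.
    for rate, psnr in sorted(points, key=lambda p: (p[0], -p[1])):
        if psnr > best:
            front.append((rate, psnr))
            best = psnr
    return front


# Parameter grid described in the text: 7 bit-depths and 77 sampling rates.
bit_depths = list(range(2, 9))
sampling_rates = [round(0.04 + 0.01 * i, 2) for i in range(77)]

# Hypothetical (bitrate, PSNR) results for a handful of configurations.
points = [(0.10, 20.0), (0.10, 19.0), (0.20, 19.5), (0.30, 22.0), (0.25, 21.0)]
curve = pareto_front(points)
```

Connecting the retained points in order of bitrate gives a curve that is non-decreasing in PSNR, which is the usual shape of an operational rate-distortion curve.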
In Figure 13, the proposed method has the best rate-distortion performance on all eight test images, particularly for the aerial, building, and tank images. The PSNR curve of "uniform quantization + entropy coding" is better than those of the µ-law and Lloyd-Max quantization methods. This observation confirms that the existing quantizers without entropy coding have inferior rate-distortion performance compared with "uniform quantization + entropy coding".
Figure 14 shows the reconstructed images of the eight test images with different methods at a compression bit rate of 0.1 bpp. The Lloyd-Max quantization approach generates an adaptive quantization dictionary for each image. We do not count the bits of the quantization dictionary in the compression bit rate of the Lloyd-Max quantization method. Therefore, the results of the Lloyd-Max quantization method in Figure 14 are equivalent to the optimal results of the conventional quantizer.
As shown in Figure 14, the proposed method exhibits the best visual effect and PSNR, followed by "uniform quantization + entropy coding" and Lloyd-Max quantization. Compared with "uniform quantization + entropy coding," for the eight test images, the PSNR of the proposed method increased by 0.65 dB, 0.44 dB, 1.97 dB, 0.02 dB, 0.46 dB, 0.09 dB, 0.37 dB, and 0.29 dB, respectively. Compared with the Lloyd-Max quantization, for the eight test images, the PSNR of the proposed method increased by 2.1 dB, 0.75 dB, 1.8 dB, 0.28 dB, 0.78 dB, 1.55 dB, 1.76 dB, and 1.53 dB, respectively.
The four quantization methods were also tested on the four test image datasets, and the average PSNR curves are shown in Figure 15.
In Figure 15, the average PSNR is the mean of the PSNRs of the reconstructed images at a given bit rate over all images in the dataset. The PSNR at a given bit rate is obtained by linear interpolation from the Bitrate-PSNR curve of each image. The given bit rates are set to {0.1, 0.2, ..., 1} bpp. For the datasets Set5, Set11, Set14, and BSD68, the average PSNR curve of the proposed method is better than those of "uniform quantization + entropy coding," µ-law quantization, and Lloyd-Max quantization. In particular, at a low bit rate (around 0.1 bpp), the PSNR of the proposed method is much higher than that of the other methods.
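The averaging procedure amounts to a per-image linear interpolation followed by a mean, as in the sketch below. The function names and the two example curves are hypothetical, and each curve is assumed to be sorted by strictly increasing bitrate and to cover the target rate.

```python
def interpolate_psnr(curve, target_rate):
    """Linearly interpolate PSNR at target_rate from (bitrate, psnr) points sorted by bitrate."""
    for (r0, p0), (r1, p1) in zip(curve, curve[1:]):
        if r0 <= target_rate <= r1:
            t = (target_rate - r0) / (r1 - r0)
            return p0 + t * (p1 - p0)
    raise ValueError("target rate lies outside the curve")


def average_psnr(curves, target_rate):
    """Mean of the interpolated PSNRs over all images in a dataset."""
    values = [interpolate_psnr(c, target_rate) for c in curves]
    return sum(values) / len(values)


# Hypothetical Bitrate-PSNR curves for two images.
curve_a = [(0.05, 18.0), (0.15, 20.0), (0.30, 23.0)]
curve_b = [(0.08, 17.0), (0.12, 19.0)]

psnr_a = interpolate_psnr(curve_a, 0.1)
psnr_b = interpolate_psnr(curve_b, 0.1)
mean_psnr = average_psnr([curve_a, curve_b], 0.1)
```

Interpolation is needed because the bitrates actually reached by each image depend on its sampling rate, bit-depth, and entropy, so they rarely coincide exactly with the common grid of target rates.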
The datasets SunHays80 [42] and Urban100 [43] were additionally used for testing (all images were converted to grayscale images of size 256 × 256). The quality of reconstruction is evaluated by the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) between the reconstructed image and the original image. Table 8 shows the PSNRs and SSIMs of the four datasets at a bit rate of 0.1 bpp. Table 9 shows the PSNRs and SSIMs of the four datasets at a bit rate of 0.2 bpp.
For all images of the six datasets, when the bitrate is set to 0.1 bpp, the proposed method, "uniform quantization + entropy coding," µ-law quantization, and Lloyd-Max quantization achieve average PSNRs of 19.69 dB, 19.24 dB, 17.54 dB, and 18.67 dB, respectively. Compared with "uniform quantization + entropy coding," the proposed method improves the PSNR by an average of 0.45 dB without entropy coding. The four methods achieve average SSIMs of 0.1855, 0.1739, 0.1408, and 0.1547, respectively. Compared with "uniform quantization + entropy coding," the proposed method improves the SSIM by an average of 0.0116 without entropy coding.
For all images of the six datasets, when the bitrate is set to 0.2 bpp, the proposed method, "uniform quantization + entropy coding," µ-law quantization, and Lloyd-Max quantization achieve average PSNRs of 21.27 dB, 21.09 dB, 20.88 dB, and 20.93 dB, respectively. Compared with "uniform quantization + entropy coding," the proposed method improves the PSNR by an average of 0.18 dB without entropy coding. The four methods achieve average SSIMs of 0.2738, 0.2683, 0.2543, and 0.2567, respectively. Compared with "uniform quantization + entropy coding," the proposed method improves the SSIM by an average of 0.0055 without entropy coding.
Across all images in the six datasets, the proposed method demonstrates superior performance compared with "uniform quantization + entropy coding," as well as the µ-law and Lloyd-Max quantization schemes.

Analysis of Computational Complexity
On the encoding side, the calculation of the proposed quantization method involves four networks: the location parameter estimation network, the scale parameter estimation network, the parameter quantization network, and the measurement quantization network. The network structures of the location parameter estimation network and the scale parameter estimation network are identical, as shown in Table 10. Similarly, the network structures of the parameter quantization and measurement quantization networks are identical, as shown in Table 11. The location and scale parameters are derived from the partial measurements $Y_0 \in \mathbb{R}^{10 \times N_B}$. According to Table 10, convolutional layer 1 requires around $10 \times N_B \times 3$ multiplications and $10 \times N_B \times 3$ additions, convolutional layer 2 requires $3 \times N_B \times 3$ multiplications and $3 \times N_B \times 3$ additions, and convolutional layer 3 requires $3 \times N_B$ multiplications and $3 \times N_B$ additions. In total, location parameter estimation and scale parameter estimation need about $84N_B$ multiplications, $84N_B$ additions, and $12N_B$ LeakyReLU operations.
According to Table 11, to quantize a parameter or measurement, the quantization network typically requires 48 multiplications, 48 additions, 12 LeakyReLU operations, and one $g_3$ operation. For the measurements' matrix $Y \in \mathbb{R}^{M \times N_B}$, the numbers of measurements and parameters that need to be quantized are $M \times N_B$ and $2N_B$, respectively. In total, it is necessary to compute $48(2 + M)N_B$ multiplications, $48(2 + M)N_B$ additions, $12(2 + M)N_B$ LeakyReLU operations, and $(2 + M)N_B$ $g_3$ operations.
The proposed method requires approximately the same number of multiplication and addition operations, and the activation function to be computed only involves linear operations. Since addition is much faster than multiplication in practice, we only compare the number of multiplication operations. With an image size of 256 × 256 and a block size of 16 × 16, the total number $N_B$ of blocks is 256. Assuming that the measurement rate of BCS is 0.1, each block obtains 26 measurements. Each measurement needs about 256 multiplications and 255 additions, so the calculation of the measurements takes about $26 \times 256 N_B$ multiplications and $26 \times 255 N_B$ additions. The proposed quantization method requires $660N_B$ multiplications, $660N_B$ additions, $156N_B$ LeakyReLU operations, and $12N_B$ $g_3$ operations. The multiplications required by the proposed quantization process therefore amount to about 9.92% of those required to compute the BCS measurements.
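The operation counts above can be reproduced with a few lines of arithmetic. The sketch below encodes the per-layer and per-value costs quoted from Tables 10 and 11; the function names are hypothetical, and the choice of 12 quantized values per block (two parameters plus ten measurements) is an inference from the quoted totals rather than something the text states.

```python
import math


def encoder_ops_per_block(values_quantized):
    """Per-block operation counts of the quantization process, using the quoted costs."""
    # Two estimation networks, each with layers costing 10*3, 3*3 and 3 multiplications.
    est_mults = 2 * (10 * 3 + 3 * 3 + 3)
    est_leaky = 12  # quoted LeakyReLU total for both estimation networks
    mults = est_mults + 48 * values_quantized
    leaky = est_leaky + 12 * values_quantized
    return {"mults": mults, "adds": mults, "leaky_relu": leaky, "g3": values_quantized}


def bcs_mults_per_block(measurements, block_pixels=256):
    """Multiplications needed to compute the BCS measurements of one block."""
    return measurements * block_pixels


measurements = math.ceil(0.1 * 256)          # measurement rate 0.1, block of 256 pixels
ops = encoder_ops_per_block(values_quantized=12)
ratio_percent = 100.0 * ops["mults"] / bcs_mults_per_block(measurements)
```

With these inputs the sketch returns 660 multiplications, 156 LeakyReLU operations, and 12 $g_3$ operations per block, and a ratio of about 9.92%. If all 26 measurements plus the two parameters were passed through the quantization network instead, the same formulas would give 84 + 48 × 28 = 1428 multiplications per block; which count is intended is not stated in the text.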

Figure 1. The CS coding structure based on the proposed method.

Figure 2. The curve and gradient curve of the proposed activation function.

Figure 3. Network structure diagram of CNN based on model (5).

Figure 4. Structure diagram of the proposed BCS measurement quantization process.

Figure 5. Network structure diagram for estimating the location parameters.

Figure 6. Network structure diagram for estimating the scale parameters.

Figure 7. Structure diagram of the proposed dequantization process.

Figure 8. Network structure diagram of the inverse process of the proposed quantization.

Figure 9. Network structure diagram of the measurement information correction module.

Figure 13. PSNR curves of the eight test images.

Figure 14. Visual comparison of different methods at a compression bit rate of 0.1.

Figure 15. Average PSNR curves of test image sets.

Table 1. Number of reconstruction levels for scalar quantization methods.

Table 2. Number of reconstruction levels for the proposed quantization.

Table 3. MSE of the measurements quantized with 3-bit.

Table 4. MSE of the measurements quantized with 6-bit.

Table 5. MSE of the measurements quantized with 8-bit.

Table 6. MSE and entropy for different λ.

Table 7. Quantization performance of different information correction modules.

Table 8. The PSNRs of the four datasets at a bit rate of 0.1 bpp.

Table 9. The PSNRs of the four datasets at a bit rate of 0.2 bpp.

Table 10. Detailed network structures of parameter estimation.

Table 11. Detailed network structures of the quantization process.