Improved JPEG Coding by Filtering 8 × 8 DCT Blocks
J. Imaging 2021, 7, 117

The JPEG format, consisting of a set of image compression techniques, is one of the most commonly used image coding standards for both lossy and lossless image encoding. In this format, various techniques are used to improve image transmission and storage. In the final step of lossy image coding, JPEG uses either arithmetic or Huffman entropy coding modes to further compress data processed by lossy compression. Both modes encode all the 8 × 8 DCT blocks without filtering empty ones. An end-of-block marker is coded for empty blocks, and these empty blocks cause an unnecessary increase in file size when they are stored with the rest of the data. In this paper, we propose a modified version of the JPEG entropy coding. In the proposed version, instead of storing an end-of-block code for empty blocks with the rest of the data, we store their location in a separate buffer and then compress the buffer with an efficient lossless method to achieve a higher compression ratio. The size of the additional buffer, which keeps the information of location for the empty and non-empty blocks, was considered during the calculation of bits per pixel for the test images. In image compression, peak signal-to-noise ratio versus bits per pixel has been a major measure for evaluating the coding performance. Experimental results indicate that the proposed modified algorithm achieves lower bits per pixel while retaining quality.


Introduction
In parallel with developments in the field of image capture technologies, data storage is becoming a significant issue encountered by computer and mobile users. Many encoding methods for image data storage have been developed, which can be divided into lossy and lossless methods. Approaches using various techniques to compress data for storage without losing any bits of information in the original image (often captured by a camera or sensor) are called lossless image encoding methods. Examples [1] of lossless methods include GIF (graphics interchange format), JBIG, and PNG (portable network graphics). In contrast, methods using various techniques to store data such that some unimportant details are lost while retaining visual clarity on users' displays are called lossy image coding methods. Examples of lossy methods include JPEG and BPG (better portable graphics).
Each method has its own advantages and disadvantages. In the field of image compression, methods are evaluated by their complexity, compression ratio, and the quality of the obtained image. Every method aims for a higher compression ratio, higher quality, and lower complexity. Over the past decade, many methods have competed to obtain better results. Considering these factors and their trade-offs, JPEG has consistently been the leading image coding standard for lossy compression up to the present day.
Many methods currently provide much better results, but JPEG has been used for the past two decades and still dominates the market. For example, it has been claimed that BPG is likely to overtake JPEG, but this does not seem possible soon. BPG is more complex [2] and thus takes a much longer time to decode. It was created using high-efficiency video coding (HEVC), whose patents are licensed through MPEG LA. It is commonly expected to require considerable time for BPG to be popularly integrated into existing and future computer systems on the market.
In this study, we propose a method to increase the compression ratio of JPEG images without affecting their quality.


JPEG Image Coding Standard
The JPEG standard was created in 1992. For a detailed study, readers are referred to [3,4]. Figure 1 shows the basic overview of the conventional JPEG encoder. YCbCr color components are obtained from the raw input image in the first step. Based on user choice, chroma components Cb and Cr are downsampled to the 4:2:2 or 4:2:0 type [4]. Each channel is divided into 8 × 8 blocks. A discrete cosine transform (DCT) is applied on the 8 × 8 blocks in the order from left-to-right and top-to-bottom. After the DCT, blocks are forwarded for quantization. Luma and Chroma components are quantized using different quantization tables [3]. Quantization tables are generated based on quality factors (QF). The compression ratio and quality of the image are controlled by the QF value. To reduce the redundancy of consecutively occurring DC coefficients, the differential pulse code modulation (DPCM) method is used. In the end, all the processed data is forwarded to the entropy coding module.

Entropy Coding
Data obtained after quantization need to be stored without losing any information. However, instead of saving the data as-is, JPEG compression performs an additional step of entropy coding. Entropy coding achieves additional compression by encoding the quantized DCT coefficients more efficiently based on their statistical characteristics [1]. An individual JPEG compression process uses one of two available entropy coding algorithms, either Huffman [5] or arithmetic encoding [6].

Huffman
Huffman coding is an entropy encoding algorithm using a variable-length code table. The table is derived from the estimated probability of occurrence of each possible source-symbol value (such as a character in a file) [1]. The principle of Huffman coding is to assign shorter codes to the more frequently occurring data [7]. A dictionary associating each data symbol with a codeword has the property that no codeword in the dictionary is a prefix of any other codeword in the dictionary [8].
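As an illustration of the prefix-code property, a minimal Huffman code construction can be sketched as follows. The function name and dictionary representation are our own; JPEG itself uses predefined or statistically optimized code tables rather than building trees this way.

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a prefix-free code table from symbol frequencies.

    Returns a dict mapping each symbol to its bit string; more
    frequent symbols receive shorter (or equal-length) codes.
    """
    freq = Counter(data)
    if len(freq) == 1:  # degenerate case: a single distinct symbol
        return {next(iter(freq)): "0"}
    # Heap entries: (frequency, unique tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prepend a distinguishing bit to each side's codes and merge.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]
```

Because the two least frequent subtrees are merged at every step, no codeword can be a prefix of another, so the bit stream is uniquely decodable without separators.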
In the JPEG encoder, Huffman coding is combined with run-length coding (RLC) and is called the run-amplitude Huffman code [9]. Each code represents the run length of zeros before a nonzero coefficient together with the size of that coefficient, and is followed by additional bits that precisely define the coefficient's amplitude and sign [4,9]. The end-of-block (EOB) marker is coded after the last nonzero coefficient. The EOB is omitted in the rare case that the last element of the 8 × 8 block is nonzero. In the case of an empty block, i.e., where all AC coefficients are zero, the encoder codes only an EOB.
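The pairing of zero runs with nonzero coefficients described above can be sketched as follows. This is a simplified illustration on a flat list of 63 zigzag-ordered AC coefficients; real JPEG additionally splits runs longer than 15 with ZRL symbols and encodes each amplitude as a (size, extra bits) pair, both of which are omitted here.

```python
EOB = ("EOB",)  # end-of-block marker symbol (representation is ours)

def run_length_pairs(ac_coeffs):
    """Convert zigzag-ordered AC coefficients into (zero-run, value)
    pairs, terminated by EOB.

    EOB is emitted when trailing zeros remain (or the block is empty);
    it is omitted when the last coefficient itself is nonzero.
    """
    pairs = []
    run = 0
    for c in ac_coeffs:
        if c == 0:
            run += 1
        else:
            pairs.append((run, c))
            run = 0
    # Trailing zeros (or an entirely empty block) collapse to EOB.
    if run > 0 or not ac_coeffs:
        pairs.append(EOB)
    return pairs
```

An empty block thus costs one EOB symbol in the conventional coder, which is exactly the cost the proposed method removes from the main stream.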

Arithmetic
Compared to Huffman coding, arithmetic coding bypasses the mechanism of assigning a specific code to each input symbol. Instead, the interval [0, 1) is divided into subintervals according to the occurrence probabilities of the symbols, with the symbol ordering known to both the encoder and decoder. Unlike in Huffman coding, the number of bits used to encode each symbol varies with its assigned probability [10]: symbols with lower probability receive longer encodings than symbols with higher probability [1]. The key idea of arithmetic encoding is to assign each symbol an interval, which is further divided into subintervals proportional to the symbol probabilities [11].
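The interval-narrowing idea can be illustrated with a small sketch. The function name, the alphabetical symbol ordering, and the use of exact rationals are our own choices for clarity; a production arithmetic coder instead renormalizes fixed-precision integer ranges and emits bits incrementally.

```python
from fractions import Fraction

def arithmetic_interval(symbols, probs):
    """Narrow [0, 1) to the subinterval that identifies `symbols`.

    `probs` maps each symbol to its probability; encoder and decoder
    must agree on the symbol order (alphabetical here).
    """
    low, width = Fraction(0), Fraction(1)
    order = sorted(probs)
    for s in symbols:
        # Cumulative probability of all symbols preceding s.
        cum = sum((Fraction(probs[t]) for t in order[:order.index(s)]),
                  Fraction(0))
        low += width * cum
        width *= Fraction(probs[s])
    return low, low + width  # any number in [low, high) decodes to `symbols`
```

Each symbol shrinks the interval by a factor equal to its probability, so improbable symbols narrow it more and therefore cost more bits to identify, which is the behavior described above.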
Both Huffman and arithmetic encoding are performed on the data without filtering out empty AC coefficient blocks, which decreases the compression ratio.

Proposed Algorithm
Our proposed algorithm is based on the filtration of 8 × 8 blocks. Figure 2 shows an overview of the proposed JPEG image coding. To keep the complexity of the conventional and proposed entropy coding equivalent, we used separate modes for arithmetic and Huffman coding. Before forwarding the 8 × 8 blocks to the JPEG entropy encoder, we perform three steps: (1) filtration of blocks, (2) changing bits, and (3) replacing values. The third step (Section 3.3) is performed only in the Huffman encoding mode. Note that the whole process explained in this section is lossless. During decoding, we perform the inverse of these steps, which recovers the locations of the empty blocks. Moreover, this process has no side effects, as none of the coefficient values are changed.

Filtration of Blocks
In our proposed algorithm, instead of allowing the encoder to encode the EOB marker for the empty blocks along with the array of non-empty blocks, all the empty blocks are filtered out, and information on the location of empty and non-empty blocks is stored in a separate binary buffer. In this buffer, we store 0 for empty blocks and 1 for nonempty blocks. In the JPEG encoder, the Y component is compressed using a different quantization table compared to the Cb and Cr components. Due to the different nature of their compression, we use separate buffers for the Y, Cb, and Cr components at this stage. Finally, all buffers for Y, Cb, and Cr are concatenated.

Filtration of Blocks
In our proposed algorithm, instead of allowing the encoder to encode the EOB marker for the empty blocks along with the array of non-empty blocks, all the empty blocks are filtered out, and information on the location of empty and non-empty blocks is stored in a separate binary buffer. In this buffer, we store 0 for empty blocks and 1 for non-empty blocks. In the JPEG encoder, the Y component is compressed using a different quantization table compared to the Cb and Cr components. Due to the different nature of their compression, we use separate buffers for the Y, Cb, and Cr components at this stage. Finally, all buffers for Y, Cb, and Cr are concatenated.

Changing Bits
After concatenating the Y, Cb, and Cr buffers, we increase the number of zero bits by marking only the positions where the bit value changes: the first bit is kept unchanged, and every subsequent bit is replaced by 1 if it differs from the preceding input bit and by 0 otherwise. Long runs of identical bits therefore become runs of zeros. For example, suppose we have the bit sequence "000011110111". Four consecutive zeros are followed by four consecutive ones, so a change occurs at the 5th bit of the sequence; further changes occur at the 9th and 10th bits. Hence, the sequence is transformed into "000010001100", with a 1 placed wherever the value changes.
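This transform amounts to XORing each bit with its predecessor, and its inverse is a running XOR. A minimal sketch (function names are ours):

```python
def change_bits(bits):
    """Mark transitions: output[i] = bits[i] XOR bits[i-1], with the
    first bit copied through unchanged. Runs of identical bits in the
    input become runs of zeros in the output."""
    return [bits[0]] + [a ^ b for a, b in zip(bits[1:], bits)]

def restore_bits(marks):
    """Inverse transform (running XOR), used by the decoder."""
    out = []
    prev = 0
    for m in marks:
        prev ^= m
        out.append(prev)
    return out
```

Applying `change_bits` to the example above reproduces the transformed sequence, and `restore_bits` recovers the original buffer exactly, which is why the step is lossless.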
In the arithmetic encoding mode, after performing this step, we provide our buffer to the binary arithmetic encoder [12]. The remaining 8 × 8 blocks, i.e., those containing a nonzero AC coefficient, are encoded in the conventional way. After the compression process is complete, we append our compressed buffer to the remainder of the encoded file.

Replacing Values
This step is performed only when the selected encoding mode is Huffman. Given the nature of Huffman encoding, more space can be saved if the data resulting from Section 3.2 are converted from bits to bytes; therefore, before replacing values, the bits are packed into bytes. After the conversion, long sequences of consecutive zero-valued bytes remain, and the replacing-values step removes them. First, we calculate the average number of consecutive zero-valued bytes. The average is used because different types of images produce different data: for homogeneous images, the average run of zeros is longer owing to the larger number of consecutive empty 8 × 8 blocks, whereas more detailed images contain fewer consecutive empty 8 × 8 blocks. Each run of zero-valued bytes whose length equals the calculated average is then replaced with a less frequently occurring byte value, i.e., 255. The example in Figure 3 shows two data buffers: the input data obtained after converting the values from bits to bytes, and the processed data.
The total number of zeros was equal to 21 in the original data buffer. These 21 zeros occurred in seven sequences of consecutive zeros. To obtain the average number of consecutively occurring zeros, we divided the total number of zeros by the number of sequences. Thus, we obtained an average of three zero-valued bytes and replaced all three sequences of three consecutive zeros with the constant value 255.
If the division yields a floating-point result, we round it to the nearest integer. If a byte value of 255 occurs in the input data, we append an additional byte value, e.g., 217, to it in order to differentiate a replacement marker 255 from a literal input value of 255. The example in Figure 3 demonstrates that, after replacement, a sequence with a data size of 29 bytes was reduced to 19 bytes.
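The replacing-values step can be sketched under one plausible reading of the scheme: each group of exactly `avg` consecutive zero bytes is collapsed to the marker 255, and a literal 255 in the input is escaped with a trailing 217. The grouping rule, function name, and parameter defaults are our interpretation of the description and Figure 3, not a definitive implementation.

```python
def replace_zero_runs(data, marker=255, escape=217):
    """Collapse groups of `avg` consecutive zero bytes to `marker`,
    where avg = round(total zeros / number of zero runs).

    A literal `marker` byte in the input is escaped by appending
    `escape` so the decoder can tell the two cases apart."""
    # First pass: gather the lengths of the maximal zero runs.
    runs, run = [], 0
    for b in data:
        if b == 0:
            run += 1
        elif run:
            runs.append(run)
            run = 0
    if run:
        runs.append(run)
    if not runs:
        return list(data), 0
    avg = round(sum(runs) / len(runs))
    # Second pass: emit, collapsing each full group of `avg` zeros.
    out, run = [], 0
    for b in data:
        if b == 0:
            run += 1
            if run == avg:
                out.append(marker)
                run = 0
        else:
            out.extend([0] * run)   # leftover zeros shorter than avg
            run = 0
            out.append(b)
            if b == marker:
                out.append(escape)  # escape a literal 255
    out.extend([0] * run)
    return out, avg
```

The decoder reverses this by expanding every unescaped 255 back into `avg` zeros, which is possible because `avg` can be recomputed or stored alongside the buffer.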
As the third step is performed only in the case of JPEG Huffman mode, the processed data is designated for Huffman encoding. After the compression process was completed, we appended our compressed buffer to the remainder of the encoded file.

Experimental Results
We conducted an experiment on 15 test images using libjpeg-turbo [13] version 2.0.5. All test images were taken from the JPEG AI dataset [14]. These 15 images, shown in Figure 4, were selected carefully. They include two screenshots, two homogeneous images, one night view, one daytime street view, one item close-up, one human close-up, and seven additional random images. Thus, the 15 test images cover a broad variety of major image types, so as to obtain useful and indicative results. Figures 5 and 6 present the graphical results for all the test images shown in Figure 4; the Y-axis represents the PSNR and SSIM values, respectively, and the X-axis represents the BPP. As discussed in Section 2.1, the chroma components Cb and Cr are downsampled to the 4:2:2 or 4:2:0 type in JPEG [4]. In this paper, we targeted 4:2:0 subsampling at QF values ranging from low to high to obtain a clearer picture of performance; the selected QF values were 30, 50, 70, and 90. Graphs were obtained using MATLAB R2020a. We used PSNR (peak signal-to-noise ratio) versus BPP (bits per pixel) to evaluate the results. The compressed buffers produced after the step detailed in Section 3.2 for arithmetic mode and after the step given in Section 3.3 for Huffman mode were included in the file size when calculating BPP.
All the images decoded with the modified JPEG decoder had the same PSNR as the images decoded by the conventional JPEG decoder. This shows the successful implementation of our modified decoder.
In all images, our proposed approach achieved significant improvements. The improvement was greater for homogeneous images than for complex images. Among the test images used in the experiment, Figure 4f showed the best result. The proposed algorithm was tested only on high-resolution original images; the lowest resolution among the test images shown in Figure 5 was 1980 × 1272. Owing to the high likelihood of consecutive empty 8 × 8 DCT blocks, the proposed algorithm is considered most useful for high-resolution images. To calculate the average gain in BPP for both the Huffman and arithmetic modes, shown in Table 1, we used the Bjontegaard metric [15–17].
In Tables 2 and 3, for the Huffman and arithmetic encoding modes, respectively, we report each test image's actual file size when encoded by the conventional JPEG encoder [13], the file size when the empty blocks are filtered out, and the difference between the two. This difference indicates the size taken up by the empty blocks in the encoded images. An additional column, "Proposed method", shows the additional data required by the proposed method to encode the empty blocks and their locations. Moreover, the column named "Gain" in Tables 2 and 3 represents the ratio of the size required to encode the empty blocks by the conventional JPEG encoder to that required by the proposed JPEG encoder. All sizes in Tables 2 and 3 are given in bytes.