Remote Sensing Image Compression Based on Direction Lifting-Based Block Transform with Content-Driven Quadtree Coding Adaptively

: Due to the limitations of storage and transmission in remote sensing scenarios, lossy compression techniques have been commonly considered for remote sensing images. Inspired by the latest development in image coding techniques, we present in this paper a new compression framework, which combines the directional adaptive lifting partitioned block transform (DAL-PBT) with content-driven quadtree codec with optimized truncation (CQOT). First, the DAL-PBT model is designed; it calculates the optimal prediction directions of each image block and performs the weighted directional adaptive interpolation during the process of directional lifting. Secondly, the CQOT method is proposed, which provides different scanning orders among and within blocks based on image content, and encodes those blocks with a quadtree codec with optimized truncation. The two phases are closely related: the former is devoted to image representation for preserving more directional information of remote sensing images, and the latter leverages adaptive scanning on the transformed image blocks to further improve coding efﬁciency. The proposed method supports various progressive transmission modes. Experimental results show that the proposed method outperforms not only the mainstream compression methods, such as JPEG2000 and CCSDS, but also, in terms of some evaluation indexes, some state-of-the-art compression methods presented recently.


Introduction
Along with the rapid development of remote sensing technology, it is becoming easier to acquire high spatial resolution remote sensing images from various satellites and sensors, which is undoubtedly beneficial to the application of remote sensing images [1,2]. On the other hand, massive data has brought a great burden to the storage and transmission of remote sensing images. Therefore, compression technology is indispensable in remote sensing image processing. In general, traditional compression methods can be used for the compression of remote sensing images, such as Embedded Zerotree Wavelet (EZW) [3], Set Partitioning in Hierarchical Tree (SPIHT) [4], Set Partitioned Embedded Block Coder (SPECK) [5], and Joint Photographic Experts Group 2000 (JPEG2000) [6], or some improved versions of them [7][8][9][10]. However, the remote sensing image has its own unique characteristics, which usually include complicated spatial information, clear details, and well-defined geographical objects, and even some small targets [11]. This means that some traditional methods that rarely consider high-frequency information and mainly focus on retaining more low-frequency information rate, respectively. However, the EBCOT method only considers the correlation within a subband, and there are a lot of truncated points that need to be stored during the process of running the algorithm [9,36].
Besides EBCOT, tree-based coding methods have been widely used. Some of the embedded coding methods are based on zerotree, such as [3][4][5][37][38][39]. The zerotree-based method exploits the property of self-similarity across scales in wavelet transformed images, which construct some zerotrees by spatial similarity of the wavelet coefficients at different scales. As a result, a great deal of non-significant coefficients can be represented by a root of the zero tree. Recently, some binary, tree-based coding methods were proposed for the compression of remote sensing images. The authors of [40] present a binary tree codec adaptively (BTCA) for on-board image compression. As a novel and robust coding way, the BTCA method has proven to be very effective. In [41], a human vision-based adaptive scanning (HAS) for the compression of remote sensing images was proposed, which combines the human visual model with the BTCA, and can provide a good visual quality for remote sensing images. Additionally, the quadtree-based coding methods are also presented for the compression of remote sensing image coding. The authors of [24] propose a compression method that is based on quadtree coding for Synthetic Aperture Radar (SAR) images and obtains a good coding performance.
In 2017, the state-of-the-art compression method based on quadtree set partitioning, known as quadtree coding with adaptive scanning and optimized truncation (QCAO), was proposed [42]. The main concept of this method is that the entire wavelet image is divided into several blocks and encoded individually. In the process of coding, the significant nodes and their neighbors are encoded before other nodes, and then the optimized truncation is followed to improve the coding efficiency. Although the QCAO is an effective coding method, similar to EBCOT, it mainly utilizes the clustering characteristics within a subband, but seldom considers the relations among subbands.
In this paper, a new compression method for remote sensing images is proposed. In order to improve the compression efficiency, the image representation phase and coding phase are jointly considered. In addition, both the characteristics of subbands and the scanning order among subbands and blocks are utilized in the process of coding. The overall framework of the proposed compression method is shown in Figure 1. The main contribution of the proposed method consists of two parts: One part is the directional adaptive lifting partitioned block transform (DAL-PBT), which can provide the optimal prediction direction for each image block and preserve more orientation properties of remote sensing images by directional interpolation. It should be noted that the size of the image block here is consistent with that of the following coding phase. The main contribution of the proposed method consists of two parts: One part is the directional adaptive lifting partitioned block transform (DAL-PBT), which can provide the optimal prediction direction for each image block and preserve more orientation properties of remote sensing images by directional interpolation. It should be noted that the size of the image block here is consistent with that of the following coding phase.
Another part is the content-driven quadtree codec with optimized truncation (CQOT), which establishes the quadtree for each image block by analyzing the scanning order among blocks and the scanning method of each block based on the image content and the characteristics of subbands, respectively. Then, the coding and optimized truncation is followed for each quadtree in order to provide a good overall coding performance.
The remainder of the paper is organized as follows. In Section 2, the proposed DAL-PBT model is described in detail. In Section 3, we first describe the content-driven adaptive scanning method, before the CQOT method is introduced. Section 4 gives some quality evaluation indicators used in this paper. In Section 5, some numerical experiments are performed and prove the high effectiveness of the proposed compression method. Finally, the conclusions are provided in Section 6.

The Directional Adaptive Lifting Partitioned Block Transform (DAL-PBT)
The two-dimensional (2-D) DWT is the most important image representation technique of the last decade [43]. Conventionally, the 2-D DWT is carried out as a separable transform by cascading two 1-D transforms in the vertical and horizontal directions, which only use the neighboring elements in the horizontal or vertical direction. However, most remote sensing images contain a lot of directional information, such as edges, contours of terrain, and complex texture of objects, which ensure that the conventional 2-D DWT fails to provide a good representation for them. How to provide an efficient image representation is one of the keys to improving the coding performance of remote sensing images. In this paper, a new DAL-PBT model is proposed. It first divides an image into several blocks and then calculates their optimal lifting directions. Then, a directional interpolation filter is utilized during the lifting process to improve the orientation property of the interpolated remote sensing images. The DAL-PBT model is described in detail as follows.

The Structure of Adaptive Directional Lifting Scheme
Without loss of generality, we present a general formulation of the 2-D DWT implemented with lifting. The authors of [44] point out that the 2-D wavelet transforms can be realized by one pair of the lifting steps, i.e., one prediction step followed by one update step. The typical lifting process consists of four stages: split, prediction, update, and normalize [45]. The 1-D directional lifting wavelet transform and inverse transform are shown in Figure 2a Another part is the content-driven quadtree codec with optimized truncation (CQOT), which establishes the quadtree for each image block by analyzing the scanning order among blocks and the scanning method of each block based on the image content and the characteristics of subbands, respectively. Then, the coding and optimized truncation is followed for each quadtree in order to provide a good overall coding performance.
The remainder of the paper is organized as follows. In Section 2, the proposed DAL-PBT model is described in detail. In Section 3, we first describe the content-driven adaptive scanning method, before the CQOT method is introduced. Section 4 gives some quality evaluation indicators used in this paper. In Section 5, some numerical experiments are performed and prove the high effectiveness of the proposed compression method. Finally, the conclusions are provided in Section 6.

The Directional Adaptive Lifting Partitioned Block Transform (DAL-PBT)
The two-dimensional (2-D) DWT is the most important image representation technique of the last decade [43]. Conventionally, the 2-D DWT is carried out as a separable transform by cascading two 1-D transforms in the vertical and horizontal directions, which only use the neighboring elements in the horizontal or vertical direction. However, most remote sensing images contain a lot of directional information, such as edges, contours of terrain, and complex texture of objects, which ensure that the conventional 2-D DWT fails to provide a good representation for them. How to provide an efficient image representation is one of the keys to improving the coding performance of remote sensing images. In this paper, a new DAL-PBT model is proposed. It first divides an image into several blocks and then calculates their optimal lifting directions. Then, a directional interpolation filter is utilized during the lifting process to improve the orientation property of the interpolated remote sensing images. The DAL-PBT model is described in detail as follows.

The Structure of Adaptive Directional Lifting Scheme
Without loss of generality, we present a general formulation of the 2-D DWT implemented with lifting. The authors of [44] point out that the 2-D wavelet transforms can be realized by one pair of the lifting steps, i.e., one prediction step followed by one update step. The typical lifting process consists of four stages: split, prediction, update, and normalize [45]. The 1-D directional lifting wavelet transform and inverse transform are shown in Figure 2a   We consider a 2-D image x(m, n) m,n∈Z . First, all samples of the image are split into two parts: the even sample subset x e and the odd sample subset x o . In the predict stage, the odd elements are predicted by the neighboring even elements with a prediction direction obtained via a discriminant criteria. Suppose the direction adaptive prediction operator is DA_P; then, the predict process can be represented as In the update stage, the even elements are updated by the neighboring prediction error with the same direction of the predict stage. Suppose the direction adaptive update operator is DA_U, the updated process can be described as Here, the directional prediction operator DA_P is The directional update operator DA_U is The p i and u j are the coefficients of high pass filter and low pass filter, respectively. θ v represents the direction of prediction and update.
Finally, the outputs of the lifting are weighted by the coefficients K e and K o , which normalize the scaling and wavelet functions, respectively.
For a remote sensing image, if the directional lifting is carried out along the edge or texture direction of the image, then the energy of the high frequency subbands will decrease, which is advantageous for improving the coding performance. In this paper, the integer pixels and the fractional pixels close to the predicted pixel are considered. Obviously, the more reference lifting directions, the better representation of the image block. However, more side information is needed. On the other hand, if there are very few reference lifting directions, the image cannot be well represented. In this paper, 15 reference lifting directions are chosen for one-dimensional horizontal and vertical transformations, which are shown in Figure 3a,b, respectively. Here, the directional filter can be represented along the direction d = d x , d y T , d ∈ R 2 . Therefore, the 15 reference directions can be recorded as d −7

Image Segmentation and the Calculation of Optimal Prediction Direction
In order to ensure that the direction of the lifting is consistent with the local direction of the image, image segmentation is carried out. In [46], a rate-distortion optimization segmentation method based on quadtree is adopted, which divides a natural image recursively into blocks with different sizes by quad-tree segmentation. However, the efficiency of this segmentation method is closely related to the content of the image. For the remote sensing image, it usually reflects complex landforms rich in details, but rarely large flat areas, so the adaptive segmentation method hardly shows its advantages. The reason for this is that for images with complex content, the adaptive segmentation result most likely shows all of the blocks with the smallest allowed size, which is nearly equivalent to equal size segmentation but with a higher computational cost. In addition, for rate-distortion optimization segmentation method based on quadtree, the "segmentation tree" at each bit rate is different, and all of them should be transmitted to the receiver for correct decoding. The more complex the image is, the more branches of the "segmentation tree" there are, which leads to more side information.
Therefore, the rate-distortion optimization segmentation method based on quadtree is not very suitable for remote sensing images.
Remote Sens. 2018, 10, x FOR PEER REVIEW 6 of 25 rate-distortion optimization segmentation method based on quadtree, the "segmentation tree" at each bit rate is different, and all of them should be transmitted to the receiver for correct decoding. The more complex the image is, the more branches of the "segmentation tree" there are, which leads to more side information. Therefore, the rate-distortion optimization segmentation method based on quadtree is not very suitable for remote sensing images.
(a) (b) Based on the above analysis, the block segmentation method with the same size is directly adopted for remote sensing images. In order to provide a better coding performance, the size of the partition block here is the same as that of the following coding phase.
For the directional lifting-based wavelet transform, the prediction error is closely related to the high frequency subbands. For an image block, the optimal prediction direction is the direction with minimal residual information in high frequency subbands.   Based on the above analysis, the block segmentation method with the same size is directly adopted for remote sensing images. In order to provide a better coding performance, the size of the partition block here is the same as that of the following coding phase.
For the directional lifting-based wavelet transform, the prediction error is closely related to the high frequency subbands. For an image block, the optimal prediction direction is the direction with minimal residual information in high frequency subbands.
Suppose the reference direction set is θ re f , which includes 15 directions, {−7, −6, −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, 6, 7}, and the total number of blocks is N a . For each block B l , l = 0, 1, . . . , N a − 1, the optimal prediction direction θ * opti can be calculated as follows: Here, D(·) is the function of the image distortion, and x(m, n) represents the value at the position (m, n) in the block B l . In this paper, let D(·) =|·|. Equation (6) illustrates that the optimal prediction direction of an image block B l is the direction with a minimal prediction error. The process of calculating the optimal prediction direction of a block is shown in Figure 4. For an original image, the directional wavelet transform is carried out along all the reference directions θ re f ,i (i = 1, 2, . . . , 15), respectively. Following this, for each image block B l of these transformed images, we are looking for the "tile" with a minimum prediction error, and its corresponding prediction direction is the optimal prediction direction θ * opti . We repeat this process, until all the optimal prediction directions of all the image blocks are determined. For each block, with the optimal prediction direction θ * opti , the predicted process and the updated process of all its elements are carried out via Equations (4) and (5) (6) illustrates that the optimal prediction direction of an image block l B is the direction with a minimal prediction error.
The process of calculating the optimal prediction direction of a block is shown in Figure 4. For an original image, the directional wavelet transform is carried out along all the reference directions , respectively. Following this, for each image block l B of these transformed images, we are looking for the "tile" with a minimum prediction error, and its corresponding prediction direction is the optimal prediction direction * o p ti θ . We repeat this process, until all the optimal prediction directions of all the image blocks are determined. For each block, with the optimal prediction direction * o p ti θ , the predicted process and the updated process of all its elements are carried out via Equations (4) and (5) Compared with the adaptive segmentation method, the proposed direction lifting-based wavelet transform does not need to transmit the "segmentation tree" at all bit rates, and it only needs to transmit a small number of prediction directions as side information. Therefore, the side information of the proposed method is low.

Directional Adaptive Interpolation
For the directional lifting-based wavelet transform, the lifting direction often needs the values at fractional positions, i.e., the tangent of the lifting direction tanθ is not always an integer. In this situation, it is necessary to interpolate those samples at fractional locations. The Sinc interpolation is a very popular interpolation method. However, it only uses the coefficients in the horizontal or vertical direction, which may blur the direction characteristics of the image. In this paper, a directional interpolation method that interpolates the sub-pixels along the local texture directions by using the adjacent integer pixels is adopted. As an example, for the horizontal transform, the process of direction interpolation is shown in Figure 5.
In Figure 5, for different sub-pixel positions, some different integer pixels are used to interpolate the sub-pixel, and the interpolation direction is adapted to the signal properties for interpolation [47]. For example, in order to interpolate a quarter-position coefficient, not only the integer coefficients { }  Compared with the adaptive segmentation method, the proposed direction lifting-based wavelet transform does not need to transmit the "segmentation tree" at all bit rates, and it only needs to transmit a small number of prediction directions as side information. Therefore, the side information of the proposed method is low.

Directional Adaptive Interpolation
For the directional lifting-based wavelet transform, the lifting direction often needs the values at fractional positions, i.e., the tangent of the lifting direction tan θ is not always an integer. In this situation, it is necessary to interpolate those samples at fractional locations. The Sinc interpolation is a very popular interpolation method. However, it only uses the coefficients in the horizontal or vertical direction, which may blur the direction characteristics of the image. In this paper, a directional interpolation method that interpolates the sub-pixels along the local texture directions by using the adjacent integer pixels is adopted. As an example, for the horizontal transform, the process of direction interpolation is shown in Figure 5.
In Figure 5, for different sub-pixel positions, some different integer pixels are used to interpolate the sub-pixel, and the interpolation direction is adapted to the signal properties for interpolation [47]. For example, in order to interpolate a quarter-position coefficient, not only the integer coefficients {c −2 , c −1 , c 0 , c 1 } are used but also the coefficients {c −3 , c 2 } along the predicted direction. The directional interpolation filter should be constructed by these coefficients {c −3 , c −2 , c −1 , c 0 , c 1 , c 2 } on the integer position and make a prediction to the sub-pixel position. The final coefficients of the directional interpolation filter are determined by three kinds of filters: bilinear filter, Telenor 4-tap filter, and 2-tap filter. The directional interpolation process is shown in Figure 6. The coefficients of these filters are listed in Table 1. on the integer position and make a prediction to the sub-pixel position. The final coefficients of the directional interpolation filter are determined by three kinds of filters: bilinear filter, Telenor 4-tap filter, and 2-tap filter. The directional interpolation process is shown in Figure 6. The coefficients of these filters are listed in Table 1.

Boundary Handing and Semidirection Displacement
The lifting operations of different image blocks are performed with their own optimal prediction directions. However, the boundary effect is a problem. In this paper, for the lifting of block boundaries, the adjacent pixels of other blocks are used, unless the boundary pixels are at the edge of the image. As a result, the error can be restrained to some extent and can avoid the boundary effect. on the integer position and make a prediction to the sub-pixel position. The final coefficients of the directional interpolation filter are determined by three kinds of filters: bilinear filter, Telenor 4-tap filter, and 2-tap filter. The directional interpolation process is shown in Figure 6. The coefficients of these filters are listed in Table 1.

Boundary Handing and Semidirection Displacement
The lifting operations of different image blocks are performed with their own optimal prediction directions. However, the boundary effect is a problem. In this paper, for the lifting of block boundaries, the adjacent pixels of other blocks are used, unless the boundary pixels are at the edge of the image. As a result, the error can be restrained to some extent and can avoid the boundary effect.   Figure 6 shows that {c −3 , c 2 } are the inputs of the bilinear filter, {c −2 , c −1 , c 0 , c 1 } are the inputs of the Telenor 4-tap filter, and both the outputs of the bilinear filter and the Telenor 4-tap filter are similar to the inputs of the 2-tap filter. As a result, the outputs of the 2-tap filter are the weighted coefficients of the directional interpolation filter.

Boundary Handing and Semidirection Displacement
The lifting operations of different image blocks are performed with their own optimal prediction directions. However, the boundary effect is a problem. In this paper, for the lifting of block boundaries, the adjacent pixels of other blocks are used, unless the boundary pixels are at the edge of the image. As a result, the error can be restrained to some extent and can avoid the boundary effect.
The processing of the semi-direction is another problem that might need attention. Assuming that a row transform is carried out firstly, the size of the image block is changed, which makes the block direction no longer accurate. To solve this problem, some studies try to reduce the lifting direction of a block by half, before performing another one-dimensional transform with the half angle, such as in [16]. However, the half-angle may not correspond exactly to the optimal prediction direction previously calculated, which may lead to direction deviation. In addition, the deviation will be accumulated with the increment of the wavelet decomposition levels. We adopt the directional interpolation method introduced in Section 2.3 for solving this problem. The rectangular block is interpolated into a square block first, and then, another one-dimensional transform is performed using the previously calculated optimal direction.

A Content-Driven Quadtree Coding Method for Remote Sensing Images
Tree-based coding methods have been widely used in recent years. In 2017, a state-of-the-art coding method based on scanning for remote sensing images, known as quadtree coding with adaptive scanning order and optimized truncation (QCAO), was proposed [42]. The main idea of QCAO is that the transformed image is divided into several blocks, which are encoded by quadtree coding method, respectively. Then, the optimized truncation is performed for each codestream of a block in order to improve the coding performance.
Although the QCAO method is somehow related to the content of the image, it only considers the scanning process within a block. However, the contents of different images are often very different, especially for remote sensing images. In order to improve the coding performance, the scanning order among subbands should be different according to the image content. In addition, for a given subband, the scanning order and scanning methods among and within blocks should be determined while taking the characteristics of the subband into account. Therefore, a content-driven quadtree codec with optimized truncation (CQOT) is introduced, which provides an adaptive scanning method based on the image content and the characteristics of subbands, and then encodes more significant coefficients as much as possible at the same bit rate. The proposed CQOT algorithm is described in detail as follows. For the image compression at the same bit rate, different scanning orders can lead to different coding efficiencies, which makes the scanning order very important. Generally, a 2-D transformed image can be converted into a 1-D coefficient sequence using some traditional scanning strategies, such as zigzag scan, morton scan, or raster scan. However, these scanning methods do not take the image content and the characteristics of the wavelet subband into consideration, which will have an impact on the coding performance. Moreover, remote sensing images usually contain some important details, such as terrain contours, complex object textures, or even small targets, which also need to be preserved as much as possible during the encoding process. Therefore, how to design an effective adaptive scanning method is also an important issue for the compression of remote sensing images.

Content-Driven Subband Scan and Block Scan
In this paper, an adaptive scanning method is presented. This scanning method can provide different scanning orders among subbands based on the image content, and different scanning orders and scanning methods among and within a block according to the characteristics of subbands. The adaptive scanning process based on the image content and the characteristics of subbands (ASCC) is shown in function ASCC (Algorithm 1). Algorithm 1. Function S = ASCC(X, J, b).

Input:
The transformed image X with the DAL-PBT model, the decomposition level J, and the size of block b.

•
Calculate the energy of each subband, and represent it as E λ,θ .
r and c are the number of row and column of the current subband X(l, θ), respectively; X(l, θ)(i, j) represents the value of the current subband at location (i, j).

•
Determine the scanning order among subbands according to the descending order of E l,θ . • Determine the scanning order among blocks in a subband based on the characteristics of this subband.
-If these blocks are in the lowest frequency subband or horizontal subband, the "horizontal z-scan" is adopted. -If these blocks are in the vertical subband, the "vertical z-scan" is performed. -If these blocks are in the diagonal subband, then the scanning order among blocks depends on the energy of horizontal subband and vertical subband of this level. 1 If E l,2 ≥ E l,4 , the "horizontal z-scan" is adopted. 2 If E l,2 < E l,4 , the "vertical z-scan" is adopted.
• Determine the scanning method within a block according to the characteristics of this subband. - If it is in the lowest frequency subband or a horizontal subband, the "horizontal z-scan" is adopted. -If it is in a vertical subband, the "vertical z-scan" is performed. -If it is in a diagonal subband, then the scanning method depends on the horizontal subband and vertical subband of this level. 1 If E l,2 ≥ E l,4 , the "horizontal z-scan" is performed to this block. 2 If E l,2 < E l,4 , the "vertical z-scan" is performed to this subband.

End
Output: The generated 1-D coefficient sequence S by scanning the 2-D transformed image X.
It has been demonstrated that from the ASCC algorithm, the adaptive scanning process is determined by the image content and the characteristics of subbands together. To explain this algorithm clearly, we give an example of the ASCC algorithm. The remote sensing image "Europa3" is chosen as a test image. Suppose the decomposition level is 3, after the wavelet transform with DAL-PBT model, the scanning order among the subbands is determined according to the descending order of energy of these subbands, i.e., LL 3 , LH 3 , HH 3, HL 3, LH 2 , HH 2 , HL 2 , LH 1 , HL 1 , HH 1 . Following this, the scanning order among blocks in each subband is determined based on the characteristics of subbands. Take the subbands LH 1 , HL 1 , and HH 1 , for example: the scanning order among blocks in LH 1 is conducted with the "vertical z-scan" method, and that in HL 1 follows the "horizontal z-scan" method. Because the energy of LH 1 is larger than that of HL 1 , the scanning order among blocks in HH 1 follows the "vertical z-scan" method. Finally, for a given block, its scanning method depends on the characteristics of the subband in which the block is located. The adaptive scanning process of the test image is shown in Figure 7. In Figure 7, take a 4 × 4 block, for example: the scanning result and the corresponding quadtree is provided. The numbers outside the circles represent the establishing order for the quadtree. The numbers in the circles are the coefficient values. Moreover, the green circles and the brown circles refer to the significant coefficients at different scanning thresholds of the quadtree, which will be discussed in detail in Section 3.2.
Remote Sens. 2018, 10, x FOR PEER REVIEW 11 of 25 adaptive scanning process of the test image is shown in Figure 7. In Figure 7, take a 4 × 4 block, for example: the scanning result and the corresponding quadtree is provided. The numbers outside the circles represent the establishing order for the quadtree. The numbers in the circles are the coefficient values. Moreover, the green circles and the brown circles refer to the significant coefficients at different scanning thresholds of the quadtree, which will be discussed in detail in Section 3.2.

Content-Driven Quadtree Codec with Optimized Truncation (CQOT)
The tree-based coding method has been widely used in recent years. Inspired by the most recent development in the quadtree coding technique [42], we propose a CQOT method that combines the quadtree codec with the content-based adaptive scanning method, introduced in the last section. In this framework, for a transformed image with a DAL-PBT model, the ASCC method is adopted first to provide the whole scanning order of the image. Secondly, for each block, a 1-D coefficient sequence is obtained after scanning, and a quadtree is established based on the sequence by comparing the four adjacent coefficients in turn. Then, each quadtree of a block is encoded individually, and the optimized truncation is performed by setting the truncation points during the coding process. The CQOT algorithm is described as Algorithm 2.

Content-Driven Quadtree Codec with Optimized Truncation (CQOT)
The tree-based coding method has been widely used in recent years. Inspired by the most recent development in the quadtree coding technique [42], we propose a CQOT method that combines the quadtree codec with the content-based adaptive scanning method, introduced in the last section. In this framework, for a transformed image with a DAL-PBT model, the ASCC method is adopted first to provide the whole scanning order of the image. Secondly, for each block, a 1-D coefficient sequence is obtained after scanning, and a quadtree is established based on the sequence by comparing the four adjacent coefficients in turn. Then, each quadtree of a block is encoded individually, and the optimized truncation is performed by setting the truncation points during the coding process. The CQOT algorithm is described as Algorithm 2.

Algorithm 2. Function [code, P] = CQOT(X, J, b, T k ).
Input: The transformed image X with the DAL-PBT model, and the wavelet decomposition level J; the size of each block b, T k is the current threshold.

•
For each quadtree Γ of a block, suppose that the total number of coefficients is L 0 , the height of the quadtree is H = log(L 0 )/log 4, and the total number of nodes of the quadtree is Establish the quadtree Γ for each block based on the 1-D coefficient sequence obtained by the scanning order S, respectively.
Calculate the series of optimized truncation points P j k ; The codestream code of the transformed image at the given threshold T k , and the truncation points set p. } For a transformed image, the neighbors of the significant coefficients are usually important, because they often reflect some contour or edge information, which is very beneficial for preserving the details of remote sensing images. The CQOT algorithm scans the neighbors of the previous significant coefficients before other regions are scanned. A simple example is shown in Figure 8. In Figure 8, the green nodes are the significant coefficients at the current threshold, and the brown nodes are the previous significant coefficients. During the process of scanning, the neighbors of those brown nodes are encoded in priority, and then the encoding of green nodes follows.
For a transformed image, the neighbors of the significant coefficients are usually important, because they often reflect some contour or edge information, which is very beneficial for preserving the details of remote sensing images. The CQOT algorithm scans the neighbors of the previous significant coefficients before other regions are scanned. A simple example is shown in Figure 8. In Figure 8, the green nodes are the significant coefficients at the current threshold, and the brown nodes are the previous significant coefficients. During the process of scanning, the neighbors of those brown nodes are encoded in priority, and then the encoding of green nodes follows.  To apply a rate-distortion optimization, the question of how to obtain the valid truncation points is an important issue. Similar to EBCOT, a Post-Compression-Rate-Distortion (PCRD) algorithm is adopted to select the candidate truncation points for each block so that the minimum distortion can be obtained at a given bit rate.
For the jth candidate truncation point of a block, suppose the bit rate is R j and corresponding distortion is D j , then represents the distortion-rate of the block. If P j+1 > P j , the jth candidate truncation point is not selected; then, the distortion-rate of the block is updated with Following this, the P j+1 is compared with P j−1 . If P j+1 > P j−1 , the (j − 1)th truncation point should also not be selected, and we update the distortion-rate of the block with The process is carried out until P j+1 < P j k , then Finally, the strictly decreasing P j k is selected as the series of valid truncation points. The QC algorithm adopted in the CQOT algorithm is described as follows. Algorithm 3 uses a stack to avoid a recursive function. Input: Γ represents a quadtree, and j is the index of a node of the quadtree. T k represents the threshold.

The Overhead of Bits
For the proposed compression method, the bit overhead, including the optimal prediction directions of those image blocks, the scanning order among the subbands and the scanning method of those diagonal subbands are also encoded and sent to the decoder with the bitstream.
Suppose the size of the image is M × N and the block size is P × Q. Then, the number of blocks is MN/PQ, which means that the number of optimum prediction directions that need to be saved is MN/PQ. Since the number of reference prediction directions is 15, one byte can store two prediction directions (record a direction with 4 bit). That is, MN/2PQ bytes are required to store those prediction directions. Moreover, suppose the number of wavelet decomposition levels is J. Then, only 3J + 1 statuses are needed to represent the scanning order among the subbands. Additionally, J statuses are needed to represent the scanning methods of J diagonal subbands. If the wavelet decomposition is inferior or equal to five levels, the (3J + 1) + J status can be stored with no more than 11 bytes (record each status with 4 bit). Usually, the maximum byte overhead of the side information that needs to be sent to the decoder is MN/2PQ + 11.
We take the image in Figure 8 as an example. The size of 'Europa3' is 512 × 512. According to [42], the proper size of the image block is 64 × 64. Then, it requires (512 × 512)/(2 × 64 × 64) = 32 bytes to store those optimal prediction directions. In addition, we record the scanning order among the subbands according to LL J , HL i , HH i , and LH i (i = J, J − 1, . . . . . . , 1), and the scanning order of those diagonal subbands according to HH i (i = J, J − 1, . . . . . . , 1). Based on the algorithm ASCC, the scanning order among the subbands is 1, 4, 2, 3, 7, 5, 6, 9, 8, and 10, and the scanning methods of the diagonal subbands are 11, 11, and 11. (The 11 represents the "vertical z-scan" method, and 12 represents the "horizontal z-scan" method). Therefore, they can be stored with 7 bytes. The bit rate of side information is (32 + 7) × 8/512 × 512 = 0.00119 bpp. It is worth mentioning that, if the size of the image is larger or the bit depth of the image is greater than 8 bit, the proportion of overhead bits is smaller. Moreover, if entropy encoding is used for these overhead bits, the overhead cost would be further reduced. Based on the analysis above, the overhead bits of the proposed compression method are extremely small, which makes it have almost no impact on the coding performance.

Quality Evaluation Index
In order to evaluate the performance of the proposed compression method comprehensively, in this paper, PSNR (peak signal-to-noise ratio, dB), MS-SSIM (multi-scale structural similarity method), and VIF (visual information fidelity) are chosen as the evaluation indexes, respectively.

PSNR
Suppose X is the original image, its size is M × N and its possible maximum value is L. Suppose Y is the reconstructed image, then the PSNR can be represented as

MS-SSIM
The MS-SSIM incorporates the variations of viewing conditions and is more flexible than SSIM [43]. Thus, we adopt the MS-SSIM as an evaluation index in this paper.
Here, l M (x, y) represents the luminance comparison at scale M, and c j (x, y) and s j (x, y) represent the contrast and structure comparison at scale j, respectively. The exponents α M , β j , and γ j are used to adjust the relative importance of different components.

VIF
Based on the theory in [48], the model of VIF can be represented as Here, X(i, j) is the original image, Y(i, j) is the reconstructed image, χ(i, j) is the image captured by humans, and W(i, j) is the noise image of the human vision system (HVS). X(i, j) is decomposed into several subbands (X 1 (i, j), X 2 (i, j), . . . , X M (i, j)) by wavelet transformation. β(i, j) is the value of the window used in the distortion channel estimation. V(i, j) is a stationary additive, zero-mean Gaussian noise. If the variance of W(i, j) is σ W , c = 0.01, and Then

Experiments and Discussion
In this section, some experiments are carried out to evaluate the performance of the proposed compression method. The proposed compression method is compared with other five scan-based methods, which are the mainstream or the state-of-the-art compression methods, in terms of different evaluation indexes at different bit rates.

Space-Borne Images from Different Sensors
To fully verify the performance of the proposed compression method, the test space-borne images come from different sensors. Some test images are from a CCSDS test image set [49], which includes a variety of space imaging instrument data, such as planetary, solar, stellar, earth observations, and radar. In this paper, "lunar", "ocean_2kb1", and "pleiades_portdebouc_pan" are chosen. Moreover, the test image "Miramar NAS" is derived from the USCSIPI Image Database [50]. In addition, two other remote sensing images are chosen. "Pavia" is derived from an image acquired by the Quickbird sensor over Pavia, Italy, which has a resolution of 0.6 m. "Houston" is derived from an image acquired by the World View-2 sensor over Houston, USA in 2013, which has a resolution of 0.5 m. We crop the left upper part of these images with a size of 512 × 512 for comparison under the same condition. The test image set is shown in Figure 8. The data source and bit depth of the test image set are listed in Table 2.

Performance Comparison of the Proposed Compression Method with Other Scan-Based Methods
In the experiments, the 9/7-tap biorthogonal wavelet filter frame is used for the directional adaptive lifting. The size of the image block is set to 64 × 64. All the test remote sensing images are compressed by the proposed compression methods and the other five scan-based methods, respectively. The methods used for comparison are the mainstream or the state-of-the-art compression methods, including CCSDS 122.0-B-1, JPEG2000, BTCA, HAS, and QCAO. The compared results, in terms of PSNR, MS-SSIM, and VIF, are tabulated in Tables 3-5. The results are evaluated at eight bit rates, namely, 0.0313 bpp, 0.0625 bpp, 0.125 bpp, 0.25 bpp, 0.5 bpp, 1 bpp, 2 bpp, and 3 bpp.  Table 4. PSNR is a quality index that evaluates an algorithm under the sense of a means squared error (MSE). In Table 3, the compared results are listed from the perspective of MSE. It can be seen that the coding performance of CCSDS is the worst. The reason for this is that the CCSDS is designed for onboard compression, which considers the complexity of the algorithm more than the coding performance. As a result, the low complexity of the algorithm is at the cost of the coding performance. The PSNR results of JPEG2000 are good, because some complicated models of JPEG2000, such as the context model and rate-distortion optimization, can help to provide a good coding performance. Compared with JPEG2000, BTCA can encode the neighbors of those significant coefficients in priority, which makes it more advantageous for the compression of remote sensing images, because more details can be preserved. Therefore, the PSNR results of BTCA are better than those of JPEG2000 when bit rate is not very high. The HAS algorithm is an improved version of BTCA that takes human vision into consideration, aiming to improve the visual quality of the reconstructed remote sensing images. It can be seen that from Table 3, the PSNR results of HAS are similar to those of BTCA. The QCAO method is the state-of-the-art compression method, which is an efficient image compression method based on quadtree, and it can provide quality, resolution, and position scalability. The PSNR results of QCAO are higher than those of the previously mentioned methods. The proposed compression method can provide the best coding performance in most cases. The reason is that the designed DAL-PBT model can provide a more efficient representation for remote sensing images, and the CQOT method can provide a reasonable scanning method among and within blocks based on image content, before establishing some quadtrees based on the scanning results. The effective image representation and appreciate scanning mode are all beneficial to the improvement of the coding performance. When the bit rate is very high, sometimes the JPEG2000 can provide a higher PSNR. Table 4 compares the results in terms of the MS-SSIM. The MS-SSIM can provide a whole approximation to the perceived image quality from the view of structural similarity. From Table 4, we can see that the proposed compression method is still better than the other five compression methods in most cases, especially at low bit rates. The reason is that the DAL-PBT model can provide a good representation of the high frequency information of remote sensing images, and the CQOT method encodes the brothers of significant coefficients in priority during the coding process, which ensures that more important outlines and details of remote sensing images are preserved. Therefore, the structural similarity between the original image and the reconstructed image is improved. As the bit rate increases, some other compression methods, such as the HAS-based method, can sometimes provide a better result. Take the "Houston", for example; Table 4 shows that the MS-SSIM results of the HAS-based method are 0.9512, 0.9788, 0.9928, and 0.9976 at 0.5 bpp, 1 bpp, 2 bpp, and 3 bpp, respectively. In comparison, the MS-SSIM results of the proposed method are 0.9507, 0.9769, 0.9919, and 0.9973, respectively. The slight performance improvements of the HAS-based method are caused by the visual weighing mask. Nevertheless, for most of the test images, the proposed compression method can still provide the best results at high bit rates. Table 5 lists the comparison of the compression results in terms of VIF. The VIF is derived from a quantification of two mutual information quantities: the mutual information between the input and the output of the HVS channel for the reference image, and the mutual information between the input of the distortion channel and the output of the HVS channel for the reconstructed image. Therefore, VIF is a quality index that evaluates an algorithm from the perspective of human visual perception. Table 5 shows that the proposed compression method is superior to the other five compression methods in most cases. At a few high bit rates, the HAS method can provide a better visual quality for reconstructed images. The reason is that the HAS method is specially designed for image visualization applications. The retina-based visual weighting mask of the HAS aims to preserve more visually sensitive information of the image. Table 6 shows the encoding times(s) of these compression methods. The results are the average encoding time for six test images at three bit rates. These programs are evaluated on a PC with Intel Core i7-3770 CPU @ 3.40 GHz and 32 GB memory. In Table 6, we can see that BTCA is faster than CCSDS with entropy coding. The encoding time of QCAO is slightly shorter than that of BTCA. The reason is that, compared with binary tree, the establishment and scanning of quadtree need fewer comparison times. For HAS, it takes some time to calculate the retina-based visual weighting mask, so its encoding time is longer than that of BTCA. The proposed method is an improved version of QCAO. Compared with QCAO, the proposed method takes a lot of time to calculate the optimum prediction directions of those blocks and directional interpolation. Therefore, the encoding time of the proposed method is longer than that of QCAO, and sometimes even longer than that of JPEG2000. Thus, the proposed method is not applicable to onboard compression, and it can be used for occasions with low real-time requirements and high compression efficiency.

Image
In order to compare the visual quality of reconstructed images obtained by the different compression methods, some experiments are carried out. Here, we provide the visual quality of reconstructed images obtained by the proposed compression method and the mainstream JPEG2000. The comparison results of the image "Lunar" at 0.0625 bpp are shown in Figure 9. It can be seen that the reconstructed image with the proposed method is clearer than that with JPEG2000 in the red box. Figure 10 gets a similar result with the image "Ocean" at 0.125 bpp.
Based on the analysis above, we can conclude that, compared with other scan-based methods, the proposed compression method can provide a better coding performance. One reason is that the designed DAL-PBT model can provide a more efficient representation for remote sensing images. Another reason is that the CQOT method provides a reasonable scanning method among and within subbands/blocks based on image content, which ensures that the important information of the image can be scanned in priority. The effective image representation and the adaptive scanning-based coding method are all beneficial to the improvement of the coding performance.
The comparison results of the image "Lunar" at 0.0625 bpp are shown in Figure 9. It can be seen that the reconstructed image with the proposed method is clearer than that with JPEG2000 in the red box. Figure 10 gets a similar result with the image "Ocean" at 0.125 bpp.
Based on the analysis above, we can conclude that, compared with other scan-based methods, the proposed compression method can provide a better coding performance. One reason is that the designed DAL-PBT model can provide a more efficient representation for remote sensing images. Another reason is that the CQOT method provides a reasonable scanning method among and within subbands/blocks based on image content, which ensures that the important information of the image can be scanned in priority. The effective image representation and the adaptive scanning-based coding method are all beneficial to the improvement of the coding performance.

Conclusions
In this paper, we present a new compression framework for remote sensing images. It combines the directional adaptive lifting partitioned block transform with content-driven quadtree codec with optimized truncation. The image representation phase and coding phase are closely related; the former aims to provide a good representation for more directional information of remote sensing images, and the latter focuses on the adaptive scanning mode of the transformed image blocks for further improving the coding efficiency. The experiments are carried out with the images from different sensors (Galileo Image, NOAA Polar Orbiter, QuickBird, WorldView-2, and Simulated PLEIADES) and covering different scenes (lunar, ocean, city, and suburbs). Experimental results show that the proposed method outperforms the mainstream JPEG2000 up to 2.41 dB in PSNR. Compared with the latest QCAO method, up to 0.55 dB improvement in PSNR is reported. Moreover, some improvements in subjective quality are also observed. The proposed method also supports various progressive transmission modes, such as the quality progressive mode, resolution progressive mode, and position progressive mode. Since the block coding is independent, the proposed method can also support the region of interest (ROI)-based progressive transmission mode.

Conclusions
In this paper, we present a new compression framework for remote sensing images. It combines the directional adaptive lifting partitioned block transform with content-driven quadtree codec with optimized truncation. The image representation phase and coding phase are closely related; the former aims to provide a good representation for more directional information of remote sensing images, and the latter focuses on the adaptive scanning mode of the transformed image blocks for further improving the coding efficiency. The experiments are carried out with the images from different sensors (Galileo Image, NOAA Polar Orbiter, QuickBird, WorldView-2, and Simulated PLEIADES) and covering different scenes (lunar, ocean, city, and suburbs). Experimental results show that the proposed method outperforms the mainstream JPEG2000 up to 2.41 dB in PSNR. Compared with the latest QCAO method, up to 0.55 dB improvement in PSNR is reported. Moreover, some improvements in subjective quality are also observed. The proposed method also supports various progressive transmission modes, such as the quality progressive mode, resolution progressive mode, and position progressive mode. Since the block coding is independent, the proposed method can also support the region of interest (ROI)-based progressive transmission mode.
Multicomponent images, such as multispectral and hyperspectral images, are also very popular in remote sensing applications. Therefore, a natural idea is to apply the proposed method to the compression of multicomponent images. The proposed method employs block coding, which can be combined with parallel computing to improve compression efficiency. Actually, we are now doing this work for the compression of hyperspectral images. In addition, for the compression of hyperspectral images, another issue that should be considered is the bit allocation strategy among those transformed components. In our next work, we will research this.