Discrete Sine Transform-Based Interpolation Filter for Video Compression

Fractional pixel motion compensation in high-efficiency video coding (HEVC) uses an 8-point filter and a 7-point filter, both based on the discrete cosine transform (DCT), for the 1/2-pixel and 1/4-pixel interpolations, respectively. In this paper, discrete sine transform (DST)-based interpolation filters (DST-IFs) are proposed for fractional pixel motion compensation to improve coding efficiency. First, the performance of the DST-IFs using 8-point and 7-point filters for the 1/2-pixel and 1/4-pixel interpolations, respectively, is compared with that of the DCT-based IFs (DCT-IFs) of the same lengths. Then, DST-IFs using 12-point and 11-point filters for the 1/2-pixel and 1/4-pixel interpolations, respectively, are proposed for bi-directional motion compensation only. In HEVC, the 8-point and 7-point DST-IF methods showed average Bjøntegaard Delta (BD)-rate reductions of 0.7% and 0.3% in the random access (RA) and low delay B (LDB) configurations, respectively. The 12-point and 11-point DST-IF methods showed average BD-rate reductions of 1.4% and 1.2% in the RA and LDB configurations, respectively, for the Luma component.


Introduction
The International Telecommunication Union-Telecommunication (ITU-T) Standardization Sector-Video Coding Expert Group (VCEG) and the Moving Picture Expert Group (ISO/IEC MPEG) organized the Joint Collaborative Team on Video Coding (JCT-VC) [1], and they jointly developed the next-generation video-coding standard HEVC/H.265. In high-efficiency video coding (HEVC) [2], motion-compensated prediction (MCP) is a significant video-coding function. MCP reduces the amount of information that must be transmitted to a decoder by exploiting temporal redundancy in video signals [3][4][5][6]. In the MCP, each prediction unit (PU, block) in the encoder finds the best matching block, that is, the block with the least SAD (sum of absolute differences), from the reference pictures in terms of the Lagrangian cost [7]. Using the best matching block, the motion vector that represents the movement from the current block to the best matching block is transmitted to the decoder together with the residual signals, which are the differences between the current block and the best matching block. Since the motion of objects between two pictures is continuous, it is difficult to identify the actual motion vector in block-based motion estimation. In other words, the true displacements of moving objects between pictures are continuous and do not follow the sampling grid of the digitized video sequence. Hence, by utilizing fractional accuracy for motion vectors instead of integer accuracy, the residual error is decreased and the coding efficiency of video compression is increased [4]. Therefore, the use of fractional pixels derived from an interpolation filter for motion-vector searches can improve the precision of the MCP. The fractional interpolation filters in HEVC were carefully designed with several factors in mind, such as coding efficiency, implementation complexity, and visual quality [8].
The sinc function is an ideal interpolation filter in terms of signal processing [9,10]. However, the sinc-interpolation filter is difficult to implement in HEVC because it needs to reference the neighbor pixels from −∞ to ∞. Therefore, interpolation filters of finite length are used, and motion vectors are supported with 1/4-pixel accuracy in HEVC. During the development of HEVC, several interpolation filter techniques were proposed, such as switched interpolation filters with offset (SIFOs) [11], maximum order of interpolation with minimal support (MOMS) [12], one-dimensional directional interpolation filters (DIFs) [13], and DCT-based interpolation filters (DCT-IFs) [14]. As a result, the DCT-IFs were adopted in HEVC for the sake of coding efficiency. The HEVC interpolation filters are designed from the DCT type-II (DCT-II) transform [15][16][17] and reduce the bit-rate by approximately 4.0% for the Luma component and 11.3% for the Chroma components compared with the H.264/AVC (Advanced Video Coding) interpolation filters. The coding efficiency improvements are remarkable for some sequences and can reach a maximum coding gain of 21.7% [18]. The filter lengths of the DCT-II-based interpolation filter (DCT-IF) are 8-point and 7-point for the 1/2-pixel and 1/4-pixel interpolations, respectively. In the present paper, discrete sine transform [19] (DST)-based interpolation filters (DST-IFs) that use different interpolation filter lengths are proposed.
This paper is organized as follows. Section 2 presents the ideal (sinc-based) interpolation filter, the DCT-IF, the proposed DST-IF, and an analysis of the interpolation filters. Section 3 presents the experiment results, and Section 4 concludes the paper.

The Sinc-Based Interpolation Filter
The sinc-based interpolation filter is an ideal interpolation filter in terms of signal processing and its equation is as follows:

$$x(t) = \sum_{k=-\infty}^{\infty} x(kT_s)\,\mathrm{sinc}\!\left(\frac{t - kT_s}{T_s}\right), \tag{1}$$

where the output of the sinc-based interpolation filter is defined as $x(t)$, $t$ represents the locations of the subsamples, $k$ is the integer sample index, and $T_s$ is the sampling period, which is equal to 1. When the sinc-based interpolation filter extends from −∞ to ∞, it is the ideal interpolation filter to reconstruct all the samples. Although the sinc-based interpolation filter is ideal, it is not possible to implement it in HEVC. Since it is impossible to reference all of the neighbor pixels in a picture, the DCT-IF is adopted in HEVC, with filter lengths restricted to 8-point and 7-point for the 1/2-pixel and 1/4-pixel interpolations, respectively.
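As a rough illustration of why a finite filter length is needed, the sketch below truncates the sinc interpolation to a window of integer samples around the target position, with the sampling period set to 1. The function names and the `half_width` parameter are hypothetical and chosen only for this example; this is not the filter design used in HEVC.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi * x) / (pi * x), with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def truncated_sinc_interpolate(samples, t, half_width):
    """Approximate x(t) from integer-position samples (sampling period 1).

    Only the samples within `half_width` positions on each side of t are
    used. The ideal filter would use every sample from -infinity to
    +infinity, which is what makes it impractical in a codec.
    """
    base = math.floor(t)
    lo = max(0, base - half_width + 1)
    hi = min(len(samples) - 1, base + half_width)
    return sum(samples[k] * sinc(t - k) for k in range(lo, hi + 1))
```

With `half_width = 24` the sketch corresponds to a 48-point window. At an integer position the result reduces to the sample itself, while at a 1/2-pixel position a constant signal is only approximately preserved, which reflects the truncation error of a finite window.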

The DCT-II Interpolation Filter (DCT-IF) in HEVC
The DCT-IF [9] in HEVC is designed in a different way, but in this paper it can be derived easily from the following forward and inverse DCT-II:

$$X(k) = \sqrt{\frac{2}{N}}\, c_k \sum_{n=0}^{N-1} x(n) \cos\!\left(\frac{(2n+1)\pi k}{2N}\right), \tag{2}$$

$$x(n) = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} c_k\, X(k) \cos\!\left(\frac{(2n+1)\pi k}{2N}\right), \tag{3}$$

where $X(k)$ in Equation (2) are the DCT-II coefficients, the input pixels $x(n)$ are recovered by the inverse DCT-II (IDCT-II) in Equation (3), and $c_k$ is $1/\sqrt{2}$ at $k = 0$ and 1 at $k \neq 0$. The substitution of Equation (2) into Equation (3) results in the following DCT-IF equation:

$$x(n) = \frac{2}{N} \sum_{m=0}^{N-1} x(m) \sum_{k=0}^{N-1} c_k^2 \cos\!\left(\frac{(2m+1)\pi k}{2N}\right) \cos\!\left(\frac{(2n+1)\pi k}{2N}\right). \tag{4}$$

For example, the 1/2-pixel interpolation filter, when n = 3.5, in the 8-point DCT (N = 8) is derived as a linear combination of the cosine coefficients and x(m), m = 0, 1, ..., 7. Similarly, the 1/4-pixel interpolation filter, when n = 3.25, in the 7-point DCT (N = 7) is derived as a linear combination of the cosine coefficients and x(m), m = 0, 1, ..., 6. Lastly, the DCT-IFs for the 1/2-pixel and 1/4-pixel interpolations are shown as integer numbers in Table 1. The filter-coefficient order of the 3/4-pixel interpolation filter is the reverse of the filter-coefficient order of the 1/4-pixel interpolation filter.

Figure 1 is an example of the integer- and fractional-pixel positions in the Luma motion compensation. In Figure 1, the capital letters (A0 to A7) indicate the integer-pixel positions, the small letter b0 is the 1/2-pixel position, and a0 and c0 are the 1/4-pixel and 3/4-pixel positions, respectively. For example, using the DCT-IF, b0 and a0 are each calculated as a weighted sum of the integer pixels with the coefficients from Table 1, followed by a bit-wise right shift (">>"). The computation of a0 has the same form as that of b0, and c0 uses the coefficients of a0 in reverse order.
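The substitution of the forward DCT-II into the inverse DCT-II can be sketched numerically as follows. The code assumes the orthonormal DCT-II with c_k = 1/√2 at k = 0 and 1 otherwise, and evaluates the inverse transform at a fractional position n. It yields real-valued weights only; the scaling and rounding that lead to the integer values of Table 1 are not reproduced here, and the function name is hypothetical.

```python
import math


def dct_if_weights(num_taps, position):
    """Real-valued weights w(m) with x(position) ~= sum_m w(m) * x(m).

    Obtained by inserting the forward DCT-II into the inverse DCT-II and
    evaluating the inverse transform at a (possibly fractional) position.
    """
    weights = []
    for m in range(num_taps):
        acc = 0.0
        for k in range(num_taps):
            ck_squared = 0.5 if k == 0 else 1.0  # c_k^2
            acc += (ck_squared
                    * math.cos((2 * m + 1) * math.pi * k / (2 * num_taps))
                    * math.cos((2 * position + 1) * math.pi * k / (2 * num_taps)))
        weights.append(2.0 / num_taps * acc)
    return weights


half_pel = dct_if_weights(8, 3.5)      # 1/2-pixel position, N = 8
quarter_pel = dct_if_weights(7, 3.25)  # 1/4-pixel position, N = 7
```

Two properties are useful as a sanity check of such a sketch: at an integer position the weights collapse to a single 1 at that position, and at any position the weights sum to 1, so a flat signal is preserved.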
Symmetry 2017, 9, 257
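The integer computation of the fractional samples described above can be sketched as follows. The coefficient values are the widely published HEVC Luma interpolation taps and are assumed here to correspond to Table 1. The right shift by 6 is an assumption that follows from the coefficients summing to 64; rounding offsets and intermediate bit-depth handling of a real codec are left out, and the function names are hypothetical.

```python
# Assumed to correspond to Table 1: HEVC Luma DCT-IF taps (each set sums to 64).
HALF_PEL = (-1, 4, -11, 40, 40, -11, 4, -1)  # 8-point, 1/2-pixel
QUARTER_PEL = (-1, 4, -10, 58, 17, -5, 1)    # 7-point, 1/4-pixel


def interpolate(pixels, coeffs, shift=6):
    """Weighted sum of integer pixels followed by a bit-wise right shift."""
    acc = sum(c * p for c, p in zip(coeffs, pixels))
    return acc >> shift


def fractional_samples(a):
    """Return (a0, b0, c0) from the integer pixels A0..A7."""
    if len(a) != 8:
        raise ValueError("expected the eight integer pixels A0..A7")
    b0 = interpolate(a, HALF_PEL)                # 1/2-pixel position
    a0 = interpolate(a[:7], QUARTER_PEL)         # 1/4-pixel, uses A0..A6
    c0 = interpolate(a[1:], QUARTER_PEL[::-1])   # 3/4-pixel, reversed taps on A1..A7
    return a0, b0, c0
```

For a flat row of pixels all three fractional samples equal the pixel value, since each coefficient set sums to 64 and the shift divides by 64.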

The Proposed DST-VII Interpolation Filter (DST-IF)
The DST-IF for HEVC can easily be designed in this paper from the forward/inverse DST-VII. The DST-VII and inverse DST-VII are defined as follows:

$$X(k) = \frac{2}{\sqrt{2N+1}} \sum_{n=0}^{N-1} x(n) \sin\!\left(\frac{(2k+1)(n+1)\pi}{2N+1}\right), \tag{7}$$

$$x(n) = \frac{2}{\sqrt{2N+1}} \sum_{k=0}^{N-1} X(k) \sin\!\left(\frac{(2k+1)(n+1)\pi}{2N+1}\right), \tag{8}$$

where $X(k)$ is the DST-VII coefficient and $x(n)$ represents the input pixels. The substitution of Equation (7) into Equation (8) results in the following DST-IF equation:

$$x(n) = \frac{4}{2N+1} \sum_{m=0}^{N-1} x(m) \sum_{k=0}^{N-1} \sin\!\left(\frac{(2k+1)(m+1)\pi}{2N+1}\right) \sin\!\left(\frac{(2k+1)(n+1)\pi}{2N+1}\right). \tag{9}$$

In a way similar to that used to obtain the DCT-IF coefficients, the DST-IF is derived from Equation (9). For example, the 1/2-pixel interpolation filter, when n = 3.5, in the 8-point DST (N = 8) is derived as a linear combination of the sine coefficients and x(m), m = 0, 1, ..., 7. Similarly, the 1/4-pixel interpolation filter, when n = 3.25, in the 7-point DST (N = 7) is derived as a linear combination of the sine coefficients and x(m), m = 0, 1, ..., 6. Lastly, the DST-IFs for the 1/2-pixel and 1/4-pixel interpolations are shown in Table 2. The filter-coefficient order of the 3/4-pixel interpolation filter is the reverse of the filter-coefficient order of the 1/4-pixel interpolation filter [20]. In the given example, the 8-point and 7-point DST-IFs were derived, but M-point and (M-1)-point DST-IFs, where M > 8, can be easily derived in a similar way for high-resolution sequences to improve the video-coding efficiency.
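The same substitution for the DST-VII can be sketched numerically as follows. The code assumes the standard orthonormal DST-VII, with basis 2/√(2N+1) · sin((2k+1)(n+1)π/(2N+1)), and evaluates the inverse transform at a fractional position n. It produces real-valued weights only; the conversion to the integer coefficients of the tables is not reproduced, and the function name is hypothetical.

```python
import math


def dst7_if_weights(num_taps, position):
    """Real-valued weights w(m) with x(position) ~= sum_m w(m) * x(m).

    Obtained by inserting the forward DST-VII into the inverse DST-VII and
    evaluating the inverse transform at a (possibly fractional) position.
    """
    denom = 2 * num_taps + 1
    weights = []
    for m in range(num_taps):
        acc = 0.0
        for k in range(num_taps):
            acc += (math.sin((2 * k + 1) * (m + 1) * math.pi / denom)
                    * math.sin((2 * k + 1) * (position + 1) * math.pi / denom))
        weights.append(4.0 / denom * acc)
    return weights


dst_half_pel_8 = dst7_if_weights(8, 3.5)      # N = 8, n = 3.5
dst_quarter_pel_7 = dst7_if_weights(7, 3.25)  # N = 7, n = 3.25
dst_half_pel_12 = dst7_if_weights(12, 5.5)    # N = 12, n = 5.5
```

Because the number of taps is a parameter, longer filters such as the 12-point and 11-point pairs follow from the same function by changing N and n. As with the DCT case, the weights at an integer position reduce to a single 1 at that position.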
The 12-point and 11-point DST-IFs for the 1/2-pixel and 1/4-pixel interpolations are shown in Table 3. They are derived in this paper from Equation (9), where N = 12 and n = 5.5, and N = 11 and n = 5.25, respectively. The 12-point and 11-point DCT-IFs in Table 4 were derived in a similar way.

Analysis of the Interpolation Filters
Figure 2 shows the magnitude responses of the 1/2-pixel interpolation filters. On the x-axis, the discrete-time frequency ω is normalized to the range of 0 to 1, where 1 corresponds to π radians. The y-axis is the magnitude response.
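A magnitude response of this kind can be computed directly from the filter taps. The sketch below evaluates |H(e^{jω})| on a grid of normalized frequencies, where 1 corresponds to π. As an example input it uses the widely published HEVC 8-point 1/2-pixel taps divided by 64; the function names and the number of grid points are choices made for this illustration only.

```python
import cmath
import math


def magnitude_response(taps, omega):
    """|H(e^{j*omega})| of an FIR filter, omega in radians per sample."""
    return abs(sum(h * cmath.exp(-1j * omega * n) for n, h in enumerate(taps)))


def response_curve(taps, num_points=65):
    """List of (normalized frequency in [0, 1], magnitude); 1 maps to pi."""
    step = 1.0 / (num_points - 1)
    return [(i * step, magnitude_response(taps, math.pi * i * step))
            for i in range(num_points)]


# HEVC 8-point 1/2-pixel DCT-IF taps, normalized so that the DC gain is 1.
hevc_half_pel = [c / 64 for c in (-1, 4, -11, 40, 40, -11, 4, -1)]
curve = response_curve(hevc_half_pel)
```

Passing a different tap list, for instance a longer filter, gives a curve that can be compared point by point with this one.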

Experimental Conditions
The proposed DST-IF was implemented in the HEVC reference software, HM (HEVC test Model)-16.6 [21], according to the HEVC common-test conditions. Table 5 shows the test sequences, where the sequences of classes B, C, D, and E have resolutions of 1080p, 832 × 480, 416 × 240, and 720p, respectively. The proposed method was tested with quantization-parameter (QP) values of 22, 27, 32, and 37. Tables 6 and 7 show the test sequences and the BD-rate gains compared with HM-16.6 for the Luma component in the low delay B (LDB), low delay P (LDP), and RA configurations. The random access configuration has hierarchical B pictures (IBBBBBBBP) with a GOP (group of pictures) size of eight. The low delay structure is composed of a first I (intra) picture followed by P (predictive) pictures (IPPPPP…). The P pictures in the low delay structure are GPBs (generalized P and B pictures), in which the P pictures are replaced by B pictures having the same two reference pictures.

A negative BD-rate indicates a bit-saving of the proposed method relative to HM-16.6 at the same PSNR (peak signal-to-noise ratio) [22].
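For readers unfamiliar with the metric, the sketch below shows one common way of computing a BD-rate from four rate/PSNR points per codec: a cubic is fitted through log10(bit-rate) as a function of PSNR, both fits are integrated over the overlapping PSNR range, and the average difference is converted to a percentage. This is the classic cubic-fit variant and is an assumption; the tool used for the experiments in this paper may differ in detail. All function names are hypothetical.

```python
import math


def _lagrange(xs, ys, x):
    """Evaluate the polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def _integrate_cubic(xs, ys, lo, hi):
    """Integrate the cubic through four points; Simpson's rule is exact here."""
    mid = 0.5 * (lo + hi)
    return (hi - lo) / 6.0 * (_lagrange(xs, ys, lo)
                              + 4.0 * _lagrange(xs, ys, mid)
                              + _lagrange(xs, ys, hi))


def bd_rate(anchor_rate, anchor_psnr, test_rate, test_psnr):
    """Average bit-rate difference in percent at equal PSNR (negative = saving)."""
    if not all(len(v) == 4 for v in (anchor_rate, anchor_psnr, test_rate, test_psnr)):
        raise ValueError("this sketch expects exactly four points per curve")
    log_anchor = [math.log10(r) for r in anchor_rate]
    log_test = [math.log10(r) for r in test_rate]
    lo = max(min(anchor_psnr), min(test_psnr))
    hi = min(max(anchor_psnr), max(test_psnr))
    if hi <= lo:
        raise ValueError("the PSNR ranges of the two curves do not overlap")
    avg_diff = (_integrate_cubic(test_psnr, log_test, lo, hi)
                - _integrate_cubic(anchor_psnr, log_anchor, lo, hi)) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0
```

The four points per curve match the four QP values used in the test conditions. A codec that needs 10% fewer bits than the anchor at every PSNR level yields a BD-rate of about -10%.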

Experimental Results
HM-16.6 uses an 8-point filter and a 7-point filter for the 1/2-pixel and 1/4-pixel interpolations, respectively. From Table 6, with the 8-point DST-IF for the 1/2-pixel position and the 7-point DST-IF for the 1/4-pixel position, average bit-savings (BD-rate gains) of 0.6% and 0.1% were achieved in the RA and LDB configurations, respectively. In particular, BQSquare in Class D achieved a bit-saving of up to 5.2% in the RA configuration. However, the average bit-saving decreased by 1.6% in the LDP configuration. In Table 6, the 12-point and 11-point DST-IFs applied to HM-16.6 also showed bit-savings in the RA and LDB configurations and a bit increase (BD-rate loss) in the LDP configuration. Class E sequences were not tested in the RA configuration because they are not part of the HEVC test conditions for that configuration; those sequences are marked as x in Table 6.
Interestingly, the DST-IFs show bit increments (BD-rate loss) in the LDP configuration, while they show bit-savings in the RA and LDB configurations. This is because the backward (uni-directional) prediction using the decoded past pictures provides a less complete motion-compensated block than the bi-directional prediction, which averages the pixel values of two different blocks derived by the forward and backward motion compensations with subsample interpolation. Therefore, the proposed 12-point and 11-point DST-IFs are applied only to the bi-directional motion-compensated blocks. The 12-point and 11-point DST-IFs, which have almost the same filter coefficients as the 12-point and 11-point DCT-IFs, are effective for the bi-directional prediction. Table 7 shows the DST-IF bit-saving results when the filters are applied only to the bi-directional prediction. In the RA and LDB configurations, the 8-point and 7-point DST-IFs achieved bit-savings of 0.7% and 0.3% compared with HM-16.6, respectively, and the 12-point and 11-point DST-IFs achieved bit-savings of 1.4% and 1.2% compared with HM-16.6, respectively. Table 7 also shows the results of the 12-point and 11-point DCT-IFs, which achieved bit-savings of 0.6% and 0.7% in the LDB and RA configurations compared with HM-16.6, respectively.
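The restriction to bi-directional blocks can be sketched as a simple selection rule followed by the averaging of the two motion-compensated blocks. The function names, the return format, the use of the existing 8-point and 7-point DCT-IFs for uni-directional blocks, and the rounding offset in the average are assumptions made for this illustration and are not taken from the HM implementation.

```python
def select_interpolation_filter(is_bi_predicted):
    """Return (filter family, 1/2-pixel taps, 1/4-pixel taps) for a block.

    Assumption: uni-directional blocks keep the existing HEVC DCT-IFs.
    """
    if is_bi_predicted:
        return ("DST-IF", 12, 11)
    return ("DCT-IF", 8, 7)


def bi_predict(block0, block1):
    """Average two motion-compensated blocks pixel by pixel.

    The +1 rounding offset before the shift is an assumption of this sketch.
    """
    return [[(p0 + p1 + 1) >> 1 for p0, p1 in zip(row0, row1)]
            for row0, row1 in zip(block0, block1)]
```

The averaging step is the part that distinguishes bi-directional prediction in the explanation above: each output pixel combines two independently interpolated predictions.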
Table 8 shows the computational complexity results. As the 12-point and 11-point DST-IFs reference four additional neighbor pixels compared with the 8-point and 7-point filters in HEVC, when both the uni-directional and bi-directional predictions were applied, the computational complexities in the encoding process and the decoding process were increased by 118% and 113%, respectively. However, the 12-point and 11-point DST-IFs applied only to the bi-directional prediction increased the computational complexity in the encoding process by 104% and in the decoding process by 107%. The computational complexity of the 12-point and 11-point DCT-IFs is almost the same as that of the 12-point and 11-point DST-IFs. Even though the complexity of the proposed 12-point and 11-point DST-IFs is increased compared with that of the existing 8-point and 7-point DCT-IFs in HEVC, the proposed method gives better bit-saving results than the existing method. As an alternative method, one interpolation filter was chosen between the DCT-IF and the DST-IF using coding unit-level rate-distortion optimization [23], but the results were worse than those of Tables 6 and 7 because one signaling bit is needed to indicate to the decoder which interpolation filter is used. An alternative interpolation method that selects between the DCT-IF and the DST-IF at the Coding Tree Unit (CTU) level will be explored in a future study.

Conclusions
The experiment results show that the proposed DST-IF pairs achieved coding gains in the RA and LDB configurations. However, because the bit-rate increased in the LDP configuration, which uses the uni-directional prediction, the proposed DST-IF method was applied only to the bi-directional prediction. Overall, the proposed 12-point and 11-point DST-IFs achieved average BD-rate reductions of 1.4% and 1.2% compared with the 8-point and 7-point DCT-IFs in the RA and LDB configurations, respectively, for the Luma component. We believe this method can be considered for the next video coding standard.

Figure 1. Fractional pixel position in Luma motion compensation.
Figure 2 illustrates the magnitude-response graphs of five interpolation filters reconstructing the 1/2-pixel position. The sinc function, which is assumed to be the ideal interpolation filter, is designed as a 48-point interpolation filter and is represented by a dotted line. The 48-point sinc interpolation filter has a relatively high frequency response even around ω = 0.9π compared with the other interpolation filters (the 8-point DCT-IF, 8-point DST-IF, 12-point DCT-IF, and 12-point DST-IF), and it exhibits many more ripples at high frequencies. In the low frequency range, where ω < 0.5π, all five interpolation filters have similar responses; the filters differ in their high frequency responses. Comparing the 8-point DCT-IF, drawn as a gray line, with the 8-point DST-IF, drawn as a black line, the 8-point DST-IF has a relatively high frequency response around ω = 0.9π, even though the low frequency responses are quite similar. The 12-point DST-IF and the 12-point DCT-IF are represented by a green line and a red line, respectively.

In this paper, DST-IF pairs of 12-point and 11-point filter lengths are proposed to achieve a bit-rate reduction compared with the 8-point and 7-point DCT-IFs. Interestingly, the 12-point DST-IF and the 12-point DCT-IF have similar high frequency responses because the derived 12-point DST-IF and 12-point DCT-IF have almost the same interpolation filter coefficients, as shown in Tables 3 and 4.

Figure 2. Magnitude responses of interpolation filters for the 1/2-pixel position in the Luma component.

Table 5. Test sequences used in HEVC common-test conditions.

Table 6. DST-IF bit-saving results applied to uni- and bi-directional prediction.

Table 7. DST-IF bit-saving results applied to bi-directional prediction.

Table 8. Results of the computational complexity of the proposed method in the low delay B (LDB) configuration.