Article

An Analytic Transform Kernel Derivation Method for Video Codecs

Department of Information and Communication Engineering, Chosun University, Gwangju 61452, Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(19), 9280; https://doi.org/10.3390/app11199280
Submission received: 25 August 2021 / Revised: 29 September 2021 / Accepted: 29 September 2021 / Published: 6 October 2021

Abstract:
In the standardization of versatile video coding (VVC), discrete cosine transform (DCT)-2, discrete sine transform (DST)-7, and DCT-8 are regarded as the primary transform kernels. However, DST-4 and DCT-4 can also be considered as transform kernels in place of DST-7 and DCT-8 owing to their effectiveness on smaller-resolution test sequences. To implement these transform kernels of different block sizes, a considerable amount of memory has to be allocated. Moreover, the memory consumed to store transform kernels of different block sizes is regarded as a major issue in video coding standardization. To address this problem, a common sparse unified matrix concept is introduced in this study, from which a transform kernel matrix of any block size can be obtained after some mathematical operations. The proposed common sparse unified matrix saves approximately 80% of the static memory by storing only a few transform kernel elements for DCT-2, DST-7, and DCT-8. The full set of required transform kernels is derived from the stored kernel elements together with generated unit-element matrices and a permutation matrix. Static memory is required for only 1648 elements instead of 8180 elements, each with 8-bit precision. The defined common sparse unified matrix is composed of two parts: a unified DST-3 matrix and a grouped DST-7 matrix. The unified DST-3 matrix is used to derive DCT-2 transform kernels of different points, and the grouped DST-7 matrix is used to derive DST-7 and DCT-8 transform kernels of different points. A new grouping concept is introduced, which captures the relationship between rows of DST-7 transform kernels of various block sizes. The proposed grouping concept supports the fast DST-7 algorithm through the "one group, one feature" principle. The simulation was conducted using the VTM-3.0 reference software under common test conditions.
The simulation result of the all intra (AI) configuration is Y = 0.00%, U = −0.02%, and V = 0.00%, with an encoding time of 100% and a decoding time of 100%. Similarly, the simulation results of the random access (RA) configuration are Y = −0.01%, U = 0.09%, and V = 0.06%, with encoding and decoding times of 101% and 100%, respectively. The simulation result of the low delay B (LDB) configuration is Y = 0.01%, U = 0.08%, and V = −0.27%, with encoding and decoding times of 101% and 100%, respectively.

1. Introduction

Various standardization works for video compression have been performed, including H.261, H.262, H.263 [1], H.264/advanced video coding [2], and H.265/high-efficiency video coding (HEVC) [3]. Recently, versatile video coding (VVC) [4], which aims to provide a significant improvement in compression performance over the existing HEVC standard and to aid the deployment of higher-quality video services and emerging applications, such as 360° omnidirectional immersive multimedia and high-dynamic-range video, has been standardized. In the standardization of VVC, discrete cosine transform-2 (DCT-2), discrete sine transform-7 (DST-7), and DCT-8 are regarded as three vital primary transform kernels [5,6,7,8,9], which are also termed the multiple transform set (MTS) [10,11]. The benefits of MTS are twofold: the transform set is enlarged when necessary, which reduces distortion in the R-D trade-off because complex residuals can exploit the multiple transforms, and it is reduced when possible, which reduces the bitrate in the R-D trade-off by avoiding wasted bits for signaling the transforms [12]. Similarly, DST-7 approximates the optimal transform better than DCT-2 along the prediction direction, that is, for intra prediction residuals [11].
Along with these three transform kernels, DST-4 and DCT-4 can also be regarded as the replacement of DST-7 and DCT-8, respectively, because they exhibit similar kernel behavior and show better coding efficiency for smaller resolution sequences [13].
The transform kernel elements used are derived based on the following equations.
DCT-2:
$$C_2(n,k) = \sqrt{\frac{2}{N}}\,\varepsilon_k \cos\!\left(\frac{\pi(2n+1)k}{2N}\right) \tag{1}$$
where $\varepsilon_k = \frac{1}{\sqrt{2}}$ if $k = 0$, and $\varepsilon_k = 1$ otherwise.
DCT-4:
$$C_4(n,k) = \sqrt{\frac{2}{N}} \cos\!\left(\frac{\pi(2n+1)(2k+1)}{4N}\right) \tag{2}$$
DCT-8:
$$C_8(n,k) = \frac{2}{\sqrt{2N+1}} \cos\!\left(\frac{\pi(2n+1)(2k+1)}{4N+2}\right) \tag{3}$$
DST-4:
$$S_4(n,k) = \sqrt{\frac{2}{N}}\,\varepsilon_k \sin\!\left(\frac{\pi(2n+1)(k+1)}{2N}\right) \tag{4}$$
where $\varepsilon_k = \frac{1}{\sqrt{2}}$ if $k = N-1$, and $\varepsilon_k = 1$ otherwise.
DST-7:
$$S_7(n,k) = \frac{2}{\sqrt{2N+1}} \sin\!\left(\frac{\pi(2k+1)(n+1)}{2N+1}\right) \tag{5}$$
where $N$ represents the block size.
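As a quick numerical check of these definitions, the sketch below evaluates Equations (1), (3), and (5) with NumPy and verifies that each resulting kernel matrix is orthonormal. The indexing follows the $(n, k)$ convention written above; the function names are illustrative, not part of any reference software.

```python
import numpy as np

def dct2(N):
    # Eq. (1): element (n, k), with eps_k = 1/sqrt(2) for k = 0.
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    eps = np.where(k == 0, 1 / np.sqrt(2), 1.0)
    return np.sqrt(2 / N) * eps * np.cos(np.pi * (2 * n + 1) * k / (2 * N))

def dct8(N):
    # Eq. (3).
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    return 2 / np.sqrt(2 * N + 1) * np.cos(np.pi * (2 * n + 1) * (2 * k + 1) / (4 * N + 2))

def dst7(N):
    # Eq. (5).
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    return 2 / np.sqrt(2 * N + 1) * np.sin(np.pi * (2 * k + 1) * (n + 1) / (2 * N + 1))

# Each kernel forms an orthonormal basis for every MTS block size.
for N in (4, 8, 16, 32):
    for kern in (dct2, dct8, dst7):
        T = kern(N)
        assert np.allclose(T @ T.T, np.eye(N))
```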
Several design aspects of HEVC transform coding are inherited in the VVC codec. In addition to the conventional DCT-2, alternate transform types such as DST-7 and DCT-8 are also adopted in VVC. These are generally referred to as the primary transform because they are applied first to the predicted residual pixels, in contrast to the secondary transform [14] in VVC. The size of DCT-2 ranges from 4-point to 64-point, whereas that of DST-7 and DCT-8 ranges from 4-point to 32-point. The kernel elements defined in VVC are composed of 8-bit signed integer values. The integer transform kernels defined in VVC are derived by scaling the floating-point transform kernels by $64\sqrt{N}$, where N represents the transform size [14]. To align the worst-case number of multiplications per coefficient with HEVC, for the 64-point DCT-2 and the 32-point DST-7/DCT-8, only the first 32 and 16 low-frequency coefficients, respectively, are kept, and the higher-frequency coefficients are zeroed out, which is also considered in last-coefficient-position coding and coefficient-group scanning [15].
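As an illustration of this integer scaling, the sketch below rounds the floating-point 4-point DCT-2 kernel of Equation (1) after scaling by $64\sqrt{N}$. Note that plain rounding only approximates the hand-tuned standard matrices: for example, HEVC/VVC use the tuned values 83 and 36 where direct rounding yields 84 and 35, so the values here are illustrative rather than normative.

```python
import numpy as np

def dct2(N):
    # Floating-point DCT-2 kernel of Eq. (1): element (n, k).
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    eps = np.where(k == 0, 1 / np.sqrt(2), 1.0)
    return np.sqrt(2 / N) * eps * np.cos(np.pi * (2 * n + 1) * k / (2 * N))

# Scale by 64*sqrt(N) and round; all magnitudes fit in an 8-bit signed integer.
N = 4
int_kernel = np.round(64 * np.sqrt(N) * dct2(N)).astype(np.int64)

# The DC entry is exactly 64, matching the standard matrices; the rounded
# value at (0, 1) is 84, whereas HEVC/VVC use the tuned value 83 there.
```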
All the kernel elements are stored in the reference software VTM-3.0 in an 8-bit representation. A single 64-point DCT-2 kernel matrix is newly defined in VVC, and the other sizes of DCT-2 are kept unchanged from HEVC. Individual sets of 4-, 8-, 16-, and 32-point DST-7 and DCT-8 kernel matrices are also stored in the reference software [16], adding up to a total storage of 8180 elements, which is a prime concern for memory. Unlike in conventional video codecs such as HEVC, the partial butterfly structure for the DST-7 and DCT-8 transforms is avoided in the transform process to be compatible with the virtual pipeline process and low delay in hardware design, which was thoroughly investigated and agreed upon in the standardization work of VVC. Instead, the fast transform method is designed to support the dual implementation of matrix multiplication and butterfly operation [17]. Thus, all kernel elements for the matrix multiplication operation should be stored in the codec devices or software. However, the numbers of multiplications and additions are significantly reduced, even though matrix operations are performed with matrix multiplications, by utilizing several features of the DST-7 and DCT-8 kernels [17]. Table 1 shows the use of MTS transform kernels based on the prediction modes and block sizes in VVC.
Table 1 shows the use of the multiple transform set (MTS) for intra and inter prediction. For intra prediction, various prediction tools have been introduced in VVC, namely intra-sub-partition (ISP) and normal prediction. For ISP, combinations of the DCT-2 and DST-7 transform kernels are used as the horizontal and vertical transform types for transform block sizes of 4 × 4 up to 16 × 16. For the normal intra-prediction modes, combinations of the DCT-2 and DST-7 transform kernel sets are used, based on the prediction mode and on a comparison of the width and height of the block sizes, ranging from 4 to 16 pixels in accordance with the spec text [15]. The 16-point or 32-point DST-7 is applied to transform blocks with non-zero coefficients in either the horizontal or vertical direction. For inter prediction using the sub-block transform (SBT) [19] as a prediction tool, combinations of DST-7 and DCT-8 are applied on the partitioned transform unit block for block sizes up to 32 pixels, depending on the cu_sbt_horizontal_flag and cu_sbt_vertical_flag [15] in VTM-3.0.
Despite significant coding gain improvement owing to the usage of MTS for different block sizes and kernels, memory storage of kernel elements is regarded as one of the crucial issues for the standardization of VVC. In MTS, DST-4 and DCT-4 can be used as a replacement for the DST-7 and DCT-8 transform kernels, respectively, owing to their relatively easier kernel derivation by sub-sampling of even and odd rows of higher-order DCT-2 transform kernels [20].
DST-4/DCT-4 provides significant improvement for lower-resolution test sequences, whereas the use of DST-7 and DCT-8 provides significant results for test sequences of all resolutions; however, the derivation of different-point DST-7 and DCT-8 transform kernels cannot be achieved by subsampling the larger block size DCT-2 transform kernels. Hence, separate memory must be allocated for the storage of these transform kernels, which results in the memory issue in VVC. The total number of elements for which memory has to be allocated is 8180, each with 8-bit precision. To address this scenario, various proposals have been presented in VVC.
In [21], a 64 × 64 compound orthonormal transform (COT) matrix is introduced, which comprises two aspects: (1) 4-point and 8-point DST-4/DCT-4 transform kernels replacing the DST-7/DCT-8 transform kernels, and (2) 16-point and 32-point DST-7/DCT-8 transform kernels embedded into the 64-point DCT-2. It is implemented using a single 64 × 64 matrix, which has a total of 8180 elements and provides all three types of transform kernels. In [21], the 2-, 4-, 8-, 16-, and 32-point DCT-2 can be extracted from the even rows of the 64-point DCT-2, whereas the 4- and 8-point DST-4/DCT-4 and the 16- and 32-point DST-7/DCT-8 can be extracted from its odd rows. The metrics of the operation counts and minimum bit-precision remain the same as in VTM-3.0. With this approach of using a single 64 × 64 matrix, 33% of the transform storage, i.e., 2.7 kilobytes of memory, can be saved compared to VTM-3.0. Although the proposed method provides good gains, it fails to provide the precise 64-point DCT-2 because some of the kernel values of DST-7/DCT-8 are embedded in some rows of the 64-point DCT-2. Owing to the mismatch of the proposed 64-point DCT-2 transform kernel with the original DCT-2 transform kernel, the coding performance improvement is not significant when only the DCT-2 transform kernel is used. Moreover, the fast computation algorithm [22] for the 64-point DCT-2 transform cannot be used. Similarly, the number of transform kernel matrices also increases to five, that is, DCT-2, DST-7, DCT-8, DST-4, and DCT-4, even though the aim was to use only three transform kernels.
In [23], a 64-point unified matrix was proposed. It is implemented using a 64 × 64 unified matrix with a total of 8180 elements and provides all three transform types with all block sizes from 4 × 4 to 64 × 64 by sampling a subset of coefficients and basis vectors of larger transforms. This method is designed with an 8-bit representation. Using simple calculations on the proposed matrix, different kernel types with different sizes were derived. This method stores 131,072 bits of kernel elements, which amounts to 16 kilobytes of memory and is 19% less than the total memory used in VTM-3.0. Experimental results show that no gain is observed in the all intra (AI) configuration for higher-resolution test sequences, that is, classes A1 and A2, but some gain is observed for lower-resolution test sequences, that is, classes B, C, D, and E. For the random access (RA) and low delay B (LDB) configurations, the overall result shows no gain. The losses are caused by the mismatch between the original and derived DST-7 and DCT-8 transform kernels.
Similarly, in [24], the adjustment stage concept for memory reduction was introduced. The adjustment stages are defined as sparse block-band orthogonal matrices and are similar to filters with small numbers of taps. It was proposed to approximate different sizes and types of cosine and sine transforms, such as DCT-5, DCT-8, DST-1, and DST-7, by applying different adjustment matrices (stages). The major drawback of this method is that the number of multiplications increases through these adjustment stages, and the normalization values also differ for different transform kernels, which results in a mismatch with the original transform kernels. A transform adjustment filter (TAF) was introduced in [25], which is similar to the adjustment stage in [24]. A sparse matrix is used as a preprocessor to the partial butterfly DCT-2 algorithm. It approximates DCT-8 and DST-7 with an adjustment stage followed by DCT-2; the DCT-8/DST-7 kernels are approximated by the combination of a block-band matrix and a DCT-2 kernel. The size-16, size-32, and size-64 adjustment matrices require 356 8-bit coefficients in total. The adjustment stage is also efficiently implemented using SIMD instructions. Symmetries are applied to the 32-point and 64-point adjustment matrices, which have storage requirements of 93 and 189 8-bit coefficients, respectively, whereas the 16-point adjustment matrix requires 74 coefficients. All the adjustment matrix coefficients require 8-bit storage. This method reduces the complexity of the DST-7/DCT-8 transform and reduces memory usage by storing only the different-point DCT-2 transform kernels. Although the number of TAFs is reduced compared to [24], the total number of multiplications increases, i.e., 160, 482, and 1468 multiplications are required for the 16-, 32-, and 64-point transforms, respectively. This is because an additional multiplication with the TAF has to be performed, and the normalization values of the TAF are not identical, resulting in a mismatch between the derived and original transform kernels.
In this study, an analytic derivation of the transform kernels using a common sparse unified matrix is introduced. The proposed method reduces the total amount of memory for the MTS kernels by storing only 1648 elements instead of 8180 elements with 8-bit precision. The transform kernels used in this study were DCT-2, DST-7, and DCT-8. Using the proposed method, the DST-4 and DCT-4 transform kernels can also be obtained without affecting the memory. Similarly, the proposed method supports the implementation of the fast algorithms of DCT-2 [22] and DST-7 [17]. Figure 1 shows the proposed common sparse unified matrix, composed of two parts: a unified DST-3 matrix (U) and a grouped DST-7 matrix. The U matrix is composed of DST-4 kernels of different block sizes and is used to derive DCT-2 transform kernels of different block sizes. It stores 1368 elements with 8-bit precision. Similarly, the grouped DST-7 matrix is used to derive DST-7 transform kernels of different block sizes. It comprises the P, Q, R, and S matrices, which are rows selected from the 32-point, 16-point, 8-point, and 4-point DST-7 transform kernels, respectively. It stores 280 elements with 8-bit precision. Hence, memory must be allocated for only 1648 elements for the overall common sparse unified matrix.
The remainder of this paper is organized as follows. In Section 2, the proposed MTS kernel derivation method is described. In Section 3, the experimental results are provided, and the concluding statements are provided in Section 4.

2. Proposed Common Sparse Unified Matrix

Figure 1 shows the proposed common sparse unified matrix. It comprises two parts: a unified DST-3 matrix (U) and a grouped DST-7 matrix. The unified DST-3 matrix (U) is used to derive the different sizes of DCT-2 transform kernels, whereas the grouped DST-7 matrix is used to derive the different sizes of DST-7 and DCT-8 transform kernels, where DCT-2, DST-7, and DCT-8 are the transform kernels of MTS in VVC. The U matrix includes 32-, 16-, 8-, 4-, and 2-point DST-4 transform kernels and a 2-point DST-3 transform kernel. Similarly, the grouped DST-7 matrix comprises rows selected from DST-7 transform kernels of different points; that is, the 6 × 32 block size P, 4 × 16 block size Q, 2 × 8 block size R, and 2 × 4 block size S matrices are row elements selected from the 32-, 16-, 8-, and 4-point DST-7 transform kernels, respectively.
Note that the overall size of the proposed common sparse unified matrix is 1648 bytes (integer, 8 bit-precision). The original size of the transform kernel is 8180 bytes (integer, 8-bit precision) for storing different block sizes of DCT-2, DST-7, and DCT-8.
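The element counts above can be reproduced from the kernel sizes stored in VTM-3.0 (2- to 64-point DCT-2 plus 4- to 32-point DST-7 and DCT-8) and from the two parts of the proposed matrix. The sketch below is a simple arithmetic check; the per-size grouping follows the counts stated in the text.

```python
# Elements stored by VTM-3.0: DCT-2 (2- to 64-point) and DST-7/DCT-8 (4- to 32-point).
dct2_elems = sum(n * n for n in (2, 4, 8, 16, 32, 64))   # 5460
dst7_elems = sum(n * n for n in (4, 8, 16, 32))          # 1360
dct8_elems = dst7_elems                                  # 1360
total = dct2_elems + dst7_elems + dct8_elems             # 8180

# Proposed common sparse unified matrix: unified DST-3 part + grouped DST-7 part.
unified_dst3 = 1368
grouped_dst7 = 6 * 32 + 4 * 16 + 2 * 8 + 2 * 4           # P, Q, R, S rows -> 280
proposed = unified_dst3 + grouped_dst7                   # 1648

saving = 1 - proposed / total                            # roughly 80% static memory saved
```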

2.1. Unified DST-3 Matrix (U)

The proposed unified DST-3 matrix (U) was used to derive DCT-2 transform kernels of different sizes. It stores only 1368 elements with 8-bit precision for deriving any block size of the DCT-2 transform kernel, whereas the current VVC allocates memory for 5460 elements with 8-bit precision for deriving different block sizes of the DCT-2 transform kernels. The proposed basic structure of the unified DST-3 matrix is shown in Figure 2.
The matrix U comprises 32-, 16-, 8-, 4-, and 2-point DST-4 kernels and a 2-point DST-3 matrix. The remaining elements of U are set to zero. A relationship exists between DST-3 and DCT-2 [26], which can be expressed as Equation (6).
$$C_2 = F \times S_3 \times S \tag{6}$$
where $C_2$ is the DCT-2 kernel, $S_3$ is the DST-3 kernel, and $F$ and $S$ are the flipping and sign-change matrices, respectively. The elements of $F$ and $S$ are defined as
$$F(m,n) = \begin{cases} 1, & \text{if } n = N-1-m, \\ 0, & \text{otherwise,} \end{cases} \tag{7}$$
$$S(m,n) = \begin{cases} (-1)^m, & \text{if } n = m, \\ 0, & \text{otherwise,} \end{cases} \tag{8}$$
where $m$ and $n$ are matrix indices, and $N$ is the size of the matrix.
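The flip/sign relationship between DCT-2 and DST-3 can be checked numerically. The sketch below assumes one common DST-3 convention, indexed analogously to Equation (1); under this convention the flipping and sign-change matrices end up acting on opposite sides of $S_3$. The side on which $F$ and $S$ multiply depends on the row/column convention chosen for the kernels, so this is a sanity check of the relationship itself, not of a particular operator order.

```python
import numpy as np

def dct2(N):
    # DCT-2 kernel of Eq. (1): element (n, k).
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    eps = np.where(k == 0, 1 / np.sqrt(2), 1.0)
    return np.sqrt(2 / N) * eps * np.cos(np.pi * (2 * n + 1) * k / (2 * N))

def dst3(N):
    # A common DST-3 convention (an assumption), indexed analogously to Eq. (1).
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    eps = np.where(k == N - 1, 1 / np.sqrt(2), 1.0)
    return np.sqrt(2 / N) * eps * np.sin(np.pi * (2 * n + 1) * (k + 1) / (2 * N))

N = 8
F = np.fliplr(np.eye(N))               # flipping matrix, Eq. (7)
S = np.diag((-1.0) ** np.arange(N))    # sign-change matrix, Eq. (8)

# DCT-2 is obtained from DST-3 purely by sign changes and a flip.
assert np.allclose(S @ dst3(N) @ F, dct2(N))
```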
To retrieve any block size of the DCT-2 kernel, different block sizes of the DST-3 kernel are required, as in (6). As shown in Figure 2, a 2-point DST-3 transform kernel matrix exists at the right bottom part of U. Using (6) and the 2-point DST-3 transform kernel, a 2-point DCT-2 transform kernel can be derived. Similarly, different block sizes of DST-3 transform kernels are required to derive other sizes of DCT-2 transform kernels. To obtain different block sizes of DST-3 transform kernels from U, the predefined 64-point unit-element matrices, A, B, C, D, and E, are multiplied by U.
Figure 3 shows the DST-3 derivation mechanism from unified DST-3 matrix (U). The unit-element matrices A, B, C, D, and E are the matrices with values of −1, 0, or 1, and their elements are defined as Equations (9)–(13), respectively.
$$A(m,n) = \begin{cases} 1, & \text{if } (m = n,\ 0 \le m,n < 60)\ \text{or}\ (m = n,\ 60 \le m,n < 62) \\ & \text{or}\ (m+n = 123,\ 61 < m < 64\ \text{and}\ 60 < n < 62) \\ & \text{or}\ (n - m = 2,\ 60 < m < 62\ \text{and}\ 61 < n < 64), \\ -1, & \text{if } m+n = 125,\ 61 < m,n < 64, \\ 0, & \text{elsewhere} \end{cases} \tag{9}$$
$$B(m,n) = \begin{cases} 1, & \text{if } (m = n,\ 0 \le m,n < 56)\ \text{or}\ (m = n,\ 56 \le m,n < 60) \\ & \text{or}\ (m+n = 119,\ 59 < m < 64\ \text{and}\ 55 < n < 60) \\ & \text{or}\ (n - m = 4,\ 55 < m < 60\ \text{and}\ 59 < n < 64), \\ -1, & \text{if } m+n = 123,\ 59 < m,n < 64, \\ 0, & \text{elsewhere} \end{cases} \tag{10}$$
$$C(m,n) = \begin{cases} 1, & \text{if } (m = n,\ 0 \le m,n < 48)\ \text{or}\ (m = n,\ 48 \le m,n < 56) \\ & \text{or}\ (m+n = 111,\ 55 < m < 64\ \text{and}\ 47 < n < 56) \\ & \text{or}\ (n - m = 8,\ 47 < m < 56\ \text{and}\ 55 < n < 64), \\ -1, & \text{if } m+n = 119,\ 55 < m,n < 64, \\ 0, & \text{elsewhere} \end{cases} \tag{11}$$
$$D(m,n) = \begin{cases} 1, & \text{if } (m = n,\ 0 \le m,n < 32)\ \text{or}\ (m = n,\ 32 \le m,n < 48) \\ & \text{or}\ (m+n = 95,\ 47 < m < 64\ \text{and}\ 31 < n < 48) \\ & \text{or}\ (n - m = 16,\ 31 < m < 48\ \text{and}\ 47 < n < 64), \\ -1, & \text{if } m+n = 111,\ 47 < m,n < 64, \\ 0, & \text{elsewhere} \end{cases} \tag{12}$$
$$E(m,n) = \begin{cases} 1, & \text{if } (m = n,\ 0 \le m,n < 32) \\ & \text{or}\ (m+n = 63,\ 31 < m < 64\ \text{and}\ 0 \le n < 32) \\ & \text{or}\ (n - m = 32,\ 0 \le m < 32\ \text{and}\ 31 < n < 64), \\ -1, & \text{if } m+n = 95,\ 31 < m,n < 64, \\ 0, & \text{elsewhere} \end{cases} \tag{13}$$
The relation between U and the unit-element matrices can be expressed as Equation (14).
$$S_{3,64} = U \times A \times B \times C \times D \times E \tag{14}$$
where $S_{3,64}$ is the 64-point DST-3 transform kernel, $U$ is the proposed unified DST-3 matrix, and $A$, $B$, $C$, $D$, and $E$ are the 64-point unit-element matrices. Using $U$ and the unit-element matrices, DST-3 transform kernels of different sizes can be obtained, as expressed in Equations (15)–(19):
$$S_{3,4} = T[U_{64} \times A_{64}]_4 \tag{15}$$
$$S_{3,8} = T[U_{64} \times M_{64}]_8 \tag{16}$$
$$S_{3,16} = T[U_{64} \times N_{64}]_{16} \tag{17}$$
$$S_{3,32} = T[U_{64} \times O_{64}]_{32} \tag{18}$$
$$S_{3,64} = U_{64} \times P_{64} \tag{19}$$
where $S_{3,N}$ is an N-point DST-3 transform kernel, $T[\,\cdot\,]_N$ represents a function that takes the $N \times N$ block at the right-bottom part of the matrix, and $M$, $N$, $O$, and $P$ are defined using unit elements (−1, 0, and 1) as in Equation (20):
$$M = A \times B, \quad N = A \times B \times C, \quad O = A \times B \times C \times D, \quad P = A \times B \times C \times D \times E \tag{20}$$
For example, to derive the 16-point DCT-2 kernel, a 16-point DST-3 is required, as in Equation (6). First, U is multiplied by the A, B, and C unit-element matrices as in (17) in sequential order, thus leading to 32-point DST-4, 16-point DST-4, and 16-point DST-3 transform kernel matrices. Using the resulting matrices, the 16-point DST-3 matrix at the bottom right of the generated matrix is obtained. Finally, the 16-point DCT-2 transform kernel can be obtained using (6) and a 16-point DST-3. Figure 4 shows the overall process of generation of the 16-point DST-3 transform kernel.
Figure 5 illustrates the proposed derivation of the N-point DCT-2 transform kernel. First, it is checked whether N equals 2. If so, $S_{3,2}$ is obtained by selecting the 2 × 2 block at the right-bottom part of the U matrix; otherwise, $S_{3,N}$ is obtained by selecting the N × N block at the right-bottom part of the matrix produced by multiplying U by $\bar{E}_{64,N}$. Here, $\bar{E}_{64,N}$ denotes a product of the unit-element matrices A, B, C, D, and E that depends on N; for example, $\bar{E}_{64,N}$ is calculated as A × B × C × D, A × B × C, A × B, and A for N = 32, 16, 8, and 4, respectively. Finally, the N-point DCT-2 transform kernel is obtained using (6).

2.2. Proposed Grouped DST-7 Matrix

To derive the full DST-7 and DCT-8 matrices with less memory, a grouped DST-7 matrix is proposed. The proposed grouped DST-7 matrix is based on the principle of storing only a few selected rows of the DST-7 transform kernels of different block sizes. The rows selected from the original kernel matrix are used to generate the group of elements of the respective block size DST-7 transform kernel with the help of the permutation matrix (G), which comprises −1, 0, and 1 values. The matrix G is multiplied by the selected rows, resulting in a full block-sized DST-7 transform kernel. The matrices P, Q, R, and S in Figure 1 contain the rows for the derivation of the 32-, 16-, 8-, and 4-point DST-7 transform kernels, respectively, and only they are stored for DST-7 and DCT-8 kernel derivation, leading to a significant reduction in memory storage. The grouped DST-7 matrix is defined as the 14 × 64 matrix in the lower part of Figure 1, and memory allocation is required for only 280 elements with 8-bit precision, compared with the 1360 elements (each with 8-bit precision) of the current DST-7 storage in VVC.
To derive the full-size 32-point DST-7 transform kernel, only six row vectors of the 32-point DST-7 are selected, that is, $P = [S_{7,0}\ S_{7,1}\ S_{7,2}\ S_{7,3}\ S_{7,5}\ S_{7,6}]^T$, which serve as the base vectors for each group of the DST-7 matrix in the proposed method. Similarly, for the derivation of the full-size 16-, 8-, and 4-point DST-7 transform kernels, Q with four rows, R with two rows, and S with two rows, that is, $Q = [S_{7,0}\ S_{7,1}\ S_{7,2}\ S_{7,5}]^T$, $R = [S_{7,0}\ S_{7,1}]^T$, and $S = [S_{7,0}\ S_{7,1}]^T$, are selected from the 16-, 8-, and 4-point DST-7, respectively, and are regarded as the base vectors. Note that $S_{7,i}$ represents the i-th row of DST-7. Using these few selected rows and G, the full-size DST-7 transform kernel can be derived by retrieving the row vectors of each group. Figure 6 shows an illustration of kernel derivation using the grouped 32-point DST-7 matrix. As shown in Figure 6, the row vectors of P in the grouped DST-7 matrix are used to derive the row vectors of the DST-7 kernel in each group.
Figure 7 shows the details of the derivation process of the kernel for each group. The 0-th group is taken as an example, and the other groups follow the same process using different row vectors of matrix P.
In Figure 7, $[S_{7,(i,j)}]_N$ denotes the j-th row vector of the i-th group for the N-point DST-7. In the proposed method, matrix P comprises only the 0-th row vectors of the groups of the N-point DST-7. First, the 0-th row vector $[S_{7,(i,0)}]_N$ of the i-th group of DST-7 is obtained from P. Subsequently, the absolute values of $e_l$ and $e_f$, which are the last (rightmost) and first (leftmost) elements in the row vector, are compared. If these two absolute values are identical, the row vector can be one row of the DST-7 kernel with the same index as stored in P. Otherwise, the row vector is multiplied by G, and a comparison of $e_l$ and $e_f$ in the resulting row vector is performed as in the previous step. If the first element in a row is a negative value, the sign of every element in that row is altered. It should be noted that G is designed such that the absolute value of the first element in the previous row equals the absolute value of the last element of the next row vector after multiplying with G. This process is repeated until the last row vector of the current group.
Figure 8 shows a 16-point DST-7 transform kernel derived using the process of Figure 7; the elements in each row vector perfectly match the definition of DST-7 in (5). Similarly, the 32-, 8-, and 4-point DST-7 can be derived using the P, R, S, and G matrices, which are defined according to the size, as discussed in the next section.
For the derivation of the full-size 16-point DST-7 transform kernel, only four rows of the 16-point DST-7 are selected, that is, $Q = [S_{7,(0,0)}\ S_{7,(1,1)}\ S_{7,(2,2)}\ S_{7,(5,5)}]^T$, which can be used as the base vectors for each group of the DST-7 matrix in the proposed method.

2.3. Permutation Matrix (G)

The permutation matrix G plays a crucial role in the derivation of the DST-7 transform kernel using the proposed grouping algorithm, as mentioned in the previous section. The process of deriving G in the proposed method is described in Figure 9, which shows the first group of the 16-point DST-7 transform kernel represented with floating-point elements, where $[S_{7,(0,0)}]$ is the only row stored in matrix Q of the common sparse unified matrix shown in Figure 1.
The elements in consecutive rows of a group exhibit specific patterns that arise from the characteristics of the DST-7 kernels. Figure 10 shows the patterns of the elements in consecutive rows. As shown in Figure 10, the elements in the odd columns (1st, 3rd, …) of the 0th row ($[S_{7,(0,0)}]_{32}$) are repeated in the left-half columns of the next row. The elements in the right-half columns are copied from the elements in the even columns (0th, 2nd, …) of the previous row vector in reverse order.
Thus, this relation can be described as Equation (21):
$$[S_{7,(i,1)}]_N = \left[\big([S_{7,(i,1)}]_N\big)_L,\ \big([S_{7,(i,1)}]_N\big)_R\right] \tag{21}$$
As shown in (21), the next row vector is composed of two parts, $([S_{7,(i,1)}]_N)_L$ and $([S_{7,(i,1)}]_N)_R$, which represent the left and right halves (each of length N/2) of $[S_{7,(i,1)}]_N$. $([S_{7,(i,1)}]_N)_L$ is derived using (22), where the elements in the odd columns (1st, 3rd, …) of the 0th row (the previous row) are selected. $([S_{7,(i,1)}]_N)_R$ is derived using (23), where the elements in the even columns (0th, 2nd, …) of the 0th row (the previous row) are selected and substituted in reverse order. Note that $[S_{7,(i,j),k}]_N$ denotes the element in the k-th column of the j-th row vector of the i-th group for the N-point DST-7.
$$\big([S_{7,(i,1)}]_N\big)_L = [S_{7,(i,0),2n-1}]_N \tag{22}$$
$$\big([S_{7,(i,1)}]_N\big)_R = [S_{7,(i,0),2n-2}]_N \tag{23}$$
where $n$ is an integer value, $0 < n \le N/2$.
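The pattern in Equations (21)–(23) can be exercised numerically. The sketch below is a hypothetical re-implementation, not the normative procedure: it forms the next row of a group by interleaving the odd columns and the reversed even columns of the previous row, applies alternating signs (the contribution of G), and applies the sign fix-up described above. The assumption that row 0's group of the 16-point DST-7 contains four further rows comes from this orbit-style analysis, not from the text.

```python
import numpy as np

def dst7(N):
    # Analytic DST-7 of Eq. (5); rows taken as frequency indices (basis
    # vectors), following the usual kernel-matrix convention.
    k = np.arange(N)[:, None]   # frequency (row) index
    n = np.arange(N)[None, :]   # sample (column) index
    return 2.0 / np.sqrt(2 * N + 1) * np.sin(np.pi * (2 * k + 1) * (n + 1) / (2 * N + 1))

def next_row(prev):
    # Eqs. (22)-(23): odd columns form the left half; even columns,
    # reversed, form the right half.  The alternating signs and the final
    # sign fix-up mirror the role described for G in the text.
    nxt = np.concatenate((prev[1::2], prev[0::2][::-1]))
    nxt[1::2] *= -1.0
    if nxt[0] < 0:
        nxt = -nxt
    return nxt

# Derive the remaining rows of row 0's group for the 16-point DST-7.
S7 = dst7(16)
row = S7[0].copy()
for _ in range(4):
    row = next_row(row)
    # each derived vector coincides with a row of the analytic kernel
    assert np.abs(S7 - row).max(axis=1).min() < 1e-9
```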
Hence, a change in one element of a row vector in a group affects the elements of the other row vectors of the same group. Figure 10 shows an example of the patterns observed between two consecutive row vectors of the 16-point DST-7 transform kernel. From these patterns, G, comprising −1, 0, and 1, can be derived. Figure 11 shows the G matrix, based on the pattern in Figure 10, for deriving the DST-7 transform kernel.
As shown in Figure 11, a smaller G can be embedded into a larger G with identical patterns of element values. Figure 12 shows an algorithm for obtaining the matrix. Although the memory size of G in Figure 11 accounts for 256 bytes, G can also be derived on the fly using the algorithm shown in Figure 12, without using static memory to store G.
As shown in Figure 12, the algorithm comprises two parts, for the left-column part (S0) and the right-column part (S1) of G. First, the variables used in the algorithm are initialized, and the size (N) of the kernel matrix to derive is set. Variables a and b are initially set to 1 and 0 and are used as the initial values for the left-column (S0) part of G. $i_r$ and $i_c$ represent the row and column indices of G, respectively; $i_r$ is initialized to 0. The algorithm first checks whether $i_r$ is less than N. If so, the processing is divided into the left-column (S0) and right-column (S1) parts, and $i_c$ is initialized to 0; otherwise, the process terminates.
Next, it is checked whether $i_c$ is less than N/2. If so, it is checked whether $i_r$ and $i_c$ correspond to the respective values of a and b (a and b are set only for the S0 part of G). If this check fails, the value $G_N(i_r, i_c)$ is set to 0, $i_c$ is incremented by 1, and the algorithm loops back to the check of whether $i_c$ is less than N/2. If the check holds true (T), $c_1$ is tested. If $c_1$ is false, $G_N(i_r, i_c)$ is set to −1 and $c_1$ is set to true; otherwise, $G_N(i_r, i_c)$ is set to 1 and $c_1$ is set to false. Subsequently, a is incremented by 2 and b by 1. After setting a and b, $i_c$ is incremented by 1, and the algorithm loops back to the check of whether $i_c$ is less than N/2.
For the right-column (S1) part of G, variables m and n are used, with initial values 0 and N−1, respectively. When the check of whether $i_c$ is less than N/2 fails, $i_c$ is set to N/2. It is then checked whether $i_c$ is less than N. If false (F), $i_r$ is incremented by 1, and the algorithm loops back to the check of whether $i_r$ is less than N. If true (T), it is checked whether $i_r$ equals m and $i_c$ equals n. If not, $G_N(i_r, i_c)$ is set to 0. Otherwise, $c_2$ is tested. If $c_2$ is false, $G_N(i_r, i_c)$ is set to 1, $c_2$ is set to true, and $i_c$ is incremented by 1, looping back to the check of whether $i_c$ is less than N. Otherwise, $G_N(i_r, i_c)$ is set to −1, $c_2$ is set to false, m is incremented by 2, and n is decremented by 1. Finally, $i_c$ is incremented by 1, and the algorithm loops back to the check of whether $i_c$ is less than N.
An example of the splitting of S0 and S1 in the G matrix for N = 8 is shown in Figure 13.
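The flowchart of Figure 12 can be sketched as follows (a minimal sketch with our own function and variable names; the sign toggles assume c1 and c2 start as false, consistent with the description above). The resulting G is a sparse signed permutation matrix with a single ±1 per row and column: odd row 2c + 1 maps to left column c, and even row 2k maps to right column N − 1 − k. Multiplying any floating-point DST-7 row vector by G then lands on another row vector of the same kernel up to sign, which is the property the grouping method exploits.

```python
import numpy as np

def permutation_matrix(N):
    """Generate the N x N permutation matrix G following the Figure 12 flowchart."""
    G = np.zeros((N, N), dtype=int)
    a, b, c1 = 1, 0, False      # state for the left-column part (S0)
    m, n, c2 = 0, N - 1, False  # state for the right-column part (S1)
    for ir in range(N):
        for ic in range(N // 2):          # left columns (S0)
            if ir == a and ic == b:
                G[ir, ic] = 1 if c1 else -1   # toggles -1, +1, -1, ...
                c1 = not c1
                a, b = a + 2, b + 1
            # all other positions remain 0
        for ic in range(N // 2, N):       # right columns (S1)
            if ir == m and ic == n:
                G[ir, ic] = -1 if c2 else 1   # toggles +1, -1, +1, ...
                c2 = not c2
                m, n = m + 2, n - 1
    return G

def dst7(N):
    """Orthonormal floating-point N-point DST-7; row i is the i-th basis vector."""
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.sqrt(4 / (2 * N + 1)) * np.sin(np.pi * (2 * i + 1) * (j + 1) / (2 * N + 1))
```

For N = 8, for example, the first row of the floating-point DST-7 multiplied by G reproduces the negated last row, and every row maps to some other row up to sign.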

2.4. Integer Kernels with VVC Standard

Generally, in most video coding standards, the floating-point elements in the kernels are scaled up to integer elements to avoid a mismatch between the encoder and decoder. Transform and quantization using the resulting integer kernel matrices are implemented with integer arithmetic, which reduces implementation complexity, one of the primary concerns in the standardization of VVC. Despite their benefits, integer kernels have several disadvantages: orthogonality of the kernel matrix can be lost when the elements are scaled up to integer values, and the magnitudes of the matrix elements should be kept as small as possible in consideration of the dynamic ranges of the transform and quantization processes [26]. Nevertheless, it is known that the similarity between the derived kernel and the original kernel is more important than orthogonality when quantization is applied to the transform coefficients; thus, it is unnecessary to maintain kernel orthogonality at the expense of kernel similarity. Reflecting these requirements on transform kernels for video codecs, integer kernels for every transform block size were proposed and finally adopted [15]. In this section, we show that the proposed kernel derivation method applies to the transformation with no deviation, meaning that the kernels derived by the proposed method perfectly match the standard kernels.
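As an illustration of the scaling step (a sketch under our assumptions, not the normative derivation; the helper names are ours), VVC-style integer kernels can be reproduced by scaling the orthonormal floating-point basis by 64·√N and rounding, which keeps every block size in the same 8-bit dynamic range:

```python
import numpy as np

def dst7_float(N):
    # Orthonormal N-point DST-7: T[i, j] = sqrt(4/(2N+1)) * sin(pi*(2i+1)*(j+1)/(2N+1))
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.sqrt(4 / (2 * N + 1)) * np.sin(np.pi * (2 * i + 1) * (j + 1) / (2 * N + 1))

def integer_kernel(N, scale=64):
    # Scale by 64*sqrt(N) and round to the nearest integer (8-bit precision elements)
    return np.round(scale * np.sqrt(N) * dst7_float(N)).astype(int)
```

With this scaling, `integer_kernel(4)` yields the familiar 4-point DST-7 values, e.g., a first row of {29, 55, 74, 84} and a second row of {74, 74, 0, −74}.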
Figure 14 shows the 32-point integer DST-7 transform kernel adopted in VTM-3.0 [27] as the standard. In Figure 14, the row vectors are shown in group order; the element values in the row vectors are identical to those of the standard kernel.
As described in Figure 7 of Section 2.2, the absolute value of e_f in the current row should be identical to the absolute value of e_l in the next row, which is obtained by multiplying the current row vector by G. The multiplication is repeated until e_l of the base (first) row equals e_f of the row obtained by the multiplication by G. For the derivation of all sizes of DST-7, the multiplications are repeated until the last vector of each group is reached, except for a few cases in the 3rd group of the 32-point DST-7. Figure 15 shows the case observed in the 3rd group of the 32-point DST-7.
As shown in Figure 15, S_{7,3,18}^{32} is obtained after multiplying S_{7,3,25}^{32} by G. Because e_l of S_{7,3,3}^{32} equals e_f of S_{7,3,18}^{32}, the algorithm for the current group terminates, as shown in Figure 7. Thus, S_{7,3,4}^{32}, S_{7,3,23}^{32}, and S_{7,3,14}^{32} of the full-size DST-7 kernel cannot be obtained by the proposed algorithm. This occurs only because the element values are scaled up to make the kernel elements integers. To address this problem for the integer DST-7 kernels, in particular the 32-point kernel, two approaches are proposed in our method. The first is to tune e_l of S_{7,3,3}^{32} from 88 to 89, disregarding the sign; this tuning removes the early termination at S_{7,3,18}^{32} obtained after multiplying S_{7,3,25}^{32} by G, resulting in the full-size 32-point DST-7 transform kernel. Figure 16a shows this procedure. Similarly, the second approach is to tune the 3rd element of S_{7,3,3}^{32} from 88 to 89, disregarding the sign, as shown in Figure 16b. By tuning these values, the related values in the other vectors are changed accordingly by the proposed algorithm.
Finally, once all sizes of DST-7 kernels are derived, all sizes of DCT-8 transform kernels can be easily derived using the relation in Equation (24):
C_8 = S × S_7 × F        (24)
where C_8 and S_7 are the DCT-8 and DST-7 transform kernels, respectively, and S and F are the sign-changing and flipping matrices defined in Equations (7) and (8), respectively.
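Equation (24) can be sketched directly (our helper names; we assume S is the diagonal alternating-sign matrix and F the anti-diagonal flipping matrix, in line with the definitions referenced in (7) and (8)). Element-wise, the relation reads C_8[i, j] = (−1)^i · S_7[i, N−1−j], which reproduces the analytic DCT-8 basis cos(π(2i+1)(2j+1)/(4N+2)):

```python
import numpy as np

def dst7(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.sqrt(4 / (2 * N + 1)) * np.sin(np.pi * (2 * i + 1) * (j + 1) / (2 * N + 1))

def dct8(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.sqrt(4 / (2 * N + 1)) * np.cos(np.pi * (2 * i + 1) * (2 * j + 1) / (4 * N + 2))

def dct8_from_dst7(N):
    S7 = dst7(N)
    S = np.diag((-1.0) ** np.arange(N))  # sign-changing matrix: +1, -1, +1, ... on the diagonal
    F = np.fliplr(np.eye(N))             # flipping matrix: reverses the column order
    return S @ S7 @ F                    # Equation (24): C8 = S * S7 * F
```

Left-multiplying by S flips the sign of every odd-indexed row, and right-multiplying by F reverses each row, so no new kernel elements need to be stored for DCT-8.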

2.5. Application of the Proposed Method to the DST-7/DCT-8 Fast Algorithm in VTM

A fast DST-7/DCT-8 algorithm with dual implementation support was introduced in [28] and adopted in VTM-3.0. The fast algorithm is based on three different features of the integer N-point DST-7/DCT-8 transform kernels, which are illustrated in Figure 17.
Here, {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p} are the 16 unique integer values of the 16-point DST-7 kernel. Assume that the output vector y = (y_0, y_1, …, y_{N−1})^T is obtained by multiplying the input vector x = (x_0, x_1, …, x_{N−1})^T by the transform kernel C, that is, y = xC.
Feature #1:
It is observed that some kernel elements are equal to the sum of other elements: l, m, n, o, and p are obtained as (a + j), (b + i), (c + h), (d + g), and (e + f), respectively.
To calculate y0, instead of performing vector-by-vector multiplication that requires 16 multiplications, the following alternative implementation is computed as shown in Equation (25):
y0 = a·(x0 + x11) + b·(x1 + x12) + … + j·(x9 + x11) + k·x10
which requires 10 multiplications.
Feature #2:
This is based on the feature that some elements in a row vector of DST-7 are symmetrically mirrored. For example, the second row vector in Figure 17 is {c, f, i, l, o, o, l, i, f, c, 0, −c, −f, −i, −l, −o}, which contains only the distinct values c, f, i, l, and o.
To calculate y1, instead of performing vector-by-vector multiplication that requires 16 multiplications, the following alternative implementation is computed, as shown in Equation (26):
y1 = c·(x0 + x9x11) + f·(x1 + x8x12) + i·(x2 + x7x13) + l·(x3 + x6x14) + o·(x4 + x5x15)
which requires five multiplications.
Feature #3:
This is based on basis vectors that contain very few (one or two) distinct values, disregarding signs, for example, {k, k, 0, −k, −k, 0, k, k, 0, −k, −k, 0, k, k, 0, −k}. To calculate y5, instead of performing a vector-by-vector multiplication that requires 16 multiplications, the following alternative implementation is computed, as shown in Equation (27):
y5 = k·(x0 + x1− … −x15)
which requires one multiplication.
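The three features can be verified numerically on the floating-point 16-point DST-7 (a sketch; the `dst7` helper and index patterns below follow Equations (25)–(27), and we adopt the convention that output y_i is the dot product of kernel row i with x). Each identity holds exactly for the analytic kernel, up to floating-point tolerance:

```python
import numpy as np

def dst7(N):
    # Orthonormal N-point DST-7; row i is the i-th basis vector
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.sqrt(4 / (2 * N + 1)) * np.sin(np.pi * (2 * i + 1) * (j + 1) / (2 * N + 1))

S = dst7(16)
rng = np.random.default_rng(0)
x = rng.standard_normal(16)
y = S @ x  # reference: every output computed with 16 multiplications

# Feature #1 (row 0): elements 11..15 equal a+j, b+i, c+h, d+g, e+f,
# so y0 needs only 10 multiplications (Equation (25)).
r0 = S[0]
y0 = (sum(r0[t] * (x[t] + x[11 + t]) for t in range(5))        # a..e paired with x11..x15
      + sum(r0[t] * (x[t] + x[20 - t]) for t in range(5, 10))  # f..j paired with x15..x11
      + r0[10] * x[10])                                        # k stands alone

# Feature #2 (row 1): mirrored pattern {c,f,i,l,o,o,l,i,f,c,0,-c,-f,-i,-l,-o},
# so y1 needs only 5 multiplications (Equation (26)).
r1 = S[1]
y1 = sum(r1[t] * (x[t] + x[9 - t] - x[11 + t]) for t in range(5))

# Feature #3 (row 5): single distinct magnitude with pattern {k,k,0,-k,-k,0,...},
# so y5 needs only 1 multiplication (Equation (27)).
k = S[5, 0]
signs = ([1, 1, 0, -1, -1, 0] * 3)[:16]
y5 = k * sum(s * xt for s, xt in zip(signs, x))
```

All three shortcuts reproduce the corresponding entries of the full matrix-vector product y.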
The major drawback of the fast algorithm [28] for the DST-7 transform in VTM-3.0 is that one of the three features must be identified and applied for each row. In the proposed method, however, once a feature is selected for a base vector such as S_{7,0,0}^{16}, S_{7,1,1}^{16}, S_{7,2,2}^{16}, or S_{7,5,5}^{16}, the same feature is applied to all vectors belonging to the same group. Hence, the proposed method follows the "one group one feature" principle and removes the per-row matching of a single feature among the three features.
For example, Feature #1 is applied to the first row vector of DST-7, whereas Feature #2 is applied to the second row vector; similarly, a single feature out of the three is applied to each of the other rows. Before a feature is applied to a row vector, the applicable feature is unknown; consequently, each row vector must be checked against Features #1, #2, and #3 so that the single matching feature can be applied. For a larger matrix, the number of row vectors is large, and manually checking every row against the three features is time-consuming.
Based on the proposed grouping method, a feature is selected manually only for each base row vector, and the row vectors that depend on the base row, that is, the row vectors grouped with it, follow the same feature applied to the base row vector. Figure 18 shows the application of the proposed 16-point grouped DST-7 transform to the fast algorithm in VVC. Feature #1 is applied to the 0th and 2nd groups: the elements in columns l, m, n, o, and p are obtained as the column sums (a + j), (b + i), (c + h), (d + g), and (e + f), respectively. Feature #2 is applied to the 1st group, where the column elements f, g, h, i, and j mirror the column elements e, d, c, b, and a, respectively; similarly, the elements in columns l, m, n, o, and p mirror the column elements j, i, h, g, and f with flipped signs, respectively. Feature #3 is applied to S_{7,5,5}^{16}, as it contains only a single distinct element, namely 77. All of this is made possible by the proposed grouping method.
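The grouping itself can be recovered programmatically (a sketch with our own helper names, using the permutation-matrix structure described with Figure 12): starting from each base row, repeated multiplication by G walks through the rows of one group until it returns to the base row. For the floating-point 16-point DST-7, this reproduces four groups whose base rows correspond to S_{7,0,0}^{16}, S_{7,1,1}^{16}, S_{7,2,2}^{16}, and S_{7,5,5}^{16}:

```python
import numpy as np

def dst7(N):
    i, j = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    return np.sqrt(4 / (2 * N + 1)) * np.sin(np.pi * (2 * i + 1) * (j + 1) / (2 * N + 1))

def perm(N):
    # Compact closed form of the Figure 12 flowchart: one signed entry per row
    G = np.zeros((N, N))
    for c in range(N // 2):
        G[2 * c + 1, c] = (-1) ** (c + 1)  # S0 part: -1, +1, -1, ...
        G[2 * c, N - 1 - c] = (-1) ** c    # S1 part: +1, -1, +1, ...
    return G

def derive_groups(N):
    S, G = dst7(N), perm(N)

    def match(v):  # index of the kernel row equal to v up to sign
        return next(j for j in range(N)
                    if np.allclose(v, S[j]) or np.allclose(v, -S[j]))

    seen, groups = set(), []
    for base in range(N):
        if base in seen:
            continue
        grp, cur = [base], base
        seen.add(base)
        while True:
            cur = match(S[cur] @ G)  # next row of the group
            if cur == base:          # cycle closed: group complete
                break
            grp.append(cur)
            seen.add(cur)
        groups.append(grp)
    return groups
```

For N = 16, `derive_groups` partitions the rows into groups led by rows 0, 1, 2, and 5, so a feature chosen once per base row covers its entire group.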

3. Experimental Results and Analysis

To verify the proposed transform kernel derivation method for VVC, the similarities between the proposed kernels and the original kernels in VTM-3.0 [27] were investigated by plotting the kernels graphically. Figure 19 compares the transform kernels; the derived transform kernels match the original transform kernels almost perfectly. As described in Section 2.4, only a few integer elements of the 32-point DST-7 kernel are tuned from the original kernel to obtain the full transform kernels. Thus, the proposed transform kernels can be expected to yield nearly identical coding gains.
Experiments verifying the proposed method were performed on the VVC reference software VTM-3.0 [27] under the common test conditions (CTC) [29]. The simulation PC ran CentOS 7 with an Intel® Xeon® Silver 4114 CPU @ 2.20 GHz (200 cores) and 156 GB of RAM. Parallel processing was turned off to precisely measure the encoding and decoding complexity in terms of execution time. The coding performance was measured in the Bjøntegaard delta bit rate (BDBR, %) [30].
Table 2 shows the experimental results of the proposed method. As shown in Table 2, the coding performance of the proposed method is almost identical to that of the original VTM-3.0 for all configurations, because the proposed transform kernels match the original transform kernels almost perfectly.
Table 3, Table 4 and Table 5 show a comparison of the results of the proposed method for the AI, RA, and LDB configurations [31], respectively.
Although COT [21] provides better coding gain, it fails to provide a 64-point DCT-2 transform kernel, as some rows of 64-point DCT-2 transform kernels are removed and replaced with selected row values of DST-7/DCT-8 transform kernels [21].
In addition, [21] used five transform kernels: DCT-2, DST-7, DCT-8, DST-4, and DCT-4. The unified matrix [23] also presented excellent coding gain; however, the higher-resolution sequence classes A1 and A2 show higher coding losses, which result from the deviation of its mathematically derived transform kernels from the original transform kernels. In [24,25], coding losses can also be seen owing to the mismatch between the derived and original transform kernels. In the proposed method, no coding loss is observed in the higher-resolution sequence classes, and the overall result shows a minor gain of 0.02% in the U-chroma component. This is because the proposed transform kernels are in good agreement with the original transform kernels, as shown in Figure 19.
Table 4 shows the overall experimental results for RA using the VTM-3.0 reference software under the CTC. Coding losses can be seen in [23,24,25], whereas [21] showed a small luma gain. The proposed method showed no significant loss compared with the other methods.
Table 5 shows the overall experimental results for LDB using the VTM-3.0 reference software under the CTC. The methods in [21,23,25] showed higher coding losses, whereas the proposed method showed a 0.27% coding gain in the V-chroma component and a negligible loss in the U-chroma component.

4. Conclusions

This study proposed an analytic kernel derivation method for the VVC video codec. The memory consumed to store transform kernels of different block sizes is regarded as a major issue in video coding standardization. To address this issue, the proposed method saves static memory by storing only a small number of transform kernel elements. The full required transform kernels are derived from the stored elements together with generated unit-element matrices and a permutation matrix; all transform kernels of different block sizes are generated dynamically at runtime. The static memory required is only for 1648 elements instead of 8180 elements, each with 8-bit precision, which means that approximately 80% of the static memory can be saved. The fast algorithms for DCT-2 and DST-7 can still be implemented with the proposed method because the derived kernels closely match the original transform kernels, that is, DCT-2, DCT-8, and DST-7. Similarly, the proposed grouping concept for the DST-7 transform kernel shows how tuning the elements of one row propagates, through the permutation matrix, to the remaining rows that depend on the tuned row.

Author Contributions

Conceptualization, A.K. and B.L.; methodology, B.L.; software, A.K.; validation, A.K. and B.L.; formal analysis, B.L.; investigation, B.L.; writing—original draft preparation, A.K.; writing—review and editing, B.L.; visualization, B.L.; supervision, B.L.; project administration, B.L.; funding acquisition, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by Korea government (No. NRF-2019R1I1A3A01058959).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Vetrivel, S.; Suba, K.; Thisha, G.A. An Overview of H.26x Series And its Applications. Int. J. Eng. Sci. Technol. 2010, 2, 4622–4631. [Google Scholar]
  2. Wiegand, T.; Sullivan, G.J.; Bjontegaard, G.; Luthra, A. Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 2003, 13, 560–576. [Google Scholar] [CrossRef] [Green Version]
  3. Sullivan, G.J.; Ohm, J.; Han, W.; Wiegand, T. Overview of the High Efficiency Video Coding (HEVC) Standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668. [Google Scholar]
  4. Chen, J.; Ye, Y.; Kim, S.H. Algorithm description for Versatile Video Coding and Test Model 3 (VTM 3), [JVET-L1002]. In Proceedings of the JVET of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 12th Meeting, Macao, China, 3–12 October 2018. [Google Scholar]
  5. Rao, K.R.; Yip, P. Discrete Cosine Transform: Algorithms, Advantages, Applications; Academic Press: Boston, MA, USA, 1990. [Google Scholar]
  6. Cham, W.K. Development of Integer Cosine Transforms by the Principle of Dyadic Symmetry. Proc. Inst. Elect. Eng. 1989, 136, 276–282. [Google Scholar] [CrossRef]
  7. Strang, G. The Discrete Cosine Transform. SIAM Rev. 1999, 41, 135–147. [Google Scholar] [CrossRef]
  8. Abedi, M.; Sun, B.; Zheng, Z. A Sinusoidal-Hyperbolic Family of Transforms with Potential Applications in Compressive Sensing. IEEE Trans. Image Process. 2019, 28, 3571–3583. [Google Scholar] [CrossRef] [PubMed]
  9. Jain, A.K. A Sinusoidal Family of Unitary Transforms. IEEE Trans. Pattern Anal. Mach. Intell. 1979, 1, 356–365. [Google Scholar] [CrossRef] [PubMed]
  10. Sullivan, G.J.; Ohm, J. Meeting Report of the 13th Meeting of the Joint Video Experts Team (JVET). In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, Marrakech, MA, USA, 9–18 January 2019. [Google Scholar]
  11. Zhao, X.; Chen, J.; Karczewicz, M.; Zhang, L.; Li, X.; Jung, W. Enhanced Multiple Transform for Video Coding. In Proceedings of the Data Compression Conference, Snowbird, UT, USA, 29 March–1 April 2016. [Google Scholar]
  12. Philippe, P.; Biatek, T.; Lorcy, V. Improvement of HEVC Inter-coding Mode Using Multiple Transforms. In Proceedings of the 25th European Signal Processing Conference (EUSIPCO), Kos Island, Greece, 28 August–2 September 2017. [Google Scholar]
  13. Abe, K.; Toma, T.; Ikeda, M.; Naser, K.; Leannec, F.L.; Francois, E. CE6: JVET-L0262: Replacing all DST-7/DCT-8 by DST-4/DCT-4 used in MTS (test 6.1.1e). In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, Marrakech, MA, USA, 9–18 January 2019. [Google Scholar]
  14. Zhao, Z.; Kim, S.; Zhao, Y.; Egilmez, H.; Koo, M.; Liu, S.; Lainema, J.; Karczewicz, M. Transform coding in VVC standard. IEEE Trans. Circuits Syst. Video Technol. 2021. early access. [Google Scholar] [CrossRef]
  15. Bross, B.; Chen, J.; Liu, S. Versatile Video Coding (Draft 4) JVET-M1001. In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, Marrakech, MA, USA, 9–18 January 2019. [Google Scholar]
  16. JVET VTM Software—JVET. Available online: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/tree/VTM-4.0 (accessed on February 2019).
  17. Zhao, X.; Li, X.; Luo, Y.; Liu, S. CE6: Fast DST-7/DCT-8 with dual implementation support JVET-M0497. In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, Marrakech, MA, USA, 9–18 January 2019. [Google Scholar]
  18. Luxán-Hernández, S.D.; George, V.; Ma, J.; Nguyen, T.; Schwarz, H.; Marpe, D.; Wiegand, T. CE3: Intra Sub-Partitions Coding Mode (Tests 1.1.1 and 1.1.2). In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, Marrakech, MA, USA, 9–18 January 2019. [Google Scholar]
  19. Zhao, Y.; Gao, H.; Yang, H.; Chen, J. CE6:Sub-block transform for inter blocks (CE6.4.1). In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, Marrakech, MA, USA, 9–18 January 2019. [Google Scholar]
  20. Budagavi, M.; Fuldseth, A.; Bjøntegaard, G.; Sze, V.; Sadafale, M. Core Transform Design in the High Efficiency Video Coding (HEVC) Standard. IEEE J. Sel. Top. Signal Process. 2013, 7, 1029–1041. [Google Scholar] [CrossRef]
  21. Zhao, X.; Li, X.; Liu, S. CE6: Compound Orthonormal Transform JVET-M0200. In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, Marrakech, MA, USA, 9–18 January 2019. [Google Scholar]
  22. Feig, E.; Winograd, S. Fast algorithms for the discrete cosine transform. IEEE Trans. Signal Process. 1992, 40, 2174–2193. [Google Scholar] [CrossRef]
  23. Choi, K.; Park, M.; Park, M.W.; Choi, W. CE6: Unified matrix for transform JVET-M0200. In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, Marrakech, MA, USA, 9–18 January 2019. [Google Scholar]
  24. Said, A.; Egilmez, H.; Seregin, V.; Karczewicz, M. Complexity Reduction for Adaptive Multiple Transforms (AMTs) using Adjustment Stages JVET-J0066. In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 10th Meeting, San Diego, CA, USA, 10–20 April 2018. [Google Scholar]
  25. Philippe, P. CE6: MTS simplification with TAF. In Proceedings of the Joint Video Experts Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, Marrakech, MA, USA, 9–18 January 2019. [Google Scholar]
  26. Ochoa-Dominguez, H.; Rao, K.R. Discrete Cosine Transform, 2nd ed.; CRC Press: Boca Raton, FL, USA, 2019; pp. 133–186. [Google Scholar]
  27. JVET VTM Software—JVET. Available online: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftware_VTM/tree/VTM-3.0 (accessed on November 2018).
  28. Zhang, Z.; Zhao, X.; Li, X.; Li, L.; Luo, Y.; Liu, S.; Li, Z. Fast DST-VII/DCT-VIII with Dual Implementation Support for Versatile Video Codec. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 355–371. [Google Scholar] [CrossRef]
  29. Hanhart, P.; Boyce, J.; Choi, K.; Lin, J.-L. JVET common test conditions and evaluation procedures for 360° video. In Proceedings of the JVET of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 12th Meeting, Macao, China, 3–12 October 2018. [Google Scholar]
  30. Lu, G.; Ouyang, W.; Xu, D.; Zhang, X.; Cai, C.; Gao, Z. DVC: An End-to-end Deep Video Compression Framework. In Proceedings of the CVPR, Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
  31. Bossen, F.; Boyce, J.; Li, X.; Seregin, V.; Sühring, K. JVET common test conditions and software reference configurations for SDR video. In Proceedings of the Joint Video Exploration Team (JVET) of ITU-T SG 16 WP 3 and ISO/IEC JTC 1/SC 29/WG 11, 13th Meeting, Marrakech, MA, USA, 9–18 January 2019. [Google Scholar]
Figure 1. Common sparse unified matrix.
Figure 2. Structure of unified DST-3 matrix (U).
Figure 3. DST-3 derivation mechanism from unified DST-3 matrix (U).
Figure 4. Proposed method to obtain the 16-point DST-3 transform kernel.
Figure 5. Illustration of the proposed DCT-2 transform kernel.
Figure 6. Illustration of kernel derivation using grouped DST-7 matrix.
Figure 7. An example of the derivation process for the i-th group of 32-point DST-7.
Figure 8. Floating-point grouped 16-point DST-7 transform kernel elements.
Figure 9. S_{7,0,0}-dependent grouped 16-point DST-7 transform kernel.
Figure 10. An example of patterns observed between two consecutive row vectors of the 16-point DST-7 transform kernel.
Figure 11. An example of the 32-point permutation matrix (G).
Figure 12. Algorithm for the generation of the permutation matrix G.
Figure 13. An example of the resulting G matrix for N = 8 by Figure 12.
Figure 14. Grouped 32-point DST-7 transform kernel elements.
Figure 15. An example of the case not deriving the DST-7 kernel.
Figure 16. Removing the case not obtaining the full kernel matrix by tuning the highlighted values. (a) Method 1; (b) Method 2.
Figure 17. Illustration of 16-point DST-7 kernel unique integer numbers [28].
Figure 18. Application of the proposed 16-point grouped DST-7 transform to a fast algorithm in VVC.
Figure 19. Comparison between the proposed and original transform kernels.
Table 1. MTS transform kernels based on prediction and block sizes in VTM-3.0.
Prediction | Prediction Tools | Block Size Checks | Transform Kernels
MTS (Intra) | ISP [18] (intra sub-partition) | 4 × 4 up to 16 × 16 | DCT-2, DST-7
MTS (Intra) | Normal intra-prediction mode | 4 × 4 up to N × 32/32 × N | DCT-2, DST-7
MTS (Inter) | SBT [19] (sub-block transform) | 4 × 4 up to N × 32/32 × N | DST-7, DCT-8 (depends on SBT position)
Table 2. Simulation results.
| | AI (All Intra) | RA (Random Access) | LDB (Low Delay B)
Class | Test Sequence | Y | U | V | Y | U | V | Y | U | V
Class A1 (4K) | Tango2 | −0.01 | −0.31 | 0.08 | 0.04 | 0.14 | 0.01 | - | - | -
Class A1 (4K) | FoodMarket4 | 0.01 | −0.01 | 0.04 | 0.00 | 0.11 | 0.17 | - | - | -
Class A1 (4K) | Campfire | −0.01 | 0.08 | −0.10 | 0.00 | −0.04 | 0.01 | - | - | -
Class A2 (4K) | CatRobot1 | −0.01 | −0.01 | 0.01 | −0.04 | 0.10 | 0.17 | - | - | -
Class A2 (4K) | DaylightRoad2 | 0.00 | −0.10 | 0.02 | −0.04 | 0.19 | 0.07 | - | - | -
Class A2 (4K) | ParkRunning3 | 0.01 | 0.00 | −0.03 | −0.03 | −0.06 | 0.01 | - | - | -
Class B (1080p) | MarketPlace | 0.02 | −0.13 | 0.00 | −0.02 | 0.28 | −0.13 | 0.05 | −0.04 | −0.35
Class B (1080p) | RitualDance | 0.00 | 0.07 | 0.10 | −0.01 | −0.02 | 0.00 | 0.03 | −0.11 | −0.27
Class B (1080p) | Cactus | 0.01 | 0.01 | 0.05 | −0.03 | −0.04 | 0.14 | −0.10 | −0.45 | −0.06
Class B (1080p) | BasketballDrive | −0.01 | −0.02 | 0.07 | 0.05 | −0.01 | −0.14 | 0.01 | −0.16 | −0.13
Class B (1080p) | BQTerrace | 0.00 | 0.15 | 0.09 | 0.01 | 0.69 | 0.17 | −0.05 | −0.68 | −1.28
Class C (WVGA) | BasketballDrill | 0.01 | −0.18 | −0.12 | −0.01 | 0.13 | 0.15 | 0.04 | 0.55 | 0.10
Class C (WVGA) | BQMall | −0.01 | 0.15 | −0.06 | −0.02 | −0.12 | 0.18 | 0.16 | 0.61 | 0.11
Class C (WVGA) | PartyScene | 0.00 | −0.04 | −0.10 | 0.02 | 0.18 | −0.18 | 0.04 | −0.22 | −0.08
Class C (WVGA) | RaceHorses | 0.01 | −0.09 | 0.00 | 0.00 | −0.15 | 0.24 | −0.01 | −0.10 | 0.28
Class D (WQVGA) | BasketballPass | 0.01 | −0.17 | −0.08 | −0.03 | −0.09 | −0.17 | −0.09 | 0.36 | −0.43
Class D (WQVGA) | BQSquare | 0.00 | 0.23 | 0.28 | −0.07 | −0.95 | −0.24 | −0.02 | 0.55 | 0.59
Class D (WQVGA) | BlowingBubbles | 0.00 | −0.03 | 0.03 | −0.01 | 0.01 | 0.30 | 0.00 | −0.43 | −0.83
Class D (WQVGA) | RaceHorses | 0.00 | −0.03 | 0.18 | 0.08 | 0.23 | −0.47 | −0.02 | −0.16 | −0.17
Class E (720p) | FourPeople | 0.00 | −0.05 | −0.01 | - | - | - | 0.17 | −0.77 | −0.30
Class E (720p) | Johnny | −0.02 | 0.18 | −0.07 | - | - | - | −0.22 | 2.81 | −0.22
Class E (720p) | KristenAndSara | 0.02 | −0.02 | 0.08 | - | - | - | 0.00 | −0.43 | −1.00
Overall | | 0.00 | −0.02 | 0.00 | −0.01 | 0.09 | 0.06 | 0.01 | 0.08 | −0.27
Unit: %
Table 3. Comparison of the overall simulation results for the derivation of transform kernels with the proposed method (AI). Each entry lists Y / U / V / EncT / DecT.
Class | COT [21] | Unified Matrix [23] | AMT [24] | TAF [25] | Proposed Method
Class A1 | −0.007 / −0.014 / 0.092 / 101 / 100 | 0.200 / 0.162 / 0.299 / 101 / 100 | 0.061 / 0.184 / 0.187 / 101 / 97 | 0.101 / 0.115 / 0.125 / 95 / 84 | 0.000 / −0.080 / 0.000 / 100 / 100
Class A2 | −0.050 / −0.051 / −0.012 / 102 / 100 | 0.152 / 0.089 / 0.085 / 101 / 100 | 0.063 / 0.027 / 0.105 / 102 / 97 | 0.139 / 0.076 / 0.030 / 97 / 88 | 0.000 / −0.041 / 0.001 / 100 / 100
Class B | −0.136 / −0.131 / −0.125 / 102 / 98 | −0.024 / −0.147 / −0.140 / 101 / 99 | 0.031 / 0.093 / 0.135 / 102 / 98 | 0.075 / 0.029 / 0.097 / 96 / 88 | 0.001 / 0.021 / 0.059 / 100 / 100
Class C | −0.289 / −0.214 / −0.310 / 101 / 99 | −0.304 / −0.354 / −0.415 / 102 / 100 | 0.009 / −0.005 / 0.113 / 105 / 105 | 0.015 / 0.065 / −0.032 / 97 / 93 | 0.000 / −0.040 / −0.071 / 100 / 100
Class D | −0.314 / −0.324 / −0.307 / 104 / 104 | −0.349 / −0.356 / −0.474 / 103 / 102 | 0.007 / 0.052 / −0.009 / 107 / 109 | 0.042 / −0.114 / 0.085 / 97 / 96 | 0.000 / 0.000 / 0.100 / 102 / 100
Class E | −0.335 / −0.277 / −0.132 / 98 / 98 | −0.214 / −0.350 / −0.289 / 100 / 98 | 0.025 / −0.063 / 0.107 / 109 / 106 | 0.063 / 0.044 / 0.110 / 96 / 89 | 0.001 / 0.041 / 0.001 / 100 / 100
Overall | −0.167 / −0.141 / −0.112 / 101 / 99 | −0.058 / −0.136 / −0.115 / 101 / 99 | 0.033 / 0.054 / 0.107 / 104 / 101 | 0.075 / 0.060 / 0.064 / 96 / 88 | 0.000 / −0.020 / 0.001 / 100 / 100
Unit: %
Table 4. Comparison of the overall simulation results for the derivation of transform kernels with the proposed method (RA). Each entry lists Y / U / V / EncT / DecT.
Class | COT [21] | Unified Matrix [23] | AMT [24] | TAF [25] | Proposed Method
Class A1 | 0.002 / 0.021 / 0.048 / 101 / 99 | 0.175 / 0.259 / 0.324 / 101 / 101 | 0.032 / 0.059 / 0.139 / 101 / 99 | 0.090 / 0.106 / 0.079 / 99 / 98 | 0.011 / 0.070 / 0.060 / 101 / 100
Class A2 | −0.021 / 0.168 / 0.093 / 99 / 99 | 0.140 / 0.190 / 0.311 / 99 / 99 | 0.040 / 0.118 / 0.215 / 100 / 100 | 0.055 / 0.195 / 0.23 / 99 / 98 | −0.040 / 0.080 / 0.080 / 101 / 100
Class B | −0.069 / 0.060 / 0.012 / 101 / 99 | 0.067 / 0.172 / −0.104 / 100 / 100 | 0.048 / −0.128 / 0.052 / 101 / 99 | 0.047 / 0.248 / 0.117 / 99 / 98 | 0.002 / 0.178 / 0.011 / 100 / 100
Class C | −0.111 / −0.046 / 0.037 / 101 / 100 | −0.069 / −0.043 / 0.049 / 98 / 96 | 0.000 / 0.041 / 0.027 / 99 / 98 | 0.056 / −0.124 / 0.041 / 99 / 100 | 0.001 / 0.010 / 0.100 / 101 / 100
Class D | −0.136 / −0.438 / −0.341 / 101 / 101 | −0.145 / −0.445 / −0.259 / 98 / 96 | 0.055 / 0.256 / 0.238 / 100 / 100 | −0.017 / −0.447 / −0.511 / 99 / 100 | −0.006 / −0.201 / −0.149 / 101 / 100
Overall | −0.057 / 0.045 / 0.042 / 101 / 99 | 0.067 / 0.136 / 0.105 / 99 / 99 | 0.035 / 0.060 / 0.130 / 100 / 99 | 0.060 / 0.110 / 0.114 / 99 / 99 | −0.010 / 0.086 / 0.061 / 101 / 100
Unit: %
Table 5. Comparison of the overall simulation results for the derivation of transform kernels with the proposed method (LDB). Each entry lists Y / U / V / EncT / DecT.
Class | COT [21] | Unified Matrix [23] | TAF [25] | Proposed Method
Class B | −0.015 / −0.263 / 0.078 / 100 / 100 | 0.168 / 0.016 / 0.326 / 98 / 96 | 0.048 / −0.169 / 0.118 / 100 / 101 | −0.010 / −0.291 / −0.420 / 100 / 100
Class C | 0.015 / 0.437 / 0.174 / 99 / 96 | 0.148 / 0.341 / 0.388 / 98 / 98 | 0.013 / −0.009 / 0.214 / 100 / 100 | 0.060 / 0.211 / 0.100 / 101 / 100
Class D | 0.016 / 0.202 / 0.078 / 104 / 102 | 0.057 / 0.054 / −0.170 / 98 / 97 | −0.035 / 0.206 / −0.171 / 100 / 100 | −0.030 / 0.081 / −0.210 / 101 / 100
Class E | 0.098 / 1.063 / 0.158 / 102 / 98 | 0.186 / 0.812 / −0.355 / 96 / 88 | 0.019 / 1.000 / 0.162 / 100 / 99 | −0.020 / 0.540 / −0.510 / 101 / 100
Overall | 0.023 / 0.302 / 0.130 / 100 / 98 | 0.166 / 0.323 / 0.177 / 97 / 95 | 0.030 / 0.177 / 0.161 / 100 / 100 | 0.000 / 0.079 / −0.270 / 101 / 100
Unit: %
Kumar, A.; Lee, B. An Analytic Transform Kernel Derivation Method for Video Codecs. Appl. Sci. 2021, 11, 9280. https://doi.org/10.3390/app11199280