Fast Algorithms for Short-Length Type VI Discrete Cosine Transform

Kitsela, Valentyna; Polyakova, Marina; Cariow, Aleksandr

doi:10.3390/electronics15030699


by Valentyna Kitsela ¹, Marina Polyakova ²,* and Aleksandr Cariow ¹,*

¹ Faculty of Computer Science and Information Technology, West Pomeranian University of Technology in Szczecin, Żołnierska 49, 71-210 Szczecin, Poland

² Institute of Computer Systems, Odesa Polytechnic National University, Shevchenko Ave., 1, 65044 Odesa, Ukraine

* Authors to whom correspondence should be addressed.

Electronics 2026, 15(3), 699; https://doi.org/10.3390/electronics15030699

Submission received: 2 January 2026 / Revised: 27 January 2026 / Accepted: 3 February 2026 / Published: 5 February 2026

(This article belongs to the Section Circuit and Signal Processing)


Abstract

In this paper, new fast algorithms for computing the discrete cosine transform type VI (DCT-VI) are proposed, with a special emphasis on short input sequences of three to eight samples. Fast algorithms for small discrete trigonometric transformations are directly used for efficient processing of small data sets and also serve as fundamental building blocks for constructing algorithms for larger trigonometric transforms. By exploiting the intrinsic structural properties of the DCT-VI matrices of different sizes, the proposed methods significantly reduce arithmetic complexity compared to the conventional matrix–vector multiplication approach. The paper presents a detailed mathematical formulation of the algorithms, supported by data-flow graphs that illustrate the computational structure and facilitate the precise estimation of arithmetic operations. Optimized pseudocode implementations incorporating variable reuse are also introduced to facilitate practical realization in software environments. Performance analysis demonstrates a substantial reduction in the number of multiplications (up to 66%) and a slight decrease in additions (approximately 9%) for input sizes ranging from three to eight, thereby improving the execution speed of the considered transform. The proposed algorithms are well-suited for applications in video coding, data compression, and digital signal processing, where computational efficiency is critical.

Keywords:

DCT-VI; fast algorithm; computational complexity; video coding; matrix factorization; data-flow graph

1. Introduction

Raw, uncompressed video generates enormous data volumes. One minute of high-definition footage can require several gigabytes, making uncompressed video impractical for streaming, sharing, or archiving without compression. Video coding reduces file sizes by eliminating redundancies through lossy or lossless techniques, while preserving visual quality and enabling efficient bandwidth use to avoid buffering [1,2].

In recent approaches to image and video coding, the classical transform coding [2,3,4,5] and artificial neural networks were considered [6,7,8]. The use of artificial neural networks in video coding is limited due to the high cost of their training process. Codecs using neural networks are trained on samples from the coded image and operate as backward predictors. Their computational complexity is prohibitive, while the performance is still not as good as that of the best “classical” approaches [9]. Transform coding has been widely integrated into video coding due to its efficient spatial decorrelation capability [10]. Cooperating with other coding tools [11,12], it significantly reduces spatial correlations and achieves high compression rates while maintaining minimal quality loss. An example of such cooperation is the development of the Versatile Video Coding (VVC) standard, which has been motivated by the need to exceed the coding performance offered by High Efficiency Video Coding (HEVC) [13,14]. To investigate advanced coding tools, the Joint Exploration Model (JEM) was introduced, which demonstrated approximately a 30% gain in coding efficiency relative to HEVC [15]. These improvements arise from enhancements across the entire coding chain, with the transform module playing a central role. One of the key innovations evaluated within JEM is extending the set of transform types by incorporating discrete cosine transforms (DCTs) of types V and VIII and discrete sine transforms (DSTs) of types I and VII, in addition to the conventional type II DCT (DCT-II) and type VII DST used in HEVC [16].

JEM’s performance advantages come at the cost of significantly higher computational complexity. Compared to the HEVC reference model, JEM increases the complexity of both the encoder and decoder by up to seven times, especially in inter-coding configurations. This rise in computational cost poses a problem for the deployment of VVC, particularly in real-time applications on embedded platforms constrained by limited computational and memory resources [15]. While hardware accelerators can provide performance improvements, they are themselves limited by memory and power resources, further emphasizing the need for efficient algorithmic design.

Because the type V DCT (DCT-V) is linearly related to the type VI DCT (DCT-VI), this study focuses on the DCT-VI and computationally efficient algorithms for its implementation. Below, we consider ways to design fast DCT-VI algorithms reported in the literature.

In this paper, we adopt the definitions of trigonometric transforms given in [17]. In contrast, in [18,19,20,21], the DCTs and DSTs of types VI and VII are defined in the reverse order.

1.1. Related Papers

According to the literature, eight types of DCT and DST are recognized [17]. Among them, the DCT-II, type II DST, type IV DCT, and type IV DST are the most widely used in image, video, speech, and audio processing [4,5]. The DCT and DST of types I–IV form the group of even sinusoidal transforms, whereas the much less familiar types V–VIII constitute the group of odd sinusoidal transforms.

Although numerous fast algorithms exist for even sinusoidal transforms, only a limited number of studies have focused on the fast computation of odd sinusoidal transforms, such as DCTs and DSTs of types V–VIII. Even sinusoidal transforms are associated with even-length fast Fourier transforms (FFTs) and can be efficiently computed using the split-radix algorithms [22,23,24,25]. In contrast, odd sinusoidal transforms correspond to odd-length FFTs.

Fast DCT-VI algorithms were developed using the FFT [15,18,19] or by exploiting the relationship between the DCT-VI and the DCT-II [20,21]. Using this strategy, Chivukula and Reznik [19] proposed fast algorithms for DCT/DST types VI and VII by exploiting their relationship with the discrete Fourier transform (DFT). They showed that an N-point DST of types VI and VII corresponds to a (2N + 1)-point DFT, enabling the use of the Winograd FFT algorithm [26,27], which is particularly efficient for DFTs of prime or prime-power lengths. Based on this approach, they developed fast algorithms for 4- and 8-point DST of types VI and VII, which map to 9- and 17-point DFTs, respectively, achieving a reduction in the number of multiplications compared to direct matrix computation.

In [15], Park, Lee, and Kim represent the N-point DCT-V using the equality between the DCT-VI and the (2N − 1)-point DFT, which enables fast computation of both the forward and inverse DCT-V. The linear relationship between the DCT-V and the DCT-VI is then exploited to further accelerate DCT-V processing in video coding. The (2N − 1)-point DFT can be efficiently evaluated using the Winograd FFT for prime-length factorizations and the prime factor FFT for composite lengths formed from relatively prime factors [22]. Additional computational savings are achieved by exploiting input symmetries within the FFT.

Several studies [20,21,28] have explored relationships between the DCT-VI and other types of DCTs or DSTs. In [20,28], Reznik showed that the (2N + 1)-point DCT-II matrix can be decomposed into an (N + 1)-point DCT-VI and an N-point type VII DST. Since the odd-length DCT-II can be treated as a real-valued DFT of the same length, the author showed how to employ DFT factorization known from the literature [22] to derive reduced-complexity algorithms for the (N + 1)-point DCT-VI and the N-point type VII DST.

In [21], Masera, Martina, and Masera demonstrated reduced-complexity factorizations and their corresponding data-flow graphs for the DCT-V and DCT-VI of lengths N = 4 and 8. To this end, the relationships between the DFT, DCT-II, DCT-VI, and type VII DST were exploited. As a result, a new relationship between the DCT-VI and the DCT-V was established. Hardware architectures based on the proposed reduced-complexity factorizations were implemented on an FPGA, demonstrating lower complexity compared to a direct implementation based on matrix–vector multiplication.

Based on the brief review above of existing fast algorithms for the DCT-VI, we identified the limitations of these approaches and outlined directions for reducing the computational complexity of the DCT-VI.

1.2. The Main Contributions of the Paper

The above review reveals several limitations of existing methods for constructing fast algorithms for DCT-VI. Most algorithms described in the literature are designed exclusively for input sequences whose lengths are powers of two. Although known algorithms reduce the number of required multiplications compared to direct matrix–vector multiplication, their computational cost remains significant. Moreover, these algorithms are primarily intended for large transform sizes and yield a significant reduction in arithmetic complexity only in such cases.

To address these limitations, we propose fast algorithms for the DCT-VI based on the structural approach introduced in [29,30]. The effectiveness of this approach stems from its ability to identify and effectively exploit the block structures of the original matrices to be decomposed. In contrast to the fast transformation scheme presented in [14,31], the structured approach employs a richer set of matrix templates, which enables a more effective decomposition of the transform matrix. This enables the factorization of submatrices with repeated entries using the templates described in [29]. Furthermore, cyclic convolution blocks can be expressed as products of sparse matrices, yielding significant computational savings [32]. An additional advantage of the structural approach is that it naturally represents the resulting algorithms as data-flow graphs in which each input–output path contains only a single multiplication. This property reduces computational complexity by eliminating redundant operations and facilitates efficient organization of computations.

The primary contribution of this study is the development of reduced-complexity DCT-VI algorithms for small data sequence lengths, in particular for lengths ranging from three to eight. A set of these algorithms is obtained by successfully factoring the original DCT-VI matrices into sparse matrix products. The resulting data-flow graphs are well-suited for hardware implementation, while the corresponding pseudocodes enable efficient software realization. Compared to direct matrix–vector multiplication, the proposed algorithms exhibit lower computational complexity and, when combined with other techniques, apply to image, video, and audio coding, as well as to a broad range of signal and data processing applications.

The remainder of the paper is organized as follows. Section 2 presents the necessary mathematical background and notations. Section 3 introduces the proposed fast DCT-VI algorithms. Section 4 and Section 5 analyze their computational complexity. Section 6 concludes the paper. Appendix A provides the pseudocode of the proposed algorithms.

2. Preliminary Remarks

The DCT-VI is predominantly utilized in digital signal processing domains, including image and video coding. Some signal processing libraries and tools, such as LabVIEW, support DCT-VI as part of a general transform function set.

This transform is mathematically expressed as follows:

y_{k} = \sum_{n = 0}^{N - 1} σ_{kn} x_{n} \cos \frac{π n (2 k + 1)}{2 N - 1}, k = 0, 1, \dots, N - 1,

(1)

where $y_{k}$ denotes the output signal after the application of the DCT-VI, $x_{n}$ is an input signal, and $σ_{kn}$ represents a normalization constant:

σ_{kn} = \{\begin{matrix} \frac{1}{\sqrt{2 N - 1}}, & k = N - 1; n = 0; \\ \sqrt{\frac{2}{2 N - 1}}, & k = N - 1; n = 1, \dots, N - 1; \\ \sqrt{\frac{2}{2 N - 1}}, & k = 0, 1, \dots, N - 2; n = 0; \\ \frac{2}{\sqrt{2 N - 1}}, & otherwise . \end{matrix}

For N = 3 this gives the values $\sqrt{2 / 5}$, $2 / \sqrt{5}$, and $1 / \sqrt{5}$ used in Section 3.1.

Furthermore, the DCT-VI can be represented by the matrix–vector product:

Y_{N \times 1} = C_{N} X_{N \times 1},

(2)

where indices n and k range from 0 to N − 1, and $C_{N}$ corresponds to the transform matrix:

C_{N} = [\begin{matrix} c_{0, 0} & c_{0, 1} & \dots & c_{0, N - 1} \\ c_{1, 0} & c_{1, 1} & \dots & c_{1, N - 1} \\ ⋮ & ⋮ & ⋱ & ⋮ \\ c_{N - 1, 0} & c_{N - 1, 1} & \dots & c_{N - 1, N - 1} \end{matrix}], c_{k n} = σ_{kn} \cos \frac{π n (2 k + 1)}{2 N - 1},

Y_{N \times 1} = {[y_{0}, y_{1}, \dots, y_{N - 1}]}^{T}, X_{N \times 1} = {[x_{0}, x_{1}, \dots, x_{N - 1}]}^{T} .
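The definition above can be turned into a small reference implementation. The following is a minimal sketch, assuming the normalization with denominator 2N − 1 that matches the constants quoted in Section 3; the function names `dct6_matrix` and `dct6_direct` are illustrative and do not come from the paper.

```python
import math

def dct6_matrix(N):
    """Return the N x N DCT-VI matrix, c_kn = sigma_kn * cos(pi*n*(2k+1)/(2N-1))."""
    M = 2 * N - 1
    C = []
    for k in range(N):
        row = []
        for n in range(N):
            sigma = 2.0 / math.sqrt(M)
            if n == 0:
                sigma /= math.sqrt(2.0)   # first column is scaled by 1/sqrt(2)
            if k == N - 1:
                sigma /= math.sqrt(2.0)   # last row is scaled by 1/sqrt(2)
            row.append(sigma * math.cos(math.pi * n * (2 * k + 1) / M))
        C.append(row)
    return C

def dct6_direct(x):
    """Direct matrix-vector product: N*N multiplications, N*(N-1) additions."""
    C = dct6_matrix(len(x))
    return [sum(c * v for c, v in zip(row, x)) for row in C]

# Print the 3-point matrix rounded to four decimals
for row in dct6_matrix(3):
    print(["%.4f" % c for c in row])
```

This direct form is the baseline against which the multiplication and addition counts of the fast algorithms are compared.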

In this research, we use the following notations:

$W_{M \times N}$ and $W_{N}$ are, respectively, M × N and N × N matrices describing pre-additions and post-additions;
$D_{N}$ is a diagonal matrix of order N, whose elements represent the calculated values of the cosines;
$I_{N}$ is an identity matrix of order N;
$P_{N}$ is a permutation matrix;
$H_{2}$ is the 2 × 2 Hadamard matrix;
$s_{m}^{(N)}$ is a cosine-value coefficient;
⊕ is the direct sum of two matrices;
⊗ is the Kronecker product of two matrices;
An empty cell in a matrix means it contains zero.

3. Fast Algorithms for the Short-Length DCT-VI

3.1. The Three-Point DCT-VI Algorithm

We represent the 3-point DCT-VI as follows:

Y_{3 \times 1} = C_{3} X_{3 \times 1},

(3)

where $Y_{3 \times 1} = {[y_{0}, y_{1}, y_{2}]}^{T}$, $X_{3 \times 1} = {[x_{0}, x_{1}, x_{2}]}^{T}$, and

C_{3} = [\begin{matrix} a_{3} & b_{3} & c_{3} \\ a_{3} & - c_{3} & - b_{3} \\ d_{3} & - a_{3} & a_{3} \end{matrix}]

with $a_{3} = \sqrt{\frac{2}{5}} \approx 0.6325$, $b_{3} = \frac{2}{\sqrt{5}} \cos \frac{π}{5} \approx 0.7236$, $c_{3} = \frac{2}{\sqrt{5}} \cos \frac{2 π}{5} \approx 0.2764$, and $d_{3} = \frac{1}{\sqrt{5}} \approx 0.4472$.

Here and throughout the paper, the stated values of the constants, for example $a_{3} = \sqrt{\frac{2}{5}} \approx 0.6325$, are numerical approximations.

To factorize the matrix $C_{3}$, we first invert the signs of the elements in the second row, which gives the matrix $C_{3}^{(a)}$, and then decompose the resulting matrix into two components:

C_{3}^{(a)} = C_{3}^{(b)} + C_{3}^{(c)},

(4)

where

C_{3}^{(b)} = [\begin{matrix} a_{3} & 0 & 0 \\ - a_{3} & 0 & 0 \\ d_{3} & - a_{3} & a_{3} \end{matrix}]

and

C_{3}^{(c)} = [\begin{matrix} 0 & b_{3} & c_{3} \\ 0 & c_{3} & b_{3} \\ 0 & 0 & 0 \end{matrix}] .

Up to the sign of the second output, which is restored at the end, the expression (3) can be represented as

Y_{3 \times 1} = C_{3}^{(b)} X_{3 \times 1} + C_{3}^{(c)} X_{3 \times 1} .

(5)

Within the first column and third row of the matrix $C_{3}^{(b)}$, identical entries differ only in sign. This property enables a reduction in the computational complexity of the 3-point DCT-VI, reducing the number of arithmetic operations without requiring further matrix transformations. In matrix $C_{3}^{(c)}$, the row and column consisting exclusively of zero entries are removed. By applying the template $[\begin{matrix} a & b \\ b & a \end{matrix}]$ from [29], we obtain the following factorization of the resulting matrix $C_{2}^{(c)} = [\begin{matrix} b_{3} & c_{3} \\ c_{3} & b_{3} \end{matrix}]$:

C_{2}^{(c)} = H_{2} \times diag ((b_{3} + c_{3}) / 2, (b_{3} - c_{3}) / 2) H_{2} .

(6)

Let us denote $s_{0}^{(3)} = d_{3}$, $s_{1}^{(3)} = s_{3}^{(3)} = a_{3}$, and $s_{2}^{(3)} = (b_{3} - c_{3}) / 2$. Note also that $(b_{3} + c_{3}) / 2 = 1 / 2$, because $\cos \frac{π}{5} + \cos \frac{2 π}{5} = \frac{\sqrt{5}}{2}$, so the corresponding multiplication reduces to a shift. Then, based on factorization (6), we obtain the data-flow graph for multiplying the inputs by the matrix $C_{2}^{(c)}$, as shown in Figure 1.

The inputs of this graph are $x_{1}$ and $x_{2}$, and the outputs are $y_{0}$ and $y_{1}$, because the matrix $C_{2}^{(c)}$ is obtained from the 2nd and 3rd columns and 1st and 2nd rows of the matrix $C_{3}^{(a)}$.
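As a quick check of factorization (6), the product can be expanded for a generic template with entries a and b. This is a standard identity, written out here only for convenience, with $H_{2}$ taken as the unnormalized 2 × 2 Hadamard matrix:

```latex
H_{2}\,\mathrm{diag}\!\left(\tfrac{a+b}{2},\ \tfrac{a-b}{2}\right) H_{2}
= \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}
  \begin{bmatrix} \tfrac{a+b}{2} & \tfrac{a+b}{2} \\[2pt] \tfrac{a-b}{2} & -\tfrac{a-b}{2} \end{bmatrix}
= \begin{bmatrix} a & b \\ b & a \end{bmatrix}
```

With $a = b_{3}$ and $b = c_{3}$, the four multiplications of the 2 × 2 block are replaced by two scalings, at the cost of two pre-additions and two post-additions.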

Next, we multiply the matrix $C_{3}^{(b)}$ by the input vector $X_{3 \times 1} = {[x_{0}, x_{1}, x_{2}]}^{T}$, yielding the resulting vector ${[a_{3} x_{0}, - a_{3} x_{0}, d_{3} x_{0} - a_{3} x_{1} + a_{3} x_{2}]}^{T}$. The corresponding data-flow graph is presented in Figure 2.

By merging the data-flow graphs in Figure 1 and Figure 2, and inverting the sign of the second row, we obtain the data-flow graph for the 3-point DCT-VI algorithm, shown in Figure 3.

However, some of the additions required by this algorithm are redundant. To reduce the number of additions, we construct a factorization of the 3-point DCT-VI matrix corresponding to the data-flow graph in Figure 3. This process is illustrated in Figure 4. For each subgraph at a different hierarchical level of the data-flow graph in Figure 3, we construct the corresponding adjacency matrix, for example, for the subgraphs marked by green and blue rectangles.

The adjacency matrix of a graph in this paper is defined as an r × q matrix whose entries belong to {0, 1, −1}, where r and q denote the numbers of output and input vertices, respectively. The (i, j)-th entry of this matrix is equal to 1 if the j-th input vertex is connected to the i-th output vertex by an edge. An edge between the vertices of the graph may also be weighted by −1. A zero entry indicates that the corresponding edge is absent.

The matrix

W_{5 \times 3} = [\begin{matrix} 1 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & - 1 \end{matrix}]

corresponds to the green rectangle, and the matrix

W_{3 \times 4} = [\begin{matrix} 1 & 1 & 1 & 0 \\ - 1 & 1 & - 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix}]

corresponds to the blue rectangle.

As a result, the 3-point DCT-VI matrix is factorized as follows:

Y_{3 \times 1} = P_{3} W_{3 \times 4} W_{4 \times 5} D_{5} W_{5 \times 3} W_{3}^{(0)} X_{3 \times 1},

(7)

where

W_{4 \times 5} = [\begin{matrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{matrix}]

is defined as shown in Figure 4, $D_{5} = diag (s_{0}^{(3)}, s_{1}^{(3)}, 1 / 2, s_{2}^{(3)}, s_{3}^{(3)})$, $P_{3} = [\begin{matrix} 1 & 0 & 0 \\ 0 & - 1 & 0 \\ 0 & 0 & 1 \end{matrix}]$, and $W_{3}^{(0)} = 1 \oplus H_{2}$.

Based on factorization (7), we identify redundant additions using the adjacency matrices of the subgraphs defined in each hierarchical level of the data-flow graph in Figure 4. For example, in the matrix

W_{3 \times 4} = [\begin{matrix} 1 & 1 & 1 & 0 \\ - 1 & 1 & - 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix}]

the pairs of entries (1, 1) and (1, 3) as well as (2, 1) and (2, 3), which lie in the same columns, are repeated in the first and second rows up to a sign change. Therefore, the addition of the first and third inputs is repeated and hence redundant. One of these additions can be removed.

To achieve this, we first add the first and third inputs and then multiply the result by the matrix $W_{3}^{(1)} = H_{2} \oplus 1$. By replacing the matrices $W_{3 \times 4}$ and $W_{4 \times 5}$ with the matrix

W_{3 \times 5} = [\begin{matrix} 0 & 1 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 1 \end{matrix}],

we obtain the following factorization of the 3-point DCT-VI matrix:

Y_{3 \times 1} = W_{3}^{(1)} W_{3 \times 5} D_{5} W_{5 \times 3} W_{3}^{(0)} X_{3 \times 1} .

(8)

Based on the DCT-VI matrix factorization in (8), we present the data-flow graph for the 3-point DCT-VI algorithm in Figure 5. This data-flow graph does not include the redundant additions. By applying the proposed algorithm, the number of multiplications is reduced from 9 to 4, while the number of additions remains unchanged, and a single shift operation is introduced.
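The steps of factorization (8) can be traced in a few lines of code. The sketch below is an illustration rather than the pseudocode of Appendix A; the function and variable names are assumptions, and the halving is written as a floating-point multiplication by 0.5 where a fixed-point implementation would use a shift.

```python
import math

R5 = math.sqrt(5.0)
A3 = math.sqrt(2.0 / 5.0)
B3 = 2.0 / R5 * math.cos(math.pi / 5)
C3 = 2.0 / R5 * math.cos(2 * math.pi / 5)
D3 = 1.0 / R5
S2 = (B3 - C3) / 2.0          # s_2^(3), precomputed

def dct6_3_direct(x):
    """Direct product with C_3: 9 multiplications, 6 additions."""
    x0, x1, x2 = x
    return [A3 * x0 + B3 * x1 + C3 * x2,
            A3 * x0 - C3 * x1 - B3 * x2,
            D3 * x0 - A3 * x1 + A3 * x2]

def dct6_3_fast(x):
    """Factorization (8): 4 multiplications, 6 additions, 1 halving."""
    x0, x1, x2 = x
    # W3^(0) = 1 (+) H2: two pre-additions
    t1 = x1 + x2
    t2 = x1 - x2
    # W_{5x3} followed by D5
    p0 = D3 * x0
    p1 = A3 * x0
    p2 = 0.5 * t1             # (b3 + c3) / 2 = 1/2, a shift in fixed point
    p3 = S2 * t2
    p4 = A3 * (-t2)
    # W_{3x5}: two additions
    q0 = p1 + p3
    q2 = p0 + p4
    # W3^(1) = H2 (+) 1: two post-additions
    return [q0 + p2, q0 - p2, q2]

print(dct6_3_fast([1.0, 2.0, 3.0]))
print(dct6_3_direct([1.0, 2.0, 3.0]))
```

Both functions should agree to rounding error for any input, which gives a simple regression check when the constants are later quantized.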

3.2. Data-Flow Graph Construction

In Section 3, we present algorithms for short-length DCT-VI using data-flow graphs. To construct a data-flow graph, the factorization of the short-length DCT-VI matrix into sparse matrices is first obtained. This factorization consists of a diagonal matrix containing scaling factors and several sparse matrices whose elements belong to the set {0, 1, −1}.

The data-flow graph of a short-length DCT-VI algorithm has a hierarchical structure. At the first level, located on the left side of the graph, vertices corresponding to the algorithm inputs $x_{n}$, $n = 0, 1, \dots, N - 1$, are placed. Then, considering the factorization from right to left, each subsequent hierarchical level is constructed to correspond to the current sparse matrix in the DCT-VI matrix factorization. Each sparse matrix of the DCT-VI serves as the adjacency matrix for the corresponding hierarchical level of the data-flow graph.

An edge drawn with a solid line corresponds to the value 1 in the adjacency matrix, whereas an edge drawn with a dashed line corresponds to the value −1. For example, in Figure 3, within the green rectangle, two edges extend from the first vertex, corresponding to the (4, 3) and (5, 3) entries of the matrix $W_{5 \times 3}$. Since the (4, 3) entry equals 1 and the (5, 3) entry equals $- 1$, the corresponding edges are marked with solid and dashed lines, respectively.
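Because every level of the graph is an adjacency matrix with entries in {0, 1, −1}, operation counts can be read off mechanically. The following sketch assumes the sparse factors of Equation (8) with zero positions written out explicitly (these positions are reconstructed from the description, not quoted), and counts m − 1 additions for a row with m nonzero entries; the helper names are hypothetical.

```python
import math

def additions(W):
    """A row with m nonzero +1/-1 entries costs m - 1 additions."""
    return sum(max(sum(1 for e in row if e) - 1, 0) for row in W)

def apply(W, v):
    """Multiply an adjacency matrix (list of rows) by a vector."""
    return [sum(e * x for e, x in zip(row, v)) for row in W]

# Sparse factors of Eq. (8), listed in the order they are applied
W3_0 = [[1, 0, 0], [0, 1, 1], [0, 1, -1]]                      # 1 (+) H2
W5x3 = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, -1]]
W3x5 = [[0, 1, 0, 1, 0], [0, 0, 1, 0, 0], [1, 0, 0, 0, 1]]
W3_1 = [[1, 1, 0], [1, -1, 0], [0, 0, 1]]                      # H2 (+) 1

a3 = math.sqrt(2.0 / 5.0)
d3 = 1.0 / math.sqrt(5.0)
s2 = (math.cos(math.pi / 5) - math.cos(2 * math.pi / 5)) / math.sqrt(5.0)
D5 = [d3, a3, 0.5, s2, a3]                                     # scaling factors

def run_graph(x):
    """Evaluate the data-flow graph level by level, left to right."""
    v = apply(W5x3, apply(W3_0, x))
    v = [s * t for s, t in zip(D5, v)]
    return apply(W3_1, apply(W3x5, v))

total_additions = sum(additions(W) for W in (W3_0, W5x3, W3x5, W3_1))
multiplications = sum(1 for s in D5 if s != 0.5)               # 1/2 is a shift
print(total_additions, multiplications)
```

The same two helpers can be reused for the longer transforms once their sparse factors are written as lists of rows.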

3.3. The 4-Point DCT-VI Algorithm

The four-point DCT-VI is defined as follows:

Y_{4 \times 1} = C_{4} X_{4 \times 1},

(9)

where $Y_{4 \times 1} = {[y_{0}, y_{1}, y_{2}, y_{3}]}^{T}$, $X_{4 \times 1} = {[x_{0}, x_{1}, x_{2}, x_{3}]}^{T}$, and

C_{4} = [\begin{matrix} a_{4} & b_{4} & c_{4} & d_{4} \\ a_{4} & d_{4} & - b_{4} & - c_{4} \\ a_{4} & - c_{4} & - d_{4} & b_{4} \\ e_{4} & - a_{4} & a_{4} & - a_{4} \end{matrix}]

with $a_{4} = \sqrt{\frac{2}{7}} \approx 0.5345$, $b_{4} = \frac{2}{\sqrt{7}} \cos \frac{π}{7} \approx 0.6811$, $c_{4} = \frac{2}{\sqrt{7}} \cos \frac{2 π}{7} \approx 0.4713$, $d_{4} = \frac{2}{\sqrt{7}} \cos \frac{3 π}{7} \approx 0.1682$, and $e_{4} = \frac{1}{\sqrt{7}} \approx 0.3780$.

Let us consider the idea of developing the 4-point DCT-VI algorithm. First, we derive the factorization of the 4-point DCT-VI matrix; then, the data-flow graph is constructed, and the pseudocode is designed. To factorize the 4-point DCT-VI matrix, we decompose the original matrix into two matrices and factorize each one separately using the 3-point cyclic convolution pattern and a fan-like pattern that adds the same value to different outputs. Finally, the repeated additions are eliminated.

To implement this idea, we change the sign of the third column of the matrix $C_{4}$. The resulting matrix $C_{4}^{(a)}$ is decomposed into two submatrices:

C_{4}^{(a)} = C_{4}^{(b)} + C_{4}^{(c)},

(10)

where

C_{4}^{(b)} = [\begin{matrix} a_{4} & 0 & 0 & 0 \\ a_{4} & 0 & 0 & 0 \\ a_{4} & 0 & 0 & 0 \\ e_{4} & - a_{4} & - a_{4} & - a_{4} \end{matrix}]

and

C_{4}^{(c)} = [\begin{matrix} 0 & b_{4} & - c_{4} & d_{4} \\ 0 & d_{4} & b_{4} & - c_{4} \\ 0 & - c_{4} & d_{4} & b_{4} \\ 0 & 0 & 0 & 0 \end{matrix}] .

In the first column and fourth row of the matrix $C_{4}^{(b)}$, several entries are identical except for their signs. This property allows a reduction in the number of arithmetic operations without requiring additional transforms for this matrix. In the matrix $C_{4}^{(c)}$, we remove the row and column consisting entirely of zeros.

The resulting matrix conforms to the cyclic convolution pattern

H_{3} = [\begin{matrix} h_{1} & h_{0} & h_{2} \\ h_{2} & h_{1} & h_{0} \\ h_{0} & h_{2} & h_{1} \end{matrix}]

with parameters $h_{0} = - c_{4}$, $h_{1} = b_{4}$, and $h_{2} = d_{4}$. The factorization of this pattern is described in [30] as follows:

H_{3} = A_{3 \times 4} \times diag (s_{2}^{(4)}, s_{3}^{(4)}, s_{4}^{(4)}, s_{5}^{(4)}) A_{4 \times 3},

(11)

where $s_{2}^{(4)} = (- c_{4} + b_{4} + d_{4}) / 3$, $s_{3}^{(4)} = (2 b_{4} + c_{4} - d_{4}) / 3$, $s_{4}^{(4)} = (b_{4} + 2 c_{4} + d_{4}) / 3$, $s_{5}^{(4)} = (- c_{4} + b_{4} - 2 d_{4}) / 3$, and

A_{3 \times 4} = [\begin{matrix} 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & - 1 \\ 1 & - 1 & - 1 & 0 \end{matrix}], A_{4 \times 3} = [\begin{matrix} 1 & 1 & 1 \\ 0 & 1 & - 1 \\ 1 & - 1 & 0 \\ 1 & 0 & - 1 \end{matrix}] .
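Factorization (11) can be checked numerically for arbitrary parameters. The sketch below compares the direct 3 × 3 product (nine multiplications) with the factored form (four multiplications by precomputed constants); the function names are illustrative only.

```python
A3x4 = [[1, 0, 1, 1], [1, 1, 0, -1], [1, -1, -1, 0]]
A4x3 = [[1, 1, 1], [0, 1, -1], [1, -1, 0], [1, 0, -1]]

def h3_direct(h, x):
    """Direct product with H3 = [[h1,h0,h2],[h2,h1,h0],[h0,h2,h1]]."""
    h0, h1, h2 = h
    H3 = [[h1, h0, h2], [h2, h1, h0], [h0, h2, h1]]
    return [sum(a * b for a, b in zip(row, x)) for row in H3]

def h3_fast(h, x):
    """Factored product A3x4 * diag(s) * A4x3 from Eq. (11)."""
    h0, h1, h2 = h
    # Diagonal entries; in a fixed transform these are precomputed constants
    s = [(h0 + h1 + h2) / 3.0,
         (2 * h1 - h0 - h2) / 3.0,
         (h1 - 2 * h0 + h2) / 3.0,
         (h0 + h1 - 2 * h2) / 3.0]
    t = [sum(a * b for a, b in zip(row, x)) for row in A4x3]   # pre-additions
    p = [si * ti for si, ti in zip(s, t)]                      # 4 multiplications
    return [sum(a * b for a, b in zip(row, p)) for row in A3x4]  # post-additions

print(h3_direct((2.0, 5.0, -3.0), [1.0, -4.0, 0.5]))
print(h3_fast((2.0, 5.0, -3.0), [1.0, -4.0, 0.5]))
```

When $h_{0} + h_{1} + h_{2} = 0$, the first diagonal entry vanishes and only three multiplications remain, which is the case used later for the 5-point transform.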

From expression (11) we derive the factorization of the four-point DCT-VI matrix as shown below:

Y_{4 \times 1} = W_{4 \times 5} W_{5 \times 7} D_{7} W_{7 \times 5} W_{5 \times 4} P_{4} X_{4 \times 1},

(12)

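where

W_{4 \times 5} = [\begin{matrix} 1 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & - 1 & 0 \\ 1 & - 1 & - 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{matrix}], P_{4} = [\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & - 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix}], W_{5 \times 4} = [\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & - 1 \\ 0 & 1 & - 1 & 0 \\ 0 & 1 & 0 & - 1 \end{matrix}],

W_{5 \times 7} = [\begin{matrix} 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 & 0 & - 1 \end{matrix}], W_{7 \times 5} = [\begin{matrix} 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 0 \end{matrix}],

D_{7} = diag (s_{0}^{(4)}, s_{1}^{(4)}, s_{2}^{(4)}, s_{3}^{(4)}, s_{4}^{(4)}, s_{5}^{(4)}, s_{6}^{(4)}), s_{0}^{(4)} = e_{4}, s_{1}^{(4)} = s_{6}^{(4)} = a_{4} .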

Figure 6 illustrates the data-flow graph of the four-point DCT-VI algorithm. Compared to the direct matrix–vector product, this approach reduces the number of multiplications from 16 to 7, while increasing the number of additions from 12 to 13.
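Factorization (12) translates into straight-line code with 7 multiplications and 13 additions. The following is a hedged sketch of that flow, not the authors' pseudocode; the names are assumptions, and the intermediate variables follow the sparse factors from right to left.

```python
import math

R7 = math.sqrt(7.0)
A4 = math.sqrt(2.0 / 7.0)
B4 = 2.0 / R7 * math.cos(math.pi / 7)
C4 = 2.0 / R7 * math.cos(2 * math.pi / 7)
D4 = 2.0 / R7 * math.cos(3 * math.pi / 7)
E4 = 1.0 / R7
# Diagonal of D7: s_0 .. s_6, precomputed once
S = [E4, A4,
     (B4 - C4 + D4) / 3.0, (2 * B4 + C4 - D4) / 3.0,
     (B4 + 2 * C4 + D4) / 3.0, (B4 - C4 - 2 * D4) / 3.0,
     A4]

def dct6_4_direct(x):
    """Direct product with C_4: 16 multiplications, 12 additions."""
    M = [[A4, B4, C4, D4],
         [A4, D4, -B4, -C4],
         [A4, -C4, -D4, B4],
         [E4, -A4, A4, -A4]]
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def dct6_4_fast(x):
    """Factorization (12): 7 multiplications, 13 additions."""
    x0, x1, x2, x3 = x
    x2 = -x2                      # P4: sign change of the third input
    # W_{5x4}: 5 additions
    t1 = x1 + x2 + x3
    t2 = x2 - x3
    t3 = x1 - x2
    t4 = x1 - x3
    # W_{7x5} and D7: 7 multiplications
    p = [S[0] * x0, S[1] * x0, S[2] * t1, S[3] * t2,
         S[4] * t3, S[5] * t4, S[6] * t1]
    # W_{5x7}: 2 additions
    u0 = p[1] + p[2]
    u4 = p[0] - p[6]
    # W_{4x5}: 6 additions
    return [u0 + p[4] + p[5], u0 + p[3] - p[5], u0 - p[3] - p[4], u4]

print(dct6_4_fast([1.0, 2.0, 3.0, 4.0]))
print(dct6_4_direct([1.0, 2.0, 3.0, 4.0]))
```

Adding `p[1]` and `p[2]` once before the fan-out is what keeps the addition count at 13 rather than 15.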

The obtained data-flow graph has a regular structure; that is, a graph organization in which the same connectivity patterns and operation types are systematically repeated. In the graph shown in Figure 6, similar modules are presented on the left and right sides of the scaling-factor line. However, some differences also exist between these modules.

3.4. Algorithm for the 5-Point DCT-VI

We express the five-point DCT-VI as a matrix–vector product:

Y_{5 \times 1} = C_{5} X_{5 \times 1},

(13)

where $Y_{5 \times 1} = {[y_{0}, y_{1}, y_{2}, y_{3}, y_{4}]}^{T}$, $X_{5 \times 1} = {[x_{0}, x_{1}, x_{2}, x_{3}, x_{4}]}^{T}$, and

C_{5} = [\begin{matrix} a_{5} & b_{5} & c_{5} & d_{5} & e_{5} \\ a_{5} & d_{5} & - d_{5} & - f_{5} & - d_{5} \\ a_{5} & - e_{5} & - b_{5} & d_{5} & c_{5} \\ a_{5} & - c_{5} & e_{5} & d_{5} & - b_{5} \\ d_{5} & - a_{5} & a_{5} & - a_{5} & a_{5} \end{matrix}] .

The constants are defined as $a_{5} = \sqrt{\frac{2}{9}} \approx 0.4714$, $b_{5} = \frac{2}{\sqrt{9}} \cos \frac{π}{9} \approx 0.6265$, $c_{5} = \frac{2}{\sqrt{9}} \cos \frac{2 π}{9} \approx 0.5107$, $d_{5} = \frac{2}{\sqrt{9}} \cos \frac{π}{3} \approx 0.3333$, $e_{5} = \frac{2}{\sqrt{9}} \cos \frac{4 π}{9} \approx 0.1158$, and $f_{5} = \frac{2}{\sqrt{9}} \approx 0.6667$. It should be noted that $f_{5} = 1 - d_{5}$.

The idea of developing the 5-point DCT-VI algorithm is the same as that used for the 4-point algorithm. However, the matrix with identical entries has a more complex structure. The final reduction in the number of additions is not provided because no repeated additions were found.

Next, we multiply the second column of the matrix $C_{5}$ by $- 1$ and split the resulting matrix $C_{5}^{(a)}$ into two submatrices:

C_{5}^{(a)} = C_{5}^{(b)} + C_{5}^{(c)},

(14)

where

C_{5}^{(b)} = [\begin{matrix} a_{5} & 0 & 0 & d_{5} & 0 \\ a_{5} & - d_{5} & - d_{5} & d_{5} - 1 & - d_{5} \\ a_{5} & 0 & 0 & d_{5} & 0 \\ a_{5} & 0 & 0 & d_{5} & 0 \\ d_{5} & a_{5} & a_{5} & - a_{5} & a_{5} \end{matrix}]

and

C_{5}^{(c)} = [\begin{matrix} 0 & - b_{5} & c_{5} & 0 & e_{5} \\ 0 & 0 & 0 & 0 & 0 \\ 0 & e_{5} & - b_{5} & 0 & c_{5} \\ 0 & c_{5} & e_{5} & 0 & - b_{5} \\ 0 & 0 & 0 & 0 & 0 \end{matrix}] .

Here the entry $d_{5} - 1$ equals $- f_{5}$.

Let us multiply the matrix $C_{5}^{(b)}$ by the input vector ${[x_{0}, x_{1}, x_{2}, x_{3}, x_{4}]}^{T}$, in which the sign of $x_{1}$ has already been inverted. We obtain the output vector

{[a_{5} x_{0} + d_{5} x_{3}, a_{5} x_{0} - d_{5} (x_{1} + x_{2} - x_{3} + x_{4}) - x_{3}, a_{5} x_{0} + d_{5} x_{3}, a_{5} x_{0} + d_{5} x_{3}, d_{5} x_{0} + a_{5} (x_{1} + x_{2} - x_{3} + x_{4})]}^{T} .

The data-flow subgraph for calculating the entries of this output vector is shown by the green rectangle in Figure 7. By construction, the adjacency matrix at each hierarchical level of this subgraph is included in the factorization of the matrix $C_{5}^{(b)}$. The final reduction of the number of additions is not performed.

Then, the matrix $C_{5}^{(b)}$ is factorized taking into account the similar entries:

C_{5}^{(b)} = A_{5 \times 6} A_{6} \times diag (s_{0}^{(5)}, s_{1}^{(5)}, s_{2}^{(5)}, 1, s_{3}^{(5)}, s_{4}^{(5)}) A_{6 \times 5} A_{5},

(15)

where

A_{5 \times 6} = [\begin{matrix} 1 \\ 1 & 1 & 1 \\ 1 \\ 1 \\ 1 & 1 \end{matrix}], A_{5} = [\begin{matrix} 1 \\ 1 \\ 1 & 1 & 1 \end{matrix}],

A_{6 \times 5} = [\begin{matrix} 1 \\ 1 \\ 1 \\ - 1 \\ 1 \\ - 1 & 1 \end{matrix}], A_{6} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 & 1 \\ 1 & 1 \\ 1 \end{matrix}],

and $s_{0}^{(5)} = a_{5}$, $s_{1}^{(5)} = d_{5}$, $s_{2}^{(5)} = - d_{5}$, $s_{3}^{(5)} = d_{5}$, and $s_{4}^{(5)} = a_{5}$.

Next, the matrix $C_{5}^{(c)}$ is factorized. From $C_{5}^{(c)}$, we remove the rows and columns containing only zero entries. The resulting matrix

C_{3}^{(d)} = [\begin{matrix} - b_{5} & c_{5} & e_{5} \\ e_{5} & - b_{5} & c_{5} \\ c_{5} & e_{5} & - b_{5} \end{matrix}]

corresponds to the cyclic convolution template

H_{3} = [\begin{matrix} h_{1} & h_{0} & h_{2} \\ h_{2} & h_{1} & h_{0} \\ h_{0} & h_{2} & h_{1} \end{matrix}],

where $h_{0} = c_{5}$, $h_{1} = - b_{5}$, and $h_{2} = e_{5}$. Using Equation (11), the matrix is expressed as

C_{3}^{(d)} = A_{3 \times 4} \times diag ((h_{0} + h_{1} + h_{2}) / 3, (2 h_{1} - h_{0} - h_{2}) / 3, (h_{1} - 2 h_{0} + h_{2}) / 3, (h_{0} + h_{1} - 2 h_{2}) / 3) A_{4 \times 3} .

(16)

The first element of the diagonal matrix in expression (16) is zero:

(h_{0} + h_{1} + h_{2}) / 3 = (c_{5} - b_{5} + e_{5}) / 3 = 0,

which holds exactly because $\cos \frac{2 π}{9} + \cos \frac{4 π}{9} = 2 \cos \frac{π}{3} \cos \frac{π}{9} = \cos \frac{π}{9}$; numerically, $(0.5107 - 0.6265 + 0.1158) / 3 = 0$.

Therefore, Equation (16) can be reformulated as

C_{3}^{(d)} = A_{3}^{(1)} \times diag (s_{5}^{(5)}, s_{6}^{(5)}, s_{7}^{(5)}) A_{3}^{(0)},

(17)

where

A_{3}^{(1)} = [\begin{matrix} 1 & 0 & - 1 \\ - 1 & - 1 & 0 \\ 0 & 1 & 1 \end{matrix}], A_{3}^{(0)} = [\begin{matrix} 1 & - 1 & 0 \\ - 1 & 0 & 1 \\ 0 & - 1 & 1 \end{matrix}],

and $s_{5}^{(5)} = (- 2 b_{5} - e_{5} - c_{5}) / 3$, $s_{6}^{(5)} = (- b_{5} - 2 c_{5} + e_{5}) / 3$, $s_{7}^{(5)} = (c_{5} - b_{5} - 2 e_{5}) / 3$.
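The reduced factorization (17) can be verified directly. The sketch below, with illustrative names, checks that the sum $c_{5} - b_{5} + e_{5}$ vanishes to rounding error and that the three-multiplication form reproduces the product with the 3 × 3 cyclic block.

```python
import math

b5 = 2.0 / 3.0 * math.cos(math.pi / 9)
c5 = 2.0 / 3.0 * math.cos(2 * math.pi / 9)
e5 = 2.0 / 3.0 * math.cos(4 * math.pi / 9)

A3_1 = [[1, 0, -1], [-1, -1, 0], [0, 1, 1]]
A3_0 = [[1, -1, 0], [-1, 0, 1], [0, -1, 1]]
S = [(-2 * b5 - e5 - c5) / 3.0,      # s_5^(5)
     (-b5 - 2 * c5 + e5) / 3.0,      # s_6^(5)
     (c5 - b5 - 2 * e5) / 3.0]       # s_7^(5)

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def c3d_direct(w):
    """Direct product with the cyclic block: 9 multiplications."""
    M = [[-b5, c5, e5], [e5, -b5, c5], [c5, e5, -b5]]
    return matvec(M, w)

def c3d_fast(w):
    """Eq. (17): 3 pre-additions, 3 multiplications, 3 post-additions."""
    q = matvec(A3_0, w)
    r = [s * t for s, t in zip(S, q)]
    return matvec(A3_1, r)

zero_term = (c5 - b5 + e5) / 3.0     # first diagonal entry of Eq. (16)
print(zero_term)
print(c3d_direct([1.0, 2.0, 3.0]), c3d_fast([1.0, 2.0, 3.0]))
```

Dropping the zero diagonal entry removes one multiplication and the three-input sum that would otherwise feed it.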

Using expressions (14), (15), and (17), the factorization of the five-point DCT-VI matrix is derived:

Y_{5 \times 1} = W_{5 \times 9} W_{9} D_{9} W_{9 \times 8} W_{8 \times 5} X_{5 \times 1},

(18)

where

$D_{9} = diag (s_{0}^{(5)}, s_{1}^{(5)}, s_{2}^{(5)}, 1, s_{3}^{(5)}, s_{4}^{(5)}, s_{5}^{(5)}, s_{6}^{(5)}, s_{7}^{(5)})$, $W_{9} = A_{6} \oplus A_{3}^{(1)}$, $W_{9 \times 8} = A_{6 \times 5} \oplus A_{3}^{(0)}$,

W_{5 \times 9} = [\begin{matrix} 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{matrix}], W_{8 \times 5} = [\begin{matrix} 1 \\ 1 \\ - 1 & 1 & 1 \\ - 1 \\ 1 \\ 1 \end{matrix}] .

In Figure 7, we present the data-flow graph of the five-point DCT-VI algorithm, which reduces the number of multiplications from 25 to 8 compared to the direct matrix–vector multiplication. The number of additions is also reduced from 20 to 16 (see Figure 7).
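One arrangement of the computation that meets the stated counts of 8 multiplications and 16 additions is sketched below. It is an assumption-laden illustration built from the matrix $C_{5}$ and Equation (17), not a transcription of Figure 7 or of the pseudocode in Appendix A; in particular, sharing the sum `u` between the second and fifth outputs is a choice made for this sketch.

```python
import math

A5 = math.sqrt(2.0 / 9.0)
B5 = 2.0 / 3.0 * math.cos(math.pi / 9)
C5 = 2.0 / 3.0 * math.cos(2 * math.pi / 9)
D5 = 1.0 / 3.0
E5 = 2.0 / 3.0 * math.cos(4 * math.pi / 9)
F5 = 2.0 / 3.0
S5 = (-2 * B5 - E5 - C5) / 3.0
S6 = (-B5 - 2 * C5 + E5) / 3.0
S7 = (C5 - B5 - 2 * E5) / 3.0

def dct6_5_direct(x):
    """Direct product with C_5: 25 multiplications, 20 additions."""
    M = [[A5, B5, C5, D5, E5],
         [A5, D5, -D5, -F5, -D5],
         [A5, -E5, -B5, D5, C5],
         [A5, -C5, E5, D5, -B5],
         [D5, -A5, A5, -A5, A5]]
    return [sum(m * v for m, v in zip(row, x)) for row in M]

def dct6_5_fast(x):
    """Sketch with 8 multiplications and 16 additions."""
    x0, x1, x2, x3, x4 = x
    z1 = -x1                          # sign inversion of the second column
    # Part with repeated entries: 5 multiplications, 7 additions
    u = z1 + x2 - x3 + x4
    p0 = A5 * x0
    v = p0 + D5 * x3                  # shared by y0, y2, y3
    y1 = p0 - D5 * u - x3             # uses f5 = 1 - d5
    y4 = D5 * x0 + A5 * u
    # Cyclic-convolution part, Eq. (17): 3 additions, 3 multiplications
    q0, q1, q2 = z1 - x2, x4 - z1, x4 - x2
    r0, r1, r2 = S5 * q0, S6 * q1, S7 * q2
    # Post-additions of Eq. (17) and merge with v: 6 additions
    return [v + (r0 - r2), y1, v - (r0 + r1), v + (r1 + r2), y4]

print(dct6_5_fast([1.0, 2.0, 3.0, 4.0, 5.0]))
print(dct6_5_direct([1.0, 2.0, 3.0, 4.0, 5.0]))
```

Comparing the two functions on unit vectors is a convenient way to confirm each column of the transform matrix separately.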

In this data-flow graph, two modules are explicitly identified. The module in the upper part of the graph (green rectangle) corresponds to the factorization of the matrix $C_{5}^{(b)}$ with repeated entries. The resulting subgraph has a highly irregular structure due to the sparsity of the matrix $C_{5}^{(b)}$. The module in the lower part of the graph (blue rectangle) is related to the cyclic convolution factorization of the matrix $C_{3}^{(d)}$ and is also irregular.

3.5. Algorithm for 6-Point DCT-VI

Let us derive an algorithm for the six-point DCT-VI by expressing this transform as

Y_{6 \times 1} = C_{6} X_{6 \times 1},

(19)

where

Y_{6 \times 1} = [\begin{matrix} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \\ y_{5} \end{matrix}]

,

X_{6 \times 1} = [\begin{matrix} x_{0} \\ x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \\ x_{5} \end{matrix}]

, and

C_{6}

is the DCT-VI transform matrix defined as

C_{6} = [\begin{matrix} a_{6} & b_{6} & c_{6} & d_{6} & e_{6} & f_{6} \\ a_{6} & d_{6} & - f_{6} & - c_{6} & - b_{6} & - e_{6} \\ a_{6} & f_{6} & - b_{6} & - e_{6} & c_{6} & d_{6} \\ a_{6} & - e_{6} & - d_{6} & b_{6} & - f_{6} & - c_{6} \\ a_{6} & - c_{6} & e_{6} & f_{6} & - d_{6} & b_{6} \\ g_{6} & - a_{6} & a_{6} & - a_{6} & a_{6} & - a_{6} \end{matrix}]

with

a_{6} = \sqrt{\frac{2}{11}} \approx

0.4264,

b_{6} = \frac{2}{\sqrt{11}} \cos \frac{π}{11} \approx

0.5786,

c_{6} = \frac{2}{\sqrt{11}} \cos \frac{2 π}{11} \approx

0.5073,

d_{6} = \frac{2}{\sqrt{11}} \cos \frac{3 π}{11} \approx

0.3949,

e_{6} = \frac{2}{\sqrt{11}} \cos \frac{4 π}{11} \approx

0.2505,

f_{6} = \frac{2}{\sqrt{11}} \cos \frac{5 π}{11} \approx

0.0858, and

g_{6} = \frac{1}{\sqrt{11}} \approx

0.3015.
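The constants above can be cross-checked numerically. The following is a minimal sketch, assuming the closed form that the listed entries of the matrix are consistent with: entry (k, n) equals (2/√(2N − 1)) cos((2k + 1)nπ/(2N − 1)), with an extra factor 1/√2 in the first column and in the last row. This closed form is inferred from the matrix shown here rather than quoted from the paper's definition, and the helper name `dct_vi_matrix` is hypothetical.

```python
import math

def dct_vi_matrix(n_points):
    """Build the N-point matrix from the closed form assumed in the lead-in."""
    m = 2 * n_points - 1
    scale = 2.0 / math.sqrt(m)
    rows = []
    for k in range(n_points):
        row = []
        for n in range(n_points):
            w = scale * math.cos((2 * k + 1) * n * math.pi / m)
            if n == 0:
                w /= math.sqrt(2.0)      # first column carries 1/sqrt(2)
            if k == n_points - 1:
                w /= math.sqrt(2.0)      # last row carries 1/sqrt(2)
            row.append(w)
        rows.append(row)
    return rows

# Constants exactly as defined in the text for N = 6.
a6 = math.sqrt(2 / 11)
b6, c6, d6, e6, f6 = [2 / math.sqrt(11) * math.cos(j * math.pi / 11) for j in range(1, 6)]
g6 = 1 / math.sqrt(11)

C6 = dct_vi_matrix(6)
expected = {
    0: [a6, b6, c6, d6, e6, f6],
    1: [a6, d6, -f6, -c6, -b6, -e6],
    5: [g6, -a6, a6, -a6, a6, -a6],
}
for k, want in expected.items():
    assert all(abs(p - q) < 1e-12 for p, q in zip(C6[k], want))

# Print the constants rounded to four decimals for comparison with the text.
print([round(v, 4) for v in (a6, b6, c6, d6, e6, f6, g6)])
```

The same helper can be called with other sizes to check the constants of the seven-point and eight-point matrices.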

The 6-point DCT-VI algorithm is developed along the same lines as the 4-point DCT-VI algorithm. In this case, however, we first permute the columns and rows of the matrix so that both the 5-point cyclic convolution pattern and the fan-like pattern, in which the same value is added to multiple outputs, can be exploited. Repeated additions are then eliminated by replacing the fan-like structure with an adder tree. This technique is described in detail in this subsection.

First, we introduce the permutations

π_{1} = (\begin{matrix} 1 & 2 \\ 1 & 5 \end{matrix} \begin{matrix} 3 & 4 & 5 & 6 \\ 3 & 2 & 6 & 4 \end{matrix}) and π_{2} = (\begin{matrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 5 & 4 & 2 & 3 & 6 \end{matrix}) .

(20)

The order of the columns and rows of the matrix

C_{6}

is changed using

π_{1}

and

π_{2}

, respectively. In addition, the signs of the second and third columns of the permuted matrix are inverted. The corresponding transformation matrices are

P_{6}^{(0)} = [\begin{matrix} 1 \\ - 1 \\ - 1 \\ 1 \\ 1 \\ 1 \end{matrix}] and P_{6}^{(1)} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{matrix}] .

After applying these operations, we obtain the matrix

C_{6}^{(a)} = [\begin{matrix} a_{6} & - e_{6} & - c_{6} & b_{6} & f_{6} & d_{6} \\ a_{6} & d_{6} & - e_{6} & - c_{6} & b_{6} & f_{6} \\ a_{6} & f_{6} & d_{6} & - e_{6} & - c_{6} & b_{6} \\ a_{6} & b_{6} & f_{6} & d_{6} & - e_{6} & - c_{6} \\ a_{6} & - c_{6} & b_{6} & f_{6} & d_{6} & - e_{6} \\ g_{6} & - a_{6} & - a_{6} & - a_{6} & - a_{6} & - a_{6} \end{matrix}]

which is decomposed into the sum of two submatrices:

C_{6}^{(a)} = C_{6}^{(b)} + C_{6}^{(c)},

(21)

where

C_{6}^{(b)} = [\begin{matrix} a_{6} & 0 & 0 & 0 & 0 & 0 \\ a_{6} & 0 & 0 & 0 & 0 & 0 \\ a_{6} & 0 & 0 & 0 & 0 & 0 \\ a_{6} & 0 & 0 & 0 & 0 & 0 \\ a_{6} & 0 & 0 & 0 & 0 & 0 \\ g_{6} & - a_{6} & - a_{6} & - a_{6} & - a_{6} & - a_{6} \end{matrix}]

and

C_{6}^{(c)} = [\begin{matrix} 0 & - e_{6} & - c_{6} & b_{6} & f_{6} & d_{6} \\ 0 & d_{6} & - e_{6} & - c_{6} & b_{6} & f_{6} \\ 0 & f_{6} & d_{6} & - e_{6} & - c_{6} & b_{6} \\ 0 & b_{6} & f_{6} & d_{6} & - e_{6} & - c_{6} \\ 0 & - c_{6} & b_{6} & f_{6} & d_{6} & - e_{6} \\ 0 & 0 & 0 & 0 & 0 & 0 \end{matrix}]

.
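To illustrate why the first term of decomposition (21) is cheap to evaluate, the sketch below multiplies a vector by the first-column and last-row part of the permuted matrix in two ways: as a dense product, and with the single product a_6 x_0 shared by five outputs plus one chain of additions for the last output. The function names are hypothetical, and the code illustrates the shared-term idea only; it does not reproduce the paper's data-flow graph.

```python
import math

a6 = math.sqrt(2 / 11)
g6 = 1 / math.sqrt(11)

def apply_cb_dense(x):
    """Dense product with the first-column / last-row matrix, zeros written out."""
    cb = [[a6, 0, 0, 0, 0, 0] for _ in range(5)]
    cb.append([g6, -a6, -a6, -a6, -a6, -a6])
    return [sum(c * v for c, v in zip(row, x)) for row in cb]

def apply_cb_shared(x):
    """Same result with 3 multiplications and 5 additions."""
    shared = a6 * x[0]                          # computed once, reused by 5 outputs
    tail = x[1] + x[2] + x[3] + x[4] + x[5]     # 4 additions
    last = g6 * x[0] + (-a6) * tail             # 2 multiplications, 1 addition
    return [shared, shared, shared, shared, shared, last]

x = [1.0, -2.0, 0.5, 3.0, 4.0, -1.5]
dense = apply_cb_dense(x)
fast = apply_cb_shared(x)
assert all(abs(p - q) < 1e-12 for p, q in zip(dense, fast))
```

The three multipliers used here (g_6, a_6, and −a_6) are the only constants that this part of the matrix contributes; the remaining work comes from the circulant block.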

Next, we remove the zero rows and columns from the matrix

C_{6}^{(c)}

. The resulting matrix is given by

C_{5}^{(d)} = [\begin{matrix} - e_{6} & - c_{6} & b_{6} & f_{6} & d_{6} \\ d_{6} & - e_{6} & - c_{6} & b_{6} & f_{6} \\ f_{6} & d_{6} & - e_{6} & - c_{6} & b_{6} \\ b_{6} & f_{6} & d_{6} & - e_{6} & - c_{6} \\ - c_{6} & b_{6} & f_{6} & d_{6} & - e_{6} \end{matrix}],

which is the circular convolution matrix for N = 5:

H_{5} = [\begin{matrix} h_{0} & h_{4} & h_{3} & h_{2} & h_{1} \\ h_{1} & h_{0} & h_{4} & h_{3} & h_{2} \\ h_{2} & h_{1} & h_{0} & h_{4} & h_{3} \\ h_{3} & h_{2} & h_{1} & h_{0} & h_{4} \\ h_{4} & h_{3} & h_{2} & h_{1} & h_{0} \end{matrix}] .

Here

h_{0} = - e_{6}

,

h_{1} = d_{6}

,

h_{2} = f_{6}

,

h_{3} = b_{6}

, and

h_{4} = - c_{6}

.
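The identification of the 5 × 5 block with a circular convolution matrix can be checked directly. The sketch below builds the circulant matrix from the vector (h_0, ..., h_4) given above, compares it entry by entry with the block, and confirms that multiplying by it is the same as a cyclic convolution. The helper names are hypothetical.

```python
import math

def circulant(h):
    """Circulant matrix whose first column is h: entry (i, j) is h[(i - j) mod n]."""
    n = len(h)
    return [[h[(i - j) % n] for j in range(n)] for i in range(n)]

def cyclic_convolution(h, x):
    """Direct cyclic convolution y[i] = sum_j h[(i - j) mod n] * x[j]."""
    n = len(h)
    return [sum(h[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

b6, c6, d6, e6, f6 = [2 / math.sqrt(11) * math.cos(j * math.pi / 11) for j in range(1, 6)]

h = [-e6, d6, f6, b6, -c6]          # h_0 .. h_4 as listed in the text
block = [
    [-e6, -c6, b6, f6, d6],
    [d6, -e6, -c6, b6, f6],
    [f6, d6, -e6, -c6, b6],
    [b6, f6, d6, -e6, -c6],
    [-c6, b6, f6, d6, -e6],
]
assert circulant(h) == block

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_matrix = [sum(c * v for c, v in zip(row, x)) for row in block]
y_conv = cyclic_convolution(h, x)
assert all(abs(p - q) < 1e-12 for p, q in zip(y_matrix, y_conv))
```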

For this matrix, the following factorization is obtained based on [30]:

C_{5}^{(d)} = W_{5} W_{5 \times 7} W_{7 \times 10} D_{10} W_{10 \times 7} W_{7 \times 5} W_{5} P_{5},

(22)

where

D_{10} = diag (s_{2}^{(6)}, s_{3}^{(6)}, s_{4}^{(6)}, s_{5}^{(6)}, s_{6}^{(6)}, s_{7}^{(6)}, s_{8}^{(6)}, s_{9}^{(6)}, s_{10}^{(6)}, s_{11}^{(6)})

,

s_{2}^{(6)} = (- e_{6} + d_{6} + f_{6} + b_{6} - c_{6}) / 5,

s_{3}^{(6)} = - e_{6} + f_{6} - b_{6} + c_{6}, s_{4}^{(6)} = - e_{6} - c_{6} - b_{6} - d_{6},

s_{5}^{(6)} = e_{6} + b_{6},

s_{6}^{(6)} = - e_{6} + d_{6} - f_{6} + c_{6},

s_{7}^{(6)} = - e_{6} + b_{6} - f_{6} - d_{6},

s_{8}^{(6)} = e_{6} + f_{6},

s_{9}^{(6)} = e_{6} - c_{6},

s_{10}^{(6)} = e_{6} + d_{6},

s_{11}^{(6)} = - e_{6} - s_{0}^{(6)},

W_{5 \times 7} = 1 \oplus T_{2 \times 3} \otimes I_{2}

,

T_{2 \times 3} = [\begin{matrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{matrix}]

,

W_{7 \times 10} = 1 \oplus T_{2 \times 3} \oplus T_{2 \times 3} \oplus T_{2 \times 3}

,

W_{10 \times 7} = 1 \oplus T_{3 \times 2} \oplus T_{3 \times 2} \oplus T_{3 \times 2}

,

T_{3 \times 2} = [\begin{matrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{matrix}]

,

W_{7 \times 5} = 1 \oplus T_{3 \times 2} \otimes I_{2}

,

W_{5}

=

[\begin{matrix} 1 & 1 & 1 & 1 & 1 \\ 1 & - 1 \\ 1 & - 1 \\ 1 & - 1 \\ 1 & - 1 \end{matrix}]

, and

P_{5}

=

[\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{matrix}]

.
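The direct-sum and Kronecker-product notation used for the sparse factors can be made concrete with a few lines of code. The sketch below assumes that the Kronecker product binds more tightly than the direct sum, so that the expression 1 ⊕ T ⊗ I_2 is read as 1 ⊕ (T ⊗ I_2); this reading gives the sizes stated in the subscripts. The helper names are hypothetical.

```python
def direct_sum(*blocks):
    """Block-diagonal matrix built from the given blocks."""
    rows = sum(len(b) for b in blocks)
    cols = sum(len(b[0]) for b in blocks)
    out = [[0] * cols for _ in range(rows)]
    r = c = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                out[r + i][c + j] = v
        r += len(b)
        c += len(b[0])
    return out

def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [[x * y for x in ra for y in rb] for ra in a for rb in b]

def shape(m):
    return len(m), len(m[0])

ONE = [[1]]
I2 = [[1, 0], [0, 1]]
T23 = [[1, 0, 1], [0, 1, 1]]
T32 = [[1, 0], [0, 1], [1, 1]]

W5x7 = direct_sum(ONE, kron(T23, I2))
W7x10 = direct_sum(ONE, T23, T23, T23)
W10x7 = direct_sum(ONE, T32, T32, T32)
W7x5 = direct_sum(ONE, kron(T32, I2))

for name, m in (("W5x7", W5x7), ("W7x10", W7x10), ("W10x7", W10x7), ("W7x5", W7x5)):
    print(name, shape(m))
```

Because T_{3×2} is the transpose of T_{2×3}, the post-multiplication factors built this way are the transposes of the corresponding pre-multiplication factors.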

To factorize the initial 6-point DCT-VI matrix, we add to the factorization (22) matrices that take into account the entries of the matrix

C_{6}^{(b)}

:

W_{7 \times 6} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 & 1 & 1 & 1 & 1 \end{matrix}] and W_{6 \times 8} = [\begin{matrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{matrix}] .

Then, the factorization (22) is transformed into the following factorization of the initial 6-point DCT-VI matrix:

Y_{6 \times 1} = P_{6}^{(1)} W_{6 \times 8} W_{8} W_{8 \times 10} W_{10 \times 13} D_{13} W_{13 \times 9} W_{9 \times 7} W_{7} W_{7 \times 6} P_{6}^{(3)} P_{6}^{(0)} X_{6 \times 1},

(23)

where

D_{13} = s_{0}^{(6)} \oplus s_{1}^{(6)} \oplus D_{10} \oplus s_{12}^{(6)}, s_{0}^{(6)} = g_{6}, s_{1}^{(6)} = a_{6}, s_{12}^{(6)} = - a_{6}, W_{7} = 1 \oplus W_{5} \oplus 1, W_{8 \times 10} = I_{3} \oplus (T_{2 \times 3} \otimes I_{2}) \oplus 1, W_{9 \times 7} = I_{2} \oplus (T_{3 \times 2} \otimes I_{2}) \oplus 1, W_{13 \times 9} = [\begin{matrix} 1 & 0 \\ 1 & 0 \\ 0 & 1 \end{matrix}] \oplus T_{3 \times 2} \oplus T_{3 \times 2} \oplus T_{3 \times 2} \oplus 1, W_{10 \times 13} = I_{3} \oplus T_{2 \times 3} \oplus T_{2 \times 3} \oplus T_{2 \times 3} \oplus 1, W_{8} = I_{2} \oplus W_{5} \oplus 1, P_{6}^{(3)} = (1 \oplus P_{5}) .

The data-flow graph corresponding to the factorization (23) is shown in Figure 8. In this data-flow graph, the additions in the fan-like structures marked by the green and blue rectangles are repeated and, therefore, redundant. We remove the left fan-like structure in the green rectangle and the right fan-like structure in the blue rectangle. After this, the final factorization is obtained.

Based on factorization (22), the matrices

W_{6} = 1 \oplus W_{5}

,

W_{6 \times 8} = I_{2} \oplus (T_{2 \times 3} \otimes I_{2})

, and

W_{8 \times 6} = I_{2} \oplus (T_{3 \times 2} \otimes I_{2})

are introduced. Subsequently, the factorization of the six-point DCT-VI matrix can be expressed as follows:

Y_{6 \times 1} = P_{6}^{(2)} W_{6} W_{6 \times 8} W_{8 \times 13} D_{13} W_{13 \times 8} W_{8 \times 6} W_{6} P_{6}^{(3)} P_{6}^{(0)} X_{6 \times 1},

(24)

where

P_{6}^{(2)} = P_{6}^{(1)} [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{matrix}] = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{matrix}],

P_{6}^{(0)} = [\begin{matrix} 1 \\ - 1 \\ - 1 \\ 1 \\ 1 \\ 1 \end{matrix}], W_{13 \times 8} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 & 1 \\ 1 \\ 1 \\ 1 & 1 \\ 1 \\ 1 \\ 1 & 1 \\ 1 \end{matrix}],

W_{8 \times 13} = [\begin{matrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{matrix}]

Figure 9 illustrates the six-point DCT-VI algorithm. Compared to the direct matrix–vector implementation, the number of multiplications is reduced from 36 to 13, while the number of additions increases from 30 to 33.

The data-flow graph presented in Figure 9 includes two permutation modules that rearrange the order of data samples without changing their values. These modules are placed near the input and output vertices of the graph. The graph also contains repeated computational modules. In particular, subgraphs consisting of five butterfly modules are located at the levels adjacent to the permutation modules, and further repeated modules appear on both the left and right sides of the scaling-factor line. With a few exceptions, the graph can be divided into stages whose topology is symmetric with respect to the scaling-factor line.

3.6. Algorithm for Seven-Point DCT-VI

Let us construct the algorithm for the 7-point DCT-VI. This transform can be represented as follows:

Y_{7 \times 1} = C_{7} X_{7 \times 1},

(25)

where

Y_{7 \times 1} = [\begin{matrix} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \\ y_{5} \\ y_{6} \end{matrix}]

,

X_{7 \times 1} = [\begin{matrix} x_{0} \\ x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \\ x_{5} \\ x_{6} \end{matrix}]

, and

C_{7} = [\begin{matrix} a_{7} & b_{7} & c_{7} & d_{7} & e_{7} & f_{7} & g_{7} \\ a_{7} & d_{7} & g_{7} & - e_{7} & - b_{7} & - c_{7} & - f_{7} \\ a_{7} & f_{7} & - d_{7} & - c_{7} & g_{7} & b_{7} & e_{7} \\ a_{7} & - g_{7} & - b_{7} & f_{7} & c_{7} & - e_{7} & - d_{7} \\ a_{7} & - e_{7} & - f_{7} & b_{7} & - d_{7} & - g_{7} & c_{7} \\ a_{7} & - c_{7} & e_{7} & - g_{7} & - f_{7} & d_{7} & - b_{7} \\ h_{7} & - a_{7} & a_{7} & - a_{7} & a_{7} & - a_{7} & a_{7} \end{matrix}]

with

a_{7} = \sqrt{\frac{2}{13}} \approx

0.3922,

b_{7} = \frac{2}{\sqrt{13}} \cos \frac{π}{13} \approx

0.5386,

c_{7} = \frac{2}{\sqrt{13}} \cos \frac{2 π}{13} \approx

0.4912,

d_{7} = \frac{2}{\sqrt{13}} \cos \frac{3 π}{13} \approx

0.4152,

e_{7} = \frac{2}{\sqrt{13}} \cos \frac{4 π}{13} \approx

0.3151,

f_{7} = \frac{2}{\sqrt{13}} \cos \frac{5 π}{13} \approx

0.1967,

g_{7} = \frac{2}{\sqrt{13}} \cos \frac{6 π}{13} \approx

0.0669,

h_{7} = \frac{1}{\sqrt{13}} \approx 0.2774

.

The development of the 7-point DCT-VI algorithm follows the same approach as that used for the 6-point DCT-VI algorithm. Specifically, we define the permutations

π_{3} = (\begin{matrix} 1 & 2 \\ 1 & 4 \end{matrix} \begin{matrix} 3 & 4 & 5 & 6 & 7 \\ 2 & 3 & 5 & 6 & 7 \end{matrix}) and π_{4} = (\begin{matrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 1 & 2 & 3 & 5 & 6 & 4 & 7 \end{matrix}),

and apply

π_{3}

to reorder the rows of the matrix

C_{7}

and

π_{4}

to reorder its columns. In addition, the signs of the third, fourth, and seventh columns are changed. The corresponding permutation matrices are as follows:

P_{7}^{(1)} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{matrix}] and P_{7}^{(0)} = [\begin{matrix} 1 \\ 1 \\ - 1 \\ - 1 \\ 1 \\ 1 \\ - 1 \end{matrix}] .

Then, the resulting matrix

C_{7}^{(a)} = [\begin{matrix} a_{7} & b_{7} & - c_{7} & - e_{7} & f_{7} & d_{7} & - g_{7} \\ a_{7} & - g_{7} & b_{7} & - c_{7} & - e_{7} & f_{7} & d_{7} \\ a_{7} & d_{7} & - g_{7} & b_{7} & - c_{7} & - e_{7} & f_{7} \\ a_{7} & f_{7} & d_{7} & - g_{7} & b_{7} & - c_{7} & - e_{7} \\ a_{7} & - e_{7} & f_{7} & d_{7} & - g_{7} & b_{7} & - c_{7} \\ a_{7} & - c_{7} & - e_{7} & f_{7} & d_{7} & - g_{7} & b_{7} \\ h_{7} & - a_{7} & - a_{7} & - a_{7} & - a_{7} & - a_{7} & - a_{7} \end{matrix}]

is decomposed into the sum of two matrices:

C_{7}^{(a)} = C_{7}^{(b)} + C_{7}^{(c)},

(26)

where

C_{7}^{(b)} = [\begin{matrix} a_{7} & 0 & 0 & 0 & 0 & 0 & 0 \\ a_{7} & 0 & 0 & 0 & 0 & 0 & 0 \\ a_{7} & 0 & 0 & 0 & 0 & 0 & 0 \\ a_{7} & 0 & 0 & 0 & 0 & 0 & 0 \\ a_{7} & 0 & 0 & 0 & 0 & 0 & 0 \\ a_{7} & 0 & 0 & 0 & 0 & 0 & 0 \\ h_{7} & - a_{7} & - a_{7} & - a_{7} & - a_{7} & - a_{7} & - a_{7} \end{matrix}], C_{7}^{(c)} = [\begin{matrix} 0 & b_{7} & - c_{7} & - e_{7} & f_{7} & d_{7} & - g_{7} \\ 0 & - g_{7} & b_{7} & - c_{7} & - e_{7} & f_{7} & d_{7} \\ 0 & d_{7} & - g_{7} & b_{7} & - c_{7} & - e_{7} & f_{7} \\ 0 & f_{7} & d_{7} & - g_{7} & b_{7} & - c_{7} & - e_{7} \\ 0 & - e_{7} & f_{7} & d_{7} & - g_{7} & b_{7} & - c_{7} \\ 0 & - c_{7} & - e_{7} & f_{7} & d_{7} & - g_{7} & b_{7} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{matrix}] .

Next, we remove the rows and columns containing only zero entries from

C_{7}^{(c)}

to obtain the matrix

C_{6}^{(d)} = [\begin{matrix} b_{7} & - c_{7} & - e_{7} & f_{7} & d_{7} & - g_{7} \\ - g_{7} & b_{7} & - c_{7} & - e_{7} & f_{7} & d_{7} \\ d_{7} & - g_{7} & b_{7} & - c_{7} & - e_{7} & f_{7} \\ f_{7} & d_{7} & - g_{7} & b_{7} & - c_{7} & - e_{7} \\ - e_{7} & f_{7} & d_{7} & - g_{7} & b_{7} & - c_{7} \\ - c_{7} & - e_{7} & f_{7} & d_{7} & - g_{7} & b_{7} \end{matrix}] .

The matrix

C_{6}^{(d)}

is a circular convolution matrix, which can be presented as [30]

H_{6} = [\begin{matrix} h_{0} & h_{5} & h_{4} & h_{3} & h_{2} & h_{1} \\ h_{1} & h_{0} & h_{5} & h_{4} & h_{3} & h_{2} \\ h_{2} & h_{1} & h_{0} & h_{5} & h_{4} & h_{3} \\ h_{3} & h_{2} & h_{1} & h_{0} & h_{5} & h_{4} \\ h_{4} & h_{3} & h_{2} & h_{1} & h_{0} & h_{5} \\ h_{5} & h_{4} & h_{3} & h_{2} & h_{1} & h_{0} \end{matrix}],

where

h_{0} = b_{7}

,

h_{1} = - g_{7}

,

h_{2} = d_{7}

,

h_{3} = f_{7}

,

h_{4} = - e_{7}

, and

h_{5} = - c_{7}

.

Using the factorization of the matrix

H_{6}

from [30], the following expression is obtained:

H_{6} = W_{6}^{(a)} W_{6}^{(1)} W_{6} W_{6 \times 8} \times diag (s_{2}^{(7)}, s_{3}^{(7)}, s_{4}^{(7)}, s_{5}^{(7)}, s_{6}^{(7)}, s_{7}^{(7)}, s_{8}^{(7)}, s_{9}^{(7)}) W_{8 \times 6} W_{6} W_{6}^{(0)} W_{6}^{(a)},

(27)

where

W_{6 \times 8} = W_{3 \times 4} \oplus W_{3 \times 4}

,

W_{8 \times 6} = W_{4 \times 3} \oplus W_{4 \times 3}

,

W_{6}^{(a)} = H_{2} \otimes I_{3}

,

W_{6} = [\begin{matrix} 1 & 1 & 1 \\ 1 & - 1 & 0 \\ 1 & 0 & - 1 \end{matrix}] \oplus [\begin{matrix} 1 & 1 & 1 \\ 1 & - 1 & 0 \\ 1 & 0 & - 1 \end{matrix}]

,

W_{3 \times 4} = [\begin{matrix} 1 \\ 1 & 1 \\ 1 & 1 \end{matrix}]

,

W_{4 \times 3} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 & 1 \end{matrix}]

,

W_{6}^{(0)} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ - 1 \end{matrix}]

,

W_{6}^{(1)} = I_{4} \oplus (- 1) \oplus 1

,

s_{2}^{(7)} = (b_{7} - g_{7} + d_{7} + f_{7} - e_{7} - c_{7}) / 6

,

s_{3}^{(7)} = (d_{7} - c_{7} - b_{7} - f_{7}) / 2

,

s_{4}^{(7)} = (- g_{7} - e_{7} - b_{7} - f_{7}) / 2

,

s_{5}^{(7)} = (2 b_{7} + g_{7} - d_{7} + 2 f_{7} + e_{7} + c_{7}) / 6

,

s_{6}^{(7)} = (b_{7} + g_{7} + d_{7} - f_{7} - e_{7} + c_{7}) / 6

,

s_{7}^{(7)} = (d_{7} + c_{7} - b_{7} + f_{7}) / 2

,

s_{8}^{(7)} = (- e_{7} + g_{7} - b_{7} + f_{7}) / 2

,

s_{9}^{(7)} = (2 b_{7} - g_{7} - d_{7} - 2 f_{7} + e_{7} - c_{7}) / 6

.

To reduce the number of arithmetic operations for the 7-point DCT-VI, we also consider the entries in the first column and the last row of the matrix

C_{7}^{(b)}

, which differ only in a sign. Exploiting this property leads to the following factorization of the matrix

C_{7}

for the 7-point DCT-VI:

Y_{7 \times 1} = P_{7}^{(1)} W_{7 \times 8} W_{8}^{(1)} W_{8} W_{8 \times 11} D_{11} W_{11 \times 7} W_{7}^{(0)} W_{7} P_{7}^{(0)} X_{7 \times 1},

(28)

where

D_{11} = diag (s_{0}^{(7)}, s_{1}^{(7)}, s_{2}^{(7)}, s_{3}^{(7)}, s_{4}^{(7)}, s_{5}^{(7)}, s_{6}^{(7)}, s_{7}^{(7)}, s_{8}^{(7)}, s_{9}^{(7)}, s_{10}^{(7)})

,

s_{0}^{(7)} = a_{7}

,

s_{1}^{(7)} = h_{7}

,

s_{10}^{(7)} = - a_{7}

,

W_{8 \times 11} = [\begin{matrix} 1 \\ 1 & 1 \end{matrix}] \oplus T_{2 \times 3} \oplus 1 \oplus T_{2 \times 3} \oplus 1

,

W_{7} = 1 \oplus W_{6}^{(a)}

,

W_{8} = 1 \oplus W_{6} \oplus 1

,

W_{8}^{(1)} = 1 \oplus W_{6}^{(1)} W_{6} \oplus 1

,

W_{7}^{(0)} = 1 \oplus [\begin{matrix} 1 & 1 & 1 \\ 1 & - 1 \\ 1 & - 1 \end{matrix}] \oplus [\begin{matrix} 1 & - 1 & 1 \\ 1 & - 1 \\ 1 & 1 \end{matrix}]

,

W_{7 \times 8} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 & 1 \end{matrix}]

,

W_{11 \times 7} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 & 1 \\ 1 \\ 1 \\ 1 \\ 1 & 1 \\ 1 \end{matrix}]

.

Consequently, a fast algorithm for the 7-point DCT-VI has been developed; its data-flow graph is shown in Figure 10. This algorithm reduces the number of multiplications from 49 to 11 and the number of additions from 42 to 36.

The data-flow graph shown in Figure 10 includes permutation modules at the input and output that perform index reordering of data samples, enabling regular computational structures in fast transform implementations without introducing additional arithmetic operations. As a result, the graph also contains repeated modules on both the left and right sides of the scaling-factor line, in particular butterfly modules near the input and output permutations. To add the same value to all outputs, a fan-like structure is included before the output permutation.

3.7. Algorithm for Eight-Point DCT-VI

To design the algorithm for the eight-point DCT-VI, the transform can be expressed as follows:

Y_{8 \times 1} = C_{8} X_{8 \times 1},

(29)

where

Y_{8 \times 1} = [\begin{matrix} y_{0} \\ y_{1} \\ y_{2} \\ y_{3} \\ y_{4} \\ y_{5} \\ y_{6} \\ y_{7} \end{matrix}]

,

X_{8 \times 1} = [\begin{matrix} x_{0} \\ x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \\ x_{5} \\ x_{6} \\ x_{7} \end{matrix}]

, and

C_{8} = [\begin{matrix} a_{8} & b_{8} & c_{8} & d_{8} & e_{8} & f_{8} & g_{8} & h_{8} \\ a_{8} & d_{8} & g_{8} & - g_{8} & - d_{8} & - p_{8} & - d_{8} & - g_{8} \\ a_{8} & f_{8} & - f_{8} & - p_{8} & - f_{8} & f_{8} & p_{8} & f_{8} \\ a_{8} & h_{8} & - b_{8} & - g_{8} & c_{8} & f_{8} & - d_{8} & - e_{8} \\ a_{8} & - g_{8} & - d_{8} & d_{8} & g_{8} & - p_{8} & g_{8} & d_{8} \\ a_{8} & - e_{8} & - h_{8} & d_{8} & - b_{8} & f_{8} & g_{8} & - c_{8} \\ a_{8} & - c_{8} & e_{8} & - g_{8} & - h_{8} & f_{8} & - d_{8} & b_{8} \\ f_{8} & - a_{8} & a_{8} & - a_{8} & a_{8} & - a_{8} & a_{8} & - a_{8} \end{matrix}]

with

a_{8} = \sqrt{\frac{2}{15}} \approx

0.3651,

b_{8} = \frac{2}{\sqrt{15}} \cos \frac{π}{15} \approx

0.5051,

c_{8} = \frac{2}{\sqrt{15}} \cos \frac{2 π}{15} \approx

0.4718,

d_{8} = \frac{2}{\sqrt{15}} \cos \frac{π}{5} \approx

0.4178,

e_{8} = \frac{2}{\sqrt{15}} \cos \frac{4 π}{15} \approx

0.3455,

f_{8} = \frac{2}{\sqrt{15}} \cos \frac{π}{3} \approx

0.2582,

g_{8} = \frac{2}{\sqrt{15}} \cos \frac{2 π}{5} \approx

0.1596,

h_{8} = \frac{2}{\sqrt{15}} \cos \frac{7 π}{15} \approx

0.0540,

p_{8} = \frac{2}{\sqrt{15}} \approx

0.5164.

To design the 8-point DCT-VI algorithm, we first permute the rows and columns of the initial DCT-VI matrix and invert the signs of certain rows or columns. As a result, we obtain a matrix in which submatrices correspond to patterns identified in [29,30]. These patterns are then extracted, and their factorizations, as presented in [29,30], are applied. Finally, the factorizations and data-flow graphs of the individual submatrices are merged to form the factorization and data-flow graph of the original 8-point DCT-VI matrix.

To implement this approach, we reorder the columns and rows of the matrix

C_{8}

using the permutations

π_{5} = (\begin{matrix} 1 & 2 \\ 1 & 6 \end{matrix} \begin{matrix} 3 & 4 & 5 & 6 & 7 & 8 \\ 4 & 7 & 2 & 3 & 5 & 8 \end{matrix}) and π_{6} = (\begin{matrix} 1 & 2 \\ 1 & 4 \end{matrix} \begin{matrix} 3 & 4 & 5 & 6 & 7 & 8 \\ 6 & 7 & 2 & 5 & 3 & 8 \end{matrix})

.

Then, the signs of the sixth and seventh columns of

C_{8}

are inverted. The corresponding permutation matrices

P_{8}^{(0)}

and

P_{8}^{(1)}

are expressed as follows:

P_{8}^{(0)} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ - 1 \\ - 1 \\ 1 \end{matrix}] and P_{8}^{(1)} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{matrix}] .

Next, the resulting matrix

C_{8}^{(a)}

is expressed as the sum of two matrices:

C_{8}^{(a)} = C_{8}^{(b)} + C_{8}^{(c)},

(30)

where

C_{8}^{(a)} = [\begin{matrix} a_{8} & f_{8} & d_{8} & g_{8} & b_{8} & - c_{8} & - e_{8} & h_{8} \\ a_{8} & f_{8} & - g_{8} & - d_{8} & h_{8} & b_{8} & - c_{8} & - e_{8} \\ a_{8} & f_{8} & d_{8} & g_{8} & - e_{8} & h_{8} & b_{8} & - c_{8} \\ a_{8} & f_{8} & - g_{8} & - d_{8} & - c_{8} & - e_{8} & h_{8} & b_{8} \\ a_{8} & - p_{8} & - g_{8} & - d_{8} & d_{8} & - g_{8} & d_{8} & - g_{8} \\ a_{8} & - p_{8} & d_{8} & g_{8} & - g_{8} & d_{8} & - g_{8} & d_{8} \\ a_{8} & f_{8} & - p_{8} & p_{8} & f_{8} & f_{8} & f_{8} & f_{8} \\ f_{8} & - a_{8} & - a_{8} & a_{8} & - a_{8} & - a_{8} & - a_{8} & - a_{8} \end{matrix}],

C_{8}^{(b)} = [\begin{matrix} a_{8} & f_{8} & d_{8} & g_{8} & 0 & 0 & 0 & 0 \\ a_{8} & f_{8} & - g_{8} & - d_{8} & 0 & 0 & 0 & 0 \\ a_{8} & f_{8} & d_{8} & g_{8} & 0 & 0 & 0 & 0 \\ a_{8} & f_{8} & - g_{8} & - d_{8} & 0 & 0 & 0 & 0 \\ a_{8} & - p_{8} & - g_{8} & - d_{8} & d_{8} & - g_{8} & d_{8} & - g_{8} \\ a_{8} & - p_{8} & d_{8} & g_{8} & - g_{8} & d_{8} & - g_{8} & d_{8} \\ a_{8} & f_{8} & - p_{8} & p_{8} & f_{8} & f_{8} & f_{8} & f_{8} \\ f_{8} & - a_{8} & - a_{8} & a_{8} & - a_{8} & - a_{8} & - a_{8} & - a_{8} \end{matrix}],

C_{8}^{(c)} = [\begin{matrix} 0 & 0 & 0 & 0 & b_{8} & - c_{8} & - e_{8} & h_{8} \\ 0 & 0 & 0 & 0 & h_{8} & b_{8} & - c_{8} & - e_{8} \\ 0 & 0 & 0 & 0 & - e_{8} & h_{8} & b_{8} & - c_{8} \\ 0 & 0 & 0 & 0 & - c_{8} & - e_{8} & h_{8} & b_{8} \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \end{matrix}] .

The submatrix

C_{4}^{(d)} = [\begin{matrix} b_{8} & - c_{8} & - e_{8} & h_{8} \\ h_{8} & b_{8} & - c_{8} & - e_{8} \\ - e_{8} & h_{8} & b_{8} & - c_{8} \\ - c_{8} & - e_{8} & h_{8} & b_{8} \end{matrix}]

of the matrix

C_{8}^{(c)}

is a circular convolution matrix [30] for N = 4, which can be represented as

H_{4} = [\begin{matrix} h_{0} & h_{3} & h_{2} & h_{1} \\ h_{1} & h_{0} & h_{3} & h_{2} \\ h_{2} & h_{1} & h_{0} & h_{3} \\ h_{3} & h_{2} & h_{1} & h_{0} \end{matrix}]

with entries

h_{0} = b_{8}

,

h_{1} = h_{8}

,

h_{2} = - e_{8}

, and

h_{3} = - c_{8}

.

Using the entries of

H_{4}

, we define the vector

A_{5 \times 1}

of scaling factors to factorize the circular convolution matrix

C_{4}^{(d)}

:

A_{5 \times 1} = 1 / 4 \times diag (1, 1, - 2, - 2, 2) \times A_{5 \times 4}^{(0)} A_{4} \times {[h_{0}, h_{1}, h_{2}, h_{3}]}^{T},

(31)

where

A_{5 \times 4}^{(0)} = H_{2} \oplus [\begin{matrix} 1 & 1 \\ 1 & - 1 \\ 1 \end{matrix}]

and

A_{4} = H_{2} \otimes I_{2}

.

Then, the matrix

C_{4}^{(d)}

is factorized as follows:

C_{4}^{(d)} = A_{4} A_{4 \times 5} \times diag (A_{5 \times 1}) \times A_{5 \times 4} P_{4} A_{4},

(32)

where

A_{5 \times 1} = (s_{9}^{(8)}, s_{10}^{(8)}, s_{11}^{(8)}, s_{12}^{(8)}, s_{13}^{(8)})

with

s_{9}^{(8)} = (b_{8} + h_{8} - e_{8} - c_{8}) / 4, s_{10}^{(8)} = (b_{8} - h_{8} - e_{8} + c_{8}) / 4, s_{11}^{(8)} = (- b_{8} - h_{8} - e_{8} - c_{8}) / 2, s_{12}^{(8)} = (- b_{8} + h_{8} - e_{8} + c_{8}) / 2, s_{13}^{(8)} = (b_{8} + e_{8}) / 2 .

The matrices

A_{5 \times 4}

and

A_{4 \times 5}

are constructed as

A_{5 \times 4} = H_{2} \oplus T_{3 \times 2}

and

A_{4 \times 5} = H_{2} \oplus T_{2 \times 3}

, respectively, and

P_{4}

is

P_{4} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \end{matrix}] .
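Factorization (32) can be traced stage by stage on a concrete input. The sketch below computes the five scaling factors directly from (31) and applies the stages of (32) from right to left. Two details are assumptions made here because the printed matrices do not show their zero entries: the third row of the lower block of A_{5×4}^{(0)} is taken as (1, 0), and P_4 is taken to swap the last two entries of its input. With these choices the result matches the direct circulant product; other conventions are possible. The function names are hypothetical.

```python
import math

def scaling_factors(h):
    """Five multipliers obtained by expanding (31) for h = (h0, h1, h2, h3)."""
    h0, h1, h2, h3 = h
    return [
        (h0 + h1 + h2 + h3) / 4,
        (h0 - h1 + h2 - h3) / 4,
        -(h0 + h1 - h2 - h3) / 2,
        -(h0 - h1 - h2 + h3) / 2,
        (h0 - h2) / 2,
    ]

def cyclic4_five_mults(h, x):
    """4-point cyclic convolution using 5 multiplications, following (32)."""
    s = scaling_factors(h)
    # A_4 = H_2 (x) I_2 applied to the input
    u0, u1, u2, u3 = x[0] + x[2], x[1] + x[3], x[0] - x[2], x[1] - x[3]
    # P_4: assumed here to swap the last two entries
    u2, u3 = u3, u2
    # A_{5x4} = H_2 (+) T_{3x2}
    p = [u0 + u1, u0 - u1, u2, u3, u2 + u3]
    # diagonal stage: the only multiplications
    v = [si * pi for si, pi in zip(s, p)]
    # A_{4x5} = H_2 (+) T_{2x3}
    w0, w1, w2, w3 = v[0] + v[1], v[0] - v[1], v[2] + v[4], v[3] + v[4]
    # A_4 = H_2 (x) I_2 applied to the result
    return [w0 + w2, w1 + w3, w0 - w2, w1 - w3]

def cyclic4_direct(h, x):
    """Reference: y[i] = sum_j h[(i - j) mod 4] * x[j]."""
    return [sum(h[(i - j) % 4] * x[j] for j in range(4)) for i in range(4)]

k = 2 / math.sqrt(15)
b8, c8, e8, h8 = (k * math.cos(t * math.pi / 15) for t in (1, 2, 4, 7))
h = [b8, h8, -e8, -c8]               # h_0 .. h_3 as listed in the text
x = [0.3, -1.2, 2.5, 0.7]
fast = cyclic4_five_mults(h, x)
ref = cyclic4_direct(h, x)
assert all(abs(p - q) < 1e-12 for p, q in zip(fast, ref))
```

The additions in this sketch use only sums and differences of pairs, so the arithmetic cost of the block is 5 multiplications and 15 additions as written.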

Further, the submatrix

B_{6} = [\begin{matrix} d_{8} & g_{8} & b_{8} & - c_{8} & - e_{8} & h_{8} \\ - g_{8} & - d_{8} & h_{8} & b_{8} & - c_{8} & - e_{8} \\ d_{8} & g_{8} & - e_{8} & h_{8} & b_{8} & - c_{8} \\ - g_{8} & - d_{8} & - c_{8} & - e_{8} & h_{8} & b_{8} \\ - g_{8} & - d_{8} & d_{8} & - g_{8} & d_{8} & - g_{8} \\ d_{8} & g_{8} & - g_{8} & d_{8} & - g_{8} & d_{8} \end{matrix}]

of the matrix

C_{8}^{(a)}

is factorized by decomposing its 2 × 2 submatrices. It can be observed that the submatrix

A_{2} = [\begin{matrix} d_{8} & g_{8} \\ {- g}_{8} & - d_{8} \end{matrix}]

exhibits structural similarity to the template

[\begin{matrix} a & b \\ - b & - a \end{matrix}]

. The submatrix

B_{2} = [\begin{matrix} d_{8} & - g_{8} \\ - g_{8} & d_{8} \end{matrix}]

is similar to the template

[\begin{matrix} a & b \\ b & a \end{matrix}]

. Then, the submatrices

A_{2}

and

B_{2}

are decomposed as

A_{2} = {\bar{I}}_{2} H_{2} [(d_{8} + g_{8}) / 2 \oplus (d_{8} - g_{8}) / 2] H_{2}, B_{2} = H_{2} [(d_{8} - g_{8}) / 2 \oplus (d_{8} + g_{8}) / 2] H_{2},

(33)

where

{\bar{I}}_{2} = [\begin{matrix} 1 & 0 \\ 0 & - 1 \end{matrix}]

. Using expressions (32) and (33), the factorization of the matrix

B_{6}

is obtained:

B_{6} = W_{6 \times 8} W_{8}^{(1)} W_{8 \times 9} \times diag (s_{5}^{(8)}, s_{6}^{(8)}, \dots {, s}_{13}^{(8)}) \times W_{9 \times 8} W_{8}^{(0)} W_{8 \times 6},

(34)

where

s_{5}^{(8)} = (d_{8} + g_{8}) / 2, s_{6}^{(8)} = (d_{8} - g_{8}) / 2, s_{7}^{(8)} = (d_{8} - g_{8}) / 2, s_{8}^{(8)} = (d_{8} + g_{8}) / 2,

W_{8}^{(1)} = {\bar{I}}_{2} H_{2} \oplus H_{2} \oplus A_{4}, W_{8}^{(0)} = H_{2} \oplus H_{2} \oplus P_{4} A_{4}, W_{9 \times 8} = I_{4} \oplus H_{2} \oplus T_{3 \times 2}, W_{8 \times 9} = I_{4} \oplus H_{2} \oplus T_{2 \times 3},

W_{6 \times 8} = [\begin{matrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \end{matrix}], W_{8 \times 6} = [\begin{matrix} 1 \\ 1 \\ 1 & 1 \\ 1 & 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{matrix}] .
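The two 2 × 2 decompositions in (33) are easy to confirm numerically. The sketch below multiplies out both right-hand sides with H_2 = [[1, 1], [1, −1]] and compares them with the submatrices A_2 and B_2; the helper names are hypothetical.

```python
import math

def matmul(a, b):
    """Plain matrix product for small matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def diag2(p, q):
    return [[p, 0.0], [0.0, q]]

H2 = [[1, 1], [1, -1]]
I2_BAR = [[1, 0], [0, -1]]

d8 = 2 / math.sqrt(15) * math.cos(math.pi / 5)
g8 = 2 / math.sqrt(15) * math.cos(2 * math.pi / 5)
s_plus = (d8 + g8) / 2     # value of s_5 and s_8 in (34)
s_minus = (d8 - g8) / 2    # value of s_6 and s_7 in (34)

A2 = matmul(I2_BAR, matmul(H2, matmul(diag2(s_plus, s_minus), H2)))
B2 = matmul(H2, matmul(diag2(s_minus, s_plus), H2))

A2_expected = [[d8, g8], [-g8, -d8]]
B2_expected = [[d8, -g8], [-g8, d8]]
for got, want in ((A2, A2_expected), (B2, B2_expected)):
    assert all(abs(got[i][j] - want[i][j]) < 1e-12 for i in range(2) for j in range(2))
```

Each 2 × 2 block therefore costs two multiplications instead of four, at the price of two extra butterflies.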

As a result, the matrix of the eight-point DCT-VI is factorized as

Y_{8 \times 1} = P_{8}^{(1)} W_{8 \times 16} W_{16 \times 14} W_{14 \times 16} D_{16} W_{16 \times 8} W_{8} W_{8}^{(2)} P_{8}^{(0)} X_{8 \times 1},

(35)

where

D_{16} = diag (s_{0}^{(8)}, s_{1}^{(8)}, \dots {, s}_{15}^{(8)}), s_{0}^{(8)} = f_{8}, s_{1}^{(8)} = a_{8}, s_{2}^{(8)} = f_{8}, s_{3}^{(8)} = - p_{8}, s_{4}^{(8)} = p_{8}, s_{14}^{(8)} = - a_{8}, s_{15}^{(8)} = f_{8},

W_{8}^{(2)} = I_{4} \oplus [\begin{matrix} 1 & 1 \\ 1 & 1 \\ 1 & - 1 \\ 1 & - 1 \end{matrix}], W_{8} = I_{2} \oplus H_{2} \oplus H_{2} \oplus I_{2},

W_{16 \times 14} = [\begin{matrix} 1 & 1 \\ 1 & 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \end{matrix}] \oplus H_{2} \oplus (H_{2} \otimes I_{2}) \oplus I_{2},

W_{14 \times 16} = [\begin{matrix} 1 \\ 1 & 1 \\ 1 & 1 \\ 1 \end{matrix}] \oplus {\bar{I}}_{2} H_{2} \oplus I_{2} \oplus H_{2} \oplus T_{2 \times 3} \oplus I_{2},

W_{8 \times 16} = [\begin{matrix} 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 \end{matrix}],

W_{16 \times 8} = [\begin{matrix} 1 \\ 1 \\ 1 \\ 1 \\ - 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 & 1 \\ 1 & 1 & 1 \\ 1 \end{matrix}] .

Figure 11 shows the data-flow graph of the 8-point DCT-VI algorithm based on factorization (35). The algorithm reduces the number of multiplications from 64 to 16 and the number of additions from 56 to 38.

The data-flow graph shown in Figure 11 includes permutation modules at the input and output that perform index reordering of data samples, enabling regular computational structures in fast transform implementations without introducing additional arithmetic operations. As a result, the graph contains butterfly modules on both the left and right sides of the scaling-factor line. The repeated entries allow efficient computation using adder trees and a circular convolution module, as in the 4- to 7-point DCT-VI algorithms.

4. Results

In this section, we experimentally confirm the correctness of the DCT-VI matrix factorizations presented in Section 3. Using MATLAB R2023b, the original DCT-VI matrices were generated according to (3), (9), (13), (19), (25) and (29) and were subsequently compared with the products of the corresponding factorized matrices obtained from (8), (12), (18), (24), (28) and (35). For initial matrix sizes from three to eight, the factorized matrices were found to coincide exactly with the original matrices, thereby confirming the correctness of the proposed algorithms.

The arithmetic complexity of the proposed DCT-VI algorithms was then evaluated and compared with that of direct matrix–vector multiplication. The number of multiplications was determined by counting the vertices labeled with multiplicative factors (circles) in the associated data-flow graphs, while the number of additions was estimated by counting the vertices at which two edges merge. Contributions corresponding to multiplications by powers of two were accounted for accordingly.

Table 1 summarizes the number of multiplications and additions for the proposed DCT-VI algorithms. The first column lists the transform size N. Each row of Table 1 reports the results for the corresponding N-point DCT-VI algorithm. The second and third columns present the number of additions and multiplications required for the direct matrix–vector implementation of the DCT-VI. The fourth and fifth columns report the numbers of additions and multiplications required by the proposed algorithms, with the percentage differences relative to direct matrix–vector multiplication indicated in parentheses. A plus sign denotes an increase in the number of operations, whereas a minus sign indicates a reduction.
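The percentage differences can be recomputed from the operation counts. The sketch below uses only the counts quoted in the text of Section 3 for N = 5 to 8 (the values for N = 3 and 4 are not repeated here) and takes N² multiplications and N(N − 1) additions for the direct matrix–vector product; the function names are hypothetical.

```python
# (multiplications, additions) of the proposed algorithms, as quoted in Section 3
proposed = {5: (8, 16), 6: (13, 33), 7: (11, 36), 8: (16, 38)}

def direct_counts(n):
    """Operation counts of a dense N x N matrix-vector product."""
    return n * n, n * (n - 1)

def percent_change(new, old):
    """Signed change in percent; negative means fewer operations."""
    return 100.0 * (new - old) / old

def summarize(table):
    rows = {}
    for n, (mul, add) in sorted(table.items()):
        d_mul, d_add = direct_counts(n)
        rows[n] = (round(percent_change(mul, d_mul), 1),
                   round(percent_change(add, d_add), 1))
    return rows

summary = summarize(proposed)
for n, (dm, da) in summary.items():
    print(f"N={n}: multiplications {dm:+.1f}%, additions {da:+.1f}%")
```

For these four sizes the reduction in multiplications lies between roughly 64% and 78%, while the change in additions ranges from an increase of 10% (N = 6) to a reduction of about 32% (N = 8).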

Moreover, the N-point DCT-VI can be realized as the real part of a (2N − 1)-point DFT [15]. The corresponding arithmetic operation counts for these implementations are reported in the sixth and seventh columns of Table 1. For N = 8, the generalized split-radix DFT algorithm [22] is used. For N = 3, 4, and 5, DFT algorithms from [27] are considered, whereas for N = 6 and 7, Winograd DFT algorithms [26,33] are employed.
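One way to obtain the transform from a (2N − 1)-point DFT is sketched below. It is a mapping derived for illustration and is not necessarily the construction used in [15]: the scaled input is extended to an even-symmetric sequence of length 2N − 1 with alternating signs, a DFT is computed, and output k is read from the real part of bin (k + N) mod (2N − 1). The closed form of the transform is the one inferred from the matrices in Section 3, and a naive DFT is used only to keep the example self-contained.

```python
import cmath
import math

SQRT2 = math.sqrt(2.0)

def dct_vi_direct(x):
    """Direct evaluation using the closed form inferred from the listed matrices."""
    n_points, m = len(x), 2 * len(x) - 1
    y = []
    for k in range(n_points):
        acc = x[0] / SQRT2
        for n in range(1, n_points):
            acc += x[n] * math.cos((2 * k + 1) * n * math.pi / m)
        acc *= 2.0 / math.sqrt(m)
        if k == n_points - 1:
            acc /= SQRT2
        y.append(acc)
    return y

def dct_vi_via_dft(x):
    """Same transform read off a (2N - 1)-point DFT of a symmetric extension."""
    n_points, m = len(x), 2 * len(x) - 1
    z = [0.0] * m
    z[0] = x[0] / SQRT2
    for n in range(1, n_points):
        val = (-1) ** n * x[n] / 2.0
        z[n] = val
        z[m - n] = val                 # even symmetry makes the DFT real
    spectrum = [sum(z[n] * cmath.exp(-2j * math.pi * q * n / m) for n in range(m))
                for q in range(m)]
    y = []
    for k in range(n_points):
        val = 2.0 / math.sqrt(m) * spectrum[(k + n_points) % m].real
        if k == n_points - 1:
            val /= SQRT2
        y.append(val)
    return y

x = [0.5, -1.0, 2.0, 3.5, -0.25, 1.75]
direct = dct_vi_direct(x)
via_dft = dct_vi_via_dft(x)
assert all(abs(p - q) < 1e-12 for p, q in zip(direct, via_dft))
```

In practice the naive DFT would be replaced by a fast (2N − 1)-point DFT algorithm, which is where the operation counts in the sixth and seventh columns of Table 1 come from.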

Table 2 presents a comparison of the numbers of additions and multiplications required by other existing DCT-VI algorithms, together with the corresponding percentage differences. The first column of Table 2 lists the authors of the existing algorithms, while the second column provides the corresponding reference and year of publication. In [21], four-point and eight-point DCT-VI algorithms were developed; the numbers of additions and multiplications required for these algorithms are reported in the third, fourth, seventh, and eighth columns. In [28], a five-point DCT-VI algorithm was proposed, and the corresponding numbers of arithmetic operations are listed in the fifth and sixth columns of Table 2. In parentheses, the additional number of multiplications required by these existing algorithms is indicated, since the normalization constant is not taken into account in [21,28].

In Table 3, we present the number of multiplications and additions required by existing algorithms for computing short-length DCT-II and DCT-VIII transforms. The fast DCT-II algorithms considered are reported in [22]. For N = 4 and N = 8, the number of arithmetic operations is evaluated for radix-2 algorithms, whereas for N = 5 and N = 7, it is evaluated for radix-q algorithms.

The fast DCT-VIII algorithms considered are presented in [21,34]. For N = 3, 4, 5, 6, and 7, the number of arithmetic operations is evaluated for algorithms based on the structural approach [34]. For N = 8, the number of arithmetic operations is evaluated for the DFT-based algorithm [21]. As can be seen from Table 1 and Table 3, the number of arithmetic operations required by the proposed DCT-VI algorithms is similar to that of the existing DCT-II and DCT-VIII algorithms.

5. Discussion

The analysis of the results has shown the following.

First, compared to direct matrix–vector multiplication, the proposed DCT-VI algorithms substantially reduce the number of multiplications and modestly reduce the number of additions. Averaged over input sizes from three to eight, the number of multiplications is reduced by nearly 66%, while the number of additions decreases by approximately 9%.

Second, the developed fast DCT-VI algorithms were compared with the DFT-based algorithms reported in [15]. In this case, the number of additions is reduced by more than a factor of two. In addition, we considered the DCT-VI algorithms proposed in [21,28]. In those studies, the normalization constants were not taken into account when the fast algorithms were constructed. Therefore, the corresponding number of additional multiplications is given in parentheses so that the operation counts are comparable. With this adjustment, the computational complexity of some existing algorithms and that of the proposed ones become approximately the same.

Table 4 presents the memory requirements of the proposed DCT-VI algorithms. The number of required memory cells was determined based on the pseudocode implementations described in Appendix A. On average, the proposed algorithms require approximately 40% more memory than the direct matrix–vector product approach, when evaluated over input sizes ranging from three to eight.

It is worth noting that memory consumption is highly dependent on implementation-specific factors, including the target platform, programming technique, and the developer’s expertise. Unlike arithmetic complexity, which constitutes an objective and implementation-independent performance metric, memory usage may vary substantially across different execution environments. The proposed DCT-VI algorithms are primarily intended for software implementations, where memory requirements, execution latency, and computational resources can differ considerably depending on system architecture and optimization strategies. In practice, memory cells may be shared or reused across multiple stages of computation, and the algorithms may be executed in sequential, parallel, or hybrid sequential–parallel modes, each of which influences overall latency and memory utilization. Consequently, the assessment of memory efficiency is inherently subjective. In contrast, arithmetic complexity remains the most reliable and consistent criterion for evaluating the efficiency of the proposed algorithms.

6. Conclusions

This paper presents novel fast algorithms for computing the DCT-VI, with a particular focus on short input sequences of three to eight samples. Each computational stage is described step by step, including intermediate outputs, which makes the proposed algorithms transparent and reproducible. Compared with the direct matrix–vector product, the proposed algorithms substantially reduce the number of multiplications required to compute the transform: for input lengths from three to eight, the number of multiplications falls by approximately 66% on average, and the number of additions by nearly 9% on average.
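The multiplication figure can be re-derived from the counts in Table 1. The Python sketch below repeats that arithmetic and also checks that the direct product of an N × N matrix with a vector uses N² multiplications; the names `MULTS` and `reduction_percent` are illustrative only.

```python
# Multiplication counts copied from Table 1: N -> (matrix-vector product, proposed algorithm).
MULTS = {
    3: (9, 4),
    4: (16, 7),
    5: (25, 8),
    6: (36, 13),
    7: (49, 11),
    8: (64, 16),
}


def reduction_percent(direct, fast):
    """Relative saving of the fast algorithm with respect to the direct product, in percent."""
    return 100.0 * (direct - fast) / direct


# The direct product needs one multiplication per matrix entry, i.e. N * N.
direct_is_n_squared = all(direct == n * n for n, (direct, _) in MULTS.items())

# Unweighted mean of the per-size savings over N = 3..8.
mult_reductions = {n: reduction_percent(d, f) for n, (d, f) in MULTS.items()}
average_mult_reduction = sum(mult_reductions.values()) / len(mult_reductions)

print(f"direct count equals N^2: {direct_is_n_squared}")
print(f"average reduction in multiplications: {average_mult_reduction:.1f}%")
```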

To further support practical implementation, data-flow graphs are introduced to illustrate the space–time structure of the computational processes. These graphs not only clarify the flow of operations but also enable accurate evaluation of arithmetic complexity in terms of multiplications and additions. Notably, each path from input to output in the proposed data-flow graphs contains only a single multiplication operation, which represents a significant advantage over alternative designs where multiple sequential multiplications may occur along the same path. This characteristic enables faster execution and facilitates efficient hardware implementation.

In addition, optimized pseudocode implementations incorporating variable reuse are developed to reduce memory requirements in software-based implementations [35]. The resulting algorithms are readily applicable to a wide range of signal processing tasks, including video and image coding.

Author Contributions

Conceptualization, A.C.; methodology, A.C. and V.K.; software, M.P.; validation, V.K. and M.P.; formal analysis, A.C., M.P. and V.K.; investigation, V.K., M.P. and A.C.; writing—original draft preparation, M.P. and A.C.; writing—review and editing, A.C. and M.P.; supervision, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

This appendix presents the pseudocode for the proposed fast DCT-VI algorithms. The pseudocode is used in Section 4 to evaluate the number of memory cells required by the algorithms introduced in Section 3. To minimize memory requirements, variables are reused wherever possible; for each algorithm we therefore state explicitly which variables hold the outputs and in what order.

Table A1 shows the pseudocode of the proposed three-point DCT-VI algorithm. The inputs are $x_{0}$, $x_{1}$, and $x_{2}$. The scaling factors are $s_{0}^{(3)}$, $s_{1}^{(3)}$, and $s_{2}^{(3)}$. Because variables are reused, the outputs are also held in $x_{0}$, $x_{1}$, and $x_{2}$. The variables $p_{0}$ and $p_{1}$ are additional.

Table A1. The pseudocode for the constructed fast DCT-VI algorithm for N = 3 with variable reuse.

Step 1	Step 2
$p_{0} = (x_{1} + x_{2}) / 2$ , $p_{1} = x_{1} - x_{2}$ , $x_{2} = x_{0} s_{0}^{(3)} - p_{1} s_{1}^{(3)}$ , $x_{0} = x_{0} s_{1}^{(3)} + p_{1} s_{2}^{(3)}$ ;	$x_{1} = x_{0} - p_{0},$ $x_{0} = x_{0} + p_{0}$ .
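The pseudocode in Table A1 can be transcribed almost line by line into Python. The sketch below is illustrative rather than the authors' implementation, and it rests on two assumptions that are not taken from this appendix: (1) the DCT-VI is the orthonormal transform with kernel $\cos(\pi (2k+1) n / (2N-1))$, overall factor $2/\sqrt{2N-1}$, and an extra factor $1/\sqrt{2}$ for $n = 0$ and for $k = N-1$; (2) the numerical values assigned to `s0`, `s1`, `s2` were derived for this sketch from that definition and are assumed to match $s_{0}^{(3)}$, $s_{1}^{(3)}$, $s_{2}^{(3)}$ of Section 3. The function names `dct6_direct` and `dct6_fast3` are hypothetical.

```python
import math


def dct6_direct(x):
    """Direct matrix-vector DCT-VI under the ASSUMED orthonormal definition."""
    n_pts = len(x)
    m = 2 * n_pts - 1
    r = 1.0 / math.sqrt(2.0)
    result = []
    for k in range(n_pts):
        acc = 0.0
        for n in range(n_pts):
            eps = r if n == 0 else 1.0  # assumed extra scaling of the first column
            acc += eps * math.cos(math.pi * (2 * k + 1) * n / m) * x[n]
        sigma = r if k == n_pts - 1 else 1.0  # assumed extra scaling of the last row
        result.append(2.0 / math.sqrt(m) * sigma * acc)
    return result


def dct6_fast3(x0, x1, x2):
    """Table A1 with variable reuse; the three factors below are ASSUMED values."""
    s0 = 1.0 / math.sqrt(5.0)          # assumed s_0^(3)
    s1 = math.sqrt(2.0 / 5.0)          # assumed s_1^(3)
    s2 = 1.0 / (2.0 * math.sqrt(5.0))  # assumed s_2^(3)

    # Step 1
    p0 = (x1 + x2) / 2.0
    p1 = x1 - x2
    x2 = x0 * s0 - p1 * s1  # still uses the original x0
    x0 = x0 * s1 + p1 * s2  # x0 is overwritten here

    # Step 2
    x1 = x0 - p0
    x0 = x0 + p0
    return [x0, x1, x2]
```

Under these assumptions, `dct6_fast3(1.0, 2.0, 3.0)` agrees with `dct6_direct([1.0, 2.0, 3.0])` up to rounding error. The order of the two assignments at the end of Step 1 matters: `x2` must be computed before `x0` is overwritten. The function uses four multiplications by scaling factors and six additions, in line with the N = 3 row of Table 1, with the division by two counted separately.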

Table A2 presents the pseudocode for the four-point DCT-VI algorithm. The inputs are $x_{0}$, $x_{1}$, $x_{2}$, and $x_{3}$. Only the scaling factors $s_{0}^{(4)}$, $s_{1}^{(4)}$, $s_{2}^{(4)}$, $s_{3}^{(4)}$, $s_{4}^{(4)}$, and $s_{5}^{(4)}$ appear in the pseudocode, because $s_{6}^{(4)} = s_{1}^{(4)}$. Variables are reused; the additional variables are $p_{1}$, $p_{2}$, and $p_{3}$. The outputs are $x_{0}$, $x_{1}$, $x_{2}$, and $x_{3}$.

Table A2. The pseudocode for the developed fast 4-point DCT-VI algorithm with variable reuse.

Step 1	Step 2
$p_{0} = x_{0} s_{0}^{(4)},$ $p_{1} = x_{0} s_{1}^{(4)},$ $p_{2} = x_{1} - x_{2} + x_{3}, p_{3} = (- x_{2} - x_{3}) s_{3}^{(4)}$ , $p_{4} = (x_{1} + x_{2}) s_{4}^{(4)}$ , $p_{5} = (x_{1} - x_{3}) s_{5}^{(4)}$ ;	$x_{3} = p_{2} s_{1}^{(4)}$ , $x_{0} = p_{2} s_{2}^{(4)},$ $x_{3} = p_{0} - x_{3}$ , $x_{0} = p_{1} + x_{0}$ , $x_{1} = - p_{5} + p_{3} + x_{0},$ $x_{2} = - p_{4} - p_{3} + x_{0},$ $x_{0} = p_{4} + p_{5} + x_{0};$
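Table A2 can be transcribed into Python in the same way. The sketch below is illustrative and not the authors' code. It assumes the orthonormal DCT-VI with kernel $\cos(\pi (2k+1) n / (2N-1))$, overall factor $2/\sqrt{2N-1}$, and an extra $1/\sqrt{2}$ for $n = 0$ and for $k = N-1$. The values given to `s0` … `s5` were derived for this sketch by matching the pseudocode to that definition, using the identity $\cos(\pi/7) - \cos(2\pi/7) + \cos(3\pi/7) = 1/2$; they are assumed, not confirmed, to equal $s_{0}^{(4)}, \ldots, s_{5}^{(4)}$ of Section 3. The names `dct6_direct` and `dct6_fast4` are hypothetical.

```python
import math


def dct6_direct(x):
    """Direct matrix-vector DCT-VI under the ASSUMED orthonormal definition."""
    n_pts = len(x)
    m = 2 * n_pts - 1
    r = 1.0 / math.sqrt(2.0)
    result = []
    for k in range(n_pts):
        acc = 0.0
        for n in range(n_pts):
            eps = r if n == 0 else 1.0  # assumed extra scaling of the first column
            acc += eps * math.cos(math.pi * (2 * k + 1) * n / m) * x[n]
        sigma = r if k == n_pts - 1 else 1.0  # assumed extra scaling of the last row
        result.append(2.0 / math.sqrt(m) * sigma * acc)
    return result


def dct6_fast4(x0, x1, x2, x3):
    """Table A2 with variable reuse; all six factors below are ASSUMED values."""
    c = 2.0 / math.sqrt(7.0)
    s0 = c / 2.0                                        # assumed s_0^(4)
    s1 = c / math.sqrt(2.0)                             # assumed s_1^(4), also used as s_6^(4)
    s2 = c / 6.0                                        # assumed s_2^(4)
    s3 = c * (math.cos(math.pi / 7.0) - 1.0 / 6.0)      # assumed s_3^(4)
    s4 = c * (math.cos(2.0 * math.pi / 7.0) + 1.0 / 6.0)  # assumed s_4^(4)
    s5 = c * (1.0 / 6.0 - math.cos(3.0 * math.pi / 7.0))  # assumed s_5^(4)

    # Step 1: every product is formed from the original inputs.
    p0 = x0 * s0
    p1 = x0 * s1
    p2 = x1 - x2 + x3
    p3 = (-x2 - x3) * s3
    p4 = (x1 + x2) * s4
    p5 = (x1 - x3) * s5

    # Step 2: inputs are overwritten with the outputs.
    x3 = p2 * s1
    x0 = p2 * s2
    x3 = p0 - x3
    x0 = p1 + x0
    x1 = -p5 + p3 + x0
    x2 = -p4 - p3 + x0
    x0 = p4 + p5 + x0
    return [x0, x1, x2, x3]
```

Under these assumptions the function reproduces `dct6_direct` for four-element inputs up to rounding error, using 7 multiplications and 13 additions as listed for N = 4 in Table 1. Note that the transcription follows the table literally and therefore uses six temporaries, $p_{0}$ to $p_{5}$.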

Table A3 presents the pseudocode for the designed five-point DCT-VI algorithm. The inputs are $x_{0}$, $x_{1}$, $x_{2}$, $x_{3}$, and $x_{4}$. We use only the scaling factors $s_{0}^{(5)}$, $s_{1}^{(5)}$, $s_{5}^{(5)}$, $s_{6}^{(5)}$, and $s_{7}^{(5)}$, because $s_{2}^{(5)} = - s_{1}^{(5)}$, $s_{3}^{(5)} = s_{1}^{(5)}$, and $s_{4}^{(5)} = s_{0}^{(5)}$. The outputs are $x_{0}$, $x_{1}$, $x_{2}$, $x_{3}$, and $x_{4}$. The variables $p_{0}$, $p_{5}$, $p_{6}$, and $p_{7}$ are additional.

Table A3. The pseudocode for the designed 5-point DCT-VI algorithm with variable reuse.

Step 1	Step 2
$p_{0} = - x_{1} + x_{2} + x_{4},$ $p_{5} = (- x_{1} - x_{2}) s_{5}^{(5)}$ , $p_{6} = (x_{4} + x_{1}) s_{6}^{(5)}$ , $p_{7} = (x_{4} - x_{2}) s_{7}^{(5)}$ ;	$x_{4} = x_{0} s_{1}^{(5)} + (p_{0} - x_{3}) s_{0}^{(5)}$ , $x_{0} = x_{3} s_{1}^{(5)} + x_{0} s_{0}^{(5)}$ , $x_{1} = x_{0} - p_{0} s_{1}^{(5)} - x_{3}$ , $x_{2} = x_{0} - p_{5} - p_{6}$ , $x_{3} = x_{0} + p_{6} + p_{7}$ , $x_{0} = x_{0} + p_{5} - p_{7}$ .
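A Python transcription of Table A3 is sketched below, reading all Step 1 assignments before all Step 2 assignments. It is illustrative only and assumes the orthonormal DCT-VI with kernel $\cos(\pi (2k+1) n / (2N-1))$, overall factor $2/\sqrt{2N-1}$, and an extra $1/\sqrt{2}$ for $n = 0$ and for $k = N-1$. The values of `s0`, `s1`, `s5`, `s6`, `s7` were derived for this sketch by matching the pseudocode to that definition (including their signs) and are assumed, not confirmed, to equal the factors of Section 3. The names `dct6_direct` and `dct6_fast5` are hypothetical.

```python
import math


def dct6_direct(x):
    """Direct matrix-vector DCT-VI under the ASSUMED orthonormal definition."""
    n_pts = len(x)
    m = 2 * n_pts - 1
    r = 1.0 / math.sqrt(2.0)
    result = []
    for k in range(n_pts):
        acc = 0.0
        for n in range(n_pts):
            eps = r if n == 0 else 1.0  # assumed extra scaling of the first column
            acc += eps * math.cos(math.pi * (2 * k + 1) * n / m) * x[n]
        sigma = r if k == n_pts - 1 else 1.0  # assumed extra scaling of the last row
        result.append(2.0 / math.sqrt(m) * sigma * acc)
    return result


def dct6_fast5(x0, x1, x2, x3, x4):
    """Table A3 with variable reuse; all five factors below are ASSUMED values."""
    c = 2.0 / 3.0                               # 2 / sqrt(2N - 1) for N = 5
    s0 = c / math.sqrt(2.0)                     # assumed s_0^(5)
    s1 = c / 2.0                                # assumed s_1^(5)
    s5 = -c * math.cos(math.pi / 9.0)           # assumed s_5^(5)
    s6 = -c * math.cos(2.0 * math.pi / 9.0)     # assumed s_6^(5)
    s7 = -c * math.cos(4.0 * math.pi / 9.0)     # assumed s_7^(5)

    # Step 1
    p0 = -x1 + x2 + x4
    p5 = (-x1 - x2) * s5
    p6 = (x4 + x1) * s6
    p7 = (x4 - x2) * s7

    # Step 2: x4 and x0 are overwritten first, x3 is read before it is overwritten.
    x4 = x0 * s1 + (p0 - x3) * s0
    x0 = x3 * s1 + x0 * s0
    x1 = x0 - p0 * s1 - x3
    x2 = x0 - p5 - p6
    x3 = x0 + p6 + p7
    x0 = x0 + p5 - p7
    return [x0, x1, x2, x3, x4]
```

Under these assumptions the result matches `dct6_direct` for five-element inputs up to rounding error, with 8 multiplications and 16 additions as listed for N = 5 in Table 1. The statement order in Step 2 is essential, because `x1` needs both the updated `x0` and the original `x3`.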

Table A4 presents the pseudocode of the proposed six-point DCT-VI algorithm. The inputs are $x_{0}$, $x_{1}$, $x_{2}$, $x_{3}$, $x_{4}$, and $x_{5}$. The scaling factors are $s_{0}^{(6)}$, $s_{1}^{(6)}$, …, $s_{11}^{(6)}$, because $s_{12}^{(6)} = - s_{10}^{(6)}$. Because variables are reused, the outputs are $x_{0}$, $x_{3}$, $x_{4}$, $x_{2}$, $x_{1}$, and $p_{0}$. The variables $p_{1}$, $p_{2}$, …, $p_{5}$ are additional.

Table A4. The pseudocode for the developed 6-point DCT-VI algorithm with variable reuse.

Step 1	Step 2	Step 3	Step 4	Step 5
$p_{4} = x_{1}$ , $p_{5} = - x_{2}$ , $x_{1} = - x_{4} + x_{3} + x_{5} + p_{4} + p_{5},$ $x_{2} = - x_{4} - x_{3},$ $x_{3} = - x_{4} - x_{5},$ $x_{5} = - x_{4} - p_{5},$ $x_{4} = - x_{4} - p_{4}$ ;	$p_{1} = x_{2} + x_{4},$ $p_{2} = x_{3} + x_{5},$ $p_{3} = (x_{2} + x_{3}) s_{5}^{(6)}$ , $p_{4} = (x_{4} + x_{5}) s_{8}^{(6)}$ , $p_{5} = (p_{1} + p_{2}) s_{11}^{(6)}$ , $p_{0} = x_{0} s_{0}^{(6)} - x_{1} s_{1}^{(6)}$ ;	$x_{0} = x_{0} s_{1}^{(6)} + x_{1} s_{2}^{(6)},$ $x_{1} = x_{2} s_{3}^{(6)} + p_{3}$ , $x_{2} = x_{3} s_{4}^{(6)} + p_{3}$ , $x_{3} = x_{4} s_{6}^{(6)} + p_{4}$ , $x_{4} = x_{5} s_{7}^{(6)} + p_{4}$ , $x_{5} = p_{1} s_{9}^{(6)} + p_{5}$ ;	$p_{1} = p_{2} s_{10}^{(6)} + p_{5}$ , $p_{2} = x_{1} + x_{5},$ $p_{3} = x_{2} + p_{1},$ $p_{4} = x_{3} + x_{5}$ , $p_{5} = x_{4} + p_{1};$	$x_{1} = x_{0} - p_{2},$ $x_{2} = x_{0} - p_{3},$ $x_{3} = x_{0} - p_{4},$ $x_{4} = x_{0} - p_{5},$ $x_{0} = x_{0} + p_{2} + p_{3} + p_{4} + p_{5} .$

Table A5 shows the pseudocode of the constructed seven-point DCT-VI algorithm. The inputs are $x_{0}$, $x_{1}$, $x_{2}$, $x_{3}$, $x_{4}$, $x_{5}$, and $x_{6}$. We use only the scaling factors $s_{0}^{(7)}$, $s_{1}^{(7)}$, …, $s_{8}^{(7)}$, and $s_{9}^{(7)}$, because $s_{10}^{(7)} = - s_{0}^{(7)}$. Because variables are reused, the outputs are $x_{1}$, $x_{3}$, $x_{4}$, $x_{2}$, $x_{5}$, $x_{6}$, and $p_{0}$. The variables $p_{1}$, $p_{2}$, …, $p_{6}$ are additional.

Table A5. The pseudocode for the designed 7-point DCT-VI algorithm with variable reuse.

Step 1	Step 2	Step 3	Step 4	Step 5
$p_{1} = x_{1} + x_{5},$ $p_{2} = x_{3} - x_{2},$ $p_{3} = - x_{4} - x_{6}$ , $p_{4} = x_{1} - x_{5},$ $p_{5} = - x_{3} - x_{2},$ $p_{6} = - x_{4} + x_{6}$ ;	$x_{1} = p_{1} + p_{2} + p_{3},$ $x_{2} = p_{1} - p_{3},$ $x_{3} = p_{1} - p_{2},$ $p_{1} = (x_{2} + x_{3}) s_{5}^{(7)}$ , $x_{2} = x_{2} s_{3}^{(7)}$ , $x_{3} = x_{3} s_{4}^{(7)}$ , $x_{4} = (p_{4} - p_{5} + p_{6}) s_{6}^{(7)}$ , $x_{5} = p_{4} - p_{6},$ $x_{6} = p_{4} + p_{5},$ $p_{3} = (x_{5} + x_{6}) s_{9}^{(7)}$ , $x_{5} = x_{5} s_{7}^{(7)}$ , $x_{6} = x_{6} s_{8}^{(7)}$ ;	$p_{0} = - x_{1} s_{0}^{(7)}$ , $x_{1} = x_{1} s_{2}^{(7)}$ , $x_{2} = p_{1} + x_{2},$ $x_{3} = p_{1} + x_{3},$ $x_{5} = p_{3} + x_{5},$ $x_{6} = p_{3} + x_{6};$	$p_{1} = x_{1} + x_{2} + x_{3},$ $p_{2} = x_{1} - x_{2},$ $p_{3} = x_{1} - x_{3},$ $p_{4} = x_{4} + x_{5} + x_{6},$ $p_{5} = x_{5} - x_{4},$ $p_{6} = x_{4} - x_{6};$	$p_{0} = x_{0} s_{1}^{(7)} + p_{0}$ , $x_{0} = x_{0} s_{0}^{(7)}$ , $x_{1} = x_{0} + p_{1},$ $x_{2} = x_{0} + p_{2},$ $x_{3} = x_{0} + p_{3},$ $x_{4} = x_{1} - p_{4},$ $x_{5} = x_{2} - p_{5},$ $x_{6} = x_{3} - p_{6}$ , $x_{1} = x_{1} + p_{4},$ $x_{2} = x_{2} + p_{5},$ $x_{3} = x_{3} + p_{6}$ .

Table A6 presents the pseudocode of the constructed eight-point DCT-VI algorithm. The inputs are $x_{0}$, $x_{1}$, $x_{2}$, $x_{3}$, $x_{4}$, $x_{5}$, $x_{6}$, and $x_{7}$. We use the scaling factors $s_{0}^{(8)}$, $s_{1}^{(8)}$, $s_{3}^{(8)}$, $s_{5}^{(8)}$, $s_{6}^{(8)}$, $s_{9}^{(8)}$, $s_{10}^{(8)}$, $s_{11}^{(8)}$, $s_{12}^{(8)}$, and $s_{13}^{(8)}$, because $s_{2}^{(8)} = s_{0}^{(8)}$, $s_{4}^{(8)} = - s_{3}^{(8)}$, $s_{7}^{(8)} = s_{6}^{(8)}$, $s_{8}^{(8)} = s_{5}^{(8)}$, $s_{14}^{(8)} = - s_{1}^{(8)}$, and $s_{15}^{(8)} = s_{0}^{(8)}$. Because variables are reused, the outputs are $x_{0}$, $x_{1}$, $x_{2}$, $x_{3}$, $x_{4}$, $x_{5}$, $x_{6}$, and $x_{7}$. The variables $p_{0}$, …, $p_{8}$, $p_{12}$, $p_{13}$, $p_{14}$, and $p_{15}$ are additional.

Table A6. The pseudocode for the constructed 8-point DCT-VI algorithm with variable reuse.

Step 1	Step 2	Step 3	Step 4	Step 5	Step 6
$p_{0} = x_{0} s_{0}^{(8)},$ $p_{1} = x_{0} s_{1}^{(8)},$ $p_{2} = x_{5} s_{0}^{(8)},$ $p_{3} = x_{5} s_{3}^{(8)},$ $p_{5} = s_{5}^{(8)} (x_{3} + x_{6}),$ $x_{3} = x_{3} - x_{6},$ $p_{6} = x_{3} s_{6}^{(8)},$ $p_{4} = x_{3} s_{3}^{(8)}$ ;	$p_{12} = x_{1} - x_{4},$ $p_{13} = x_{7} - x_{2},$ $p_{14} = - x_{7} - x_{2},$ $p_{15} = x_{1} + x_{4},$ $p_{7} = p_{12} + p_{13},$ $p_{8} = p_{12} - p_{13};$	$x_{1} = p_{7} s_{9}^{(8)},$ $x_{2} = p_{8} s_{10}^{(8)},$ $x_{4} = p_{8} s_{5}^{(8)},$ $x_{6} = p_{14} s_{11}^{(8)},$ $x_{7} = p_{15} s_{12}^{(8)},$ $p_{13} = s_{13}^{(8)} (p_{14} + p_{15}),$ $p_{14} = - s_{1}^{(8)} (x_{5} + x_{3} + p_{7});$	$p_{15} = s_{0}^{(8)} p_{7},$ $p_{7} = s_{6}^{(8)} p_{7},$ $p_{2} = p_{1} + p_{2},$ $p_{3} = p_{1} + p_{3},$ $p_{1} = p_{5} + p_{6},$ $p_{6} = p_{6} - p_{5},$ $p_{5} = x_{1} + x_{2};$	$x_{2} = x_{1} - x_{2},$ $x_{6} = p_{13} + x_{6},$ $x_{7} = p_{13} + x_{7},$ $x_{1} = p_{7} + x_{4},$ $x_{4} = p_{7} - x_{4},$ $x_{0} = p_{5} + x_{6},$ $x_{3} = x_{2} + x_{7},$ $x_{5} = p_{5} - x_{6},$ $x_{6} = x_{2} - x_{7},$ $x_{7} = p_{0} + p_{14};$	$p_{0} = p_{2} + p_{1}$ , $p_{5} = p_{2} + p_{6}$ , $x_{0} = p_{0} + x_{0}$ , $x_{6} = p_{5} + x_{6}$ , $x_{3} = p_{5} + x_{3}$ , $x_{5} = p_{0} + x_{5}$ , $x_{1} = p_{3} + p_{6} + x_{1}$ , $x_{4} = p_{3} + p_{1} + x_{4}$ , $x_{2} = p_{4} + p_{2} + p_{15}$ .

References

1. Richardson, I.E. Coding Video: A Practical Guide to HEVC and Beyond; John Wiley & Sons Ltd.: Chichester, UK, 2024.
2. Choi, K. A study on fast and low-complexity algorithms for Versatile Video Coding. Sensors 2022, 22, 8990.
3. Zeng, Y.; Sun, H.; Katto, J.; Fan, Y. Approximated reconfigurable transform architecture for VVC. In 2021 IEEE International Symposium on Circuits and Systems (ISCAS); IEEE: New York, NY, USA, 2021; pp. 1–5.
4. Werda, I.; Belghith, F.; Maraoui, A.; Masmoudi, N. DCT-II transform hardware-based acceleration for VVC standard. In 2021 IEEE International Conference on Design & Test of Integrated Micro & Nano-Systems (DTS); IEEE: New York, NY, USA, 2021; pp. 1–5.
5. Abramova, V.; Lukin, V.; Abramov, S.; Kryvenko, S.; Lech, P.; Okarma, K. A fast and accurate prediction of distortions in DCT-based lossy image compression. Electronics 2023, 12, 2347.
6. Li, H.; Wei, G.; Wang, T.; Bui, T.O.; Zeng, Q.; Wang, R. Reducing video coding complexity based on CNN-CBAM in HEVC. Appl. Sci. 2023, 13, 10135.
7. Wang, X. Strategies for enhancing deep video encoding efficiency using the convolutional neural network in a hyperautomation mechanism. Sci. Rep. 2025, 15, 1079.
8. Huo, S.; Liu, H.; Gu, J.; Jin, D.; Lei, M.; Huang, B. Deep network-based adaptive quantization for practical video coding. IEEE Trans. Circuits Syst. Video Technol. 2025; early access.
9. Das, T.; Liang, X.; Choi, K. Versatile Video coding-post processing feature fusion: A post-processing convolutional neural network with progressive feature fusion for efficient video enhancement. Appl. Sci. 2024, 14, 8276.
10. Zieliński, T.P. Digital Signal Processing—From Theory to Applications, 2nd ed.; WKL: Warszawa, Poland, 2005.
11. Zhao, X.; Kim, S.-H.; Zhao, Y.; Egilmez, H.E.; Koo, M.; Liu, S.; Lainema, J.; Karczewicz, M. Transform coding in the VVC standard. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3878–3890.
12. Kolodziejski, W.; Domanski, R.; Agostini, L. FastGW: A machine learning-based early skip for the AV1 global warped motion compensation. IEEE Trans. Circuits Syst. I Reg. Pap. 2025, 72, 977–988.
13. Bross, B.; Wang, Y.-K.; Ye, Y.; Liu, S.; Chen, J.; Sullivan, G.J.; Ohm, J.-R. Overview of the Versatile Video Coding (VVC) Standard and its applications. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3736–3764.
14. Zhang, Z.; Zhao, X.; Li, X.; Li, L.; Luo, Y.; Liu, S.; Li, Z. Fast DST-7/DCT-8 with dual implementation support for versatile video coding. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 355–368.
15. Park, W.; Lee, B.; Kim, M. Fast computation of integer DCT-V, DCT-VIII, and DST-VII for video coding. IEEE Trans. Image Process. 2019, 28, 5839–5851.
16. Alshina, E.; Sullivan, G.J.; Ohm, J.-R.; Boyce, J.; Chen, J. Algorithm description of joint exploration test model 4. In Proceedings of the JVET-D1001, Joint Video Exploration Team (JVET) of ITU-T SG16 WP3 and ISO/IEC JTC1/SC29/WG11, 4th Meeting, Chengdu, China, 15–21 October 2016.
17. Britanak, V.; Yip, P.C.; Rao, K.R. Discrete Cosine and Sine Transforms: General Properties, Fast Algorithms and Integer Approximations; Elsevier/Academic Press: Amsterdam, The Netherlands, 2007.
18. Murty, M.N.; Panda, H. Mapping between discrete cosine transform of Type-VI/VII and discrete Fourier transform. Int. J. Eng. Res. Appl. 2016, 6, 60–62.
19. Chivukula, R.K.; Reznik, Y.A. Fast computing of discrete cosine and sine transforms of types VI and VII. In SPIE 8135, Applications of Digital Image Processing XXXIV; SPIE: Bellingham, WA, USA, 2011; pp. 1–10.
20. Reznik, Y.A. Relationship between DCT-II, DCT-VI, and DST-VII transforms. In Proceedings of the 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013.
21. Masera, M.; Martina, M.; Masera, G. Odd type DCT/DST for video coding: Relationships and low-complexity implementations. In 2017 IEEE International Workshop on Signal Processing Systems (SiPS); IEEE: New York, NY, USA, 2017; pp. 1–6.
22. Bi, G.; Zeng, Y. Transforms and Fast Algorithms for Signal Analysis and Representations, 1st ed.; Birkhäuser: Boston, MA, USA, 2004.
23. Johnson, S.G.; Frigo, M. A modified split-radix FFT with fewer arithmetic operations. IEEE Trans. Signal Process. 2007, 55, 111–119.
24. Stasiński, R. Split multiple radix FFT. In 2022 30th European Signal Processing Conference (EUSIPCO); IEEE: New York, NY, USA, 2022; pp. 2251–2255.
25. Stasiński, R. Fast discrete Fourier transform algorithms requiring less than O(N log N) multiplications. arXiv 2023, arXiv:2303.02647.
26. Winograd, S. On computing the discrete Fourier transform. Math. Comput. 1978, 32, 175–199.
27. Majorkowska-Mech, D.; Cariow, A. Some FFT algorithms for small-length real-valued sequences. Appl. Sci. 2022, 12, 4700.
28. Saxena, A.; Fernandes, F.C.; Reznik, Y.A. Fast transforms for intra-prediction-based image and video coding. In Proceedings of the 2013 Data Compression Conference (DCC), Snowbird, UT, USA, 20–22 March 2013.
29. Cariow, A. Strategies for the synthesis of fast algorithms for the computation of the matrix-vector product. J. Signal Process. Theory Appl. 2014, 3, 1–19.
30. Cariow, A.; Papliński, J. Algorithmic structures for realizing short-length circular convolutions with reduced complexity. Electronics 2021, 10, 2800.
31. Zhang, Z.; Zhao, X.; Li, X.; Li, Z.; Liu, S. Fast adaptive multiple transform for versatile video coding. In 2019 Data Compression Conference (DCC); IEEE: New York, NY, USA, 2019; pp. 63–72.
32. Polyakova, M.; Witenberg, A.; Cariow, A. The design of fast type-V discrete cosine transform algorithms for short-length input sequences. Electronics 2024, 13, 4165.
33. Sidney Burrus, C. Fast Fourier Transforms (Burrus); LibreTexts: Davis, CA, USA, 2025; Available online: https://eng.libretexts.org/Bookshelves/Electrical_Engineering/Signal_Processing_and_Modeling/Fast_Fourier_Transforms_%28Burrus%29/06%3A_Winograd%27s_Short_DFT_Algorithms/6.02%3A_Winograd_Fourier_Transform_Algorithm_%28WFTA%29 (accessed on 2 February 2026).
34. Raciborski, M.; Polyakova, M.; Cariow, A. Fast DCT-VIII algorithms for short-length input sequences. Electronics 2026, 15, 207.
35. Im, S.-K.; Pearmain, A.J. Unequal error protection with the H.264 flexible macroblock ordering. In Visual Communications and Image Processing 2005; SPIE: Beijing, China, 2005; Volume 5960, p. 596032.

Figure 1. The data-flow graph for multiplying the inputs by the matrix $C_{2}^{(c)}$.

Figure 2. The data-flow graph for multiplying the inputs by the matrix $C_{3}^{(b)}$.

Figure 3. The data-flow graph for the 3-point DCT-VI algorithm.

Figure 4. Construction of adjacency matrices based on the data-flow graph.

Figure 5. The data-flow graph of the algorithm for the 3-point DCT-VI.

Figure 6. The data-flow graph of the four-point DCT-VI algorithm.

Figure 7. Data-flow graph of the algorithm for the five-point DCT-VI.

Figure 8. The data-flow graph of the algorithm for the 6-point DCT-VI.

Figure 9. The data-flow graph of the algorithm for the 6-point DCT-VI.

Figure 10. The data-flow graph implementing the 7-point DCT-VI.

Figure 11. The data-flow graph for the computation of the 8-point DCT-VI.

Table 1. The number of multiplications and additions for the matrix–vector product, the proposed and existing fast algorithms for DCT-VI.

N	Matrix–Vector Product		Proposed DCT-VI Algorithms		DFT-Based DCT-VI Algorithms
	Adds.	Mults.	Adds.	Mults.	Adds.	Mults.
3	6	9	6 (0%)	4 (−56%)	13	5
4	12	16	13 (+8%)	7 (−56%)	30	8
5	20	25	16 (−20%)	8 (−68%)	36	10
6	30	36	33 (+10%)	13 (−64%)	84	21
7	42	49	36 (−14%)	11 (−77%)	94	21
8	56	64	38 (−32%)	16 (−75%)	84	15

Table 2. The number of operations for the existing algorithms.

Algorithm	Reference, Year of Publication	N = 4		N = 5		N = 8
		Mults.	Adds.	Mults.	Adds.	Mults.	Adds.
Masera, Martina, Masera	[21], 2017	4 (+4)	13	–	–	16 (+8)	36
Saxena, Fernandes, Reznik	[28], 2013	–	–	3 (+5)	15	–	–
Proposed algorithm	–	7	13	8	16	16	38

Table 3. The number of multiplications and additions for the existing short-length DCT-II and DCT-VIII algorithms.

N	DCT-II		DCT-VIII
	Mults.	Adds.	Mults.	Adds.
3	–	–	4	11
4	4	9	5	11
5	4	13	18	23
6	–	–	18	48
7	18	24	16	34
8	12	29	21	77

Table 4. The number of memory cells for the matrix–vector product and the proposed fast algorithms for DCT-VI.

N	Matrix–Vector Product	Proposed DCT-VI Algorithms
3	6	8 (+33%)
4	10	13 (+30%)
5	11	15 (+36%)
6	18	24 (+33%)
7	17	23 (+35%)
8	18	31 (+72%)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kitsela, V.; Polyakova, M.; Cariow, A. Fast Algorithms for Short-Length Type VI Discrete Cosine Transform. Electronics 2026, 15, 699. https://doi.org/10.3390/electronics15030699

