3.1. The Three-Point DCT-VI Algorithm
We represent the 3-point DCT-VI as follows:
where
,
, and
with
0.6325,
0.7236,
0.2764, and
0.4472.
Here and throughout the paper, cosine-based constants such as 0.6325 are given as numerical approximations.
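The constants listed above can be checked numerically. The following is a minimal sketch, not the authors' code: it assumes the orthonormal DCT-VI form with scale 2/sqrt(2N − 1), a weight 1/sqrt(2) on the first column, and a weight 1/sqrt(2) on the last row. This convention and the helper name `dct6_matrix` are assumptions, inferred from the stated constants and from the remark that the first column and the last row contain entries that are identical up to sign.

```python
import math

def dct6_matrix(n_points):
    """Build an N x N DCT-VI matrix under the ASSUMED convention:
    entry (n, k) = 2/sqrt(2N-1) * eps_k * l_n * cos(pi*(2n+1)*k/(2N-1)),
    with eps_0 = 1/sqrt(2) and l_{N-1} = 1/sqrt(2) (all other weights 1)."""
    m = 2 * n_points - 1
    scale = 2.0 / math.sqrt(m)
    inv_sqrt2 = 1.0 / math.sqrt(2.0)
    rows = []
    for n in range(n_points):
        l_n = inv_sqrt2 if n == n_points - 1 else 1.0
        row = []
        for k in range(n_points):
            eps_k = inv_sqrt2 if k == 0 else 1.0
            angle = math.pi * (2 * n + 1) * k / m
            row.append(scale * eps_k * l_n * math.cos(angle))
        rows.append(row)
    return rows

# Print the 3-point matrix rounded to four decimals.
C3 = dct6_matrix(3)
for row in C3:
    print(["%.4f" % v for v in row])
```

Under this assumption, the magnitudes of the entries of the 3-point matrix are exactly the four constants 0.6325, 0.7236, 0.2764, and 0.4472, and the same helper gives the constants listed for the longer transforms.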
To factorize the matrix
, we first invert the signs of the elements in the second row and then decompose the resulting matrix into two components:
where
and
.
The expression (3) can be represented as
Within the first column and third row of the matrix
, identical entries differ only in sign. This property reduces the number of arithmetic operations in the 3-point DCT-VI without requiring further matrix transformations. In matrix
, the row and column consisting exclusively of zero entries are removed. By applying the template
from [
29], we obtain the following factorization of the resulting matrix
:
Let us denote
,
, and
. We also set
. Then, based on factorization (6), we obtain the data-flow graph for multiplying the inputs by the matrix
, as shown in
Figure 1.
The inputs of this graph are the second and third input samples, and the outputs are the first and second output samples, because the matrix is obtained from the 2nd and 3rd columns and the 1st and 2nd rows of the matrix .
Next, we multiply the matrix
by the input vector
, yielding the resulting vector
. The corresponding data-flow graph is presented in
Figure 2.
By merging the data-flow graphs in
Figure 1 and
Figure 2, and inverting the sign of the second row, we obtain the data-flow graph for the 3-point DCT-VI algorithm, shown in
Figure 3.
However, some of the additions in this algorithm may be redundant. To reduce the number of additions, we construct a factorization of the 3-point DCT-VI matrix corresponding to the data-flow graph in
Figure 3. This process is illustrated in
Figure 4. For each subgraph at a different hierarchical level of the data-flow graph in
Figure 3, we construct the corresponding adjacency matrix, for example, for subgraphs marked by green and blue rectangles.
The adjacency matrix of a graph in this paper is defined as an r × q matrix whose entries belong to {0, 1, −1}, where r and q denote the numbers of output and input vertices, respectively. The (i, j)-th entry of this matrix is equal to 1 if the j-th input vertex is connected to the i-th output vertex by an edge. An edge between the vertices of the graph may also be weighted by −1. A zero entry indicates that the corresponding edge is absent.
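The definition above can be illustrated with a small sketch. The edge list below describes a hypothetical two-input, two-output butterfly, not one of the subgraphs of Figure 4; the function names and the 0-based indexing are assumptions made for the example.

```python
def adjacency_matrix(num_outputs, num_inputs, edges):
    """Build an r x q adjacency matrix with entries in {0, 1, -1}.

    edges is a list of (input_index, output_index, weight) triples,
    0-based, where weight is 1 (plain edge) or -1 (negated edge)."""
    a = [[0] * num_inputs for _ in range(num_outputs)]
    for j, i, w in edges:
        if w not in (1, -1):
            raise ValueError("edge weight must be 1 or -1")
        a[i][j] = w  # row = output vertex, column = input vertex
    return a

def apply_level(a, x):
    """Evaluate one hierarchical level: each output sums its weighted inputs."""
    return [sum(w * v for w, v in zip(row, x)) for row in a]

# Hypothetical butterfly: out0 = in0 + in1, out1 = in0 - in1.
edges = [(0, 0, 1), (1, 0, 1), (0, 1, 1), (1, 1, -1)]
A = adjacency_matrix(2, 2, edges)
print(A)
print(apply_level(A, [5, 3]))
```

Because every nonzero entry is 1 or −1, evaluating a level requires only additions and subtractions.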
The matrix corresponds to the green rectangle, and the matrix corresponds to the blue rectangle.
As a result, the 3-point DCT-VI matrix is factorized as follows:
where
is defined as shown in
Figure 4,
,
, and
.
Based on factorization (7), we identify redundant additions using the adjacency matrices of the subgraphs defined in each hierarchical level of the data-flow graph in
Figure 4. For example, in the matrix
the pairs of entries (1, 1) and (1, 3), as well as (2, 1) and (2, 3), which lie in the same columns, are repeated in the first and second rows up to a sign change. Therefore, the addition of the first and second inputs is repeated and hence redundant. One of these additions can be removed.
To achieve this, we first add the first and second inputs and then multiply the result by the matrix
. By replacing the matrices
and
with the matrix
, we obtain the following factorization of the 3-point DCT-VI matrix:
Based on the DCT-VI matrix factorization in (8), we present the data-flow graph for the 3-point DCT-VI algorithm in
Figure 5. This data-flow graph does not include the redundant additions. By applying the proposed algorithm, the number of multiplications is reduced from 9 to 4, while the number of additions remains unchanged, and a single shift operation is introduced.
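The stated operation counts can be reproduced with a short sketch. The matrix layout used here is an assumption reconstructed from the listed constants and from the description of the first column, the third row, and the 2 × 2 block taken from the 2nd and 3rd columns; the ordering of operations is one possible arrangement and need not coincide with the graph in Figure 5. The single shift arises because 0.7236 + 0.2764 equals 1 exactly, so the corresponding scaling factor is 1/2.

```python
import math

S = 2.0 / math.sqrt(5.0)
A = S / math.sqrt(2.0)             # approx. 0.6325
B = S * math.cos(math.pi / 5)      # approx. 0.7236
C = S * math.cos(2 * math.pi / 5)  # approx. 0.2764
D = S / 2.0                        # approx. 0.4472
G = (B - C) / 2.0                  # approx. 0.2236; (B + C) / 2 is exactly 1/2

def dct6_3_direct(x0, x1, x2):
    """Direct product with the ASSUMED 3-point matrix: 9 multiplications, 6 additions."""
    return (A * x0 + B * x1 + C * x2,
            A * x0 - C * x1 - B * x2,
            D * x0 - A * x1 + A * x2)

def dct6_3_fast(x0, x1, x2):
    """Sketch with 4 multiplications, 6 additions, and 1 shift."""
    s = x1 + x2       # addition 1
    t = x1 - x2       # addition 2
    half_s = 0.5 * s  # the shift (a right shift by one bit in fixed point)
    m0 = A * x0       # multiplication 1
    m1 = D * x0       # multiplication 2
    m2 = A * t        # multiplication 3
    m3 = G * t        # multiplication 4
    u = m0 + m3       # addition 3
    y0 = u + half_s   # addition 4
    y1 = u - half_s   # addition 5
    y2 = m1 - m2      # addition 6
    return (y0, y1, y2)

print(dct6_3_direct(1.0, 2.0, 3.0))
print(dct6_3_fast(1.0, 2.0, 3.0))
```

Both functions return the same values up to rounding, which is consistent with the reduction from 9 to 4 multiplications at an unchanged number of additions.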
3.2. Data-Flow Graph Construction
In
Section 3, we present algorithms for short-length DCT-VI using data-flow graphs. To construct a data-flow graph, the factorization of the short-length DCT-VI matrix into sparse matrices is first obtained. This factorization consists of a diagonal matrix containing scaling factors and several sparse matrices whose elements belong to the set {0, 1, −1}.
The data-flow graph of a short-length DCT-VI algorithm has a hierarchical structure. At the first level, located on the left side of the graph, the vertices corresponding to the algorithm inputs are placed. Then, considering the factorization from right to left, each subsequent hierarchical level is constructed to correspond to the current sparse matrix in the DCT-VI matrix factorization. Each sparse matrix of the DCT-VI serves as the adjacency matrix for the corresponding hierarchical level of the data-flow graph.
An edge drawn with a solid line corresponds to the value 1 in the adjacency matrix, whereas an edge drawn with a dashed line corresponds to the value −1. For example, in
Figure 3, within the green rectangle, two edges extend from the first vertex, corresponding to the (4, 3) and (5, 3) entries of the matrix
. Since the (4, 3) entry equals 1 and the (5, 3) entry equals
−1, the corresponding edges are marked with solid and dashed lines, respectively.
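The construction described in this subsection can be sketched as follows: the factors are evaluated from right to left, one hierarchical level per factor, and the operation counts are read off the factors themselves. The 2 × 2 example uses the identity [[b, c], [c, b]] = H · diag((b + c)/2, (b − c)/2) · H with H = [[1, 1], [1, −1]]; the helper names are hypothetical, and the values of b and c are the rounded constants of the 3-point case.

```python
def mat_vec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in a]

def apply_factorization(factors, x):
    """factors are listed left to right as in the matrix product,
    so they are applied to the input vector from right to left."""
    for f in reversed(factors):
        x = mat_vec(f, x)
    return x

def count_operations(factors):
    """Count additions (nonzeros per row minus one) and multiplications
    (nonzero entries other than 1 and -1) over all levels."""
    adds = mults = 0
    for f in factors:
        for row in f:
            nonzero = [w for w in row if w != 0]
            adds += max(len(nonzero) - 1, 0)
            mults += sum(1 for w in nonzero if abs(w) != 1)
    return adds, mults

b, c = 0.7236, 0.2764                     # rounded constants
H = [[1, 1], [1, -1]]                     # butterfly level, entries in {1, -1}
Dg = [[(b + c) / 2, 0], [0, (b - c) / 2]] # diagonal level with scaling factors
factors = [H, Dg, H]

print(apply_factorization(factors, [3.0, 1.0]))
print(count_operations(factors))
```

This simple counter treats the factor 1/2 as a multiplication; in the algorithms of this section such a factor is implemented as a shift.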
3.3. The 4-Point DCT-VI Algorithm
The four-point DCT-VI is defined as follows:
where
,
, and
with
0.5345,
0.6811,
0.4713,
0.1682, and
0.3780.
Let us consider the idea of developing the 4-point DCT-VI algorithm. First, we derive the factorization of the 4-point DCT-VI matrix; then, the data-flow graph is constructed, and the pseudocode is designed. To factorize the 4-point DCT-VI matrix, we decompose the original matrix into two matrices and factorize each one separately using the 3-point cyclic convolution pattern and a fan-like pattern that adds the same value to different outputs. Finally, the repeated additions are eliminated.
To implement this idea, we change the sign of the third column of the matrix
. The resulting matrix
is decomposed into two submatrices:
where
and
.
In the first column and third row of the matrix , several entries are identical except for their signs. This property allows a reduction in the number of arithmetic operations without requiring additional transforms for this matrix. In the matrix , we remove the row and column consisting entirely of zeros.
The resulting matrix conforms to the cyclic convolution pattern
with parameters
,
, and
. The factorization of this pattern is described in [
30] as follows:
where
,
,
, and
,
.
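For orientation, a 3-point cyclic convolution with a fixed kernel can be computed with four multiplications. The sketch below is the textbook Winograd-style arrangement, shown under the assumption of the conventional circulant layout; it is not necessarily the factorization given in [30], and the sign and ordering conventions of the pattern used in the paper may differ.

```python
def cyclic_conv3_direct(h, x):
    """Direct 3-point cyclic convolution: 9 multiplications."""
    return [sum(h[(i - j) % 3] * x[j] for j in range(3)) for i in range(3)]

def cyclic_conv3_fast(h, x):
    """Winograd-style 3-point cyclic convolution: 4 multiplications
    once the kernel-dependent constants g0..g3 are precomputed."""
    h0, h1, h2 = h
    x0, x1, x2 = x
    # Constants depending only on the kernel (precomputed for a fixed h).
    g0 = (h0 + h1 + h2) / 3.0
    g1 = h0 - h2
    g2 = h1 - h2
    g3 = (g1 + g2) / 3.0
    # Pre-additions on the data.
    a0 = x0 - x2
    a1 = x1 - x2
    s = x0 + x1 + x2
    # The four multiplications.
    m0 = g0 * s
    m1 = g1 * a0
    m2 = g2 * a1
    m3 = g3 * (a0 + a1)
    # Post-additions.
    y0 = m0 + m1 - m3
    y1 = m0 - m1 - m2 + 2.0 * m3
    y2 = m0 + m2 - m3
    return [y0, y1, y2]

print(cyclic_conv3_direct([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
print(cyclic_conv3_fast([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

In a fast DCT-VI algorithm the kernel entries are fixed cosine constants, so only the four data-dependent products count as multiplications.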
From expression (11) we derive the factorization of the four-point DCT-VI matrix as shown below:
where
Figure 6 illustrates the data-flow graph of the four-point DCT-VI algorithm. Compared to the direct matrix–vector product, this approach reduces the number of multiplications from 16 to 7, while increasing the number of additions from 12 to 13.
The obtained data-flow graph has a regular structure; that is, a graph organization in which the same connectivity patterns and operation types are systematically repeated. In the graph shown in
Figure 6, similar modules are presented on the left and right sides of the scaling-factor line. However, some differences also exist between these modules.
3.4. Algorithm for the 5-Point DCT-VI
We express the five-point DCT-VI as a matrix–vector product:
where
The constants are defined as 0.4714, 0.6265, 0.5107, 0.3333, 0.1158, and 0.6667. It should be noted that .
The idea of developing the 5-point DCT-VI algorithm is the same as that used for the 4-point algorithm. However, the matrix with identical entries has a more complex structure. The final reduction in the number of additions is not performed because no repeated additions were found.
Next, we multiply the second column of the matrix
by
and split the resulting matrix
into two submatrices:
where
and
.
Let us multiply the input vector by the matrix
. We obtain the output vector
[
+
,
(
+
+
+
) +
,
+
,
+
,
+
(
+
+
)]. The data-flow subgraph for calculating the entries of this output vector is shown by the green rectangle in
Figure 7. By construction, the adjacency matrix at each hierarchical level of this subgraph is included in the factorization of the
. No final reduction of the number of additions is performed.
Then, the matrix
is factorized taking into account the similar entries:
where
and
,
,
,
, and
.
Next, the matrix
is factorized. From
, we remove the row and column containing only zero entries. The resulting matrix
corresponds to the cyclic convolution template
, where
,
, and
. Using Equation (11), the matrix is expressed as
The first element of the diagonal matrix in expression (16) is approximately zero:
Therefore, Equation (16) can be reformulated as
where
and
.
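A vanishing scaling factor of this kind can be checked numerically. The sketch below assumes that the element in question is a signed sum of three of the stated constants, namely 0.6265 − 0.5107 − 0.1158; whether this is exactly the combination appearing in expression (16) is an assumption. The sum is zero because cos(2π/9) + cos(4π/9) = 2 cos(π/3) cos(π/9) = cos(π/9).

```python
import math

scale = 2.0 / 3.0                        # 2 / sqrt(2N - 1) for N = 5
c1 = scale * math.cos(math.pi / 9)       # approx. 0.6265
c2 = scale * math.cos(2 * math.pi / 9)   # approx. 0.5107
c4 = scale * math.cos(4 * math.pi / 9)   # approx. 0.1158

# Assumed combination; it cancels exactly in exact arithmetic.
residual = c1 - c2 - c4
print(abs(residual) < 1e-12)
```

A zero scaling factor means that the corresponding multiplication, together with the additions feeding it, can be dropped from the data-flow graph.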
Using expressions (14), (15), and (17), the factorization of the five-point DCT-VI matrix is derived:
where
,
,
,
In
Figure 7, we present the data-flow graph of the five-point DCT-VI algorithm, which reduces the number of multiplications from 25 to 8 compared to the direct matrix–vector multiplication. The number of additions is also reduced from 20 to 16 (see
Figure 7).
In this data-flow graph, two modules are explicitly identified. The module in the upper part of the graph (green rectangle) corresponds to the factorization of the matrix with repeated entries. The resulting subgraph has a highly irregular structure due to the sparsity of the matrix . The module in the lower part of the graph (blue rectangle) is related to the cyclic convolution factorization of the matrix and is also irregular.
3.5. Algorithm for 6-Point DCT-VI
Let us derive an algorithm for the six-point DCT-VI by expressing this transform as
where
,
, and
is the DCT-VI transform matrix defined as
with
0.4264,
0.5786,
0.5073,
0.3949,
0.2505,
0.0858, and
0.3015.
The idea of developing the 6-point DCT-VI algorithm is the same as that used for the 4-point DCT-VI algorithm. However, in this case, we first apply a permutation of the matrix columns and rows so that both the 5-point cyclic convolution pattern and the fan-like pattern, where the same value is added to multiple outputs, can be utilized. Finally, repeated additions are eliminated by replacing the fan-like structure with an adder tree. This technique is discussed in detail in this subsection.
First, we introduce the permutations
The order of the columns and rows of the matrix
is changed using
and
, respectively. In addition, the signs of the second and third columns of the permuted matrix are inverted. The corresponding transformation matrices are
After applying these operations, we obtain the matrix
which is decomposed into the sum of two submatrices:
where
and .
Next, we remove the zero rows and columns from the matrix
. The resulting matrix is given by
which is the circular convolution matrix for
N = 5:
Here , , , , and .
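The structure of a circular convolution matrix can be sketched as follows. The example assumes the conventional circulant layout, in which each row is a cyclic shift of the previous one; the exact ordering and signs of the entries in the matrix above may differ, and the kernel values are placeholders rather than the constants of the 6-point transform.

```python
def circulant(h):
    """N x N circular convolution matrix with entry (i, j) = h[(i - j) mod N]."""
    n = len(h)
    return [[h[(i - j) % n] for j in range(n)] for i in range(n)]

def mat_vec(a, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in a]

def cyclic_convolution(h, x):
    """Cyclic convolution computed directly from its definition."""
    n = len(h)
    return [sum(h[k] * x[(i - k) % n] for k in range(n)) for i in range(n)]

h = [1, 2, 3, 4, 5]   # placeholder kernel for N = 5
x = [2, 0, -1, 3, 1]  # placeholder input
H5 = circulant(h)
for row in H5:
    print(row)
print(mat_vec(H5, x) == cyclic_convolution(h, x))
```

The matrix contains only five distinct values, which is what allows a fast cyclic convolution algorithm to replace the 25 multiplications of the direct product.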
For this matrix, the following factorization is obtained based on [
30]:
where
,
,
,
,
,
,
,
=
, and
=
.
To factorize the initial 6-point DCT-VI matrix, we add to the factorization (22) matrices that take into account the entries of the matrix
:
Then, the factorization (22) is transformed into the following factorization of the initial 6-point DCT-VI matrix:
where
The data-flow graph corresponding to the factorization (23) is shown in
Figure 8. In this data-flow graph, the additions in the fan-like structures marked by the green and blue rectangles are repeated and, therefore, redundant. We remove the left fan-like structure in the green rectangle and the right fan-like structure in the blue rectangle. After this, the final factorization is obtained.
Based on factorization (22), the matrices
,
, and
are introduced. Subsequently, the factorization of the six-point DCT-VI matrix can be expressed as follows:
where
Figure 9 illustrates the six-point DCT-VI algorithm. Compared to the direct matrix–vector implementation, the number of multiplications is reduced from 36 to 13, while the number of additions increases from 30 to 33.
The data-flow graph presented in
Figure 9 includes two permutation modules that rearrange the order of data samples without changing their values. These modules are placed near the input and output vertices of the graph. Next, the repeated computational modules are presented. In particular, the subgraphs consisting of five butterfly modules are located at levels neighboring the permutation modules. In addition, repeated computational modules appear on both the left and right sides of the scaling-factor line. With a few exceptions, the graph can be divided into stages with symmetrical topology relative to the scaling-factor line.
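The role of the permutation modules can be sketched as follows. If a matrix is permuted and sign-adjusted before factorization, the original product is recovered by applying the inverse reordering around the fast core. The permutations, the negated column, and the test matrix below are hypothetical and are not those used for the 6-point transform.

```python
def permutation_matrix(perm):
    """Row i has a single 1 in column perm[i], so (P x)[i] = x[perm[i]]."""
    n = len(perm)
    return [[1 if j == perm[i] else 0 for j in range(n)] for i in range(n)]

def sign_matrix(n, negated):
    """Diagonal matrix with -1 at the listed indices and 1 elsewhere."""
    return [[(-1 if i in negated else 1) if i == j else 0 for j in range(n)]
            for i in range(n)]

def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Hypothetical data: any square matrix and any permutations work.
C = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]
P_rows = permutation_matrix([2, 0, 1])
P_cols = permutation_matrix([1, 2, 0])
S = sign_matrix(3, {1})

# Permuted, sign-adjusted matrix that would be factorized.
C_perm = mat_mul(mat_mul(mat_mul(P_rows, C), P_cols), S)

# Recover the original: inverse of a permutation is its transpose, S is its own inverse.
C_back = mat_mul(mat_mul(mat_mul(transpose(P_rows), C_perm), S), transpose(P_cols))
print(C_back == C)
```

Permutation matrices contain a single 1 per row and column, so they only reorder samples, and the sign matrix only negates them; neither adds multiplications or additions to the algorithm.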
3.6. Algorithm for Seven-Point DCT-VI
Let us construct the algorithm for the 7-point DCT-VI. This transform can be represented as follows:
where
,
, and
with
0.3922,
0.5386,
0.4912,
0.4152,
0.3151,
0.1967,
0.0669,
.
The development of the 7-point DCT-VI algorithm follows the same approach as that used for the 6-point DCT-VI algorithm. Specifically, we define the permutations,
and apply
to reorder the rows of the matrix
and
to reorder its columns. In addition, the signs of the third, fourth, and seventh columns are changed. The corresponding permutation matrices are as follows:
Then, the resulting matrix
is decomposed into the sum of two matrices:
where
Next, we remove the rows and columns containing only zero entries from
to obtain the matrix
The matrix
is a circular convolution matrix, which can be presented as [
30]
where
,
,
,
,
, and
.
Using the factorization of the matrix
from [
30], the following expression is obtained:
where
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
.
To reduce the number of arithmetic operations for the 7-point DCT-VI, we also consider the entries in the first column and the last row of the matrix
, which differ only in a sign. Exploiting this property leads to the following factorization of the matrix
for the 7-point DCT-VI:
where
,
,
,
,
,
,
,
,
,
,
.
Consequently, a fast algorithm for the 7-point DCT-VI has been developed, as illustrated in
Figure 10 with a data-flow graph. This algorithm reduces the number of multiplications from 49 to 11, while the number of additions is slightly reduced from 42 to 36.
The data-flow graph shown in
Figure 10 includes permutation modules at the input and output that perform index reordering of data samples, enabling regular computational structures in fast transform implementations without introducing additional arithmetic operations. As a result, the graph also contains repeated modules on both the left and right sides of the scaling-factor line, in particular butterfly modules near the input and output permutations. To add the same value to all outputs, a fan-like structure is included before the output permutation.
3.7. Algorithm for Eight-Point DCT-VI
To design the algorithm for the eight-point DCT-VI, the transform can be expressed as follows:
where
,
, and
with
0.3651,
0.5051,
0.4718,
0.4178,
0.3455,
0.2582,
0.1596,
0.0540,
0.5164.
To design the 8-point DCT-VI algorithm, we first permute the rows and columns of the initial DCT-VI matrix and invert the signs of certain rows or columns. As a result, we obtain a matrix in which submatrices correspond to patterns identified in [
29,
30]. These patterns are then extracted, and their factorizations, as presented in [
29,
30], are applied. Finally, the factorizations and data-flow graphs of the individual submatrices are merged to form the factorization and data-flow graph of the original 8-point DCT-VI matrix.
To implement this approach, we reorder the columns and rows of the matrix
using the permutations
.
Then, the signs of the sixth and seventh columns of
are inverted. The corresponding permutation matrices
and
are expressed as follows:
Next, the resulting matrix
is expressed as the sum of two matrices:
where
The submatrix
of the matrix
is a circular convolution matrix [
30] for
N = 4, which can be represented as
with entries
,
,
, and
.
Using the entries of
, we define the vector
of scaling factors to factorize the circular convolution matrix
:
where
and
.
Then, the matrix
is factorized as follows:
where
with
The matrices
and
are constructed as
and
, respectively, and
is
Further, the submatrix
of the matrix
is factorized by decomposing its 2 × 2 submatrices. It can be observed that the submatrix
exhibits structural similarity to the template
. The submatrix
is similar to the template
. Then, the submatrices
and
are decomposed as
where
. Using expressions (32) and (33), the factorization of the matrix
is obtained:
where
As a result, the matrix of the eight-point DCT-VI is factorized as
where
Figure 11 shows the data-flow graph of the 8-point DCT-VI algorithm based on factorization (35), which reduces the number of multiplications from 64 to 16 and the number of additions from 56 to 38.
The data-flow graph shown in
Figure 11 includes permutation modules at the input and output that perform index reordering of data samples, enabling regular computational structures in fast transform implementations without introducing additional arithmetic operations. As a result, the graph contains butterfly modules on both the left and right sides of the scaling-factor line. The repeated entries allow efficient computation using adder trees and a circular convolution structure module, as in the 4- to 7-point DCT-VI algorithms.
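The operation counts reported in this section can be collected in one place. The sketch below uses only the counts stated in the text; the number of additions for N = 3 is taken as 6 on the assumption that "unchanged" refers to the N(N − 1) additions of the direct product, and the helper name `direct_counts` is hypothetical.

```python
def direct_counts(n):
    """Multiplications and additions of a direct N x N matrix-vector product."""
    return n * n, n * (n - 1)

# (multiplications, additions) of the proposed algorithms, as stated in the text.
# The additions for N = 3 are ASSUMED equal to the direct count of 6.
proposed = {3: (4, 6), 4: (7, 13), 5: (8, 16), 6: (13, 33), 7: (11, 36), 8: (16, 38)}

for n in sorted(proposed):
    mults, adds = proposed[n]
    direct_mults, direct_adds = direct_counts(n)
    saved = 100.0 * (direct_mults - mults) / direct_mults
    print("N=%d: mult %d -> %d (%.0f%% fewer), add %d -> %d"
          % (n, direct_mults, mults, saved, direct_adds, adds))
```

The direct counts computed here agree with the baselines quoted for each transform length (for example, 64 multiplications and 56 additions for N = 8).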