Article

The Fast Discrete Tchebichef Transform Algorithms for Short-Length Input Sequences

by Aleksandr Cariow 1,* and Marina Polyakova 2,*
1 Faculty of Computer Science and Information Technology, West Pomeranian University of Technology in Szczecin, Zołnierska 49, 71-210 Szczecin, Poland
2 Institute of Computer Systems, Odesa Polytechnic National University, Shevchenko 1, 65044 Odesa, Ukraine
* Authors to whom correspondence should be addressed.
Signals 2025, 6(2), 23; https://doi.org/10.3390/signals6020023
Submission received: 12 March 2025 / Revised: 8 April 2025 / Accepted: 29 April 2025 / Published: 9 May 2025

Abstract: In this article, fast algorithms for the discrete Tchebichef transform (DTT) are proposed for input sequences of lengths from 3 to 8. At present, the DTT is widely applied in signal processing, image compression, and video coding. A review of the articles related to fast DTT algorithms shows that such algorithms have mainly been developed for input signal lengths 4 and 8. However, several problems exist for which signal and image processing with different apertures is required. To address this shortcoming, the structural approach and sparse matrix factorization are applied in this paper to develop fast real DTT algorithms for short-length input signals. According to the structural approach, the rows and columns of the transform matrix are rearranged, possibly with sign changes of some rows or columns. Next, matched submatrix templates are extracted from the matrix structure and decomposed into a matrix product to construct the factorization of the initial matrix. Sparse matrix factorization assumes that a butterfly architecture can be extracted from the transform matrix. Combining the structural approach with sparse matrix factorization, we obtained matrix representations with reduced computational complexity. Based on the obtained matrix representations, fast algorithms were developed for the real DTT via data flow graphs. Fast algorithms for the integer DTT can easily be obtained using the constructed data flow graphs. To confirm the correctness of the designed algorithms, the MATLAB R2023b software was applied. The constructed factorizations of the real DTT matrices reduce the number of multiplication operations by 78% on average compared to the direct matrix-vector product at signal lengths from 3 to 8. The number of additions is decreased by 5% on average within the same signal length range.

1. Introduction

In recent years, the discrete Tchebichef transform (DTT) has become widely applied in image compression [1,2,3,4,5,6,7,8], video coding [8,9,10], signal processing [11], image segmentation [12], image denoising [13], speech recognition [14,15], etc. The DTT is a linear orthonormal transform obtained from orthogonal Tchebichef polynomials. Compared with the discrete cosine transform (DCT), the DTT demonstrates similar energy compactness and decorrelation for images of natural scenes. Furthermore, the DTT outperforms the DCT on images with significant illumination variations. Owing to its energy compactness and decorrelation, the DTT is extensively used and highly effective in image compression [1,2,3,4,5,6,7,8,16,17]. For example, in [10], the proposed DTT approximation was embedded in the software library x264 [18] for encoding video streams into the H.264/AVC standard [19]. It requires 38% fewer additions and eliminates bit-shifting operations compared to the 8-point integer approximation of the DCT currently used in H.264/AVC [20,21]; the latter requires 32 additions and 14 bit-shifting operations.
The signal processing in the DTT domain is applied in neural signal compression when the neural recording microsystems are designed with specific requirements for the power consumption, chip area, and the wireless interfacing data rate [22].
In [23], DTT was embedded in the wavelet-based still image coder SPECK. The resulting coder HLDTT has shown a significant improvement in peak signal-to-noise ratio at lower bit rates. Also, a reduction in encoding and decoding time over DCT-based embedded coders was achieved to realize a low-power image compression system for hand-held portable devices [7].
Other areas of DTT application are also of interest. In [24], a digital watermarking algorithm using DTT was proposed as more robust to Gaussian noise attacks. To add a watermark, the initial image is split into nonoverlapped blocks, and DTT is performed for each of them. A watermark image was embedded into the largest DTT coefficients of each block, resulting in time savings compared to DCT [24].
In various applications, reducing the computational time for DTT is essential for achieving real-time or high-performance implementations. The literature describes two primary types of DTT based on their applications: real DTT and integer DTT. The original real DTT matrix can be derived by multiplying each row of the integer DTT matrix by a specific coefficient. Consequently, fast algorithms for real DTT can be efficiently developed based on the fast algorithms designed for integer DTT [17].
Because the main applications of the DTT are image compression and video coding, the development of fast DTT algorithms is oriented towards the standards of these applications. As a consequence, fast algorithms were constructed for the 4 × 4 and 8 × 8 two-dimensional (2D) integer DTT and its approximations [1,8,10,25,26,27,28,29,30,31,32,33]. The authors are not aware of any articles devoted to designing fast DTT algorithms for input sequences of lengths other than 4 and 8. However, along with image compression, the DTT is utilized for various tasks such as signal and image denoising, segmentation, and classification, where the result is often affected by the size of the processing window. In these situations, restricting the window size to 4 or 8 is unjustified. Moreover, short-length DTT algorithms can be included as typical modules when developing long-length algorithms [34,35].
Therefore, an urgent problem is constructing fast DTT algorithms for short-length input sequences. Next, to determine the appropriate strategy for developing such fast DTT algorithms, the articles related to this problem are considered.

1.1. State of the Art of the Problem

To design fast DTT algorithms, the polynomial algebraic approach, sparse matrix factorization, and different heuristics are applied [22,25,26,27,28,29,30,31,32,33]. Recursive expressions for obtaining the entries for DTT matrices do not facilitate the development of fast radix-type algorithms, although such algorithms are efficiently and widely employed for orthogonal trigonometric transforms [36]. Thus, in [30], the polynomial algebraic approach is used to construct the fast algorithm for the 2D 4 × 4 real DTT. The expressions for transform matrix entries were regrouped to determine the products of Tchebichef polynomials via the linear combinations of new basis functions. These results reduce multiplications by 50% and additions by 31% compared to the naive product bearing in mind the separability and symmetry of the DTT matrix. However, very cumbersome mathematical calculations of the algebraic approach are an obstacle to obtaining fast algorithms for DTTs with other input sequence lengths.
In [1], multiplier-less algorithms were proposed for one-dimensional (1D) and 2D integer 8 × 8 DTTs. These algorithms were obtained heuristically, although they used butterfly architecture. The sparse matrix factorization or data flow graphs are not presented for them. Despite this, these algorithms are considered to be among the most effective. In the 1D case, the number of additions was reduced by 50% compared to a naive method when considering the separability and symmetry properties of the DTT. In addition, 29 shifts were required in the 1D case. However, the above heuristics were proposed to construct an algorithm specifically for the case of an 8 × 8 integer DTT. Generalization of it to cases of other lengths of data sequence was not considered.
The primary way to obtain fast DTT algorithms is sparse matrix factorization and/or construction of data flow graphs with butterfly architecture [28,29,30,31,32,33]. For example, in [28,29], sparse matrix factorization was proposed for a 4 × 4 real 2D DTT. Although an additional 20 shifts were required, it reduced multiplications by 81% and additions by 17% compared with the naive product bearing in mind the separability and symmetry of the DTT matrix. Proposed in [10], a fast algorithm with butterfly architecture for a 1D DTT approximation required only 20 additions, reducing operations by 55% compared to [1]. In [8], fast algorithms for 4 × 4 and 8 × 8 integer 1D DTTs were presented for data flow graphs based on butterfly architecture. The algorithms for 4 × 4/8 × 8 2D DTTs were also implemented using Verilog HDL [8].
Hence, efficient fast DTT algorithms are developed via sparse matrix factorization or data flow graphs with butterfly architecture. In general, such an approach is appropriate for designing fast DTT algorithms for even lengths of input sequences, including lengths different from 4 and 8. However, when dealing with odd lengths, problems arise in constructing butterflies, and thus the choice of an approach for constructing fast DTT algorithms for odd lengths remains open. Thus, the above short review of existing fast DTT algorithms allowed us to determine unsolved parts of the general problem of developing fast DTT algorithms for short-length input sequences.

1.2. The Main Contributions of This Paper

As previously mentioned, fast N-point DTT algorithms for short-length signals can be developed via data flow graphs with a butterfly architecture and sparse matrix factorization. However, for some values of N, constructing such architecture can be challenging. To address this shortcoming, we propose using the structural approach in addition to sparse matrix factorization to develop fast real DTT algorithms for short-length input signals. This approach was successfully applied to factorize the matrices of trigonometric transforms in [37,38,39,40]. As a result, the number of arithmetic operations for these transforms was significantly reduced. According to the structural approach, the rows and columns of the transform matrix are rearranged, possibly changing the sign of some rows or columns. Then, the predefined templates are searched and extracted from the transform matrix. Based on the predetermined factorization of templates, the factorization of the entire transform matrix is obtained.
The structural approach to DTT matrix decomposition is motivated by the symmetrical structure of the DTT coefficient matrix. However, as N increases, template extraction becomes a cumbersome process. That is why it is relevant to integrate the structural approach into the construction of data flow graphs with butterfly architecture. The efficiency of this combination stems from the symmetry of the DTT coefficients, which can be paired to construct butterfly modules. Thus, this research aims to reduce the computational complexity of the 1D real DTT by developing fast algorithms. These algorithms combine sparse matrix factorization with the structural approach for input sequences of length N = 3, 4, 5, 6, 7, 8.
The paper is organized as follows. The problem of reducing the computational complexity of DTT and the aim of the research are introduced in Section 1. Notations and a mathematical background are presented in Section 2. Next, in Section 3, fast algorithms for a real DTT are designed for N in the range from 3 to 8. A discussion of the research results is presented in Section 4 and Section 5. Also, the computational costs required to implement the developed and existing algorithms are compared. In Section 6 the conclusions are provided.

2. Short Background

The real 1D DTT can be expressed as follows [1,17,22,23,24,25,26,27,28,29,30,31,32,33]:
$$y_k = \sum_{n=0}^{N-1} x_n\, t_k(n), \quad k = 0, 1, \ldots, N-1,$$
where $y_k$ is the output sequence after the direct DTT; $t_k(n)$ is the kernel of the DTT; $x_n$ is the input data sequence; and $N$ is the number of signal samples. The kernel of the DTT can be obtained as
$$t_k(n) = (a_1 n + a_2)\, t_{k-1}(n) + a_3\, t_{k-2}(n), \quad k = 2, \ldots, N-1, \quad n = 0, 1, \ldots, N-1,$$
$$t_0(n) = \frac{1}{\sqrt{N}}, \qquad t_1(n) = (2n + 1 - N)\sqrt{\frac{3}{N(N^2 - 1)}},$$
$$a_1 = \frac{2}{k}\sqrt{\frac{4k^2 - 1}{N^2 - k^2}}, \qquad a_2 = \frac{1 - N}{k}\sqrt{\frac{4k^2 - 1}{N^2 - k^2}}, \qquad a_3 = \frac{1 - k}{k}\sqrt{\frac{2k + 1}{2k - 3}}\sqrt{\frac{N^2 - (k - 1)^2}{N^2 - k^2}}.$$
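As a quick numerical check, the recurrence above can be implemented directly. The sketch below is an illustration (the function name `tchebichef_matrix` is ours, and the sign conventions of $t_1$ and $a_3$ are assumptions reconstructed from the orthonormality of the transform):

```python
import math

def tchebichef_matrix(N):
    """Orthonormal DTT matrix T_N built row by row from the three-term recurrence."""
    T = [[0.0] * N for _ in range(N)]
    for n in range(N):
        T[0][n] = 1.0 / math.sqrt(N)
        T[1][n] = (2 * n + 1 - N) * math.sqrt(3.0 / (N * (N * N - 1)))
    for k in range(2, N):
        a1 = (2.0 / k) * math.sqrt((4 * k * k - 1) / (N * N - k * k))
        a2 = ((1.0 - N) / k) * math.sqrt((4 * k * k - 1) / (N * N - k * k))
        a3 = ((1.0 - k) / k) * math.sqrt((2 * k + 1) / (2 * k - 3)) \
             * math.sqrt((N * N - (k - 1) ** 2) / (N * N - k * k))
        for n in range(N):
            T[k][n] = (a1 * n + a2) * T[k - 1][n] + a3 * T[k - 2][n]
    return T

# Orthonormality check: T_N T_N^T should be the identity for every N in 3..8.
for N in range(3, 9):
    T = tchebichef_matrix(N)
    for i in range(N):
        for j in range(N):
            dot = sum(T[i][n] * T[j][n] for n in range(N))
            assert abs(dot - (1.0 if i == j else 0.0)) < 1e-9
```

The assertions confirm that the rows generated by the recurrence are mutually orthogonal and of unit norm, which is the defining property used throughout the paper.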
The DTT satisfies the separability and even symmetry properties. Specifically, $t_k(N - 1 - n) = (-1)^k t_k(n)$, and the 2D DTT can be computed via the 1D DTT by applying a row-column decomposition. In matrix notation, the real 1D DTT is defined as follows [1,17,23,24,32]:
$$Y_{N\times 1} = T_N X_{N\times 1},$$
where
$$T_N = \begin{bmatrix} t_0(0) & t_0(1) & \cdots & t_0(N-1)\\ t_1(0) & t_1(1) & \cdots & t_1(N-1)\\ \vdots & \vdots & \ddots & \vdots\\ t_{N-1}(0) & t_{N-1}(1) & \cdots & t_{N-1}(N-1) \end{bmatrix}, \quad Y_{N\times 1} = \begin{bmatrix} y_0\\ y_1\\ \vdots\\ y_{N-1} \end{bmatrix}, \quad X_{N\times 1} = \begin{bmatrix} x_0\\ x_1\\ \vdots\\ x_{N-1} \end{bmatrix}.$$
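The row-column decomposition mentioned above can be sketched as follows for $N = 3$ (the function name `dtt2d` is ours, and the sign conventions of the 3-point matrix are the assumptions adopted in Section 3.1):

```python
import math

# Orthonormal 3-point DTT matrix T = diag(s) C3 (sign conventions assumed).
s = (1 / math.sqrt(3), 1 / math.sqrt(2), 1 / math.sqrt(6))
C = ((1, 1, 1), (-1, 0, 1), (1, -2, 1))
T = [[s[i] * C[i][j] for j in range(3)] for i in range(3)]

def dtt2d(X):
    """2D DTT via the row-column decomposition: Y = T X T^T."""
    N = 3
    # transform the rows of X
    R = [[sum(X[i][n] * T[k][n] for n in range(N)) for k in range(N)] for i in range(N)]
    # then transform the columns of the intermediate result
    return [[sum(T[k][i] * R[i][m] for i in range(N)) for m in range(N)] for k in range(N)]

# A constant image concentrates all its energy in the DC coefficient Y[0][0].
X = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
Y = dtt2d(X)
assert abs(Y[0][0] - 3.0) < 1e-12
```

Separability is what allows every 2D algorithm in the cited works to be built from the 1D algorithms developed below.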
The factorization of the matrix $T_N$ into integer and floating-point factors can be obtained as
$$T_N = \mathrm{diag}(s_{N\times 1})\, C_N, \qquad C_N = (\mathrm{diag}(s_{N\times 1}))^{-1} T_N, \qquad (4)$$
where $C_N$ is the integer 1D DTT matrix, $s_{N\times 1} = [s_0(N),\ s_1(N),\ \ldots,\ s_{N-1}(N)]^T$,
$$s_k(N) = \frac{p_k}{\sqrt{d(k, N)}}, \quad k = 0, 1, \ldots, N-1. \qquad (5)$$
The numbers $p_k$ and $d(k, N)$ are defined as [17]:
$$p_k = \begin{cases} 1, & \text{if } k = 0, 1;\\ k!\,(k+1), & \text{if } k \text{ is even};\\ k!\,(N-k), & \text{if } k \text{ is odd}; \end{cases} \qquad d(k, N) = \frac{N(N^2 - 1)(N^2 - 2^2)\cdots(N^2 - k^2)}{2k + 1}.$$
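Expressions (5) and the definitions above can be sketched directly (the function names `p`, `d`, and `s` are ours). Note that, as stated in the sections below, some lengths require extra correction factors, for example $s_1(3) = 2p_1/\sqrt{d(1, 3)}$; the uncorrected values are therefore checked here only for N = 4, where they apply verbatim:

```python
import math

def p(k, N):
    # p_k: 1 for k = 0, 1; k!(k+1) for even k; k!(N-k) for odd k
    if k in (0, 1):
        return 1
    return math.factorial(k) * ((k + 1) if k % 2 == 0 else (N - k))

def d(k, N):
    # d(k, N) = N (N^2 - 1)(N^2 - 2^2) ... (N^2 - k^2) / (2k + 1)
    prod = N
    for i in range(1, k + 1):
        prod *= N * N - i * i
    return prod / (2 * k + 1)

def s(k, N):
    return p(k, N) / math.sqrt(d(k, N))

# For N = 4 the uncorrected formula reproduces the scaling vector of Section 3.2.
expected = [0.5, 1 / math.sqrt(20), 0.5, 1 / math.sqrt(20)]
assert all(abs(s(k, 4) - e) < 1e-12 for k, e in zip(range(4), expected))
```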
In this paper, we use the following notations: $I_N$ is the order-$N$ identity matrix; $H_2$ is the $2 \times 2$ Hadamard matrix; $1_{N\times M}$ is an $N \times M$ matrix of ones (a matrix in which every element is equal to one); $\otimes$ denotes the Kronecker product of two matrices; $\oplus$ denotes the direct sum of two matrices. An empty cell in a matrix means that it contains zero. The multipliers are denoted by $s_k(N)$.

3. Reduced Complexity DTT Algorithms for Short-Length Input Sequences

3.1. Algorithm for 3-Point DTT

To design the algorithm for the 3-point real DTT, we express the latter as a matrix-vector product:
$$Y_{3\times 1} = \mathrm{diag}(s_{3\times 1})\, C_3 X_{3\times 1},$$
where $Y_{3\times 1} = [y_0, y_1, y_2]^T$, $X_{3\times 1} = [x_0, x_1, x_2]^T$,
$$C_3 = \begin{bmatrix} 1 & 1 & 1\\ -1 & 0 & 1\\ 1 & -2 & 1 \end{bmatrix},$$
and $s_{3\times 1} = [s_0(3), s_1(3), s_2(3)]^T$. The vector $s_{3\times 1}$ is computed by expression (5), where $s_1(3) = \frac{2p_1}{\sqrt{d(1, N)}}$ and $s_2(3) = \frac{p_2}{N\sqrt{d(2, N)}}$. As a result, we obtain $s_{3\times 1} = \left[\frac{1}{\sqrt{3}},\ \frac{1}{\sqrt{2}},\ \frac{1}{\sqrt{6}}\right]^T$, or $s_{3\times 1} \approx [0.5774, 0.7071, 0.4082]^T$.
We developed the 3-point real DTT algorithm via the construction of the data flow graph shown in Figure 1. Applying the 3-point DTT algorithm reduces the number of multiplications from 8 to 3, while the number of additions remains unchanged.
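A minimal sketch of the resulting 3-point computation, assuming the sign conventions of $C_3$ given above (the function name `dtt3` is ours; only the three multiplications by $s_k(3)$ remain, and the factor 2 is a shift):

```python
import math

S3 = (1 / math.sqrt(3), 1 / math.sqrt(2), 1 / math.sqrt(6))

def dtt3(x):
    # y = diag(s) C3 x with C3 = [[1,1,1], [-1,0,1], [1,-2,1]]
    t = x[0] + x[2]                 # shared partial sum
    y0 = S3[0] * (t + x[1])
    y1 = S3[1] * (x[2] - x[0])
    y2 = S3[2] * (t - 2 * x[1])     # the factor 2 is a shift for integer data
    return (y0, y1, y2)

# check against the direct matrix-vector product
C3 = ((1, 1, 1), (-1, 0, 1), (1, -2, 1))
x = (0.3, -1.2, 2.5)
direct = [S3[i] * sum(C3[i][n] * x[n] for n in range(3)) for i in range(3)]
assert all(abs(a - b) < 1e-12 for a, b in zip(dtt3(x), direct))
```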

3.2. Algorithm for 4-Point DTT

Let us construct the algorithm for the 4-point real-valued DTT, which is expressed as
$$Y_{4\times 1} = \mathrm{diag}(s_{4\times 1})\, C_4 X_{4\times 1},$$
where $Y_{4\times 1} = [y_0, y_1, y_2, y_3]^T$, $X_{4\times 1} = [x_0, x_1, x_2, x_3]^T$,
$$C_4 = \begin{bmatrix} 1 & 1 & 1 & 1\\ -3 & -1 & 1 & 3\\ 1 & -1 & -1 & 1\\ -1 & 3 & -3 & 1 \end{bmatrix}.$$
The vector $s_{4\times 1} = [s_0(4), s_1(4), s_2(4), s_3(4)]^T$ was computed using expression (5) as $s_{4\times 1} = \left[\frac{1}{2},\ \frac{1}{\sqrt{20}},\ \frac{1}{2},\ \frac{1}{\sqrt{20}}\right]^T$, or $s_{4\times 1} \approx [0.5000, 0.2236, 0.5000, 0.2236]^T$.
We use the structural approach [37,38,39,40] to design the algorithm for the 4-point real DTT. Let us define the permutations $\pi_1 = \begin{pmatrix} 1 & 2 & 3 & 4\\ 1 & 3 & 2 & 4 \end{pmatrix}$ and $\pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4\\ 1 & 2 & 4 & 3 \end{pmatrix}$ for altering the order of the rows and columns of matrix $C_4$, respectively. The obtained matrix $C_4^{(a)}$ matches the matrix pattern $\begin{bmatrix} A_2 & A_2\\ B_2 & -B_2 \end{bmatrix}$, where $A_2 = H_2$ and $B_2 = \begin{bmatrix} -3 & -1\\ -1 & 3 \end{bmatrix}$. Therefore, $C_4^{(a)} = (A_2 \oplus B_2)(H_2 \otimes I_2)$. The matrix $B_2$ can be decomposed as [37,38]:
$$B_2 = T_{2\times 3}^{(4)}\, \mathrm{diag}(-2, 4, -1)\, T_{3\times 2}^{(3)}, \qquad (9)$$
where
$$T_{3\times 2}^{(3)} = \begin{bmatrix} 1 & 0\\ 0 & 1\\ 1 & 1 \end{bmatrix}, \qquad T_{2\times 3}^{(4)} = \begin{bmatrix} 1 & 0 & 1\\ 0 & 1 & 1 \end{bmatrix}.$$
We give the matrix factorization for the 4-point DTT algorithm as follows:
$$Y_{4\times 1} = \mathrm{diag}(s_{4\times 1})\, P_4^{(1)} W_{4\times 5} D_5 W_{5\times 4} W_4 P_4^{(0)} X_{4\times 1},$$
where
$$D_5 = \mathrm{diag}(1, 1, -2, 4, -1); \quad W_{4\times 5} = I_2 \oplus T_{2\times 3}^{(4)}; \quad W_{5\times 4} = H_2 \oplus T_{3\times 2}^{(3)}; \quad W_4 = H_2 \otimes I_2;$$
$$P_4^{(1)} = \begin{bmatrix} 1 & & & \\ & & 1 & \\ & 1 & & \\ & & & 1 \end{bmatrix}, \qquad P_4^{(0)} = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & & 1\\ & & 1 & \end{bmatrix}.$$
Figure 2 shows a data flow graph of the proposed four-point DTT algorithm. Note that for N = 4, the initial 4-point real DTT requires 8 real-valued multiplications, 8 shifts, and 12 additions. If we use the proposed four-point DTT algorithm, then the number of multiplications, additions, and shifts may be reduced from 8 to 2, from 12 to 9, and from 8 to 5, respectively.
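Under the factorization above, with the signs of $D_5$ reconstructed as $\mathrm{diag}(1, 1, -2, 4, -1)$, the 4-point transform can be sketched as follows (the function name `dtt4` is ours). The two remaining multiplications are by $s_1(4) = s_3(4)$, while $s_0(4) = s_2(4) = 1/2$ and the factors of 3 reduce to shifts and additions on integer data:

```python
import math

S4 = (0.5, 1 / math.sqrt(20), 0.5, 1 / math.sqrt(20))

def dtt4(x):
    # input permutation and butterflies (H2 kron I2)
    v0, v1 = x[0] + x[3], x[1] + x[2]
    v2, v3 = x[0] - x[3], x[1] - x[2]
    # even part: one more butterfly; odd part: factors of 3 (shift plus add)
    y0 = S4[0] * (v0 + v1)
    y2 = S4[2] * (v0 - v1)
    y1 = S4[1] * (-3 * v2 - v3)
    y3 = S4[3] * (3 * v3 - v2)
    return (y0, y1, y2, y3)

# check against the direct product with C4
C4 = ((1, 1, 1, 1), (-3, -1, 1, 3), (1, -1, -1, 1), (-1, 3, -3, 1))
x = (0.7, -1.1, 2.0, 0.4)
direct = [S4[i] * sum(C4[i][n] * x[n] for n in range(4)) for i in range(4)]
assert all(abs(a - b) < 1e-12 for a, b in zip(dtt4(x), direct))
```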

3.3. Algorithm for 5-Point DTT

To develop the algorithm for the 5-point 1D DTT, we express this transform as follows:
$$Y_{5\times 1} = \mathrm{diag}(s_{5\times 1})\, C_5 X_{5\times 1},$$
where
$$C_5 = \begin{bmatrix} 1 & 1 & 1 & 1 & 1\\ -2 & -1 & 0 & 1 & 2\\ 2 & -1 & -2 & -1 & 2\\ -1 & 2 & 0 & -2 & 1\\ 1 & -4 & 6 & -4 & 1 \end{bmatrix}, \quad Y_{5\times 1} = [y_0, y_1, y_2, y_3, y_4]^T, \quad X_{5\times 1} = [x_0, x_1, x_2, x_3, x_4]^T.$$
The vector $s_{5\times 1} = [s_0(5), s_1(5), s_2(5), s_3(5), s_4(5)]^T$ is computed by expression (5), where $s_1(5) = \frac{2p_1}{\sqrt{d(1, N)}}$, $s_3(5) = \frac{2p_3}{\sqrt{d(3, N)}}$, and $s_4(5) = \frac{p_4}{N\sqrt{d(4, N)}}$. As a result, we obtain $s_{5\times 1} = \left[\frac{1}{\sqrt{5}},\ \frac{1}{\sqrt{10}},\ \frac{1}{\sqrt{14}},\ \frac{1}{\sqrt{10}},\ \frac{1}{\sqrt{70}}\right]^T$, or $s_{5\times 1} \approx [0.4472, 0.3162, 0.2673, 0.3162, 0.1195]^T$.
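Because $\mathrm{diag}(s_{5\times 1})\, C_5$ must be orthonormal, each $s_k(5)$ is simply the reciprocal of the Euclidean norm of the corresponding row of $C_5$, a convenient consistency check (a sketch, with the signs of $C_5$ as given above):

```python
import math

C5 = ((1, 1, 1, 1, 1),
      (-2, -1, 0, 1, 2),
      (2, -1, -2, -1, 2),
      (-1, 2, 0, -2, 1),
      (1, -4, 6, -4, 1))

# each s_k(5) equals the reciprocal Euclidean norm of the k-th row of C5
s = [1 / math.sqrt(sum(v * v for v in row)) for row in C5]
expected = [1 / math.sqrt(5), 1 / math.sqrt(10), 1 / math.sqrt(14),
            1 / math.sqrt(10), 1 / math.sqrt(70)]
assert all(abs(a - b) < 1e-12 for a, b in zip(s, expected))
```

The same check applies to every integer matrix $C_N$ in this section.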
Denoting $a_5 = 1$, $b_5 = 2$, $c_5 = 4$, $d_5 = 6$, we obtain the matrix
$$C_5^{(a)} = \begin{bmatrix} a_5 & a_5 & a_5 & a_5 & a_5\\ -b_5 & -a_5 & 0 & a_5 & b_5\\ b_5 & -a_5 & -b_5 & -a_5 & b_5\\ -a_5 & b_5 & 0 & -b_5 & a_5\\ a_5 & -c_5 & d_5 & -c_5 & a_5 \end{bmatrix}.$$
We again apply the structural approach [37,38,39,40] to develop the algorithm for the 5-point 1D real DTT. The permutations $\pi_3$ and $\pi_4$ are defined to alter the order of the columns and rows of matrix $C_5$, respectively, as
$$\pi_3 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5\\ 1 & 2 & 3 & 5 & 4 \end{pmatrix} \quad \text{and} \quad \pi_4 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5\\ 3 & 5 & 1 & 4 & 2 \end{pmatrix}.$$
After the permutations, the obtained matrix
$$C_5^{(b)} = \begin{bmatrix} b_5 & -a_5 & -b_5 & b_5 & -a_5\\ a_5 & -c_5 & d_5 & a_5 & -c_5\\ a_5 & a_5 & a_5 & a_5 & a_5\\ -a_5 & b_5 & 0 & a_5 & -b_5\\ -b_5 & -a_5 & 0 & b_5 & a_5 \end{bmatrix}$$
is decomposed into two components [37,39]:
$$C_5^{(b)} = C_5^{(c)} + C_5^{(d)},$$
where
$$C_5^{(c)} = \begin{bmatrix} & & -b_5 & & \\ & & d_5 & & \\ a_5 & a_5 & a_5 & a_5 & a_5\\ & & 0 & & \\ & & 0 & & \end{bmatrix} \quad \text{and} \quad C_5^{(d)} = \begin{bmatrix} b_5 & -a_5 & & b_5 & -a_5\\ a_5 & -c_5 & & a_5 & -c_5\\ & & & & \\ -a_5 & b_5 & & a_5 & -b_5\\ -b_5 & -a_5 & & b_5 & a_5 \end{bmatrix}.$$
The structure of matrix $C_5^{(c)}$ allows for reducing the number of operations without further transforms because the elements in its third row are identical up to sign. After eliminating the rows and columns having only zero elements in matrix $C_5^{(d)}$, we obtain matrix $C_4^{(d)}$:
$$C_4^{(d)} = \begin{bmatrix} b_5 & -a_5 & b_5 & -a_5\\ a_5 & -c_5 & a_5 & -c_5\\ -a_5 & b_5 & a_5 & -b_5\\ -b_5 & -a_5 & b_5 & a_5 \end{bmatrix}.$$
Matrix $C_4^{(d)}$ matches the template $\begin{bmatrix} A_2^{(0)} & A_2^{(0)}\\ B_2^{(0)} & -B_2^{(0)} \end{bmatrix}$, where $A_2^{(0)} = \begin{bmatrix} b_5 & -a_5\\ a_5 & -c_5 \end{bmatrix}$ and $B_2^{(0)} = \begin{bmatrix} -a_5 & b_5\\ -b_5 & -a_5 \end{bmatrix}$. Hence, matrix $C_4^{(d)}$ can be factorized as [37,38]:
$$C_4^{(d)} = (A_2^{(0)} \oplus B_2^{(0)})(H_2 \otimes I_2). \qquad (16)$$
Let us swap the rows of matrix $A_2^{(0)}$ and alter the sign of the second row of matrix $B_2^{(0)}$. Then, the obtained matrices $A_2^{(1)} = \begin{bmatrix} a_5 & -c_5\\ b_5 & -a_5 \end{bmatrix}$ and $B_2^{(1)} = \begin{bmatrix} -a_5 & b_5\\ b_5 & a_5 \end{bmatrix}$ match the templates $\begin{bmatrix} a & b\\ c & -a \end{bmatrix}$ and $\begin{bmatrix} -e & d\\ d & e \end{bmatrix}$, where $a = a_5$, $b = -c_5$, $c = b_5$, $d = b_5$, $e = a_5$. Then [37,40],
$$C_4^{(d)} = P_4 (T_{2\times 3}^{(3)} \oplus T_{2\times 3}^{(4)})\, \mathrm{diag}(-b_5 - a_5,\ c_5 + a_5,\ a_5,\ -a_5 - b_5,\ a_5 - b_5,\ b_5)\, (T_{3\times 2}^{(4)} \oplus T_{3\times 2}^{(3)})(H_2 \otimes I_2), \qquad (19)$$
where
$$P_4 = \begin{bmatrix} & 1 & & \\ 1 & & & \\ & & 1 & \\ & & & -1 \end{bmatrix}, \qquad T_{2\times 3}^{(3)} = \begin{bmatrix} 0 & -1 & 1\\ -1 & 0 & -1 \end{bmatrix}, \qquad T_{3\times 2}^{(4)} = \begin{bmatrix} 1 & 0\\ 0 & 1\\ 1 & 1 \end{bmatrix}.$$
Matrices $T_{2\times 3}^{(4)}$ and $T_{3\times 2}^{(3)}$ are defined as in expression (9).
Based on Equations (16) and (19), we derive the factorization of the 5-point DTT matrix:
$$Y_{5\times 1} = \mathrm{diag}(s_{5\times 1})\, P_5^{(1)} W_{5\times 6} W_6^{(1)} W_{6\times 8} D_8 W_{8\times 6} W_6^{(0)} W_{6\times 5} P_5^{(0)} X_{5\times 1},$$
where
$$W_{6\times 8} = T_{2\times 3}^{(3)} \oplus I_2 \oplus T_{2\times 3}^{(4)}, \qquad W_{8\times 6} = T_{3\times 2}^{(4)} \oplus I_2 \oplus T_{3\times 2}^{(3)}, \qquad D_8 = \mathrm{diag}(1, 3, 1, 2, 1, 3, 1, 2),$$
$$P_5^{(0)} = \begin{bmatrix} 1 & & & & \\ & 1 & & & \\ & & 1 & & \\ & & & & 1\\ & & & 1 & \end{bmatrix}, \qquad P_5^{(1)} = \begin{bmatrix} & & 1 & & \\ & & & & 1\\ 1 & & & & \\ & & & 1 & \\ & 1 & & & \end{bmatrix},$$
W 6 ( 0 ) = 1 1 1 1 1 1 1 1 1 1 ,   W 6 ( 1 ) = 1 1 1 1 1 1 ,
W 6 × 5 = 1 1 1 1 1 1 1 1 1 ,   W 5 × 6 = 1 1 1 3 1 / 2 1 1 1 .
As was mentioned in Section 2, an empty cell in matrices means it contains zero.
A data flow graph of the developed 5-point DTT algorithm is presented in Figure 3. Note that the original 5-point real DTT requires 23 real-valued multiplications and 18 additions, since two entries of the transform matrix are equal to zero. Using the proposed 5-point DTT algorithm, the required number of multiplications is reduced from 23 to 5. The number of additions increases from 18 to 19, and 6 shifts are required.

3.4. Algorithm for 6-Point DTT

Let us obtain the algorithm for the 6-point 1D DTT, which is expressed as follows:
$$Y_{6\times 1} = \mathrm{diag}(s_{6\times 1})\, C_6 X_{6\times 1},$$
where
$$C_6 = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1\\ -5 & -3 & -1 & 1 & 3 & 5\\ 5 & -1 & -4 & -4 & -1 & 5\\ -5 & 7 & 4 & -4 & -7 & 5\\ 1 & -3 & 2 & 2 & -3 & 1\\ -1 & 5 & -10 & 10 & -5 & 1 \end{bmatrix}, \quad Y_{6\times 1} = [y_0, \ldots, y_5]^T, \quad X_{6\times 1} = [x_0, \ldots, x_5]^T.$$
The vector $s_{6\times 1} = [s_0(6), s_1(6), \ldots, s_5(6)]^T$ is computed by expression (5), where $s_2(6) = \frac{2p_2}{3\sqrt{d(2, N)}}$ and $s_3(6) = \frac{2p_3}{3\sqrt{d(3, N)}}$. As a result, we obtain $s_{6\times 1} = \left[\frac{1}{\sqrt{6}},\ \frac{1}{\sqrt{70}},\ \frac{1}{2\sqrt{21}},\ \frac{1}{2\sqrt{45}},\ \frac{1}{\sqrt{28}},\ \frac{1}{\sqrt{252}}\right]^T$, or $s_{6\times 1} \approx [0.4082, 0.1195, 0.1091, 0.0745, 0.1890, 0.0630]^T$.
Let us denote $a_6 = 1$, $b_6 = 2$, $c_6 = 3$, $d_6 = 4$, $e_6 = 5$, $f_6 = 7$, $g_6 = 10$; then, matrix $C_6$ is represented as
$$C_6^{(a)} = \begin{bmatrix} a_6 & a_6 & a_6 & a_6 & a_6 & a_6\\ -e_6 & -c_6 & -a_6 & a_6 & c_6 & e_6\\ e_6 & -a_6 & -d_6 & -d_6 & -a_6 & e_6\\ -e_6 & f_6 & d_6 & -d_6 & -f_6 & e_6\\ a_6 & -c_6 & b_6 & b_6 & -c_6 & a_6\\ -a_6 & e_6 & -g_6 & g_6 & -e_6 & a_6 \end{bmatrix}.$$
The columns and rows of $C_6^{(a)}$ are permuted according to $\pi_5$ and $\pi_6$, which are defined as $\pi_5 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6\\ 1 & 6 & 3 & 4 & 2 & 5 \end{pmatrix}$ and $\pi_6 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6\\ 2 & 4 & 1 & 6 & 3 & 5 \end{pmatrix}$. The permutation matrices are as follows:
$$P_6^{(1)} = \begin{bmatrix} & & 1 & & & \\ 1 & & & & & \\ & & & & 1 & \\ & 1 & & & & \\ & & & & & 1\\ & & & 1 & & \end{bmatrix}, \qquad P_6^{(0)} = \begin{bmatrix} 1 & & & & & \\ & & & & & 1\\ & & 1 & & & \\ & & & 1 & & \\ & 1 & & & & \\ & & & & 1 & \end{bmatrix}.$$
The resulting matrix
$$C_6^{(b)} = \begin{bmatrix} -e_6 & e_6 & -a_6 & a_6 & -c_6 & c_6\\ -e_6 & e_6 & d_6 & -d_6 & f_6 & -f_6\\ a_6 & a_6 & a_6 & a_6 & a_6 & a_6\\ -a_6 & a_6 & -g_6 & g_6 & e_6 & -e_6\\ e_6 & e_6 & -d_6 & -d_6 & -a_6 & -a_6\\ a_6 & a_6 & b_6 & b_6 & -c_6 & -c_6 \end{bmatrix}$$
contains repeated elements that enable butterfly module extraction and construction of the data flow graph for 6-point 1D DTT calculation. A derived data flow graph of the proposed 6-point 1D DTT algorithm is shown in Figure 4.
Further, we compose the factorization of the 6-point 1D DTT matrix from the matrices describing the calculations at each layer of the graph. Thus, at the first layer, we obtain the matrix $W_6 = W_2 \oplus W_2 \oplus W_2$, where $W_2 = \begin{bmatrix} 1 & 1\\ 1 & -1 \end{bmatrix}$.
The second layer of calculations is described by the matrix $W_{16\times 6}$, which, for compactness, is given via its transpose $(W_{16\times 6})^T$ as follows:
( W 16 × 6 ) T = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 .
In this matrix and the matrices below, an empty cell indicates that it contains zero. We have diagonal matrix D 16 = diag ( 1 , 1 , 4 , 1 , 4 , 2 , 4 , 4 , 1 , 2 , 2 , 4 , 2 , 1 , 1 , 1 ) at the third layer of calculation and matrices W 14 × 16 and W 6 × 14 at the fourth and fifth layers of calculation:
W 14 × 16 = 1 1 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 ,
W 6 × 14 = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 .
Next, we take into consideration the permutation matrices P 6 ( 1 ) and P 6 ( 0 ) and matrices W 6 , W 16 × 6 , W 14 × 16 , W 6 × 14 at different computing layers of the data flow graph from Figure 4 to obtain factorization of the 6-point 1D DTT matrix:
$$Y_{6\times 1} = \mathrm{diag}(s_{6\times 1})\, P_6^{(1)} W_{6\times 14} W_{14\times 16} D_{16} W_{16\times 6} W_6 P_6^{(0)} X_{6\times 1}.$$
Let us consider the data flow graph of the developed 6-point 1D DTT algorithm (Figure 4). It should be noted that the initial 6-point real DTT requires 36 real-valued multiplications and 30 additions. If we apply the developed 6-point 1D DTT algorithm, the number of real-valued multiplications is reduced from 36 to 6. The number of additions decreases from 30 to 27, and 15 shifts are required.

3.5. Algorithm for 7-Point DTT

To design the algorithm for the 7-point 1D DTT, we represent this transform as follows:
$$Y_{7\times 1} = \mathrm{diag}(s_{7\times 1})\, C_7 X_{7\times 1},$$
where
$$C_7 = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1\\ -3 & -2 & -1 & 0 & 1 & 2 & 3\\ 5 & 0 & -3 & -4 & -3 & 0 & 5\\ -1 & 1 & 1 & 0 & -1 & -1 & 1\\ 3 & -7 & -1 & 6 & -1 & -7 & 3\\ -1 & 4 & -5 & 0 & 5 & -4 & 1\\ 1 & -6 & 15 & -20 & 15 & -6 & 1 \end{bmatrix}, \quad Y_{7\times 1} = [y_0, \ldots, y_6]^T, \quad X_{7\times 1} = [x_0, \ldots, x_6]^T.$$
The vector $s_{7\times 1} = [s_0(7), s_1(7), \ldots, s_6(7)]^T$ is computed by expression (5), where $s_1(7) = \frac{2p_1}{\sqrt{d(1, N)}}$, $s_3(7) = \frac{5p_3}{\sqrt{d(3, N)}}$, $s_5(7) = \frac{3p_5}{\sqrt{d(5, N)}}$, and $s_6(7) = \frac{p_6}{N\sqrt{d(6, N)}}$. As a result, we obtain $s_{7\times 1} = \left[\frac{1}{\sqrt{7}},\ \frac{1}{\sqrt{28}},\ \frac{1}{2\sqrt{21}},\ \frac{1}{\sqrt{6}},\ \frac{1}{\sqrt{154}},\ \frac{1}{2\sqrt{21}},\ \frac{1}{\sqrt{924}}\right]^T$, or $s_{7\times 1} \approx [0.3780, 0.1890, 0.1091, 0.4082, 0.0806, 0.1091, 0.0329]^T$.
If we denote $a_7 = 1$, $b_7 = 2$, $c_7 = 3$, $d_7 = 4$, $e_7 = 5$, $f_7 = 6$, $g_7 = 7$, $h_7 = 15$, $k_7 = 20$, matrix $C_7$ can be represented as
$$C_7 = \begin{bmatrix} a_7 & a_7 & a_7 & a_7 & a_7 & a_7 & a_7\\ -c_7 & -b_7 & -a_7 & 0 & a_7 & b_7 & c_7\\ e_7 & 0 & -c_7 & -d_7 & -c_7 & 0 & e_7\\ -a_7 & a_7 & a_7 & 0 & -a_7 & -a_7 & a_7\\ c_7 & -g_7 & -a_7 & f_7 & -a_7 & -g_7 & c_7\\ -a_7 & d_7 & -e_7 & 0 & e_7 & -d_7 & a_7\\ a_7 & -f_7 & h_7 & -k_7 & h_7 & -f_7 & a_7 \end{bmatrix}.$$
We developed the 7-point real DTT algorithm using a sparse matrix factorization [8,28,29,41] and the structural approach [37,40]. The structural approach was used to decompose the initial DTT matrix into submatrices. Sparse matrix factorization allows for the design of the data flow graph of the algorithm. To change the order of columns and rows, we define permutations π 7 and π 8 as follows:
$$\pi_7 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7\\ 1 & 7 & 2 & 4 & 6 & 3 & 5 \end{pmatrix}, \qquad \pi_8 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7\\ 2 & 4 & 6 & 1 & 3 & 5 & 7 \end{pmatrix}.$$
The permutation matrices are expressed as:
$$P_7^{(1)} = \begin{bmatrix} & & & 1 & & & \\ 1 & & & & & & \\ & & & & 1 & & \\ & 1 & & & & & \\ & & & & & 1 & \\ & & 1 & & & & \\ & & & & & & 1 \end{bmatrix}, \qquad P_7^{(0)} = \begin{bmatrix} 1 & & & & & & \\ & & & & & & 1\\ & 1 & & & & & \\ & & & 1 & & & \\ & & & & & 1 & \\ & & 1 & & & & \\ & & & & 1 & & \end{bmatrix}.$$
After the permutations, matrix $C_7$ acquires the following structure:
$$C_7^{(a)} = \begin{bmatrix} -c_7 & c_7 & -b_7 & 0 & b_7 & -a_7 & a_7\\ -a_7 & a_7 & a_7 & 0 & -a_7 & a_7 & -a_7\\ -a_7 & a_7 & d_7 & 0 & -d_7 & -e_7 & e_7\\ a_7 & a_7 & a_7 & a_7 & a_7 & a_7 & a_7\\ e_7 & e_7 & 0 & -d_7 & 0 & -c_7 & -c_7\\ c_7 & c_7 & -g_7 & f_7 & -g_7 & -a_7 & -a_7\\ a_7 & a_7 & -f_7 & -k_7 & -f_7 & h_7 & h_7 \end{bmatrix}.$$
Next, matrix $C_7^{(a)}$ is decomposed into two components:
$$C_7^{(a)} = C_7^{(b)} + C_7^{(c)},$$
where
$$C_7^{(b)} = \begin{bmatrix} & & & & & & \\ & & & & & & \\ & & & & & & \\ a_7 & a_7 & a_7 & a_7 & a_7 & a_7 & a_7\\ & & & -d_7 & & & \\ & & & f_7 & & & \\ & & & -k_7 & & & \end{bmatrix}, \quad C_7^{(c)} = \begin{bmatrix} -c_7 & c_7 & -b_7 & & b_7 & -a_7 & a_7\\ -a_7 & a_7 & a_7 & & -a_7 & a_7 & -a_7\\ -a_7 & a_7 & d_7 & & -d_7 & -e_7 & e_7\\ & & & & & & \\ e_7 & e_7 & 0 & & 0 & -c_7 & -c_7\\ c_7 & c_7 & -g_7 & & -g_7 & -a_7 & -a_7\\ a_7 & a_7 & -f_7 & & -f_7 & h_7 & h_7 \end{bmatrix}.$$
Matrix C 7 ( b ) has identical entries in its fourth row. This allows us to reduce the number of operations without further transformations. Next, we eliminate the row and column containing only zero entries in matrix C 7 ( c ) and obtain matrix C 6 ( d ) :
$$C_6^{(d)} = \begin{bmatrix} -c_7 & c_7 & -b_7 & b_7 & -a_7 & a_7\\ -a_7 & a_7 & a_7 & -a_7 & a_7 & -a_7\\ -a_7 & a_7 & d_7 & -d_7 & -e_7 & e_7\\ e_7 & e_7 & 0 & 0 & -c_7 & -c_7\\ c_7 & c_7 & -g_7 & -g_7 & -a_7 & -a_7\\ a_7 & a_7 & -f_7 & -f_7 & h_7 & h_7 \end{bmatrix}.$$
Matrix C 6 ( d ) contains repeated elements, enabling the design of the data flow graph for the 7-point 1D DTT calculation. Figure 5 shows the obtained data flow graph of the developed 7-point 1D DTT algorithm.
Further, we create the factorization of the matrix of 7-point 1D DTT using the matrices for calculations at each layer of the graph. As a result, we obtain matrix W 10 × 7 at the first layer of calculations:
W 10 × 7 = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 .
The second layer of calculations is described by matrix W 10 which can be expressed as follows: W 10 = W 2 W 3 W 2 I 3 , where W 3 = 1 1 1 1 1 . Matrices W 19 × 10 and W 19 provide calculations at the third and fifth layers of the data flow graph in Figure 5:
$$W_{19\times 10} = 1_{2\times 1} \oplus 1_{3\times 1} \oplus 1_{3\times 1} \oplus 1 \oplus 1_{2\times 1} \oplus 1_{2\times 1} \oplus 1_{3\times 1} \oplus I_3,$$
W 19 = 1 1 1 H 2 1 1 1 1 I 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 .
We have diagonal matrix
D 19 = diag ( 2 , 1 , 4 , 1 , 1 , 2 , 2 , 1 , 1 , 8 , 1 , 1 , 4 , 2 , 1 , 1 , 16 , 4 , 2 , 16 )
at the fourth layer of calculation, and matrix W 7 × 19 at the sixth layer of calculation as follows:
W 7 × 19 = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 .
Combining the permutation matrices P 7 ( 1 ) and P 7 ( 0 ) and matrices W 10 × 7 , W 10 , W 19 × 10 , W 19 , and W 7 × 19 at different computing layers of the data flow graph from Figure 5, we derive the factorization of the 7-point 1D DTT matrix:
$$Y_{7\times 1} = \mathrm{diag}(s_{7\times 1})\, P_7^{(1)} W_{7\times 19} W_{19} D_{19} W_{19\times 10} W_{10} W_{10\times 7} P_7^{(0)} X_{7\times 1}.$$
Let us consider the data flow graph of the designed 7-point 1D DTT algorithm from Figure 5. We note that the initial 7-point real DTT requires 44 real-valued multiplications and 37 additions because five entries of the matrix $C_7$ are zeros. If the data are transformed by the proposed 7-point 1D DTT algorithm, the number of real-valued multiplications is reduced from 44 to 7. The number of additions remains the same, and 22 shifts are required.

3.6. Algorithm for 8-Point DTT

Let us design the algorithm for the 8-point DTT, which is expressed as follows:
$$Y_{8\times 1} = \mathrm{diag}(s_{8\times 1})\, C_8 X_{8\times 1},$$
where
$$C_8 = \begin{bmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1\\ -7 & -5 & -3 & -1 & 1 & 3 & 5 & 7\\ 7 & 1 & -3 & -5 & -5 & -3 & 1 & 7\\ -7 & 5 & 7 & 3 & -3 & -7 & -5 & 7\\ 7 & -13 & -3 & 9 & 9 & -3 & -13 & 7\\ -7 & 23 & -17 & -15 & 15 & 17 & -23 & 7\\ 1 & -5 & 9 & -5 & -5 & 9 & -5 & 1\\ -1 & 7 & -21 & 35 & -35 & 21 & -7 & 1 \end{bmatrix}, \quad Y_{8\times 1} = [y_0, \ldots, y_7]^T, \quad X_{8\times 1} = [x_0, \ldots, x_7]^T.$$
The vector $s_{8\times 1} = [s_0(8), s_1(8), \ldots, s_7(8)]^T$ is computed by expression (5). As a result, we obtain $s_{8\times 1} = \left[\frac{1}{2\sqrt{2}},\ \frac{1}{\sqrt{168}},\ \frac{1}{\sqrt{168}},\ \frac{1}{\sqrt{264}},\ \frac{1}{\sqrt{616}},\ \frac{1}{\sqrt{2184}},\ \frac{1}{\sqrt{264}},\ \frac{1}{\sqrt{3432}}\right]^T$, or $s_{8\times 1} \approx [0.3536, 0.0772, 0.0772, 0.0615, 0.0403, 0.0214, 0.0615, 0.0171]^T$. If we denote $a_8 = 1$, $b_8 = 3$, $c_8 = 5$, $d_8 = 7$, $e_8 = 9$, $f_8 = 13$, $g_8 = 21$, $h_8 = 35$, $k_8 = 23$, $l_8 = 17$, $m_8 = 15$, then matrix $C_8$ can be represented as
$$C_8^{(a)} = \begin{bmatrix} a_8 & a_8 & a_8 & a_8 & a_8 & a_8 & a_8 & a_8\\ -d_8 & -c_8 & -b_8 & -a_8 & a_8 & b_8 & c_8 & d_8\\ d_8 & a_8 & -b_8 & -c_8 & -c_8 & -b_8 & a_8 & d_8\\ -d_8 & c_8 & d_8 & b_8 & -b_8 & -d_8 & -c_8 & d_8\\ d_8 & -f_8 & -b_8 & e_8 & e_8 & -b_8 & -f_8 & d_8\\ -d_8 & k_8 & -l_8 & -m_8 & m_8 & l_8 & -k_8 & d_8\\ a_8 & -c_8 & e_8 & -c_8 & -c_8 & e_8 & -c_8 & a_8\\ -a_8 & d_8 & -g_8 & h_8 & -h_8 & g_8 & -d_8 & a_8 \end{bmatrix}.$$
The order of the columns and rows of matrix $C_8^{(a)}$ is altered with the permutations
$$\pi_7 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8\\ 1 & 8 & 2 & 7 & 3 & 6 & 4 & 5 \end{pmatrix} \quad \text{and} \quad \pi_8 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8\\ 1 & 3 & 2 & 4 & 5 & 7 & 6 & 8 \end{pmatrix},$$
respectively. The permutation matrices are as follows:
$$P_8^{(1)} = \begin{bmatrix} 1 & & & & & & & \\ & & 1 & & & & & \\ & 1 & & & & & & \\ & & & 1 & & & & \\ & & & & 1 & & & \\ & & & & & & 1 & \\ & & & & & 1 & & \\ & & & & & & & 1 \end{bmatrix}, \qquad P_8^{(0)} = \begin{bmatrix} 1 & & & & & & & \\ & & & & & & & 1\\ & 1 & & & & & & \\ & & & & & & 1 & \\ & & 1 & & & & & \\ & & & & & 1 & & \\ & & & 1 & & & & \\ & & & & 1 & & & \end{bmatrix}.$$
Matrix $C_8^{(a)}$ acquires the following structure:
$$C_8^{(b)} = \begin{bmatrix} a_8 & a_8 & a_8 & a_8 & a_8 & a_8 & a_8 & a_8\\ d_8 & d_8 & a_8 & a_8 & -b_8 & -b_8 & -c_8 & -c_8\\ -d_8 & d_8 & -c_8 & c_8 & -b_8 & b_8 & -a_8 & a_8\\ -d_8 & d_8 & c_8 & -c_8 & d_8 & -d_8 & b_8 & -b_8\\ d_8 & d_8 & -f_8 & -f_8 & -b_8 & -b_8 & e_8 & e_8\\ a_8 & a_8 & -c_8 & -c_8 & e_8 & e_8 & -c_8 & -c_8\\ -d_8 & d_8 & k_8 & -k_8 & -l_8 & l_8 & -m_8 & m_8\\ -a_8 & a_8 & d_8 & -d_8 & -g_8 & g_8 & h_8 & -h_8 \end{bmatrix}.$$
Matrix $C_8^{(b)}$ contains repeated elements, which facilitate the design of butterfly modules in the data flow graph for the 8-point 1D DTT calculation. The obtained data flow graph of the developed 8-point 1D DTT algorithm is presented in Figure 6.
Let us derive the factorization of the 8-point 1D DTT matrix. We consider the matrices describing the computations at each layer of the graph. The matrices
$$W_8 = H_2 \oplus H_2 \oplus H_2 \oplus H_2 \quad \text{and} \quad W_{18\times 8} = 1_{2\times 1} \oplus 1_{2\times 1} \oplus 1_{3\times 1} \oplus 1_{3\times 1} \oplus 1_{2\times 1} \oplus 1_{2\times 1} \oplus 1_{2\times 1} \oplus 1_{2\times 1}$$
are obtained at the first and second layers of the graph, respectively.
We have the diagonal matrix
$$D_{18} = \mathrm{diag}(1, 7, 7, 1, 1, 1, 4, 5, 2, 1, 2, 1, 4, 1, 1, 4, 1, 2)$$
at the third layer of calculation. The fourth, fifth and sixth layers of computations of the data flow graph in Figure 6 are implemented with the matrices W 22 × 18 , W 23 × 22 , and D 23 :
W 22 × 18 = I 5 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 ,
W 23 × 22 = I 13 1 1 1 I 6 1 1 ,
D 23 = diag ( 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 4 , 1 , 1 , 1 , 1 , 1 , 1 , 4 ) .
Let us define matrices T 3 × 2 ( 1 ) = 1 1 1 1 and T 4 = 1 1 1 1 1 1 .
Then, we obtain the matrix $W_{25\times 23} = I_7 \oplus T_{3\times 2}^{(1)} \oplus I_4 \oplus T_4 \oplus I_4 \oplus T_{3\times 2}^{(3)}$, which provides the calculations at the seventh layer of the data flow graph in Figure 6. Matrix $T_{3\times 2}^{(3)}$ is defined by Equation (9). The computations at the eighth and ninth layers of the data flow graph in Figure 6 are provided by the diagonal matrix $D_{25}$ and matrix $W_{8\times 25}$:
D 25 = diag ( 1 , 1 , 1 , 1 , 1 , 1 , 2 , 1 , 1 , 2 , 1 , 4 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 1 , 2 , 1 , 1 , 3 , 1 ) ,
W 8 × 25 = 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 .
We multiply matrices $P_8^{(1)}$, $P_8^{(0)}$, $W_{8\times 25}$, $D_{25}$, $W_{25\times 23}$, $D_{23}$, $W_{23\times 22}$, $W_{22\times 18}$, $D_{18}$, $W_{18\times 8}$, and $W_8$ with each other. As a result, the factorization of the 8-point 1D DTT matrix is obtained:
$$Y_{8\times 1} = \mathrm{diag}(s_{8\times 1})\, P_8^{(1)} W_{8\times 25} D_{25} W_{25\times 23} D_{23} W_{23\times 22} W_{22\times 18} D_{18} W_{18\times 8} W_8 P_8^{(0)} X_{8\times 1}.$$
We have expressed some elements of the diagonal matrices $\mathbf{D}_{18}$ and $\mathbf{D}_{25}$ as 7 = 8 − 1, 5 = 4 + 1, and 3 = 2 + 1, so that the corresponding multiplications reduce to shifts and additions. It should be noted that the direct 8-point real DTT requires 64 real-valued multiplications and 56 additions. The proposed 8-point 1D DTT algorithm reduces the number of real-valued multiplications to 8. In total, the proposed 8-point 1D DTT algorithm requires 53 additions, 8 real-valued multiplications, and 27 shifts.
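Since 7 = 8 − 1, 5 = 4 + 1, and 3 = 2 + 1, each integer multiplier appearing in the diagonal factors can be realized with at most one shift and one addition. A small sketch of this standard trick (the helper names are ours, not the paper's):

```python
# Multiplication by the constants appearing in D18 and D25, realized
# with shifts and additions only: 7 = 8 - 1, 5 = 4 + 1, 3 = 2 + 1, 4 = 1 << 2.

def times7(x):
    return (x << 3) - x   # 8x - x: one shift, one subtraction

def times5(x):
    return (x << 2) + x   # 4x + x: one shift, one addition

def times3(x):
    return (x << 1) + x   # 2x + x: one shift, one addition

def times4(x):
    return x << 2         # pure shift, no addition

for x in (-6, 0, 1, 13):
    assert times7(x) == 7 * x
    assert times5(x) == 5 * x
    assert times3(x) == 3 * x
    assert times4(x) == 4 * x
```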

3.7. Generalization of the Proposed Algorithms

Let us describe how the proposed N-point DTT algorithms, designed above for N from 3 to 8, generalize to N > 8. The N-point DTT with N > 8 is also expressed by Equation (4). Below, we consider the even and odd lengths of the input sequences separately.
At first, we suppose that N is even. Based on the symmetry of the integer DTT, the following permutation of the columns of the integer 1D DTT matrix $\mathbf{C}_N$ can be used:
$$\pi_N = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & \cdots & N-1 & N \\ 1 & N & 2 & N-1 & 3 & N-2 & \cdots & N/2 & N/2+1 \end{pmatrix}.$$
Rearranging the rows is not strictly necessary; however, if the rows can be organized so that pairs of equal-signed elements form 2 × 2 blocks, the number of additions and shifts can be reduced. This trick also reduces the number of integer multiplications.
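Assuming the column permutation above pairs each leading column with its mirror column, as the two-row notation suggests, the even-N column order can be generated programmatically (a sketch; `pi_even` is an illustrative name, not from the paper):

```python
def pi_even(N):
    """Column order (1-based) for even N: 1, N, 2, N-1, ..., N/2, N/2+1.
    Illustrative reconstruction of the pairing suggested by the permutation."""
    assert N % 2 == 0
    order = []
    for k in range(1, N // 2 + 1):
        order += [k, N - k + 1]   # pair column k with its mirror N-k+1
    return order

assert pi_even(8) == [1, 8, 2, 7, 3, 6, 4, 5]
assert pi_even(6) == [1, 6, 2, 5, 3, 4]
```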
Next, we design the data flow graph from inputs to outputs, starting with the permutation of the inputs at the first computational level. The outputs of the first level are then combined pairwise into butterfly modules, and the outputs of each such module are fed to a line of integer multipliers. Every multiplication by an integer can be implemented with additions and shifts. The results of the multiplications are summed to obtain the outputs of the integer DTT, and the permutation of the outputs is performed. The outputs of the real DTT are obtained by multiplying the outputs of the integer DTT by the specific coefficients, as described in Equation (4).
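The front end of the construction just described, namely permuted inputs feeding pairwise butterfly modules, can be sketched as follows (an illustrative helper, not the paper's code):

```python
def butterfly_stage(x):
    """Apply a 2-point butterfly (sum and difference) to consecutive pairs.
    An unpaired trailing element passes through unchanged."""
    y = []
    for i in range(0, len(x) - 1, 2):
        y += [x[i] + x[i + 1], x[i] - x[i + 1]]
    if len(x) % 2:
        y.append(x[-1])
    return y

# Inputs permuted with the assumed order 1, 4, 2, 3 for N = 4,
# then one butterfly layer.
x = [10, 20, 30, 40]
perm = [x[i - 1] for i in (1, 4, 2, 3)]          # -> [10, 40, 20, 30]
assert butterfly_stage(perm) == [50, -30, 50, -10]
```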
Let us consider an odd N. The order of the columns of the integer DTT matrix $\mathbf{C}_N$ is changed using the permutation
$$\pi_N = \begin{pmatrix} 1 & 2 & 3 & 4 & \cdots & \lfloor N/2 \rfloor & \lfloor N/2 \rfloor + 1 & \lfloor N/2 \rfloor + 2 \\ 1 & N & 2 & N-1 & \cdots & \lfloor N/2 \rfloor & \lfloor N/2 \rfloor + 1 & \lfloor N/2 \rfloor + 2 \end{pmatrix},$$
where $\lfloor N/2 \rfloor$ denotes the integer part of $N/2$. Columns $\lfloor N/2 \rfloor$, $\lfloor N/2 \rfloor + 1$, and $\lfloor N/2 \rfloor + 2$ remain in their places, while the remaining columns are grouped in pairs as in Equation (60). The symmetry of the DTT is taken into account again.
The first row and the $(\lfloor N/2 \rfloor + 1)$-th row are swapped. The order of the remaining rows can be left unchanged, or they can be rearranged to reduce the number of multiplications and shifts, as in the case of even N.
Next, we represent the transform matrix $\mathbf{C}_N^{(a)}$ obtained after the permutations as the sum of two matrices $\mathbf{C}_N^{(b)}$ and $\mathbf{C}_N^{(c)}$ of the same size as matrix $\mathbf{C}_N^{(a)}$:
$$\mathbf{C}_N^{(a)} = \mathbf{C}_N^{(b)} + \mathbf{C}_N^{(c)}.$$
Matrix $\mathbf{C}_N^{(b)}$ retains the $(\lfloor N/2 \rfloor + 1)$-th row and the $(\lfloor N/2 \rfloor + 1)$-th column of matrix $\mathbf{C}_N^{(a)}$, and the rest of its elements are zero. In matrix $\mathbf{C}_N^{(c)}$, the $(\lfloor N/2 \rfloor + 1)$-th row and column are filled with zeros, and the remaining elements coincide with those of $\mathbf{C}_N^{(a)}$. Matrix $\mathbf{C}_N^{(b)}$ then has identical entries in its $(\lfloor N/2 \rfloor + 1)$-th row, which allows us to reduce the number of operations without further transformations. Next, we eliminate the all-zero row and column of matrix $\mathbf{C}_N^{(c)}$ and obtain matrix $\mathbf{C}_{N-1}^{(c)}$, which has an even size. Therefore, the data flow graph of the fast algorithm for the transform with matrix $\mathbf{C}_{N-1}^{(c)}$ can be constructed similarly to the previous case.
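The split into $\mathbf{C}_N^{(b)}$ and $\mathbf{C}_N^{(c)}$ only separates the middle row and column from the rest, and can be checked on any small matrix. A sketch on a generic 3 × 3 matrix (not an actual DTT matrix; the helper name is ours):

```python
def split_middle(C):
    """Split C into (B, Cc): B keeps only the middle row and column
    (0-based index n//2, i.e. the (floor(N/2)+1)-th 1-based), and Cc
    zeroes them out, so that B + Cc == C entrywise."""
    n = len(C)
    m = n // 2
    B  = [[C[i][j] if (i == m or j == m) else 0 for j in range(n)]
          for i in range(n)]
    Cc = [[0 if (i == m or j == m) else C[i][j] for j in range(n)]
          for i in range(n)]
    return B, Cc

C = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
B, Cc = split_middle(C)
# The two parts add back to C ...
assert all(B[i][j] + Cc[i][j] == C[i][j] for i in range(3) for j in range(3))
# ... and deleting the zero row/column of Cc leaves an even-sized matrix.
C1 = [[Cc[i][j] for j in range(3) if j != 1] for i in range(3) if i != 1]
assert C1 == [[1, 3], [7, 9]]
```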
To obtain the data flow graph for the transform with matrix $\mathbf{C}_N^{(a)}$, we add to the data flow graph for the transform with matrix $\mathbf{C}_{N-1}^{(c)}$ the path corresponding to the $(\lfloor N/2 \rfloor + 1)$-th output after the column permutation. The edges of the resulting graph that enter the node on this path immediately after the column permutation are constructed using the structural approach, similarly to the case N = 7. Again, for odd N, the outputs of the integer DTT after the row permutation are multiplied by the specific coefficients to obtain the outputs of the real DTT.

4. Results

We used MATLAB R2023b for the experimental verification of the proposed algorithms. First, the DTT matrices were obtained using Equations (4)–(6) for N = 3, 4, 5, 6, 7, 8. Next, the factorizations of the 1D DTT matrices were calculated with Expressions (7), (10), (20), (32), (46), and (58). The correctness of the proposed algorithms was confirmed by the coincidence of the elements of the 1D DTT matrices with the elements of the products of the matrices included in the factorizations of the DTT matrices for the same N.
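A correctness check of this kind does not depend on MATLAB. For small N, the orthonormal Tchebichef basis can be generated, up to the sign of each row, by Gram–Schmidt orthonormalization of the monomials 1, x, x², … on the grid x = 0, …, N − 1, and a candidate matrix product can then be compared against it. A hedged Python sketch of such a check (Equations (4)–(6) themselves are not reproduced here, so row signs may differ from the paper's convention):

```python
import math

def tchebichef_matrix(N):
    """Rows = orthonormal discrete Tchebichef polynomials on x = 0..N-1,
    obtained by Gram-Schmidt orthonormalization of the monomials.
    Row signs may differ from the convention of Equations (4)-(6)."""
    basis = [[float(x ** p) for x in range(N)] for p in range(N)]
    rows = []
    for v in basis:
        for u in rows:                       # subtract components along earlier rows
            c = sum(a * b for a, b in zip(u, v))
            v = [a - c * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows

C = tchebichef_matrix(4)
# Orthonormality check: C * C^T should be the identity (up to rounding).
for i in range(4):
    for j in range(4):
        dot = sum(C[i][k] * C[j][k] for k in range(4))
        assert abs(dot - (1.0 if i == j else 0.0)) < 1e-9
```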
Further, the computational complexity of the developed algorithms was evaluated, and the results are illustrated in Table 1. In parentheses, we show the difference in the number of operations as a percentage. The minus sign indicates a reduction in the number of operations compared to the direct method, while the plus sign indicates an increase. As can be seen from Table 1, the number of multiplications decreased by an average of 78% for N = 3, 4, 5, 6, 7, 8. At the same time, the number of additions decreased by an average of 5% for the same N. The calculation of the direct matrix-vector product for N = 4 additionally requires 8 shifts.
After evaluating the effectiveness of the solutions by counting the arithmetic operations required by the obtained algorithms, we additionally evaluated memory consumption; the results are shown in Figure 7. The proposed real DTT algorithms required 54% less memory than the direct matrix-vector product.
The evaluation of memory costs for the proposed algorithms demonstrates their limitations. However, unlike the evaluation of arithmetic complexity, memory costs depend on the method of implementing the algorithm and the platform on which it is implemented. In addition, memory consumption is affected by the skill and experience of the designer. In general, the proposed algorithms allow both software and hardware implementation. Moreover, both software and hardware implementations can use various approaches and tricks that lead to different memory costs, time delays, and hardware resource usage. As for memory, the calculations can be organized so that the same memory cells are reused at each subsequent stage of the algorithm, if such a possibility exists. The proposed algorithms can be implemented sequentially, in parallel, or in a parallel-sequential manner, depending on the structure and capabilities of the implementation platform. The choice of implementation affects the time delay in obtaining results.
A deeper analysis shows that, when the proposed algorithms are implemented in hardware with maximum parallelization of calculations, intermediate memory may not be needed in asynchronous mode. If pipeline processing is organized, time delays that align the data flows have to be inserted, and additional buffer memory is then required. Thus, the evaluation of algorithmic solutions in terms of memory requirements, although it characterizes these solutions to some extent, is rather subjective. Therefore, since we do not consider implementation issues, the evaluation of arithmetic complexity, although not exhaustive, remains the most reliable and accurate characteristic of the efficiency of the proposed solutions.

5. Discussion of Computational Complexity

In Table 2, we present the number of additions, shifts, and multiplications for the 1D DTT algorithms known from the literature. It is important to note that in references [1,41], the number of additions and shifts was evaluated for the integer DTT. Therefore, we added to the corresponding rows of Table 2 the minimal number of multiplications needed to implement the real DTT. The results in Table 2 show that the number of multiplications for the proposed algorithm is the same for N = 4 and N = 8 as in [1,41]. At the same time, we reduced the number of multiplications by 67–72% compared with [8]; however, the number of additions increased by 12–66%, and additional shifts are required.
The advantage of the proposed solutions is that the length of the input data sequence for the proposed fast DTT algorithms is not limited to 4 and 8. The obtained algorithms are multiplier-free for the integer DTT, and they can compute both the integer DTT and the real DTT. The data flow graphs constructed for the proposed algorithms have a modular structure that is well suited for hardware implementation. A carefully selected heuristic for a particular value of N can decrease the computational complexity of the DTT algorithm further, as in [1]. In contrast, extending the proposed algorithms to N greater than 8 enables the design of fast DTT algorithms for input sequences of any length.

6. Conclusions

In this paper, the algorithms for real DTT with reduced multiplicative complexity are developed by combining sparse matrix factorization with the structural approach. The obtained algorithms have butterfly architecture and are multiplication-free for the integer DTT. The combination of the structural approach with sparse matrix factorization enables the design of fast algorithms without restricting input sequence length to 4 or 8. A matrix factorization of each proposed solution was constructed with sparse diagonal and quasi-diagonal matrices. Another advantage of the obtained algorithms is that their representation via the data flow graphs provides a layout for the VLSI implementation of the proposed solutions.
The computational complexity of the proposed algorithms was compared with the computational complexity of direct matrix-vector products. As a result, it was found that the obtained factorizations of real DTT matrices reduce the number of multiplications by an average of 78% in the range of signal lengths from 3 to 8. Meanwhile, the average number of additions in the same range of signal lengths decreased by nearly 5%. Although additional shifts are required instead of multiplications, they are less time- and resource-consuming.
Comparison with fast DTT algorithms known from the literature was provided for input signal lengths 4 and 8. It was shown that the proposed algorithms have the same or slightly worse computational complexity. The latter is caused by the versatility of combining the structural approach with a sparse matrix factorization compared to the specificity of existing solutions.
Future research development may involve constructing a fast orthogonal projection of DTT using the Kronecker product. Reducing the computational complexity of this form of DTT may be used in applications such as speech enhancement [42] and filter design [43].

Author Contributions

Conceptualization, A.C.; methodology, A.C. and M.P.; software, M.P.; validation, A.C. and M.P.; formal analysis, A.C. and M.P.; investigation, M.P.; writing—original draft preparation, M.P. and A.C.; writing—review and editing, A.C. and M.P.; supervision, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

1. Prattipati, S.; Ishwar, S.; Swamy, M.N.S.; Meher, P.K. A fast 8 × 8 integer Tchebichef transform and comparison with integer cosine transform for image compression. In Proceedings of the 2013 IEEE 56th International Midwest Symposium on Circuits and Systems (MWSCAS), Columbus, OH, USA, 4–7 August 2013.
2. Kouadria, N.; Mechouek, K.; Harize, S.; Doghmane, N. Region-of-interest based image compression using the discrete Tchebichef transform in wireless visual sensor networks. Comput. Electr. Eng. 2019, 73, 194–208.
3. Kouadria, N.; Mansri, I.; Harize, S.; Doghmane, N.; Mechouek, K. Lossy compression of color images based on discrete Tchebichef transform. In Proceedings of the 2019 International Symposium on Signals, Circuits and Systems (ISSCS), Iasi, Romania, 11–22 July 2019.
4. Xiao, B.; Shi, W.; Lu, G.; Li, W. An optimized quantization technique for image compression using discrete Tchebichef transform. Pattern Recognit. Image Anal. 2018, 28, 371–378.
5. Mefoued, A.; Kouadria, N.; Harize, S.; Doghmane, N. Improved discrete Tchebichef transform approximations for efficient image compression. J. Real-Time Image Process. 2023, 21, 12.
6. Coutinho, V.A.; Cintra, R.J.; Bayer, F.M.; Oliveira, P.A.M.; Oliveira, R.S.; Madanayake, A. Pruned discrete Tchebichef transform approximation for image compression. Circuits Syst. Signal Process. 2018, 37, 4363–4383.
7. Yang, L.; Liu, X.; Hu, Z. Advance and prospect of discrete Tchebichef transform and its application. In Proceedings of the 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 12–14 June 2020.
8. Gogoi, S.; Peesapati, R. Design and implementation of low power 4 × 4/8 × 8 2D-DTT architecture for image and video compression. In Proceedings of the 2019 Women Institute of Technology Conference on Electrical and Computer Engineering (WITCON ECE), Dehradun, India, 22–23 November 2019.
9. Chan, K.-H.; Im, S.-K. Discrete Tchebichef transform for versatile video coding. In Proceedings of the 2021 International Conference on Multimedia Retrieval (ICMR′21), Taipei, Taiwan, 21–24 August 2021.
10. Oliveira, P.A.M.; Cintra, R.J.; Bayer, F.M.; Kulasekera, S.; Madanayake, A. A discrete Tchebichef transform approximation for image and video coding. IEEE Signal Process. Lett. 2015, 22, 1137–1141.
11. Mefoued, A.; Harize, S.; Kouadria, N. Efficient, low complexity 8-point discrete Tchebichef transform approximation for signal processing applications. J. Frankl. Inst. 2023, 360, 4807–4829.
12. Liu, Z.; Bai, Z.; Shi, J.; Chen, H. Image segmentation by using discrete Tchebichef moments and quantum neural network. In Proceedings of the Third International Conference on Natural Computation (ICNC 2007), Haikou, China, 24–27 August 2007.
13. Kumar, A.; Ahmad, M.O.; Swamy, M.N.S. Tchebichef and adaptive steerable-based total variation model for image denoising. IEEE Trans. Image Process. 2019, 28, 2921–2935.
14. Abu, N.A.; Ernawan, F.; Salim, F. Smooth formant peak via discrete Tchebichef transform. J. Comput. Sci. 2015, 11, 351–360.
15. Ernawan, F.; Noersasongko, E.; Abu, N.A. Using discrete Tchebichef transform on speech recognition. In Proceedings of the Fourth International Conference on Machine Vision (ICMV 2011): Computer Vision and Image Analysis; Pattern Recognition and Basic Technologies, Singapore, 9–10 December 2011.
16. Kumar, A.; Singh, H.V.; Khare, V. Tchebichef transform domain-based deep learning architecture for image super-resolution. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 2182–2193.
17. Aliouat, A.; Kouadria, N.; Chiper, D.F. x-DTT: A package for calculating real and integer discrete Tchebichef transform kernels based on orthogonal polynomials. SoftwareX 2023, 23, 101441.
18. VideoLAN, a Project and a Non-Profit Organization. x264. Available online: http://www.videolan.org/developers/x264.html (accessed on 8 April 2025).
19. Richardson, I. The H.264 Advanced Video Compression Standard, 2nd ed.; Wiley: Hoboken, NJ, USA, 2010.
20. Cintra, R.J.; Bayer, F.M. A DCT approximation for image compression. IEEE Signal Process. Lett. 2011, 18, 579–582.
21. Cintra, R.J.; Bayer, F.M.; Tablada, C.J. Low-complexity 8-point DCT approximations based on integer functions. Signal Process. 2014, 99, 201–214.
22. Farsiani, S.; Sodagar, A.M. Hardware and power-efficient compression technique based on discrete Tchebichef transform for neural recording microsystems. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020.
23. Senapati, R.K.; Pati, U.C.; Mahapatra, K.K. Reduced memory, low complexity embedded image compression algorithm using hierarchical listless discrete Tchebichef transform. IET Image Process. 2014, 8, 213–238.
24. Setiadi, D.R.I.M.; Sutojo, T.; Rachmawanto, E.H.; Sari, C.A. Fast and efficient image watermarking algorithm using discrete Tchebichef transform. In Proceedings of the 2017 5th International Conference on Cyber and IT Service Management (CITSM), Denpasar, Bali, 8–10 August 2017.
25. Huang, W.; Chen, S.; Zheng, G. A fast 2D discrete Tchebichef transform algorithm. In Proceedings of the 2010 International Conference on Innovative Computing and Communication and 2010 Asia-Pacific Conference on Information Technology and Ocean Engineering, Macao, China, 30–31 January 2010.
26. Paim, G.; Santana, G.M.; Rocha, L.M.G.; Soares, L.; da Costa, E.A.C.; Bampi, S. Exploring approximations in 4- and 8-point DTT hardware architectures for low-power image compression. Analog Integr. Circuits Signal Process. 2018, 97, 503–514.
27. Paim, G.; Rocha, L.M.G.; Santana, G.M.; Soares, L.; da Costa, E.A.C.; Bampi, S. Power-, area-, and compression-efficient eight-point approximate 2-D discrete Tchebichef transform hardware design combining truncation pruning and efficient transposition buffers. IEEE Trans. Circuits Syst. I Regul. Pap. 2019, 66, 680–693.
28. Saleh, H.I. A fast block-pruned 4 × 4 DTT algorithm for image compression. Int. J. Comput. Theory Eng. 2009, 1, 258–261.
29. Saleh, H.I. A fast multiplier-less regular 4 × 4 DTT algorithm. Arab. J. Nucl. Res. Appl. 2009, 42, 279–287.
30. Nakagaki, K.; Mukundan, R. A fast 4 × 4 forward discrete Tchebichef transform algorithm. IEEE Signal Process. Lett. 2007, 14, 684–687.
31. Abdelwahab, S.A. Image compression using a fast and efficient discrete Tchebichef transform algorithm. In Proceedings of the 6th International Conference on Informatics and Systems, INFOS2008, Cairo, Egypt, 27–29 March 2008.
32. Bouguezel, S.; Ahmad, M.; Swamy, M. A multiplication-free transform for image compression. In Proceedings of the 2008 2nd International Conference on Signals, Circuits and Systems, Nabeul, Tunisia, 7–9 November 2008.
33. Khan, M.Y.; Istyaq, S. Novel matrices based on DTT to substitute DCT. Int. J. Electron. Electr. Comput. Syst. 2017, 6, 37–40.
34. Zhang, X.; Yu, F.X.; Guo, R.; Kumar, S.; Wang, S.; Chang, S.-F. Fast orthogonal projection based on Kronecker product. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015.
35. Majorkowska-Mech, D.; Cariow, A. Discrete pseudo-fractional Fourier transform and its fast algorithm. Electronics 2021, 10, 2145.
36. Bi, G.; Zeng, Y. Transforms and Fast Algorithms for Signal Analysis and Representations, 1st ed.; Birkhäuser: Boston, MA, USA, 2004.
37. Cariow, A. Strategies for the synthesis of fast algorithms for the computation of the matrix-vector product. J. Signal Process. Theory Appl. 2014, 3, 1–19.
38. Polyakova, M.; Witenberg, A.; Cariow, A. The design of fast type-V discrete cosine transform algorithms for short-length input sequences. Electronics 2024, 13, 4165.
39. Polyakova, M.; Cariow, A. Fast type-II Hartley transform algorithms for short-length input sequences. Appl. Sci. 2024, 14, 10719.
40. Polyakova, M.; Cariow, A.; Sklyar, J. Fast algorithms for short-length odd-time and odd-frequency discrete Hartley transforms. Electronics 2025, 14, 996.
41. Ishwar, S.; Meher, P.K.; Swamy, M.N.S. Discrete Tchebichef transform—A fast 4 × 4 algorithm and its application in image/video compression. In Proceedings of the 2008 IEEE International Symposium on Circuits and Systems (ISCAS), Seattle, WA, USA, 18–21 May 2008.
42. Abramov, S.; Abramova, V.; Lukin, V.; Egiazarian, K. Prediction of signal denoising efficiency for DCT-based filter. Telecommun. Radio Eng. 2019, 78, 1129–1142.
43. Kocoń, S.; Piskorowski, J. Determination of non-zero initial conditions for IIR Notch filters using the vector projection method with minimum delay. Energies 2023, 16, 1702.
Figure 1. The data flow graph of the three-point DTT algorithm.
Figure 2. The data flow graph of the 4-point DTT algorithm.
Figure 3. The data flow graph of the 5-point DTT algorithm.
Figure 4. The data flow graph of the six-point 1D DTT algorithm.
Figure 5. The data flow graph of the 7-point 1D DTT algorithm.
Figure 6. The data flow graph of the eight-point 1D DTT algorithm.
Figure 7. Diagram of the memory consumption for real 1D DTT, which is implemented via a direct matrix-vector product (black bars) and the proposed algorithms (gray bars).
Table 1. The number of multiplications, additions, and shifts performed by the proposed algorithms and the direct matrix-vector products.

N | Direct method, Adds. | Direct method, Mults. | Proposed, Adds. | Proposed, Mults. | Proposed, Shifts
3 | 5 | 8 | 5 (0%) | 3 (−62%) | 1
4 | 12 | 8 | 9 (−25%) | 2 (−75%) | 5
5 | 18 | 23 | 19 (+5%) | 5 (−78%) | 6
6 | 30 | 36 | 27 (−10%) | 6 (−83%) | 15
7 | 37 | 44 | 37 (0%) | 7 (−84%) | 22
8 | 56 | 64 | 53 (−5%) | 8 (−87%) | 27
Table 2. The number of operations for the existing algorithms.

Algorithm | Reference, Year | N = 4: Mults. | N = 4: Adds. | N = 4: Shifts | N = 8: Mults. | N = 8: Adds. | N = 8: Shifts
Gogoi, Peesapati | [8], 2019 | 6 (−67%) | 8 (+12%) | 2 (+150%) | 29 (−72%) | 32 (+66%) | 12 (+125%)
Prattipati, Ishwar, Swamy, Meher | [1], 2013 | — | — | — | 8 (0%) | 44 (+20%) | 29 (−7%)
Ishwar, Meher, Swamy | [41], 2008 | 2 (0%) | 10 (−10%) | 4 (+25%) | — | — | —
Proposed algorithm | — | 2 | 9 | 5 | 8 | 53 | 27
Share and Cite

Cariow, A.; Polyakova, M. The Fast Discrete Tchebichef Transform Algorithms for Short-Length Input Sequences. Signals 2025, 6, 23. https://doi.org/10.3390/signals6020023