The Development of Fast DST-II Algorithms for Short-Length Input Sequences

The subject of this work is the development of fast algorithms for the discrete sine transform of the second type (DST-II) for short input sequences of length N = 2, 3, 4, 5, 6, 7, 8. The starting point is the well-known possibility of representing any discrete transform as a matrix–vector product. Owing to the structural properties of the DST-II base matrices, these matrices can be factorized, which reduces the computational complexity of the procedure as a whole. Matrices can be factorized in different ways; the art of designing fast algorithms is to find the factorization that yields the greatest reduction. We justified the correctness of the obtained algorithmic solutions theoretically, by strict mathematical derivation of each of them. The developed algorithms were then tested in MATLAB R2023b to confirm that they work correctly. Finally, we present estimates of the computational complexity of each solution and compare them with the direct method, which relies on the direct calculation of matrix–vector products.


Introduction
Discrete trigonometric transforms are widely used in modern computing systems for digital signal and image processing, including filtering and denoising, noisy speech enhancement, interpolation, video coding, etc. [1][2][3][4][5][6][7][8][9][10]. There are eight different types of discrete cosine transform and eight types of discrete sine transform [11]. The popularity of the discrete cosine transform is based on the fact that it closely approximates the optimal Karhunen–Loève transform (KLT) under a stationary first-order Markov condition with strong inter-pixel correlations. However, for low-correlation input signals, the discrete sine transform (DST) provides lower data rates [12][13][14]. Like other orthogonal transforms, the discrete sine transform is time-consuming to compute. Reducing this time is an important task, and it can be approached in two ways. One direction is the hardware implementation of calculations [15][16][17][18]; the other is the reduction of the number of arithmetic operations necessary to implement the transform. A large number of papers are devoted to the development of effective algorithms for various discrete cosine transforms (DCTs) and DSTs, but most of them search for universal solutions that reduce the number of arithmetic operations for arbitrary lengths of input data sequences [19][20][21][22][23][24][25]. There is also a third way: the approximation of discrete trigonometric transforms. To date, a large number of algorithms that use approximations of the DCT/DST have been developed.
Approximation algorithms for sequences of standard lengths N = 4, 8, and 16 are known [26][27][28][29][30]. However, the development of reduced-complexity algorithms for conventional small-size DCT/DST transforms remains relevant. Some applications require conventional DCT/DST transforms for various short-length input data sequences. This is because algorithms for small-size transforms can serve as kernels for the synthesis of larger algorithms [30,31]. A fairly large number of works have been devoted to small-size DCT algorithms, and much less attention has been paid to similar algorithms for the DST. Among the types of discrete trigonometric transforms, DCT-II/DST-II plays an important role [29][32][33][34][35]. Algorithms for small-size DCT-II were presented in one of our previous papers [36]. We did not find any small-size DST-II algorithms in the sources known to us. To fill this gap, we develop fast algorithms for low-dimensional discrete trigonometric transforms to expand their collection. This paper is devoted to reduced-complexity DST-II algorithms for input sequences of length N = 2, 3, 4, 5, 6, 7, 8.

Short Background
The discrete sine transform is one of the orthogonal transforms used, among others, for the analysis and processing of sounds and signals. DST-II can be represented by the following expression:

$$y_k = \sqrt{\frac{2}{N}}\,\varepsilon_k \sum_{n=0}^{N-1} x_n \sin\left[\frac{(2n+1)(k+1)\pi}{2N}\right], \quad k = 0, 1, \ldots, N-1,$$

where
• $\varepsilon_k = 1/\sqrt{2}$ for $k = N-1$ and equals 1 for the remaining $k$;
• $y_k$ is the output sequence after the DST-II operation is performed;
• $x_n$ is the sequence of input data;
• $N$ is the number of signal samples.

In matrix notation, DST-II can be represented as follows:

$$Y_{N\times 1} = C_N X_{N\times 1},$$

where $Y_{N\times 1} = [y_0, y_1, \ldots, y_{N-1}]^T$, $X_{N\times 1} = [x_0, x_1, \ldots, x_{N-1}]^T$, and the entries of the matrix $C_N$ are

$$c_{k,n} = \sqrt{\frac{2}{N}}\,\varepsilon_k \sin\left[\frac{(2n+1)(k+1)\pi}{2N}\right], \quad k, n = 0, 1, \ldots, N-1,$$

with $\varepsilon_k = 1/\sqrt{2}$ for $k = N-1$ and 1 for the remaining $k$.

In this paper, we use the following notation:
• ⊗ is the Kronecker product of two matrices;
• ⊕ is the direct sum of two matrices.

An empty cell in a matrix means it contains zero. We denote the multipliers as $s_m^{(N)}$, but we omit the superscript in the data flow graphs to keep them readable.
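As a minimal sketch of the direct method, the following Python code builds the DST-II matrix from the standard orthonormal definition, $c_{k,n} = \sqrt{2/N}\,\varepsilon_k \sin[(2n+1)(k+1)\pi/(2N)]$ with $\varepsilon_{N-1} = 1/\sqrt{2}$, and applies it as a plain matrix–vector product. The names `dst2_matrix` and `dst2_direct` are illustrative and do not come from the paper. The definition is assumed to be the one used here because it reproduces the value 0.7071 for N = 2 and the eight entries of magnitude 0.5 for N = 4 mentioned in the text.

```python
import math

def dst2_matrix(n):
    """Return the n x n orthonormal DST-II matrix as a list of rows."""
    scale = math.sqrt(2.0 / n)
    rows = []
    for k in range(n):
        # Extra factor 1/sqrt(2) only on the last row (k = n - 1).
        eps = math.sqrt(0.5) if k == n - 1 else 1.0
        rows.append([scale * eps * math.sin((2 * i + 1) * (k + 1) * math.pi / (2 * n))
                     for i in range(n)])
    return rows

def dst2_direct(x):
    """Direct DST-II: one full matrix-vector product."""
    c = dst2_matrix(len(x))
    return [sum(ckn * xn for ckn, xn in zip(row, x)) for row in c]

if __name__ == "__main__":
    for row in dst2_matrix(2):
        print([round(v, 4) for v in row])
```

The direct product costs on the order of N² multiplications, which is the baseline the fast algorithms are compared against.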

Algorithm for 2-Point DST-II
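The expression for two-point DST-II is as follows:

$$Y_{2\times 1} = C_2 X_{2\times 1},$$

where $Y_{2\times 1} = [y_0, y_1]^T$, $X_{2\times 1} = [x_0, x_1]^T$,

$$C_2 = \begin{bmatrix} a_2 & a_2 \\ a_2 & -a_2 \end{bmatrix}, \quad a_2 = 0.7071.$$

The expression for DST-II for N = 2 can be presented as follows:

$$Y_{2\times 1} = D_2 H_2 X_{2\times 1},$$

where

$$H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad D_2 = \mathrm{diag}\left(s_0^{(2)}, s_1^{(2)}\right), \quad s_0^{(2)} = s_1^{(2)} = a_2.$$

Figure 1 shows a data flow graph of the synthesized algorithm for the two-point DST-II. As can be seen, the number of multiplications is reduced from 4 to 2, while the number of additions remains 2, the same as in the direct method.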
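The two-point case can be sketched in a few lines of Python. The sketch assumes that every entry of $C_2$ has magnitude $a_2 = 1/\sqrt{2} \approx 0.7071$, so that one add/subtract butterfly followed by two scalings is enough; the function name `dst2_2pt` is illustrative.

```python
import math

A2 = math.sqrt(0.5)  # a_2 = 0.7071..., the common magnitude of all entries of C_2

def dst2_2pt(x0, x1):
    """Two-point DST-II with 2 additions and 2 multiplications."""
    t0 = x0 + x1          # butterfly: sum
    t1 = x0 - x1          # butterfly: difference
    return A2 * t0, A2 * t1

if __name__ == "__main__":
    print(dst2_2pt(1.0, 2.0))
```

The direct product $C_2 X$ would instead multiply each of the four matrix entries by an input sample.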

Algorithm for 3-Point DST-II
The expression for three-point DST-II is $Y_{3\times 1} = C_3 X_{3\times 1}$, where $Y_{3\times 1} = [y_0, y_1, y_2]^T$ and $X_{3\times 1} = [x_0, x_1, x_2]^T$. We decompose the matrix $C_3$ into two components, $C_3 = C_3^{(a)} + C_3^{(b)}$. After eliminating redundancy in the matrix $C_3^{(b)}$ and removing the rows and columns that contain only zero entries, we obtain the matrix $C_2$. Thanks to the structural properties of these matrices noted above, the final computational procedure for the three-point DST-II can be derived.

Figure 2 shows a data flow graph of the synthesized algorithm for the three-point DST-II. As can be seen, the number of multiplications is reduced from 9 to 4, while the number of additions remains 5, the same as in the direct method.
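To illustrate how the symmetry of $C_3$ can be exploited, here is a small Python sketch that regroups the three-point transform so that only four multiplications remain. It is derived directly from the standard DST-II definition and is not claimed to be the factorization shown in Figure 2; in particular, its addition count need not match the one quoted above. The constant names `A`, `B`, `C`, `D` and the function name `dst2_3pt` are hypothetical.

```python
import math

SCALE = math.sqrt(2.0 / 3.0)
A = SCALE * math.sin(math.pi / 6)   # entries (0,0) and (0,2), about 0.4082
B = SCALE                           # entry (0,1), about 0.8165
C = SCALE * math.sin(math.pi / 3)   # entries (1,0) and -(1,2), about 0.7071
D = math.sqrt(1.0 / 3.0)            # magnitude of every entry in the last row

def dst2_3pt(x0, x1, x2):
    """Three-point DST-II using four multiplications."""
    t = x0 + x2                 # shared by y0 and y2
    u = x0 - x2                 # needed only by y1 (middle entry of row 1 is zero)
    y0 = A * t + B * x1
    y1 = C * u
    y2 = D * (t - x1)
    return y0, y1, y2

if __name__ == "__main__":
    print(dst2_3pt(1.0, 2.0, 3.0))
```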

Algorithm for 4-Point DST-II
The expression for four-point DST-II is $Y_{4\times 1} = C_4 X_{4\times 1}$, where $Y_{4\times 1} = [y_0, y_1, y_2, y_3]^T$ and $X_{4\times 1} = [x_0, x_1, x_2, x_3]^T$. We permute the columns of $C_4$ according to $\pi_4^{(0)}$ and its rows according to $\pi_4^{(1)}$. After the permutations, the matrix acquires a structure that allows effective factorization, which reduces the number of arithmetic operations when calculating matrix–vector products [37].
In this work, we preserve designations of matrices
Taking this into account, we can derive the final expression for the four-point DST-II. Figure 3 shows a data flow graph of the synthesized algorithm for four-point DST-II. As can be seen, the number of multiplications is reduced from 8 to 3 and the number of additions from 12 to 9.
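The counts above can be reproduced with a short Python sketch. It is derived from the standard DST-II definition rather than from the permutations $\pi_4^{(0)}$ and $\pi_4^{(1)}$, so it should be read as one possible scheme with 3 multiplications and 9 additions, not necessarily the data flow graph of Figure 3. Multiplications by 0.5 are treated as shifts and not counted, following the counting rule used in this paper. All names are illustrative.

```python
import math

P = math.sqrt(0.5) * math.sin(math.pi / 8)       # about 0.2706
Q = math.sqrt(0.5) * math.sin(3 * math.pi / 8)   # about 0.6533
Q_MINUS_P = Q - P                                # precomputed constants
Q_PLUS_P = Q + P

def dst2_4pt(x):
    """Four-point DST-II: 3 multiplications, 9 additions, 2 halvings."""
    x0, x1, x2, x3 = x
    t0, t1 = x0 + x3, x1 + x2        # 2 additions (symmetric part)
    u0, u1 = x0 - x3, x1 - x2        # 2 additions (antisymmetric part)
    y1 = 0.5 * (u0 + u1)             # 1 addition, halving is a shift
    y3 = 0.5 * (u0 - u1)             # 1 addition, halving is a shift
    z = Q * (t0 + t1)                # 1 addition, 1 multiplication
    y0 = z - Q_MINUS_P * t0          # equals P*t0 + Q*t1
    y2 = z - Q_PLUS_P * t1           # equals Q*t0 - P*t1
    return [y0, y1, y2, y3]

if __name__ == "__main__":
    print(dst2_4pt([1.0, 2.0, 3.0, 4.0]))
```

The pair (y0, y2) is a rotation-like 2×2 product, which is why three multiplications suffice instead of four.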

Algorithm for 5-Point DST-II
The expression for five-point DST-II is $Y_{5\times 1} = C_5 X_{5\times 1}$, where $Y_{5\times 1} = [y_0, y_1, y_2, y_3, y_4]^T$ and $X_{5\times 1} = [x_0, x_1, x_2, x_3, x_4]^T$. We decompose the matrix $C_5$ into two components, $C_5 = C_5^{(a)} + C_5^{(b)}$. The matrix $C_5^{(a)}$ has one entry in the first and third rows and five entries with the same value in the fifth row, which means that the number of operations is small and no further transformations are needed for this matrix. After the column and row permutations, the remaining matrix matches the matrix pattern. Considering the structures of the resulting matrices, the final computational procedure can be derived; among its multipliers, $s_6^{(5)} = e_5$, where $e_5$ denotes an entry of $C_5$.

Figure 4 shows a data flow graph of the synthesized algorithm for five-point DST-II. As can be seen, the number of multiplications is reduced from 23 to 7 and the number of additions from 18 to 17.
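The decomposition step can be checked numerically. The sketch below assumes that, for odd N, the first component collects the middle column and the last row of $C_N$, which is consistent with the description given for N = 5 (one entry in the first and third rows, five equal-magnitude entries in the fifth row). The names `split_odd` and `reduced` are hypothetical, and the split is inferred from the text rather than copied from the paper's matrices.

```python
import math

def dst2_matrix(n):
    scale = math.sqrt(2.0 / n)
    return [[scale * (math.sqrt(0.5) if k == n - 1 else 1.0)
             * math.sin((2 * i + 1) * (k + 1) * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def split_odd(n):
    """Split C_n (n odd) into C_a (middle column + last row) and C_b (the rest)."""
    c = dst2_matrix(n)
    mid, last = n // 2, n - 1
    ca = [[0.0] * n for _ in range(n)]
    cb = [[0.0] * n for _ in range(n)]
    for k in range(n):
        for i in range(n):
            target = ca if (k == last or i == mid) else cb
            target[k][i] = c[k][i]
    return c, ca, cb

def reduced(cb):
    """Drop the all-zero last row and middle column of C_b."""
    n = len(cb)
    return [[cb[k][i] for i in range(n) if i != n // 2] for k in range(n - 1)]

if __name__ == "__main__":
    _, ca, cb = split_odd(5)
    for row in ca:
        print([round(v, 4) for v in row])
    print(len(reduced(cb)), "x", len(reduced(cb)[0]))
```

For N = 5 the reduced second component is a 4 × 4 matrix, which is why the four-point machinery can be reused.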

Algorithm for 6-Point DST-II
The expression for six-point DST-II is $Y_{6\times 1} = C_6 X_{6\times 1}$, where $Y_{6\times 1} = [y_0, y_1, y_2, y_3, y_4, y_5]^T$ and $X_{6\times 1} = [x_0, x_1, x_2, x_3, x_4, x_5]^T$. We decompose the matrix $C_6$ into two components, $C_6 = C_6^{(a)} + C_6^{(b)}$. The matrix $C_6^{(a)}$ has two entries of the same value in the first, second, and fifth rows and six entries with the same value in the third and sixth rows, which allows us to reduce the number of operations without the need for further transformations.

After eliminating redundancy in the matrix $C_6^{(b)}$ and removing rows and columns containing only zero entries, we obtain the matrix $C_4$. We define the permutation $\pi_4^{(2)}$ and permute the columns of $C_4$ according to $\pi_4^{(0)}$ and its rows according to $\pi_4^{(2)}$. After the permutation, the matrix matches the matrix pattern. Taking this into account, we can derive the final expression for the six-point DST-II.

Figure 5 shows a data flow graph of the synthesized algorithm for six-point DST-II. As can be seen, the number of multiplications is reduced from 30 to 7 and the number of additions from 28 to 25.

Algorithm for 7-Point DST-II
The expression for seven-point DST-II is $Y_{7\times 1} = C_7 X_{7\times 1}$, where $Y_{7\times 1} = [y_0, y_1, y_2, y_3, y_4, y_5, y_6]^T$ and $X_{7\times 1} = [x_0, x_1, x_2, x_3, x_4, x_5, x_6]^T$. We decompose the matrix $C_7$ into two components, $C_7 = C_7^{(a)} + C_7^{(b)}$. The matrix $C_7^{(a)}$ has one entry in the first, third, and fifth rows and seven entries with the same value in the seventh row, which allows us to reduce the number of operations without the need for further transformations. After eliminating redundancy in the matrix $C_7^{(b)}$ and removing rows and columns containing only zero entries, we obtain the matrix $C_6$. We define the permutations $\pi_6^{(0)}$ and $\pi_6^{(1)}$ and permute the columns of $C_6$ according to $\pi_6^{(1)}$ and its rows according to $\pi_6^{(0)}$. After the permutation, the matrix matches a pattern composed of the matrices $A_3$ and $B_3$.

We now deal with the matrices $A_3$ and $B_3$. In this case, a circular convolution matrix is used [38]. The calculation procedure for the circular convolution matrix for N = 3 involves the matrix $T_{3\times 4}$ and the diagonal matrix $D_4 = \mathrm{diag}(s_0, s_1, s_2, s_3)$. To make the matrices $A_3$ and $B_3$ consistent with the circular convolution expression, we need to modify them. In the matrix $A_3$, we change the sign of all terms in the second column and third row. In the matrix $B_3$, we change the sign in the first row and first column. In this way, we obtain the matrices $A_3'$ and $B_3'$. Using the three-point convolution algorithm, the values $s_i$ for the matrices $A_3'$ and $B_3'$ are obtained as combinations of the entries $a_7$, $c_7$, and $e_7$ of $C_7$.

Considering the presented derivations, the final computational procedure for the seven-point DST-II can be derived. Figure 6 shows a data flow graph of the synthesized algorithm for seven-point DST-II. As can be seen, the number of multiplications is reduced from 46 to 10 and the number of additions from 39 to 37.
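For orientation, a three-point circular convolution with a fixed kernel can be computed with four multiplications, which matches the four entries of the diagonal matrix $D_4$. The Python sketch below uses a standard Winograd-style construction (reduction modulo $z - 1$ and $z^2 + z + 1$); it is an assumption that the algorithm of [38] is equivalent to it, and the pre- and post-addition matrices used in the paper may be arranged differently. The names `make_cconv3` and `cconv3` are hypothetical.

```python
def make_cconv3(h):
    """Precompute 4 constants for y = h (*) x, a circular convolution of length 3."""
    h0, h1, h2 = h
    b0, b1 = h0 - h2, h1 - h2
    s0 = (h0 + h1 + h2) / 3.0
    s1 = b0 / 3.0
    s2 = b1 / 3.0
    s3 = (b0 - b1) / 3.0

    def cconv3(x):
        x0, x1, x2 = x
        a0, a1 = x0 - x2, x1 - x2
        # The only four multiplications: one per precomputed constant.
        m0 = s0 * (x0 + x1 + x2)
        m1 = s1 * a0
        m2 = s2 * a1
        m3 = s3 * (a0 - a1)
        c0 = m1 - m2
        c1 = m1 - m3
        # Post-additions (doubling is an addition, not a multiplication).
        return (m0 + (c0 + c0) - c1,
                m0 - c0 + (c1 + c1),
                m0 - c0 - c1)

    return cconv3

if __name__ == "__main__":
    conv = make_cconv3((1.0, 2.0, 3.0))
    print(conv((4.0, 5.0, 6.0)))
```

Here the output is defined as $y_k = \sum_n h_{(k-n) \bmod 3}\, x_n$; a direct evaluation would need nine multiplications.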

Algorithm for 8-Point DST-II
The expression for eight-point DST-II is $Y_{8\times 1} = C_8 X_{8\times 1}$, where $Y_{8\times 1} = [y_0, y_1, y_2, y_3, y_4, y_5, y_6, y_7]^T$ and $X_{8\times 1} = [x_0, x_1, x_2, x_3, x_4, x_5, x_6, x_7]^T$. After an initial transformation of $C_8$, the calculation procedure involves the matrices $A_4$ and $B_4$, which we deal with next. The matrix $A_4$ does not fit any pattern, so we need to modify it. We do this by changing the sign of the third column, which gives the matrix $A_4'$. We permute the columns of $A_4'$ according to $\pi_4^{(0)}$ and its rows according to $\pi_4^{(3)}$; after this, $A_4'$ fits the pattern. We then permute the columns of $B_4$ according to $\pi_4^{(0)}$, which allows the matrix pattern to be used for $B_4$ as well. After this operation, the final calculation procedure is obtained; its factors include the matrices $P_{10\times 8}^{(1)}$ and $P_8^{(2)}$.

Discussion of Computational Complexity
First, we explain how the numbers of multiplications and additions are counted for the direct DST-II calculation method and for the proposed solutions. A multiplication by any number that is a power of two can be replaced by a shift and is therefore not counted. If a value is zero, we count neither an addition nor a multiplication for it.
Such values appear in the following matrices: $C_3$: one zero; $C_4$: eight values of 0.5; $C_5$: two zeros; $C_6$: two zeros and four values of 0.5; $C_7$: three zeros. In the proposed solutions, the diagonal matrices contain the following: $D_5$: two values of 0.5; $D_8$: one value of 0.5.
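These counting rules are easy to apply mechanically. The following Python sketch counts the operations of the direct method under the two rules just stated (zero entries cost nothing, powers of two cost a shift instead of a multiplication). It assumes the standard orthonormal DST-II matrix; the function names and the numerical tolerances are illustrative choices.

```python
import math

def dst2_matrix(n):
    scale = math.sqrt(2.0 / n)
    return [[scale * (math.sqrt(0.5) if k == n - 1 else 1.0)
             * math.sin((2 * i + 1) * (k + 1) * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]

def is_power_of_two(v, tol=1e-9):
    """True if the positive value v equals 2**e for some integer e (within tol)."""
    e = math.log2(v)
    return abs(e - round(e)) < tol

def direct_cost(n, zero_tol=1e-12):
    """Return (multiplications, additions) of the direct matrix-vector product."""
    mults = adds = 0
    for row in dst2_matrix(n):
        nonzero = [abs(v) for v in row if abs(v) > zero_tol]
        mults += sum(1 for v in nonzero if not is_power_of_two(v))
        adds += max(len(nonzero) - 1, 0)
    return mults, adds

if __name__ == "__main__":
    for n in (4, 5, 6, 7):
        print(n, direct_cost(n))
```

For N = 4, 5, 6, and 7 this gives the direct-method counts quoted in the corresponding sections (8/12, 23/18, 30/28, and 46/39 multiplications/additions). For N = 3 the same rule yields 8 multiplications, so the figure of 9 quoted in the three-point section appears to include the zero entry.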
This work shows how the number of multiplications in DST-II algorithms of sizes 2 to 8 can be reduced; the number of additions is also reduced, though less markedly. On average, the number of additions was reduced by 20% and the number of multiplications by 74%. The results are presented in Table 1, which was compiled using the counting rules above. This allows for a significant reduction in the resources used on the signal processor while speeding up computation and making real-time operation easier. The significant reduction in multiplications contributes most to this, because multiplications are more expensive than additions.
Each proposed algorithm has been implemented in the MATLAB environment and we are confident that they all work correctly.

Conclusions
To date, many papers have already been published concerning the development of fast algorithms for implementing discrete trigonometric transforms [1,2,11,20,27].These studies have not lost their relevance today.The presented article is a continuation of these studies.For well-known reasons, we have focused on developing fast algorithms for short sequences of input data.Generally speaking, small-sized, fast discrete trigonometric transform algorithms are of particular interest because they are subsequently used as building blocks for larger-sized algorithms.We plan to collect a library of fast short-length algorithms for all types of discrete trigonometric transforms.For some types of trigonometric transformations, such as DCT-I, DCT-II, and DCT-IV (as well as some others), such algorithms have already been developed [32,36,37].The subject of our research is fast algorithms for small-sized DST-II transforms.The solutions presented here are intended to replenish the collection of fast discrete trigonometric transformation algorithms that many researchers have been working on for several decades.We present here the new algorithms we have obtained, without, however, claiming that they are optimal.This is what we managed to obtain, and we want to share our solutions with the scientific community.If someone manages to achieve better results, we will only be happy about it.

Figure 1. The data flow graph of the proposed algorithm for computation of two-point DST-II.

Figure 2. The data flow graph of the proposed algorithm for computation of three-point DST-II.
Now, we need to change the order of columns and rows. Let us define the permutations π

Figure 3. The data flow graph of the proposed algorithm for computation of four-point DST-II.
After eliminating redundancy in the matrix $C_5^{(b)}$ and removing rows and columns containing only zero entries, we obtain the matrix $C_4$. We permute the columns of $C_4$ according to $\pi_4^{(0)}$ and its rows according to $\pi_4^{(1)}$.

Figure 4. The data flow graph of the proposed algorithm for computation of five-point DST-II.

Figure 5. The data flow graph of the proposed algorithm for computation of six-point DST-II.

Figure 6. The data flow graph of the proposed algorithm for computation of seven-point DST-II.

Figure 7. The data flow graph of the proposed algorithm for computation of eight-point DST-II.

Table 1. Comparison of the direct method with the proposed solutions.