Article

Fast Discrete Krawtchouk Transform Algorithms for Short-Length Input Sequences

by Marina Polyakova 1,*, Aleksandr Cariow 2,* and Janusz P. Papliński 2,*

1 Institute of Computer Systems, Odesa Polytechnic National University, 65044 Odesa, Ukraine
2 Faculty of Computer Science and Information Technology, West Pomeranian University of Technology in Szczecin, 71-210 Szczecin, Poland
* Authors to whom correspondence should be addressed.
Electronics 2025, 14(19), 3958; https://doi.org/10.3390/electronics14193958
Submission received: 10 September 2025 / Revised: 2 October 2025 / Accepted: 6 October 2025 / Published: 8 October 2025
(This article belongs to the Section Circuit and Signal Processing)

Abstract

This paper presents new fast discrete Krawtchouk transform (DKT) algorithms for input sequences of length 3 to 8. Small-sized DKT algorithms can be utilized in image processing applications to extract local image features formed by a sliding spatial window, and they can also serve as building blocks for developing larger-sized algorithms. Existing strategies to reduce the computational complexity of DKT mainly focus on modifying the recurrence relations for Krawtchouk polynomials, dividing the input signals into blocks or layers, or using different methods to approximate the coefficient values. Algorithms developed using the first two strategies are computationally intensive, which introduces a significant time delay in the computation process. Algorithms based on the approximation of polynomial coefficient values reduce computation time but at the expense of reduced accuracy. We use a different approach based on reducing the block structure of the matrix to one of the previously developed block-structural patterns, which allows us to factorize the resulting matrix in such a way that it leads to a reduction in the computational complexity of the synthesized algorithm. We describe the algorithmic solutions we have obtained through data flow graphs. The proposed DKT algorithms reduce the number of multiplications, additions, and shifts by an average of 58%, 27%, and 68%, respectively, compared to the direct computation of DKT via matrix-vector product. These characteristics were averaged across the considered input sizes (from 3 to 8).

1. Introduction

Currently, in signal and image processing, transforms based on orthogonal polynomials are widely used [1]. In particular, the discrete Tchebichef transform is applied in image compression and video coding [2,3], as well as in text analysis [4]. Hahn moments are utilized for image classification [5,6], while Legendre moments are employed in image analysis and reconstruction [7,8]. Image moments obtained based on orthogonal polynomials are projections of an image function onto a polynomial basis. They capture the shape properties of image objects. That is why such moments are applied as local or global image features in image processing and computer vision [1,9]. One such feature set is the Krawtchouk moments, which are used in image watermarking [10,11,12,13], image analysis and reconstruction [6,14,15,16,17], face recognition [18], edge detection [19], data hiding [20], single-pixel imaging [21,22], and other applications. Krawtchouk moments can be extracted using the discrete Krawtchouk transform (DKT), which computes a weighted sum of signal samples or pixel intensities using Krawtchouk polynomials of various orders. Like other transforms based on orthogonal polynomials, the DKT has the properties of linearity and orthonormality and represents the signal in the spectral domain. Owing to the existence of the inverse DKT, complete reconstruction of a signal is possible.
In image processing, the use of the DKT often requires reducing computational complexity, especially in real-time or high-performance applications, such as three-dimensional (3D) object reconstruction [17] or face recognition [18]. This paper addresses the issue of saving arithmetic operations for one-dimensional DKT. Unlike trigonometric transforms, the elements of the DKT matrices depend not only on the discrete independent variable x and the degree n of Krawtchouk polynomials, which serve as the indices of these elements, but also on the localization parameter p. This parameter affects both the structure of the DKT matrices and the values of their elements. Therefore, the computational complexity of the DKT has often been reduced by applying algebraic transformations to the recursive formulas used to calculate Krawtchouk polynomial values for different p [23,24].
At the same time, the DKT algorithms for short-length input sequences can significantly decrease the time consumption of the local feature extraction in image processing and computer vision applications. Additionally, such algorithms can be exploited as building blocks for designing fast algorithms for large data sequences. For example, the Kronecker product of several smaller orthogonal matrices can be used to form a transform matrix for processing large data sequences when reducing the overall system complexity is crucial [25,26].
In this article, we consider a relevant problem of constructing fast DKT algorithms for short-length input sequences. To determine the appropriate strategy for developing these algorithms, related papers are reviewed.

1.1. State-of-the-Art of the Problem

As already mentioned, the DKT requires efficient computations because it is extensively used in image processing to obtain global and local image features. However, calculating Krawtchouk polynomials and their corresponding moments requires evaluating hypergeometric functions, which is computationally intensive. To address this issue, two strategies have been proposed in the literature to reduce the computational complexity [23].
The first strategy concentrates on computing the moment kernels [27,28]. These kernel values represent the elements of the DKT matrices and are derived from orthogonal polynomials evaluated at different arguments and orders (degrees). Recursive relations are employed to calculate the polynomial values instead of directly computing the polynomials based on hypergeometric functions [29]. Additionally, digital filter design is used for this purpose. In [9], Krawtchouk moments were obtained from the outputs of cascaded digital filters. The first filter generated Krawtchouk moments through geometric moments, while the second filter computed the Krawtchouk moments directly. Applying these filters to a real image of size 128 × 128 pixels reduced processing time by 57–87% compared to using recurrence relations.
The second strategy assumes that the moment kernels have been evaluated previously. The input signal is represented by a set of intensity slices or blocks to reduce the computational complexity of the Krawtchouk moment calculation compared to the whole signal [30,31]. For instance, based on this strategy, studies in [17,31] introduced an algorithm for computing 3D Krawtchouk moments that utilizes auxiliary matrices along the depth axis of the object. This approach significantly reduces both the computational complexity and the time needed for 3D object reconstruction.
In [23], the two strategies were combined into a unified algorithm that computes the moment functions through recursive relations for the Krawtchouk polynomials. The summations over image pixel intensities or signal samples use Clenshaw’s algorithm. The algorithm has demonstrated a reduction in computational complexity compared to existing recursion methods and fast algorithms for calculating Krawtchouk moments.
Besides the two previously mentioned strategies, the authors of the paper [32] proposed a fast algorithm for computing the 4 × 4 DKT by utilizing the reduced number of distinct basis function values. The resulting algorithm requires only 8 multiplications, 80 additions, and 32 shifts. Consequently, it reduces the number of multiplications by 98%, 88%, and 83% compared to the direct method, the DKT recurrence relations, and the representation by cascaded digital filters, respectively.
Thus, the brief review above has uncovered strategies for reducing the computational complexity of the DKT. Next, we will identify the limitations of existing algorithms and outline the main contribution of this research.

1.2. The Main Contributions of the Paper

The literature review has shown that, in accordance with the first strategy, the recursive relations exploit the symmetry of the Krawtchouk polynomials, reducing the computational cost compared to direct calculation using closed-form expressions. This leads to faster evaluation, especially for high-order polynomials and large transform sizes. Recursive algorithms can handle polynomials of much higher orders than direct methods, enabling the feature extraction from large datasets for signal/image processing applications [23,27].
However, recursive relations can still encounter issues with numerical stability and the spread of rounding errors, especially for very high polynomial orders or extreme parameter values [23]. If the recursive scheme is not carefully designed, numerical errors may accumulate, reducing the accuracy of the transform in certain parameter ranges. Some recursive algorithms require the calculation of initial values and use multiple recurrence relations, increasing implementation complexity. As a result, although they lower computational complexity, it remains significant and can still limit the use of these methods in real-time or embedded systems unless further improvements or parallel processing are implemented [9,27,28].
In accordance with the second strategy, decomposing a grayscale image into several binary intensity slices allows describing the image as a set of homogeneous rectangular blocks. The computation of moment kernels via intensity slices enhances the capability of Krawtchouk moments to capture local image features with reduced computational complexity and improved scalability for large images. As a result, pattern recognition performance improves compared to global moment calculations. The computational complexity can be reduced by handling fewer pixels per slice through block representations, making DKT suitable for real-time applications. Moreover, computations on individual slices can be parallelized [29,30,31].
However, the decomposition into multiple intensity slices also has drawbacks. The overhead of forming and handling the slices grows with their number, which can offset some of the speed advantages for smaller images or fewer slices. The choice of the number of slices and block sizes must be made carefully, as overly fine slicing can increase processing time [17,30,31]. Computing the DKT for image or signal slices may still be resource-intensive and time-consuming compared to some recursive or fast transform algorithms, especially for large images or when high-order moments are calculated.
To address the mentioned drawbacks, we propose using a structural approach to develop fast DKT algorithms [33]. Over time, we have refined this method and successfully applied it to create efficient algorithms for discrete trigonometric transforms [34].
The difference in the structural approach to constructing fast algorithms is that the resulting matrix factorization does not rely on the specific properties of the transforms themselves. Instead, it depends solely on the structure of the transformation matrix, specifically the repetition and arrangement of its elements. This enables the structural approach to be used for factorizing matrices of various transforms. The core idea is that, initially, preprocessing of the transform matrices involves permuting rows and columns, as well as changing the signs of some elements within certain rows and columns. Afterward, a correspondence is established between submatrices of the resulting matrix and the templates defined in [33]. These templates are matrix patterns for which the results of factorization are detailed in [33]. Finally, the factorizations of the submatrices are combined to form the factorization of the original transform matrix. Based on this matrix factorization, a fast transform algorithm is then constructed as a data flow graph and subsequently expressed as pseudocode.
Unlike the first strategy of reducing the DKT computational complexity, which cuts down the complexity of calculating the transform matrix, the structural approach decreases the number of arithmetic operations needed to compute the matrix-vector product of the transform. The structural approach involves partitioning not the data but the transform matrix into blocks or layers. This is the key difference between the structural approach and the second strategy for reducing the DKT computational complexity. Finally, unlike the research in [32], the structural approach enables the construction of fast DKT algorithms not only for transform matrices of size 4 × 4.
In this paper, initially, the DKT matrices for small-sized input sequences were obtained using the definition of the Krawtchouk polynomials via the hypergeometric function [9]. Due to the small size of the input sequences, this step is computationally efficient, and it enables the accurate calculation of polynomial values. After that, the factorizations of the DKT matrices were obtained by applying two techniques, specifically, the technique based on the structural approach [33] and the technique using the symmetry property of orthogonal polynomials [35]. The resulting factorizations of the DKT matrices were applied to construct the data flow graphs of the fast DKT algorithms. Next, to compare computational complexity, we counted the number of multiplications and additions required by the algorithms based on the structural approach and by those based on the symmetry property of the Krawtchouk polynomials. The algorithms with the smaller number of arithmetic operations are the ones we put forward as the proposed fast DKT algorithms. The main contributions of this research are as follows.
1. After choosing the suitable sizes of the DKT matrices, their factorizations into sparse and diagonal matrices are developed for input sequence lengths in the range from 3 to 8. The correctness of the obtained factorizations of the DKT matrices was confirmed mathematically and with a MATLAB 2024a implementation.
2. The fast DKT algorithms are constructed with the data flow graphs. Each path from the input vertex to the output vertex involves only one multiplication, reducing both processing time and resource usage.
3. The obtained algorithms for small-sized DKT were generalized to the case of long input sequences based on the symmetry property of the Krawtchouk polynomials.
The paper is organized as follows. In Section 1, the problem of reducing the computational complexity of DKT and the research aim are presented. Notations and a mathematical background are introduced in Section 2. The fast DKT algorithms are designed for N in the range from 3 to 8 in Section 3. The results of the research are discussed in Section 4 and Section 5. In Section 6, we provide the conclusions. Based on the data flow graphs of the fast DKT algorithms, in Appendix A we design the pseudocode suitable for software implementation.

2. Short Background

We can express the 1D DKT as follows [17,18,19,20]:

$y_n = \sum_{x=0}^{N-1} f_x K_n(x;\, p,\, N-1), \quad n = 0, 1, \ldots, N-1,$ (1)

where $y_n$ is the output signal after the direct DKT, $K_n(x;\, p,\, N-1)$ is a kernel of the DKT, $f_x$ is the input signal, and $N$ is the number of signal samples. The localization parameter $p \in (0, 1)$ determines the position and displacement of the Krawtchouk polynomials along the input sequence. In this way, specific signal features can be extracted. If $p < 0.5$ ($p > 0.5$), then the polynomials are shifted toward the beginning (end) of the input sequence. We assume that $p = 0.5$ because in this case the polynomials are centered with respect to the input sequence. Moreover, for $p = 0.5$, the structure of the DKT matrices is suitable for their factorization and for extracting the centered features of the input sequence.
The kernel of the DKT is the Krawtchouk polynomial of degree $n$:

$K_n(x;\, p,\, N-1) = {}_2F_1(-n, -x;\, -N+1;\, 1/p)\, \sqrt{(-1)^n (-N+1)_n \binom{N-1}{x}\, p^{\,n+x} (1-p)^{N-x-n-1} / n!}\,,$ (2)

where $x, n = 0, 1, \ldots, N-1$. The hypergeometric function ${}_2F_1(a, b;\, c;\, z)$ is defined as ${}_2F_1(a, b;\, c;\, z) = \sum_{k=0}^{\infty} \frac{(a)_k (b)_k z^k}{(c)_k\, k!}$, where $(a)_k$ is the Pochhammer symbol, defined as $(a)_0 = 1$, $(a)_k = a(a+1)(a+2)\cdots(a+k-1)$, $k \geq 1$ [20].
The Krawtchouk polynomials satisfy the orthogonality property, specifically [20]:

$\sum_{x=0}^{N-1} K_n(x;\, p,\, N-1)\, K_m(x;\, p,\, N-1) = \delta_{nm}, \quad n, m = 0, 1, \ldots, N-1,$ (3)

where $\delta_{nm}$ denotes the Kronecker delta, i.e., $\delta_{nm} = 1$ if $n = m$ and $\delta_{nm} = 0$ otherwise.
As a consequence of the orthogonality, the inverse Krawtchouk transform was defined as:

$f_x = \sum_{n=0}^{N-1} y_n K_n(x;\, p,\, N-1), \quad x = 0, 1, \ldots, N-1.$ (4)
In matrix notation, the DKT is defined as follows [23,31]:

$\mathbf{Y}_{N\times 1} = \mathbf{C}_N \mathbf{F}_{N\times 1},$ (5)

where $\mathbf{Y}_{N\times 1} = [y_0, y_1, \ldots, y_{N-1}]^{\mathrm{T}}$, $\mathbf{F}_{N\times 1} = [f_0, f_1, \ldots, f_{N-1}]^{\mathrm{T}}$, and

$\mathbf{C}_N = \begin{bmatrix} K_0(0;\, p,\, N-1) & K_0(1;\, p,\, N-1) & \cdots & K_0(N-1;\, p,\, N-1) \\ K_1(0;\, p,\, N-1) & K_1(1;\, p,\, N-1) & \cdots & K_1(N-1;\, p,\, N-1) \\ \vdots & \vdots & \ddots & \vdots \\ K_{N-1}(0;\, p,\, N-1) & K_{N-1}(1;\, p,\, N-1) & \cdots & K_{N-1}(N-1;\, p,\, N-1) \end{bmatrix}.$
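For illustration, the matrix $\mathbf{C}_N$ can be computed directly from Equation (2). The following MATLAB sketch is ours, not the authors' published code (the function name and structure are our own); it builds $\mathbf{C}_N$ for $p = 0.5$ by evaluating the finite hypergeometric sum:

```matlab
function C = dkt_matrix(N)
% DKT_MATRIX  Orthonormal DKT matrix C_N for p = 0.5, built from Equation (2).
p = 0.5;  M = N - 1;
poch = @(a, k) prod(a + (0:k-1));           % Pochhammer symbol (a)_k, (a)_0 = 1
C = zeros(N);
for x = 0:M
    w = nchoosek(M, x) * p^x * (1-p)^(M-x);              % binomial weight w(x)
    for n = 0:M
        rho = (-1)^n * ((1-p)/p)^n * factorial(n) / poch(-M, n);
        K = 0;                               % 2F1(-n,-x;-M;1/p) is a finite sum
        for k = 0:n
            K = K + poch(-n,k) * poch(-x,k) * (1/p)^k / (poch(-M,k) * factorial(k));
        end
        C(n+1, x+1) = K * sqrt(w / rho);     % weighted (orthonormal) kernel
    end
end
end
```

For example, `C = dkt_matrix(4)` reproduces the matrix $\mathbf{C}_4$ of Section 3.2, and `norm(C*C' - eye(4))` is of the order of machine precision, in accordance with Equation (3).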
In this paper, we use the following notations:
  • I N is an order N identity matrix;
  • H 2 is a 2 × 2 Hadamard matrix;
  • 1 N × M   is an N × M matrix of ones (a matrix where every element is equal to one);
  • ⊗ is the Kronecker product of two matrices;
  • ⊕ is the direct sum of two matrices;
  • an empty cell in a matrix means it contains zero;
  • the multipliers are denoted by $s_k^{(N)}$.
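For readers implementing the factorizations below, the two matrix operations map directly onto standard MATLAB primitives; a minimal illustration (ours, not from the paper):

```matlab
H2 = [1 1; 1 -1];          % 2 x 2 Hadamard matrix
W  = kron(H2, eye(3));     % Kronecker product H2 (x) I3: one butterfly stage
S  = blkdiag(H2, H2, 1);   % direct sum H2 (+) H2 (+) 1, as used for W5 below
```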

3. The DKT Algorithms with Reduced Complexity for Short-Length Input Sequences

3.1. Algorithm for the 3-Point DKT

Let us express the three-point DKT as a matrix-vector product:

$\mathbf{Y}_{3\times 1} = \mathbf{C}_3 \mathbf{X}_{3\times 1},$ (6)

where $\mathbf{Y}_{3\times 1} = [y_0, y_1, y_2]^{\mathrm{T}}$, $\mathbf{X}_{3\times 1} = [x_0, x_1, x_2]^{\mathrm{T}}$,

$\mathbf{C}_3 = \begin{bmatrix} a_3 & b_3 & a_3 \\ b_3 & 0 & -b_3 \\ a_3 & -b_3 & a_3 \end{bmatrix}$

with $a_3 = 0.5$ and $b_3 = 0.7071$.

To change the order of the columns of the matrix $\mathbf{C}_3$, the permutation $\pi_1 = \begin{pmatrix} 1 & 2 & 3 \\ 1 & 3 & 2 \end{pmatrix}$ is introduced. Then the permutation matrix is

$\mathbf{P}_3 = \begin{bmatrix} 1 & & \\ & & 1 \\ & 1 & \end{bmatrix},$

and the matrix $\mathbf{C}_3$ after the permutation is denoted as

$\mathbf{C}_3^{(a)} = \begin{bmatrix} a_3 & a_3 & b_3 \\ b_3 & -b_3 & 0 \\ a_3 & a_3 & -b_3 \end{bmatrix}.$

Based on the structure of the matrix $\mathbf{C}_3^{(a)}$, we have constructed the factorization of the matrix $\mathbf{C}_3$. The matrix $\mathbf{C}_3^{(a)}$ contains repeating elements in the first and second rows, and in the first and second columns. Let us sum these elements before multiplying by the corresponding coefficient to reduce the number of multiplications. Then the factorization will include the matrix

$\mathbf{W}_3^{(0)} = \begin{bmatrix} 1 & 1 & \\ 1 & -1 & \\ & & 1 \end{bmatrix}$

and $\mathbf{P}_3$. From the factors, we form a diagonal matrix $\mathbf{D}_3 = \mathrm{diag}(1/2,\, s_0^{(3)},\, s_1^{(3)})$ with $s_0^{(3)} = s_1^{(3)} = b_3$. To the obtained sums of repeating elements of the first and second columns of matrix $\mathbf{C}_3^{(a)}$ we add the elements of the third column, including the matrix

$\mathbf{W}_3^{(1)} = \begin{bmatrix} 1 & & 1 \\ & 1 & \\ 1 & & -1 \end{bmatrix}$

in the factorization. As a result, the following decomposition was obtained:

$\mathbf{Y}_{3\times 1} = \mathbf{W}_3^{(1)} \mathbf{D}_3 \mathbf{W}_3^{(0)} \mathbf{P}_3 \mathbf{X}_{3\times 1}.$ (7)
We have designed the 3-point DKT algorithm by developing a data flow graph, which is shown in Figure 1. Let us note that in the initial DKT matrix $a_3 = 1/2$, and one matrix entry is zero. Then the proposed three-point DKT algorithm reduces the number of multiplications, additions, and shifts from 4 to 2, from 5 to 4, and from 4 to 1, respectively.
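The computation is short enough to state in full. A minimal MATLAB sketch (ours; it follows factorization (7) with $b_3 = 1/\sqrt{2}$) that checks the result against the direct matrix-vector product:

```matlab
b3 = 1/sqrt(2);  x = randn(3,1);
t = (x(1) + x(3)) / 2;            % multiplication by a3 = 1/2 is a shift
u = (x(1) - x(3)) * b3;           % 1st multiplication
v =  x(2) * b3;                   % 2nd multiplication
y = [t + v; u; t - v];            % 2 multiplications, 4 additions, 1 shift
C3 = [0.5 b3 0.5; b3 0 -b3; 0.5 -b3 0.5];
assert(norm(y - C3*x) < 1e-12)    % agrees with the direct computation
```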

3.2. Algorithm for the 4-Point DKT

Let us present the 4-point DKT as a matrix-vector product:

$\mathbf{Y}_{4\times 1} = \mathbf{C}_4 \mathbf{X}_{4\times 1},$ (8)

where $\mathbf{Y}_{4\times 1} = [y_0, y_1, y_2, y_3]^{\mathrm{T}}$, $\mathbf{X}_{4\times 1} = [x_0, x_1, x_2, x_3]^{\mathrm{T}}$,

$\mathbf{C}_4 = \begin{bmatrix} a_4 & b_4 & b_4 & a_4 \\ b_4 & a_4 & -a_4 & -b_4 \\ b_4 & -a_4 & -a_4 & b_4 \\ a_4 & -b_4 & b_4 & -a_4 \end{bmatrix}$

with $a_4 = 0.3536$ and $b_4 = 0.6124$.
We apply the structural approach [33,34] to factorize the matrix $\mathbf{C}_4$. In this way, the order of the columns and rows of the matrix $\mathbf{C}_4$ was altered with the permutations $\pi_2 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 2 & 4 & 3 \end{pmatrix}$ and $\pi_3 = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 2 & 4 \end{pmatrix}$. The permutation matrices are

$\mathbf{P}_4^{(0)} = \begin{bmatrix} 1 & & & \\ & 1 & & \\ & & & 1 \\ & & 1 & \end{bmatrix}, \quad \mathbf{P}_4^{(1)} = \begin{bmatrix} 1 & & & \\ & & 1 & \\ & 1 & & \\ & & & 1 \end{bmatrix}.$
It can be noted that the resulting matrix

$\mathbf{C}_4^{(a)} = \begin{bmatrix} a_4 & b_4 & a_4 & b_4 \\ b_4 & -a_4 & b_4 & -a_4 \\ b_4 & a_4 & -b_4 & -a_4 \\ a_4 & -b_4 & -a_4 & b_4 \end{bmatrix}$

can be represented as $\mathbf{C}_4^{(a)} = \begin{bmatrix} \mathbf{A}_2 & \mathbf{A}_2 \\ \mathbf{B}_2 & -\mathbf{B}_2 \end{bmatrix}$, where $\mathbf{A}_2 = \begin{bmatrix} a_4 & b_4 \\ b_4 & -a_4 \end{bmatrix}$ and $\mathbf{B}_2 = \begin{bmatrix} b_4 & a_4 \\ a_4 & -b_4 \end{bmatrix}$. The matrices $\mathbf{C}_4^{(a)}$, $\mathbf{A}_2$, and $\mathbf{B}_2$ are factorized as follows [33]:

$\mathbf{C}_4^{(a)} = (\mathbf{A}_2 \oplus \mathbf{B}_2)(\mathbf{H}_2 \otimes \mathbf{I}_2), \quad \mathbf{A}_2 = \mathbf{T}_{2\times 3}^{(4)}\, \mathrm{diag}(a_4 - b_4,\, -a_4 - b_4,\, b_4)\, \mathbf{T}_{3\times 2}^{(3)}, \quad \mathbf{B}_2 = \mathbf{T}_{2\times 3}^{(4)}\, \mathrm{diag}(b_4 - a_4,\, -b_4 - a_4,\, a_4)\, \mathbf{T}_{3\times 2}^{(3)},$ (9)

where $\mathbf{T}_{2\times 3}^{(4)} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix}$, $\mathbf{T}_{3\times 2}^{(3)} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & 1 \end{bmatrix}$.
As a result, we have yielded the following factorization of the 4-point DKT matrix:

$\mathbf{Y}_{4\times 1} = \mathbf{P}_4^{(1)} \mathbf{W}_{4\times 6} \mathbf{D}_6 \mathbf{W}_{6\times 4} \mathbf{W}_4 \mathbf{P}_4^{(0)} \mathbf{X}_{4\times 1},$ (10)

where $\mathbf{W}_4 = \mathbf{H}_2 \otimes \mathbf{I}_2$, $\mathbf{W}_{4\times 6} = \mathbf{T}_{2\times 3}^{(4)} \oplus \mathbf{T}_{2\times 3}^{(4)}$, $\mathbf{W}_{6\times 4} = \mathbf{T}_{3\times 2}^{(3)} \oplus \mathbf{T}_{3\times 2}^{(3)}$, $\mathbf{D}_6 = \mathrm{diag}(s_0^{(4)}, s_1^{(4)}, s_2^{(4)}, s_3^{(4)}, s_4^{(4)}, s_5^{(4)})$, $s_0^{(4)} = a_4 - b_4$, $s_1^{(4)} = -a_4 - b_4$, $s_2^{(4)} = b_4$, $s_3^{(4)} = b_4 - a_4$, $s_4^{(4)} = -b_4 - a_4$, $s_5^{(4)} = a_4$.
A data flow graph of the proposed four-point DKT algorithm is constructed in Figure 2. Note that the initial 4-point DKT requires 16 multiplications and 12 additions. With the proposed 4-point DKT algorithm, the number of multiplications can be reduced from 16 to 6, and the number of additions from 12 to 10.
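The top level of this factorization is easy to check numerically. A MATLAB sketch (ours; the sign pattern of $\mathbf{A}_2$ and $\mathbf{B}_2$ follows the matrices given above) verifying that the butterfly stage plus the two 2 × 2 cores reproduce $\mathbf{C}_4$:

```matlab
a = 1/(2*sqrt(2));  b = sqrt(6)/4;               % a4 = 0.3536, b4 = 0.6124
C4 = [a b b a; b a -a -b; b -a -a b; a -b b -a];
A2 = [a b; b -a];  B2 = [b a; a -b];
x  = randn(4,1);
t  = kron([1 1; 1 -1], eye(2)) * x([1 2 4 3]);   % H2 (x) I2 on permuted input
yp = blkdiag(A2, B2) * t;                        % the two 2 x 2 cores
y  = zeros(4,1);  y([1 3 2 4]) = yp;             % output row permutation
assert(norm(y - C4*x) < 1e-12)
```

In the complete algorithm, each core is additionally factorized through $\mathbf{T}_{2\times 3}^{(4)}$ and $\mathbf{T}_{3\times 2}^{(3)}$, so that a 2 × 2 block costs three multiplications instead of four.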

3.3. Algorithm for the 5-Point DKT

Next, we design the algorithm for the 5-point DKT, which is expressed as follows:

$\mathbf{Y}_{5\times 1} = \mathbf{C}_5 \mathbf{X}_{5\times 1},$ (11)

where $\mathbf{Y}_{5\times 1} = [y_0, \ldots, y_4]^{\mathrm{T}}$, $\mathbf{X}_{5\times 1} = [x_0, \ldots, x_4]^{\mathrm{T}}$,

$\mathbf{C}_5 = \begin{bmatrix} a_5 & b_5 & c_5 & b_5 & a_5 \\ b_5 & b_5 & 0 & -b_5 & -b_5 \\ c_5 & 0 & -b_5 & 0 & c_5 \\ b_5 & -b_5 & 0 & b_5 & -b_5 \\ a_5 & -b_5 & c_5 & -b_5 & a_5 \end{bmatrix}$

with $a_5 = 0.25$, $b_5 = 0.5$, $c_5 = 0.6124$.

Let us change the order of columns of the matrix $\mathbf{C}_5$ according to the permutation $\pi_4 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 \\ 1 & 5 & 2 & 4 & 3 \end{pmatrix}$. Then we obtain the matrix

$\mathbf{C}_5^{(a)} = \begin{bmatrix} a_5 & a_5 & b_5 & b_5 & c_5 \\ b_5 & -b_5 & b_5 & -b_5 & 0 \\ c_5 & c_5 & 0 & 0 & -b_5 \\ b_5 & -b_5 & -b_5 & b_5 & 0 \\ a_5 & a_5 & -b_5 & -b_5 & c_5 \end{bmatrix}$

with the permutation matrix

$\mathbf{P}_5 = \begin{bmatrix} 1 & & & & \\ & & & & 1 \\ & 1 & & & \\ & & & 1 & \\ & & 1 & & \end{bmatrix}.$
To construct the factorization of the matrix $\mathbf{C}_5$, we note that after the permutation $\pi_4$ we have obtained pairs of repeating elements in the first four columns. These elements may differ in sign. We construct the butterfly modules for these columns using the matrix $\mathbf{W}_5 = \mathbf{H}_2 \oplus \mathbf{H}_2 \oplus 1$. In place of the fifth column, we include a unit on the diagonal of the matrix $\mathbf{W}_5$.

Next, from the repeating elements of the columns of the matrix $\mathbf{C}_5^{(a)}$ we form a diagonal matrix $\mathbf{D}_7 = \mathrm{diag}(1/4,\, s_0^{(5)},\, 1/2,\, 1/2,\, 1/2,\, 1/2,\, s_1^{(5)})$, where $s_0^{(5)} = s_1^{(5)} = c_5$. The final matrix

$\mathbf{W}_{5\times 7} = \begin{bmatrix} 1 & & & 1 & & & 1 \\ & & 1 & & 1 & & \\ & 1 & & & & -1 & \\ & & 1 & & -1 & & \\ 1 & & & -1 & & & 1 \end{bmatrix}$

included in the factorization connects the output variables to linear combinations of the input values. Thus, we obtained the following factorization of the 5-point DKT matrix:
$\mathbf{Y}_{5\times 1} = \mathbf{W}_{5\times 7} \mathbf{D}_7 \mathbf{W}_{7\times 5} \mathbf{W}_5 \mathbf{P}_5 \mathbf{X}_{5\times 1},$ (12)

where $\mathbf{W}_{7\times 5} = \mathbf{1}_{2\times 1} \oplus \mathbf{I}_3 \oplus \mathbf{1}_{2\times 1}$.
We show a data flow graph of the proposed 5-point DKT algorithm in Figure 3. The initial 5-point DKT requires 4 multiplications, 16 additions, and 21 shifts because four entries of the transform matrix are equal to zero. We have taken into account that multiplication by $a_5 = 0.25$ requires two shifts and multiplication by $b_5 = 0.5$ requires one shift. The developed 5-point DKT algorithm reduces the number of multiplications to 2 because $s_0^{(5)} = s_1^{(5)} = c_5$. The number of additions is decreased from 16 to 12, and 6 shifts are required.
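A MATLAB sketch of this computation (ours; it follows factorization (12), with $c_5 = \sqrt{3/8}$):

```matlab
a = 0.25; b = 0.5; c = sqrt(3/8);            % a5, b5, and c5 = 0.6124
C5 = [a b c b a; b b 0 -b -b; c 0 -b 0 c; b -b 0 b -b; a -b c -b a];
x  = randn(5,1);
u0 = x(1) + x(5);  v0 = x(1) - x(5);         % butterfly modules
u1 = x(2) + x(4);  v1 = x(2) - x(4);
t  = u0/4 + c*x(3);                          % 1st multiplication: c*x(3)
y  = [t + u1/2; (v0 + v1)/2; c*u0 - x(3)/2; (v0 - v1)/2; t - u1/2];
assert(norm(y - C5*x) < 1e-12)               % 2nd multiplication: c*u0
```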

3.4. Algorithm for the 6-Point DKT

Let us obtain the algorithm for the 6-point DKT, which is expressed as follows:

$\mathbf{Y}_{6\times 1} = \mathbf{C}_6 \mathbf{X}_{6\times 1},$ (13)

where $\mathbf{Y}_{6\times 1} = [y_0, \ldots, y_5]^{\mathrm{T}}$, $\mathbf{X}_{6\times 1} = [x_0, \ldots, x_5]^{\mathrm{T}}$,

$\mathbf{C}_6 = \begin{bmatrix} a_6 & b_6 & c_6 & c_6 & b_6 & a_6 \\ b_6 & d_6 & e_6 & -e_6 & -d_6 & -b_6 \\ c_6 & e_6 & -f_6 & -f_6 & e_6 & c_6 \\ c_6 & -e_6 & -f_6 & f_6 & e_6 & -c_6 \\ b_6 & -d_6 & e_6 & e_6 & -d_6 & b_6 \\ a_6 & -b_6 & c_6 & -c_6 & b_6 & -a_6 \end{bmatrix}$

with $a_6 = 0.1768$, $b_6 = 0.3953$, $c_6 = 0.5590$, $d_6 = 0.5303$, $e_6 = 0.25$, $f_6 = 0.3536$.
The columns and rows of the matrix $\mathbf{C}_6$ are permuted according to the permutations $\pi_5 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 2 & 3 & 6 & 5 & 4 \end{pmatrix}$ and $\pi_6 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 \\ 1 & 3 & 5 & 2 & 4 & 6 \end{pmatrix}$.

Then the matrix $\mathbf{C}_6$ is represented as

$\mathbf{C}_6^{(a)} = \begin{bmatrix} a_6 & b_6 & c_6 & a_6 & b_6 & c_6 \\ c_6 & e_6 & -f_6 & c_6 & e_6 & -f_6 \\ b_6 & -d_6 & e_6 & b_6 & -d_6 & e_6 \\ b_6 & d_6 & e_6 & -b_6 & -d_6 & -e_6 \\ c_6 & -e_6 & -f_6 & -c_6 & e_6 & f_6 \\ a_6 & -b_6 & c_6 & -a_6 & b_6 & -c_6 \end{bmatrix}.$

The matrices

$\mathbf{P}_6^{(0)} = \begin{bmatrix} 1 & & & & & \\ & 1 & & & & \\ & & 1 & & & \\ & & & & & 1 \\ & & & & 1 & \\ & & & 1 & & \end{bmatrix}, \quad \mathbf{P}_6^{(1)} = \begin{bmatrix} 1 & & & & & \\ & & & 1 & & \\ & 1 & & & & \\ & & & & 1 & \\ & & 1 & & & \\ & & & & & 1 \end{bmatrix}$

are the permutation matrices.
The resulting matrix $\mathbf{C}_6^{(a)}$ can be represented as $\mathbf{C}_6^{(a)} = \begin{bmatrix} \mathbf{A}_3 & \mathbf{A}_3 \\ \mathbf{B}_3 & -\mathbf{B}_3 \end{bmatrix}$, where $\mathbf{A}_3 = \begin{bmatrix} a_6 & b_6 & c_6 \\ c_6 & e_6 & -f_6 \\ b_6 & -d_6 & e_6 \end{bmatrix}$ and $\mathbf{B}_3 = \begin{bmatrix} b_6 & d_6 & e_6 \\ c_6 & -e_6 & -f_6 \\ a_6 & -b_6 & c_6 \end{bmatrix}$. We change the order of the columns of the matrix $\mathbf{A}_3$ and obtain the matrix $\mathbf{A}_3^{(a)} = \begin{bmatrix} c_6 & a_6 & b_6 \\ -f_6 & c_6 & e_6 \\ e_6 & b_6 & -d_6 \end{bmatrix}$. Next, we swap the second and third columns of the matrix $\mathbf{B}_3$ and yield the matrix $\mathbf{B}_3^{(a)} = \begin{bmatrix} b_6 & e_6 & d_6 \\ c_6 & -f_6 & -e_6 \\ a_6 & c_6 & -b_6 \end{bmatrix}$.

Let us extract the submatrices $\mathbf{A}_2^{(c)} = \begin{bmatrix} c_6 & a_6 \\ -f_6 & c_6 \end{bmatrix}$ and $\mathbf{B}_2^{(c)} = \begin{bmatrix} c_6 & -f_6 \\ a_6 & c_6 \end{bmatrix}$ from the matrices $\mathbf{A}_3^{(a)}$ and $\mathbf{B}_3^{(a)}$, respectively. Then the matrices $\mathbf{C}_6^{(a)}$, $\mathbf{A}_2^{(c)}$, and $\mathbf{B}_2^{(c)}$ are factorized as follows [34]:
$\mathbf{C}_6^{(a)} = (\mathbf{A}_3 \oplus \mathbf{B}_3)(\mathbf{H}_2 \otimes \mathbf{I}_3), \quad \mathbf{A}_2^{(c)} = \mathbf{T}_{2\times 3}^{(3)}\, \mathrm{diag}(-f_6 - c_6,\, a_6 - c_6,\, c_6)\, \mathbf{T}_{3\times 2}^{(3)}, \quad \mathbf{B}_2^{(c)} = \mathbf{T}_{2\times 3}^{(3)}\, \mathrm{diag}(a_6 - c_6,\, -f_6 - c_6,\, c_6)\, \mathbf{T}_{3\times 2}^{(3)},$ (14)

where $\mathbf{T}_{2\times 3}^{(3)} = \begin{bmatrix} 0 & 1 & 1 \\ 1 & 0 & 1 \end{bmatrix}$.
Let us denote the scaling factors as $s_0^{(6)} = s_8^{(6)} = -f_6 - c_6$, $s_1^{(6)} = s_7^{(6)} = a_6 - c_6$, $s_2^{(6)} = s_9^{(6)} = c_6$, $s_3^{(6)} = s_4^{(6)} = s_6^{(6)} = s_{11}^{(6)} = b_6$, $s_5^{(6)} = s_{10}^{(6)} = d_6$, and form the diagonal matrix $\mathbf{D}_{16} = \mathrm{diag}(1/4,\, s_0^{(6)},\, s_1^{(6)},\, s_2^{(6)},\, s_3^{(6)},\, s_4^{(6)},\, 1/4,\, s_5^{(6)},\, s_6^{(6)},\, s_7^{(6)},\, s_8^{(6)},\, s_9^{(6)},\, 1/4,\, s_{10}^{(6)},\, 1/4,\, s_{11}^{(6)})$. Further, we combine the factor matrices standing to the left of the diagonal matrices in decomposition (14) into the matrix $\mathbf{W}_{6\times 16} = \mathbf{T}_{3\times 8}^{(0)} \oplus \mathbf{T}_{3\times 8}^{(1)}$, those standing to the right into the matrix $\mathbf{W}_{16\times 6} = \mathbf{T}_{8\times 3} \oplus \mathbf{T}_{8\times 3}$, and the butterfly stage into $\mathbf{W}_6 = \mathbf{H}_2 \otimes \mathbf{I}_3$. Here $\mathbf{T}_{8\times 3}$, $\mathbf{T}_{3\times 8}^{(0)}$, and $\mathbf{T}_{3\times 8}^{(1)}$ are sparse matrices with nine nonzero entries of $\pm 1$ each, which route the butterfly outputs to the multipliers and accumulate the products; their layout corresponds to the wiring of the data flow graph in Figure 4.
Combining the matrices $\mathbf{W}_{6\times 16}$, $\mathbf{W}_{16\times 6}$, and $\mathbf{W}_6$, we obtain the matrix factorization for the 6-point DKT:

$\mathbf{Y}_{6\times 1} = \mathbf{P}_6^{(1)} \mathbf{W}_{6\times 16} \mathbf{D}_{16} \mathbf{W}_{16\times 6} \mathbf{P}_6^{(2)} \mathbf{W}_6 \mathbf{P}_6^{(0)} \mathbf{X}_{6\times 1},$ (16)

where $\mathbf{P}_6^{(2)}$ is an additional $6 \times 6$ permutation matrix that arranges the butterfly outputs in the order required by the submatrix factorizations.
We present the data flow graph for the 6-point DKT algorithm in Figure 4. It should be noted that the initial 6-point DKT requires 28 multiplications, 30 additions, and 16 shifts because $e_6 = 0.25$, and we have 8 such entries in the 6-point DKT matrix $\mathbf{C}_6$. The developed 6-point DKT algorithm reduces the number of multiplications to 12. The number of additions is decreased from 30 to 20, and 8 shifts are required.
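A MATLAB sketch of the block structure behind this factorization (ours; it checks the butterfly stage and the two 3 × 3 cores against the direct product):

```matlab
a = 0.1768; b = 0.3953; c = 0.5590; d = 0.5303; e = 0.25; f = 0.3536;
C6 = [a b c c b a; b d e -e -d -b; c e -f -f e c; ...
      c -e -f f e -c; b -d e e -d b; a -b c -c b -a];
A3 = [a b c; c e -f; b -d e];                % acts on the butterfly sums
B3 = [b d e; c -e -f; a -b c];               % acts on the differences
x  = randn(6,1);
t  = kron([1 1; 1 -1], eye(3)) * x([1 2 3 6 5 4]);
yp = blkdiag(A3, B3) * t;
y  = zeros(6,1);  y([1 3 5 2 4 6]) = yp;     % output row permutation
assert(norm(y - C6*x) < 1e-12)
```

In the complete algorithm, the cores are further factorized through $\mathbf{A}_2^{(c)}$ and $\mathbf{B}_2^{(c)}$ as in (14), which is what brings the multiplication count down to 12.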

3.5. Algorithm for the 7-Point DKT

Let us construct the algorithm for the 7-point DKT based on the following expression:

$\mathbf{Y}_{7\times 1} = \mathbf{C}_7 \mathbf{X}_{7\times 1},$ (17)

where $\mathbf{Y}_{7\times 1} = [y_0, \ldots, y_6]^{\mathrm{T}}$, $\mathbf{X}_{7\times 1} = [x_0, \ldots, x_6]^{\mathrm{T}}$,

$\mathbf{C}_7 = \begin{bmatrix} a_7 & b_7 & c_7 & d_7 & c_7 & b_7 & a_7 \\ b_7 & f_7 & g_7 & 0 & -g_7 & -f_7 & -b_7 \\ c_7 & g_7 & -a_7 & -h_7 & -a_7 & g_7 & c_7 \\ d_7 & 0 & -h_7 & 0 & h_7 & 0 & -d_7 \\ c_7 & -g_7 & -a_7 & h_7 & -a_7 & -g_7 & c_7 \\ b_7 & -f_7 & g_7 & 0 & -g_7 & f_7 & -b_7 \\ a_7 & -b_7 & c_7 & -d_7 & c_7 & -b_7 & a_7 \end{bmatrix}$

with $a_7 = 0.1250$, $b_7 = 0.3062$, $c_7 = 0.4841$, $d_7 = 0.5590$, $f_7 = 0.5$, $g_7 = 0.3953$, $h_7 = 0.4330$. We change the order of columns of the matrix $\mathbf{C}_7$ with the permutation $\pi_7 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ 1 & 7 & 2 & 6 & 3 & 5 & 4 \end{pmatrix}$. As a result, the matrix $\mathbf{C}_7^{(a)}$ and the permutation matrix $\mathbf{P}_7$ are obtained:

$\mathbf{C}_7^{(a)} = \begin{bmatrix} a_7 & a_7 & b_7 & b_7 & c_7 & c_7 & d_7 \\ b_7 & -b_7 & f_7 & -f_7 & g_7 & -g_7 & 0 \\ c_7 & c_7 & g_7 & g_7 & -a_7 & -a_7 & -h_7 \\ d_7 & -d_7 & 0 & 0 & -h_7 & h_7 & 0 \\ c_7 & c_7 & -g_7 & -g_7 & -a_7 & -a_7 & h_7 \\ b_7 & -b_7 & -f_7 & f_7 & g_7 & -g_7 & 0 \\ a_7 & a_7 & -b_7 & -b_7 & c_7 & c_7 & -d_7 \end{bmatrix}, \quad \mathbf{P}_7 = \begin{bmatrix} 1 & & & & & & \\ & & & & & & 1 \\ & 1 & & & & & \\ & & & & & 1 & \\ & & 1 & & & & \\ & & & & 1 & & \\ & & & 1 & & & \end{bmatrix}.$
The factorization of the matrix $\mathbf{C}_7^{(a)}$ was introduced similarly to the factorization of the matrix $\mathbf{C}_5$ [35]. This matrix contains pairs of repeating elements in adjacent columns, and these elements may differ in sign. This structure allows us to extract the butterfly modules using the matrix $\mathbf{W}_7 = \mathbf{H}_2 \oplus \mathbf{H}_2 \oplus \mathbf{H}_2 \oplus 1$. The values of the repeating elements were entered into a diagonal matrix $\mathbf{D}_{13} = \mathrm{diag}(s_0^{(7)}, s_1^{(7)}, s_2^{(7)}, s_3^{(7)}, s_4^{(7)}, s_5^{(7)}, 0.5, s_6^{(7)}, \ldots, s_{11}^{(7)})$, where $s_0^{(7)} = s_7^{(7)} = a_7$, $s_1^{(7)} = s_6^{(7)} = c_7$, $s_2^{(7)} = s_4^{(7)} = b_7$, $s_3^{(7)} = s_{10}^{(7)} = d_7$, $s_5^{(7)} = s_8^{(7)} = g_7$, $s_9^{(7)} = s_{11}^{(7)} = h_7$. This matrix was linked to the butterfly modules using the matrix $\mathbf{W}_{13\times 7} = \mathbf{1}_{2\times 1} \oplus \mathbf{1}_{2\times 1} \oplus \mathbf{1}_{2\times 1} \oplus 1 \oplus \mathbf{1}_{2\times 1} \oplus \mathbf{1}_{2\times 1} \oplus \mathbf{1}_{2\times 1}$. To calculate the output variables, the matrix

$\mathbf{W}_{7\times 13} = \begin{bmatrix} 1 & & & & 1 & & & 1 & & & & 1 & \\ & & 1 & & & & 1 & & & 1 & & & \\ & 1 & & & & 1 & & & -1 & & & & -1 \\ & & & 1 & & & & & & & -1 & & \\ & 1 & & & & -1 & & & -1 & & & & 1 \\ & & 1 & & & & -1 & & & 1 & & & \\ 1 & & & & -1 & & & 1 & & & & -1 & \end{bmatrix}$

was formed. As a result, we have the factorization:

$\mathbf{Y}_{7\times 1} = \mathbf{W}_{7\times 13} \mathbf{D}_{13} \mathbf{W}_{13\times 7} \mathbf{W}_7 \mathbf{P}_7 \mathbf{X}_{7\times 1}.$ (18)
The calculation of the 7-point DKT with the matrix-vector product requires 40 multiplications, 37 additions, and 4 shifts because five elements of the matrix $\mathbf{C}_7$ are zeros. Four elements of this matrix are equal to 0.5, and multiplication by these elements can be implemented using shifts.

The repeated elements of the matrix $\mathbf{C}_7^{(a)}$ allow the design of the data flow graph for the 7-point DKT algorithm presented in Figure 5. This algorithm reduces the number of multiplications from 40 to 12. The number of additions and shifts can be decreased from 37 to 23 and from 4 to 1, respectively.
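A MATLAB sketch of this computation (ours; it follows factorization (18): 6 butterflies, 12 multiplications, and one shift for the factor $f_7 = 0.5$):

```matlab
a = 0.1250; b = 0.3062; c = 0.4841; d = 0.5590; f = 0.5; g = 0.3953; h = 0.4330;
C7 = [a b c d c b a; b f g 0 -g -f -b; c g -a -h -a g c; d 0 -h 0 h 0 -d; ...
      c -g -a h -a -g c; b -f g 0 -g f -b; a -b c -d c -b a];
x  = randn(7,1);
u0 = x(1)+x(7); v0 = x(1)-x(7); u1 = x(2)+x(6); v1 = x(2)-x(6);   % butterflies
u2 = x(3)+x(5); v2 = x(3)-x(5); m = x(4);
p1 = a*u0; p2 = c*u0; p3 = b*v0;  p4 = d*v0;  p5 = b*u1;  p6 = g*u1;
p7 = v1/2; p8 = c*u2; p9 = a*u2; p10 = g*v2; p11 = h*v2; p12 = d*m;
p13 = h*m;                        % p7 is the single shift; the rest are mults
y = [p1+p5+p8+p12; p3+p7+p10; p2+p6-p9-p13; p4-p11; ...
     p2-p6-p9+p13; p3-p7+p10; p1-p5+p8-p12];
assert(norm(y - C7*x) < 1e-12)
```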

3.6. Algorithm for the 8-Point DKT

To design the algorithm for the 8-point DKT, we express this transform as a matrix-vector product:

$\mathbf{Y}_{8\times 1} = \mathbf{C}_8 \mathbf{X}_{8\times 1},$ (19)

where $\mathbf{Y}_{8\times 1} = [y_0, \ldots, y_7]^{\mathrm{T}}$, $\mathbf{X}_{8\times 1} = [x_0, \ldots, x_7]^{\mathrm{T}}$,

$\mathbf{C}_8 = \begin{bmatrix} a_8 & b_8 & c_8 & d_8 & d_8 & c_8 & b_8 & a_8 \\ b_8 & e_8 & f_8 & g_8 & -g_8 & -f_8 & -e_8 & -b_8 \\ c_8 & f_8 & a_8 & -h_8 & -h_8 & a_8 & f_8 & c_8 \\ d_8 & g_8 & -h_8 & -l_8 & l_8 & h_8 & -g_8 & -d_8 \\ d_8 & -g_8 & -h_8 & l_8 & l_8 & -h_8 & -g_8 & d_8 \\ c_8 & -f_8 & a_8 & h_8 & -h_8 & -a_8 & f_8 & -c_8 \\ b_8 & -e_8 & f_8 & -g_8 & -g_8 & f_8 & -e_8 & b_8 \\ a_8 & -b_8 & c_8 & -d_8 & d_8 & -c_8 & b_8 & -a_8 \end{bmatrix}$

with $a_8 = 0.0884$, $b_8 = 0.2339$, $c_8 = 0.4050$, $d_8 = 0.5229$, $e_8 = 0.4419$, $f_8 = 0.4593$, $g_8 = 0.1976$, $h_8 = 0.3423$, $l_8 = 0.2652$.
To alter the order of rows and columns of the matrix $\mathbf{C}_8$, we define the permutations $\pi_8 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 8 & 2 & 7 & 3 & 6 & 4 & 5 \end{pmatrix}$ and $\pi_9 = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 \\ 1 & 2 & 3 & 4 & 8 & 7 & 6 & 5 \end{pmatrix}$. As a result, we obtain the permutation matrices $\mathbf{P}_8^{(1)}$ and $\mathbf{P}_8^{(0)}$:

$\mathbf{P}_8^{(1)} = \begin{bmatrix} 1 & & & & & & & \\ & & & & & & & 1 \\ & 1 & & & & & & \\ & & & & & & 1 & \\ & & 1 & & & & & \\ & & & & & 1 & & \\ & & & 1 & & & & \\ & & & & 1 & & & \end{bmatrix}; \quad \mathbf{P}_8^{(0)} = \begin{bmatrix} 1 & & & & & & & \\ & 1 & & & & & & \\ & & 1 & & & & & \\ & & & 1 & & & & \\ & & & & & & & 1 \\ & & & & & & 1 & \\ & & & & & 1 & & \\ & & & & 1 & & & \end{bmatrix}.$
The resulting matrix $\mathbf{C}_8^{(a)}$ matches the template $\begin{bmatrix} \mathbf{A}_4 & \mathbf{A}_4 \\ \mathbf{B}_4 & -\mathbf{B}_4 \end{bmatrix}$, which is factorized as follows [33]:

$\mathbf{C}_8^{(a)} = (\mathbf{A}_4 \oplus \mathbf{B}_4)(\mathbf{H}_2 \otimes \mathbf{I}_4),$ (21)

where $\mathbf{A}_4 = \begin{bmatrix} a_8 & b_8 & c_8 & d_8 \\ c_8 & f_8 & a_8 & -h_8 \\ d_8 & -g_8 & -h_8 & l_8 \\ b_8 & -e_8 & f_8 & -g_8 \end{bmatrix}$ and $\mathbf{B}_4 = \begin{bmatrix} a_8 & -b_8 & c_8 & -d_8 \\ c_8 & -f_8 & a_8 & h_8 \\ d_8 & g_8 & -h_8 & -l_8 \\ b_8 & e_8 & f_8 & g_8 \end{bmatrix}$.
Let us consider the matrix $\mathbf{A}_4$, which matches the template $\begin{bmatrix} \mathbf{A}_2^{(a)} & \mathbf{B}_2^{(a)} \\ \mathbf{C}_2 & \mathbf{D}_2 \end{bmatrix}$ after the permutation of columns with the matrix $\mathbf{P}_4^{(a)} = \begin{bmatrix} 1 & & & \\ & & 1 & \\ & 1 & & \\ & & & 1 \end{bmatrix}$. The obtained matrix $\mathbf{A}_4^{(a)}$ is decomposed as [33]:

$\mathbf{A}_4^{(a)} = (\mathbf{T}_{2\times 4}^{(2)} \otimes \mathbf{I}_2)\, (\mathbf{A}_2^{(a)} \oplus \mathbf{B}_2^{(a)} \oplus \mathbf{C}_2 \oplus \mathbf{D}_2)\, (\mathbf{T}_{4\times 2}^{(2)} \otimes \mathbf{I}_2),$ (22)

where $\mathbf{T}_{2\times 4}^{(2)} = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}$, $\mathbf{T}_{4\times 2}^{(2)} = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}^{\mathrm{T}}$, $\mathbf{A}_2^{(a)} = \begin{bmatrix} a_8 & c_8 \\ c_8 & a_8 \end{bmatrix}$, $\mathbf{B}_2^{(a)} = \begin{bmatrix} b_8 & d_8 \\ f_8 & -h_8 \end{bmatrix}$, $\mathbf{C}_2 = \begin{bmatrix} d_8 & -h_8 \\ b_8 & f_8 \end{bmatrix}$, and $\mathbf{D}_2 = \begin{bmatrix} -g_8 & l_8 \\ -e_8 & -g_8 \end{bmatrix}$. The matrices $\mathbf{A}_2^{(a)}$, $\mathbf{B}_2^{(a)}$, $\mathbf{C}_2$, and $\mathbf{D}_2$ can be represented as follows [33]:

$\mathbf{A}_2^{(a)} = \mathbf{H}_2\, \mathrm{diag}(a_8 + c_8,\, a_8 - c_8)\, \mathbf{H}_2 / 2, \quad \mathbf{B}_2^{(a)} = \mathbf{T}_{2\times 4}^{(2)}\, \mathrm{diag}(b_8,\, d_8,\, f_8,\, -h_8)\, \mathbf{T}_{4\times 2}^{(2)}, \quad \mathbf{C}_2 = \mathbf{T}_{2\times 4}^{(2)}\, \mathrm{diag}(d_8,\, -h_8,\, b_8,\, f_8)\, \mathbf{T}_{4\times 2}^{(2)}, \quad \mathbf{D}_2 = \mathbf{T}_{2\times 3}^{(3)}\, \mathrm{diag}(g_8 - e_8,\, l_8 + g_8,\, -g_8)\, \mathbf{T}_{3\times 2}^{(3)}.$
We alter the order of the columns of the matrix $\mathbf{B}_4$ with the permutation $\pi_{10} = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 1 & 3 & 2 & 4 \end{pmatrix}$. The signs of the entries in the third and fourth columns of the resulting matrix were then changed. The obtained matrix $\mathbf{B}_4^{(a)} = \mathbf{A}_4^{(a)}$ is also factorized with Equation (22).

Let us add the matrices from decomposition (21) to the expressions $\mathbf{W}_{8\times 16} = (\mathbf{T}_{2\times 4}^{(2)} \otimes \mathbf{I}_2) \oplus (\mathbf{T}_{2\times 4}^{(2)} \otimes \mathbf{I}_2)$ and $\mathbf{W}_{16\times 8} = (\mathbf{T}_{4\times 2}^{(2)} \otimes \mathbf{I}_2) \oplus (\mathbf{T}_{4\times 2}^{(2)} \otimes \mathbf{I}_2)$. After applying the decomposition (22) to the matrices $\mathbf{A}_4^{(a)}$ and $\mathbf{B}_4^{(a)}$, we join the matrices standing to the left of the diagonal matrix into the matrix $\mathbf{W}_{16\times 26} = \mathbf{H}_2 \oplus \mathbf{T}_{2\times 4}^{(2)} \oplus \mathbf{T}_{2\times 4}^{(2)} \oplus \mathbf{T}_{2\times 3}^{(3)} \oplus \mathbf{H}_2 \oplus \mathbf{T}_{2\times 4}^{(2)} \oplus \mathbf{T}_{2\times 4}^{(2)} \oplus \mathbf{T}_{2\times 3}^{(3)}$. Also, we add the matrices standing to the right of the diagonal matrix into the matrix $\mathbf{W}_{26\times 16} = \mathbf{H}_2 \oplus \mathbf{T}_{4\times 2}^{(2)} \oplus \mathbf{T}_{4\times 2}^{(2)} \oplus \mathbf{T}_{3\times 2}^{(3)} \oplus \mathbf{H}_2 \oplus \mathbf{T}_{4\times 2}^{(2)} \oplus \mathbf{T}_{4\times 2}^{(2)} \oplus \mathbf{T}_{3\times 2}^{(3)}$.
Combining the matrices $\mathbf{W}_{8\times 16}$, $\mathbf{W}_{16\times 8}$, $\mathbf{W}_{26\times 16}$, and $\mathbf{W}_{16\times 26}$, we obtain the matrix factorization for the 8-point DKT:

$\mathbf{Y}_{8\times 1} = \mathbf{P}_8^{(1)} \mathbf{W}_{8\times 16} \mathbf{W}_{16\times 26} \mathbf{D}_{26} \mathbf{W}_{26\times 16} \mathbf{W}_{16\times 8} \mathbf{P}_8^{(2)} \mathbf{W}_8 \mathbf{P}_8^{(0)} \mathbf{X}_{8\times 1},$ (23)

where $\mathbf{W}_8 = \mathbf{H}_2 \otimes \mathbf{I}_4$, $\mathbf{D}_{26} = \mathrm{diag}(s_0^{(8)}, s_1^{(8)}, \ldots, s_{25}^{(8)})$, $s_0^{(8)} = s_{13}^{(8)} = (a_8 + c_8)/2$, $s_1^{(8)} = s_{14}^{(8)} = (a_8 - c_8)/2$, $s_2^{(8)} = s_8^{(8)} = s_{15}^{(8)} = s_{21}^{(8)} = b_8$, $s_3^{(8)} = s_6^{(8)} = s_{16}^{(8)} = s_{19}^{(8)} = d_8$, $s_4^{(8)} = s_9^{(8)} = s_{17}^{(8)} = s_{22}^{(8)} = f_8$, $s_5^{(8)} = s_7^{(8)} = s_{18}^{(8)} = s_{20}^{(8)} = -h_8$, $s_{10}^{(8)} = s_{23}^{(8)} = g_8 - e_8$, $s_{11}^{(8)} = s_{24}^{(8)} = l_8 + g_8$, $s_{12}^{(8)} = s_{25}^{(8)} = -g_8$. The additional permutation matrix $\mathbf{P}_8^{(2)}$ is introduced as the direct sum of the two matrices $\mathbf{P}_4^{(a)}$ and $\mathbf{P}_4^{(b)} = \begin{bmatrix} 1 & & & \\ & & 1 & \\ & -1 & & \\ & & & -1 \end{bmatrix}$.
Based on the obtained factorization of the matrix $\mathbf{C}_8$, the algorithm for computing the 8-point DKT is proposed. The data flow graph of this algorithm is presented in Figure 6. Computing the 8-point DKT with the proposed algorithm reduces the required number of multiplications from 64 to 26. The number of additions is decreased from 56 to 38.
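A MATLAB sketch of the top level of this factorization (ours; the signs of $\mathbf{A}_4$ and $\mathbf{B}_4$ follow the matrices given above, and $\mathbf{C}_8$ is rebuilt from them via the symmetry of the Krawtchouk polynomials at $p = 0.5$):

```matlab
a = 0.0884; b = 0.2339; c = 0.4050; d = 0.5229;
e = 0.4419; f = 0.4593; g = 0.1976; h = 0.3423; l = 0.2652;
A4 = [a b c d; c f a -h; d -g -h l; b -e f -g];    % even-order rows
B4 = [a -b c -d; c -f a h; d g -h -l; b e f g];    % odd-order rows
C8 = zeros(8);                                     % rebuild C8 by symmetry:
C8(1:2:7, 1:4) = A4;  C8(1:2:7, 8:-1:5) = A4;      % K_n(N-1-x) =  K_n(x), n even
C8(8:-2:2, 1:4) = B4; C8(8:-2:2, 8:-1:5) = -B4;    % K_n(N-1-x) = -K_n(x), n odd
x  = randn(8,1);
u  = x(1:4) + x(8:-1:5);  v = x(1:4) - x(8:-1:5);  % butterfly stage
y  = zeros(8,1);  y(1:2:7) = A4*u;  y(8:-2:2) = B4*v;
assert(norm(y - C8*x) < 1e-12)
```

In the complete algorithm, $\mathbf{A}_4$ and $\mathbf{B}_4$ are further decomposed via (21) and (22), giving 26 multiplications in total.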

3.7. Generalization of the Proposed Algorithms

Considering the symmetry of orthogonal Krawtchouk polynomials, we propose a generalization of the developed DKT algorithms for short-length input sequences when N is greater than 8. Taking into account Equation (5), we consider the cases of odd and even lengths of input signals separately to obtain the fast N-point DKT algorithms.
Let us suppose that $N$ is even and define the permutation of columns of the DKT matrix $\mathbf{C}_N$ similarly to [35] as:

$\pi_N = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & \cdots & N-1 & N \\ 1 & N & 2 & N-1 & 3 & N-2 & \cdots & N/2 & N/2+1 \end{pmatrix}.$ (24)

The corresponding permutation matrix is denoted as $\mathbf{P}_N$. Next, the inputs after the permutation are pairwise combined into butterfly modules by multiplying by the matrix $\mathbf{W}_N = \mathbf{H}_2 \oplus \mathbf{H}_2 \oplus \cdots \oplus \mathbf{H}_2$, where $\mathbf{H}_2$ is included in the direct sum $N/2$ times. Further, the outputs of the butterfly modules are transmitted to a layer of multipliers using the matrix $\mathbf{W}_{N^2/2 \times N} = \mathbf{1}_{N/2\times 1} \oplus \mathbf{1}_{N/2\times 1} \oplus \cdots \oplus \mathbf{1}_{N/2\times 1}$, where $\mathbf{1}_{N/2\times 1}$ is included in the direct sum $N$ times. The layer of multipliers is represented by a diagonal matrix $\mathbf{D}_{N^2/2}$. We include in the diagonal matrix $\mathbf{D}_{N^2/2}$ the elements of the original DKT matrix $\mathbf{C}_N$ located in the first $N$ rows and the first $N/2$ columns of this matrix:

$\mathbf{D}_{N^2/2} = \mathrm{diag}(K_0(0;\, p,\, N-1),\, K_1(0;\, p,\, N-1),\, \ldots,\, K_{N-1}(0;\, p,\, N-1),\, K_0(1;\, p,\, N-1),\, \ldots,\, K_{N-1}(1;\, p,\, N-1),\, \ldots,\, K_0(N/2-1;\, p,\, N-1),\, \ldots,\, K_{N-1}(N/2-1;\, p,\, N-1)).$ (25)
In sum, the output variables of the $N$-point DKT were obtained as follows:

$\mathbf{Y}_{N\times 1} = \mathbf{W}_{N\times N^2/2}\, \mathbf{D}_{N^2/2}\, \mathbf{W}_{N^2/2\times N}\, \mathbf{W}_N\, \mathbf{P}_N\, \mathbf{X}_{N\times 1}.$ (26)

The matrix $\mathbf{W}_{N\times N^2/2}$ in the factorization (26) establishes a link between the output variables and linear combinations of the input values.
Let us consider an odd $N$. In this case, the order of the columns of the DKT matrix $\mathbf{C}_N$ is changed using the permutation

$\pi_N^{(0)} = \begin{pmatrix} 1 & 2 & 3 & 4 & \cdots & N-2 & N-1 & N \\ 1 & N & 2 & N-1 & \cdots & \lfloor N/2 \rfloor & \lfloor N/2 \rfloor + 2 & \lfloor N/2 \rfloor + 1 \end{pmatrix},$ (27)

where $\lfloor N/2 \rfloor$ denotes the floor of $N/2$. The symmetry of Krawtchouk polynomials is considered once again. By (27) we introduce the permutation matrix $\mathbf{P}_N^{(0)}$. As in the previous case, we pairwise combine the inputs after the permutation into butterfly modules by multiplying by the matrix $\mathbf{W}_N^{(0)} = \mathbf{H}_2 \oplus \mathbf{H}_2 \oplus \cdots \oplus \mathbf{H}_2 \oplus 1$, where $\mathbf{H}_2$ is included in the direct sum $\lfloor N/2 \rfloor$ times.

Next, we consider two cases of the value $\lfloor N/2 \rfloor + 1$. The first case is that $\lfloor N/2 \rfloor + 1$ is an even number. Then $N = 7, 11, 15, \ldots$, and the outputs of the butterfly modules are transmitted to a layer of multipliers using the matrix $\mathbf{W}_{(\lfloor N/2 \rfloor + 1)^2 \times N} = \mathbf{1}_{n\times 1} \oplus \mathbf{1}_{n\times 1} \oplus \cdots \oplus \mathbf{1}_{n\times 1} \oplus \mathbf{1}_{(\lfloor N/2 \rfloor + 1)\times 1}$, where $n = (N+1)/4$ and $\mathbf{1}_{n\times 1}$ is included in the direct sum $N-1$ times. The second case is that $\lfloor N/2 \rfloor + 1$ is an odd number. Then $N = 9, 13, 17, \ldots$, and the outputs of the butterfly modules are transmitted to a layer of multipliers using the matrix $\mathbf{W}_{(\lfloor N/2 \rfloor + 1)^2 \times N} = \mathbf{1}_{m\times 1} \oplus \mathbf{1}_{l\times 1} \oplus \mathbf{1}_{m\times 1} \oplus \mathbf{1}_{l\times 1} \oplus \cdots \oplus \mathbf{1}_{(\lfloor N/2 \rfloor + 1)\times 1}$, where $m = (N+3)/4$, $l = (N-1)/4$, and the pair $\mathbf{1}_{m\times 1} \oplus \mathbf{1}_{l\times 1}$ is included in the direct sum $\lfloor N/2 \rfloor$ times.
The layer of multipliers is represented by a diagonal matrix $\mathbf{D}_{(\lfloor N/2 \rfloor + 1)^2}$. In the diagonal matrix $\mathbf{D}_{(\lfloor N/2 \rfloor + 1)^2}$ we expand by columns the elements of the original DKT matrix $\mathbf{C}_N$ located in the first $\lfloor N/2 \rfloor + 1$ rows and the first $\lfloor N/2 \rfloor + 1$ columns of this matrix:

$\mathbf{D}_{(\lfloor N/2 \rfloor + 1)^2} = \mathrm{diag}(K_0(0;\, p,\, N-1),\, \ldots,\, K_{\lfloor N/2 \rfloor}(0;\, p,\, N-1),\, K_0(1;\, p,\, N-1),\, \ldots,\, K_{\lfloor N/2 \rfloor}(1;\, p,\, N-1),\, \ldots,\, K_0(\lfloor N/2 \rfloor;\, p,\, N-1),\, \ldots,\, K_{\lfloor N/2 \rfloor}(\lfloor N/2 \rfloor;\, p,\, N-1)).$ (28)
As a result, the output variables of the $N$-point DKT were obtained as follows:

$\mathbf{Y}_{N\times 1} = \mathbf{W}_{N\times (\lfloor N/2 \rfloor + 1)^2}\, \mathbf{D}_{(\lfloor N/2 \rfloor + 1)^2}\, \mathbf{W}_{(\lfloor N/2 \rfloor + 1)^2 \times N}\, \mathbf{W}_N^{(0)}\, \mathbf{P}_N^{(0)}\, \mathbf{X}_{N\times 1},$ (29)

where $\mathbf{W}_{N\times (\lfloor N/2 \rfloor + 1)^2}$ is the matrix included in the factorization to establish a correspondence between the outputs and linear combinations of the values of the inputs. The matrix $\mathbf{W}_{N\times (\lfloor N/2 \rfloor + 1)^2}$ is sparse and has a complex structure similar to that of the matrices $\mathbf{W}_{7\times 13}$ and $\mathbf{W}_{5\times 7}$. One can see that the entries equal to one or minus one form a diamond structure in such matrices.
We note that when constructing the generalized algorithm, it was not taken into account that there may be zeros among the entries of the diagonal matrix. Then the corresponding paths on the data flow graph must be deleted. As a result, the resulting graph will have a simpler structure.
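As a compact illustration of the symmetry that this generalization exploits, the following MATLAB function (our sketch, not the authors' pseudocode) computes the $N$-point DKT at $p = 0.5$ with roughly half the multiplications of the direct product, for both even and odd $N$:

```matlab
function y = dkt_symmetric(C, x)
% DKT via the symmetry K_n(N-1-x) = (-1)^n K_n(x), which holds for p = 0.5:
% even-degree rows need only the butterfly sums, odd-degree rows only the
% differences, so each output uses about N/2 multiplications instead of N.
N = length(x);  h = floor(N/2);
u = x(1:h) + x(N:-1:N-h+1);          % butterfly sums
v = x(1:h) - x(N:-1:N-h+1);          % butterfly differences
y = zeros(N, 1);
y(1:2:N) = C(1:2:N, 1:h) * u;        % even-degree rows
y(2:2:N) = C(2:2:N, 1:h) * v;        % odd-degree rows
if mod(N, 2) == 1                    % odd N: the centre sample contributes
    y(1:2:N) = y(1:2:N) + C(1:2:N, h+1) * x(h+1);   % only to even rows
end
end
```

For example, with $\mathbf{C}_7$ from Section 3.5, `dkt_symmetric(C7, x)` matches `C7*x`; the further savings of the proposed algorithms come from the repeated coefficient values inside the half-size blocks.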

4. Results

Each proposed algorithm has been implemented in the MATLAB environment, and the correctness of the algorithm is verified. The research was performed using an Intel Core i5-7400 processor (Intel, Santa Clara, CA, USA), 3 GHz CPU, 16 GB memory, Windows 10 operating system, 64-bit. To test the designed algorithms, we compare the number of arithmetic operations for computing the DKT with direct matrix-vector products and with the developed algorithms. Initially, the DKT matrices were obtained by applying Equations (6), (8), (11), (13), (17) and (19) for N ranging from 3 to 8. Next, the DKT matrix factorizations were calculated with the expressions (7), (10), (12), (16), (18), and (23). We determined the correctness of the proposed algorithms, establishing the coincidence of the entries of DKT matrices and the entries of the products of the matrices included in the factorizations of the DKT matrices for the same N.
Additionally, we evaluated the number of arithmetic operations involved in the designed algorithms. The results are presented in Table 1. The percentage reduction in the number of operations is indicated in parentheses, compared to the direct method. It can be seen from Table 1 that for values of N ranging from 3 to 8, the number of multiplications, additions, and shifts is reduced by an average of 58%, 27%, and 68%, respectively.

5. Discussion of Computational Complexity

In Table 2, we have shown the number of additions, shifts, and multiplications for the DKT algorithms obtained based on the symmetry of Krawtchouk polynomials and with the structural approach. The results in Table 2 revealed that the number of additions is less by 11–28% for algorithms developed based on the symmetry of Krawtchouk polynomials. The algorithms designed with the structural approach required fewer multiplications by 14–25%. Applying the structural approach, we actually replace several multiplications (their number depends on N) with the same number of additions, which are less time-consuming and resource-intensive.
In addition, in Table 2, we provide estimates of the number of arithmetic operations for the generalization of the presented algorithms to even and odd N. The direct matrix-vector product requires $N^2$ multiplications and $N(N-1)$ additions. We thus obtain that the number of multiplications can be reduced by approximately a factor of two.
Memory consumption results for the proposed DKT algorithms are shown in Figure 7. We have calculated the number of memory cells required for the algorithms presented in Section 3 using the designed pseudocodes. These pseudocodes are shown in Appendix A. Our solutions use 34% more memory than the direct matrix-vector product. We averaged this characteristic over the input sizes ranging from 4 to 8.
It should be noted that memory consumption depends heavily on implementation methods, platform, and the designer's expertise, unlike the more objective measures of arithmetic complexity. The constructed DKT algorithms support software implementations that may vary in memory usage, time delays, and required resources. The same memory cells can be reused at different algorithm stages if possible. Moreover, the developed solutions can be implemented sequentially, in parallel, or in combination, affecting result latency. Thus, the evaluation of the efficiency of memory use is subjective. Consequently, arithmetic complexity remains the most reliable measure of the efficiency of the proposed algorithms since implementation details are not considered.

6. Conclusions

In this article, we have developed algorithms for the DKT with a fixed p equal to 0.5, applying the structural approach. It is supposed that the length N of the input sequence is in the range of 3 to 8. Compared to the direct matrix-vector product, the obtained algorithms reduce the number of multiplications, additions, and shifts by approximately 58%, 27%, and 68%, respectively. We averaged these characteristics over the considered input sizes (from 3 to 8).
The designed algorithms were represented using data flow graphs. One unquestionable benefit of the suggested solutions is that each designed data flow graph’s critical path only has one multiplication. It is well known that additional data processing problems arise due to the doubling of the operand format with each subsequent multiplication if the critical path in the algorithm’s data flow graph contains more than one multiplication. Our algorithms do not have this problem.
We have generalized the obtained algorithms for small-sized DKT to the case of long input sequences. A matrix factorization of each proposed solution is derived based on the symmetry property of the Krawtchouk polynomials. Each such factorization has been constructed using sparse matrices. Based on the matrix factorization for a specific signal length N, the DKT algorithm can be represented by a data flow graph. The number of arithmetic operations of the generalized algorithms was compared to that of the algorithms derived from the structural approach. As a result, the more universal and simpler solution may come at the cost of a small decrease in computational efficiency.
The developed fast DKT algorithms do not directly affect accuracy in application contexts. This is because the proposed algorithms reduce the number of arithmetic operations required to compute the matrix-vector product, but do not alter its values. For example, in image feature extraction with varying image resolutions, accuracy significantly depends on the methods used to calculate the elements of the transform matrix and on data preprocessing. The specific algorithms presented do not impact this accuracy.
However, the proposed algorithms have certain limitations. First, the structural approach is better suited for constructing fast algorithms for short data sequences. Identifying the structure of transform matrices becomes more challenging with longer sequences. To address this, we have developed the generalization of the proposed DKT algorithms in Section 3.7. Second, the efficiency of synthesizing the presented fast algorithms relies heavily on the structural properties of the transform matrices. Specifically, it depends on how effectively these matrices can be transformed to align their block structure with the matrix patterns described in [33]. The symmetry of the Krawtchouk polynomials assumes a specific structure of the DKT matrices. This enabled us to develop a generalization of the fast DKT algorithms for input sequence lengths exceeding 8. Future research may focus on applying the proposed fast DKT algorithms and their generalization in image inpainting in the spectral domain using convolutional neural networks [36]. In quantum watermarking schemes, orthogonal transforms can decompose digital images into basis items or basis images [37]. These basis images can have simple quantum analogs that represent quantum operators describing transitions between quantum states. This property of orthogonal transforms allows watermark embedding in the frequency domain, where watermark bits or quantum watermark information are embedded into transformed components rather than directly into pixels, enhancing security [37].
Another promising area for future research is the development of fast DKT algorithms for long input sequences, constructing a fast orthogonal projection of the DKT using the Kronecker product, which generates large orthogonal matrices from smaller ones [25,26]. This enables the adaptation of DKT algorithms designed for short-length sequences to process long-length sequences. In this way, the computational complexity of the DKT can be significantly reduced, enhancing its efficiency for practical applications [36,37].

Author Contributions

Conceptualization, A.C.; methodology, A.C. and M.P.; software, M.P.; validation, A.C., M.P. and J.P.P.; formal analysis, A.C., M.P. and J.P.P.; investigation, M.P.; writing—original draft preparation, M.P. and A.C.; writing—review and editing, A.C., M.P. and J.P.P.; supervision, A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

The appendix presents the pseudocode of the proposed fast DKT algorithms. These pseudocodes were used in Section 4 to calculate the number of memory cells required for the algorithms presented in Section 3. To save the memory cells required for implementing the constructed algorithms, we reuse variables and specify which variables must be output and in which order. Thus, in Table A1, the pseudocode of the proposed 3-point DKT algorithm is presented. The inputs of the pseudocode are $x_0$, $x_1$, and $x_2$. Because the scaling factors of this algorithm are equal ($s_0^{(3)} = s_1^{(3)}$), we use only $s_0^{(3)}$ to construct the pseudocode. We reuse variables; then $p$ is an additional variable, and the outputs of the pseudocode are $x_1$, $x_0$, and $x_2$.
Table A1. The pseudocode for the constructed fast DKT algorithm for N = 3 with variable reuse.

Step 1: $p = x_1 s_0^{(3)}$; $x_1 = (x_0 + x_2)/2$.
Step 2: $x_0 = (x_0 - x_2) s_0^{(3)}$; $x_2 = x_1 - p$; $x_1 = x_1 + p$.
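A direct MATLAB transcription of Table A1 (our sketch, not the authors' code; `x` is a 3-element vector holding $x_0$, $x_1$, $x_2$ and is overwritten in place):

```matlab
s0 = 0.7071;                    % s0(3) = b3
p  = x(2) * s0;                 % Step 1
x(2) = (x(1) + x(3)) / 2;
x(1) = (x(1) - x(3)) * s0;      % Step 2
x(3) = x(2) - p;  x(2) = x(2) + p;
% outputs: y0 = x(2), y1 = x(1), y2 = x(3)
```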
We present the pseudocode for the developed 4-point DKT algorithm in Table A2. The inputs of the pseudocode are $x_0$, $x_1$, $x_2$, and $x_3$. We use only the scaling factors $s_0^{(4)}$, $s_1^{(4)}$, $s_2^{(4)}$, and $s_5^{(4)}$ to design the pseudocode. This is due to the following relations between the scaling factors: $s_0^{(4)} = -s_3^{(4)}$, $s_1^{(4)} = s_4^{(4)}$. We reuse variables; additional variables are $y_0$, $y_1$, $y_2$, and $y_3$. The outputs of the pseudocode are $x_0$, $x_1$, $x_2$, and $x_3$.
Table A2. The pseudocode for the developed fast 4-point DKT algorithm with variable reuse.

Step 1: $y_0 = x_0 + x_3$; $y_1 = x_1 + x_2$; $y_2 = x_0 - x_3$; $y_3 = x_1 - x_2$.
Step 2: $x_0 = y_1 s_1^{(4)}$; $y_1 = (y_0 + y_1) s_2^{(4)}$; $x_2 = y_2 s_0^{(4)}$; $x_3 = y_3 s_1^{(4)}$; $y_3 = (y_2 + y_3) s_5^{(4)}$; $y_0 = y_0 s_0^{(4)}$.
Step 3: $x_3 = y_3 + x_3$; $x_1 = y_3 - x_2$; $x_2 = y_1 + x_0$; $x_0 = y_0 + y_1$.
The pseudocode for the designed 5-point DKT algorithm is shown in Table A3. The inputs of the pseudocode are $x_0$, $x_1$, $x_2$, $x_3$, and $x_4$. We use only the scaling factor $s_0^{(5)}$ because $s_0^{(5)} = s_1^{(5)}$. The outputs of the pseudocode are $x_0$, $x_1$, $x_2$, $x_3$, and $x_4$ due to variable reuse. Variables $y_0$, $y_1$, $y_2$, and $y_3$ are additional.
Table A3. The pseudocode for the designed 5-point DKT algorithm with variable reuse.

Step 1: $y_0 = x_0 + x_4$; $y_1 = x_0 - x_4$; $y_2 = x_1 + x_3$; $y_3 = x_1 - x_3$.
Step 2: $x_0 = y_0/4 + x_2 s_0^{(5)}$; $x_1 = (y_3 + y_1)/2$; $x_2 = y_0 s_0^{(5)} - x_2/2$; $x_3 = (y_1 - y_3)/2$.
Step 3: $x_4 = x_0 - y_2/2$; $x_0 = x_0 + y_2/2$.
In Table A4, the pseudocode of the proposed 6-point DKT algorithm is presented. The inputs of the pseudocode are $x_0$, $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$. We use only the scaling factors $s_0^{(6)}$, $s_1^{(6)}$, $s_2^{(6)}$, $s_3^{(6)}$, and $s_5^{(6)}$ because $s_0^{(6)} = s_8^{(6)}$, $s_1^{(6)} = s_7^{(6)}$, $s_2^{(6)} = s_9^{(6)}$, $s_3^{(6)} = s_4^{(6)} = s_6^{(6)} = s_{11}^{(6)}$, $s_5^{(6)} = s_{10}^{(6)}$. We reuse variables, so the outputs of the pseudocode are $x_0$, $x_1$, $x_2$, $x_3$, $x_4$, and $x_5$. Variables $y_0$, $y_1$, $y_2$, $y_3$, $y_4$, and $y_5$ are additional.
Table A4. The pseudocode for the developed 6-point DKT algorithm with variable reuse.

Step 1: $y_0 = x_2 + x_3$; $y_1 = x_0 + x_5$; $y_2 = x_1 + x_4$; $y_3 = x_0 - x_5$; $y_4 = x_2 - x_3$; $y_5 = x_1 - x_4$.
Step 2: $x_2 = (y_0 + y_1) s_2^{(6)}$; $x_5 = (y_3 + y_4) s_2^{(6)}$.
Step 3: $x_0 = y_1 s_1^{(6)} + x_2 + y_2 s_3^{(6)}$; $x_1 = y_3 s_3^{(6)} + y_4/4 + y_5 s_5^{(6)}$; $x_2 = y_0 s_0^{(6)} + x_2 + y_2/4$; $x_3 = y_4 s_0^{(6)} + x_5 - y_5/4$; $x_4 = y_0/4 + y_1 s_3^{(6)} - y_2 s_5^{(6)}$; $x_5 = x_5 + y_3 s_1^{(6)} - y_5 s_3^{(6)}$.
In Table A5, the pseudocode of the constructed 7-point DKT algorithm is shown. The inputs of the pseudocode are $x_0$, $x_1$, $x_2$, $x_3$, $x_4$, $x_5$, and $x_6$. We use only the scaling factors $s_0^{(7)}$, $s_1^{(7)}$, $s_2^{(7)}$, $s_3^{(7)}$, $s_5^{(7)}$, and $s_9^{(7)}$ because $s_0^{(7)} = s_7^{(7)}$, $s_1^{(7)} = s_6^{(7)}$, $s_2^{(7)} = s_4^{(7)}$, $s_3^{(7)} = s_{10}^{(7)}$, $s_5^{(7)} = s_8^{(7)}$, and $s_9^{(7)} = s_{11}^{(7)}$. We reuse variables, so the outputs of the pseudocode are $y_0$, $y_1$, $y_2$, $y_3$, $y_4$, $y_5$, and $x_0$.
Table A5. The pseudocode for the designed 7-point DKT algorithm with variable reuse.

Step 1: $y_0 = x_0 + x_6$; $y_1 = x_0 - x_6$; $y_2 = x_1 + x_5$; $y_3 = (x_1 - x_5)/2$; $y_4 = x_2 + x_4$; $y_5 = x_2 - x_4$.
Step 2: $x_0 = s_0^{(7)} y_0 + s_1^{(7)} y_4$; $x_1 = s_1^{(7)} y_0 - s_0^{(7)} y_4$; $x_2 = s_2^{(7)} y_1 + s_5^{(7)} y_5$; $x_4 = s_2^{(7)} y_2 + s_3^{(7)} x_3$; $x_5 = s_5^{(7)} y_2 - s_9^{(7)} x_3$.
Step 3: $y_0 = x_0 + x_4$; $x_0 = x_0 - x_4$; $y_2 = x_1 + x_5$; $y_4 = x_1 - x_5$; $x_4 = s_3^{(7)} y_1 - s_9^{(7)} y_5$; $y_5 = x_2 - y_3$; $y_1 = x_2 + y_3$; $y_3 = x_4$.
In Table A6, the pseudocode of the constructed 8-point DKT algorithm is presented. The inputs of the pseudocode are $x_0, x_1, \ldots, x_7$. We use the scaling factors $s_0^{(8)}$, $s_1^{(8)}$, $s_2^{(8)}$, $s_3^{(8)}$, $s_4^{(8)}$, $s_5^{(8)}$, $s_{10}^{(8)}$, $s_{11}^{(8)}$, and $s_{12}^{(8)}$ because $s_0^{(8)} = s_{13}^{(8)}$, $s_1^{(8)} = s_{14}^{(8)}$, $s_2^{(8)} = s_8^{(8)} = s_{15}^{(8)} = s_{21}^{(8)}$, $s_3^{(8)} = s_6^{(8)} = s_{16}^{(8)} = s_{19}^{(8)}$, $s_4^{(8)} = s_9^{(8)} = s_{17}^{(8)} = s_{22}^{(8)}$, $s_5^{(8)} = s_7^{(8)} = s_{18}^{(8)} = s_{20}^{(8)}$, $s_{10}^{(8)} = s_{23}^{(8)}$, $s_{11}^{(8)} = s_{24}^{(8)}$, $s_{12}^{(8)} = s_{25}^{(8)}$. We reuse variables, so the outputs of the pseudocode are $y_0$, $y_1$, $y_2$, $y_3$, $y_4$, $y_5$, $y_6$, and $x_7$.
Table A6. The pseudocode for the constructed 8-point DKT algorithm with variable reuse.

Step 1: $y_0 = x_0 + x_7$; $y_1 = x_6 - x_1$; $y_2 = x_1 + x_6$; $y_3 = x_0 - x_7$; $y_4 = x_3 + x_4$; $y_5 = x_2 - x_5$; $y_6 = x_2 + x_5$; $x_7 = x_4 - x_3$.
Step 2: $x_0 = s_0^{(8)}(y_0 + y_6)$; $x_1 = s_1^{(8)}(y_0 - y_6)$; $x_2 = s_2^{(8)} y_2 + s_3^{(8)} y_4$; $x_3 = s_4^{(8)} y_2 + s_5^{(8)} y_4$; $x_4 = s_3^{(8)} y_0 + s_5^{(8)} y_6$; $x_5 = s_2^{(8)} y_0 + s_4^{(8)} y_6$; $x_6 = s_{12}^{(8)}(y_2 + y_4)$.
Step 3: $y_0 = x_0 + x_1 + x_2$; $y_6 = x_5 + s_{10}^{(8)} y_2 + x_6$; $y_2 = x_0 - x_1 + x_3$; $y_4 = x_4 + s_{11}^{(8)} y_4 + x_6$.
Step 4: $x_0 = s_0^{(8)}(y_3 + y_5)$; $x_1 = s_1^{(8)}(y_3 - y_5)$; $x_2 = s_2^{(8)} y_1 + s_3^{(8)} x_7$; $x_3 = s_4^{(8)} y_1 + s_5^{(8)} x_7$; $x_4 = s_3^{(8)} y_3 + s_5^{(8)} y_5$; $x_5 = s_2^{(8)} y_3 + s_4^{(8)} y_5$; $x_6 = s_{12}^{(8)}(y_1 + x_7)$.
Step 5: $y_1 = x_5 + s_{10}^{(8)} y_1 + x_6$; $y_3 = x_4 + s_{11}^{(8)} x_7 + x_6$; $x_7 = x_0 + x_1 + x_2$; $y_5 = x_0 - x_1 + x_3$.

References

1. Flusser, J.; Suk, T.; Zitová, B. 2D and 3D Image Analysis by Moments; John Wiley & Sons: Hoboken, NJ, USA, 2016.
2. Yang, L.; Liu, X.; Hu, Z. Advance and prospect of discrete Tchebichef transform and its application. In Proceedings of the 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 12–14 June 2020.
3. Mefoued, A.; Kouadria, N.; Harize, S.; Doghmane, N. Improved discrete Tchebichef transform approximations for efficient image compression. J. Real-Time Image Process. 2023, 21, 12.
4. Ke, W.; Chan, K.-H. A multilayer CARU framework to obtain probability distribution for paragraph-based sentiment analysis. Appl. Sci. 2021, 11, 11344.
5. Tahiri, M.A.; Amakdouf, H.; El Mallahi, M.; Qjidaa, H. Optimized quaternion radial Hahn moments application to deep learning for the classification of diabetic retinopathy. Multimed. Tools Appl. 2023, 82, 46217–46240.
6. Younsi, M.; Diaf, M.; Siarry, P. Comparative study of orthogonal moments for human postures recognition. Eng. Appl. Artif. Intell. 2023, 120, 105855.
7. Chiang, A.; Liao, S. Image analysis with Legendre moment descriptors. J. Comput. Sci. 2015, 11, 127–136.
8. Camacho-Bello, C. Exact Legendre–Fourier moments in improved polar pixels configuration for image analysis. IET Image Process. 2018, 13, 118–124.
9. Asli, B.H.S.; Flusser, J. Fast computation of Krawtchouk moments. Inf. Sci. 2014, 288, 73–86.
10. Venkataramana, A.; Raj, P.A. Image watermarking using Krawtchouk moments. In Proceedings of the 2007 International Conference on Computing: Theory and Applications (ICCTA'07), Kolkata, India, 5–7 March 2007; pp. 676–680.
11. Papakostas, G.A.; Tsougenis, E.D.; Koulouriotis, D.E. Near optimum local image watermarking using Krawtchouk moments. In Proceedings of the 2010 IEEE International Conference on Imaging Systems and Techniques, Thessaloniki, Greece, 1–2 July 2010; pp. 464–467.
12. Yamni, M.; Karmouni, H.; Daoui, A.; Sayyouri, M.; Qjidaa, H. Blind image zero-watermarking algorithm based on radial Krawtchouk moments and chaotic system. In Proceedings of the 2020 International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 9–11 June 2020; pp. 1–7.
13. Zhang, L.; Xiao, W.; Qian, G.; Ji, Z. Rotation, scaling, and translation invariant local watermarking technique with Krawtchouk moments. Chin. Opt. Lett. 2007, 5, 21–24.
14. Yap, P.-T.; Paramesran, R.; Ong, S.-H. Image analysis by Krawtchouk moments. IEEE Trans. Image Process. 2003, 12, 1367–1377.
15. Yap, P.T.; Raveendran, P.; Ong, S.H. Krawtchouk moments as a new set of discrete orthogonal moments for image reconstruction. In Proceedings of the 2002 International Joint Conference on Neural Networks, IJCNN'02 (Cat. No.02CH37290), Honolulu, HI, USA, 12–17 May 2002; Volume 1, pp. 908–912.
16. Karmouni, H.; Jahid, T.; Lakhili, Z.; Hmimid, A.; Sayyouri, M.; Qjidaa, H.; Rezzouk, A. Image reconstruction by Krawtchouk moments via digital filter. In Proceedings of the International Conference on Intelligent Systems and Computer Vision (ISCV), Fez, Morocco, 17–19 April 2017; pp. 1–7.
17. Mesbah, A.; El Mallahi, M.; Lakhili, Z.; Qjidaa, H.; Berrahou, A. Fast and accurate algorithm for 3D local object reconstruction using Krawtchouk moments. In Proceedings of the 2016 5th International Conference on Multimedia Computing and Systems (ICMCS), Marrakech, Morocco, 29 September–1 October 2016; pp. 1–6.
18. Rani, J.S.; Devaraj, D. Face recognition using Krawtchouk moment. Sadhana 2012, 37, 441–460.
19. Rivero-Castillo, D.; Pijeira, H.; Assunçao, P. Edge detection based on Krawtchouk polynomials. J. Comput. Appl. Math. 2015, 284, 244–250.
20. Yamni, M.; Daoui, A.; Pławiak, P.; Mao, H.; Alfarraj, O.; El-Latif, A.A.A. A novel 3D reversible data hiding scheme based on integer–reversible Krawtchouk transform for IoMT. Sensors 2023, 23, 7914.
21. Chen, Y.; Yao, X.-R.; Zhao, Q.; Liu, S.; Liu, X.-F.; Wang, C.; Zhai, G.-J. Single-pixel compressive imaging based on the transformation of discrete orthogonal Krawtchouk moments. Opt. Express 2019, 27, 29838–29853.
22. Zhao, W.; Gao, L.; Zhai, A.; Wang, D. Comparison of common algorithms for single-pixel imaging via compressed sensing. Sensors 2023, 23, 4678.
23. Honarvar Shakibaei Asli, B.; Horri Rezaei, M. Four-term recurrence for fast Krawtchouk moments using Clenshaw algorithm. Electronics 2023, 12, 1834.
24. Abdulhussain, S.H.; Ramli, A.R.; Al-Haddad, S.A.R.; Mahmmod, B.M.; Jassim, W.A. Fast recursive computation of Krawtchouk polynomials. J. Math. Imaging Vis. 2018, 60, 285–303.
25. Zhang, X.; Yu, F.X.; Guo, R.; Kumar, S.; Wang, S.; Chang, S.-F. Fast orthogonal projection based on Kronecker product. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 2929–2937.
26. Majorkowska-Mech, D.; Cariow, A. Discrete pseudo-fractional Fourier transform and its fast algorithm. Electronics 2021, 10, 2145.
27. Mahmmod, B.M.; Abdul-Hadi, A.M.; Abdulhussain, S.H.; Hussien, A. On computational aspects of Krawtchouk polynomials for high orders. J. Imaging 2020, 6, 81.
28. Al-Utaibi, K.A.; Abdulhussain, S.H.; Mahmmod, B.M.; Naser, M.A.; Alsabah, M.; Sait, S.M. Reliable recurrence algorithm for high-order Krawtchouk polynomials. Entropy 2021, 23, 1162.
29. Venkataramana, A.; Raj, P.A. Recursive computation of forward Krawtchouk moment transform using Clenshaw's recurrence formula. In Proceedings of the 2011 Third National Conference on Computer Vision, Pattern Recognition, Image Processing and Graphics, Hubli, India, 15–17 December 2011; pp. 200–203.
30. Karampasis, N.D.; Spiliotis, I.M.; Boutalis, Y.S. Real-time computation of Krawtchouk moments on gray images using block representation. SN Comput. Sci. 2021, 2, 124.
31. Mesbah, A.; El Mallahi, M.; El Fadili, H.; Zenkouar, K.; Berrahou, A.; Qjidaa, H. An algorithm for fast computation of 3D Krawtchouk moments for volumetric image reconstruction. In Proceedings of the Mediterranean Conference on Information & Communication Technologies 2015 (MedCT 2015), Saïdia, Morocco, 7–9 May 2015; El Oualkadi, A., Choubani, F., El Moussati, A., Eds.; Lecture Notes in Electrical Engineering; Springer: Cham, Switzerland, 2015; Volume 380.
32. Wu, J.S.; Yang, C.F.; Shu, H.Z.; Wang, L.; Senhadji, L. A fast algorithm for the 4×4 discrete Krawtchouk transform. In Proceedings of the 2014 International Symposium on Information Technology (ISIT 2014), Dalian, China, 14–16 October 2014; pp. 23–28.
33. Andreatto, B.; Cariow, A. Automatic generation of fast algorithms for matrix-vector multiplication. Int. J. Comput. Math. 2017, 95, 626–644.
34. Polyakova, M.; Witenberg, A.; Cariow, A. The fast type-IV discrete sine transform algorithms for short-length input sequences. Bull. Pol. Acad. Sci. Tech. Sci. 2025, 73, e153827.
35. Cariow, A.; Polyakova, M. The fast discrete Tchebichef transform algorithms for short-length input sequences. Signals 2025, 6, 23.
36. Kolodochka, D.; Polyakova, M.; Rogachko, V. Prediction the accuracy of image inpainting using texture descriptors. Radio Electron. Comput. Sci. Manag. 2025, 2, 56–67.
37. Xing, Z.; Lam, C.-T.; Yuan, X.; Im, S.-K.; Machado, P. MMQW: Multi-Modal Quantum Watermarking Scheme. IEEE Trans. Inf. Forensics Secur. 2024, 19, 5181–5195.
Figure 1. The data flow graph of the 3-point DKT algorithm.
Figure 2. The data flow graph of the 4-point DKT algorithm.
Figure 3. The data flow graph of the 5-point DKT algorithm.
Figure 4. The data flow graph of the 6-point DKT algorithm.
Figure 5. The data flow graph of the 7-point DKT algorithm.
Figure 6. The data flow graph of the 8-point DKT algorithm.
Figure 7. Memory consumption of the DKT implemented via a direct matrix-vector product (blue line) and via the proposed algorithms (red line).
Table 1. The number of additions, multiplications, and shifts in the constructed DKT algorithms and in the direct matrix-vector products.

N | Direct Method          | Proposed Algorithms
  | Adds.  Mults.  Shifts  | Adds.       Mults.       Shifts
3 |   5      4       4     | 4 (−20%)    2 (−50%)     1 (−75%)
4 |  12     16       —     | 10 (−17%)   6 (−63%)     —
5 |  16      4      21     | 12 (−25%)   2 (−50%)     6 (−71%)
6 |  30     28      16     | 20 (−33%)   12 (−57%)    8 (−50%)
7 |  37     40       4     | 23 (−38%)   12 (−70%)    1 (−75%)
8 |  56     64       —     | 38 (−32%)   26 (−59%)    —
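The percentage figures in Table 1, and the averaged savings quoted in the abstract, follow from simple arithmetic on the operation counts. The short Python sketch below reproduces that arithmetic; the dictionaries merely restate the table, and the shift average is taken over the four sizes for which shifts occur.

```python
# Operation counts transcribed from Table 1: N -> (direct, proposed).
adds   = {3: (5, 4),  4: (12, 10), 5: (16, 12), 6: (30, 20), 7: (37, 23), 8: (56, 38)}
mults  = {3: (4, 2),  4: (16, 6),  5: (4, 2),   6: (28, 12), 7: (40, 12), 8: (64, 26)}
shifts = {3: (4, 1),  5: (21, 6),  6: (16, 8),  7: (4, 1)}   # no shifts for N = 4, 8

def mean_reduction(counts):
    """Average of the per-size reductions 100 * (1 - proposed / direct)."""
    cuts = [100.0 * (1.0 - proposed / direct) for direct, proposed in counts.values()]
    return sum(cuts) / len(cuts)

print(f"additions:       {mean_reduction(adds):.1f}%")    # ~27%, as in the abstract
print(f"multiplications: {mean_reduction(mults):.1f}%")   # ~58%
print(f"shifts:          {mean_reduction(shifts):.1f}%")  # ~68%
```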
Table 2. The number of additions, multiplications, and shifts in the constructed DKT algorithms based on the structural approach and on the symmetry of Krawtchouk polynomials.

N | Symmetry of Krawtchouk Polynomials | Structural Approach
  | Adds.  Mults.  Shifts              | Adds.       Mults.       Shifts
4 |  8       8       —                 | 10 (+25%)   6 (−25%)     —
5 | 12       2       6                 | 12 (0%)     2 (0%)       6 (0%)
6 | 18      14       8                 | 20 (+11%)   12 (−14%)    8 (0%)
7 | 18      15       1                 | 23 (+28%)   12 (−20%)    1 (0%)
8 | 32      32       —                 | 38 (+19%)   26 (−19%)    —
even N: N(N/2 + 1), N²/2
odd N: N(⌊N/2⌋ + 2) − 1, (⌊N/2⌋ + 1)²
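As an arithmetic check of ours on the general-$N$ rows, the expression $N^2/2$ evaluates, for the even sizes covered by the table, to

$$\frac{N^2}{2} = 8,\ 18,\ 32 \quad \text{for } N = 4, 6, 8,$$

which coincides with the even-$N$ addition counts of the symmetry-based computation.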