Efficient Algorithms and Architectures for DSP Applications

Introduction
In the new era of digital revolution, digital sensors and embedded designs are becoming cheaper and more widespread. As a result, the amount of available digital data keeps growing, while the requirements for efficient implementations become increasingly challenging, especially for real-time applications. Thus, the optimization and efficient implementation of DSP algorithms and architectures are increasingly important. They represent an essential part of the research in many modern applications, e.g., multimedia, big data, and IoT.
For real-time implementations of modern DSP applications, the efficient optimization of such architectures or software implementations is sometimes a critical and challenging issue. For example, real-time multimedia applications face ever-increasing performance requirements, because huge data volumes must be processed and transmitted at high speeds under the resource constraints specific to portable devices.
This Special Issue focuses on papers that demonstrate how these design challenges can be overcome using innovative solutions. Thus, we concentrated on a dedicated theme, namely "Efficient Algorithms and Architectures for DSP Applications", which aimed to attract studies on innovative, high-value, efficient, and robust solutions, including (but not limited to) architecture design, framework designs, VLSI signal processing, performance evaluation, low-power circuits and systems for DSP, tensor-based signal processing, and decomposition-based adaptive algorithms.
The current Special Issue contains 11 papers, which are briefly presented in the following section. We would like to thank all the authors for their insightful contributions on important topics related to the DSP domain.

Short Presentation of the Papers
In [1], Chiper and Cotorobai address a challenging problem in designing VLSI chips for embedded designs used in DSP applications, namely, the efficient incorporation of security techniques while maintaining the high performance of these chips. They propose a new approach for designing a unified VLSI architecture for the discrete cosine and sine transforms of type IV (DCT-IV/DST-IV), which allows an efficient incorporation of the obfuscation technique. The proposed approach uses a new VLSI algorithm that involves modular and regular computational structures (called quasi-correlated structures). Furthermore, it can be efficiently implemented in VLSI using an architectural paradigm inspired by systolic arrays. The resulting architecture combines high speed with low hardware complexity, because the hardware resources are shared efficiently by both transforms.
In [2], Benesty et al. provide a comprehensive review of the identification of linear and bilinear systems. In this framework, they present some recent developments that exploit decomposition-based approaches for multiple-input/single-output (MISO) system identification problems. The basic idea is to reformulate such a high-dimension problem in the framework of bilinear forms (as combinations of low-dimension solutions), taking advantage of the Kronecker product decomposition and low-rank approximation of the spatiotemporal impulse response of the system. Based on this approach, the authors develop an iterative Wiener filter with improved accuracy and robustness of the solution. These performance gains become more important in difficult scenarios (e.g., small amounts of available data) and in noisy environments.
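To make the Kronecker product idea concrete, the following minimal sketch (not the algorithm from [2]) checks that a long impulse response built as the Kronecker product of two short ones yields the same output whether it is applied as one long linear filter or as a bilinear form in the two short responses. All names and numerical values are hypothetical and chosen only for illustration.

```python
def kron(a, b):
    """Kronecker product of two vectors given as Python lists."""
    return [ai * bj for ai in a for bj in b]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical short component responses (illustrative values only).
h1 = [1.0, -0.5, 0.25]   # length L1 = 3
h2 = [0.8, 0.2]          # length L2 = 2
L1, L2 = len(h1), len(h2)

# Global response of length L1 * L2 = 6, described by only L1 + L2 = 5 values.
h = kron(h2, h1)

# Hypothetical input vector of length L1 * L2.
x = [0.3, -1.2, 0.7, 2.0, 0.1, -0.4]

# Reshape x column-wise into an L1 x L2 matrix: X[i][j] = x[j * L1 + i].
X = [[x[j * L1 + i] for j in range(L2)] for i in range(L1)]

# Linear form with the long response versus bilinear form with the short ones.
y_full = dot(h, x)
y_bilinear = sum(h1[i] * X[i][j] * h2[j] for i in range(L1) for j in range(L2))

print(y_full, y_bilinear)  # the two values agree up to rounding
```

The saving grows with the lengths involved: a response of length L1 * L2 is described by L1 + L2 coefficients, which is what makes the low-dimension reformulation attractive when little data is available.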
In [3], Fîciu et al. develop a robust and computationally efficient tensor-based recursive least-squares (RLS) algorithm for the identification of multilinear forms. In this context, a high-dimensional system identification problem can be efficiently addressed based on tensor decomposition and modeling. The resulting gain is twofold, in terms of both performance and complexity. The proposed RLS-based solution uses dichotomous coordinate descent (DCD) iterations to solve the normal equations of the algorithm. Thus, only bit-shifts and additions are required in this step of the algorithm. The robustness of the tensor-based RLS-DCD algorithm relies on regularization terms that are incorporated within the cost functions, aiming to attenuate the effects of the system noise. As a result, the proposed algorithm performs well, especially in adverse conditions with low signal-to-noise ratios.
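The DCD step can be illustrated with a minimal sketch of the basic cyclic DCD iteration for a system R x = b with a symmetric positive-definite R. This is a generic textbook-style form, not the tensor-based RLS-DCD of [3]; the function name, the parameters H, Mb, and Nu, and the test system are assumptions made for illustration. The code uses floating-point arithmetic for readability; because the step size is a power of two, the multiplications by the step size would reduce to bit-shifts in a fixed-point implementation.

```python
def dcd_solve(R, b, H=1.0, Mb=16, Nu=200):
    """Approximately solve R x = b with cyclic DCD iterations.

    R  : symmetric positive-definite matrix (list of lists)
    b  : right-hand side (list)
    H  : assumed amplitude bound on the solution, a power of two
    Mb : number of bits used to represent each solution entry
    Nu : maximum number of successful coordinate updates
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)        # residual r = b - R x, kept up to date incrementally
    alpha = H          # step size, halved at each bit level
    updates = 0
    for _ in range(Mb):
        alpha /= 2.0   # a one-bit right shift in fixed-point hardware
        improved = True
        while improved:
            improved = False
            for k in range(n):
                # Update coordinate k only if the step reduces the quadratic cost.
                if abs(r[k]) > (alpha / 2.0) * R[k][k]:
                    s = 1.0 if r[k] > 0 else -1.0
                    x[k] += s * alpha
                    for i in range(n):
                        r[i] -= s * alpha * R[i][k]
                    improved = True
                    updates += 1
                    if updates >= Nu:
                        return x
    return x


# Hypothetical 2 x 2 system; the exact solution is [1/11, 7/11].
print(dcd_solve([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

The update budget Nu trades accuracy for a bounded per-call cost, which is the property that makes this family of solvers attractive for real-time adaptive filtering.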
In [4], Rusu et al. use the multilinear system framework to address system identification problems that can be modeled through parallel or cascaded filters. In this framework, they introduce different memoryless and memory structures that are described from a bilinear perspective. Based on the memory structures, a multilinear recursive least-squares algorithm is developed by considering the Kronecker product decomposition concept. This approach leads to a significant reduction in computational complexity. Thus, the proposed algorithm could be a good candidate for real-time applications that involve long impulse responses and systems characterized by reverberation.
In [5], Raciborski and Cariow propose a unified approach to derive DFT algorithms for short (power-of-two) input sequences using the Winograd method. Usually, these algorithms are defined based on a set of recurrent relations or using a product of sparse matrices. The proposed approach uses a simple, clear, and easy-to-understand derivation that yields such DFT algorithms in a similar way for lengths N = 8, 16, and 32. The proposed algorithms have a reduced number of multiplications, with a slight increase in the number of additions. Moreover, there is a single multiplication on the critical path, which brings additional advantages in terms of speed and accuracy.
In [6], Kolenderski and Cariow present new algorithms for the type-I DCT with reduced complexity. The type-I DCT is a less popular transform, but it has important applications in wireless communication systems. Therefore, efficient algorithms for this transform are useful for real-time applications. In this paper, the authors propose fast algorithms for the type-I DCT of small lengths (i.e., N = 2, 3, 4, 5, 6, 7, and 8), which have a reduced arithmetic complexity. The proposed method is based on efficient factorizations of small-size type-I DCT matrices, which lead to fast versions with good performance.
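For reference, a direct O(N^2) evaluation of the type-I DCT is sketched below. It is not one of the fast algorithms from [6]; it is the kind of baseline against which a fast small-length algorithm could be checked. The unnormalized scaling used here is an assumption, since scaling conventions for the DCT-I differ between texts and libraries.

```python
import math


def dct1(x):
    """Direct O(N^2) type-I DCT, unnormalized, for sequences of length N >= 2.

    X[k] = 0.5 * (x[0] + (-1)^k * x[N-1])
           + sum_{n=1}^{N-2} x[n] * cos(pi * n * k / (N - 1))
    """
    N = len(x)
    if N < 2:
        raise ValueError("the type-I DCT needs at least two samples")
    out = []
    for k in range(N):
        acc = 0.5 * (x[0] + (-1) ** k * x[N - 1])
        for n in range(1, N - 1):
            acc += x[n] * math.cos(math.pi * n * k / (N - 1))
        out.append(acc)
    return out


# With this scaling, applying the transform twice returns (N - 1) / 2 times the input.
x = [1.0, 2.0, -1.0, 0.5]
twice = dct1(dct1(x))
print([t / ((len(x) - 1) / 2.0) for t in twice])  # recovers x up to rounding
```

The self-inverse property (up to a scale factor) gives a convenient correctness check for any candidate fast factorization at the same length.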
In [7], Chiper et al. present a unified overview of the main VLSI implementations of the forward and inverse DST existing in the literature that are based on the systolic array architectural paradigm. The main features of these VLSI implementations are presented in terms of their advantages and drawbacks. One of the central ideas is to show the advantages, from a VLSI implementation perspective, of using regular and modular computational structures, such as circular correlation, cyclic convolution, and pseudo-band correlation. Thus, high performance with reduced hardware complexity is obtained. Using the ideas presented in this review, the authors develop and present a new VLSI implementation of the forward DST. Moreover, this allows an efficient incorporation of the hardware security technique (called obfuscation), with very low overheads.
In [8], Kim develops a reversible data hiding (RDH) method based on the dual absolute moment block truncation coding (AMBTC) technique. The proposed AMBTC-based dual-image RDH uses the Hamming (7,4) code and least-significant bit replacement. The resulting solution improves security, since AMBTC has advantages in low-bandwidth channel environments. Moreover, it guarantees sufficient data hiding capacity, proper cover image quality, and the restoration of the original cover image. Thus, it is efficient in terms of both image quality and embedding ratio. Also, by using the AMBTC compressed image as a cover image, fast transmission in a low-traffic network may be possible. In addition, the proposed method has an important advantage related to the quality of the two marked images, which is almost the same because the payload is properly distributed.
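As background for the Hamming (7,4) code mentioned above, the sketch below shows the classical encoder and single-error-correcting decoder, with parity bits at positions 1, 2, and 4 of the codeword. How the code is combined with AMBTC and least-significant bit replacement in [8] is not detailed in this overview, so the example covers only the code itself; the function names are hypothetical.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Codeword layout (positions 1..7): p1 p2 d1 p3 d2 d3 d4.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4
    p2 = d1 ^ d3 ^ d4
    p3 = d2 ^ d3 ^ d4
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(c)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]   # checks positions 1, 3, 5, 7
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]   # checks positions 2, 3, 6, 7
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]   # checks positions 4, 5, 6, 7
    pos = s1 + 2 * s2 + 4 * s3       # 1-based position of the error, 0 if none
    if pos:
        c[pos - 1] ^= 1
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[5] ^= 1                          # flip one bit to simulate an error
print(hamming74_decode(word))         # the original data bits are recovered
```

The three-bit syndrome directly names the position of a single flipped bit among seven, which is the property that Hamming-based hiding schemes typically build on.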
In [9], Chiper presents an improved VLSI algorithm for the type-IV DCT, which allows an efficient VLSI implementation with the incorporation of the obfuscation technique. It implies low overheads, while at the same time improving the overall performance (as compared to existing solutions), even without using hardware security techniques. Incorporating hardware security at low cost is one of the most challenging problems in designing VLSI chips for common goods. The paper is based on a new VLSI algorithm for the DCT-IV that leads to a significant reduction in hardware complexity while maintaining high-speed performance. The resulting VLSI architecture for the type-IV DCT achieves this reduction (as compared to existing solutions) and allows an efficient incorporation of the hardware security mechanism, with very low overheads.
In [10], Engroff et al. present an EDA (electronic design automation) tool for the semiautomatic development of ASIPs (application-specific instruction set processors). Such a processor is specifically implemented for a target application, thus allowing better hardware customization. The solution provides a set of integrated tools for compilation, simulation, and hardware synthesis, which are used to interpret the application and generate customized hardware. The proposed methodology is based on a new customizable microprocessor called PAMPIUM, which can be optimized in terms of silicon area, power consumption, and processing performance. Consequently, the resulting tool (namely ASIPAMPIUM) offers a simple and intuitive design flow. The proposed methodology is tested on the implementation of the fast Fourier transform (FFT) algorithm, leading to significantly improved results.
In [11], Bureneva and Mironov propose a modification of Booth's algorithm that multiplies by three digits at a time (that is, by a group of bits). The developed version reduces the number of partial products and the depth of the tree that sums them in parallel. Thus, it accelerates the overall operation of the multiplier module. Moreover, the proposed solution reduces the performance difference between the embedded field-programmable gate array (FPGA) multipliers and the multipliers implemented on logic cells. Designers of digital circuits can use this result to select an optimal method for implementing a multiplier on FPGAs.
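To show how Booth-type recoding reduces the number of partial products, the sketch below implements the standard radix-4 Booth recoding, which inspects overlapping groups of three multiplier bits and emits one signed digit for every two bits. This is the well-known baseline technique, not the modified algorithm proposed in [11]; the function names and the 8-bit default width are assumptions made for illustration.

```python
def booth_radix4_digits(y, bits):
    """Recode a two's-complement multiplier into radix-4 Booth digits.

    y    : integer in the range [-2^(bits-1), 2^(bits-1) - 1]
    bits : word length, must be even
    Returns digits in {-2, -1, 0, 1, 2}, least significant digit first.
    """
    if bits % 2:
        raise ValueError("bits must be even for radix-4 recoding")
    u = y & ((1 << bits) - 1)      # two's-complement bit pattern of y
    digits = []
    prev = 0                       # implicit bit to the right of bit 0
    for i in range(0, bits, 2):
        b0 = (u >> i) & 1
        b1 = (u >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)
        prev = b1
    return digits


def booth_multiply(x, y, bits=8):
    """Multiply x by y using bits / 2 partial products instead of bits."""
    total = 0
    for j, d in enumerate(booth_radix4_digits(y, bits)):
        # Each partial product is 0, +/-x, or +/-2x, shifted by 2*j positions.
        total += (d * x) << (2 * j)
    return total


print(booth_radix4_digits(-77, 8))
print(booth_multiply(13, -77), 13 * -77)   # both values are equal
```

An 8-bit multiplier needs four partial products here instead of eight, and each one is obtained from x by a shift and an optional negation, so the summation tree becomes shallower.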

Perspectives
The works presented in this Special Issue motivate us to focus further on advanced algorithms and architectures for digital signal processing (https://www.mdpi.com/journal/electronics/special_issues/Algorithms_Architectures_DSP, accessed on 7 February 2023). In this direction, novel ideas and improved solutions for challenging DSP problems are welcome in the same areas. These can be related to VLSI signal processing, optimization of the VLSI implementation of multimedia blocks, circuits and systems for DSP applications, adaptive/learning algorithms, tensor-based signal processing, and sparsity-aware algorithms.

Conflicts of Interest:
The authors declare no conflict of interest.