Applied Sciences
  • Article
  • Open Access

29 April 2025

Efficient Implementation of Matrix-Based Image Processing Algorithms for IoT Applications

Telecommunications Department, Faculty of Electronics, Telecommunications and Information Technology, POLITEHNICA Bucharest National University for Science and Technology, 060042 Bucharest, Romania
* Author to whom correspondence should be addressed.

Abstract

This paper analyzes implementation approaches for matrix-based image processing algorithms. As an example, an image processing algorithm that provides both image compression and image denoising using random sample consensus and the discrete cosine transform is analyzed. Two implementations are illustrated: one using the Blackfin processor with 32-bit fixed-point representation and a second using the convolutional neural network (CNN) accelerator in the MAX78000 microcontroller. The Blackfin implementation can be considered a classic approach, written in C and possible on all existing microcontrollers; it is further improved by using two cores. The proposed implementation with the CNN accelerator is a new approach that effectively uses the accelerator dedicated to convolutional neural networks, with better results than a classical implementation. The execution time of matrix-based image processing algorithms can be reduced by using the CNN accelerators already integrated in some modern microcontrollers for artificial intelligence workloads. The proposed method uses the CNN accelerator in a different way, not for artificial intelligence algorithms but for matrix calculations, exploiting the CNN resources effectively while maintaining the accuracy of the calculations. A comparison of these two implementations and a validation against MATLAB with 64-bit floating-point representation are conducted. The obtained performance is good both in terms of quality of the reconstructed image and execution time, and the differences between the infinite-precision and the finite-precision implementations are small. The CNN accelerator implementation, based on matrix multiplication performed with the CNN, has better performance and is suitable for Internet of Things applications.

2. The Algorithm Description

The above-mentioned algorithm is based on the sparsity of the discrete cosine transform. The CS-RANSAC algorithm is used to choose the non-noisy pixels, and the DCT coefficients are determined using the compressive sensing method. The image to be processed is divided into blocks of smaller size (B, B) (chosen according to the size of the two-dimensional DCT transform), and each block is processed separately. The two-dimensional DCT transforms (direct and inverse) are calculated as matrix multiplications: the block to be processed is rearranged into a column vector x_b of dimensions (N, 1) with N = B^2, which is multiplied by a matrix W of dimensions (B^2, B^2) determined by the DCT transformation matrix T as W = T ⊗ T, where ⊗ is the Kronecker product. The DCT transform is X = W x_b, and the inverse DCT transform is x_b = W^(−1) X = W^T X. For each block, a subset y_b of x_b with S < B^2 pixels is randomly chosen. Matrix A is then determined; it contains the rows of the transpose of W corresponding to the indices of the elements chosen from x_b. Using matrix A and vector y_b, the first k coefficients of the two-dimensional DCT transform are determined. Using these coefficients, the vector x_rb is reconstructed, and the error between x_b and x_rb is determined. These steps are repeated until the error is small enough or the number of iterations exceeds a maximum imposed number. The last determined DCT coefficients are used to restore the pixels of the processed block. The algorithm is designed so that it can be implemented as efficiently as possible (both on a computer with infinite precision and on a microcontroller with fewer resources and lower precision). The critical elements for a microcontroller implementation are matrix multiplication and calculation precision, especially when implementing the inverse matrix function. These aspects are discussed in the microcontrollers’ implementation section. The algorithm is illustrated in Figure 1. In this figure, the necessary explanations have been included as comments enclosed between “/*” and “*/”. Figure 1a illustrates the core of the algorithm, and Figure 1b,c show the compressive sensing function and the pseudo-inverse function, respectively.
Figure 1. The CS-RANSAC algorithm. (a) CS-RANSAC algorithm (b) Compressive sensing function (c) Pseudo-inverse function.
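The construction of W and the matrix form of the 2D DCT described above can be summarized in a short illustrative C sketch, assuming B = 4 and an orthonormal DCT-II matrix T; the function and variable names are illustrative and do not reproduce the paper's code:

```c
/* Illustrative sketch: W = T (Kronecker) T, forward 2D DCT as X = W * x_b.
 * Assumptions: B = 4, orthonormal DCT-II matrix T; names are illustrative. */
#include <math.h>
#include <stdio.h>

#ifndef M_PI
#define M_PI 3.14159265358979323846
#endif

#define B 4
#define N (B * B)

static void dct_matrix(double T[B][B]) {
    for (int k = 0; k < B; k++)
        for (int n = 0; n < B; n++) {
            double c = (k == 0) ? sqrt(1.0 / B) : sqrt(2.0 / B);
            T[k][n] = c * cos(M_PI * (2 * n + 1) * k / (2.0 * B));
        }
}

/* W = T (Kronecker product) T, dimensions (B^2, B^2) */
static void kron(const double T[B][B], double W[N][N]) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            W[i][j] = T[i / B][j / B] * T[i % B][j % B];
}

/* X = W * x (2D DCT of a flattened B x B block) */
static void matvec(const double W[N][N], const double x[N], double X[N]) {
    for (int i = 0; i < N; i++) {
        X[i] = 0.0;
        for (int j = 0; j < N; j++) X[i] += W[i][j] * x[j];
    }
}

int main(void) {
    double T[B][B], W[N][N], x[N], X[N];
    dct_matrix(T);
    kron(T, W);
    for (int i = 0; i < N; i++) x[i] = (double)(i % 7);  /* dummy block */
    matvec(W, x, X);                                     /* forward DCT */
    printf("DC coefficient: %f\n", X[0]);
    return 0;
}
```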
The algorithm uses two functions:
-
The CSrec function, which receives as parameters the set of randomly chosen elements for determining non-noisy pixels, the modified transformation matrix, and the number of DCT coefficients (sparsity factor) and returns the DCT coefficients that will be used to reconstruct the elements in the block [].
-
The pseudoinv function, which calculates the inverse of a matrix using an iterative method [].
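One common iterative scheme for the matrix inverse is the Newton–Schulz iteration X_(k+1) = X_k (2I − A X_k); the sketch below illustrates it in C for a small matrix. The paper only states that an iterative method is used, so this particular scheme is an assumption and the code is not the pseudoinv routine itself.

```c
/* Illustrative C sketch of an iterative matrix inverse in the spirit of the
 * pseudoinv function. The Newton-Schulz iteration X_{k+1} = X_k (2I - A X_k)
 * is assumed here; it is not the paper's actual routine. */
#include <math.h>
#include <stdio.h>

#define M 3   /* illustrative matrix size */

static void matmul(const double a[M][M], const double b[M][M], double c[M][M]) {
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++) {
            c[i][j] = 0.0;
            for (int k = 0; k < M; k++) c[i][j] += a[i][k] * b[k][j];
        }
}

static void newton_schulz_inverse(const double a[M][M], double x[M][M], int iters) {
    /* X0 = A^T / (||A||_1 * ||A||_inf) guarantees convergence for invertible A */
    double n1 = 0.0, ninf = 0.0;
    for (int j = 0; j < M; j++) {            /* ||A||_1 : max column sum */
        double s = 0.0;
        for (int i = 0; i < M; i++) s += fabs(a[i][j]);
        if (s > n1) n1 = s;
    }
    for (int i = 0; i < M; i++) {            /* ||A||_inf : max row sum  */
        double s = 0.0;
        for (int j = 0; j < M; j++) s += fabs(a[i][j]);
        if (s > ninf) ninf = s;
    }
    for (int i = 0; i < M; i++)
        for (int j = 0; j < M; j++) x[i][j] = a[j][i] / (n1 * ninf);

    double ax[M][M], t[M][M];
    for (int it = 0; it < iters; it++) {
        matmul(a, x, ax);                                  /* A X          */
        for (int i = 0; i < M; i++)                        /* 2I - A X     */
            for (int j = 0; j < M; j++)
                ax[i][j] = (i == j ? 2.0 : 0.0) - ax[i][j];
        matmul(x, ax, t);                                  /* X (2I - A X) */
        for (int i = 0; i < M; i++)
            for (int j = 0; j < M; j++) x[i][j] = t[i][j];
    }
}

int main(void) {
    double a[M][M] = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};
    double inv[M][M];
    newton_schulz_inverse(a, inv, 30);
    printf("first row of A^-1: %f %f %f\n", inv[0][0], inv[0][1], inv[0][2]);
    return 0;
}
```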
The algorithm was implemented in MATLAB R2024b (on a computer with an Intel i5-10210U processor at 1.60 GHz, 4 cores, 6 MB cache memory, 16 GB RAM, and the Windows 10 Pro 64-bit operating system) in order to validate its functionality and to evaluate the performance obtained with infinite precision.
The threshold d was chosen experimentally as a tradeoff between denoising performance and computational time. The maximum number of iterations N_max is necessary to avoid an excessively long running time while the algorithm searches for convergence, and it is also set experimentally. Many images with various noise types and levels were used to determine these two parameters as accurately as possible.
Two implementations in the C language were made, using the VisualDSP++ development environment for Blackfin microcontrollers and the Maxim Eclipse SDK for the MAX78000 microcontroller (Analog Devices Inc./Maxim Integrated, Wilmington, MA, USA, Digi-Key), which includes ARM, RISC-V, and CNN cores, with 32-bit finite-precision fixed-point arithmetic.

3. The Algorithm Performance Evaluation

This section describes the performance of the above-described algorithm in terms of the quality of the reconstructed image. The selected parameters of the algorithm were B = 4, S = 15, and k = 3. The image size was set to 512 pixels in height and 512 pixels in width [] (for both types of implementations—MATLAB and microcontrollers). Gaussian noise, salt and pepper noise (impulsive), and multiplicative noise (speckle) were successively added to the test images. Additionally, the image was blurred. The performance of the algorithm was evaluated using the peak signal-to-noise ratio, PSNR = 10 log10( 255^2 / ( (1/(N·M)) Σ_{i=1}^{N} Σ_{j=1}^{M} [I(i,j) − I_r(i,j)]^2 ) ), with I the noisy image and I_r the reconstructed image, both of dimensions (N, M), and the structural similarity index measure, SSIM(x, y) = ( (2 μ_x μ_y + C_1)(2 σ_xy + C_2) ) / ( (μ_x^2 + μ_y^2 + C_1)(σ_x^2 + σ_y^2 + C_2) ), where μ_x, μ_y are the means, σ_x^2, σ_y^2 the variances, and σ_xy the covariance of the pixels in windows x and y. The constant coefficients C_1 and C_2 are used to stabilize the division with a weak denominator. SSIM quantifies image quality degradation caused by data compression or by losses in data transmission. Unlike PSNR, SSIM is based on visible structures in the image and perhaps represents a more reliable indicator of image quality degradation; PSNR is an alternative measurement of the quality of the reconstructed image []. Figure 2, Figure 3 and Figure 4 illustrate the performance of the denoising algorithm with infinite precision (MATLAB implementation—64-bit floating point). The noise is mixed noise. The results show an image quality improvement in both PSNR (up to 6 dB) and SSIM (up to 3 times) when using the CS-RANSAC algorithm.
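As a reference for the metric defined above, a minimal C sketch of the PSNR computation for 8-bit grayscale images is given below; the helper name and test values are illustrative.

```c
/* Minimal sketch of the PSNR metric for 8-bit grayscale images stored
 * row-major; the helper name and the test values are illustrative. */
#include <math.h>
#include <stdio.h>

static double psnr(const unsigned char *img, const unsigned char *ref,
                   int rows, int cols) {
    double mse = 0.0;
    for (int i = 0; i < rows * cols; i++) {
        double d = (double)img[i] - (double)ref[i];
        mse += d * d;
    }
    mse /= (double)(rows * cols);
    if (mse == 0.0) return INFINITY;                /* identical images */
    return 10.0 * log10((255.0 * 255.0) / mse);
}

int main(void) {
    unsigned char a[4] = {10, 20, 30, 40};
    unsigned char b[4] = {12, 18, 33, 41};
    printf("PSNR = %.2f dB\n", psnr(a, b, 2, 2));
    return 0;
}
```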
Figure 2. Performance evaluation for low noise (MATLAB implementation).
Figure 3. Performance evaluation for large noise (MATLAB implementation).
Figure 4. Performance evaluation for impulsive noise (MATLAB implementation).
In Figure 2, the noised image has a PSNR of 20 dB and an SSIM of 0.36. The noise applied to the image is mixed noise: Gaussian variance = 0.001; salt and pepper density = 2%; blur kernel window length = 3; speckle variance = 0.001.
In Figure 3, the noised image has a PSNR of 17 dB and SSIM of 0.22. The noise is also mixed noise: Gaussian variance = 0.001; salt and pepper density 5%; blur kernel window length = 3; speckle variance = 0.001.
In Figure 4, the noise is impulsive with a noise density of 10%, and the noised image has a PSNR of 14 dB and SSIM of 0.15.
Table 1 summarizes the performance of the CS-RANSAC algorithm compared with DCT denoising at the same level of sparsity. One can observe that the performance is better for the CS-RANSAC algorithm both for PSNR and SSIM criteria.
Table 1. DCT denoising vs. CS-RANSAC denoising.
A more detailed performance evaluation that proves the comparable or better performance of CS-RANSAC compared with other existing methods (BM3D, TV-L1) is given in []. In this paper, we evaluate the algorithm performance in various implementations, compare these implementations, and demonstrate the possibility of implementing, with good performance, relatively complex image processing algorithms on microcontrollers in IoT applications.

4. The Microcontrollers’ Implementations

Two 32-bit fixed-point microcontroller implementations (e.g., Blackfin BF5xx and MAX7800x, Analog Devices Inc./Maxim Integrated, USA, Digi-Key) are proposed in this section.
The Blackfin architecture is designed for multimedia applications, the accessible memory is up to hundreds of Mbytes, and the processor clock frequency is up to 750 MHz. The instruction set is powerful (arithmetic instructions, multiplications with accumulation, dual and quad instructions, hardware loops, multifunction instructions) [,].
The MAX7800x chip is a dual-core ultra-low-power microcontroller with an ARM Cortex-M4 processor with FPU at up to 100 MHz, a 16 KB instruction cache, 512 KB flash memory, 128 KB SRAM, and a RISC-V coprocessor at up to 60 MHz for digital signal processing instructions. Many peripherals are available (general-purpose IO pins—GPIO, serial ports, an analog-to-digital converter (10 bits, 8 channels), a neural network accelerator optimized for deep convolutional neural networks (442k 8-bit weight capacity, network depth up to 64 layers with up to 1024 channels per layer), power management for battery operation, a real-time clock, timers, AES 128/192/256, and a CRC hardware acceleration engine). The ARM Cortex-M4 with FPU (CM4) is well suited for neural network system control and combines high-efficiency signal processing functionality with low energy consumption. The 32-bit RISC-V coprocessor is dedicated to ultra-low-power signal processing; its instruction set includes four parallel 8-bit additions/subtractions, floating-point single-precision operations, two parallel 16-bit additions/subtractions, two parallel MACs, 32- or 64-bit accumulation, and signed or unsigned data with or without saturation. A convolutional neural network (CNN) accelerator is included in the MAX7800x chip.
A more detailed description of the MAX7800x architecture and of how the proposed implementation uses the CNN accelerator and the ARM and RISC-V cores is given in the next section.
The above-presented algorithm was written in the C programming language, using the integrated development environments VisualDSP++ 5.1 and Maxim Eclipse SDK. The code was automatically optimized for speed (hardware loops, interprocedural analysis) []. The size S of the set of randomly chosen pixels influences the performance of the algorithm. The total number of combinations (randomly chosen) for the RANSAC algorithm is C_N^S = N(N−1)⋯(N−S+1) / (S(S−1)⋯1). If S is small, the number of combinations is large, and the algorithm may need many iterations to converge (especially for large noise). If S is large, the number of combinations is smaller, and the RANSAC algorithm needs fewer iterations. Some adaptations of the algorithm were made to reduce the execution time: for S = 15, the method of determining the set has been changed (since the number of possible combinations is 16, all possible combinations are considered and there is no need to choose randomly), and the number of iterations in the CS-RANSAC algorithm is limited to a maximum of 16. VisualDSP++ library functions were used for all matrix and vector operations [,]: matrix multiplication—matmmltf; matrix addition and subtraction—matsadd, matssub; matrix transpose—transpm; maximum and minimum element in a vector—vecmax, vecmin; location of the maximum and minimum element in a vector—vecmaxloc, vecminloc. The code uses a 32-bit fixed-point representation for the floating-point variables and computations of the algorithm (multiplications and additions) []. This approach causes a slight decrease in precision and, therefore, in the quality of the reconstructed image, but a software-emulated 32-bit floating-point representation would excessively increase the execution time. The fixed-point representation keeps the processing time at reasonable values with an acceptable decrease in performance. The execution time was measured, in processor cycles, using the IDEs’ code profiler.
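The following generic C sketch illustrates the kind of 32-bit fixed-point matrix multiplication discussed here, assuming a Q16.16 format with a 64-bit accumulator; it is a sketch in the spirit of the library routines, not the VisualDSP++ API itself.

```c
/* Generic 32-bit fixed-point (Q16.16) matrix multiplication with a 64-bit
 * accumulator; an illustrative sketch, not the VisualDSP++ matmmltf routine. */
#include <stdint.h>
#include <stdio.h>

#define Q 16                          /* fractional bits */

void matmul_q16(const int32_t *a, const int32_t *b, int32_t *c,
                int n, int k, int m)  /* a: n x k, b: k x m, c: n x m */
{
    for (int i = 0; i < n; i++)
        for (int j = 0; j < m; j++) {
            int64_t acc = 0;
            for (int p = 0; p < k; p++)
                acc += (int64_t)a[i * k + p] * b[p * m + j];
            c[i * m + j] = (int32_t)(acc >> Q);   /* rescale back to Q16.16 */
        }
}

int main(void) {
    int32_t a[4] = {1 << Q, 2 << Q, 3 << Q, 4 << Q};   /* [[1,2],[3,4]] */
    int32_t b[4] = {5 << Q, 6 << Q, 7 << Q, 8 << Q};   /* [[5,6],[7,8]] */
    int32_t c[4];
    matmul_q16(a, b, c, 2, 2, 2);
    printf("%d %d %d %d\n", c[0] >> Q, c[1] >> Q, c[2] >> Q, c[3] >> Q); /* 19 22 43 50 */
    return 0;
}
```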

5. The Performance Using 32-Bit Fixed-Point Microcontrollers

This section describes the results obtained using the 32-bit processor. The execution time and the effect of finite precision are shown in Figure 5, Figure 6 and Figure 7. One can observe in these figures that the performance is good. For low and medium levels of mixed noise, the CS-RANSAC algorithm has a PSNR up to 4 dB higher and an SSIM up to 80% higher. For high levels of mixed noise, the SSIM obtained with CS-RANSAC remains better than that of DCT, even if the differences in PSNR are not as high. The CS-RANSAC algorithm performs better for impulsive noise, as shown in Figure 7. In Figure 5, Figure 6 and Figure 7, the noised image characteristics (PSNR, SSIM, noise type) are the same as those of the noised images from Figure 2, Figure 3 and Figure 4.
Figure 5. Performance evaluation for low noise (microcontrollers implementation).
Figure 6. Performance evaluation for large noise (microcontrollers implementation).
Figure 7. Performance evaluation for impulsive noise (microcontrollers implementation).
Table 2 summarizes the performance, and Table 3 compares the MATLAB implementation with the Blackfin implementation. Table 4 illustrates the execution time considering a Blackfin processor at 750 MHz.
Table 2. Performance comparison (32-bit fixed-point implementation).
Table 3. Performance comparison (32-bit fixed-point vs. MATLAB implementation).
Table 4. Execution time—seconds (32-bit fixed-point vs. MATLAB implementation).
An improved implementation using the MAX7800x (with the ARM core at 100 MHz, the RISC-V core at 60 MHz, and the CNN accelerator with 64 cores at 50 MHz) is described in the next sections.
One can observe, for medium to high noise, that the execution time is reasonable (on the order of seconds). The execution time can be decreased by using a dual-core Blackfin processor. An implementation based on an accelerator for convolutional neural networks (CNNs) is also possible by implementing matrix multiplication as a 1 × 1 convolution performed with the CNN.

6. The Improvement of Processing Time Using CNN Accelerator

The multiplication of two matrices A = [a_ij] and B = [b_ij], with the result C = AB = [c_ij], i, j = 1…N, can be performed using a fully interconnected layer as in Figure 8:
Figure 8. Neural network fully interconnected layer used for matrix multiplication.
The input layer consists of the rows of matrix A, and the output layer contains the elements of the product matrix. For each input element, the weights are the corresponding elements of the columns of matrix B or zero. For clarity, only the weights for one output element are shown. Figure 9 details the weights for a simple example (N = 2):
A = [a_11 a_12; a_21 a_22] and B = [b_11 b_12; b_21 b_22]; then C = AB = [c_11 c_12; c_21 c_22] = [a_11 b_11 + a_12 b_21, a_11 b_12 + a_12 b_22; a_21 b_11 + a_22 b_21, a_21 b_12 + a_22 b_22].
Figure 9. Neural network fully interconnected layer used for matrix multiplication (detailed example for N = 2 ).
For more clarity, the weights have been shown individually for each output element.
The fully interconnected layer of the neural network has the output c_11 = a_11 b_11 + a_12 b_21 + a_21·0 + a_22·0, and so on. One can observe that this layer outputs exactly the elements of the product matrix.
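A host-side C sketch of this weight layout is given below: the input vector is A flattened row-major, the output vector is C flattened row-major, and the fully connected weights are the columns of B padded with zeros. The example values and the printout are illustrative; this is not MAX78000 loader code.

```c
/* Host-side sketch of the weight layout that makes a fully connected
 * (flattened 1x1 convolution) layer compute C = A*B. Illustrative only. */
#include <stdio.h>

#define N 2

int main(void) {
    double A[N][N] = {{1, 2}, {3, 4}};
    double B[N][N] = {{5, 6}, {7, 8}};
    double W[N * N][N * N] = {{0}};    /* FC weights, output x input */
    double in[N * N], out[N * N];

    /* output (i,j) connects only to inputs (i,k), k = 0..N-1, with weight b_kj */
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                W[i * N + j][i * N + k] = B[k][j];

    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) in[i * N + j] = A[i][j];

    for (int o = 0; o < N * N; o++) {  /* the fully connected layer itself */
        out[o] = 0.0;
        for (int p = 0; p < N * N; p++) out[o] += W[o][p] * in[p];
    }

    for (int o = 0; o < N * N; o++)
        printf("c[%d][%d] = %g\n", o / N, o % N, out[o]);  /* 19 22 43 50 */
    return 0;
}
```

Running the sketch prints C = [19 22; 43 50], which matches A·B for the example matrices.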
The fully interconnected layer can be implemented in the CNN by enabling the flatten mode (this mode supports a series of 1 × 1 convolutions emulating a fully connected network with up to 1024 inputs). The fixed-point matrix multiplication is shown in Figure 10.
Figure 10. Matrix multiplication CNN-based algorithm (fixed point).
Using the above algorithm, a speedup of about 30 times can be achieved for integer matrix multiplication compared with the implementation on a 32-bit fixed-point microcontroller. This can be useful for algorithms based on matrix computations that do not require a large dynamic range. In common CNNs, the values of the neural network layers and weights are represented in fixed point with 8 bits. Certain applications require more precision. For example, the CS-RANSAC algorithm, presented in the previous sections, requires higher precision due to the lower-order DCT coefficients.
We proposed an approach that makes matrix multiplication with increased precision possible, considering a floating-point representation of matrix elements.
We assume that the values of the matrix elements are a_ij = A_ij·2^(e_a,ij), b_ij = B_ij·2^(e_b,ij), and c_ij = C_ij·2^(e_c,ij), with A_ij, B_ij, C_ij the mantissas and e_a,ij, e_b,ij, e_c,ij the exponents, represented as 8-bit fixed-point integers. The output elements are c_ij = Σ_{k=1}^{N} A_ik·2^(e_a,ik)·B_kj·2^(e_b,kj) = Σ_{k=1}^{N} A_ik·B_kj·2^(e_a,ik + e_b,kj).
The term O_ikj = A_ik·B_kj will be computed using a fully interconnected layer, as shown previously (with a slight modification of the weights—see Figure 11), and the term e_a,ik + e_b,kj will be computed using the element-wise function (the element-wise function must be enabled in the CNN, with the addition operation selected).
Figure 11. The full interconnected layer for floating point implementation ( N = 2 ).
Then, the maximum exponent is calculated as E_ij,max = max_{k=1…N}(e_a,ik + e_b,kj), and all the terms O_ikj are multiplied by 2^(e_a,ik + e_b,kj − E_ij,max) in a second fully interconnected layer. The results X_ikj = O_ikj·2^(e_a,ik + e_b,kj − E_ij,max) are summed using the element-wise CNN features. The sum is calculated iteratively using element-wise addition in log2(N^2) steps, as in Figure 12.
Figure 12. Element wise addition (example for N = 4 ).
Finally, in a third fully interconnected layer, the elements of the product matrix, c_ij = 2^(E_ij,max)·Σ_{k=1}^{N} X_ikj, are calculated. We assume that in image processing one matrix (the image to be processed) has sub-unitary mantissas and e_a,ij = 0, i, j = 1…N. The other matrix's exponents and mantissas are constant; therefore, E_ij,max and 2^(e_a,ik + e_b,kj − E_ij,max) can be passed as parameters to the matrix multiplication function. The complete algorithm is illustrated in Figure 13.
Figure 13. Matrix multiplication CNN-based algorithm (floating point).
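To make the data flow of Figure 13 concrete, the following host-side C reference model splits the elements into 8-bit mantissas and exponents, aligns the partial products to the per-output maximum exponent, sums them, and rescales the result. It mirrors the algorithm's steps under these assumptions and is not the accelerator implementation itself; saturation and large exponent differences are not handled.

```c
/* Host-side reference model of the floating-point matrix multiplication of
 * Figure 13: a = M * 2^e decomposition, alignment to E_max, summation,
 * rescaling by 2^E_max. Illustrative only; not MAX78000 CNN code. */
#include <math.h>
#include <stdio.h>

#define N 2

/* split v into an 8-bit-range mantissa M and exponent e so that v ~= M * 2^e */
static void split(double v, int *mant, int *e) {
    int ex;
    double f = frexp(v, &ex);          /* v = f * 2^ex, 0.5 <= |f| < 1 */
    *mant = (int)lround(f * 128.0);    /* 8-bit signed mantissa        */
    *e = ex - 7;
}

static void matmul_blockfp(const double a[N][N], const double b[N][N],
                           double c[N][N]) {
    int Am[N][N], Ae[N][N], Bm[N][N], Be[N][N];
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            split(a[i][j], &Am[i][j], &Ae[i][j]);
            split(b[i][j], &Bm[i][j], &Be[i][j]);
        }
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            int emax = Ae[i][0] + Be[0][j];          /* E_ij,max */
            for (int k = 1; k < N; k++)
                if (Ae[i][k] + Be[k][j] > emax) emax = Ae[i][k] + Be[k][j];
            long long sum = 0;
            for (int k = 0; k < N; k++) {
                long long prod = (long long)Am[i][k] * Bm[k][j];     /* O_ikj */
                sum += prod >> (emax - (Ae[i][k] + Be[k][j]));       /* X_ikj */
            }
            c[i][j] = (double)sum * ldexp(1.0, emax);                /* 2^E_max */
        }
}

int main(void) {
    double A[N][N] = {{0.50, -0.25}, {0.75, 0.125}};
    double B[N][N] = {{0.30,  0.60}, {-0.20, 0.90}};
    double C[N][N];
    matmul_blockfp(A, B, C);
    printf("%f %f\n%f %f\n", C[0][0], C[0][1], C[1][0], C[1][1]);
    return 0;
}
```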
The algorithm illustrated in Figure 13 can be efficiently implemented using the MAX78000 chip [,]. The block diagram of the MAX7800x is illustrated in Figure 14. The CNN accelerator consists of 64 parallel processors with 512 KB of SRAM-based storage. Each processor includes a pooling unit and a convolutional engine with dedicated weight memory. Four processors share one data memory. These are further organized into groups of 16 processors that share common controls; a group of 16 processors operates as a slave to another group or independently. Data are read from the SRAM associated with each processor and can be written to any data memory located within the accelerator. Any given processor has visibility of its dedicated weight memory and of the data memory instance it shares with three other processors.
Figure 14. The MAX7800x general architecture.
In general, an algorithm (or working task) with M instructions and an average execution time per instruction t_1 can be divided into two parts: f·M instructions running on one processor with average execution time per instruction t_2, and (1 − f)·M instructions running on N processors with average execution time per instruction t_3, with f < 1. The speedup is s = M t_1 / (f M t_2 + (1 − f) M t_3 / N) = t_1 / (f t_2 + (1 − f) t_3 / N). Considering t_2 = r t_1 and t_3 = q t_1 with r, q < 1, the speedup becomes s = 1 / (f r + (1 − f) q / N) = N / (N f r + (1 − f) q) = N / (f (N r − q) + q).
The term M t_1 is the computing time on a single processor, and the term f M t_2 is the computing time of the non-parallelizable fraction in a multiprocessor system with N processors (cores). The parallelizable fraction (1 − f) M t_3 is computed on N processors (cores), so its execution time is (1 − f) M t_3 / N. All the computations involved in matrix processing (multiplication, addition) can be implemented using the CNN block in flattened or element-wise modes. The CS-RANSAC algorithm presented above was implemented using the MAX78000 and its CNN accelerator. The numerical precision is similar to that obtained with the previous Blackfin implementation (the ARM and RISC-V cores in the MAX78000 also use 32-bit fixed-point representation).
In the speedup relation, we set N = 64 (the number of cores in the CNN), r = 0.13 (the ratio between the ARM microcontroller speed and the Blackfin microcontroller speed), q = 0.06 (the ratio between the CNN core speed and the Blackfin microcontroller speed), and f = 0.78 (the fraction of the algorithm code that does not contain matrix operations that can be performed in the CNN). With these values, the theoretical speedup (between the Blackfin implementation and the MAX7800x implementation) is s = 9.84. The effective speedup (obtained by counting processor cycles with the IDE code profiler) is lower due to the data transfers performed using the RISC-V core. Figure 15 illustrates the execution time obtained with the CNN implementation. In this case (software floating-point implementation), the speedup obtained is about 7 times for the CS-RANSAC algorithm. The execution time is proportional to the image size at a constant noise level because the algorithm splits the image into blocks of 16 × 16 pixels, which are processed individually.
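A quick numerical check of the speedup expression with the values above can be done with the following C snippet:

```c
/* Numerical check of s = N / (f(Nr - q) + q) with the values used above
 * (N = 64, r = 0.13, q = 0.06, f = 0.78). */
#include <stdio.h>

int main(void) {
    double N = 64.0, r = 0.13, q = 0.06, f = 0.78;
    double s = N / (f * (N * r - q) + q);
    printf("theoretical speedup s = %.2f\n", s);   /* prints ~9.84 */
    return 0;
}
```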
Figure 15. Processing time (various noised image sizes) for two 32-bit fixed-point implementations (Blackfin and Max78000 CNN).
The computing time of the algorithm can be improved in two ways: by modifying the way the DCT coefficients are calculated (using the plain DCT or CS-RANSAC depending on whether the block is noisy or not) and by using the ARM core and the CNN accelerator in parallel.
For the first improvement method, the original CS-RANSAC algorithm was combined with a noise estimator []. Each block is marked as lightly noised or heavily noised and is processed using simple DCT or CS-RANSAC, respectively. The noise estimator can be implemented in fixed point using the ARM and RISC-V cores running in parallel in the MAX78000 chip. Figure 16a shows how the tasks of such an implementation can be scheduled.
Figure 16. (a) Task scheduling for the algorithm; (b) algorithm improvement using parallel processing.
The following tasks are defined: NE_Task—noise estimation; P_Task—the processing task that implements all the processing of the DCT and CS-RANSAC noise removal and compression algorithms except the matrix multiplications and additions, which are computed by a matrix processing task, MP_Task; and COMM_Task—a communication task that transfers the information (matrix values) between the CNN and the ARM core using DMA channels. The tasks NE_Task and P_Task run on the ARM core, and COMM_Task runs on the RISC-V coprocessor (which acts as a direct memory access—DMA—controller). All matrix manipulations are passed to the CNN accelerator (programmed in flatten mode for matrix multiplications or element-wise mode for matrix additions and subtractions) and are computed by MP_Task. All tasks are synchronized using global semaphores, as sketched below.
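The semaphore-based hand-off between the processing task and the matrix processing task can be sketched schematically in C. For illustration, the sketch uses POSIX threads and semaphores on a host PC; the actual firmware relies on the MAX78000 cores and SDK primitives, which are not reproduced here, and the block processing is a placeholder.

```c
/* Schematic of the P_Task / MP_Task hand-off synchronized by global
 * semaphores. Host-side illustration with POSIX threads/semaphores only. */
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>

static sem_t work_ready, result_ready;   /* global semaphores            */
static double blk[16], out[16];          /* shared block buffers (B = 4) */

static void *mp_task(void *arg) {        /* stands in for the CNN MP_Task */
    (void)arg;
    for (int b = 0; b < 4; b++) {
        sem_wait(&work_ready);           /* wait for a block from P_Task  */
        for (int i = 0; i < 16; i++) out[i] = 2.0 * blk[i];  /* placeholder */
        sem_post(&result_ready);         /* hand the result back          */
    }
    return NULL;
}

int main(void) {                         /* stands in for P_Task */
    pthread_t t;
    sem_init(&work_ready, 0, 0);
    sem_init(&result_ready, 0, 0);
    pthread_create(&t, NULL, mp_task, NULL);
    for (int b = 0; b < 4; b++) {
        for (int i = 0; i < 16; i++) blk[i] = b + i;  /* prepare block */
        sem_post(&work_ready);
        sem_wait(&result_ready);         /* consume out[] here */
        printf("block %d processed\n", b);
    }
    pthread_join(t, NULL);
    return 0;
}
```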
Depending on the noise level, the execution time can be reduced as illustrated in Figure 17 [].
Figure 17. Computing time reduction (dotted line—trendline of the ratio).
For an average noise probability of 50%, one can observe that the computation time reduction ratio is about 35%. If the noise estimator is not used, the NE_Task is removed from the task schedule in Figure 16.
The second method halves the computation time. This is achieved by partitioning the computation into matrix operations (multiplications, additions), performed in the CNN, and non-matrix operations (all the remaining computations), performed in the ARM core. Two blocks are processed in parallel, alternating between the ARM core and the CNN, as shown in Figure 16b.
For relatively small image sizes and frame rates, the algorithm can operate on video clips in real time at low noise levels.

7. Conclusions

This paper analyzes how the processing required by algorithms based on matrix operations can be accelerated. Such acceleration can be achieved using neural network processing units (NPUs) integrated into the architecture of today's high-performance microcontrollers.
The aim of the work was to investigate how much the performance increases if an accelerator integrated in the microcontroller is used and under what conditions it can adapt to perform calculations not specific to the role for which it was designed. As an example, the paper presents implementations and performance analysis of an image compression and noise removal algorithm based on compressive sensing and CS-RANSAC.
This algorithm was validated as a good algorithm in terms of noise removal and image compression using infinite precision implementation (e.g., MATLAB simulations). The main goal of this work is to evaluate if a microcontroller implementation is feasible in terms of processing accuracy and computation time to be used in IoT applications that involve hardware nodes with resource constraints.
The obtained results show that a good quality of the reconstructed image can be obtained for medium to relatively high noise levels (typical in IoT systems) with a computation time of the order of seconds or tenths of seconds.
Additionally, the paper proposes methods to improve the algorithm: (1) by selectively applying DCT or the CS-RANSAC to each block in the image (without degrading the quality of the image), and (2) by using the ARM microcontroller and CNN cores in parallel or using a dual core Blackfin microcontroller.
For relatively small matrix sizes (of the order of tens), all implementations are scalable; the matrix dimensions can be changed in the C code, limited only by the size of the memories in the microcontroller or in the CNN accelerator.
The proposed CNN-based method has certain advantages compared with a classical implementation, but it gives its best results for algorithms that work in fixed point. The data transfers required to feed the CNN can slow down the computation. Future research will focus on ways to organize the calculations to make these data transfers more efficient and will investigate different accelerator hardware implementations and the power consumption of such implementations. Additionally, different noise models and noise removal algorithms for IoT (including static images and video clips) will be investigated in future research.

Author Contributions

Conceptualization, S.Z.; methodology, R.Z.; software, S.Z.; validation, R.Z.; formal analysis, R.Z.; investigation, S.Z.; resources, S.Z.; data curation, R.Z.; writing—original draft preparation, S.Z.; writing—review and editing, R.Z. and S.Z.; visualization, R.Z.; funding acquisition, S.Z. All authors have read and agreed to the published version of the manuscript.

Funding

The research leading to these results received funding from the Research Centre on Systems Software and Networks in Telecommunication (CCSRST).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. IoT Frequency Bands. Available online: https://www.data-alliance.net/blog/iot-internet-of-things-wireless-protocols-and-their-frequency-bands/ (accessed on 1 May 2024).
  2. Roy, A.; Bandopadhaya, S.; Chandra, S.; Suhag, A. Removal of impulse noise for multimedia-IoT applications at gateway level. Multimed. Tools Appl. 2022, 81, 34463–34480. [Google Scholar] [CrossRef]
  3. Bartyzel, K. Adaptive Kuwahara filter. Signal Image Video Process. 2016, 10, 663–670. [Google Scholar] [CrossRef]
  4. Jiang, J.; Zhang, L.; Yang, J. Mixed Noise removal by weighted encoding with sparse nonlocal regularization. IEEE Trans. Image Process. 2014, 23, 2651–2662. [Google Scholar] [CrossRef] [PubMed]
  5. Strong, D.; Chan, T. Edge-preserving and scale-dependent properties of total variation regularization. Inverse Probl. 2003, 19, S165–S187. [Google Scholar] [CrossRef]
  6. Dabov, K.; Foi, A.; Katkovnik, V.; Egiazarian, K. Image denoising by sparse 3D transform-domain collaborative filtering. IEEE Trans. Image Process. 2007, 16, 2080–2095. [Google Scholar] [CrossRef] [PubMed]
  7. Yang, Y.; Sun, J.; Li, H.; Xu, Z. ADMM-CSNEt: A deep learning approach for image compressive sensing. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 521–538. [Google Scholar] [CrossRef] [PubMed]
  8. Yu, G.; Sapiro, G. DCT Image Denoising: A Simple and Effective Image Denoising Algorithm. Image Process. Line 2011, 1, 292–296. [Google Scholar] [CrossRef]
  9. Bian, S.; He, X.; Xu, Z.; Zhang, L. Image Denoising by Deep Convolution Based on Sparse Representation. Computers 2023, 12, 112. [Google Scholar] [CrossRef]
  10. Mao, J.; Sun, L.; Chen, J.; Yu, S. Overview of Research on Digital Image Denoising Methods. Sensors 2025, 25, 2615. [Google Scholar] [CrossRef] [PubMed]
  11. Strutz, T. Data Fitting and Uncertainty (A Practical Introduction to Weighted Least Squares and Beyond), 2nd ed.; Springer Vieweg: Wiesbaden, Germany, 2016; ISBN 978-3-658-11455-8. [Google Scholar]
  12. Raguram, R.; Frahm, J.-M.; Pollefeys, M. A Comparative Analysis of RANSAC Techniques Leading to Adaptive Real-Time Random Sample Consensus. In Proceedings of the 10th European Conference on Computer Vision Part II, Marseille, France, 12–18 October 2008; pp. 500–513. [Google Scholar]
  13. Stanković, I.; Brajović, M.; Lerga, J.; Daković, M.; Stanković, L. Image denoising using RANSAC and compressive sensing. Multimed. Tools Appl. 2022, 81, 44311–44333. [Google Scholar] [CrossRef]
  14. Candes, E.J.; Wakin, M.B. An Introduction to Compressive Sampling. IEEE Signal Process. Mag. 2008, 25, 21–30. [Google Scholar] [CrossRef]
  15. Rabie, T. Robust estimation approach to blind denoising. IEEE Trans. Image Process. 2005, 14, 1755–1766. [Google Scholar] [CrossRef] [PubMed]
  16. Ponomarenko, M.; Gapon, N.; Voronin, V.; Egiazarian, K. Blind estimation of white Gaussian noise variance in highly textured images. Electron. Imaging 2018, 30, art00016. [Google Scholar] [CrossRef]
  17. A Basic Note on Iterative Matrix Inversion. Available online: https://aalexan3.math.ncsu.edu/articles/mat-inv-rep.pdf (accessed on 1 May 2024).
  18. TAMPERE17 Image Database. Available online: https://webpages.tuni.fi/imaging/tampere17/tampere17_grayscale.zip (accessed on 1 May 2024).
  19. Horé, A.; Ziou, D. Image Quality Metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition, Istanbul, Turkey, 7 October 2010; pp. 2366–2369. [Google Scholar]
  20. ADSP-BF533 Blackfin Processor Hardware Reference. Available online: https://www.analog.com/media/en/dsp-documentation/processor-manuals/ADSP-BF533_hwr_rev3.6.pdf (accessed on 1 May 2024).
  21. ADSP-BF533 EZ-KIT Lite® Evaluation System Manual. Available online: https://www.analog.com/media/en/technical-documentation/user-guides/ADSP-BF533_ezkit_man_rev.3.2.pdf (accessed on 1 May 2024).
  22. VisualDSP++ 5.0 C/C++ Compiler and Library Manual for Blackfin Processors. Available online: https://www.analog.com/media/en/dsp-documentation/software-manuals/50_bf_cc_rtl_mn_rev_5.4.pdf (accessed on 1 May 2024).
  23. Fast Floating-Point Arithmetic Emulation on Blackfin Processors. Available online: https://www.analog.com/media/en/technical-documentation/application-notes/ee.185.rev.4.08.07.pdf (accessed on 1 May 2024).
  24. MAX78000 Data Sheet. Available online: https://www.analog.com/media/en/technical-documentation/data-sheets/MAX78000.pdf (accessed on 1 May 2024).
  25. MAX78000 User Guide. Available online: https://www.analog.com/media/en/technical-documentation/user-guides/max78000-user-guide.pdf (accessed on 1 May 2024).
  26. Zoican, S.; Zoican, R.; Galatchi, D. Image denoising algorithm for IoT based on compressive sensing principle and Blackfin microcontrollers. In Proceedings of the 2024 15th International Conference on Communications (COMM), Bucharest, Romania, 6 November 2024; pp. 1–4. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
