Proceeding Paper

Fast Parallel Gaussian Filter Based on Partial Sums †

by Atanaska Bosakova-Ardenska 1,*, Hristina Andreeva 2 and Ivan Halvadzhiev 3

1 Department of Electrical Engineering, Electronics, and Automation, University of Food Technologies, 4002 Plovdiv, Bulgaria
2 Department of Mathematics, Physics, and Information Technology, University of Food Technologies, 4002 Plovdiv, Bulgaria
3 Department of Computer Systems and Technologies, University of Food Technologies, 4002 Plovdiv, Bulgaria
* Author to whom correspondence should be addressed.
Presented at the International Conference on Electronics, Engineering Physics and Earth Science (EEPES 2025), Alexandroupolis, Greece, 18–20 June 2025.
Eng. Proc. 2025, 104(1), 1; https://doi.org/10.3390/engproc2025104001
Published: 21 August 2025

Abstract

As a convolution operation in the spatial domain, Gaussian filtering involves a large number of computational operations, and this number grows with both image size and kernel size. Finding methods to accelerate such computations is therefore significant for reducing overall time complexity, and the current paper proposes the use of partial sums to achieve this acceleration. The MPI (Message Passing Interface) library and the C programming language are used to implement parallel Gaussian filtering based on 1D and 2D kernels, working with and without partial sums, and a theoretical and practical evaluation of the effectiveness of the proposed implementations is then made. The experimental results indicate a significant acceleration of the computational process when partial sums are used, in both sequential and parallel processing. The PSNR (Peak Signal-to-Noise Ratio) metric is used to assess the filtering quality of the proposed algorithms in comparison with the MATLAB implementation of Gaussian filtering, and the time performance of the proposed algorithms is also evaluated.

1. Introduction

A digital image is the result of a transformation function applied to information registered by specific sensors. Depending on the hardware and software components used to register and store images, there are two well-known and popular forms of image description: the first is in the frequency domain and the second is in the spatial domain [1]. The processing of images in both domains is based on specific algorithms, which are typically categorized according to their purpose. If the purpose of image processing is to decrease the size of stored images, the category of such algorithms is known as "compression". If the purpose is to divide the image into segments containing pixels that share common characteristics, the category is known as "segmentation". If the purpose is the modification of structural elements by the erosion or dilation of contours, the category is known as "morphology". If the purpose is noise reduction, the category is known as "denoising" or "filtering", etc. The main reason scientists have developed many filtering techniques is the current variety of noise sources, which have negative effects on image content [2]. In addition, it is known that some filtering algorithms affect specific structural elements in processed images, which has a positive effect on image content by preparing it for further processing steps such as contour definition and object extraction [3,4]. Every filtering algorithm uses some function (linear or non-linear) that describes a series of operations to be performed in order to define the new (filtered) value of every pixel of the processed image.
In light of contemporary trends toward growing image sizes and numbers, and considering the high computational cost of filtering operations, many algorithms have been developed to accelerate filtering performance. Some of these algorithms are implemented using specific hardware components: Vijeyakumar et al. proposed a fast median filter based on specific logical elements (digital comparators and adders) [5], and fast linear filters, proposed by Hasan and by Gawande et al. [6,7], were designed in hardware description languages such as Verilog and VHDL for implementation on FPGA circuits. Other algorithms for fast filtering are designed to utilize the multiple cores of video cards using CUDA technology [8,9] or parallel threads supported by multiprocessor systems [10]. Notably, advancements in hardware and software technologies imply increasing interest in developing fast parallel algorithms for image filtering that also reduce energy consumption; thus, the current research supports contemporary trends.
The main aim of the current research is to propose an algorithm for fast Gaussian filtering using concepts designed for parallel computations and the optimization of mathematical operations for processing every pixel, based on partial sums.

2. Materials and Methods

The current research proposes a fast parallel algorithm for image filtering with a Gaussian filter. The next four subsections discuss known parallel implementations of the Gaussian filter, the proposed algorithms, and the programming techniques used, along with the set of images selected for an experimental evaluation of the performance and filtering quality of the proposed algorithms (filtering quality was evaluated using the PSNR, Peak Signal-to-Noise Ratio, metric).

2.1. Related Works

The name “Gaussian filter” comes from the base principle that is used for calculating the coefficients of the kernel (mask), which is related to the well-known normal (Gaussian) distribution that is used to describe noise models and, consequently, to describe correction coefficients that can remove or reduce the noise in the processed image [1]. The next formula describes a function of the Gaussian kernel (G) in the spatial domain:
$$ G(r) = K e^{-\frac{r^2}{2\sigma^2}} \quad (1) $$
In Formula (1), r is the distance from the centre of the kernel, σ is the standard deviation of the normal distribution, and K is a coefficient that can be used to normalize the calculated kernel coefficients. Obviously, when using Formula (1), the coefficients will be floating-point numbers, which has a negative effect on computational complexity. To mitigate this effect, some authors offer approximations that use only integer coefficients in order to accelerate image processing [2,11,12]. In addition, the use of parallel processing has yielded encouraging results with respect to accelerating the computational process of Gaussian filtering [13]. In recent years, different techniques for parallel computing have been exploited to make Gaussian filtering more effective regarding time consumption. Examples of published work in these directions include the following research: Bozkurt et al. presented a GPU-based (Graphical Processing Unit-based) implementation that performs better than the standard CPU-based implementation of the Gaussian filter [14]; Rybakova et al. discussed the acceleration of filtering using vectorization operations supported by contemporary processors with SIMD (Single Instruction Multiple Data) features [15]; Fauzie et al. designed a cluster of single-board computers to implement filtering faster than traditional Gaussian filtering [16]. Since Gaussian filtering has practical applications in a variety of areas, such as satellite image processing [17,18], medical image processing [19,20], and others, it is crucial to develop fast implementations that can exploit the now-widespread multicore architectures, and this need also motivates the current research.

2.2. Parallel Algorithms for Gaussian Filtering

The kernels for Gaussian filtering can be presented as vectors or matrices, depending on whether their purpose is filtering 1D or 2D signals [1]. Typically, a digital image consists of pixels arranged in multiple rows, i.e., every image can be presented as a 2D matrix of pixels, and every pixel has a color stored in one or more color channels, which define the image as either grayscale or color. By treating the two-dimensional image matrix as a set of vectors, it can be supposed that every vector can be filtered using a 1D kernel. Based on these concepts, three algorithms for sequential and parallel Gaussian filtering were designed: A1, which uses a 1D Gaussian kernel and is named Gaussian1D; A2, which uses a 2D implementation of the Gaussian filter, based on copying the coefficients of a 1D kernel into a 2D kernel, and is named Gaussian2D; and A3, which modifies algorithm A2 using partial sums to improve time performance and is named Gaussian2DParSums. Sequential versions of the three algorithms were also implemented to support an experimental analysis of the effectiveness of the parallel algorithms.
All the discussed parallel algorithms use a row distribution of the input image among the parallel processes, managed by the Master process. Figure 1 describes the common scheme of the algorithms, using the AMPA model for parallel algorithm representation [21].
The Master process is responsible for managing the parallel calculations: it executes the reading and writing operations related to the input and output images and distributes parts of the input image among the slave processes, together with the Gaussian kernel coefficients that it calculates itself. Once every process has its own part of the image, it can start to filter it; finally, all the filtered parts are combined by the Master process to produce the output image.
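The row distribution that the Master performs can be sketched as a small partitioning helper. The following C code is an illustrative assumption, not the paper's implementation: it splits m rows almost equally among p processes and extends each block by `halo` overlapping rows so that a kernel of size 2·halo+1 can be applied near block boundaries without extra communication. The type and function names are hypothetical.

```c
/* Illustrative row-partitioning helper for a Master/slave row distribution.
   Each of p processes gets an almost equal share of the m image rows, plus
   `halo` overlapping rows on interior boundaries (clamped at the borders). */
typedef struct {
    int first;  /* first row sent to the process (including halo) */
    int count;  /* number of rows sent (including halo)           */
} RowBlock;

static RowBlock row_block(int m, int p, int rank, int halo)
{
    int base  = m / p, rem = m % p;
    int begin = rank * base + (rank < rem ? rank : rem);
    int rows  = base + (rank < rem ? 1 : 0);
    int lo = begin - halo;          /* extend downwards by halo rows */
    int hi = begin + rows + halo;   /* extend upwards by halo rows   */
    if (lo < 0) lo = 0;             /* clamp at the image border     */
    if (hi > m) hi = m;
    RowBlock b = { lo, hi - lo };
    return b;
}
```

In an MPI program, the Master could use such counts and offsets to size the per-process messages; the halo rows are read-only context for the convolution window.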
There is a single difference between algorithms Gaussian1D and Gaussian2D, and this is related to the normalization of the coefficients of the kernel. For algorithm Gaussian1D, the normalization coefficient is:
$$ K_{Gaussian1D} = \frac{1}{\sum e^{-\frac{r^2}{2\sigma^2}}} \quad (2) $$
Its main purpose is to normalize the calculated coefficients so that their sum equals 1. For the algorithm Gaussian2D, the normalization coefficient is similar, but the number of coefficients in the 2D kernel appears as an additional factor in the divisor of Formula (3); the purpose is again the same.
$$ K_{Gaussian2D} = \frac{1}{r^2 \sum e^{-\frac{r^2}{2\sigma^2}}} \quad (3) $$
The algorithm Gaussian2DParSums uses the 2D Gaussian kernel defined by Formula (3) and partial sums to accelerate the calculation process [22]. The main idea is that when the mask (kernel) is moving to process the next pixel in the row, the previously calculated sums of kernel coefficients multiplied by the pixels’ values could then be reused to decrease computational complexity. The next pseudocode implements this calculation, assuming that the input image is stored in the array img[m][n], which contains all the image pixels (situated in m rows and n columns, respectively).
  int i, j, k, p, idx, mask, img[m][n], out[m][n]
  double tot, sum1, kernel[mask][mask], sum[mask]
  for i = 1 to m − mask + 1
    for j = 1 to n − mask + 1
      if j = 1 then                        // first pixel in the row: full computation
        tot = 0
        for k = 1 to mask
          sum[k] = 0
          for p = 1 to mask
            sum[k] += img[i + p − 1][j + k − 1] ∗ kernel[p][k]
          end for
          tot += sum[k]
        end for
      else                                 // reuse the partial sums of the previous window
        sum1 = 0
        for idx = 1 to mask                // sum of the new rightmost column
          sum1 += img[i + idx − 1][j + mask − 1] ∗ kernel[idx][mask]
        end for
        tot = sum1
        for idx = 1 to mask − 1            // shift the surviving column sums left
          sum[idx] = sum[idx + 1]
          tot += sum[idx]
        end for
        sum[mask] = sum1
      end if
      out[i + mask/2][j + mask/2] = tot    // write to a separate output array
    end for
  end for
Here, mask denotes the size of the mask (an odd number: 3, 5, 7, etc.), while kernel[][] is an array containing the Gaussian coefficients calculated according to Formula (3). The array sum[] has a size that coincides with the mask size because it stores, for each kernel column, the sum of the kernel coefficients multiplied by the pixel values in that column. Note that the full set of calculations (mask ∗ mask multiplications and mask ∗ (mask − 1) summing-up operations) is performed only when calculating a new value for the first pixel in every row; for all remaining pixels in the row, the number of calculations decreases dramatically (mask store operations, mask multiplications, and 2 ∗ (mask − 1) summing-up operations).
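The partial-sums recurrence above can be checked with a small C sketch. The code below is illustrative and assumes, as in the Gaussian2D kernel of this paper, that every kernel column holds the same 1D coefficients (passed here as kcol), so a column sum computed for one window position can be reused after the window shifts; the function names are hypothetical.

```c
/* Direct window total: full mask x mask convolution sum at window start j. */
static double window_total_direct(const double *img, int n, int i0,
                                  const double *kcol, int mask, int j)
{
    double tot = 0.0;
    for (int k = 0; k < mask; k++)
        for (int p = 0; p < mask; p++)
            tot += img[(i0 + p) * n + (j + k)] * kcol[p];
    return tot;
}

/* First window in a row (j = 0): compute and store all column sums. */
static double window_total_first(const double *img, int n, int i0,
                                 const double *kcol, int mask, double *colsum)
{
    double tot = 0.0;
    for (int k = 0; k < mask; k++) {
        colsum[k] = 0.0;
        for (int p = 0; p < mask; p++)
            colsum[k] += img[(i0 + p) * n + k] * kcol[p];
        tot += colsum[k];
    }
    return tot;
}

/* One partial-sums step to window start j: compute only the sum of the new
   rightmost column (j + mask - 1), shift the surviving column sums left,
   and accumulate the window total from stored values. */
static double window_total_step(const double *img, int n, int i0,
                                const double *kcol, int mask, int j,
                                double *colsum)
{
    double newest = 0.0;
    for (int p = 0; p < mask; p++)
        newest += img[(i0 + p) * n + (j + mask - 1)] * kcol[p];
    double tot = newest;
    for (int k = 0; k < mask - 1; k++) {
        colsum[k] = colsum[k + 1];   /* shift surviving column sums left */
        tot += colsum[k];
    }
    colsum[mask - 1] = newest;
    return tot;
}
```

Each step performs mask multiplications instead of mask ∗ mask, which is the source of the acceleration discussed above.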
Regarding the time complexity of the three discussed algorithms, the smallest, described by Formula (4), belongs to the algorithm Gaussian1D, while the largest, defined by Formula (5), belongs to the algorithm Gaussian2D. The algorithm Gaussian2DParSums produces the same result (filtered image) as the algorithm Gaussian2D but has the better time complexity defined by Formula (6), due to the optimization of calculations based on previously calculated partial sums.
$$ T(N, M) = NM + N(M - 1) = N(2M - 1), \quad (4) $$
$$ T(N, M) = M^2 N + N(M^2 - 1) = N(2M^2 - 1), \quad (5) $$
$$ T(N, M) = \sqrt{N}(2M^2 - 1) + (N - \sqrt{N})(2M - 1). \quad (6) $$
The time complexity of all the discussed algorithms depends on image size (N is the number of all pixels in the input image) and kernel size (denoted as M). It is assumed that the processed image has a square form, and, because of this, the number of pixels in one column is the square root of N. When the size of an image is large, it can be supposed that the time complexity of the algorithm Gaussian2DParSums will be approximately equal to the time complexity of the algorithm Gaussian1D. It can be observed that if the size of the image is a constant, then the time complexity has a linear relationship with the kernel’s size for the algorithm Gaussian1D, a quadratic relationship for the algorithm Gaussian2D, and a nearly linear relationship for the algorithm Gaussian2DParSums.

2.3. Experimental Settings

A dataset of color images was used to support our experimental evaluation of the proposed algorithms. This dataset contained five images in 24-bit BMP format, which is preferred due to its data-lossless feature. Table 1 presents information about the size and resolution of the selected images.
A mobile computer system with a multicore architecture was used for the experimental performance evaluation; its parameters are: an Intel(R) Core(TM) i5-11320H processor (11th generation) running at 3.20 GHz, with 8 logical cores, and 8 GB of operating memory. All selected images were filtered using MATLAB (version R2024b Trial Update 4) and the function imgaussfilt(). The filtered images were later used to compare the filtering effect with that of the developed algorithms based on 1D and modified 2D Gaussian kernels.

2.4. Evaluation of Filtering Quality and Time Performance

The ratio between signal and noise in the image was evaluated using the popular PSNR metric. This ratio is often used for the evaluation of denoising algorithms, and it is usually calculated as a logarithmic quantity using the decibel scale [23]. For the current research, the MATLAB function psnr() was used to evaluate the filtering effect, based on the PSNR metric.
To evaluate the time performance of the proposed algorithms, we used the MPI function MPI_Wtime(), which measures elapsed time in seconds. An evaluation of parallel performance is typically based on two metrics: the well-known speed-up factor (Sp) and the efficiency (Ep) [13]. The speed-up factor is a dimensionless measure of acceleration, calculated as the ratio of serial processing time to parallel processing time. The efficiency is also a dimensionless measure, relating parallel execution to the number of parallel processors (cores); it is calculated as the ratio of the measured speed-up to the number of parallel processors.
$$ S_p = \frac{T_{serial}}{T_{parallel}} \quad \text{and} \quad E_p = \frac{S_p}{p} \quad (7) $$
The number of parallel processors in Formula (7) is denoted as p.
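Formula (7) amounts to two divisions; a tiny C sketch (illustrative names) makes the relationship between the two metrics explicit.

```c
/* Speed-up and efficiency from Formula (7); illustrative helpers. */
static double speed_up(double t_serial, double t_parallel)
{
    return t_serial / t_parallel;                        /* Sp = Ts / Tp */
}

static double efficiency(double t_serial, double t_parallel, int p)
{
    return speed_up(t_serial, t_parallel) / (double)p;   /* Ep = Sp / p  */
}
```

For instance, halving the runtime on 8 cores gives Sp = 2 but Ep = 0.25, which is the trade-off discussed in the results below.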

3. Results and Discussion

The dataset of color images was processed with the developed algorithms, based on 1D and modified 2D Gaussian kernels with kernel sizes of 3, 5, 7, 9, and 11. The images were also filtered with the built-in MATLAB function imgaussfilt(), with sigma values of 0.5, 1, 1.5, 2, and 2.5, which correspond to the kernel sizes used for the developed algorithms (sigma is equal to half of the kernel radius, i.e., approximately a quarter of the kernel size). Table 2 details the original images and the selected ROI from each one. It also shows the ROIs filtered with the algorithms Gaussian1D and Gaussian2D, along with the MATLAB function imgaussfilt(), using the maximal values of the kernel size and sigma. The visually observed difference between the original image and the filtered image is smallest for the algorithm Gaussian1D, in comparison with the differences produced by the Gaussian2D algorithm and the imgaussfilt() function. This observation can be explained by the number of neighboring pixels that affect the calculated value of every processed pixel; this number is smallest for the algorithm Gaussian1D (mask − 1 neighbors, versus (mask − 1) ∗ (mask − 1) neighbors for the algorithm Gaussian2D). For all filtered images, a blurring effect of varying strength is observed, but for the image "snow_trees", this effect removes some artefacts (a small black stick and tire tracks in the snow).
After filtering the images, the built-in MATLAB function psnr() was used to calculate the peak signal-to-noise ratio values, comparing the original and filtered images. The results are presented in Table 3. When the kernel size increases, the calculated PSNR values for all filtered images change, and this change is smallest when the Gaussian1D algorithm is used for filtering, which can be explained by the very small increase in the number of neighbors used for filtering every pixel as the kernel size grows. For each image and filtering algorithm, the highest PSNR value was obtained for the smallest kernel size, because the filtering effect is then minimal; thus, increasing the kernel size leads to decreasing PSNR values. The highest PSNR values were calculated for images filtered using the Gaussian1D algorithm and the imgaussfilt() function, which means that these images were of higher quality than those filtered using the Gaussian2D algorithm. Nevertheless, the PSNR values for images filtered using the Gaussian2D algorithm were also high.
When filtering the images with the developed algorithms, the time needed was measured. The execution time of the algorithms was tested with different numbers of parallel processes, using 1, 2, 4, and 8. Every image was processed 10 times with the selected parameters (algorithm, kernel size, and number of parallel processes). The obtained results were averaged for all images at each combination of selected parameters and are presented in Figure 2. It is evident that when the kernel size increases, then the time for processing increases too, and this increase is significant for the algorithm Gaussian2D and smaller for the algorithm Gaussian2DParSums, which corresponds to their theoretical time complexity evaluations. It can be seen that the increasing number of parallel processes leads to a decreasing amount of time being needed for filtering, which can be explained by the utilization of available computational cores. The speed-up factor and the efficiency of parallel execution were calculated based on the measured times. Figure 3 and Figure 4 graphically represent the values when processing the smallest and largest images with the developed algorithms, using different kernel sizes.
Table 4 presents the average execution time over 10 measurements using one and eight processes for the algorithms Gaussian1D, Gaussian2D, and Gaussian2DParSums. The standard deviation of the measured times is largest for the algorithm Gaussian2D, which can be explained by its large computational complexity in comparison with the other two algorithms (Gaussian1D and Gaussian2DParSums). Another observation relates to fluctuations in measured time, which were larger when eight parallel processes were used for filtering than for serial processing.
It is observed that the speed-up factor increases significantly for the algorithm Gaussian2D when the kernel size increases, but its value increases slightly for algorithms Gaussian1D and Gaussian2DParSums because their computational cost has a linear relationship with kernel size, whereas the computational cost of the Gaussian2D algorithm depends on kernel size by a quadratic relationship. It is evident that the increasing speed-up factor corresponds to decreasing efficiency values because the workload of one computational core decreases when the overall computational task is distributed among more processors (cores). Thus, the low efficiency of one core when solving a particular problem (filtering, in the current work) means that its working time could be utilized to solve more problems, which, in turn, means that the energy consumption needed to solve one problem decreases. Another observation regarding the results presented in Figure 3 is related to the influence of image size on the acceleration of parallel execution. It can be seen that the speed-up factor for the same size of kernel is higher when the size of the image is larger. For the algorithm Gaussian1D, when the size of the kernel is 3, in the case of the smallest image (waterliliya), the speed-up factor for 8 parallel processes is about 1.89, whereas its value for the biggest image (old_Nessebar) is about 2.49. For the algorithm Gaussian2D, when the size of the kernel is 3, then, for the smallest image (waterliliya), the speed-up factor for 8 parallel processes is about 2.89, whereas its value for the biggest image (old_Nessebar) is about 3.61. For the algorithm Gaussian2DParSums, when the size of the kernel is 3, then, for the smallest image (waterliliya), the speed-up factor for 8 parallel processes is about 2.07, whereas its value for the biggest image (old_Nessebar) is about 2.79. 
These observations are in line with our expectations, described by Formulae (4)-(6), regarding the influence of image size on time performance and the subsequent acceleration. The increasing speed-up with increasing image size is related to the improving ratio of the time spent maintaining the overall parallel execution to the time spent on real data processing.

4. Conclusions

Reducing the time complexity of fundamental operations involving images, such as Gaussian filtering, leads to the effective utilization of contemporary multicore architectures regarding their energy consumption. Thus, the current research supports trends for sustainable ICT development, and the results can be summarized as follows:
  • the application of parallel processing using the MPI (Message Passing Interface) library significantly accelerates image filtering using Gaussian filters;
  • the use of previously calculated partial sums leads to a significant acceleration of computations in both sequential and parallel processing;
  • the size of the Gaussian kernel significantly influences the time complexity of parallel algorithms that do not use partial sums, and only slightly influences the time complexity of parallel algorithms that use partial sums;
  • the size of the kernel also influences the quality of the filtered images, which is demonstrated using the popular PSNR metric;
  • the efficiency of parallel execution is higher when acceleration using partial sums is not attempted, but this means that every computational core carries more workload than when partial sums are utilized to improve time performance; thus, the use of partial sums leads to decreased energy consumption for Gaussian filtering;
  • the fluctuations of measured time are greater when filtering is performed without partial sums because, in this case, the processing time is longest; thus, the use of partial sums yields more stable timing results and more robust processing regarding time consumption.
In future work, the presented parallel algorithms for Gaussian filtering could be implemented and evaluated both analytically and practically using other parallel architectures, such as GPU-based systems. As another direction for future work, this research could continue with an experimental evaluation of the time performance of the proposed algorithms when a large dataset of high-resolution images is filtered, and when an approximation of the integers of Gaussian coefficients is used to improve time performance.

Author Contributions

Conceptualization, A.B.-A., H.A. and I.H.; methodology, software, validation, formal analysis, investigation, and resources, A.B.-A. and H.A.; data curation, A.B.-A.; writing—original draft preparation, A.B.-A. and H.A.; writing—review and editing, A.B.-A.; visualization, H.A. and A.B.-A.; supervision, A.B.-A.; project administration, A.B.-A.; funding acquisition, A.B.-A. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by the University of Food Technologies (Science Fund).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Any data not contained within this article can be provided by the authors on request.

Conflicts of Interest

The authors declare that there are no conflicts of interest.

References

  1. Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 4th ed.; Pearson Education: New York, NY, USA, 2018.
  2. Mafi, M.; Martin, H.; Cabrerizo, M.; Andrian, J.; Barreto, A.; Adjouadi, M. A comprehensive survey on impulse and Gaussian denoising filters for digital images. Signal Process. 2019, 157, 236–260.
  3. Desai, B.; Kushwaha, U.; Jha, S. Image Filtering—Techniques, Algorithm and Applications. Appl. GIS 2020, 7, 970–975.
  4. Mohanty, S.; Tripathy, S. Application of Different Filtering Techniques in Digital Image Processing. J. Phys. Conf. Ser. 2021, 2062, 012007.
  5. Vijeyakumar, K.N.; Joel, P.T.N.K.; Jatana, S.H.S.; Saravanakumar, N.; Kalaiselvi, S. Area Efficient Parallel Median Filter Using Approximate Comparator and Faithful Adder. IET Circuits Devices Syst. 2020, 14, 1318–1331.
  6. Hasan, S.K. FPGA Implementations for Parallel Multidimensional Filtering Algorithms. Ph.D. Thesis, Agriculture and Engineering, Newcastle University, Newcastle upon Tyne, UK, June 2013.
  7. Gawande, G.S.; Dhotre, D.R.; Choubey, N.; Mate, D.S. A MILP based Optimization and FPGA Implementation of Efficient Polphase Multirate Filters. Int. J. Intell. Syst. Appl. Eng. 2023, 11, 164–172.
  8. Skirnevskiy, I.P.; Pustovit, A.V.; Abdrashitova, M.O. Digital image processing using parallel computing based on CUDA technology. J. Phys. Conf. Ser. 2017, 803, 012152.
  9. Bosakova-Ardenska, A.; Bosakov, L. Parallel image processing with mean filter. In Proceedings of the International Scientific Conference—UniTech, Gabrovo, Bulgaria, 16–17 November 2012; pp. I-362–I-365.
  10. Stoica, V.; Coconu, C.; Ionescu, F. Parallel Implementation of Image Filtering Algorithms in Multiprocessor Systems. IFAC Proc. Vol. 2001, 34, 349–354.
  11. Kabbai, L.; Sghaier, A.; Douik, A.; Machhout, M. FPGA implementation of filtered image using 2D Gaussian filter. Int. J. Adv. Comput. Sci. Appl. 2016, 7, 514–520.
  12. Ivashko, A.; Zuev, A.; Karaman, D.; Moškon, M. FPGA-based implementation of a Gaussian smoothing filter with powers-of-two coefficients. Adv. Inf. Syst. 2024, 8, 39–47.
  13. Saxena, S.; Sharma, S.; Sharma, N. Parallel Image Processing Techniques, Benefits and Limitations. Res. J. Appl. Sci. Eng. Technol. 2016, 12, 223–238.
  14. Bozkurt, F.; Yağanoğlu, M.; Günay, F.B. Effective Gaussian Blurring Process on Graphics Processing Unit with CUDA. Int. J. Mach. Learn. Comput. 2015, 5, 57–61.
  15. Rybakova, E.O.; Limonova, E.E.; Nikolaev, D.P. Fast Gaussian Filter Approximations Comparison on SIMD Computing Platforms. Appl. Sci. 2024, 14, 4664.
  16. Fauzie, A.N.; Sakti, S.P.; Rahmadwati. Parallel Implementation of Gaussian Filter Image Processing on a Cluster of Single Board Computer. EECCIS 2023, 17, 82–88.
  17. Singh, K.; Nair, J.S. Satellite Image Enhancement Using DWT and Gaussian Filter. Int. J. Comput. Sci. Eng. 2018, 6, 289–297.
  18. Bausys, R.; Kazakeviciute-Januskeviciene, G.; Cavallaro, F.; Usovaite, A. Algorithm Selection for Edge Detection in Satellite Images by Neutrosophic WASPAS Method. Sustainability 2020, 12, 548.
  19. Devi, T.G.; Patil, N.; Rai, S.; Philipose, C.S. Gaussian Blurring Technique for Detecting and Classifying Acute Lymphoblastic Leukemia Cancer Cells from Microscopic Biopsy Images. Life 2023, 13, 348.
  20. Kim, H.S.; Cho, S.G.; Kim, J.H.; Kwon, S.Y.; Lee, B.I.; Bom, H.S. Effect of Post-Reconstruction Gaussian Filtering on Image Quality and Myocardial Blood Flow Measurement with N-13 Ammonia PET. Asia Ocean J. Nucl. Med. Biol. 2014, 2, 104–110.
  21. Bosakova-Ardenska, A. One approach for parallel algorithms representation. Balk. J. Electr. Comput. Eng. 2017, 5, 30–33.
  22. Bosakova-Ardenska, A.; Vasilev, N. Parallel algorithms of the scanning mask method for primary images processing. In Proceedings of the 5th International Conference on Computer Systems and Technologies, Rousse, Bulgaria, 17–18 June 2004; pp. 1–7.
  23. Sara, U.; Akter, M.; Uddin, M.S. Image quality assessment through FSIM, SSIM, MSE and PSNR—A comparative study. J. Comput. Commun. 2019, 7, 8–18.
Figure 1. Parallel algorithm for Gaussian filtering.
Figure 2. Average execution time needed for processing all images in the data set.
Figure 3. Speed-up factor for parallel execution: (a) waterliliya kernel size 3; (b) old_Nessebar kernel size 3; (c) waterliliya kernel size 5; (d) old_Nessebar kernel size 5; (e) waterliliya kernel size 7; (f) old_Nessebar kernel size 7; (g) waterliliya kernel size 9; (h) old_Nessebar kernel size 9; (i) waterliliya kernel size 11; (j) old_Nessebar kernel size 11.
Figure 4. Efficiency of parallel execution: (a) waterliliya kernel size 3; (b) old_Nessebar kernel size 3; (c) waterliliya kernel size 5; (d) old_Nessebar kernel size 5; (e) waterliliya kernel size 7; (f) old_Nessebar kernel size 7; (g) waterliliya kernel size 9; (h) old_Nessebar kernel size 9; (i) waterliliya kernel size 11; (j) old_Nessebar kernel size 11.
Table 1. Information about the dataset of images used in this study.

| Image | Resolution [pixels] | Size [KB] |
|---|---|---|
| waterliliya.bmp | 450 × 800 | 1057 |
| street_art.bmp | 700 × 1244 | 2552 |
| snow_trees.bmp | 1050 × 1865 | 5741 |
| house.bmp | 1452 × 2581 | 10,980 |
| old_Nessebar.bmp | 2000 × 3555 | 20,831 |
Table 2. Original images and examples of filtered ROIs using a kernel size of 11 (sigma = 2.5). (Image cells are not reproduced here.)

| Image | Original Image and Selected ROI | Filtered ROI, Gaussian1D Algorithm | Filtered ROI, Gaussian2D Algorithm | Filtered ROI, MATLAB Function |
|---|---|---|---|---|
| waterliliya.bmp | [image] | [image] | [image] | [image] |
| street_art.bmp | [image] | [image] | [image] | [image] |
| snow_trees.bmp | [image] | [image] | [image] | [image] |
| house.bmp | [image] | [image] | [image] | [image] |
| old_Nessebar.bmp | [image] | [image] | [image] | [image] |
Table 3. PSNR values calculated for the original images and filtered ones.

| Algorithm | Kernel Size (Sigma) | Waterliliya | Street_Art | Snow_Trees | House | Old_Nessebar |
|---|---|---|---|---|---|---|
| Gaussian1D | 3 (0.5) | 29.15 | 30.37 | 36.22 | 35.66 | 44.87 |
| | 5 (1) | 28.31 | 29.20 | 34.18 | 33.44 | 42.56 |
| | 7 (1.5) | 28.27 | 29.15 | 34.04 | 33.27 | 42.38 |
| | 9 (2) | 28.29 | 29.15 | 34.04 | 33.27 | 42.38 |
| | 11 (2.5) | 28.31 | 29.16 | 34.05 | 33.28 | 42.38 |
| Gaussian2D | 3 (0.5) | 26.13 | 25.78 | 31.21 | 31.79 | 39.23 |
| | 5 (1) | 24.35 | 24.00 | 27.86 | 27.42 | 35.40 |
| | 7 (1.5) | 23.43 | 23.27 | 26.31 | 25.39 | 33.63 |
| | 9 (2) | 22.78 | 22.76 | 25.27 | 24.16 | 32.40 |
| | 11 (2.5) | 22.27 | 22.37 | 24.51 | 23.30 | 31.39 |
| MATLAB function imgaussfilt() | 3 (0.5) | 33.44 | 33.51 | 39.34 | 40.64 | 47.58 |
| | 5 (1) | 25.94 | 25.42 | 30.01 | 30.48 | 37.51 |
| | 7 (1.5) | 24.21 | 23.66 | 27.13 | 26.90 | 33.91 |
| | 9 (2) | 23.22 | 22.77 | 25.56 | 24.85 | 31.92 |
| | 11 (2.5) | 22.52 | 22.19 | 24.55 | 23.50 | 30.66 |
Table 4. Average execution time in seconds and standard deviation.

| Algorithm | Kernel | Processes | Waterliliya | Street_Art | Snow_Trees | House | Old_Nessebar |
|---|---|---|---|---|---|---|---|
| Gaussian1D | 3 | 1 | 0.0071 ± 0.0001 | 0.0173 ± 0.0003 | 0.0386 ± 0.0004 | 0.0747 ± 0.0030 | 0.1416 ± 0.0037 |
| | | 8 | 0.0037 ± 0.0002 | 0.0080 ± 0.0005 | 0.0175 ± 0.0010 | 0.0304 ± 0.0031 | 0.0570 ± 0.0040 |
| | 5 | 1 | 0.0116 ± 0.0002 | 0.0279 ± 0.0002 | 0.0641 ± 0.0027 | 0.1219 ± 0.0026 | 0.2323 ± 0.0063 |
| | | 8 | 0.0051 ± 0.0003 | 0.0111 ± 0.0008 | 0.0251 ± 0.0016 | 0.0442 ± 0.0038 | 0.0772 ± 0.0050 |
| | 7 | 1 | 0.0158 ± 0.0003 | 0.0385 ± 0.0009 | 0.0873 ± 0.0028 | 0.1688 ± 0.0054 | 0.3204 ± 0.0071 |
| | | 8 | 0.0064 ± 0.0005 | 0.0151 ± 0.0005 | 0.0304 ± 0.0022 | 0.0542 ± 0.0046 | 0.0979 ± 0.0053 |
| | 9 | 1 | 0.0208 ± 0.0002 | 0.0510 ± 0.0011 | 0.1162 ± 0.0032 | 0.2253 ± 0.0048 | 0.4189 ± 0.0030 |
| | | 8 | 0.0079 ± 0.0003 | 0.0179 ± 0.0013 | 0.0370 ± 0.0024 | 0.0660 ± 0.0051 | 0.1194 ± 0.0050 |
| | 11 | 1 | 0.0263 ± 0.0003 | 0.0644 ± 0.0011 | 0.1479 ± 0.0036 | 0.2831 ± 0.0038 | 0.5354 ± 0.0100 |
| | | 8 | 0.0093 ± 0.0007 | 0.0212 ± 0.0012 | 0.0464 ± 0.0038 | 0.0744 ± 0.0045 | 0.1349 ± 0.0043 |
| Gaussian2D | 3 | 1 | 0.0231 ± 0.0010 | 0.0552 ± 0.0021 | 0.1255 ± 0.0036 | 0.2402 ± 0.0042 | 0.4545 ± 0.0046 |
| | | 8 | 0.0082 ± 0.0003 | 0.0192 ± 0.0012 | 0.0372 ± 0.0022 | 0.0659 ± 0.0035 | 0.1259 ± 0.0058 |
| | 5 | 1 | 0.0681 ± 0.0008 | 0.1663 ± 0.0010 | 0.3734 ± 0.0025 | 0.7133 ± 0.0018 | 1.3566 ± 0.0036 |
| | | 8 | 0.0198 ± 0.0016 | 0.0421 ± 0.0022 | 0.0858 ± 0.0041 | 0.1522 ± 0.0030 | 0.2876 ± 0.0105 |
| | 7 | 1 | 0.1371 ± 0.0013 | 0.3322 ± 0.0017 | 0.7512 ± 0.0036 | 1.4383 ± 0.0060 | 2.7304 ± 0.0057 |
| | | 8 | 0.0361 ± 0.0032 | 0.0727 ± 0.0052 | 0.1556 ± 0.0035 | 0.2888 ± 0.0073 | 0.5305 ± 0.0092 |
| | 9 | 1 | 0.2252 ± 0.0009 | 0.5515 ± 0.0022 | 1.245 ± 0.0042 | 2.3857 ± 0.0038 | 4.5355 ± 0.0059 |
| | | 8 | 0.0504 ± 0.0043 | 0.1153 ± 0.0034 | 0.2443 ± 0.0064 | 0.4545 ± 0.0125 | 0.8646 ± 0.0271 |
| | 11 | 1 | 0.3371 ± 0.0023 | 0.8232 ± 0.0020 | 1.8688 ± 0.0071 | 3.5789 ± 0.0039 | 6.8095 ± 0.0026 |
| | | 8 | 0.0727 ± 0.0059 | 0.1644 ± 0.0057 | 0.3637 ± 0.0090 | 0.6723 ± 0.0194 | 1.2543 ± 0.0157 |
| Gaussian2DParSums | 3 | 1 | 0.0091 ± 0.0002 | 0.0220 ± 0.0006 | 0.0505 ± 0.0023 | 0.0978 ± 0.0055 | 0.1834 ± 0.0058 |
| | | 8 | 0.0044 ± 0.0002 | 0.0098 ± 0.0006 | 0.0215 ± 0.0012 | 0.0363 ± 0.0020 | 0.0657 ± 0.0045 |
| | 5 | 1 | 0.0134 ± 0.0001 | 0.0332 ± 0.0021 | 0.0756 ± 0.0037 | 0.1413 ± 0.0025 | 0.2676 ± 0.0062 |
| | | 8 | 0.0059 ± 0.0004 | 0.0132 ± 0.0006 | 0.0277 ± 0.0025 | 0.0483 ± 0.0026 | 0.0867 ± 0.0031 |
| | 7 | 1 | 0.0188 ± 0.0003 | 0.0454 ± 0.0008 | 0.1020 ± 0.0015 | 0.1966 ± 0.0030 | 0.3748 ± 0.0094 |
| | | 8 | 0.0076 ± 0.0003 | 0.0165 ± 0.0012 | 0.0341 ± 0.0028 | 0.0593 ± 0.0033 | 0.1139 ± 0.0063 |
| | 9 | 1 | 0.0247 ± 0.0003 | 0.0593 ± 0.0008 | 0.1330 ± 0.0005 | 0.2549 ± 0.0013 | 0.4887 ± 0.0043 |
| | | 8 | 0.0093 ± 0.0003 | 0.0198 ± 0.0014 | 0.0413 ± 0.0021 | 0.0748 ± 0.0059 | 0.1280 ± 0.0040 |
| | 11 | 1 | 0.0306 ± 0.0010 | 0.0731 ± 0.0004 | 0.1656 ± 0.0023 | 0.3182 ± 0.0039 | 0.6016 ± 0.0102 |
| | | 8 | 0.0108 ± 0.0004 | 0.0237 ± 0.0012 | 0.0465 ± 0.0036 | 0.0844 ± 0.0057 | 0.1564 ± 0.0065 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
