2.5. Self-Adjusting Guided Filtered Image Fusion
The guided filter method was first proposed by He et al. [27]. A guided filter is an edge-preserving filter, like the bilateral filter, but its kernel can be computed quickly regardless of the kernel size and intensity range, and it does not suffer from gradient-reversal artifacts. Guided filters have often been used in image fusion in previous studies; we therefore optimized the filtering method to fit our algorithm:
A guided filter assumes that the output $q_i$ is a linear transformation of the guidance image $I$ in a window $\omega_k$ centered on a pixel $k$:

$$q_i = a_k I_i + b_k, \quad \forall i \in \omega_k,$$

where $\omega_k$ is the window, and $a_k$ and $b_k$ are linear coefficients that minimize the squared difference between the output image $q$ and the input image $p$:

$$E(a_k, b_k) = \sum_{i \in \omega_k} \left( (a_k I_i + b_k - p_i)^2 + \epsilon a_k^2 \right),$$

where $\epsilon$ is a regularization parameter penalizing large $a_k$.
When the center pixel $k$ changes, the coefficients, and hence the resulting image, also change. To reduce this variation, the output at each pixel is computed by averaging the estimates of $a_k$ and $b_k$ over all windows that contain that pixel:

$$q_i = \bar{a}_i I_i + \bar{b}_i, \quad \bar{a}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} a_k, \quad \bar{b}_i = \frac{1}{|\omega|} \sum_{k \in \omega_i} b_k.$$
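For reference, the linear model and window averaging above can be sketched in NumPy. This is a minimal illustration of the basic guided filter of He et al., not the self-adjusting variant used in this work; `box_mean` is a helper function we introduce for the per-window means.

```python
import numpy as np

def box_mean(x, r):
    """Mean over a (2r+1) x (2r+1) window via a summed-area table (pure NumPy)."""
    size = 2 * r + 1
    xp = np.pad(x, r, mode='edge')
    c = np.cumsum(np.cumsum(xp, axis=0), axis=1)
    c = np.pad(c, ((1, 0), (1, 0)))  # prepend a zero row/column for the table
    s = (c[size:, size:] - c[:-size, size:]
         - c[size:, :-size] + c[:-size, :-size])
    return s / size ** 2

def guided_filter(I, p, r, eps):
    """Guided filter (He et al.): q_i = a_k * I_i + b_k, averaged over windows.

    I   : guidance image, 2-D float array
    p   : input image to filter, same shape as I
    r   : window radius (window omega_k spans 2r+1 pixels per side)
    eps : regularization penalizing large a_k
    """
    mean_I = box_mean(I, r)
    mean_p = box_mean(p, r)
    cov_Ip = box_mean(I * p, r) - mean_I * mean_p  # covariance of I and p per window
    var_I = box_mean(I * I, r) - mean_I ** 2       # variance of I per window
    a = cov_Ip / (var_I + eps)                     # per-window coefficient a_k
    b = mean_p - a * mean_I                        # per-window coefficient b_k
    # average a_k, b_k over all windows covering each pixel, then apply
    return box_mean(a, r) * I + box_mean(b, r)
```

In fusion pipelines of this kind, the rough focus mask is typically passed as `p` with the source image as guidance `I`, so the smoothed mask follows the image edges.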
The guided filter was applied in a sliding window, with filtering performed over the target area covered by the window. However, the filter must be able to weight the image boundaries while preserving a wide area. To select an accurate focus area according to the microscope's field of view, the window size was adjusted automatically with a self-adjusting guided filter [28]. In addition, the guided-filtered image, which is affected by the window size, was subsampled to one-quarter of the size of the entire image, which accelerated the parameter-adjustment process. The scale factor s determines the rate of subsampling; we set s = 2 in the experiment.
Figure 3 presents the results for various values of the window size r. If r is set too small, gaps appear between the fused areas. Conversely, if r is set too large, unnecessary parts of the image are included in the fusion, making it impossible to create a natural all-in-focus image.
After multiplying the original images by the local focus-area extraction masks, the focus areas obtained from each image were combined into a single all-in-focus image. Each image had a different focus area and therefore contributed different values from the image sequence. For overlapping focus areas, we used a pixel-wise weighted-averaging rule, which assigns weights that compensate for the brightness of the images when blending pixels. The final focus-area mask produced by the guided filter is blurred from the inside toward the boundary lines, yielding smaller pixel values near the boundaries; these pixel values are treated as weights. When the source images are fused according to these weights, smooth results are obtained while the boundaries between images are maintained. The procedure is shown in Algorithm 1.
Algorithm 1 Multi-focus image fusion algorithm.
1: Input: source images from fluorescence microscopy.
2: Output: all-in-focus image.
3: // Obtain the guided-filtered focus map of the source images.
4: // Obtain the output by selecting the pixels from the set of source images, depending on the calculated weight of the guidance image at the respective pixels.
5: for each pixel row do
6:   for each pixel column do
7:     // Arrange the calculated weights of the guidance image with respect to the source images.
8:     for each source image, where N is the number of source images to be fused do
9:       // Obtain the output by sequentially multiplying the source with the maximum weight.
10:     end for
11:   end for
12: end for
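The weight-based selection and blending described above can be sketched in NumPy. This is a simplified, vectorized illustration (the function names and the `eps` stabilizer are ours); it assumes the per-image focus weight maps, e.g., the guided-filtered focus masks, have already been computed.

```python
import numpy as np

def fuse_weighted(images, weights, eps=1e-8):
    """Pixel-wise weighted-averaging fusion.

    images  : list of 2-D float arrays (grayscale source images)
    weights : list of 2-D float arrays (guided-filtered focus masks;
              larger value = more in focus at that pixel)
    """
    imgs = np.stack(images)                          # shape (N, H, W)
    w = np.stack(weights)
    w = w / (w.sum(axis=0, keepdims=True) + eps)     # normalize weights per pixel
    return (w * imgs).sum(axis=0)                    # weighted average over sources

def fuse_max(images, weights):
    """Alternative: pick, at each pixel, the source with the maximum weight."""
    imgs = np.stack(images)
    idx = np.stack(weights).argmax(axis=0)           # index of best-focused source
    return np.take_along_axis(imgs, idx[None], axis=0)[0]
```

With smoothly decaying mask values near region boundaries, `fuse_weighted` produces the smooth transitions described in the text, while `fuse_max` corresponds to hard selection.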
2.6. Objective Evaluation Metrics
An objective evaluation of fused images is difficult because there are no standard metrics for evaluating the image fusion process. In the "full-reference" condition, a reference image is available; in the "no-reference" or "blind" condition, reference images are unavailable, as in many real applications. The image used in the first experiment corresponds to the full-reference condition, and the dataset used in the second experiment to the blind condition [29]. Therefore, the following objective assessment metrics were applied according to the condition.
First, the following evaluation methods apply only in the full-reference condition:
$Q_{MI}$ is an information-based fusion indicator based on a normalization that overcomes the instability of mutual-information-based indicators. It was proposed by Hossny et al. [30]:

$$Q_{MI} = 2\left[\frac{MI(A,F)}{H(A)+H(F)} + \frac{MI(B,F)}{H(B)+H(F)}\right].$$

Here, $H(X)$ is the entropy of image $X$, and $MI(X,Y)$ is the mutual information between two images $X$ and $Y$; $A$ and $B$ are the input images, and $F$ is the fused image.
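The entropy and mutual-information terms can be estimated from image histograms; the sketch below uses 256-bin histograms for 8-bit images (the function names and binning choices are ours, not from the cited work).

```python
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy of an image, from its normalized intensity histogram."""
    p, _ = np.histogram(img, bins=bins, range=(0, bins), density=True)
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def mutual_info(x, y, bins=256):
    """Mutual information MI(X, Y) = H(X) + H(Y) - H(X, Y)."""
    pxy, _, _ = np.histogram2d(x.ravel(), y.ravel(),
                               bins=bins, range=[[0, bins], [0, bins]])
    pxy = pxy / pxy.sum()
    p = pxy[pxy > 0]
    h_xy = -(p * np.log2(p)).sum()                 # joint entropy H(X, Y)
    return entropy(x, bins) + entropy(y, bins) - h_xy

def q_mi(a, b, f, bins=256):
    """Normalized mutual-information fusion metric in the Hossny et al. form."""
    return 2 * (mutual_info(a, f, bins) / (entropy(a, bins) + entropy(f, bins))
                + mutual_info(b, f, bins) / (entropy(b, bins) + entropy(f, bins)))
```

Note that when the fused image equals both inputs, each term reaches its maximum of 0.5, so $Q_{MI} = 2$.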
$Q_{NCIE}$ is an information-based fusion indicator proposed by Wang et al. [31]; $\lambda_i$ denotes the eigenvalues of the nonlinear correlation matrix of the two input images and the fused image:

$$Q_{NCIE} = 1 + \sum_{i=1}^{3} \frac{\lambda_i}{3} \log_{256} \frac{\lambda_i}{3}.$$
$Q^{AB/F}$ is the most well-known image fusion evaluation metric; it measures the degree of gradient information preserved in the fused image relative to the input images [32]:

$$Q^{AB/F} = \frac{\sum_{i=1}^{W}\sum_{j=1}^{H}\left[Q^{AF}(i,j)\,w^{A}(i,j) + Q^{BF}(i,j)\,w^{B}(i,j)\right]}{\sum_{i=1}^{W}\sum_{j=1}^{H}\left[w^{A}(i,j) + w^{B}(i,j)\right]}.$$

Here, the width of the image is $W$ and the height is $H$; $Q^{AF}(i,j)$ represents the edge-strength and gradient (orientation) information of input image $A$ preserved in the fused image, and the same notation applies to $Q^{BF}$. $w^{A}$ and $w^{B}$ are the weights of $Q^{AF}$ and $Q^{BF}$, respectively.
$Q_P$ is an evaluation metric based on phase congruency. Phase congruency captures prominent feature information in images, such as edge and corner information [33]:

$$Q_P = (P_p)^{\alpha}(P_M)^{\beta}(P_m)^{\gamma}.$$

Here, $p$, $M$, and $m$ denote the phase congruency, maximum moment, and minimum moment, respectively; $P_p$, $P_M$, and $P_m$ are the maximum correlation coefficients between the fused image and the input images for these quantities; and $\alpha$, $\beta$, and $\gamma$ are parameters that adjust the significance of each of the three coefficients, respectively.
$Q_{CB}$ is a method based on the human visual system model. It consists of contrast filtering, local contrast calculation, contrast preservation, and quality-guidance steps [34]:

$$Q_{GQM}(x,y) = \lambda_A(x,y)\,Q_{AF}(x,y) + \lambda_B(x,y)\,Q_{BF}(x,y).$$

Here, $Q_{AF}(x,y)$ and $Q_{BF}(x,y)$ denote the contrast information of the input images preserved in the fused image, and $\lambda_A(x,y)$ and $\lambda_B(x,y)$ denote the weight values of the input images. $Q_{CB}$ is defined as the mean value of $Q_{GQM}(x,y)$ as follows:

$$Q_{CB} = \overline{Q_{GQM}(x,y)}.$$
Peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM) were also used as full-reference quality assessment methods. PSNR is an engineering term for the ratio between the maximum possible power of a signal and the power of the corrupting noise that affects the fidelity of its representation [35]. PSNR is most easily defined via the mean squared error (MSE). Given a noise-free image $I$ and its noisy approximation $K$, both of size $m \times n$, PSNR is defined as:

$$MSE = \frac{1}{mn}\sum_{i=0}^{m-1}\sum_{j=0}^{n-1}\left[I(i,j) - K(i,j)\right]^2, \qquad PSNR = 10\log_{10}\!\left(\frac{MAX_I^2}{MSE}\right).$$

Here, $MAX_I$ is the maximum possible pixel value of the image. Because PSNR is measured on a logarithmic scale, its unit is dB, and the smaller the loss, the higher the value. For identical (lossless) images, the PSNR is undefined because the MSE is zero.
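The PSNR definition above translates directly into a few lines of NumPy (a minimal sketch; the function name and `max_val` default are ours):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """PSNR in dB between a reference image and its approximation."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    if mse == 0:
        return float('inf')   # identical images: PSNR is undefined (infinite)
    return 10 * np.log10(max_val ** 2 / mse)
```

For 8-bit images the worst case (every pixel off by 255) gives 0 dB, and smaller errors give higher values.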
The SSIM is used for measuring the similarity between two images [36]. SSIM is a perception-based model that treats image degradation as a perceived change in structural information, while incorporating important perceptual phenomena, including luminance-masking and contrast-masking terms. It differs from techniques such as MSE or PSNR, which estimate absolute errors. Given an original image $x$ and a distorted image $y$, SSIM is defined as:

$$SSIM(x,y) = \frac{(2\mu_x\mu_y + c_1)(2\sigma_{xy} + c_2)}{(\mu_x^2 + \mu_y^2 + c_1)(\sigma_x^2 + \sigma_y^2 + c_2)}.$$

Here, $\mu_x$ is the average of $x$, $\sigma_x^2$ is the variance of $x$, and the same notation applies to $\mu_y$ and $\sigma_y^2$; $\sigma_{xy}$ is the covariance of $x$ and $y$; and $c_1$ and $c_2$ are two variables that stabilize the division when the denominator is weak.
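The SSIM formula can be sketched as follows. Note this simplified version computes global statistics over the whole image, whereas standard implementations average SSIM over local sliding windows; the constants follow the common choice $c_1 = (0.01\,L)^2$, $c_2 = (0.03\,L)^2$ for dynamic range $L$.

```python
import numpy as np

def ssim_global(x, y, max_val=255.0, k1=0.01, k2=0.03):
    """Global (single-window) SSIM between two images, for illustration."""
    x = x.astype(float)
    y = y.astype(float)
    c1 = (k1 * max_val) ** 2              # stabilizes the luminance term
    c2 = (k2 * max_val) ** 2              # stabilizes the contrast/structure term
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
            / ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
```

Identical images score exactly 1, and structurally inverted images score much lower, even when their MSE-based scores would be symmetric.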
No-reference methods were also employed for fused images because reference images are commonly unavailable. One of the most representative no-reference image quality assessments is BRISQUE, introduced by Mittal et al. [37]. BRISQUE operates on the assumption that distorting a natural image distorts the statistics of its pixels. A natural image is an image captured by a camera without further processing; such images exhibit regular statistical characteristics, and after mean-subtracted contrast normalization (MSCN), the histogram of their pixel values takes the form of a Gaussian distribution. For image quality evaluation, the MSCN coefficients are fitted to a generalized Gaussian distribution (GGD), whose parameters serve as features describing the pixel distribution. The shape parameter and variance of the best-fitting GGD are then used to evaluate the characteristics of the target image.
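The MSCN preprocessing step shared by BRISQUE and NIQE can be sketched in pure NumPy: each pixel is normalized by a local Gaussian-weighted mean and standard deviation (the kernel parameters below follow common practice; the function names and `c` stabilizer value are ours).

```python
import numpy as np

def gaussian_blur(img, sigma=7 / 6, radius=3):
    """Separable Gaussian blur with reflective padding (pure NumPy)."""
    t = np.arange(-radius, radius + 1)
    k = np.exp(-t ** 2 / (2 * sigma ** 2))
    k /= k.sum()
    pad = np.pad(img, radius, mode='reflect')
    # filter rows, then columns
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='valid'), 1, pad)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='valid'), 0, tmp)

def mscn(img, c=1.0):
    """Mean-subtracted contrast-normalized coefficients (BRISQUE/NIQE preprocessing)."""
    img = img.astype(float)
    mu = gaussian_blur(img)                                   # local mean
    var = np.maximum(gaussian_blur(img ** 2) - mu ** 2, 0.0)  # local variance
    return (img - mu) / (np.sqrt(var) + c)                    # c avoids division by zero
```

For natural images the resulting coefficients follow a roughly Gaussian, zero-centered distribution, which is what the GGD fit in BRISQUE exploits.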
Additionally, we used the NIQE method, also proposed by Mittal et al. [38]. NIQE fits a statistical model to features of natural images; the more similar the statistics of a test image are to this model, the better the quality of the test image. We likewise applied MSCN preprocessing and divided the images into patches; we then derived BRISQUE-style features within each patch and calculated the image quality values using the mean vectors and covariance matrices.