Decomposed Multilateral Filtering for Accelerating Filtering with Multiple Guidance Images

This paper proposes an efficient algorithm for edge-preserving filtering with multiple guidance images, so-called multilateral filtering. Multimodal signal processing for sensor fusion is increasingly important in image sensing. Edge-preserving filtering is useful for various sensor fusion applications, such as estimating scene properties and refining inverse-rendered images. The main application is joint edge-preserving filtering, which reflects the edge information of a guidance image from an additional sensor. The drawback of edge-preserving filtering lies in its long computational time; thus, many acceleration methods have been proposed. However, most accelerated filters cannot handle multiple guidance images well, although multiple guidance images provide various benefits. Therefore, we extend efficient edge-preserving filters so that they can use multiple additional guidance images. Our algorithm, named decomposed multilateral filtering (DMF), extends efficient filtering methods to multilateral filtering by decomposing the filter into a set of constant-time filters. Experimental results show that our algorithm performs efficiently and is sufficient for various applications.

Filtering is a basic tool for handling multimodal signals. Multilateral filtering, which is one type of edge-preserving filtering, successfully handles multiple signals. Edge-preserving filtering with additional guidance information, called joint edge-preserving filtering, has recently attracted attention from image processing and computational photography researchers for sensor fusion. Joint edge-preserving filtering helps transfer major characteristics from guidance images to images that are not the guidance images themselves. Various applications use these filters, including flash/no-flash photography [16,17], up-sampling/super resolution [18], compression noise removal [19], alpha matting [20], haze removal [21], rain removal [22], depth refinement [23,24], stereo matching [25,26], and optical flow estimation [27].
Joint/cross bilateral filtering [16,17] is a seminal work of joint edge-preserving filtering. The filter is naturally derived from bilateral filtering [28] by computing the kernel weight from a guidance image instead of the input filtering image. This formulation enables us to reflect the edge information of the guidance image (e.g., RGB, infrared, and hyperspectral images) in the filtering target image (e.g., RGB image, alpha mask, depth map, and optical flow).
We can expect a stronger edge-preserving effect using multiple guidance images (e.g., a set of multi-sensor signals), and recently, we have been able to capture not only RGB images but also infrared, hyperspectral, depth, and other images with new devices (e.g., infrared/hyperspectral cameras and depth sensors). These images have edge information and signal characteristics different from RGB images. The multiple guidance information is helpful for improving signal visibility and the signal-to-noise ratio [29,30]. In other cases, we can treat a self-generated image or inversely rendered maps as an additional guidance signal [31,32].
There are two categories for using multiple guidance images in image filtering: high-dimensional filtering [30,33–37] and multilateral filtering [29,31,32,38,39]. The former uses additive logic, and the latter uses multiplicative logic for additional kernels. The main difference is the severity of the restriction in computing the kernel weight. The restriction of additive logic is looser than that of multiplicative logic; hence, high-dimensional filtering can robustly smooth out noise or rich textures. By contrast, since the restriction of multiplicative logic is severe, multilateral filtering produces fewer blurred regions. Each filtering method has advantages and disadvantages, but multilateral filtering is preferred when we expect a sharp edge-preserving effect.
A critical issue of edge-preserving filtering for multimodal sensing is computational time. This is because sensing is the gateway to all processing, and signal processing during sensing is expected to operate in real time. Therefore, many researchers have proposed acceleration methods for edge-preserving filtering. In particular, acceleration of bilateral filtering has been actively discussed. The bilateral grid [40,41] is the seminal approach, and Yang et al. [42,43] extend it to constant-time bilateral filtering. Yang's method [43] has adequate efficiency for grayscale images, and recent work further accelerates the bilateral filter [44–46]; however, these methods are inefficient in color cases. There are several proposals [34,47,48] to approximate and accelerate bilateral filtering for higher-dimensional (color) images. Furthermore, hardware-friendly methods have been proposed [49–51]. However, these approaches restrict the kernel weight to the Gaussian distribution. Other efficient edge-preserving filters, which are not limited to the Gaussian distribution, have been proposed in contrast to bilateral filtering acceleration. Guided image filtering [20], domain transform filtering [33], and adaptive manifold filtering [30] are representative examples. These filters have assumptions different from Gaussian smoothing but have excellent edge-preserving effects and efficiency. Note that these filters handle signals with similar characteristics better than signals with different modalities and characteristics.
Multiple guidance images provide richer information for various applications; however, these efficient methods cannot handle multiple guidance images individually. Therefore, we propose an efficient algorithm for accelerating multilateral filtering, which is designed for filtering with multiple guidance images. Furthermore, we extend the efficient edge-preserving filters so that they can exploit multiple guidance images.
Our algorithm is based on the fact that n-lateral filtering can be represented by a summation of (n−1)-lateral filtering. Therefore, when multilateral filtering is expanded recursively, it reduces to constant-time filtering, since 1-lateral filtering is spatial filtering. Figure 1 shows an overview of the proposed filter algorithm. The proposed filter, named decomposed multilateral filtering (DMF), recursively decomposes multilateral (n-lateral) filtering by splatting into (n−1)-lateral filtering until it becomes a constant-time filter. Then, the results of constant-time filtering for the decomposed components are merged into the result of multilateral filtering. The contributions of this paper are summarized as follows:
1. Proposing a multilateral extension to filters that use the filtering output as a guidance image, such as rolling guidance filters [52] (Section 6.2).

Related Work
Due to physical constraints, a single image sensor cannot simultaneously capture rich information such as resolution, wavelength range, focus, dynamic range, and scene features. Image fusion is one way to solve this problem. Research on image fusion is active, with the number of papers increasing each year, as well as many survey papers [53–62]. Image fusion involves smoothing, denoising, enhancement, sharpening, super-resolution, and blending for multiple signals to obtain the desired signal. Image fusion is mainly divided into digital photography image fusion and multimodal image fusion.
Digital photography image fusion combines images taken by the same sensor with different sensor settings and includes multi-focus image fusion, multi-exposure image fusion, multi-temporal image fusion, and multi-view image fusion. In multi-focus image fusion, an all-in-focus image is synthesized from images taken at different focus settings, and in multi-exposure image fusion, a wide dynamic range image is synthesized from images taken at different dynamic ranges. Multi-exposure image fusion also includes the use of different external flash environments. Multi-temporal image fusion synthesizes signals that vary along a time axis, while multi-view image fusion synthesizes signals from camera motion or multiple cameras capturing a scene.
Multimodal image fusion combines different characteristics of multiple sensors into one, including RGB-IR fusion, multi-hyperspectral-panchromatic image fusion, RGB-depth/LiDAR fusion, and medical image fusion (CT, PET, MRI, SPECT, X-ray), etc. In RGB-IR fusion, visible images are combined with IR images, taking advantage of the high contrast of IR and the good texture characteristics of RGB in the visible region. It also combines images using the different wavelength bands that can be captured with external flashes. In multi-hyperspectral-panchromatic image fusion, sensors that acquire images in different wavelength bands and resolutions are combined, and each sensor often has a different resolution and noise sensitivity. The objective is to improve the resolution and noise sensitivity of each sensor. RGB-depth/LiDAR fusion corrects depth sensor output using RGB images, including upsampling of depth information, interpolation of missing regions, and noise reduction with contour correction. Medical image fusion integrates the output of various medical sensors in the same dimension to assist in diagnosis.
Among these image fusion methods, those that improve the acquired signal are called sharpening fusion, which aims at signal denoising, sharpening, contrast improvement, and resolution improvement. In image fusion, various tools are used, such as weighted smoothing filtering, morphology filtering, principal component analysis (PCA), the Laplacian pyramid, the discrete cosine transformation (DCT), the discrete Fourier transformation (DFT), the discrete wavelet transform (DWT), etc. This paper is an extension of the weighted smoothing method. In particular, the proposed method extends existing smoothing/weighted smoothing methods to guided smoothing and has a wide range of applications.

Preliminaries
In this section, we review the previous work of constant-time bilateral filtering proposed by Yang et al. [42,43]. Bilateral filtering [28] is a representative edge-preserving smoothing filter defined in a finite impulse response (FIR) manner. This filter achieves edge-preserving effects by filtering in the range and spatial domains; thus, its kernel weights are derived from a product of spatial and range weights based on a Gaussian distribution. Let the input and output images be denoted as I, O^B : S → R, where S ⊂ Z^D is the spatial domain, R = [0, N−1]^d ⊂ R^d is the range domain, and d is the color range dimension (generally, D = 2, N = 256, and d = 3). Bilateral filtering is formulated as follows:

$$O^B_p = \frac{\sum_{q \in N_p} f_S(p, q)\, f_R(I_p, I_q)\, I_q}{\sum_{q \in N_p} f_S(p, q)\, f_R(I_p, I_q)}, \tag{1}$$

where p, q ∈ S represent a target pixel and a neighboring pixel of p, respectively. I_p, I_q ∈ R are the pixel values at p and q. N_p ⊂ S is the set of neighboring pixels of p. f_S : S × S → R and f_R : R × R → R are weight functions based on the Gaussian distribution, whose smoothing parameters are σ_S and σ_R, respectively. Here, we can formulate joint bilateral filtering [16,17] by replacing I in (1) with an arbitrary additional guidance image J : S → R.
Naïve bilateral filtering is an O(r^2)-per-pixel algorithm, where r is the filtering kernel radius; thus, the computational cost increases quadratically with the kernel size. Several constant-time-per-pixel algorithms for bilateral filtering have been proposed to solve this problem. In particular, the algorithm proposed by Yang et al. [42,43] is the basis of the proposed method.
Yang et al. proposed a constant-time algorithm by extending the bilateral grid [40,41]. The algorithm decomposes bilateral filtering into a set of spatial filters that can be computed in constant time (e.g., the box filter using the integral image [63,64] and the recursive Gaussian filter [65–68]). The decomposition is conducted by computing principle bilateral filtered image components (PBFICs) [43] from the input or guidance image. Since arbitrary range filtering weights can generate PBFICs, the algorithm can compute the bilateral filtering response for an arbitrary range kernel.
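To make the decomposition concrete, the following is a minimal sketch of PBFIC-style constant-time bilateral filtering for a grayscale image, assuming a separable box filter as the constant-time spatial filter; the function and parameter names (boxFilter, pbficBilateral, T) are illustrative and not taken from the authors' implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// A separable box filter stands in for any constant-time spatial filter f_S.
// Images are grayscale, row-major, with values in [0, 255].
void boxFilter(std::vector<float>& img, int w, int h, int r) {
    std::vector<float> tmp(img.size());
    for (int y = 0; y < h; ++y)                       // horizontal pass
        for (int x = 0; x < w; ++x) {
            float s = 0.0f;
            for (int k = -r; k <= r; ++k)
                s += img[y * w + std::clamp(x + k, 0, w - 1)];
            tmp[y * w + x] = s / (2 * r + 1);
        }
    for (int y = 0; y < h; ++y)                       // vertical pass
        for (int x = 0; x < w; ++x) {
            float s = 0.0f;
            for (int k = -r; k <= r; ++k)
                s += tmp[std::clamp(y + k, 0, h - 1) * w + x];
            img[y * w + x] = s / (2 * r + 1);
        }
}

// PBFIC-style constant-time bilateral filter [42,43]: splat the input against
// T quantized range levels, spatially filter each pair of coefficient images,
// and interpolate between the resulting PBFICs. Assumes T >= 2.
std::vector<float> pbficBilateral(const std::vector<float>& I, int w, int h,
                                  int r, float sigmaR, int T) {
    std::vector<std::vector<float>> pbfic(T);
    for (int k = 0; k < T; ++k) {
        const float c = 255.0f * k / (T - 1);         // quantized range level
        std::vector<float> num(I.size()), den(I.size());
        for (size_t i = 0; i < I.size(); ++i) {
            const float wR = std::exp(-(I[i] - c) * (I[i] - c)
                                      / (2.0f * sigmaR * sigmaR));
            num[i] = wR * I[i];                       // splatting: weight * value
            den[i] = wR;                              // splatting: weight
        }
        boxFilter(num, w, h, r);                      // constant-time spatial pass
        boxFilter(den, w, h, r);
        pbfic[k].resize(I.size());
        for (size_t i = 0; i < I.size(); ++i) pbfic[k][i] = num[i] / den[i];
    }
    std::vector<float> O(I.size());                   // blend neighboring PBFICs
    for (size_t i = 0; i < I.size(); ++i) {
        const float t = I[i] * (T - 1) / 255.0f;
        const int k = std::min(static_cast<int>(t), T - 2);
        const float a = t - k;
        O[i] = (1.0f - a) * pbfic[k][i] + a * pbfic[k + 1][i];
    }
    return O;
}
```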
Yang's algorithm [43] is further extended to multichannel images in [42]. The extended algorithm processes multichannel images by preparing multichannel PBFICs from combinations of pixel values in each channel. However, this extension requires uniform processing for all channels. In other words, we cannot filter each channel differently. This indicates that the algorithm cannot be extended to multichannel or multiple guidance images with different characteristics in each channel.
Our algorithm is inspired by Yang's algorithm [42,43], which represents bilateral filtering by a set of spatial filters. In contrast, our algorithm decomposes a filter for multichannel images into arbitrary constant-time filters.

Relationship between Multilateral Filtering and Higher-Dimensional Filtering
In this section, we compare the filtering properties of multilateral filtering (MF) and high-dimensional filtering (HDF). The main difference between them is the logic used to compute the filtering weight. The weight of multilateral filtering f^M ∈ R is computed by multiplicative logic from the spatial weight and the range weights of multiple guidance images:

$$f^M_{p,q} = f_S(p, q) \prod_{i=1}^{m} f^i_R(J^i_p, J^i_q), \tag{2}$$

where f^i_R : R_i × R_i → R is the filtering weight for the i-th guidance image J^i : S → R_i, R_i is the range domain of J^i, and m is the number of guidance images. An early work on MF was proposed by Choudhury and Tumblin [32]. Each range weight f^i_R for a guidance image is individually defined to represent the characteristics of that image.
HDF's weight f^H ∈ R is computed by additive logic:

$$f^H_{p,q} = \rho\left( \sum_{i=1}^{|V_p|} \ell_\gamma(V_p[i] - V_q[i]) \right), \tag{3}$$

where ρ ∈ R denotes an arbitrary weight function at the pixel p; ℓ_γ ∈ R denotes an arbitrary norm function; V_p denotes higher-dimensional information consisting of spatial and range information, e.g., V_p = (x_p, y_p, r_p, g_p, b_p) for an RGB image; and |V_p| ∈ Z is the size of V_p. The work of Gastal and Oliveira [30] is a successful extension of HDF with multiple guidance information. They exploited additional guidance information to increase the higher-dimensional information V.
The two logics differ in terms of the severity of the restriction in computing the kernel weight; the multiplicative logic's restriction is more severe than the additive logic's. The difference affects the edge-preserving performance. Figure 2 shows examples of HDF and MF weights. HDF assigns low weights as a whole, even if the guidance pixel is hardly relevant to the target pixel. In contrast, MF assigns high weights only to guidance pixels whose values are similar to the target pixel. In this way, MF has a strong edge-preservation effect; hence, it is preferred when sharp edge preservation is important.

Proposed Method: Decomposed Multilateral Filtering
The proposed DMF first decomposes MF into constant-time filtering. This allows us to reduce the computational complexity from O(r^2) to O(1) per pixel. This section defines MF and proves its decomposability. Algorithm 1 summarizes the flow of DMF. Next, we discuss extensions of the algorithm.

Definition of Multilateral Filtering
MF assumes that the filtering weight is derived from the multiplicative logic discussed in Section 4. Furthermore, MF is equivalent to n-lateral filtering when n−1 guidance images are used for filtering. When n = 1 or 2, n-lateral filtering means spatial filtering or bilateral filtering, respectively. Therefore, we assume n ≥ 3 in this section and compute the n-lateral filtering output O^n : S → R as follows:

$$O^n_p = \frac{\sum_{q \in N_p} f^n_{p,q}\, I_q}{\sum_{q \in N_p} f^n_{p,q}}, \tag{4}$$

$$f^n_{p,q} = f_S(p, q) \prod_{i=1}^{n-1} f^i_R(J^i_p, J^i_q), \tag{5}$$

where f^i_R denotes the range filtering weight for the i-th guidance image J^i, and f^n ∈ R is the n-lateral filtering weight. Equations (4) and (5) are the basic formulation of MF. Here, we basically define the first filter f^1 as a spatial filter:

$$f^1_{p,q} = f_S(p, q), \tag{6}$$

which can be an arbitrary linear time-invariant (LTI) filter (e.g., box, circular, Gaussian, and Gabor filters) or a linear time-variant (LTV) filter (e.g., the spatially adaptive Gaussian filter [69]). LTI filters can be performed in O(1) by sliding-DCT filtering [68], but adaptive filters are difficult to convert into O(1) filters.
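For reference, here is a minimal sketch of the naïve n-lateral filter of Equations (4) and (5) for n = 3 (trilateral filtering) on grayscale images; the O(r^2)-per-pixel cost of this baseline is what DMF removes. The names and Gaussian weight choices are illustrative.

```cpp
#include <cmath>
#include <vector>

// Naive trilateral filtering (Equations (4) and (5) with n = 3): one filtering
// image I and two guidance images J1, J2, all grayscale and row-major.
std::vector<float> naiveTrilateral(const std::vector<float>& I,
                                   const std::vector<float>& J1,
                                   const std::vector<float>& J2,
                                   int w, int h, int r,
                                   float sS, float sR1, float sR2) {
    auto gauss = [](float d, float s) { return std::exp(-d * d / (2.0f * s * s)); };
    std::vector<float> O(I.size());
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            float num = 0.0f, den = 0.0f;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx) {
                    const int xx = x + dx, yy = y + dy;
                    if (xx < 0 || xx >= w || yy < 0 || yy >= h) continue;
                    const int p = y * w + x, q = yy * w + xx;
                    // multiplicative logic: spatial weight times both range weights
                    const float wgt = gauss(std::sqrt(float(dx * dx + dy * dy)), sS)
                                    * gauss(J1[p] - J1[q], sR1)
                                    * gauss(J2[p] - J2[q], sR2);
                    num += wgt * I[q];
                    den += wgt;
                }
            O[y * w + x] = num / den;  // Equation (4): normalized weighted sum
        }
    return O;
}
```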

Recursive Representation for Decomposed Multilateral Filtering
We introduce DMF and prove the decomposability of MF in this subsection. In (5), the n-lateral filtering weight f^n can be replaced with the product of its one-dimension-lower weight f^{n−1} and the range weight f^{n−1}_R:

$$f^n_{p,q} = f^{n-1}_{p,q}\, f^{n-1}_R(J^{n-1}_p, J^{n-1}_q). \tag{7}$$

We can re-formulate MF from (4) using (7) as follows:

$$O^n_p = \frac{\sum_{q \in N_p} f^{n-1}_{p,q}\, f^{n-1}_R(J^{n-1}_p, J^{n-1}_q)\, I_q}{\sum_{q \in N_p} f^{n-1}_{p,q}\, f^{n-1}_R(J^{n-1}_p, J^{n-1}_q)}. \tag{8}$$

This form shows that we can express n-lateral filtering using the (n−1)-lateral filtering weight. Furthermore, we transform (8) under the additional assumption that the pixel values of the guidance images are discrete. Let c ∈ R_{n−1} = [0, N_{n−1} − 1] be a constant value, where N_{n−1} is the number of tones in the range of the (n−1)-th guidance image. When the pixel value J^{n−1}_p in (8) is replaced by the constant value c, it is rewritten as follows:

$$C^{n-1}_{c,p} = \frac{\sum_{q \in N_p} f^{n-1}_{p,q}\, f^{n-1}_R(\|c - J^{n-1}_q\|_1)\, I_q}{\sum_{q \in N_p} f^{n-1}_{p,q}\, f^{n-1}_R(\|c - J^{n-1}_q\|_1)}, \tag{9}$$

where ∥·∥_1 is the ℓ_1 norm operator. We call C^{n−1}_c : S → R a component image of n-lateral filtering, and its pixel value at p is denoted by C^{n−1}_{c,p}. The n-lateral filtering output is recovered by selecting the component at c = J^{n−1}_p:

$$O^n_p = C^{n-1}_{c,p} \quad \text{s.t.} \quad c = J^{n-1}_p. \tag{10}$$

The range-weight coefficients in (9) can be cached as images in constant time; hence, we express these coefficients as the following images:

$$K^{n-1}_{c,q} = f^{n-1}_R(\|c - J^{n-1}_q\|_1)\, I_q, \tag{11}$$

$$W^{n-1}_{c,q} = f^{n-1}_R(\|c - J^{n-1}_q\|_1), \tag{12}$$

where K^{n−1}_c, W^{n−1}_c : S → R are the elements of the numerator and denominator in (9), respectively. We call the processes of Equations (11) and (12) splatting, following the papers [47,48], and we call the images coefficient images. For simplification, we rewrite (9) using these coefficient images and the convolution operator ∗:

$$C^{n-1}_c = \frac{f^{n-1} * K^{n-1}_c}{f^{n-1} * W^{n-1}_c}, \tag{13}$$

where the pixel operator p can be dropped. Figure 3 shows the splatting procedure in DMF. The numerator and denominator in (13) represent (n−1)-lateral filtering. This indicates that n-lateral filtering has been decomposed into (n−1)-lateral filtering. Therefore, MF can be decomposed recursively:

$$O^n = g^{n-1}(g^{n-2}(\cdots g^{1}(I) \cdots)), \tag{14}$$

where g^x(·) denotes the decomposing operator described in Equations (11)–(13). Equation (14) summarizes the DMF formulation. Since f^1 (e.g., Gaussian filtering) can be computed in constant time per pixel by recursive algorithms, and the decomposing operation is independent of the kernel size, DMF can also be computed in constant time.
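Below is a minimal C++ sketch of this recursion. It evaluates the unnormalized operator F_n(V)_p = Σ_q f^n_{p,q} V_q, so the numerator and denominator of (4) share one code path, and the final output is their ratio as in (13); it reuses the boxFilter() base case from the earlier sketch, Gaussian range weights, and T quantized levels per guide (Section 5.3). The names filterRec and dmf are illustrative, not the authors' implementation.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

using Image = std::vector<float>;  // grayscale, row-major, values in [0, 255]

void boxFilter(Image& img, int w, int h, int r);  // spatial filter f^1 (above)

// Unnormalized recursion: splat V against the last guidance image into T
// coefficient images (Equations (11) and (12)), filter each with the
// (n-1)-lateral operator, and merge by interpolation at c = J_p. The cost is
// T^(n-1) spatial filters, matching the exponential channel cost in the text.
Image filterRec(const Image& V, const std::vector<Image>& guides,
                const std::vector<float>& sigmaR, int w, int h, int r, int T) {
    if (guides.empty()) {                    // base case: constant-time f^1
        Image out = V;
        boxFilter(out, w, h, r);
        return out;
    }
    const Image& J = guides.back();          // decompose against the last guide
    const float s = sigmaR.back();
    const std::vector<Image> rest(guides.begin(), guides.end() - 1);
    const std::vector<float> sr(sigmaR.begin(), sigmaR.end() - 1);

    std::vector<Image> comp(T);              // filtered component images
    for (int k = 0; k < T; ++k) {
        const float c = 255.0f * k / (T - 1);
        Image splat(V.size());               // coefficient image for level c
        for (size_t i = 0; i < V.size(); ++i)
            splat[i] = std::exp(-(J[i] - c) * (J[i] - c) / (2.0f * s * s)) * V[i];
        comp[k] = filterRec(splat, rest, sr, w, h, r, T);  // (n-1)-lateral
    }
    Image out(V.size());                     // merge: interpolate at c = J_p
    for (size_t i = 0; i < V.size(); ++i) {
        const float t = J[i] * (T - 1) / 255.0f;
        const int k = std::min(static_cast<int>(t), T - 2);
        const float a = t - k;
        out[i] = (1.0f - a) * comp[k][i] + a * comp[k + 1][i];
    }
    return out;
}

// Full DMF output (Equations (4) and (13)): the ratio of the filtered
// numerator F_n(I) and denominator F_n(1).
Image dmf(const Image& I, const std::vector<Image>& guides,
          const std::vector<float>& sigmaR, int w, int h, int r, int T) {
    Image num = filterRec(I, guides, sigmaR, w, h, r, T);
    Image den = filterRec(Image(I.size(), 1.0f), guides, sigmaR, w, h, r, T);
    for (size_t i = 0; i < I.size(); ++i) num[i] /= den[i];
    return num;
}
```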

Tonal Range Subsampling
The exact filtering result can be obtained by computing coefficient images for all values c ∈ R_{n−1} appearing in the guidance image J^{n−1}. Here, we increase efficiency by quantizing the guidance tonal range. Let L_{n−1} be a quantized set of R_{n−1}, and let T_{n−1} = |L_{n−1}| be the number of tones in the quantized tonal range of the (n−1)-th guidance image, with quantized levels

$$c_k = k\, \frac{N_{n-1} - 1}{T_{n-1} - 1}, \quad k \in \{0, 1, \ldots, T_{n-1} - 1\},$$

so that each level lies in the quantized range domain. We can then obtain the final output of DMF by linear interpolation between the current and next component images (i.e., C^{n−1}_{c_k} and C^{n−1}_{c_{k+1}}, where k is chosen such that c_k ≤ J^{n−1}_p < c_{k+1}).
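A small sketch of this interpolation step, assuming N tones in the guidance range and T quantized levels (T ≥ 2); interpolateComponents is a hypothetical helper name.

```cpp
#include <algorithm>
#include <vector>

// Blend the two component images whose quantized levels c_k, c_{k+1} bracket
// the guidance value v at pixel index p (linear interpolation in level space).
inline float interpolateComponents(float v, int N, int T,
                                   const std::vector<std::vector<float>>& C,
                                   size_t p) {
    const float t = v * (T - 1) / float(N - 1);          // position in level space
    const int k = std::min(static_cast<int>(t), T - 2);  // lower bracketing level
    const float a = t - k;                               // fractional distance
    return (1.0f - a) * C[k][p] + a * C[k + 1][p];
}
```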

Spatial Domain Subsampling
Considering the sparsity of the coefficient images, we can also apply subsampling in the spatial domain for a further increase in efficiency, as discussed in [41,42]. In the DMF case, we can apply spatial subsampling to the coefficient images at several steps: the first and the second decomposition. If we apply spatial subsampling to DMF in the n-lateral filtering splatting, the process is computed as:

$$C^{n-1\,\downarrow\uparrow}_c = \left( \frac{f^{n-1} * K^{n-1\,\downarrow}_c}{f^{n-1} * W^{n-1\,\downarrow}_c} \right)^{\uparrow},$$

where X↓ and X↓↑ (X ∈ {K, W, C}) are the downsampled and the downsampled-then-upsampled images, respectively. We use averaging of nearest-neighbor pixels for downsampling and linear interpolation for upsampling. Our method can apply different spatial subsampling ratios to arbitrary guidance channels based on the sparsity of each channel (e.g., the YUV image components of the JPEG and MPEG formats, or RGB-D images). This flexibility is an advantage over Yang's algorithm [42].
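A minimal sketch of the resampling pair, reusing the Image alias from the sketches above: block averaging for downsampling and bilinear interpolation for upsampling, matching the choices stated in the text. For brevity it assumes the image sides are multiples of the factor s.

```cpp
#include <algorithm>
#include <vector>

using Image = std::vector<float>;

// Downsample by averaging non-overlapping s x s blocks (w, h: source size).
Image downsample(const Image& src, int w, int h, int s) {
    const int W = w / s, H = h / s;
    Image dst(W * H);
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            float sum = 0.0f;
            for (int dy = 0; dy < s; ++dy)
                for (int dx = 0; dx < s; ++dx)
                    sum += src[(y * s + dy) * w + (x * s + dx)];
            dst[y * W + x] = sum / (s * s);
        }
    return dst;
}

// Upsample by bilinear interpolation (W, H: source size; factor s).
Image upsample(const Image& src, int W, int H, int s) {
    const int w = W * s, h = H * s;
    Image dst(w * h);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            const float fx = std::min(float(x) / s, float(W - 1));
            const float fy = std::min(float(y) / s, float(H - 1));
            const int x0 = int(fx), y0 = int(fy);
            const int x1 = std::min(x0 + 1, W - 1), y1 = std::min(y0 + 1, H - 1);
            const float ax = fx - x0, ay = fy - y0;   // bilinear blend weights
            dst[y * w + x] = (1 - ax) * (1 - ay) * src[y0 * W + x0]
                           + ax * (1 - ay) * src[y0 * W + x1]
                           + (1 - ax) * ay * src[y1 * W + x0]
                           + ax * ay * src[y1 * W + x1];
        }
    return dst;
}
```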

Beyond Gauss Transform
DMF can produce any multilateral filtering response and is not limited to the Gauss transform [70], which is a combination of Gaussian filters. DMF can use arbitrary range and spatial filters. Furthermore, we can select the filtering response by changing the final filtering step.
In principle, DMF decomposes the filter down to the spatial filtering in (6). However, DMF does not always require decomposition all the way to spatial filtering. Specifically, we can apply any joint edge-preserving filter in the final filtering step while decreasing the number of decompositions. Some edge-preserving filters can handle multichannel signals in their designed weight functions. For example, high-dimensional Gaussian filtering handles multichannel signals by a Gaussian distribution with the Euclidean distance, whereas domain transform filtering uses the geodesic distance. Guided image filtering handles them by the local linear model with the ℓ_2 norm between signals.
Therefore, when the final filter is an edge-preserving filter, the DMF decomposition can be reduced by the number of dimensions handled by that edge-preserving filter. Let the handled signal set be G = {J^s, ..., J^1}, where s is the number of handled channels (i.e., s = 3 in the RGB image case). Using edge-preserving filtering, (13) in the final step is replaced by:

$$C^{n-1}_c = \frac{H_G * K^{n-1}_c}{H_G * W^{n-1}_c}, \tag{20}$$

where H_G ∗ represents any joint edge-preserving filtering with the guidance signal set G, applied at the final decomposition step.
This representation allows us to extend various edge-preserving filtering methods to handle multiple guidance images. This fact is helpful for various applications, since the required filtering properties, e.g., local linearity [20] and geodesic distance [33], differ by application. Note that this has the potential to be faster because dimensions are treated jointly, but the filter may not work well if the characteristics of the combined set are not identical.
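A sketch of how the recursion's base case can be swapped for such a joint filter H_G, under the same assumptions as the earlier sketches; JointFilter and filterRecWithFinal are hypothetical names, and H itself (e.g., a guided image filter) is supplied by the caller. Because the numerator and denominator paths of a dmf()-style ratio pass through the same H, any per-pixel normalization inside H cancels in the final ratio.

```cpp
#include <algorithm>
#include <cmath>
#include <functional>
#include <vector>

using Image = std::vector<float>;

// Any joint edge-preserving filter that consumes a value image and a set of
// guidance images at once (e.g., GIF or DTF), playing the role of H_G in (20).
using JointFilter =
    std::function<Image(const Image& value, const std::vector<Image>& guides)>;

Image filterRecWithFinal(const Image& V, const std::vector<Image>& guides,
                         const std::vector<float>& sigmaR, int w, int h,
                         int r, int T, size_t s, const JointFilter& H) {
    if (guides.size() <= s)                 // stop early: H handles the last
        return H(V, guides);                // s guides jointly (Equation (20))
    const Image& J = guides.back();         // otherwise splat and recurse
    const float sig = sigmaR.back();
    const std::vector<Image> rest(guides.begin(), guides.end() - 1);
    const std::vector<float> sr(sigmaR.begin(), sigmaR.end() - 1);
    std::vector<Image> comp(T);
    for (int k = 0; k < T; ++k) {
        const float c = 255.0f * k / (T - 1);
        Image splat(V.size());
        for (size_t i = 0; i < V.size(); ++i)
            splat[i] = std::exp(-(J[i] - c) * (J[i] - c) / (2.0f * sig * sig)) * V[i];
        comp[k] = filterRecWithFinal(splat, rest, sr, w, h, r, T, s, H);
    }
    Image out(V.size());                    // merge by interpolation, as before
    for (size_t i = 0; i < V.size(); ++i) {
        const float t = J[i] * (T - 1) / 255.0f;
        const int k = std::min(static_cast<int>(t), T - 2);
        const float a = t - k;
        out[i] = (1.0f - a) * comp[k][i] + a * comp[k + 1][i];
    }
    return out;
}
```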

Multilateral Rolling Guidance Filtering
MF is also helpful for self-generating multiple guidance information from single guidance information, as in rolling guidance filtering [52]. Rolling guidance filtering is iteratively processed using the filtered image as the guidance image. In this scheme, the filtering image is fixed as the input image while the guidance image varies. This iterative representation can be applied to multilateral filtering with some modifications. We call it multilateral rolling guidance filtering (MRGF) and show the actual processing in Figure 4. The main difference is that the filter output serves as an additional guidance image.
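A minimal sketch of the MRGF loop built on the dmf() sketch above; the iteration count, the σ values, and the mrgf name are illustrative assumptions.

```cpp
// MRGF: the filtering target I stays fixed while the previous output is fed
// back as an additional guidance image; on the first pass, the rolling guide
// equals I itself (cf. Figure 4).
Image mrgf(const Image& I, const Image& staticGuide,
           int w, int h, int r, int T, int iterations) {
    Image rolling = I;
    for (int t = 0; t < iterations; ++t)
        rolling = dmf(I, {staticGuide, rolling}, {32.0f, 32.0f}, w, h, r, T);
    return rolling;
}
```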
MRGF is especially practical when edge information is essential, such as in image segmentation and feathering. Since we can reflect the smoothed or refined results in the filtering target image, it tends to refine the desirable features of the target image. Notably, initial estimated maps of scene properties often contain noise and errors; MRGF performs well in refining such maps. We verify the performance of MRGF in the experimental part below.

Experimental Results
We evaluated the proposed DMF in terms of accuracy and efficiency. The implementations of DMF and the competing methods are written in C++, and the code is parallelized by OpenMP and vectorized with AVX2. We used an Intel Core i7 7700K (four cores, eight threads) and the Visual Studio 2022 compiler for the experiments.

Accuracy Evaluation
First, we evaluated how closely the DMF result corresponds to the naïve implementation of the MF result. In our experiments, we applied our algorithm to trilateral filtering [29]. Note that the Gauss transform is applied to the trilateral filtering weights for the spatial and range weights, where the standard deviations are σ_s, σ_r1, and σ_r2. Here, σ_r1 and σ_r2 are the parameters for the tonal ranges of the guidance and filtering images, respectively. We apply recursive Gaussian filtering with the sliding DCT [68,84] for spatial filtering. We evaluate the accuracy of our algorithm on flash/no-flash denoising [16,17]. The range kernels f^1_R and f^2_R for MF are computed from the flash and no-flash images, respectively. We use the peak signal-to-noise ratio (PSNR) [85,86] as the objective measure of the approximation accuracy between the naïve results and the approximated results. The evaluation formula is as follows:

$$\mathrm{PSNR}(A, G) = 10 \log_{10} \frac{255^2}{\frac{1}{S} \|A - G\|_2^2},$$

where ∥·∥_2 is the ℓ_2 norm operator, A is the approximated signal produced by DMF, and G is the ground-truth signal produced by naïve MF. S is the number of elements in the signals A and G, S = |A| = |G|, where |·| returns the number of vector elements.
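A direct transcription of this formula, assuming 8-bit signals with peak value 255; psnr is an illustrative helper name.

```cpp
#include <cmath>
#include <vector>

// PSNR between the DMF approximation A and the naive MF result G (8-bit peak).
double psnr(const std::vector<float>& A, const std::vector<float>& G) {
    double mse = 0.0;
    for (size_t i = 0; i < A.size(); ++i) {
        const double d = A[i] - G[i];      // per-element error
        mse += d * d;                      // accumulates ||A - G||_2^2
    }
    mse /= static_cast<double>(A.size());  // divide by S, the number of elements
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```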
Figure 5 shows the filtering accuracy with respect to the number of coefficient images. Note that T_2nd and T_3rd are the numbers of tonal-quantized coefficient images for the flash and no-flash images, respectively. The filtering accuracy is high overall in each case; eight coefficient images are enough. This trend also holds with spatial subsampling, and spatial subsampling is practical because the PSNR accuracy exceeds 45 dB. The PSNR degradation is more significant when downsampling at the first decomposition. As shown in the next section, downsampling at the first stage has a greater speedup effect; thus, it is up to the application to decide which one to choose. Figure 6 shows the filtering accuracy with respect to the smoothing parameters. Although the accuracy of DMF varies depending on the parameters, it remains high. We can see that it is not very sensitive to changes in the spatial parameter σ_s. The two guides behave similarly for the range parameters σ_r1 and σ_r2. The smaller the range parameter, the lower the approximation accuracy tends to be, and the effect is more pronounced when the number of decompositions is small. Initial subsampling is also susceptible to this effect. However, the proposed method has an approximation accuracy generally better than 45 dB for all parameters, which is sufficient because it becomes difficult for a person to distinguish between two images at around 40 dB [87].

Efficiency Evaluation
We compare the computational time in two cases for the efficiency evaluation. One is a comparison between naïve MF and DMF combined with several edge-preserving filters. The other is a comparison with/without subsampling. In this experiment, we apply real-time bilateral filtering (RTBF) [42,43], guided image filtering (GIF) [20], and domain transform filtering (DTF) [33] to (20). We call DMF with these filters DMF-Gauss, DMF-GIF, and DMF-DTF, respectively. Since RTBF can be interpreted as DMF with one guide image for Gaussian filtering, its extension here is equivalent to DMF with two guide images for Gaussian filtering; therefore, we refer to it as DMF-Gauss. For this experiment, the flash and no-flash images were converted to grayscale, and the RGB no-flash image was filtered. In DMF-Gauss, the two grayscale images were used as guides; in DMF-DTF and DMF-GIF, the no-flash image was used as the guide for DMF, and the flash image was used as the guide for DTF and GIF. Note that joint filtering is available for DTF and GIF. Cache-efficient filtering was computed using a one-pass version of Gaussian filtering for DMF-Gauss and box filtering for GIF [64]. Figure 7 shows the processing time results. The processing time of naïve MF-Gauss increases rapidly as the filtering kernel size increases, whereas DMF is computed in constant time, as seen in Figure 7a. DMF is especially efficient when we use GIF or DTF for the filtering step described in Section 6.1. Furthermore, DMF becomes even more efficient with spatial-domain subsampling, as shown in Figure 7b. Since DTF and GIF are not decomposable by the proposed method, only a one-step decomposition is possible; therefore, the computation time for the second decomposition with subsampling is not reported.

Denoising Performance Evaluation
Here, we evaluate the denoising performance of the proposed method; note that this is not the approximation accuracy evaluated in Section 7.1. In our experiments, we used RGB-IR images and simulated RGB-IR fusion by adding noise to the RGB images. Performance is evaluated in terms of PSNR between the noiseless RGB image and the denoised image; the IR image is not evaluated by PSNR because it is not noiseless and is not the final visible image. The comparison methods are redundant DCT denoising (DCT) [74], domain transform filtering (DTF) [33], guided image filtering (GIF) [20], cross-field joint image restoration (CFJIR) [88], and high-dimensional kernel filtering (HDKF) [37]. DCT, DTF, and GIF are extended by the proposed method to handle an additional guidance IR image, named DMF-DCT, DMF-DTF, and DMF-GIF, respectively. These methods were chosen for their high-speed performance. CFJIR and HDKF already use the characteristics of the guide image; thus, the proposed extension is ineffective for them. For the evaluation images, we used the RGB-IR dataset of ten images [37] (https://norishigefukushima.github.io/TilingPCA4CHDGF/ (accessed on 16 January 2024)).
Tables 1-3 show the PSNR results for each method at different noise levels. The classical DCT method achieves the best performance on average at all noise levels thanks to the DMF extension. CFJIR and HDKF are newer methods dedicated to RGB-IR denoising, and performance comparable to theirs is achieved by extending DCT with the proposed method. All DMF extensions also show a steady improvement in performance.

Channel Performance Evaluation
Here, we evaluated the denoising effect of the number of channels for flash and no-flash images. In Section 7.1, the guide images were grayscale (two channels in total), but here we use two RGB images (six channels). In addition, the number of channels is controlled by PCA dimensionality compression of the guide images [37,89]. Note that the denoising performance is different from the approximation performance. We used a flash/no-flash image dataset [37], which contains ten images. The images are filtered by multilateral filtering. In addition to PSNR, we used the structural similarity (SSIM) [90], which is a more robust quality metric. Noise was added only to the no-flash images. Tables 4 and 5 show the results for each metric. On average, the optimal value is obtained with four channels in every metric. SSIM, which is said to correlate well with human subjective evaluation, shows that the value is high enough even for two channels.
Table 6 shows the computational time, which increases exponentially with the number of channels. This indicates that we suffer from the curse of dimensionality. Therefore, it is better to keep the number of channels as small as possible.

Memory Usage Analysis
The memory requirement is linear in the number of pixels N_p and grows exponentially with the number of channels N_c and the number of guidance images N_j through the number of tones N_t. Consequently, the amount of memory required is O(N_p N_t^{N_c N_j}), according to Algorithm 1. For example, with N_t = 8 quantization levels and two single-channel guidance images (N_c = 1, N_j = 2), 8^2 = 64 coefficient images must be stored.
The vast memory requirement is one of the limitations, and this limitation is inherited from the previous work [42,43]. However, tonal range and spatial domain subsampling can moderate the memory requirement. We can also make the memory requirement independent of the number of channels by processing DMF as discussed in [42,43]. This implementation, however, somewhat loses parallelizability and computational efficiency.

Multilateral Filtering for Computational Photography
We verify the effectiveness of MF and DMF through several sensor fusion applications in computational photography.

Flash/No-Flash Denoising
Flash/no-flash denoising [16,17] is the representative application of edge-preserving filtering with multiple guidance images. Flashing sometimes causes false edges (e.g., the appearance/disappearance of shadow edges), as shown in Figure 8a. Joint bilateral filtering can remove noise in the no-flash image, but it simultaneously preserves the false edges of the flash image (Figure 8c). The conventional method requires multiple steps [17] to solve this problem, while MF requires only one step. As shown in Figure 8d, MF can remove noise while preventing false edges from being transferred. Note that we used the value channel in the HSV color space of the no-flash image and the color flash image as the guidance images. In addition, our algorithm can be computed efficiently by applying efficient edge-preserving filtering, such as domain transform filtering.

Depth Map Refining
Trilateral filtering is effective for refining depth maps degraded by lossy compression [29]. In the case of a single guidance image, the object boundaries in the depth maps are blurred even if the artifacts are removed; hence, there is a trade-off between denoising performance and the edge-preservation effect (Figure 9c). MF can alleviate this problem by considering the edges in both the depth map and the guidance image (Figure 9d). Although this depth-refinement experiment targets the removal of coding artifacts, MF can also be applied to noise removal for a depth sensor.

Feathering
We demonstrate the property of our algorithm beyond the Gauss transform. Guided feathering [20] refines a binary mask for alpha matting near the boundary of an object. For guided feathering, guided image filtering has excellent performance in terms of efficiency and accuracy [20]. The result of the naïve guided image filter is shown in Figure 10c. The feather is computed in detail; however, several noise artifacts simultaneously appear near the object boundary regions. This is because the local linear model of the guided image filter is violated. By contrast, the MRGF results hardly include such noise, as shown in Figure 10d, while the detailed feather is still computed. This result indicates that MRGF prevents the violation.

Haze Removing
Furthermore, our algorithm with guided image filtering is also effective for haze removal [21]. A large kernel size is required for haze removal; thus, guided image filtering violates the local linear model, as in guided feathering. Consequently, some haze remains in Figure 11b. In contrast, MRGF with guided image filtering can suppress expansion into different objects, as shown in Figure 11c.

Conclusions
This paper presented an efficient algorithm for edge-preserving filtering with multiple guidance images for sensor fusion signals. Our algorithm, named decomposed multilateral filtering (DMF), can accelerate general multilateral filtering with the Gauss transform and extend various edge-preserving filtering methods to exploit multiple guidance images. In addition, we introduced a method that applies multilateral filtering to the output of multilateral filtering, such as rolling guidance filters [52], named multilateral rolling guidance filtering (MRGF). The experimental results showed that our algorithm has high accuracy and high efficiency. Furthermore, the proposed method was verified in various applications: flash/no-flash denoising, depth map refining, feathering, and haze removal.
The limitations of our algorithm are that the computational time depends on the image dimensionality and the number of guidance images. However, this problem can be solved by clustering [37,91]. In addition, automatic adjustment of the downsampling amount is also an open issue. These issues could be resolved by extending Gaussian KD-trees [47] and the permutohedral lattice [48].

Figure 1 .
Figure 1. Algorithm overview. n-lateral filtering denotes multilateral filtering that multiplies a spatial filter and n−1 range filters. Examples of multiple guidance information are flash images, segmentation masks, and depth maps. The key point of the proposed algorithm is that it decomposes multilateral filtering into a set of constant-time filters. For more information on the implementation, our code is available at https://fukushimalab.github.io/dmf/ (accessed on 16 January 2024).

Figure 2 .
Figure 2. Difference of filtering weights between HDF and MF. HDF weights are computed using the method in [30]. The red point represents the target pixel for computing the kernel weights.

Figure 4 .
Figure 4. Multilateral rolling guidance filtering. Note that the constraining information J^2 is the same as the filtering image I in the first filtering iteration.

Figure 5.
PSNR accuracy with respect to the numbers of coefficient images T_2nd and T_3rd. The parameters are σ_r1 = 32, σ_r2 = 32, and σ_s = 8. SS denotes the spatial subsampling rate. We tested four input images.

Table 1 .
PSNR accuracy of denoising (dB). The Gaussian noise parameter is σ = 10. * Filters that already use the characteristics of the guide image. Bold numbers indicate the best results.

Table 2 .
PSNR accuracy of denoising (dB). The Gaussian noise parameter is σ = 20. * Filters that already use the characteristics of the guide image. Bold numbers indicate the best results.

Table 3 .
PSNR accuracy of denoising (dB). The Gaussian noise parameter is σ = 30. * Filters that already use the characteristics of the guide image. Bold numbers indicate the best results.

Table 4 .
PSNR accuracy (dB) of flash/no-flash denoising with PCA; higher values indicate better denoising. The Gaussian noise parameter for no-flash images is σ = 10. Bold numbers indicate the best results.

Table 5 .
SSIM accuracy of flash/no-flash denoising with PCA; higher values indicate better denoising. The Gaussian noise parameter for no-flash images is σ = 10. Bold numbers indicate the best results.

Table 6 .
Processing time of filtering with multi-channel guide images.