Two-Dimensional Orthonormal Tree-Structured Haar Transform for Fast Block Matching

Abstract: The goal of block matching (BM) is to locate small patches of an image that are similar to a given patch, or template. This can be done either in the spatial domain or, more efficiently, in a transform domain. Full search (FS) BM is accurate but computationally expensive. The recently introduced BM method based on the orthogonal Haar transform (OHT) significantly reduces the computational complexity of FS, but it cannot be used in applications where the patch size is not a power of two. In this paper, we generalize OHT-based BM to an arbitrary patch size by introducing a new BM algorithm based on a 2D orthonormal tree-structured Haar transform (OTSHT). The basis images of the OHT are uniquely determined by the full balanced binary tree, whereas various OTSHTs can be constructed from any binary tree, and the computational complexity of BM depends on the specific design of the OTSHT. We compare BM based on OTSHTs to FS and OHT (for restricted patch sizes) within the framework of image denoising, using WNNM as the denoiser. Experimental results on eight grayscale test images corrupted by additive white Gaussian noise at five noise levels demonstrate that WNNM with OTSHT-based BM outperforms the other methods both computationally and qualitatively.


Introduction
Block matching (BM) is a fundamental method for locating small patches in an image that match a given patch, referred to as a template. It has many practical applications, such as object detection [1], object tracking [2], image registration [3], and image analysis [4,5], to name a few. Block matching requires vast computation due to a large search space involving many potential candidates. The full search (FS) algorithm is generally the most accurate BM method: the similarity scores of all candidate windows to the template are calculated in a sliding-window manner in the spatial domain. To speed up the matching procedure, various fast algorithms have been proposed. They can be classified into two main categories: full search equivalent and non full search equivalent algorithms. Full search equivalent algorithms accelerate BM by pruning the many candidate windows that cannot be the best match; these algorithms guarantee the same result as the full search. Conversely, non full search equivalent algorithms accelerate BM by limiting the scope of the search space or by using approximated patterns, so their results may differ from those of the full search. Many full search equivalent algorithms have been proposed in the literature, see e.g., [6,7]. BM methods can also be categorized into spatial- and transform-based ones. Among transform-based methods, decompositions by rectangular orthogonal bases, such as orthogonal Haar and Walsh, are the most studied [8,9]. As demonstrated in [8], BM in the orthogonal Haar transform (OHT) domain is more efficient than BM based on the Walsh-Hadamard transform (WHT) [9], Gray-Code kernels (GCK) [10], or incremental dissimilarity approximations (IDA) [7]. One of the reasons behind this is the use of the integral image, a technique originally proposed by Crow [11] and broadened later by Viola and Jones [12].
Once the integral image is generated, the sum of pixel intensities over any rectangular region of the image can be obtained in three operations (two subtractions and one addition), regardless of the size of the region. Thus, as demonstrated in [8], the integral image is a useful tool for calculating OHT coefficients, and the OHT is especially efficient when the template size is large. To evaluate the speed-up over FS-equivalent methods, a template of size 2^n × 2^n (n ≥ 4) was considered, with the standard deviation of pixel intensities in the template greater than 45. In [13], it was reported that the OHT-based algorithm was faster than other algorithms, including low resolution pruning (LRP) [14], WHT [9], and the fast Fourier transform (FFT).
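The three-operation rectangle sum can be sketched as follows (a minimal NumPy illustration; the function names are ours, not from [8]):

```python
import numpy as np

def integral_image(img):
    """Summed-area table padded with a zero top row and left column,
    so that ii[y, x] equals the sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in three operations
    (two subtractions, one addition), independent of rectangle size."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
```

For example, for a 4 × 4 image, `rect_sum(ii, 1, 1, 3, 3)` returns the sum of the central 2 × 2 block with the same three operations as a full-image sum.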
Despite the above-mentioned benefits of OHT-based BM, it has the following drawback: the block size must be a power of two. This restricts the applicability of OHT-based BM, e.g., in nonlocal image restoration [15][16][17][18][19], where the patch size is important for restoration performance. Nonlocal image restoration methods exploit the high level of self-similarity (fractal similarity) present in natural images for collaborative processing of similar patches extracted from an image. In particular, nonlocal image denoising uses BM to collect similar patches and process them collaboratively, so the denoising performance directly depends on the patches collected. In such denoising methods [15][16][17], templates (regions centered at each pixel) of various sizes are used depending on the noise level. For example, nonlocal means denoising [15] uses a 7 × 7 template for a moderate noise level, while the weighted nuclear norm minimization denoising method [16] uses templates of sizes 6 × 6, 7 × 7, and 8 × 8.
In the present paper, we propose a specific design of the orthonormal tree-structured Haar transform (OTSHT) for fast BM with an arbitrary patch size. The one-dimensional OTSHT [20], proposed by one of the authors, offers design freedom and meets the requirements of fast BM. We present the mathematical expressions defining the 2D OTSHT, construct several types of two-dimensional OTSHTs including two prime tree structures, and evaluate them as FS-equivalent algorithms in terms of speed and pruning performance. In addition, as a non-FS-equivalent algorithm, we demonstrate the applicability of the proposed OTSHT in state-of-the-art image denoising. The obtained results demonstrate that the new method is faster and even produces slightly better PSNR than methods using FS or OHT. This paper extends the results of the initial study presented in [21,22].
The paper is organized as follows: We present the mathematical expression and concrete basis images of OTSHT for BM in Section 2. The fast BM algorithm using OTSHT is described in Section 3. Our evaluations of the specific designs of OTSHT for BM are detailed in Section 4. The application to image denoising is demonstrated in Section 5. Finally, in Section 6, we conclude our study.

Basis Images of Two-Dimensional Orthonormal Tree-Structured Haar Transform for Fast Block Matching
In this section, we consider the basis images of the orthonormal Haar transform for fast BM with an arbitrary patch size. To do this, we extend the OTSH transforms introduced in [20] to 2D and select two extreme cases of these transforms: one based on the balanced binary tree decomposition and the other on the logarithmic tree decomposition.

Binary Tree and Interval Subdivision
The two-dimensional orthonormal tree-structured Haar transform is designed from an arbitrary binary tree having N leaves and depth d.
In the binary tree, the topmost node is referred to as a root and the bottom nodes are referred to as the leaves. Each node is labeled by α. The labeling process starts from the root. The left and right children of the root are labeled as 0 and 1, respectively. When the node has two children, the left and right children are labeled by adding 0 and 1 to the right end of the precedent node label, respectively.
Let α0 and α1 be the left and right children of node α, respectively, and let ν(α) be the number of leaves of node α. The interval I_α of node α is defined from the structure of the binary tree. The intervals I_root, I_0, and I_1 are defined as

I_root = [0, N), I_0 = [0, ν(0)), I_1 = [ν(0), N).

Otherwise, for I_α = [a, b),

I_α0 = [a, a + ν(α0)) and I_α1 = [a + ν(α0), b).

Figure 1 shows a binary tree and its interval splitting. The tree has three leaves and depth two. A circle represents a node; the number above the circle is its label, and the number inside the circle is the number of leaves the node has.
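The interval subdivision above can be sketched in code. This is a minimal illustration using a hypothetical nested-tuple tree encoding (a leaf is `None`, an internal node is a `(left, right)` pair); the function names are ours:

```python
def num_leaves(tree):
    """nu(alpha): number of leaves under a node.
    A leaf is None; an internal node is a (left, right) pair."""
    if tree is None:
        return 1
    left, right = tree
    return num_leaves(left) + num_leaves(right)

def intervals(tree, label="root", a=0, b=None):
    """Map each node label alpha to its half-open interval I_alpha = [a, b),
    splitting at a + nu(alpha0) as in the text."""
    if b is None:
        b = num_leaves(tree)  # I_root = [0, N)
    out = {label: (a, b)}
    if tree is not None:
        left, right = tree
        mid = a + num_leaves(left)
        base = "" if label == "root" else label
        out.update(intervals(left, base + "0", a, mid))
        out.update(intervals(right, base + "1", mid, b))
    return out
```

For the tree of Figure 1 (three leaves, depth two, left child internal), this yields I_root = [0, 3), I_0 = [0, 2), I_1 = [2, 3), I_00 = [0, 1), and I_01 = [1, 2).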

Orthonormal Tree-Structured Haar Transform Basis Images
Label β is introduced for the vertical direction in addition to label α for the horizontal direction. A total of N 2 basis images of size N × N are generated from a binary tree having N leaves.
There are four functions for constructing the basis images of the OTSHT for BM. The function for the region (I_root × I_root), used once to generate the first (constant) basis image, assigns a single value over the whole region. Otherwise, for a region (I_α × I_β), three splitting functions ϕ1, ϕ2, and ϕ3 are used. The intervals of the nodes of focus generate the positive- and negative-valued regions, the region being decomposed according to the intervals in the horizontal and vertical directions. Figure 2 illustrates the sequence of decompositions of the region when the nodes of focus are (α, β).
First, the region is divided by ϕ1. Next, the positive (white) region is divided by ϕ2, and finally the negative (black) region is divided by ϕ3. This procedure is iterated until no region can be divided further.
Positive- and negative-valued regions are shown in white and black, respectively. First, the region (I_α × I_β) is vertically divided into the two regions (I_α × I_β0) and (I_α × I_β1), and the value of each region is assigned by (7). Then the positive-valued region (I_α × I_β0) is horizontally divided into the two regions (I_α0 × I_β0) and (I_α1 × I_β0) by (8), while the negative-valued region is divided into the two regions (I_α0 × I_β1) and (I_α1 × I_β1) by (9).
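One vertical splitting step can be illustrated with an unnormalised ±1 mask (a sketch only: the actual basis functions (7)-(9) carry normalisation factors that we omit here, and the function name is ours). Rows correspond to the vertical label β and columns to the horizontal label α:

```python
import numpy as np

def step_mask(N, I_alpha, I_beta0, I_beta1):
    """Unnormalised +/-1 mask for one vertical split of region
    I_alpha x I_beta: +1 on I_alpha x I_beta0, -1 on I_alpha x I_beta1.
    Keeping the entries +/-1 lets coefficients be read off the
    integral image; the orthonormal transform rescales the rectangles."""
    m = np.zeros((N, N), dtype=int)
    a0, a1 = I_alpha
    m[I_beta0[0]:I_beta0[1], a0:a1] = 1   # positive (white) region
    m[I_beta1[0]:I_beta1[1], a0:a1] = -1  # negative (black) region
    return m
```

For N = 3 with I_α = [0, 3), I_β0 = [0, 2), I_β1 = [2, 3), the mask has +1 on the top two rows and −1 on the bottom row.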

Balanced Binary Tree and Logarithmic Binary Tree
The tree-structured Haar transform offers freedom of design. We consider two prime tree structures, the balanced binary tree and the logarithmic binary tree, which are the two extreme cases.
The logarithmic binary tree is the special case of the Fibonacci p-tree [20] when p → ∞. Figure 5a shows an example of a logarithmic binary tree of depth 4 having N = 5 leaves, and Figure 5b shows the corresponding logarithmic binary tree-based (L-)OTSHT basis images generated by (6)-(9). In the set of 25 basis images, there are in total r = 15 rectangles of different sizes. As we have seen, although the number of leaves is the same, different tree structures are constructed, which leads to different numbers of rectangles; the number of rectangles of different sizes affects the computational complexity.

Relation between OHT and OTSHT
The OHT is the special case of the OTSHT in which the tree is the full balanced binary tree. Figure 6a,b shows the full binary tree having four leaves and the corresponding OHT basis images, respectively.

Fast Block Matching Algorithm Using Two-Dimensional Orthonormal Tree-Structured Haar Transform
The OTSHT is used both in an FS-equivalent fast BM algorithm and in a non-FS-equivalent one. In both algorithms, the similarity of all candidate patches to the template is measured by the SSD in the transform domain. Let x_j be the column vector of the j-th window in a proper order. The k-th OTSHT coefficient is

X_j(k) = h_k^T x_j,

where h_k is the column vector of the k-th OTSHT basis image in the same order. In practice, since the elements of the OTSHT basis images take the values +1 and −1 over rectangular regions, each OTSHT coefficient is obtained with just a few operations on the integral image [11,12]. Moreover, the strip sum technique further reduces the number of operations [8].
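Computing one such coefficient from the integral image can be sketched as follows (assuming, for illustration, a basis image with a single +1 rectangle and a single −1 rectangle; the helper names are ours):

```python
import numpy as np

def integral_image(img):
    """Summed-area table padded with a zero top row and left column."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def rect_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] via three operations."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]

def coefficient(ii, pos_rect, neg_rect):
    """Unnormalised transform coefficient of one window: the sum over
    the +1 rectangle minus the sum over the -1 rectangle -- a handful
    of operations in total, regardless of the rectangle sizes."""
    return rect_sum(ii, *pos_rect) - rect_sum(ii, *neg_rect)
```

Basis images with more rectangles cost proportionally more rectangle sums, which is why the number of rectangles of different sizes matters for complexity.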

FS-Equivalent Algorithm Using OTSHT
The OTSHT can be used in an FS-equivalent algorithm: the fast FS-equivalent algorithm using the OHT [8] applies directly to the OTSHT. The number of operations is significantly reduced by the iterative pruning process described below.
When an appropriate threshold is used, one may safely reject the windows whose sum of squared differences (SSD) exceeds the threshold: if

||X_j^K − X_t^K||_2^2 ≥ threshold,

then the j-th window is rejected from the search, where X_j^K = [X_j(1), X_j(2), . . . , X_j(K)]^T and X_t^K are the vectors of the first K OTSHT coefficients of the j-th window and of the template, respectively. Once a window is rejected, neither its further OTSHT coefficients nor its SSD is calculated. At each iteration over k, the k-th OTSHT coefficient and the partial SSDs of the remaining windows are computed. The iteration is performed until the number of remaining windows is small. Algorithm 1 shows the pseudo code of the FS-equivalent algorithm using the OTSHT.

Algorithm 1: FS-equivalent BM.
Input: template t of size N × N and image x
1: make basis images
2: make the integral image of x
3: set Flg_j = 'true' for all windows
4: for k = 1 : K
5:   set the k-th OTSHT coefficient of x_t to X_t(k)
6:   for each patch x_j in x
7:     if Flg_j == 'true'
8:       set the k-th OTSHT coefficient of x_j to X_j(k)
9:       if ||X_j^K − X_t^K||_2^2 ≥ threshold
10:        Flg_j = 'false'
11:      end
12:    end
13:  end
14:  if the number of 'true' in Flg is small enough
15:    break
16:  end
17: end
18: FS in remaining candidates
Output: estimated window
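The pruning loop can be sketched as follows. This is a simplified, vectorised sketch that assumes all coefficients have already been computed; in the actual algorithm they are computed lazily, only for the windows that survive. All names are illustrative:

```python
import numpy as np

def fs_equivalent_bm(coeffs, t_coeffs, threshold, min_remaining=8):
    """Iterative pruning sketch: coeffs[j, k] is the k-th transform
    coefficient of window j, t_coeffs[k] that of the template.
    A window is rejected once the partial SSD of its first k
    coefficients reaches the threshold (the partial SSD is a lower
    bound on the true SSD, so rejection is safe)."""
    J, K = coeffs.shape
    alive = np.ones(J, dtype=bool)   # Flg in Algorithm 1
    partial = np.zeros(J)            # partial SSD per window
    for k in range(K):
        d = coeffs[:, k] - t_coeffs[k]
        partial[alive] += d[alive] ** 2
        alive &= partial < threshold
        if alive.sum() <= min_remaining:
            break
    return np.flatnonzero(alive)     # candidates for the final FS
```

The returned indices are the surviving candidates on which the exact FS (step 18) is then performed.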

Non FS-Equivalent Algorithm Using OTSHT
The OTSHT can also be used in a non-FS-equivalent algorithm. Instead of the iterative pruning process of the FS-equivalent algorithm described above, the number of OTSHT basis images is limited to reduce the computational load: the similarity using the first to K-th OTSHT coefficients is calculated at once, with K chosen by the user. Algorithm 2 shows the pseudo code of the non-FS-equivalent algorithm using the OTSHT.

Algorithm 2: non FS-equivalent BM.
Input: template t of size N × N and image x
1: make basis images
2: make the integral image of x
3: for k = 1 : K
4:   set the k-th OTSHT coefficient of x_t to X_t(k)
5:   for each patch x_j in x
6:     set the k-th OTSHT coefficient of x_j to X_j(k)
7:   end
8: end
9: estimated window j = argmin_j ||X_j^K − X_t^K||_2^2
Output: estimated window

Figure 7a,b shows the number of additions per pixel and the number of memory fetch operations per pixel, respectively, for computing the OTSHT coefficients using the strip sum technique [8] (referred to as (S)) and the integral image only (referred to as (I)). Compared to the number of operations for the OHT (i.e., N = 8 or N = 16), the numbers of additions and memory fetches for the B-OTSHT coefficients are only slightly larger, whereas those for the L-OTSHT are more than double and increase as N increases. With regard to memory usage, when the width and height of an image are J_1 and J_2, respectively, and the OTSHT basis images contain r rectangles with N_h different heights, J_1 J_2 N_h memory is required for the horizontal strip sum technique [8], J_1 J_2 for the integral image, and J_1 J_2 for storing the similarities. Therefore, the strip sum technique requires N_h times more memory. Table 1 summarizes the numbers of rectangles of different sizes, with different heights and different widths, in the set of N^2 OTSHT basis images.
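The K-coefficient matching of Algorithm 2 reduces to an argmin over truncated SSDs. A minimal sketch, again assuming precomputed coefficients (names are illustrative):

```python
import numpy as np

def non_fs_bm(coeffs, t_coeffs, K):
    """Non-FS-equivalent sketch: rank windows by the SSD of their
    first K transform coefficients only. K, chosen by the user,
    trades matching accuracy for speed."""
    d = coeffs[:, :K] - t_coeffs[:K]   # truncated coefficient differences
    ssd = np.sum(d ** 2, axis=1)       # SSD in the transform domain
    return int(np.argmin(ssd))         # index of the best-matching window
```

Because the transform is orthonormal, the SSD over all N^2 coefficients equals the spatial-domain SSD; truncating to K coefficients gives the approximation that this algorithm exploits.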

Experimental Section
In the experiments, the fast BM algorithms using the OHT and the OTSHT are simply denoted by OHT and OTSHT, respectively, unless otherwise specified. We evaluate the OTSHT in comparison to the OHT and FS. All experiments are implemented in MATLAB and performed on a Macintosh with a 4.0 GHz Core i7. Eight test images [23] were used for the evaluation.

Pruning Performance of Different Tree Structures
We evaluated different tree structures for the OTSHT basis images, considering the five examples of binary trees having N = 9 leaves shown in Figure 8. Table 2 summarizes the numbers of rectangles of different sizes having different heights. From the trees shown in Figure 8a-e, we construct the OTSHT basis images referred to as B-OTSHT, OTSHT(1), OTSHT(2), LR-OTSHT, and LL-OTSHT, respectively. Figure 9 shows the percentage of windows remaining after pruning, conducted after every k-th basis image and averaged over 100 templates. In this experiment, the performance of OTSHT(1) is only slightly better than that of B-OTSHT and OTSHT(2). On the other hand, the performances of LR-OTSHT and LL-OTSHT were not satisfactory.

FS Equivalent Algorithm
We ran OTSHT, OHT, and FS to evaluate the processing time. The template size, N × N, was varied from N = 5 to 15, and 100 templates were chosen every 55 pixels in raster-scan order. The balanced binary tree was used for constructing the OTSHT basis images. All results are identical to those of FS. Figure 10 shows the mean processing time. The processing time of FS increases linearly with N, while the plot of OTSHT is flat; OTSHT is faster than FS when N ≥ 7. OHT is slightly faster than OTSHT, but its template size is restricted to powers of two. In [8], the speed-up over FS was examined, with OHT reported to be roughly 10 times faster than FS when N = 16. In our experiment, OHT was 6 times faster than FS; the difference arises because we do not use the particular template of [8], which has a high standard deviation of pixel intensities.

Image Denoising Application
We have compared the OTSHT to FS and to the 8 × 8 OHT [8] within the framework of image denoising, where the denoising performance depends on collecting similar patches. As the denoising method, weighted nuclear norm minimization (WNNM) [16] has been used. In WNNM, the optimal patch size and other parameters are set depending on the noise level, as shown in Table 3. The noise added to the images was white Gaussian with zero mean and standard deviation σ = 10, 20, 30, 40, and 50. The OTSHT (or OHT) and FS are used as the procedures for collecting similar patches in WNNM, referred to as WNNM-K and WNNM-FS, respectively; the pseudo code is shown in Algorithm 3. Following the observations in Sections 4.1 and 4.2, we constructed the OTSHT basis images from the balanced binary tree and used the non-FS-equivalent algorithm of Section 3.2 with K = 2, 4, 8, and 16, because a speed-up over FS cannot be expected when the patch size is small.

Algorithm 3: WNNM with OTSHT-based BM.
...
5:   BM for collecting similar patches to form the similar-patch group ỹ_t by SSD
...
   end
10:  aggregate x̂_t to form the clean image x̂^(i)
11: end
Output: clean image x̂^(max_i)

First, we compare the OTSHT to FS. Figure 11a shows the mean PSNR of WNNM-FS and of WNNM-K, K = 2, 4, 8, and 16, at different noise levels. The PSNR of WNNM-2 was below that of WNNM-FS, but almost the same when the noise level is low. When K ≥ 4, the PSNR of WNNM-K is almost the same as that of WNNM-FS, and the PSNR of WNNM-16 is even slightly higher. The PSNR of each image is shown in Table 4, with the best PSNR in bold. We observe that there is almost no difference in PSNR between WNNM-K and WNNM-FS, and the former is often even better, although FS is generally considered more accurate than non-FS-equivalent algorithms. The reason is that BM in the spatial domain is not efficient for noisy images, since it may match noisy patterns and thus decrease the denoising performance. The filtered images produced by these methods are almost indistinguishable. Figure 11b shows the mean processing time at different noise levels; the numbers on the y-axis of the bar chart are the numbers of basis images used, K. The processing times for BM and for the other modules of the denoising method are shown as blue and yellow bars, respectively. The processing time for BM of WNNM-2 is 46 to 56 percent of that of WNNM-FS; of WNNM-4, 53 to 63%; and of WNNM-8, 62 to 75%. Moreover, the larger the patch size, the more efficient the procedure. The OTSHT thus reduces the processing time while keeping the same PSNR level as FS. Next, we compare the OTSHT to the 8 × 8 OHT in WNNM-K (K = 2, 4, 8, and 16). Although the OHT cannot be used in WNNM with the optimal patch size, we force the 8 × 8 OHT into WNNM by fixing the patch size to 8 × 8 in order to evaluate the performance at different patch sizes. Figure 12 shows the PSNR and processing time of the OTSHT and the 8 × 8 OHT.
The numbers on the y-axis of the bar chart are the numbers of basis images used, K. The processing times for BM and for the other modules of the denoising method are shown as blue and green bars, respectively.
We observe that the mean PSNRs of the OTSHT are higher than those of the 8 × 8 OHT, and that the processing time of the other modules with the OTSHT is approximately 50 s shorter than with the 8 × 8 OHT. This is because the patches collected by the 8 × 8 OHT contain extra regions that are inappropriate both for collecting similar patches and for processing in the other modules. The PSNRs of each image are shown in Table 4, where the best PSNR is shown in bold. When σ = 10 and 20, the PSNRs of the OTSHT in WNNM-2 and WNNM-4 were 0.33 to 0.35 dB higher than those of the OHT, and in WNNM-8, 0.28 to 0.29 dB higher. When σ ≥ 30, the PSNRs of the OTSHT were almost the same as those of the OHT in WNNM-2, 4, 8, and 16.

Conclusions
We have considered fast block matching (BM) based on the orthonormal tree-structured Haar transform (OTSHT). We have described how to construct two-dimensional OTSHTs, which offer freedom of design, and how to use them for BM. The OTSHT can be used for both FS-equivalent and non-FS-equivalent BM. In FS-equivalent BM, conventional techniques such as pruning and the strip sum via the integral image are used for speed-up; in non-FS-equivalent BM, a limited number of basis images is used. For FS-equivalent BM, we have evaluated the computational complexity and pruning performance of different tree-structure designs and demonstrated that the OTSHT based on the balanced binary tree is more efficient than that based on the logarithmic binary tree, with respect to both pruning performance and computational cost. For non-FS-equivalent BM, we have demonstrated the capability of the introduced method in an image denoising application, where an arbitrary template size is used depending on the noise level. In all our experiments, not only the PSNR values but also the visual appearance of the images denoised by WNNM-K and WNNM-FS are extremely close, so the filtered images are almost indistinguishable. Thus, to conclude, the main advantage of the proposed WNNM-K is that it can effectively substitute for the baseline WNNM (where FS is used for BM) while significantly reducing its computational time.