Image Inpainting Based on Multi-Patch Match with Adaptive Size

Patch-based image inpainting methods iteratively fill the missing region by searching the source region for the best sample patch. However, most existing approaches use a fixed patch size regardless of nearby content features, which may lead to inpainting defects. Moreover, a global match is needed to find the best sample patch, yet only one target patch is filled in each iteration, resulting in low efficiency. To handle these issues, we first evaluate the nonuniformity of an image, by which the patch size is adaptively determined. We then divide the source region into multiple non-overlapping subregions with different nonuniformity levels, and the patch match proceeds in each subregion separately. This strategy not only shortens the match time for a single target patch but also reduces mismatches and enables multiple target patches to be filled simultaneously in a single iteration. Experimental results show that, compared with previous patch-based works, our method achieves further improvement in both quality and efficiency. We believe our method could provide a new way to perform patch match with better accuracy and efficiency in image inpainting tasks.


Introduction
Digital image inpainting is one of the research hotspots in the field of image restoration. It fills missing areas with plausible content, or replaces unwanted objects with background, by utilizing the neighborhood information in digital images. Typical applications include the restoration of damaged photos and ancient paintings, filling the holes in a virtual-view image [1], and removing watermarks or text from a picture. The purpose is to make the restored image seem as natural as possible, without noticeable traces of inpainting.
There are mainly two categories of traditional approaches for image inpainting: diffusion-based methods and patch-based methods. The general principle of the diffusion-based methods is to diffuse the known information into the missing regions in an iterative process modeled by a partial differential equation (PDE). Inspired by the propagation of heat flow, Bertalmio et al. [2] introduced the first diffusion-based method and proposed the strategy of propagating the linear structure (i.e., isophote) from the source region (i.e., known region) into the missing region. Chan et al. [3] applied total variation to image inpainting for the first time (a.k.a. the TV model), which converted image inpainting into a mathematical problem that uses the Euler-Lagrange equation to solve the extreme of …
(1) We first propose a metric to evaluate the nonuniformity in an image;
(2) To achieve a more accurate and flexible inpainting, the patch size is adaptively determined according to its nonuniformity;
(3) To save match time, our subregion search strategy allows the match only between patches with similar content. This not only narrows down the match area to a large extent without missing the optimal sample patch, but also skips bad sample patches to avoid mismatch;
(4) To reduce the total number of iterations, our multi-patch match strategy enables the patch match to proceed in multiple subregions with different nonuniformity levels, so that multiple target patches can be filled in a single iteration.
The rest of the paper is organized as follows: Section 2 briefly introduces the classical Criminisi's algorithm, the baseline of our work. Section 3 gives the details of our improvements, including the nonuniformity model, the determination of the adaptive patch size, the subregion search strategy, and the multi-patch match. Experimental results and analysis are given in Section 4, where we compare our results with related patch-based methods proposed in recent years. Section 5 draws the conclusion and presents future work.

Related Work
We choose Criminisi's algorithm [10] as the baseline of our work since it is the pioneering work and has the most representative framework among patch-based inpainting methods. In this section, we first briefly introduce how it works, and then analyze its shortcomings and the improvements made in other related works.
For convenience of expression, we define some notation first. As shown in Figure 1, I represents the entire incomplete image, the missing area (target region) is represented by Ω, and Φ is the known area (source region), defined as Φ = I − Ω. δΩ is the 1-pixel-wide outer boundary of Ω (δΩ ⊂ Φ), which is called the filling front. Other symbols will be introduced later.
Appl. Sci. 2020, 10, x FOR PEER REVIEW
The central idea of Criminisi's algorithm is that, in each iteration, the point with the highest priority on the filling front is found and a target patch centered at that point is established; the source region is then searched globally for the best sample patch; finally, the best sample patch is copied to the target patch to fill its unknown part. These steps are repeated until the missing area is completely filled. Specifically, the algorithm consists of the following three steps:

1. Calculate the priorities along the filling front
In order to determine the filling order, the priorities need to be computed for all pixels along the filling front δΩ. Given a pixel p (p ∈ δΩ), the priority P(p) is defined as follows:
$$P(p) = C(p)\,D(p) \qquad (1)$$
where C(p) is the confidence term and D(p) is the data term, defined as follows:
$$C(p) = \frac{\sum_{q \in \Psi_p \cap \Phi} C(q)}{|\Psi_p|} \qquad (2)$$
$$D(p) = \frac{\left|\nabla I_p^{\perp} \cdot n_p\right|}{255} \qquad (3)$$
where Ψp is a 9 × 9 patch centered at p, |Ψp| is the area of Ψp, ∇I_p represents the gradient at p, and ⊥ is the orthogonal operator, so ∇I⊥_p denotes the isophote vector; 255 is the normalization factor for 8-bit images. n_p is a unit vector orthogonal to the filling front δΩ at the point p (see Figure 1). It can be seen that C(p) represents the amount of known information contained in the patch, while D(p) shows how strong a linear structure is contained in Ψp. The priority model encourages areas with more known pixels and strong linear structures to be filled first. The initial value of C(p) is set to C(p) = 1 (∀p ∈ Φ), C(p) = 0 (∀p ∈ Ω).
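The confidence term and the priority along the filling front can be sketched as follows. This is a minimal illustration, not the authors' MATLAB code: the function names (`box_sum`, `filling_front`, `priorities`) are hypothetical, the data term D(p) is assumed to be supplied as a precomputed array, and the front is taken as the known pixels that have a missing 4-neighbour, matching the definition δΩ ⊂ Φ.

```python
import numpy as np

def box_sum(a, half):
    """Sum of `a` over the (2*half+1) x (2*half+1) window around each pixel."""
    k = 2 * half + 1
    padded = np.pad(a.astype(float), half, mode="constant")
    # Integral image with a leading row and column of zeros.
    ii = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)), mode="constant")
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

def filling_front(known):
    """Known pixels with at least one missing 4-neighbour (outer boundary of the hole)."""
    missing = np.pad(~known, 1, mode="constant")
    near_hole = (missing[:-2, 1:-1] | missing[2:, 1:-1]
                 | missing[1:-1, :-2] | missing[1:-1, 2:])
    return known & near_hole

def priorities(conf, known, data_term, half=4):
    """P(p) = C(p) * D(p) on the filling front, zero elsewhere.

    C(p) sums the confidences of the known pixels in the patch around p
    and divides by the full patch area; `data_term` is assumed precomputed.
    """
    area = float((2 * half + 1) ** 2)
    c = box_sum(conf * known, half) / area
    front = filling_front(known)
    return np.where(front, c * data_term, 0.0), front
```

With `half=4` the window is the 9 × 9 patch used in the text; the highest-priority pixel is then `np.unravel_index(np.argmax(P), P.shape)`.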

2. Search for the best sample patch to fill one target patch
After calculating the priorities of all pixels on the filling front δΩ, find the pixel p̂ with the highest priority, establish a target patch Ψp̂ centered at p̂, and use a sample patch Ψq of the same size as Ψp̂ to traverse the entire source region Φ in search of the best sample patch Ψq̂ that is most similar to Ψp̂. The similarity metric is the sum of squared differences (SSD) between the known pixels in Ψp̂ and the corresponding pixels in Ψq. The best sample patch Ψq̂ satisfies the following equation:
$$\Psi_{\hat{q}} = \arg\min_{\Psi_q \subset \Phi} d(\Psi_{\hat{p}}, \Psi_q) \qquad (4)$$
where d(·,·) denotes the SSD. Then Ψq̂ is copied to the unknown part of Ψp̂, so that one target patch is filled.
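The global search described above can be sketched as an exhaustive SSD scan. This is an illustrative version under stated assumptions: the function name `best_sample_patch` is hypothetical, the target patch is assumed to lie fully inside the image, and only candidate patches made entirely of known pixels are considered.

```python
import numpy as np

def best_sample_patch(image, known, center, half=4):
    """Return (center, ssd) of the fully known patch closest to the target.

    The SSD is evaluated only over the known pixels of the target patch.
    """
    r, c = center
    target = image[r - half:r + half + 1, c - half:c + half + 1].astype(float)
    target_known = known[r - half:r + half + 1, c - half:c + half + 1]
    height, width = image.shape[:2]
    best, best_ssd = None, np.inf
    for y in range(half, height - half):
        for x in range(half, width - half):
            # Skip candidates that overlap the missing region.
            if not known[y - half:y + half + 1, x - half:x + half + 1].all():
                continue
            sample = image[y - half:y + half + 1, x - half:x + half + 1].astype(float)
            ssd = (((sample - target)[target_known]) ** 2).sum()
            if ssd < best_ssd:
                best, best_ssd = (y, x), ssd
    return best, best_ssd
```

The cost of this scan grows with the size of the whole source region for every single target patch, which is the inefficiency the subregion search in Section 3.3 is meant to reduce.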

3. Update information
After Ψp̂ is filled, the update rule of the confidence term for the new pixels p in Ψp̂ is as follows:
$$C(p) = C(\hat{p}), \quad \forall p \in \Psi_{\hat{p}} \cap \Omega \qquad (5)$$
The data term of p is directly copied from its source pixel. Finally, update the source region Φ, the missing region Ω, and the filling front δΩ. At this point, a single iteration is finished. Repeat the above steps until Ω is completely filled.
Criminisi's algorithm has obtained relatively satisfactory results in filling large-area holes or removing objects in a picture. Shortcomings exist, however, and some of them have been addressed by related works. They are listed as follows:
(1) The confidence term in the priority model may decline sharply after multiple iterations, while the fluctuation of the data term is relatively stable. Thus, the priority is more likely to be restricted by a low confidence term and become unreliable, leading to an incorrect filling order and the propagation of structural errors. Zhou et al. [11] demonstrated that different weighted priorities should be chosen for specific structures to get better inpainting results. Liu [12] and Cao et al. [13] changed the priority formula into the exponential and addition form, respectively, to prevent the confidence term from falling too quickly. Xi et al. [14] eliminated the dependence on the shape of the target region and preserved the stability of the confidence term by introducing the gray entropy;
(2) The similarity criterion between the sample patch and the target patch used in Criminisi's algorithm is the sum of squared differences (SSD) of corresponding pixels in the two patches, which only takes the pixel values into account and does not make full use of the structural information. Martínez-Noriega et al. [15] added the Hellinger distance to measure the similarity of the probability distributions of two patches. Ran [16] introduced a metric of the structural similarity between two patches. Liu et al. [17] also defined a new match rule by taking the structure tensor into consideration. These works have successfully reduced the rate of mismatch;
(3) Criminisi et al. chose a fixed patch size of 9 × 9 pixels, which is unreasonable, since the patch size directly affects the capability to capture local texture features and has an important influence on inpainting quality. Different patch sizes should be applied to regions with different uniformity levels. Generally, smaller patches should be applied to high-frequency areas with more textures and structures to achieve a finer filling, while larger patches are appropriate for flatter areas to speed up the filling process; this idea was also demonstrated in Reference [1]. There is relatively little research on this issue. Zhou et al. [18] determined the patch size automatically using gradient histograms, implemented as an optimization problem that requires extra iterations. However, this process is computationally expensive and takes a longer time;
(4) A global search in the source region is required in Criminisi's method in order to find the best sample patch for the target patch, which needs a large amount of calculation and match time. Liu et al. [17] narrowed the match area by picking out those candidates whose sum of pixel values is close to the target patch's, but this cannot guarantee that the bad candidates are excluded;
(5) Criminisi's and related patch-based inpainting techniques require a large number of iterations to completely fill the unknown area, since only one target patch can be filled in a single iteration. At present, there is no related research that handles this deficiency.
Compared with problems 1 and 2, there are relatively fewer studies on problems 3 and 4. Aiming at problems 3-5 mentioned above, we provide a novel solution: We first introduce the nonuniformity model, by which the patch size will be determined adaptively to address problem 3. Besides, the strategy of subregion search is proposed to address problem 4 in an effective way. Moreover, the strategy of a multi-patch match is proposed to handle problem 5 for the first time.

Proposed Approach
Our method uses a similar framework as Criminisi's [10]. Based on that, we add a step to evaluate the nonuniformity in an image at the beginning of the process. Based on nonuniformity and filling priority, multiple centers of the target patches are located on the filling front, patch sizes are determined by the nonuniformity, and each target patch's search area is limited from the global source region to particular subregion with similar content. After an iteration, these target patches will be filled with the content borrowed from the source region. Figure 2 briefly illustrates the basic framework of our work.

Evaluate the Nonuniformity
Evaluating the nonuniformity is a prerequisite for the other parts of our work, including the patch size determination, subregion search, and multi-patch match. Therefore, the nonuniformity needs to be quantified first. Given a patch of pixels under a certain distribution, the standard deviation can effectively characterize the nonuniformity of pixel values and can also be seen as a measure of the texture feature [20]. Our nonuniformity is based on the local standard deviation. Concretely, let S_p denote a square window centered at point p (p ∈ Φ), and let w_s be the width of S_p, determined by the image size H × W; by default, w_s = max(2[min(H, W)/100] …), where [·] is the rounding operator. The local standard deviation at p is obtained by computing the standard deviation of all the known pixels in S_p (Equation (6)), where σ(p) is the local standard deviation at p and µ is the mean value in S_p. Applying Equation (6) to all the pixels in the source region, we obtain a map of local standard deviation σ(Φ); σ(Φ) is then normalized to the interval [0, 1] (Equation (7)), yielding the normalized local standard deviation map σ̄(Φ). We find that in most cases, if we directly use σ̄(Φ) as the descriptor of the content without post-processing, its data distribution is very uneven: the lower part is over-crowded and less distinguishable, while the higher part is too sparse (see Figure 3b for an example). In Section 3.2, we intend to evenly divide the interval [0, 1] into subintervals so that every pixel can be categorized and treated accordingly. Intuitively, it is better to stretch the crowded data to a relatively even distribution to fit the evenly divided subintervals. We find that the widely used histogram equalization is a simple and effective way to achieve this purpose, without any parameter to be set manually. Therefore, we equalize σ̄(Φ) to make full use of the space [0, 1]. Let histeq(·) be the operation of histogram equalization:
$$T = \mathrm{histeq}(\bar{\sigma}(\Phi)) \qquad (8)$$
where T is the value equalized from σ̄, which we define as the nonuniformity. For instance, Figure 3c shows the nonuniformity map computed from Figure 3a.

As shown in Figure 3c, each pixel p in Φ has a certain value of nonuniformity T(p), which reflects the amount of detail contained in the neighborhood of p. Intuitively, areas with more details (such as textured regions and line structures) obtain a higher nonuniformity, whereas flatter areas (such as the background) get a lower nonuniformity. For unknown areas, the nonuniformity is undefined. When unknown areas are filled with new pixels during the filling process, the nonuniformity of the new pixels is directly copied from their source pixels.
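The nonuniformity map can be sketched in three stages: local standard deviation over known pixels, normalization to [0, 1], and histogram equalization. This is a rough illustration under several assumptions that are not confirmed by the text: the population (1/N) standard deviation is used, the normalization is min-max, the equalization maps each value through the cumulative histogram with 256 bins, and the window half-width `half` is passed in by the caller. The names `box_sum` and `nonuniformity` are hypothetical.

```python
import numpy as np

def box_sum(a, half):
    """Sum of `a` over the (2*half+1) x (2*half+1) window around each pixel."""
    k = 2 * half + 1
    padded = np.pad(a.astype(float), half, mode="constant")
    ii = np.pad(padded.cumsum(axis=0).cumsum(axis=1), ((1, 0), (1, 0)), mode="constant")
    return ii[k:, k:] - ii[:-k, k:] - ii[k:, :-k] + ii[:-k, :-k]

def nonuniformity(image, known, half, bins=256):
    """Equalized, normalized local standard deviation; NaN on unknown pixels."""
    img = np.where(known, image, 0.0).astype(float)
    count = np.maximum(box_sum(known, half), 1.0)  # known pixels per window
    mean = box_sum(img, half) / count
    var = np.maximum(box_sum(img ** 2, half) / count - mean ** 2, 0.0)
    sigma = np.sqrt(var)[known]                    # local std on the source region

    # Min-max normalization to [0, 1] (assumed form of the normalization step).
    lo, hi = sigma.min(), sigma.max()
    norm = (sigma - lo) / (hi - lo) if hi > lo else np.zeros_like(sigma)

    # Histogram equalization: map each value through the cumulative histogram.
    idx = np.minimum((norm * bins).astype(int), bins - 1)
    hist = np.bincount(idx, minlength=bins)
    cdf = hist.cumsum() / hist.sum()

    t = np.full(image.shape, np.nan)
    t[known] = cdf[idx]
    return t
```

Textured areas end up with values near 1 and flat areas with values near 0, and the equalization spreads the values more evenly over [0, 1] than the raw normalized map.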

Adaptive Target Patch Size
Criminisi's algorithm uses a fixed patch size during the whole process. However, in a given image, some areas contain rich textures and details, while others have little texture. If a larger patch is used in rich-textured areas, stitching cracks occur easily and the consistency of the texture structure is lost. Therefore, a smaller patch should be used there to achieve a smoother and more natural texture propagation. For relatively flat regions, a larger patch is applied to prevent the staircase effect and speed up the filling process. It is therefore more reasonable to apply different patch sizes to different regions.
Denote by n the number of patch sizes used in our work. For a target patch Ψp̂, its size w_Ψp̂ × w_Ψp̂ is specified by the nonuniformity T(p̂) according to Equation (9), where ⌊·⌋ is the downward rounding operator. The purpose of Equation (9) is to evenly divide the interval [0, 1] into n subintervals with a step of 1/n. Suppose T(p̂) falls in the k-th subinterval [(k − 1)/n, k/n) (k = 1, 2, . . . , n); as k decreases from n to 1, w_Ψp̂ increases from 3 to its maximum size (2n + 1) with a step of 2 (the size should be odd). In other words, the patch size decreases as the nonuniformity increases. Note that n should not be too small, or the choice of patch sizes will be too limited to fit different situations; nor too large, otherwise the maximum patch size will also grow too large. If we paste an oversized patch into the target region, it is more likely to cause stitching inconsistency even in flat areas. We empirically let n change along with the image size H × W as …], 1), since we did not find obvious evidence that an optimal value of n exists.
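The mapping from nonuniformity to patch width can be illustrated with a short sketch that follows the subinterval description above. The closed form used here, w = 2(n − k + 1) + 1 with k = ⌊n·T⌋ + 1, is inferred from that description rather than copied from Equation (9), and clamping T = 1 into the last subinterval is an assumption; the name `patch_width` is hypothetical.

```python
import math

def patch_width(t, n):
    """Odd patch width for nonuniformity t in [0, 1] with n available sizes.

    t in [(k-1)/n, k/n) selects the k-th subinterval; k = n gives width 3
    and k = 1 gives the maximum width 2n + 1.
    """
    k = min(int(math.floor(n * t)) + 1, n)  # clamp t == 1 into the last subinterval (assumption)
    return 2 * (n - k + 1) + 1
```

For example, with n = 4 the available widths are 9, 7, 5 and 3, so a flat area with T close to 0 is filled with 9 × 9 patches and a strongly textured area with T close to 1 with 3 × 3 patches.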

Subregion Search
In Criminisi's algorithm, the global search for the best sample patch is required for every target patch, which is computationally expensive and unnecessary. In fact, it is more appropriate to let the target patch selectively match those sample patches that have similar content to the target patch, and skip those sample patches that are far different from the target patch. By doing so, firstly the search area will be narrowed from the entire source region to its subregion with the similar content to the target patch, so that the match time and the computation could be reduced. Moreover, a large number of unsatisfactory sample patches can be filtered out to reduce the mismatch, thereby improving the inpainting accuracy to a certain extent. Our "subregion search" strategy comes as follows.
If the target patch center p̂ is in a rich-textured area or on a strong edge, it is nearly impossible for sample patches Ψq from poor-textured or flat areas to serve as ideal sample patches; instead, desired sample patches are supposed to have content similar to Ψp̂, and similar content means a similar nonuniformity level. Based on this consideration, we narrow down the search area of Ψp̂ from the entire source region to a subregion according to the nonuniformity map T(Φ) obtained by Equation (8). Specifically, the set of pixels q that satisfy the following equation is defined as the restricted subregion φ, which serves as the search area of the target patch Ψp̂:
$$\varphi = \{\, q \in \Phi : |T(q) - T(\hat{p})| \le \alpha \,\} \qquad (10)$$
where α is a parameter that adjusts the strictness of limiting the search area. A smaller α means: (1) the subregion for search shrinks, so the algorithm is more selective in choosing candidates (see Figure 4a); (2) lower fault tolerance during the patch match (related to item 1), since the method is more likely to miss the optimal candidate if the search area is too small; (3) more patch matches in an iteration, because the space saved by each subregion makes room for more subregions (target patches) to get involved (see Section 3.4). By default, α = 0.1. Figure 4b shows how subregion search works: the subregion φ for search is colored with respect to its target patch Ψp̂ (blue patch; the size is magnified for clearer visualization). Only those sample patches whose center falls into subregion φ are considered potentially ideal (green patches), since they have content similar to Ψp̂, while those patches whose center lies outside the subregion φ are considered undesirable (red patches) and are ignored during the match process because they are far different from Ψp̂. Intuitively, this strategy only allows the match between Ψp̂ and a small portion of similar sample patches, which greatly saves match time while maintaining match accuracy.
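The restricted subregion reduces to a single thresholding of the nonuniformity map. The sketch below assumes a map `T` that may hold NaN on unknown pixels and a boolean `known` mask; the name `subregion_mask` is hypothetical.

```python
import numpy as np

def subregion_mask(T, known, center, alpha=0.1):
    """Boolean mask of source pixels q with |T(q) - T(center)| <= alpha."""
    # Unknown pixels are pushed to +inf so they can never pass the threshold.
    t = np.where(known, T, np.inf)
    return known & (np.abs(t - T[center]) <= alpha)
```

Only candidate patches whose centers fall inside this mask need to be compared with the target patch, so the SSD scan runs over a fraction of the source region.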

Multi-Patch Match
During a single iteration of the existing patch-based methods, including Criminisi's, only one target patch can be filled after the match process, so a large number of iterations are required to completely fill the missing area. To reduce the total number of iterations, we propose the strategy of "multi-patch match": in each iteration, we select multiple pixels p̂_i (i = 1, 2, 3, . . .) with "the highest priority" on the filling front, generate multiple target patches Ψp̂_i centered at p̂_i, and search for multiple best sample patches Ψq̂_i to fill the corresponding Ψp̂_i.
Criminisi et al. pointed out that the filling order based on the priority model is crucial to prevent inpainting errors. We therefore keep the filling priority in our approach. First, the filling priority P(δΩ) on δΩ is calculated by Equation (1). Based on this, we then select multiple pixels p̂_i with "the highest priority" on δΩ by considering its nonuniformity distribution T(δΩ). According to the idea that "the target patch and its ideal sample patch should have similar content" mentioned in Section 3.3, and since nonuniformity can be used as a scalar descriptor of the content in a patch, the match processes of two target patches with different nonuniformity levels can be considered independent: the patch match in rich-textured regions or on strong edges is unlikely to disturb the patch match in poor-textured or flat regions, and vice versa. This makes our multi-patch match feasible. To reach this goal, we divide δΩ into multiple subsections with different nonuniformity intervals, let p̂_i be the highest-priority pixel on each subsection, and generate multiple target patches Ψp̂_i centered at p̂_i. Specifically, Algorithm 1 shows how to generate multiple target patches on the filling front δΩ.

2. Find the highest-priority pixel p̂_i on δΩ.
3. Define the subsection δΩ_i of δΩ as δΩ_i = {p : |T(p) − T(p̂_i)| ≤ 2α, p ∈ δΩ}, then reset the priorities to zero for all pixels on δΩ_i.
4. If there exist any non-zero-priority pixels on δΩ, let i = i + 1 and go back to step 2. Otherwise, go to step 5.
Note that we do not divide the filling front into subsections all at once, but progressively: the current patch center p̂_i determines the current subsection δΩ_i, and the next patch center p̂_{i+1} is chosen outside the union of the existing subsections δΩ_1 ∪ . . . ∪ δΩ_i; namely, subsections and patch centers are generated synchronously. This comes from the idea that the filling priority model always comes first, followed by subsection division. An illustration of the above steps is shown in Figure 5. Figure 5f shows the final state: the nonuniformity values of the target patches Ψp̂_i lie in different intervals, and each patch center has the highest filling priority on its subsection. As explained above, the order of Ψp̂_i may not be consistent with the descending order of their nonuniformity, but follows the order of filling priority.
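The progressive selection of patch centers can be sketched as a greedy loop over the priorities on the filling front. This is an illustrative reading of the steps listed above, not the authors' implementation; the name `select_patch_centers` is hypothetical, and `priority`, `T` and `front` are assumed to be arrays of the same shape.

```python
import numpy as np

def select_patch_centers(priority, T, front, alpha=0.1):
    """Pick one highest-priority center per nonuniformity subsection of the front."""
    p = np.where(front, priority, 0.0).astype(float)
    t = np.where(front, T, np.inf)  # pixels off the front never join a subsection
    centers = []
    while p.max() > 0:
        idx = np.unravel_index(np.argmax(p), p.shape)  # current highest priority
        centers.append(idx)
        # Subsection: front pixels within 2*alpha of the center's nonuniformity.
        subsection = front & (np.abs(t - T[idx]) <= 2 * alpha)
        p[subsection] = 0.0  # reset so the next center falls outside it
    return centers
```

The centers are returned in order of decreasing priority, not in order of nonuniformity, which matches the remark that the filling priority comes first and the subsection division follows.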

Figure 5. … (c) reset the priorities on δΩ_1 to 0, find the next pixel p̂_2 with maximum priority, and define δΩ_2 that has similar content to p̂_2; (d,e) continue to find all p̂_i and δΩ_i on δΩ until there is no non-zero-priority pixel on δΩ; (f) generate target patches Ψp̂_i centered at p̂_i.
After generating multiple target patches, the strategy of "subregion search" mentioned in Section 3.3 is also involved. For each target patch Ψp̂_i, a subregion φ_i with similar content is assigned by Equation (10). As shown in Figure 6, during a single iteration, the patch match between Ψq_i and Ψp̂_i can proceed in the corresponding subregion φ_i, and each subregion φ_i produces a best sample patch Ψq̂_i to fill its target patch. In this way, multiple patches can be filled in one iteration. By combining the subregion search and multi-patch match strategies, the speed of the algorithm can be effectively improved.
It must be stressed that step (2) in Algorithm 1 ensures that ∀p̂_i, p̂_j ∈ δΩ (i ≠ j), the relation |T(p̂_i) − T(p̂_j)| > 2α always holds; in addition, Equation (10) guarantees that ∀q ∈ φ_i, |T(q) − T(p̂_i)| ≤ α. These two relations avoid any overlap of the nonuniformity interval and the spatial scope between any two subregions, i.e., T(φ_i) ∩ T(φ_j) = ∅, φ_i ∩ φ_j = ∅ (i ≠ j), ensuring that the match processes in the subregions are independent of each other.
The description of our overall algorithmic steps is shown in Algorithm 2.
3. Generate multiple target patches Ψp̂_i on δΩ, as shown in Algorithm 1.
4. Assign subregion φ_i as the search scope for every Ψp̂_i by Equation (10).
5. Match the best sample patch Ψq̂_i for Ψp̂_i in φ_i using Equation (4).
6. Fill the unknown part of Ψp̂_i with the corresponding pixels in Ψq̂_i.
7. Update the confidence term for new pixels by Equation (5); the data term and the nonuniformity are directly copied from their source pixels.
8. Update regions Ω and Φ; if Ω = ∅, exit the whole process. Otherwise, go back to step 2.
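The independence of the subregions claimed above follows from the triangle inequality. The short argument below is a sketch under one assumption: the separation between patch centers is strict, which follows from the subsection definition in Algorithm 1, since every later center is chosen outside the closed band of half-width 2α around each earlier one.

```latex
% Assumptions: |T(\hat{p}_i) - T(\hat{p}_j)| > 2\alpha for i \neq j (Algorithm 1),
% and |T(q) - T(\hat{p}_i)| \le \alpha for every q \in \varphi_i (Equation (10)).
Suppose some pixel $q$ belonged to both $\varphi_i$ and $\varphi_j$ with $i \neq j$. Then
\begin{align*}
|T(\hat{p}_i) - T(\hat{p}_j)|
  &\le |T(\hat{p}_i) - T(q)| + |T(q) - T(\hat{p}_j)| \\
  &\le \alpha + \alpha = 2\alpha ,
\end{align*}
which contradicts $|T(\hat{p}_i) - T(\hat{p}_j)| > 2\alpha$.
Hence $\varphi_i \cap \varphi_j = \emptyset$. The same argument applied to any common
nonuniformity value gives $T(\varphi_i) \cap T(\varphi_j) = \emptyset$.
```

With a non-strict separation of exactly 2α the two nonuniformity intervals could still touch at a single value, which is why the strict inequality matters here.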

Experimental Results
In this section, we evaluate the performance of the proposed method by conducting two types of experiments: image restoration and object removal. The experiments are conducted on a computer with a 2.2 GHz CPU and 4 GB RAM, and the method is implemented in MATLAB.

Instance Test
We first show our inpainting results qualitatively and quantitatively through a few instance tests. For comparison, several previous patch-based methods are also applied in our experiment, including Criminisi's exemplar-based inpainting method [10], Liu's method based on the structure tensor [17], and Zhou's method using an adaptive size based on gradient histograms [18]. One of the most representative diffusion-based methods, the CDD model [4] proposed by Shen et al., is also involved in our test (3k iterations employed). The instances include a portrait, Lena (512 × 512); two natural scenes, House (512 × 512) and Sculpture (256 × 256); and two pure texture images, the irregular pattern Texture I (640 × 640) and the regular pattern Texture II (640 × 640), from the Brodatz dataset [21,22]. These images are masked in black, which represents the missing areas. Both subjective and objective evaluations are compared among the proposed and previous methods mentioned above, where the subjective evaluation is the visual effect of the completed images, as shown in Figures 7-11.
Figures 7-11. Panels: (c) CDD model [4]; (d) Criminisi's method [10]; (e) Liu's method [17]; (f) Zhou's method [18]; (g) proposed method.
Subjective visual effects in Figures 7-11 show that, compared with the other methods, our results have the fewest defects and are most similar to the ground truth. For Lena (Figure 7), in areas with rich textures and strong linear structures, such as the corners of the eyes and the edge of the hat, our method automatically generates a smaller target patch, so that structures and textures are better preserved. In Reference [10], the fixed patch size may not work well for areas with rich details, since the patch can easily exceed the scale of the texture element, which can lead to stitching errors and structure discontinuity. Reference [17] also uses a fixed patch size; although a few flaws still occur, the algorithm achieves relatively good inpainting results thanks to its improved priority model based on the structure tensor. Reference [18] uses an adaptive patch size based on the gradient histogram; however, this method generates a larger patch when connecting strong edges, which, as discussed above, may lead to incorrect propagation of structures. Unlike the patch-based methods, Reference [4] does not copy patches from elsewhere, and it performs better on such slim scratches by diffusion.
For House (Figure 8), the previous patch-based methods produce mismatches, marked with red circles. This is because, when finding the best sample patch, References [10] and [18] both search the entire source region without filtering, so they are more likely to match an inappropriate candidate if the match metric does not work well. Reference [17] limits the search area by picking out those sample patches whose sum of pixel values falls within a certain range of the target patch's; however, this strategy does not always work well, because the sum over a patch carries very limited information, and unsatisfactory sample patches may not be excluded. Moreover, Reference [4] starts to show some diffusion artifacts as the scratches become thicker. In contrast, the proposed method searches for sample patches only in areas whose content is similar to the target patch; even if the similarity metric function loses its effect, it can still avoid selecting wrong sample patches and thus reduce mismatch.
Sculpture and Texture I, the rich-textured images with thick scratches in Figures 9 and 10, show that structures and textures are well preserved in our results, whereas Reference [4] introduces noticeable blurring artifacts in such cases, which is a known common issue of diffusion-based methods. Texture II (Figure 11) is a regular pattern; we therefore believe that, for such periodically arranged textures, additional steps are required to automatically perceive the scale of the texture element and decide the optimal patch size before further improvements can be made. Unfortunately, none of these methods has solved this challenging task yet.
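To illustrate why a prefilter based on the sum of pixel values, as attributed to Reference [17] in the discussion above, may fail to exclude poor candidates, the following minimal Python sketch compares two patches with identical sums but very different content. The function names, the `tolerance` parameter, and the toy patches are assumptions made for illustration; this is not the implementation of Reference [17].

```python
def patch_sum(patch):
    """Sum of all pixel values in a 2-D patch given as a list of rows."""
    return sum(sum(row) for row in patch)


def ssd(a, b):
    """Sum of squared differences between two equally sized patches."""
    return sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def passes_sum_filter(target, candidate, tolerance):
    """Hypothetical prefilter: keep a candidate whose pixel sum lies within
    `tolerance` of the target's pixel sum (one reading of the idea in the text)."""
    return abs(patch_sum(target) - patch_sum(candidate)) <= tolerance


# Toy 4x4 patches (assumed data): a vertical edge and a flat gray patch.
edge = [[0, 0, 200, 200] for _ in range(4)]
flat = [[100, 100, 100, 100] for _ in range(4)]

same_sum = patch_sum(edge) == patch_sum(flat)      # both sums are 1600
kept = passes_sum_filter(edge, flat, tolerance=0)  # the filter keeps the flat patch
mismatch = ssd(edge, flat)                         # yet the content differs strongly
```

A scalar sum cannot distinguish the edge from the flat patch, so the flat candidate survives the prefilter even though its squared difference from the target is large.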

The objective performance of the algorithms is assessed from two aspects: quality and efficiency. Peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) [23] are used to evaluate the similarity between the completed image and the ground truth; higher PSNR and SSIM values indicate higher quality of the completed image. The running time of the algorithm is used to measure inpainting efficiency. The objective evaluation of the above images under each algorithm is shown in Tables 1-3.

Table 1. Peak signal-to-noise ratio (PSNR) (dB) comparison for Figures 7-11.
Table 2. Structural similarity index (SSIM) comparison for Figures 7-11.
Table 3. Running time (s) comparison for Figures 7-11.
Methods compared (table columns): CDD [4]; Criminisi's [10]; Liu's [17]; Zhou's [18]; Proposed.

From the perspective of objective evaluation, the inpainting quality reflected by PSNR and SSIM is basically consistent with the subjective visual perception: restored areas with more coherent and natural textures and structures present better visual effects and also obtain higher PSNR and SSIM values. As for inpainting efficiency, our method reduces the time consumption thanks to the subregion search and multi-patch match strategies. Although our method can fill up to six target patches in a single iteration, the running time is not reduced by a factor of six, as might be expected. This is because the inpainting progress slows down when the target patch size adaptively becomes smaller. We also notice that the acceleration becomes more obvious as the image size gets larger, since the average patch size also increases. Reference [17] tries to limit the search area by picking out those candidates whose sum is close to the target patch's, but this does not significantly reduce the calculation and yields limited acceleration. References [10] and [18] both adopt global search, and Reference [18] converts the determination of the optimal patch size into an extra optimization problem that requires additional iterations and decreases the overall efficiency.
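As a reference for the PSNR metric used in the objective evaluation above, the computation between a completed image and its ground truth can be sketched as follows. This is a minimal Python sketch assuming 8-bit grayscale images stored as nested lists and a peak value of 255; the function name `psnr` and the toy data are illustrative assumptions, and details of the actual evaluation (for example, the handling of color channels) are not specified in the text.

```python
import math


def psnr(reference, restored, peak=255.0):
    """PSNR in dB: 10 * log10(peak^2 / MSE), with MSE taken over all pixels."""
    diffs = [
        (r - s) ** 2
        for row_r, row_s in zip(reference, restored)
        for r, s in zip(row_r, row_s)
    ]
    mse = sum(diffs) / len(diffs)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)


# Toy 2x2 example (assumed data): every pixel is off by 5 gray levels, so MSE = 25.
ground_truth = [[100, 110], [120, 130]]
completed = [[105, 115], [125, 135]]
value = psnr(ground_truth, completed)
```

A smaller mean squared error gives a larger PSNR, which is why a higher value indicates a completed image closer to the ground truth.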

Batch Test
In order to evaluate the performance of these algorithms more broadly, our next experiment is conducted with more test samples. We randomly select 100 images from the public dataset Places2 [24] and resize them to 512 × 512. We then make 20 masks to randomly generate missing areas on those sample images. The aforementioned methods are tested, and their objective quantitative indicators, PSNR, SSIM, and running time, are plotted in Figures 12a, 13a and 14a, respectively. We also study the performance of these methods in relation to the percentage of the masked area over the above 100 samples. The samples are divided into five categories according to their mask ratios, with five intervals ranging from 0% to 25% in steps of 5%. Figures 12b, 13b and 14b show how the mask ratio affects the mean PSNR, SSIM, and running time of each method.
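The grouping of samples by mask ratio described above can be sketched as follows. This is a minimal Python sketch under stated assumptions: a mask is a nested list with 1 marking a missing pixel, the intervals are taken as half-open [0%, 5%), ..., with 25% assigned to the last interval (the text does not say how boundary values are handled), and all function names and sample values are hypothetical.

```python
def count_missing(mask):
    """Return (missing, total) pixel counts for a binary mask (1 = missing)."""
    total = sum(len(row) for row in mask)
    missing = sum(sum(row) for row in mask)
    return missing, total


def ratio_bin(missing, total, step_percent=5, n_bins=5):
    """Index of the interval containing the mask ratio, using integer
    arithmetic so that boundaries such as 15% are binned exactly."""
    if missing < 0 or 100 * missing > step_percent * n_bins * total:
        raise ValueError("mask ratio outside the 0%-25% range")
    return min((100 * missing) // (step_percent * total), n_bins - 1)


def mean_by_bin(samples, n_bins=5):
    """Mean score per mask-ratio bin; None for bins without samples.
    Each sample is a (missing, total, score) tuple."""
    groups = [[] for _ in range(n_bins)]
    for missing, total, score in samples:
        groups[ratio_bin(missing, total, n_bins=n_bins)].append(score)
    return [sum(g) / len(g) if g else None for g in groups]


# Hypothetical samples: (missing pixels, total pixels, PSNR in dB).
samples = [(2, 100, 30.0), (4, 100, 32.0), (12, 100, 28.0), (25, 100, 25.0)]
means = mean_by_bin(samples)
```

Working from pixel counts rather than floating-point ratios avoids rounding a boundary value into the wrong interval.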
It can be seen from Figures 12a and 13a that the PSNR and SSIM curves are very close and not easily distinguishable. The reason is that the restored images differ only in the masked areas, which are much smaller than the source region, and the source region is exactly the same as in the original image. However, it can still be seen that, for most images, the proposed method obtains relatively higher PSNR and SSIM values than the other methods. In fact, our method achieves the highest PSNR value in 78 of the 100 samples, the highest SSIM value in 67 of the 100 samples, and the shortest running time in 97 of the 100 samples. Figures 12b and 13b suggest that the CDD model obtains relatively better performance when handling smaller masks, but as the mask size increases, this diffusion-based method may struggle to restore the expected content and begins to fall behind the patch-based methods, whereas the proposed method achieves the best average performance in most cases. To make a clearer quantitative comparison of the overall performance of the five algorithms, the overall average values of the curves in Figures 12a, 13a and 14a are recorded in Table 4. Compared with Criminisi's algorithm [10], the average PSNR and SSIM of our algorithm are improved by 5.14% and 0.93%, respectively, and the efficiency is improved by 276%. Our results also outperform those of References [4,17,18].
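Per-sample win counts and relative improvements of the kind quoted above can be computed along the following lines. This is a minimal Python sketch; the function names and all numbers in the example are hypothetical and are not taken from Table 4, ties are assumed not to count as wins, and the relative improvement is assumed to be the difference from the baseline divided by the baseline.

```python
from statistics import mean


def count_wins(scores_by_method, method, higher_is_better=True):
    """Number of samples on which `method` alone has the best score."""
    pick = max if higher_is_better else min
    n_samples = len(scores_by_method[method])
    wins = 0
    for i in range(n_samples):
        column = {name: s[i] for name, s in scores_by_method.items()}
        best = pick(column.values())
        leaders = [name for name, v in column.items() if v == best]
        if leaders == [method]:
            wins += 1
    return wins


def relative_improvement(ours, baseline):
    """Improvement over the baseline in percent (assumed definition)."""
    return 100.0 * (ours - baseline) / baseline


# Hypothetical PSNR values (dB) for three samples and two methods.
psnr_scores = {
    "baseline": [30.0, 31.0, 29.0],
    "proposed": [31.0, 30.5, 30.0],
}
wins = count_wins(psnr_scores, "proposed")
gain = relative_improvement(mean(psnr_scores["proposed"]), mean(psnr_scores["baseline"]))
```

For running time, `higher_is_better=False` selects the shortest time per sample instead.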
In summary, the proposed approach has made effective improvements in dealing with the deficiencies of the previous patch-based inpainting algorithm, both subjectively and objectively.

Object Removal
Object removal is another typical application of image inpainting, in which an unwanted object in an image is replaced with a plausible background. Figure 15 shows examples where we attempt to remove unwanted objects from existing images using our method. Unlike damaged image restoration, this task has no ground truth as a reference, and there may be multiple possible filling solutions. Thus, objective quality evaluation is no longer applicable, and we only provide the visual results here. Note that as the masked area increases, some artifacts may still occur, such as the unnatural water texture in Figure 15f and the discontinuity of the curved edge of the lawn in Figure 15g.

Figure 15. Examples of object removal using our method. Top row: original images; middle row: masks for the unwanted part; bottom row: results for object removal.

Conclusions and Future Work
In this paper, a novel multi-patch-based image inpainting algorithm is proposed for filling missing areas in a more accurate and efficient way. Traditional patch-based methods use a fixed patch size and a global search to find the best sample patch, and can fill only one target patch in a single iteration, which is computationally expensive. To address these shortcomings, we provide a novel solution. We first introduce a measurement model to quantify the nonuniformity in an image; different patch sizes are then adaptively determined for regions with different nonuniformity levels, making the restored textures and structures more coherent and natural, so that inpainting quality is improved. Moreover, by fully utilizing the nonuniformity, the source region is divided into multiple non-overlapping subregions with different nonuniformity levels, and in each subregion the best sample patch is matched for a target patch. This reduces the match time in a single iteration, the total number of iterations, and the rate of mismatch. Experimental results show that our improved algorithm obtains better inpainting quality, both subjectively and objectively, while consuming less time.
In addition, in terms of inpainting quality, the results may become better if related works, such as improved priority models and match criteria, are combined with our method. In terms of inpainting efficiency, further improvements can be achieved based on the acceleration strategy described in this article. For example, the scalar nonuniformity value used in this article may contain limited information; other features, such as texture directions and colors, can also be introduced when narrowing the search area to achieve further acceleration.
Finally, our algorithm shares some limitations with other patch-based inpainting algorithms. These methods assume that the texture in the missing region can be found elsewhere in the source region. However, this assumption does not always hold: when the missing information is locally unique and similar structures cannot be found, these methods may struggle to reconstruct satisfactory results. Fortunately, deep learning techniques have been introduced in recent years, and hopefully they are capable of making up for this deficiency. For example, Nazeri et al. [25] utilized a two-stage GAN that has achieved impressive results. Image inpainting based on deep learning techniques might be a novel and robust approach, especially in complex cases, and is worth further study.