Sensors
  • Article
  • Open Access

5 November 2024

PDGrad: Guiding Diffusion Model for Reference-Based Blind Face Restoration with Pivot Direction Gradient Guidance

1 Department of Artificial Intelligence, Ajou University, Suwon 16499, Republic of Korea
2 Department of Electrical and Computer Engineering, Ajou University, Suwon 16499, Republic of Korea
* Author to whom correspondence should be addressed.
This article belongs to the Section Sensing and Imaging

Abstract

Reference-based blind face restoration (RefBFR) has gained considerable attention because it utilizes additional reference images to restore facial images degraded by unknown factors, making it particularly useful in real-world applications. Recently, guided diffusion models have demonstrated exceptional performance on this task without requiring training. They achieve this by integrating the gradients of multiple losses, where each loss reflects a different desired property of the additional external images. However, these approaches fail to consider potential conflicts between the gradients of the multiple losses, which can lead to sub-optimal results. To address this issue, we introduce Pivot Direction Gradient guidance (PDGrad), a novel gradient adjustment method for RefBFR within a guided diffusion framework. To this end, we first define the loss function based on both low-level and high-level features. For the loss at each feature level, both the coarsely restored image and the reference image are fully integrated. In cases of conflicting gradients, a pivot gradient is established for each level and the other gradients are aligned to it, ensuring that the strengths of both images are maximized. Additionally, if the magnitude of the adjusted gradient exceeds that of the pivot gradient, it is adaptively scaled according to the ratio between the two, placing greater emphasis on the pivot. Extensive experimental results on the CelebRef-HQ dataset show that the proposed PDGrad significantly outperforms competitive approaches both quantitatively and qualitatively.

1. Introduction

Blind face restoration (BFR) aims to restore a high-quality (HQ) face image from a low-quality (LQ) image that has been degraded by unknown and complex factors, such as downsampling, blur, noise, and compression artifacts. BFR is a highly ill-posed problem because the unknown degradation makes it difficult to determine a single solution for a given LQ image, leading to multiple possible outcomes. Since facial images are sensitive to even subtle differences, detailed information is essential for accurate restoration. By utilizing HQ reference images of the same individual, it becomes possible to achieve a level of image quality that is difficult to attain with BFR methods that do not use reference images. In this context, the reference-based blind face restoration (RefBFR) approach has gained significant attention for its unique ability to leverage additional reference images to improve restoration accuracy in practical scenarios. As a result, it can be applied to various applications, including face recognition [1,2], face detection [3,4,5] and age estimation [6,7].
Recently, several RefBFR studies [8,9,10,11,12,13] have been proposed based on deep learning [14]. Among these methods, PGDiff [13] has demonstrated outstanding performance in RefBFR using a training-free guided diffusion model [15]. It provides guidance to an unconditional diffusion model pre-trained for face image generation by incorporating the gradients of the losses during the reverse diffusion process. Its loss function is structured as a combination of multiple distances, each representing a specific desired attribute of the additional images. These include the coarsely restored image, generated using an external restorer such as CodeFormer [16], and the reference image, processed through the ArcFace network [1]. However, the guidance technique of PGDiff [13] may not be the optimal solution for RefBFR. This limitation arises from its gradients, which use low-level information solely from the coarsely restored image while relying on high-level information exclusively from the reference image. As a result, this approach fails to capture the crucial low-level details from the reference image and the high-level features from the coarsely restored image, leading to sub-optimal results. Moreover, the guidance derived from merely summing the gradients of multiple loss functions often leads to sub-optimal results, as these gradients may be incompatible, causing conflicts.
To address this problem, we propose a novel gradient adjustment method for RefBFR called Pivot Direction Gradient guidance (PDGrad) within a guided diffusion framework. Inspired by PCGrad [17], the essence of our method is to reduce gradient interference by directly modifying the conflicting gradients of the loss. To this end, we first define the loss function based on both low-level and high-level features. Similar to PGDiff [13], we utilize external information such as the coarsely restored image $y_c$, which is obtained using a pre-trained restoration method such as CodeFormer [16], and the reference image $y_r$. However, unlike PGDiff, we utilize both $y_c$ and $y_r$ to compute the loss at each level, because these two images capture complementary characteristics of face images. Figure 1 illustrates the complementary properties of $y_c$ and $y_r$. Generally, $y_c$ is well aligned with the LQ input, making it easy to compare with the prediction in terms of low-level information such as edges, color and shape. However, certain areas of $y_c$ are not restored effectively. In contrast, $y_r$ provides more reliable high-level information, such as identity, and is only partially aligned with the input, helping to compensate for the low-level details in regions where $y_c$ suffers significant degradation. Based on this observation, our approach efficiently and comprehensively leverages both images, enabling the effective integration of detailed and contextual information from both $y_c$ and $y_r$.
Figure 1. Example of restoration results for RefBFR. To obtain a coarsely restored image $y_c$ from the LQ input image, CodeFormer [16] is used as a restorer. Unlike PGDiff [13], which is significantly affected by the quality of $y_c$, the proposed PDGrad can generate images that mitigate this drawback.
In this situation, simply summing the gradients of the losses at each level can lead to conflicting gradients. To address this issue, we establish a proper pivot gradient for the loss at each feature level and align the other gradients to this pivot when conflicts arise. This approach allows us to fully harness the distinct advantages of both $y_c$ and $y_r$. Specifically, for the loss using low-level features, the gradient from the loss using $y_c$ is prioritized, and the gradient of the loss using $y_r$ is modified by projecting it onto the plane orthogonal to the gradient from $y_c$ when a conflict arises. Conversely, for the loss using high-level features, the gradient from the loss using $y_r$ is emphasized, and the gradient from the loss using $y_c$ is projected onto the plane orthogonal to the gradient from $y_r$ to avoid conflict and fully utilize the information in $y_r$. Additionally, if the magnitude of the adjusted gradient exceeds that of the pivot gradient, it is adaptively scaled according to the ratio between the two, placing greater emphasis on the pivot. As exemplified in Figure 1, the proposed PDGrad outperforms previous methods by preserving the properties of the prioritized image at each feature level while selectively extracting properties of the other image in a manner that aligns with the prioritized one.
In summary, the proposed method provides the following key contributions:
  • We propose a novel gradient adjustment method called PDGrad for RefBFR within a training-free guided diffusion framework.
  • The loss function of the proposed method consists of two components: low-level and high-level losses, where both the coarsely restored image and the reference image are fully incorporated.
  • Our proposed PDGrad establishes a proper pivot gradient for the loss at each level and adjusts other gradients to align with this pivot by modifying their direction and magnitude, thereby mitigating gradient interference.
  • Extensive comparisons show the superiority of our method against previous state-of-the-art RefBFR methods.
The remainder of this paper is organized as follows: Section 2 discusses previous works on blind face restoration. Section 3 provides a detailed explanation of the proposed PDGrad. In Section 4, we compare and analyze the experimental outcomes of several methods, including our proposed approach. Finally, Section 5 concludes the paper.

3. Proposed Method

In this section, we provide a preliminary overview of the guided diffusion models to aid understanding of our proposed method in Section 3.1. We then detail the overall process of the proposed method in Section 3.2. Section 3.3 describes the proposed loss function, designed to fully leverage both the coarsely restored image and the reference image at each feature level. Lastly, in Section 3.4, our proposed PDGrad is explained, which is developed to mitigate the conflicting gradient problem.

3.1. Preliminary

3.1.1. Denoising Diffusion Probabilistic Models

Diffusion models [33,34,35] are probabilistic generative models that have recently achieved remarkable success in the field of image generation. A diffusion model consists of a forward process and a reverse process. The forward process gradually adds Gaussian noise to an input image, while the reverse process removes the noise and reconstructs the image from the noisy state.
For an unconditional diffusion model [33] with discrete steps $T$, the forward process is defined by a transition distribution $q(x_t \mid x_{t-1})$ at each step $t \in \{1, 2, \ldots, T\}$ with a corresponding variance schedule $\beta_t$:
$$q(x_t \mid x_{t-1}) = \mathcal{N}\big(x_t;\ \sqrt{1-\beta_t}\,x_{t-1},\ \beta_t \mathbf{I}\big), \tag{1}$$
where $x_{t-1}$ and $x_t$ are samples at time $t-1$ and $t$, respectively, and $x_t$ is sampled using the reparameterization trick. In fact, $x_t$ can be sampled directly from $x_0$:
$$x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon, \tag{2}$$
where $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$, $\alpha_t = 1 - \beta_t$ and $\epsilon \sim \mathcal{N}(\epsilon; 0, \mathbf{I})$. The sampling process begins with pure Gaussian noise $x_T \sim \mathcal{N}(x_T; 0, \mathbf{I})$ and gradually performs denoising steps. In practice, the ideal denoising step is approximated by $p_\theta(x_{t-1} \mid x_t)$ [15] as follows:
$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\big(\mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\big), \tag{3}$$
where $\mu_\theta(x_t, t)$ represents the mean, obtained as a linear combination of $x_t$ and the estimated noise $\epsilon_\theta(x_t, t)$, while $\Sigma_\theta(x_t, t)$ denotes the variance, a constant dependent on the pre-defined $\beta_t$. From Equation (2), $\hat{x}_{0|t}$ can be directly computed from $\epsilon_\theta$ as:
$$\hat{x}_{0|t} = \frac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}. \tag{4}$$
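As a concrete illustration of Equation (4), the prediction $\hat{x}_{0|t}$ can be computed in a few lines of PyTorch. This is a minimal sketch, assuming a noise-prediction network `eps_model` standing in for $\epsilon_\theta$ and a precomputed schedule tensor `alpha_bar` for $\bar{\alpha}$; these names are illustrative and not taken from any released code.

```python
import torch

def predict_x0(x_t: torch.Tensor, t: int, eps_model, alpha_bar: torch.Tensor) -> torch.Tensor:
    """Estimate x̂_{0|t} from x_t and the predicted noise, as in Equation (4)."""
    eps = eps_model(x_t, t)          # ε_θ(x_t, t), predicted noise
    a_bar = alpha_bar[t]             # ᾱ_t, cumulative product of α_i
    return (x_t - torch.sqrt(1.0 - a_bar) * eps) / torch.sqrt(a_bar)
```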
ADM [15] introduces guided diffusion to control the sample generation of the diffusion model by leveraging an external classifier $p_\phi(c \mid x)$ that predicts the conditioning information $c$, such as a class label. By utilizing the classifier, the conditional distribution for the denoising step in Equation (3) is approximated as a Gaussian distribution and formulated as:
$$p_{\theta,\phi}(x_{t-1} \mid x_t, c) \approx \mathcal{N}\big(\mu_\theta(x_t, t) + s\,\Sigma_\theta(x_t, t)\,g,\ \Sigma_\theta(x_t, t)\big), \tag{5}$$
where $s$ denotes the strength of the classifier guidance. Here, the unconditional sampling distribution is guided towards the conditional target $c$ by the gradient $g$, which can be written as:
$$g = \nabla_x \log p_\phi(c \mid x)\big|_{x = \mu_\theta(x_t, t)}. \tag{6}$$
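In code, the classifier-guided step of Equations (5) and (6) amounts to shifting the predicted mean by the scaled gradient of the classifier log-likelihood evaluated at $\mu_\theta(x_t, t)$. The sketch below is only an illustration of this idea; `classifier` and the argument names are assumptions.

```python
import torch

def classifier_guided_mean(mu, sigma, class_label, classifier, s):
    """Shift μ_θ(x_t,t) by s·Σ_θ(x_t,t)·g, where g = ∇_x log p_φ(c|x) at x = μ_θ (Equations (5) and (6))."""
    mu = mu.detach().requires_grad_(True)
    log_prob = torch.log_softmax(classifier(mu), dim=-1)[:, class_label].sum()
    g = torch.autograd.grad(log_prob, mu)[0]     # classifier gradient g
    return (mu + s * sigma * g).detach()         # guided mean of Equation (5)
```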

3.1.2. Partial Guidance

Recently, PGDiff [13] has introduced a training-free method and utilized classifier guidance on an unconditional diffusion model for face restoration by leveraging a pre-trained network through a technique called partial guidance. Specifically, PGDiff [13] decomposes a high-quality face image into smooth semantics and high-frequency details. The smooth semantics of the face are provided by the pre-trained face restoration model, such as CodeFormer [16]. For the high-frequency details, PGDiff relies on the diffusion prior. In addition, by leveraging a reference image and incorporating identity loss into the partial guidance, PGDiff enhances the preservation of personal identity. This identity information is guided using a pre-trained face recognition network, such as ArcFace [1].

3.2. Overview of Our Method

Figure 2 illustrates an overview of the proposed process. Let $y \in \mathbb{R}^{H \times W \times C}$ be the given LQ image and $y_r \in \mathbb{R}^{H \times W \times C}$ be the reference HQ image. Our goal is to predict an HQ image $x_0 \in \mathbb{R}^{H \times W \times C}$ by adjusting conflicting gradients of the loss within a guided diffusion framework [44].
Figure 2. Overview of the proposed method. During the sampling process, the gradients are carefully adjusted by our PDGrad technique to prevent conflicts between gradients. This ensures that the diffusion process is efficiently guided, optimizing the quality and stability of the generated images.
To this end, following PGDiff [13], a coarsely restored image $y_c \in \mathbb{R}^{H \times W \times C}$ is first obtained by applying a pre-trained face restoration model $f(\cdot)$:
$$y_c = f(y). \tag{7}$$
However, unlike PGDiff [13], which begins the reverse process from pure Gaussian noise, our method starts from $x_\tau$, sampled from $y_c$, to improve initialization and reduce the number of sampling steps [49]. As in Equation (2), $x_\tau$ is defined as:
$$x_\tau = \sqrt{\bar{\alpha}_\tau}\,y_c + \sqrt{1-\bar{\alpha}_\tau}\,\epsilon, \tag{8}$$
where $\tau \in [0, T]$ is a hyperparameter that determines the starting point of the reverse process, $\bar{\alpha}_\tau = \prod_{i=1}^{\tau} \alpha_i$ and $\epsilon \sim \mathcal{N}(\epsilon; 0, \mathbf{I})$. Then, the reverse diffusion process is iteratively performed using the guided diffusion model [33] as follows:
$$x_{t-1} \sim \mathcal{N}\big(\mu_\theta(x_t, t) - s\,\Sigma_\theta(x_t, t)\,\nabla_{\hat{x}_{0|t}} \mathcal{L}_{total},\ \Sigma_\theta(x_t, t)\big), \tag{9}$$
where $s$ represents the guidance strength and $\Sigma_\theta(x_t, t)$ is the time-dependent constant defined in Equation (5). $\hat{x}_{0|t}$ is the predicted image at timestep $t$ (Equation (4)), and $\nabla_{\hat{x}_{0|t}} \mathcal{L}_{total}$ denotes the gradient of the total loss with respect to $\hat{x}_{0|t}$. Details of $\mathcal{L}_{total}$ and the computation of the corresponding gradient $\nabla_{\hat{x}_{0|t}} \mathcal{L}_{total}$ are discussed in the following.
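One way to realize Equation (9) in practice is to let $\hat{x}_{0|t}$ require gradients, evaluate the total loss on it, and backpropagate to obtain $\nabla_{\hat{x}_{0|t}} \mathcal{L}_{total}$. The sketch below reuses the hypothetical `predict_x0` helper from Section 3.1.1 and assumes a callable `total_loss`; it is an illustration, not the authors' implementation.

```python
import torch

def guided_reverse_step(x_t, t, mu, sigma, eps_model, alpha_bar, total_loss, s):
    """One reverse step of Equation (9): sample x_{t-1} ~ N(μ − s·Σ·∇_{x̂0|t} L_total, Σ)."""
    x0_hat = predict_x0(x_t, t, eps_model, alpha_bar).detach().requires_grad_(True)
    loss = total_loss(x0_hat)                        # L_total(x̂_{0|t}; y_c, y_r)
    grad = torch.autograd.grad(loss, x0_hat)[0]      # ∇_{x̂_{0|t}} L_total
    guided_mean = mu - s * sigma * grad
    return guided_mean + torch.sqrt(sigma) * torch.randn_like(guided_mean)  # Σ is a variance
```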

3.3. Loss Function

One key goal of our method is to decompose the gradients from both the coarsely restored image and the reference image into their respective low-level and high-level components and then use these as guidance. This enables our approach to effectively leverage both low-level and high-level features from the coarsely restored image and the reference image.
Our total loss $\mathcal{L}_{total}$ in Equation (9) at an arbitrary diffusion timestep $t$ is formulated as:
$$\mathcal{L}_{total} = \mathcal{L}_{low} + \mathcal{L}_{high}, \tag{10}$$
where $\mathcal{L}_{low}$ and $\mathcal{L}_{high}$ represent the losses for low-level and high-level features, respectively. For ease of notation, we omit the denoising timestep $t$. The former focuses on preserving low-level information such as face shape, edges and color, while the latter promotes high-level information such as facial identity.
The proposed loss is computed using the coarsely restored image $y_c$, the reference image $y_r$ and the predicted image $\hat{x}_{0|t}$ (Equation (4)) at an arbitrary diffusion time $t$. To explicitly incorporate face information from $y_r$ and $y_c$ into the diffusion process, features at various levels are extracted from the pre-trained ArcFace [1] and VGG16 [50] networks. As discussed in [51], it is well established that the intermediate feature maps of the early layers of a well-trained network capture the low-level information of the input image, while the later layers capture higher-level features.
Specifically, let $\{u_i(z)\}_{i=1}^{4}$ represent a set of features extracted using ArcFace [1], a face recognition network that determines whether two images share the same identity. Here, $u_i(z)$ denotes the feature extracted from the $i$-th intermediate layer of the ArcFace network for an input image $z \in \mathbb{R}^{H \times W \times C}$. For the low-level loss, we additionally use the VGG16 [50] network trained on a dataset that reflects human perceptual similarity in order to better match human preferences. Let $\{v_i(z)\}_{i=1}^{5}$ represent a set of features extracted using VGG16 [50], where $v_i(z)$ denotes the feature extracted from the $i$-th intermediate layer of VGG16 for an input image $z \in \mathbb{R}^{H \times W \times C}$. The specific layers used for $u$ and $v$ are given in the implementation details in Section 4.1. We now explain each loss in detail in the following subsections.

3.3.1. Low-Level Loss

The proposed low-level loss $\mathcal{L}_{low}$ is defined as the sum of two losses:
$$\mathcal{L}_{low} = \mathcal{L}_{low}^{c} + \mathcal{L}_{low}^{r}, \tag{11}$$
where $\mathcal{L}_{low}^{c}$ measures the similarity between the low-level features of $\hat{x}_{0|t}$ and $y_c$, and $\mathcal{L}_{low}^{r}$ ensures the alignment of the low-level features between $\hat{x}_{0|t}$ and $y_r$. Both are defined using the pre-trained ArcFace [1] and VGG16 [50] networks. Concretely, $\mathcal{L}_{low}^{c}$ is defined as:
$$\mathcal{L}_{low}^{c} = \sum_{i=1}^{3} d_{arc}\big(u_i(\hat{x}_{0|t}), u_i(y_c)\big) + \sum_{i=1}^{5} d_{vgg}\big(v_i(\hat{x}_{0|t}), v_i(y_c)\big). \tag{12}$$
Here, $d_{arc}(\cdot,\cdot)$ [1] is defined as:
$$d_{arc}(j_1, j_2) = 1 - \frac{j_1 \cdot j_2}{\lVert j_1 \rVert\,\lVert j_2 \rVert}, \tag{13}$$
where $j_1$ and $j_2$ are the input vectors whose distance is measured. The distance function $d_{vgg}(\cdot,\cdot)$ [50] is defined as:
$$d_{vgg}(z_1, z_2) = \sum_{l} \frac{1}{H_l W_l} \sum_{h,w} \big\lVert w_l^{hw} \odot \big(z_{1,hw}^{l} - z_{2,hw}^{l}\big) \big\rVert_2^2, \tag{14}$$
where $z_1$ and $z_2$ are the input images being compared, and $z_{1,hw}^{l}$ and $z_{2,hw}^{l}$ represent the feature values of the feature maps in the $l$-th layer at spatial location $(h, w)$ for $z_1$ and $z_2$, respectively. $H_l$ and $W_l$ denote the height and width of the feature map at the $l$-th layer, $w_l^{hw}$ is a weighting factor for the feature difference at spatial location $(h, w)$, and $\odot$ denotes element-wise multiplication.
Similarly, to enforce the alignment of the low-level features between $\hat{x}_{0|t}$ and $y_r$, $\mathcal{L}_{low}^{r}$ is formulated as:
$$\mathcal{L}_{low}^{r} = \sum_{i=1}^{3} d_{arc}\big(u_i(\hat{x}_{0|t}), u_i(y_r)\big) + \sum_{i=1}^{5} d_{vgg}\big(v_i(\hat{x}_{0|t}), v_i(y_r)\big). \tag{15}$$
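To make Equations (12)-(15) concrete, the snippet below sketches the cosine distance $d_{arc}$, a simplified (unweighted) $d_{vgg}$, and the resulting low-level loss term. The feature extractors `arc_feats` and `vgg_feats` are assumed callables returning lists of feature maps from the layers listed in Section 4.1; the per-location weighting of Equation (14) is omitted, so treat this only as an illustrative sketch.

```python
import torch
import torch.nn.functional as F

def d_arc(j1: torch.Tensor, j2: torch.Tensor) -> torch.Tensor:
    """Cosine distance of Equation (13), applied to flattened feature vectors."""
    return 1.0 - F.cosine_similarity(j1.flatten(1), j2.flatten(1), dim=1).mean()

def d_vgg(f1: torch.Tensor, f2: torch.Tensor) -> torch.Tensor:
    """Spatially averaged squared feature difference (Equation (14), weights omitted)."""
    return ((f1 - f2) ** 2).sum(dim=1).mean()

def low_level_loss(x0_hat, target, arc_feats, vgg_feats):
    """One term of Equation (11): L_low^c with target=y_c, or L_low^r with target=y_r."""
    arc_terms = sum(d_arc(a, b) for a, b in zip(arc_feats(x0_hat)[:3], arc_feats(target)[:3]))
    vgg_terms = sum(d_vgg(a, b) for a, b in zip(vgg_feats(x0_hat), vgg_feats(target)))
    return arc_terms + vgg_terms
```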

3.3.2. High-Level Loss

The proposed high-level loss $\mathcal{L}_{high}$ is designed to measure the identity similarity between $\hat{x}_{0|t}$ and $y_r$ as well as between $\hat{x}_{0|t}$ and $y_c$. Accordingly, $\mathcal{L}_{high}$ comprises two loss terms:
$$\mathcal{L}_{high} = \mathcal{L}_{high}^{c} + \mathcal{L}_{high}^{r}, \tag{16}$$
where $\mathcal{L}_{high}^{c}$ and $\mathcal{L}_{high}^{r}$ are defined as follows:
$$\mathcal{L}_{high}^{c} = d_{arc}\big(u_4(\hat{x}_{0|t}), u_4(y_c)\big), \tag{17}$$
$$\mathcal{L}_{high}^{r} = d_{arc}\big(u_4(\hat{x}_{0|t}), u_4(y_r)\big). \tag{18}$$
Here, $d_{arc}(\cdot,\cdot)$ refers to the cosine distance defined in Equation (13).

3.4. The Proposed PDGrad

Inspired by PCGrad [17], we propose a gradient adjustment method for guiding diffusion models in RefBFR. Similar to PGDiff [13], the unconditional diffusion model is guided using classifier guidance. In this context, the gradient of each loss in Equation (10) acts as a specific guidance, defined by:
$$g_{total} = g_{low} + g_{high}, \tag{19}$$
where $g_{total}$, $g_{low}$ and $g_{high}$ denote the gradients $\nabla_{\hat{x}_{0|t}} \mathcal{L}_{total}$, $\nabla_{\hat{x}_{0|t}} \mathcal{L}_{low}$ and $\nabla_{\hat{x}_{0|t}} \mathcal{L}_{high}$, respectively. Unlike PCGrad [17], we select a pivot gradient for each loss and adjust the other gradient by projecting it onto the normal plane of the pivot gradient when a conflict occurs. This prevents the interfering component from being applied along the pivot direction.
From Equation (11), $g_{low}$ consists of $g_{low}^{c} = \nabla_{\hat{x}_{0|t}} \mathcal{L}_{low}^{c}$ and $g_{low}^{r} = \nabla_{\hat{x}_{0|t}} \mathcal{L}_{low}^{r}$, where the former is defined using $y_c$ and the latter using $y_r$. When the angle between $g_{low}^{c}$ and $g_{low}^{r}$ is larger than $90^{\circ}$, that is, when their cosine similarity is negative, the two gradients conflict [17]. In this case, the resultant gradient $g_{low}$ would be sub-optimal as guidance for the guided diffusion. Thus, the proposed gradient $g_{low}^{pd}$, which replaces $g_{low}$, is defined as follows:
$$g_{low}^{pd} = \begin{cases} g_{low}^{c} + g_{low}^{r} & \text{if } g_{low}^{c} \cdot g_{low}^{r} \geq 0, \\ g_{low}^{c} + k_l \cdot \hat{g}_{low}^{r} & \text{otherwise}. \end{cases} \tag{20}$$
When the two gradients $g_{low}^{c}$ and $g_{low}^{r}$ do not conflict, $g_{low}^{pd}$ is the same as the original $g_{low}$. When they conflict, we hypothesize that $y_c$ contains more reliable information for low-level features. Thus, as shown in Figure 3, we set the pivot direction to $g_{low}^{c}$ and define $\hat{g}_{low}^{r}$ by projecting $g_{low}^{r}$ onto the normal plane of $g_{low}^{c}$, which is formulated as:
$$\hat{g}_{low}^{r} = g_{low}^{r} - \frac{g_{low}^{r} \cdot g_{low}^{c}}{\lVert g_{low}^{c} \rVert^{2}}\, g_{low}^{c}. \tag{21}$$
Figure 3. The proposed PDGrad. We illustrate an example of calculating the proposed gradient $g^{pd}$, where the two input gradients are denoted as $g_1$ and $g_2$. The pivot gradient, denoted as $g_1$ and represented by the red arrow, is chosen without loss of generality. In (a), when the gradients $g_1$ and $g_2$ do not conflict, the resultant gradient $g^{pd}$ is defined as the simple sum of the two gradients, expressed as $g^{pd} = g_1 + g_2$. In (b), $g_1$ and $g_2$ exhibit conflicting directions. In this case, $g_2$ is projected onto the normal plane of the pivot gradient, resulting in $\hat{g}_2$, where the magnitude of $\hat{g}_2$ is smaller than that of $g_1$. Then, $g^{pd}$ is defined as $g^{pd} = g_1 + \hat{g}_2$. In (c), if the magnitude of $\hat{g}_2$ is larger than that of the pivot gradient $g_1$, the magnitude of $\hat{g}_2$ is adjusted by a scaling factor $k$, ensuring that it does not exceed that of $g_1$. This results in $g^{pd} = g_1 + k \cdot \hat{g}_2$.
The weighting factor $k_l$ in Equation (20) is defined by:
$$k_l = \begin{cases} 1 & \text{if } \lVert g_{low}^{c} \rVert \geq \lVert \hat{g}_{low}^{r} \rVert, \\ \dfrac{\lVert g_{low}^{c} \rVert}{\lVert \hat{g}_{low}^{r} \rVert} & \text{otherwise}. \end{cases} \tag{22}$$
Note that when the norm of the projected gradient $\hat{g}_{low}^{r}$ is larger than that of $g_{low}^{c}$, we clip the norm of $\hat{g}_{low}^{r}$ by controlling the value of $k_l$. This weighting factor helps our model focus on the low-level features of $y_c$.
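The projection and adaptive scaling of Equations (20)-(22) can be expressed compactly as a single routine; the sketch below treats each gradient as a tensor and uses `g_pivot` for the pivot (here $g_{low}^{c}$) and `g_other` for the gradient being adjusted (here $g_{low}^{r}$). As described next, the high-level case reuses the same routine with the roles of the two gradients swapped. This is a hedged sketch, not the released implementation.

```python
import torch

def pdgrad_combine(g_pivot: torch.Tensor, g_other: torch.Tensor) -> torch.Tensor:
    """Combine two gradients as in Equations (20)-(22): keep the pivot, and when the two
    gradients conflict, project the other onto the pivot's normal plane and clip its norm."""
    p, o = g_pivot.flatten(), g_other.flatten()
    if torch.dot(p, o) >= 0:                                   # no conflict: plain sum
        return g_pivot + g_other
    o_hat = o - (torch.dot(o, p) / p.norm().pow(2)) * p        # projection onto normal plane of pivot
    k = torch.clamp(p.norm() / o_hat.norm().clamp_min(1e-12), max=1.0)  # adaptive scaling factor
    return g_pivot + (k * o_hat).view_as(g_other)
```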
Similarly, $g_{high}$ consists of $g_{high}^{c} = \nabla_{\hat{x}_{0|t}} \mathcal{L}_{high}^{c}$ and $g_{high}^{r} = \nabla_{\hat{x}_{0|t}} \mathcal{L}_{high}^{r}$. For the high-level features, $y_r$ contains more suitable information than $y_c$. In this case, we define the modified gradient $g_{high}^{pd}$, which replaces $g_{high}$, as:
$$g_{high}^{pd} = \begin{cases} g_{high}^{c} + g_{high}^{r} & \text{if } g_{high}^{c} \cdot g_{high}^{r} \geq 0, \\ k_h \cdot \hat{g}_{high}^{c} + g_{high}^{r} & \text{otherwise}. \end{cases} \tag{23}$$
As shown in Figure 3, we set the pivot direction for $g_{high}$ to $g_{high}^{r}$ and define $\hat{g}_{high}^{c}$ by projecting $g_{high}^{c}$ onto the normal plane of $g_{high}^{r}$, which is formulated as:
$$\hat{g}_{high}^{c} = g_{high}^{c} - \frac{g_{high}^{c} \cdot g_{high}^{r}}{\lVert g_{high}^{r} \rVert^{2}}\, g_{high}^{r}. \tag{24}$$
The weighting factor $k_h$ in Equation (23) is defined by:
$$k_h = \begin{cases} 1 & \text{if } \lVert g_{high}^{r} \rVert \geq \lVert \hat{g}_{high}^{c} \rVert, \\ \dfrac{\lVert g_{high}^{r} \rVert}{\lVert \hat{g}_{high}^{c} \rVert} & \text{otherwise}. \end{cases} \tag{25}$$
When the norm of the projected gradient $\hat{g}_{high}^{c}$ is larger than that of $g_{high}^{r}$, we clip the norm of $\hat{g}_{high}^{c}$ by controlling the value of $k_h$.
Finally, the total gradient $g_{total}^{pd}$ for guiding the diffusion model is obtained by summing $g_{low}^{pd}$ in Equation (20) and $g_{high}^{pd}$ in Equation (23):
$$g_{total}^{pd} = g_{low}^{pd} + g_{high}^{pd}. \tag{26}$$
The overall pipeline of the proposed PDGrad is described in Algorithm 1.
Algorithm 1 Restoration process of PDGrad
1: Input: a low-quality image $y$, a reference image $y_r$, a diffusion model $(\mu_\theta(x_t, t), \Sigma_\theta(x_t, t))$, a face restorer $f(\cdot)$, a gradient scale $s$ and the initial timestep $\tau$
2: Output: restored image $x_0$
3: $y_c \leftarrow f(y)$
4: Sample $x_\tau$ from $q(x_\tau \mid y_c)$ according to Equation (8)
5: for $t = \tau$ to $1$ do
6:     $\mu, \Sigma \leftarrow \mu_\theta(x_t, t), \Sigma_\theta(x_t, t)$
7:     $\hat{x}_{0|t} \leftarrow \dfrac{x_t - \sqrt{1-\bar{\alpha}_t}\,\epsilon_\theta(x_t, t)}{\sqrt{\bar{\alpha}_t}}$
8:     Compute the gradient $g_{low}^{pd}$ according to Equation (20)
9:     Compute the gradient $g_{high}^{pd}$ according to Equation (23)
10:    $g_{total}^{pd} \leftarrow g_{low}^{pd} + g_{high}^{pd}$
11:    $x_{t-1} \leftarrow$ sample from $\mathcal{N}\big(\mu - s\,\Sigma\, g_{total}^{pd},\ \Sigma\big)$
12: end for
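Putting Algorithm 1 together, a PyTorch-style sketch of the restoration loop might look as follows. It reuses the hypothetical helpers introduced above (`predict_x0` and `pdgrad_combine`) and assumes per-term loss callables passed in as `loss_fns`; it is meant to show the flow of the method, not the authors' exact implementation.

```python
import torch

def restore(y, y_r, eps_model, posterior, restorer, alpha_bar, loss_fns, s=0.1, tau=700):
    """Restoration process of PDGrad (Algorithm 1), written as an illustrative sketch.
    loss_fns = (L_low^c, L_low^r, L_high^c, L_high^r), each a callable of (x0_hat, y_c, y_r)."""
    y_c = restorer(y)                                                  # step 3: coarse restoration
    x_t = torch.sqrt(alpha_bar[tau]) * y_c \
        + torch.sqrt(1.0 - alpha_bar[tau]) * torch.randn_like(y_c)     # step 4: sample x_tau
    for t in range(tau, 0, -1):                                        # steps 5-12
        mu, sigma = posterior(x_t, t)                                  # μ_θ(x_t, t), Σ_θ(x_t, t)
        x0_hat = predict_x0(x_t, t, eps_model, alpha_bar).detach().requires_grad_(True)
        g_low_c, g_low_r, g_high_c, g_high_r = [
            torch.autograd.grad(fn(x0_hat, y_c, y_r), x0_hat)[0] for fn in loss_fns]
        g_low = pdgrad_combine(g_low_c, g_low_r)                       # pivot: coarse image (Eq. (20))
        g_high = pdgrad_combine(g_high_r, g_high_c)                    # pivot: reference image (Eq. (23))
        g_total = g_low + g_high                                       # Equation (26)
        x_t = (mu - s * sigma * g_total
               + torch.sqrt(sigma) * torch.randn_like(mu)).detach()    # step 11
    return x_t
```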

4. Experiments

As mentioned in Li et al. [10], RefBFR methods generally outperform single-image BFR methods, since the reference image contains rich textures and fine details lost in the given LQ image [52,53]. Hence, in this paper, we mainly compare our proposed method with recent reference-based BFR methods such as ASFFNet [11], DMDNet [12] and PGDiff [13]. Additionally, we report comparisons with single-image BFR methods, including VQFR [39], CodeFormer [16], RestoreFormer++ [40] and DifFace [44]. All experiments in this paper are conducted using the official models with pre-trained weights provided by the authors.
In this section, we provide the details of the experimental settings in Section 4.1. In Sections 4.2 and 4.3, we compare our proposed method with state-of-the-art BFR methods through quantitative and qualitative analyses, respectively. Section 4.4 presents the evaluation results of the ablation study to assess the effect of each component of our proposed approach.

4.1. Experimental Setting

4.1.1. Implementation Details

Following PGDiff [13], we utilize the pre-trained diffusion model provided by Yue et al. [44] for a fair comparison. This model is an unconditional diffusion network trained on the FFHQ dataset [54] and supports an image resolution of $512 \times 512$. In our RefBFR process, we leverage CodeFormer [16] as the pre-trained face restoration model to obtain the coarsely restored image from a given LQ image, denoted as $f(\cdot)$ in Equation (7). It is noteworthy that the proposed method employs off-the-shelf pre-trained networks that are readily accessible online, without the need for additional training. The proposed framework is implemented using PyTorch [55] and the inference process is executed on a single NVIDIA GeForce RTX 3090 GPU. Empirically, we set $\tau$ for the initial guidance step to 700 and the gradient scale $s$ to 0.1. The intermediate features $\{u_i(z)\}_{i=1}^{3}$ are extracted from layers conv1_1, conv2_2 and conv3_2 of ArcFace [1], and $u_4(z)$ is the final output feature of ArcFace [1]. The intermediate features $\{v_i(z)\}_{i=1}^{5}$ are extracted from layers conv1_2, conv2_2, conv3_3, conv4_3 and conv5_2 of VGG16 [50].
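For reference, the hyperparameters and feature layers listed above could be gathered in a single configuration; the structure and key names below are illustrative, not taken from the released code.

```python
# Illustrative configuration mirroring the implementation details (names are assumptions).
PDGRAD_CONFIG = {
    "image_size": 512,
    "initial_timestep_tau": 700,
    "gradient_scale_s": 0.1,
    "restorer": "CodeFormer",
    "arcface_low_level_layers": ["conv1_1", "conv2_2", "conv3_2"],   # u_1 .. u_3
    "arcface_identity_feature": "final_output",                      # u_4
    "vgg16_layers": ["conv1_2", "conv2_2", "conv3_3", "conv4_3", "conv5_2"],  # v_1 .. v_5
}
```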

4.1.2. Datasets

To evaluate our method, we use CelebRef-HQ dataset [12], which comprises a total of 10,555 HQ face images. This dataset includes 1005 distinct identities and each individual has between 2 and 21 images. Specifically, for our evaluation, we randomly select two images from each of the 1005 identities in the dataset. Then, one image is designated as the ground-truth HQ image, while the other image serves as the reference HQ image. Following the degradation model specified in recent BFR studies [16,27,39,56], the LQ images are synthesized as follows:
$$y = \Big[\big[(x_{GT} \circledast k_\sigma)\downarrow_{r} + n_\delta\big]_{\mathrm{JPEG}_q}\Big]\uparrow_{r}, \tag{27}$$
where the ground-truth HQ image $x_{GT}$ is first blurred with a Gaussian kernel $k_\sigma$ and then downsampled by a scale factor $r$. Next, Gaussian noise $n_\delta$ is added, followed by JPEG compression with quality factor $q$. Lastly, the LQ image $y$ is resized back to $512 \times 512$. In this paper, we randomly sample $\sigma$, $r$, $\delta$ and $q$ from $[0.1, 15]$, $[24, 40]$, $[0, 20]$ and $[30, 100]$, respectively.
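A sketch of this degradation model using OpenCV and NumPy is shown below; the sampling ranges follow the text, while the kernel size, interpolation modes and the interpretation of the noise level on the 0-255 scale are assumptions.

```python
import cv2
import numpy as np

def synthesize_lq(x_gt: np.ndarray, rng: np.random.Generator = None) -> np.ndarray:
    """Synthesize an LQ image from a 512x512 uint8 HQ image following Equation (27)."""
    rng = rng or np.random.default_rng()
    sigma = rng.uniform(0.1, 15.0)            # blur standard deviation σ
    r = rng.uniform(24.0, 40.0)               # downsampling scale factor r
    delta = rng.uniform(0.0, 20.0)            # Gaussian noise level δ (0-255 scale, assumed)
    q = int(rng.uniform(30, 100))             # JPEG quality factor q
    h, w = x_gt.shape[:2]
    x = cv2.GaussianBlur(x_gt, (0, 0), sigma)                                        # x_GT ⊛ k_σ
    x = cv2.resize(x, (max(1, int(w / r)), max(1, int(h / r))),
                   interpolation=cv2.INTER_LINEAR)                                   # ↓_r
    x = np.clip(x.astype(np.float32) + rng.normal(0, delta, x.shape), 0, 255)        # + n_δ
    _, buf = cv2.imencode(".jpg", x.astype(np.uint8),
                          [int(cv2.IMWRITE_JPEG_QUALITY), q])                        # JPEG_q
    x = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    return cv2.resize(x, (w, h), interpolation=cv2.INTER_LINEAR)                     # ↑_r to 512x512
```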

4.1.3. Evaluation Metrics

For a quantitative evaluation, we employ PSNR, SSIM [57] and NIQE [58], which are commonly used metrics in the image restoration field. Additionally, we measure LPIPS [50] to assess perceptual similarity between ground-truth images and restored images. Furthermore, FID [59] is used to quantify the distance between the feature distributions of HQ face datasets and restored images. We employ the CelebRef-HQ dataset [12] to measure the feature distributions of HQ face dataset. To measure the similarity in facial identity between the ground-truth images and the restored images, we compute the angle between their embedding vectors using ArcFace [1], denoted as Deg [39]. We also compare the landmark distance (LMD) [39], which is calculated as the average L2 distance of 98 facial landmarks predicted using Awing [60] between the ground-truth images and the restored images.
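As an illustration of the identity metric, Deg can be computed as the angle in degrees between the ArcFace embeddings of the ground-truth and restored images; the sketch below assumes a hypothetical `arcface_embed` function that maps an image to a single embedding vector.

```python
import numpy as np

def identity_degree(gt_img, restored_img, arcface_embed) -> float:
    """Deg: angle (in degrees) between the ArcFace embeddings of the GT and restored images."""
    e1, e2 = arcface_embed(gt_img), arcface_embed(restored_img)
    cos = float(np.dot(e1, e2) / (np.linalg.norm(e1) * np.linalg.norm(e2)))
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
```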

4.2. Quantitative Comparison

The quantitative comparisons of various RefBFR methods are shown in Table 1. Here, ASFFNet [11] and DMDNet [12] are face restoration methods that incorporate landmark estimation procedures. However, for certain LQ input images, these methods struggle with accurate landmark detection, preventing them from generating results and rendering testing infeasible. Therefore, to ensure a fair comparison between methods, we conduct experiments on 662 LQ input images, a subset of the CelebRef-HQ dataset [12], for which testing with ASFFNet [11] and DMDNet [12] is feasible, as shown in Table 1. The results demonstrate that the proposed PDGrad achieves better performance than other methods in terms of LPIPS, Deg, LMD and NIQE. PDGrad achieves a $|0.4508 - 0.4437|/0.4508 = 1.57\%$ better result in terms of LPIPS compared to the second-best competitive method, indicating that it consistently restores faces with perceptual quality closest to the ground truth. For fidelity, the proposed PDGrad achieves the highest performance in terms of Deg and LMD, with improvements over the second-best method of $|55.53 - 53.1|/55.53 = 4.38\%$ and $|6.25 - 6.01|/6.25 = 3.84\%$, respectively. This demonstrates that our method can accurately recover facial identity similarity and details. Additionally, in terms of image quality metrics such as NIQE, the proposed PDGrad outperforms DMDNet [12] by $|3.85 - 3.38|/3.85 = 12.21\%$ and produces more realistic details. This can be attributed to the incorporation of (1) the proposed loss function, which considers perceptual quality and identity preservation, and (2) the proposed gradient adjustment procedure, which effectively handles conflicts between gradients.
Table 1. Quantitative comparison of reference-based BFR methods on a subset of the CelebRef-HQ dataset [12]. The symbol ↑ in parentheses represents that the higher the value, the better. Similarly, the symbol ↓ indicates that the lower the value, the better. We highlight the best model and the second best model with the bold and underline, respectively.
In Table 2, we further compare the proposed PDGrad with PGDiff [13] using the full CelebRef-HQ dataset [12], which consists of 1005 LQ input images. Compared to PGDiff, PDGrad outperforms on the LPIPS, Deg, LMD and NIQE metrics. Notably, we achieve improvements of $|56.68 - 53.90|/56.68 = 4.9\%$ and $|4.32 - 3.39|/4.32 = 21.53\%$ in Deg and NIQE, respectively. This indicates that PDGrad generates face images that are more faithful to the ground-truth identity while maintaining high image quality during the guided diffusion process.
Table 2. Quantitative comparison with PGDiff on the CelebRef-HQ dataset [12]. The symbol ↑ in parentheses represents that the higher the value, the better. Similarly, the symbol ↓ indicates that the lower the value, the better. We highlight the best model and the second best model with the bold and underline, respectively.
Table 3 provides a quantitative comparison between the proposed PDGrad and single-image BFR methods on the full CelebRef-HQ dataset [12]. The results demonstrate that our method achieves better or at least comparable performance in terms of LPIPS, Deg, LMD, NIQE and FID. While CodeFormer [16] achieves a higher perceptual quality than other methods according to LPIPS, its identity similarity is significantly compromised according to Deg. Although our proposed PDGrad is slightly worse in LPIPS compared to CodeFormer [16], it excels at preserving identity similarity, as measured by Deg. Notably, our method demonstrates a significant improvement of $|68.3 - 53.9|/68.3 = 21.08\%$ in Deg compared to the second-best model.
Table 3. Quantitative comparison of single image BFR methods on the CelebRef-HQ dataset [12]. The symbol ↑ in parentheses represents that the higher the value, the better. Similarly, the symbol ↓ indicates that the lower the value, the better. We highlight the best model and the second best model with the bold and underline, respectively.

4.3. Qualitative Comparison

Visual comparisons of RefBFR and single-image BFR methods are presented in Figure 4 and Figure 5, respectively. In each figure, the even-numbered rows provide close-up views that highlight specific details in the areas indicated by red rectangles in the LQ input of the corresponding images in the odd-numbered rows. Figure 4 demonstrates that ASFFNet [11] and DMDNet [12] fail to preserve the identity and to produce a proper facial shape. Specifically, components such as the eyes are mostly restored, but other components, such as the nose and mouth, remain poorly restored. PGDiff [13] is able to produce high-quality images, but it lacks the facial details needed to preserve identity. Unlike the other methods, our PDGrad generates high-quality images with high fidelity in skin texture, wrinkles and eye shape.
Figure 4. Qualitative comparison of RefBFR methods on the CelebRef-HQ dataset [12]. For a better comparison of visual quality, zooming-in is recommended. From left to right, LQ input image, reference image, ASFFNet [11], DMDNet [12], PGDiff [13], the proposed PDGrad and ground truth (GT).
Figure 5. Qualitative comparison of single-image BFR methods on the CelebRef-HQ dataset [12]. For a better comparison of visual quality, zooming-in is recommended. From left to right, LQ input image, reference image, VQFR [39], CodeFormer [16], RestoreFormer++ [40], DifFace [44], PMRF [41], the proposed PDGrad and ground truth (GT).
In Figure 5, VQFR [39] and RestoreFormer++ [40] fail to produce satisfactory restoration results due to the severe degradations; their results contain artifacts and lack facial details. CodeFormer [16], DifFace [44] and PMRF [41] produce high-quality images, but they also lack facial details important for preserving identity. In contrast, the proposed PDGrad exhibits superior performance over all other methods in restoring sharp and fine details of the face (e.g., the eyes, nose and mouth). Moreover, the proposed method generates identity-preserving results consistent with the GT while also improving the perceptual quality of the image.

4.4. Ablation Study

We conducted ablation studies to investigate the impact of each component of the proposed PDGrad. First, Table 4 presents the effects of the gradient adjustment components in PDGrad, summarizing the configurations and results for each experiment. All the methods in Table 4 use the same network architectures and a loss function defined as the sum of Equations (11) and (16), with the only difference being the gradient used to guide the diffusion model. A1 in Table 4 represents a baseline model, where the total gradient in Equation (19) is obtained by simply summing the multiple gradients without any gradient adjustment. The model is then guided by this gradient during the diffusion sampling process. A2 is a method that resolves gradient conflicts by adjusting only the gradient direction, achieved by projecting the gradient onto the normal plane of the pivot gradient when conflicts occur. In A2, the values of both $k_l$ and $k_h$ are fixed to 1 in Equations (20) and (23) for all cases. Compared to A1, A2 shows improvements of $|0.4513 - 0.4499|/0.4513 = 0.31\%$ in LPIPS, $|59.24 - 53.90|/59.24 = 9.01\%$ in Deg, $|6.49 - 6.36|/6.49 = 2\%$ in LMD and $|3.44 - 3.39|/3.44 = 1.45\%$ in NIQE. These improvements highlight that the adjustment of the gradient direction in PDGrad effectively mitigates the conflicts between gradients arising from multiple losses. Consequently, the diffusion process is more efficiently guided, enhancing the quality of the generated images. PDGrad is our proposed method, which is built upon A2 by additionally applying adaptive scaling for $k_h$ and $k_l$ to ensure that the magnitude of the projected gradient does not exceed that of the pivot gradient. This adaptive scaling is designed to enhance the influence of the pivot gradient when applying gradient adjustment. As a result, PDGrad not only adjusts the gradient direction towards the pivot but also preserves the influence of the pivot gradient by adjusting the magnitude of the other gradients, leading to restored images with improved perceptual quality and enhanced fidelity. Consequently, compared to A2, PDGrad shows further improvements of $|0.4506 - 0.4499|/0.4506 = 0.16\%$ in LPIPS and $|54.02 - 53.90|/54.02 = 0.22\%$ in Deg.
Table 4. Ablation study on the gradient adjustment components of the proposed PDGrad using the Celebref-HQ dataset [12]. The symbol ↑ in parentheses represents that the higher the value, the better. Similarly, the symbol ↓ indicates that the lower the value, the better. We highlight the best model and the second best model with the bold and underline, respectively.
To effectively guide detailed facial information, PDGrad defines the low-level loss by combining two components, $d_{arc}$ and $d_{vgg}$, as shown in Equations (12) and (15). As shown in Table 5, to evaluate the impact of each component in these equations, we performed an additional ablation study by using either $d_{arc}$ or $d_{vgg}$ in Equations (12) and (15). The experiment was performed by varying only the low-level loss component, while applying gradient adjustment, including gradient projection and adaptive scaling. A3 shows the results using $d_{arc}$ exclusively in both Equations (12) and (15), while A4 demonstrates the results using $d_{vgg}$ exclusively in those equations. When A3 and A4 are compared to PDGrad, the results of PDGrad show improvements in LPIPS of $|0.4616 - 0.4499|/0.4616 = 2.53\%$ and $|0.4632 - 0.4499|/0.4632 = 2.87\%$, respectively. Additionally, the Deg score improves by $|60.60 - 53.90|/60.60 = 11.06\%$ and $|61.87 - 53.90|/61.87 = 12.88\%$, respectively. These improvements indicate that using both $d_{arc}$ and $d_{vgg}$ together is more effective in guiding perceptual and fidelity information than using either component alone.
Table 5. Ablation study on the components of the loss function in the proposed PDGrad using the Celebref-HQ dataset [12]. The symbol ↑ in parentheses represents that the higher the value, the better. Similarly, the symbol ↓ indicates that the lower the value, the better. We highlight the best model and the second best model with the bold and underline, respectively.
As shown in Table 6, we conducted further experiments to explore the effect of the input images by using either the coarsely restored image $y_c$ from CodeFormer [16] or the reference image $y_r$ individually for gradient computation in the diffusion sampling process. This setup represents an extreme scenario in which the two input gradients derived from $y_c$ and $y_r$, respectively, are aligned in the same direction. Such alignment occurs when $y_c$ and $y_r$ are identical, yielding the same result as using either image alone. In Table 6, A5 represents the case of using only $y_c$, where $g_{low}^{pd}$ and $g_{high}^{pd}$ in Equations (20) and (23) are set to $g_{low}^{c}$ and $g_{high}^{c}$, respectively. Similarly, A6 represents the case of using only $y_r$, where $g_{low}^{pd}$ and $g_{high}^{pd}$ in Equations (20) and (23) are set to $g_{low}^{r}$ and $g_{high}^{r}$, respectively. Our results confirm that PDGrad, which utilizes both $y_c$ and $y_r$, outperforms the other models across most metrics, as presented in Table 6.
Table 6. Ablation study on the input images used in the proposed PDGrad with the Celebref-HQ dataset [12]. The symbol ↑ in parentheses represents that the higher the value, the better. Similarly, the symbol ↓ indicates that the lower the value, the better. We highlight the best model and the second best model with the bold and underline, respectively.

5. Conclusions

In this paper, we presented Pivot Direction Gradient guidance (PDGrad), a novel gradient adjustment method designed to enhance reference-based blind face restoration within the guided diffusion framework. By focusing on the issue of conflicting gradients in multi-loss-based guidance, the proposed method aligns gradients across different feature levels, ensuring that both low-level and high-level facial characteristics are accurately restored. Through comprehensive experiments, we have demonstrated that the proposed method consistently outperforms existing methods, offering a robust solution for reference-based blind face restoration. This advancement highlights the potential of gradient adjustment techniques for guided diffusion models and the broader image restoration field.

Author Contributions

Conceptualization, G.M., T.B.L. and Y.S.H.; software, G.M. and T.B.L.; validation, Y.S.H.; investigation, G.M.; writing—original draft preparation, G.M., T.B.L. and Y.S.H.; writing—review and editing, Y.S.H.; supervision, Y.S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education under Grant 2022R1F1A1065702 and in part by the Institute of Information & Communications Technology Planning & Evaluation (IITP) under the Artificial Intelligence Convergence Innovation Human Resources Development (IITP-2024-RS-2023-00255968) grant funded by the Korean government (MSIT).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank the anonymous reviewers for their constructive comments and recommendations.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Deng, J.; Guo, J.; Xue, N.; Zafeiriou, S. Arcface: Additive angular margin loss for deep face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4690–4699. [Google Scholar]
  2. Sun, Y.; Cheng, C.; Zhang, Y.; Zhang, C.; Zheng, L.; Wang, Z.; Wei, Y. Circle loss: A unified perspective of pair similarity optimization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 6398–6407. [Google Scholar]
  3. Li, J.; Zhang, B.; Wang, Y.; Tai, Y.; Zhang, Z.; Wang, C.; Li, J.; Huang, X.; Xia, Y. ASFD: Automatic and scalable face detector. In Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 20–24 October 2021; pp. 2139–2147. [Google Scholar]
  4. Deng, J.; Guo, J.; Zhou, Y.; Yu, J.; Kotsia, I.; Zafeiriou, S. Retinaface: Single-stage dense face localisation in the wild. arXiv 2019, arXiv:1905.00641. [Google Scholar]
  5. Qi, D.; Tan, W.; Yao, Q.; Liu, J. YOLO5Face: Why reinventing a face detector. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 228–244. [Google Scholar]
  6. Kuprashevich, M.; Tolstykh, I. Mivolo: Multi-input transformer for age and gender estimation. In Proceedings of the International Conference on Analysis of Images, Social Networks and Texts, Yerevan, Armenia, 28–30 September 2023; Springer: Berlin/Heidelberg, Germany, 2023; pp. 212–226. [Google Scholar]
  7. Shin, N.H.; Lee, S.H.; Kim, C.S. Moving window regression: A novel approach to ordinal regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 18760–18769. [Google Scholar]
  8. Dogan, B.; Gu, S.; Timofte, R. Exemplar guided face image super-resolution without facial landmarks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
  9. Li, X.; Liu, M.; Ye, Y.; Zuo, W.; Lin, L.; Yang, R. Learning warped guidance for blind face restoration. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 272–289. [Google Scholar]
  10. Li, X.; Chen, C.; Zhou, S.; Lin, X.; Zuo, W.; Zhang, L. Blind face restoration via deep multi-scale component dictionaries. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; Springer: Berlin/Heidelberg, Germany, 2020; pp. 399–415. [Google Scholar]
  11. Li, X.; Li, W.; Ren, D.; Zhang, H.; Wang, M.; Zuo, W. Enhanced blind face restoration with multi-exemplar images and adaptive spatial feature fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2706–2715. [Google Scholar]
  12. Li, X.; Zhang, S.; Zhou, S.; Zhang, L.; Zuo, W. Learning dual memory dictionaries for blind face restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 5904–5917. [Google Scholar] [CrossRef] [PubMed]
  13. Yang, P.; Zhou, S.; Tao, Q.; Loy, C.C. PGDiff: Guiding Diffusion Models for Versatile Face Restoration via Partial Guidance. Adv. Neural Inf. Process. Syst. 2024, 36, 1–21. [Google Scholar]
  14. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  15. Dhariwal, P.; Nichol, A. Diffusion models beat gans on image synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794. [Google Scholar]
  16. Zhou, S.; Chan, K.; Li, C.; Loy, C.C. Towards robust blind face restoration with codebook lookup transformer. Adv. Neural Inf. Process. Syst. 2022, 35, 30599–30611. [Google Scholar]
  17. Yu, T.; Kumar, S.; Gupta, A.; Levine, S.; Hausman, K.; Finn, C. Gradient surgery for multi-task learning. Adv. Neural Inf. Process. Syst. 2020, 33, 5824–5836. [Google Scholar]
  18. Chen, Y.; Tai, Y.; Liu, X.; Shen, C.; Yang, J. Fsrnet: End-to-end learning face super-resolution with facial priors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2492–2501. [Google Scholar]
  19. Kim, D.; Kim, M.; Kwon, G.; Kim, D.S. Progressive face super-resolution via attention to facial landmark. arXiv 2019, arXiv:1908.08239. [Google Scholar]
  20. Shen, Z.; Lai, W.S.; Xu, T.; Kautz, J.; Yang, M.H. Deep semantic face deblurring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8260–8269. [Google Scholar]
  21. Chen, C.; Li, X.; Yang, L.; Lin, X.; Zhang, L.; Wong, K.Y.K. Progressive semantic-aware style transformation for blind face restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 11896–11905. [Google Scholar]
  22. Lee, T.B.; Jung, S.H.; Heo, Y.S. Progressive semantic face deblurring. IEEE Access 2020, 8, 223548–223561. [Google Scholar] [CrossRef]
  23. Han, S.; Lee, T.B.; Heo, Y.S. Semantic-Aware Face Deblurring with Pixel-Wise Projection Discriminator. IEEE Access 2023, 11, 11587–11600. [Google Scholar] [CrossRef]
  24. Ren, W.; Yang, J.; Deng, S.; Wipf, D.; Cao, X.; Tong, X. Face video deblurring using 3D facial priors. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9388–9397. [Google Scholar]
  25. Hu, X.; Ren, W.; LaMaster, J.; Cao, X.; Li, X.; Li, Z.; Menze, B.; Liu, W. Face super-resolution guided by 3d facial priors. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part IV 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 763–780. [Google Scholar]
  26. Zhu, F.; Zhu, J.; Chu, W.; Zhang, X.; Ji, X.; Wang, C.; Tai, Y. Blind face restoration via integrating face shape and generative priors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 7662–7671. [Google Scholar]
  27. Wang, X.; Li, Y.; Zhang, H.; Shan, Y. Towards real-world blind face restoration with generative facial prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 9168–9178. [Google Scholar]
  28. Lau, Y.F.; Zhang, T.; Rao, Z.; Chen, Q. ENTED: Enhanced Neural Texture Extraction and Distribution for Reference-based Blind Face Restoration. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2024; pp. 5162–5171. [Google Scholar]
  29. Varanka, T.; Toivonen, T.; Tripathy, S.; Zhao, G.; Acar, E. PFStorer: Personalized Face Restoration and Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024; pp. 2372–2381. [Google Scholar]
  30. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 139–144. [Google Scholar]
  31. Van Den Oord, A.; Vinyals, O. Neural discrete representation learning. Adv. Neural Inf. Process. Syst. 2017, 30, 6309–6318. [Google Scholar]
  32. Esser, P.; Rombach, R.; Ommer, B. Taming transformers for high-resolution image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 12873–12883. [Google Scholar]
  33. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
  34. Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  35. Nichol, A.Q.; Dhariwal, P. Improved denoising diffusion probabilistic models. In Proceedings of the International Conference on Machine Learning (PMLR), Virtual, 18–24 July 2021; pp. 8162–8171. [Google Scholar]
  36. Menon, S.; Damian, A.; Hu, S.; Ravi, N.; Rudin, C. Pulse: Self-supervised photo upsampling via latent space exploration of generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 2437–2445. [Google Scholar]
  37. Gu, J.; Shen, Y.; Zhou, B. Image processing using multi-code gan prior. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3012–3021. [Google Scholar]
  38. Yang, T.; Ren, P.; Xie, X.; Zhang, L. Gan prior embedded network for blind face restoration in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 19–25 June 2021; pp. 672–681. [Google Scholar]
  39. Gu, Y.; Wang, X.; Xie, L.; Dong, C.; Li, G.; Shan, Y.; Cheng, M.M. Vqfr: Blind face restoration with vector-quantized dictionary and parallel decoder. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 126–143. [Google Scholar]
  40. Wang, Z.; Zhang, J.; Chen, T.; Wang, W.; Luo, P. RestoreFormer++: Towards real-world blind face restoration from undegraded key-value pairs. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 15462–15476. [Google Scholar] [CrossRef]
  41. Ohayon, G.; Michaeli, T.; Elad, M. Posterior-Mean Rectified Flow: Towards Minimum MSE Photo-Realistic Image Restoration. arXiv 2024, arXiv:2410.00418. [Google Scholar]
  42. Kawar, B.; Elad, M.; Ermon, S.; Song, J. Denoising diffusion restoration models. Adv. Neural Inf. Process. Syst. 2022, 35, 23593–23606. [Google Scholar]
  43. Wang, Y.; Yu, J.; Zhang, J. Zero-shot image restoration using denoising diffusion null-space model. arXiv 2022, arXiv:2212.00490. [Google Scholar]
  44. Yue, Z.; Loy, C.C. Difface: Blind face restoration with diffused error contraction. arXiv 2022, arXiv:2212.06512. [Google Scholar] [CrossRef]
  45. Fei, B.; Lyu, Z.; Pan, L.; Zhang, J.; Yang, W.; Luo, T.; Zhang, B.; Dai, B. Generative diffusion prior for unified image restoration and enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 9935–9946. [Google Scholar]
  46. Wang, Z.; Zhang, Z.; Zhang, X.; Zheng, H.; Zhou, M.; Zhang, Y.; Wang, Y. Dr2: Diffusion-based robust degradation remover for blind face restoration. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 1704–1713. [Google Scholar]
  47. Suin, M.; Nair, N.G.; Lau, C.P.; Patel, V.M.; Chellappa, R. Diffuse and Restore: A Region-Adaptive Diffusion Model for Identity-Preserving Blind Face Restoration. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 4–8 January 2024; pp. 6343–6352. [Google Scholar]
  48. Lu, X.; Hu, X.; Luo, J.; Ren, W. 3D Priors-Guided Diffusion for Blind Face Restoration. In Proceedings of the ACM Multimedia, Melbourne, Australia, 28 October–1 November 2024. [Google Scholar]
  49. Chung, H.; Sim, B.; Ye, J.C. Come-closer-diffuse-faster: Accelerating conditional diffusion models for inverse problems through stochastic contraction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 12413–12422. [Google Scholar]
  50. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar]
  51. Peng, X.; Zhang, X.; Li, Y.; Liu, B. Research on image feature extraction and retrieval algorithms based on convolutional neural network. J. Vis. Commun. Image Represent. 2020, 69, 102705. [Google Scholar] [CrossRef]
  52. Zheng, H.; Ji, M.; Wang, H.; Liu, Y.; Fang, L. Crossnet: An end-to-end reference-based super resolution network using cross-scale warping. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 88–104. [Google Scholar]
  53. Zhang, Z.; Wang, Z.; Lin, Z.; Qi, H. Image super-resolution by neural texture transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 7982–7991. [Google Scholar]
  54. Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 4401–4410. [Google Scholar]
  55. Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. Adv. Neural Inf. Process. Syst. 2019, 32, 8026–8037. [Google Scholar]
  56. Wang, Z.; Zhang, J.; Chen, R.; Wang, W.; Luo, P. Restoreformer: High-quality blind face restoration from undegraded key-value pairs. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 17512–17521. [Google Scholar]
  57. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  58. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
  59. Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30, 6629–6640. [Google Scholar]
  60. Wang, X.; Bo, L.; Fuxin, L. Adaptive wing loss for robust face alignment via heatmap regression. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 6971–6981. [Google Scholar]
