A Joint Training Model for Face Sketch Synthesis

Featured Application: The proposed face sketch synthesis method can be applied for various applications, such as law enforcement and digital entertainment. Abstract: The exemplar-based method is most frequently used in face sketch synthesis because of its e ﬃ ciency in representing the nonlinear mapping between face photos and sketches. However, the sketches synthesized by existing exemplar-based methods su ﬀ er from block artifacts and blur e ﬀ ects. In addition, most exemplar-based methods ignore the training sketches in the weight representation process. To improve synthesis performance, a novel joint training model is proposed in this paper, taking sketches into consideration. First, we construct the joint training photo and sketch by concatenating the original photo and its sketch with a high-pass ﬁltered image of their corresponding sketch. Then, an o ﬄ ine random sampling strategy is adopted for each test photo patch to select the joint training photo and sketch patches in the neighboring region. Finally, a novel locality constraint is designed to calculate the reconstruction weight, allowing the synthesized sketches to have more detailed information. Extensive experimental results on public datasets show the superiority of the proposed joint training model, both from subjective perceptual and the FaceNet-based face recognition objective evaluation, compared to existing state-of-the-art sketch synthesis methods.


Introduction
Face sketch synthesis is a key branch of face style transformation, which generates face sketches for given input photos with the help of face photo-sketch pairs as the training dataset [1].It has achieved wide applications in both law enforcement and digital entertainment.For example, sketches drawn according to the description of victims or witnesses can help identify a suspect by matching the sketch against a mugshot dataset from a police department.Face sketch synthesis reduces the texture discrepancy between photos and sketches for the face recognition procedure [2] and thus increases the recognition accuracy [3].In digital entertainment, people are increasingly preferring to use face sketches as their portrait in social media; the sketch synthesis technique can also simplify animation production [4].
During the past two decades, various sketch synthesis methods have been proposed.The exemplarbased method is an important category of existing synthesis approaches.It synthesizes sketches for test photos by utilizing photo-sketch pairs as training data.The exemplar-based method mainly consists of neighbor selection and reconstruction weight representation [5].In the neighbor selection process, K nearest training photo patches are selected for a test photo patch.In the reconstruction weight representation, a weight vector between the test photo patch and the selected photo patches is calculated.The target sketch patch can be obtained using weighted averaging of the K training sketch patches corresponding to the selected photo patches with the calculated weight vector.The final sketch is obtained by averaging all the generated sketch patches.
Exemplar-based face sketch synthesis originates from the Eigen-transformation research by Tang et al. [6].In their work, all training photo-sketch pairs were used to generalize the target sketch.Principal component analysis was adopted to learn the weight coefficients by projecting the input test photo onto the training photos.
It is difficult to represent nonlinear relationships between face photos and sketches by only learning one holistic reconstruction model.Thus, Liu et al. [7] proposed a locally linear embedding (LLE)-based sketch synthesis method to estimate the nonlinear mapping with piecewise linear mappings.The LLE method works at the image patch level, in which K nearest training photo patches are searched in terms of Euclidean distance for each test photo patch.However, the LLE method suffers from a serious noise problem.To resolve this problem, Song et al. [8] formulated face sketch synthesis into a spatial sketch denoising problem and calculated the reconstruction weight using the conjugate gradient solver.
To describe the dependency relationship between neighboring sketches, Wang et al. [9] introduced a multi-scale Markov random fields (MRF) model to represent the neighboring constraint between adjacent sketch patches using a compatibility function.However, this method only chose the best single sketch patch from the training data for the test photo patch, meaning it could not synthesize new sketch patches.Additionally, the optimization process in the MRF model is an NP-hard problem.To overcome these limitations, Zhou et al. [10] extended the MRF model by introducing the linear combination of nearest neighbors to structure a Markov weight fields (MWF) model, which is capable of synthesizing new sketch patches that do not exist in the training dataset.In reference [11], a sparse representation-based face sketch synthesis method was proposed by Gao et al.Peng et al. [12] proposed a multiple representation-based face sketch synthesis method, which is able to obtain high-quality sketch images.However, this method is time-consuming because of the online neighbor selection.
Recently, Wang et al. [13] proposed a state-of-the-art face sketch synthesis method, based on random sampling and locality constraint (RSLCR).They randomly sampled the training photo and sketch patches in place of a neighbor search, and then employed the locality constraint to model the distinct correlations between the test photo patch and sampled photo patches while calculating the reconstruction weight coefficients.However, the target sketch patch reconstruction obtained by weighted-averaging the hundreds of sampled sketches can be regarded as a low-pass filter process, which results in blurred synthesized sketches.In addition, the Bayesian inference was utilized in reference [14] to incorporate the neighboring constraint in both the neighbor selection and reconstruction weight representation, which can obtain impressive performance.
Apart from the exemplar-based method, deep learning techniques were also applied to face sketch synthesis, generating new trends.Zhang et al. [15] first proposed a fully convolutional network (FCN) to learn end-to-end mapping from photos to sketches.It consisted of six convolutional layers with rectified linear units as activation functions.Zhang et al. [16] utilized the branched FCN to structure a decomposition representation learning framework for sketch synthesis.Additionally, the generative adversarial network (GAN) [17] was developed for image style transformation (e.g., photo-to-sketch generation or vice versa).The deep learning-based sketch synthesis methods can preserve textural structure well; however, serious noise effects occur in the synthesized results.This is mainly because of the limited available of training data, which is insufficient to train large networks robustly [18].
A common problem with these exemplar-based approaches is that they ignore the role of training sketches when calculating reconstruction weights.This is because the basic assumption of these exemplar-based methods is that a photo patch and its corresponding sketch patch have a similar geometric manifold structure.If two photo patches are similar, then their sketch patch counterparts are also similar.However, owing to potential misalignment, the reconstruction weight obtained from the test photo patch and selected training photo patches may not be suitable for sketch patches reconstruction [19].
We propose a new exemplar-based method, the joint training model, to solve the problem.In our method, the training photo and the sketch patches are concatenated.Instead of directly using the sketch patches, the high-pass filtered component of the training sketches are adopted to reduce the effect of the modality difference between a photo and a sketch.Then, we employ the offline random sampling method to select the joint training photo and the sketch patches for the test photo patch.Moreover, a modified locality constraint is designed to calculate the reconstruction weight.With the obtained reconstruction weight, the target sketch patch can be synthesized.Experimental results indicate that the proposed joint model significantly eliminates noise and improves the synthesized sketch quality.It also preserves the detail information of the test photo, which other methods cannot do.Therefore, synthesized face sketch images using our method achieve higher accuracy in face sketch recognition.
The contributions of this paper are summarized as below.
(1) To consider the training sketches during the reconstruction weight representation process, a joint training model is proposed to integrate the training photo and sketch information.(2) We design a modified locality constraint that modulates the reconstruction weight through the distance between the high-pass filtered images of test patches and the sampled training sketch patches.(3) The proposed method yields high quality sketches with more detail information and less noise over the wide range of datasets, promoting the accuracy of the sketch-based suspect identification.
The organization of the rest of the paper is as follows.Section 2 introduces the relevant works, including some example-based sketch synthesis methods, which are the basic works of the proposed method.The proposed model is described in detail in Section 3. Section 4 provides a comparison of experiments and their results.Conclusions are then given in Section 5.

Technical Backgrounds
In this paper, excepted as noted, a bold uppercase letter and a bold lowercase letter represent a matrix and a column vector, respectively; regular uppercase and lowercase letters denote scalars.Given a test photo T, it is divided into patches t (i,j) with r pixels overlapping between neighboring patches.(i, j) denotes the location of the patch at the i-th row and the j-th column, i ∈ {1, • • • , m}, j ∈ {1, • • • , n}.Notice that each patch is represented as a q-dimensional column vector, where q = p 2 , and p is the size of the patch.Similarly, the target sketch is denoted as S. s (i,j) denotes the target sketch patch corresponding to the testing patch t (i,j) .The training dataset, which consists of M photo-sketch pairs, are similarly divided into patches.Let X (i,j) = {x denote the set of K selected training photo patches and the corresponding sketch patches of the test photo patch t (i,j) , respectively.The weight coefficients w (i,j) = (w T are calculated to linearly combine the candidate sketch patches.

The LLE Method
In the LLE method [6], K nearest patches are first obtained for each test photo patch t (i,j) .Then, the linear reconstruction coefficients w (i,j) can be calculated by resolving the following minimization problem: min Then the target sketch patch s (i,j) can then be synthesized.
After all target sketch patches are generated, the final sketch can be achieved by averaging the overlapped pixel intensities.

The RSLCR Method
Instead of searching the nearest neighbor photo patches, the RSLCR method [13] proposed to randomly sample K photo-sketch patch pairs from training data in a predicted neighbor region for the test patch t (i,j) .Additionally, to consider the correlation between different sampled patches, a locality constraint [20] was introduced to impose a weight to the distances of the test photo patch and random sampled photo patches.The reconstruction weight representation model of the RSLCR method can be written as follows.min where denotes element-wise multiplication, λ balances the reconstruction error and the locality constraint, and d (i,j) is the Euclidean distance vector between the test photo patch t (i,j) and sampled training photo patches X (i,j) .

Joint Training Model for Face Sketch Synthesis
In most exemplar-based face sketch synthesis methods, only the test photo and training photos are considered for selecting the candidate photo patches and calculating the reconstruction weight.This strategy cannot achieve an optimal result when the training photo and sketch are misaligned.In this paper, we put forward a novel exemplar-based face sketch synthesis approach, which takes the training sketch into account by joining it with its corresponding training photo for reconstruction weight representation.
Figure 1 shows the illustration of the proposed face sketch synthesis method.First, the high-pass filtered components of the sketch images in the training data are extracted by using the Laplacian of Gaussian (LoG) filter.Then, the extracted high-pass filtered components are attached to their corresponding photo and sketch images to form the joint training data.After that, the candidate joint patch pairs are sampled with the offline random sampling strategy.In the test phase, the high-pass filtered component of the input photo is extracted with the same filter and attached to the input photo to obtain the joint test photo.For each joint test patch, a reconstruction weight can be calculated by approximating the joint test patch and joint training photos.Finally, the target patch can be constructed by linearly combining the sampled joint training sketches with the reconstruction weight and the final sketch image can be synthesized by averaging the overlapping sketch patches.The strategy to structure the joint training model and the way to calculate the reconstruction weight coefficients of the proposed method will be explained next in detail.

The RSLCR Method
Instead of searching the nearest neighbor photo patches, the RSLCR method [13] proposed to randomly sample K photo-sketch patch pairs from training data in a predicted neighbor region for the test patch ( , )   i j t .Additionally, to consider the correlation between different sampled patches, a locality constraint [20] was introduced to impose a weight to the distances of the test photo patch and random sampled photo patches.The reconstruction weight representation model of the RSLCR method can be written as follows.
where  denotes element-wise multiplication, λ balances the reconstruction error and the locality constraint, and ( , ) i j d is the Euclidean distance vector between the test photo patch ( , )   i j t and sampled training photo patches ( , )   i j X .

Joint Training Model for Face Sketch Synthesis
In most exemplar-based face sketch synthesis methods, only the test photo and training photos are considered for selecting the candidate photo patches and calculating the reconstruction weight.This strategy cannot achieve an optimal result when the training photo and sketch are misaligned.In this paper, we put forward a novel exemplar-based face sketch synthesis approach, which takes the training sketch into account by joining it with its corresponding training photo for reconstruction weight representation.
Figure 1 shows the illustration of the proposed face sketch synthesis method.First, the high-pass filtered components of the sketch images in the training data are extracted by using the Laplacian of Gaussian (LoG) filter.Then, the extracted high-pass filtered components are attached to their corresponding photo and sketch images to form the joint training data.After that, the candidate joint patch pairs are sampled with the offline random sampling strategy.In the test phase, the high-pass filtered component of the input photo is extracted with the same filter and attached to the input photo to obtain the joint test photo.For each joint test patch, a reconstruction weight can be calculated by approximating the joint test patch and joint training photos.Finally, the target patch can be constructed by linearly combining the sampled joint training sketches with the reconstruction weight and the final sketch image can be synthesized by averaging the overlapping sketch patches.The strategy to structure the joint training model and the way to calculate the reconstruction weight coefficients of the proposed method will be explained next in detail.

Joint Training Model
For each training sketch, the corresponding high-pass filtered component is first obtained with a LoG filter.Let Z (i,j) = {z (i,j) k } K k=1 denote the set of K randomly selected high-pass filtered image patches of the training sketches corresponding to a test patch t (i,j) .Then, the sampled K joint training photo and sketch patches for t (i,j) can be denoted as Equations ( 4) and ( 5), respectively.
For the test photo T, the high-pass image H is obtained with a LoG filter.Let h (i,j) denotes the high-pass image patch corresponding to the test photo patch t (i,j) .We concatenate these two patches as the joint test patch t (i,j) After modeling the joint training photos and sketches, the reconstruction weight can be calculated by approximating the joint test patch and joint training photos, which will be discussed next.

Face Sketch Synthesis
Assuming there are M pairs of joint training photos and sketches that are geometrically aligned, they are divided into patches of fixed size (p × p × 2).Each joint patch is reshaped to a 2q-dimensional column vector.For each joint test patch location, we extend the search region around the patch with c pixels.Thus, there are (2c + 1) 2 patches in the search region for one patch location, and there are (2c + 1) 2 M joint training photo/sketch patch-pairs.Among these patch-pairs, K joint training photo patches U (i,j) ∈ R 2p 2 ×K and joint training sketch patches V (i,j) ∈ R 2p 2 ×K are randomly and simultaneously selected.
For each joint test patch t (i,j) 1 , the reconstruction weight is calculated as follow: min where w (i,j) ∈ R K×1 is the weight coefficient for the joint test patch t (i,j) ∈ R K×1 is the Euclidean distance vector between the joint test patch t (i,j) and sampled joint training photo patches U (i,j) , and ∈ R K×1 is the Euclidean distance vector between the LoG test patch image h (i,j) and K sampled LoG training sketch patches Z (i,j) .Equation ( 7) has the closed-form solution: where 1 is a column vector in which all elements are 1.C i,j = (U (i,j) − 1t ) T denotes the data covariance matrix, and diag(•) extends the vector into a diagonal matrix.
The target joint sketch patch s (i,j) can be synthesized by linearly combining the sampled joint training sketches with the reconstruction weight coefficients w (i,j) .
The obtained joint sketch patch is a 2q-dimensional vector.Thus, we extract the first half and reshape it to a p × p patch, which is the target sketch patch.After obtaining all target sketch patches, the final target sketch can be achieved with an averaged overlapping area.

Datasets
We evaluated the performance of the proposed method on two publicly available datasets: The Chinese University of Hong Kong (CUHK) face sketch (CUFS) dataset [9] and the CUHK face sketch FERET (CUFSF) dataset [21].The CUFS dataset includes three sub-datasets: The CUHK student dataset (188 subjects) [22], the AR dataset (123 subjects) [23], and the XM2VTS dataset (295 subjects) [24].The CUFSF dataset includes 1194 subjects from the FERET dataset [25].Artist drew sketches corresponding to each face photo in these datasets.In our experiments, all face images were normalized into the size of 200×250 by centering the coordinates on two eyes and the mouth.Some face photo-sketch pairs from these two datasets are shown in Figure 2.
FERET (CUFSF) dataset [21].The CUFS dataset includes three sub-datasets: The CUHK student dataset (188 subjects) [22], the AR dataset (123 subjects) [23], and the XM2VTS dataset (295 subjects) [24].The CUFSF dataset includes 1194 subjects from the FERET dataset [25].Artist drew sketches corresponding to each face photo in these datasets.In our experiments, all face images were normalized into the size of 200×250 by centering the coordinates on two eyes and the mouth.Some face photo-sketch pairs from these two datasets are shown in Figure 2.

Experimental Setting
In this section, the data distribution and parameter settings are introduced.= 5, the number for random sampling was K = 800, and the regularization parameter λ and μ were both set to 0.5.The window size and standard deviation of LoG filter were 5 and 0.5, respectively.To evaluate performance, four traditional exemplar-based methods (LLE [7], MRF [9], MWF [10], and RSLCR [13]) and two deep learning-based methods (FCN [15] and GAN [17]) were compared.These six state-of-the-art face sketch synthesis results are released by Wang et al. [13].Parameters were set as follow.Patch size was p = 20, overlap size was o = 14, search length was c = 5, the number for random sampling was K = 800, and the regularization parameter λ and µ were both set to 0.5.The window size and standard deviation of LoG filter were 5 and 0.5, respectively.

Synthesizesd Sketch Results Comparison
Figure 3 shows some synthesized face sketches from different methods on the CUFS dataset.The first two rows are from the CUHK student dataset, the middle two rows are from the AR dataset, and the last two rows are from the XM2VTS dataset.Moreover, the corresponding local blocks of the synthesized sketches are displayed in Figure 4. From Figure 3, it can be seen that the sketches synthesized by the LLE and MRF methods suffer serious block effects.The MWF method obtains better performance than the LLE and MRF methods in the CUHK student and AR datasets.However, the results are unsatisfying in the XM2VTS dataset, because it contains more face variations, such as aging, race, and hair styles.The RSLCR method generates fine textures and structures, because more candidate patches are incorporated via random sampling and the locality constraint.However, this method results in blurred outputs.The FCN and GAN methods overcome the blurring effect by using pixel-to-pixel mapping from a photo to a sketch.However, they tend to have undesirable artifacts because of instabilities in training, while generating high-resolution images.Our proposed method achieves much better performance than these six benchmarked methods in all the three datasets.More detailed information is preserved in the synthesized sketches, such as the double-fold eyelids and hair grain.This illustrates that the proposed method is capable of generating identity-preserved sketches.[7], Markov random fields (MRF) [9], Markov weight fields (MWF) [10], random sampling and locality constraint (RSLCR) [13], fully convolutional network (FCN) [15], generative adversarial network (GAN) [17], and our proposed method, respectively.Face photos in the first two rows are from the CUHK student dataset; second two rows are from the AR dataset; and the last two rows are from the XM2VTS dataset, respectively.

Synthesizesd Sketch Results Comparison
Figure 3 shows some synthesized face sketches from different methods on the CUFS dataset.The first two rows are from the CUHK student dataset, the middle two rows are from the AR dataset, and the last two rows are from the XM2VTS dataset.Moreover, the corresponding local blocks of the synthesized sketches are displayed in Figure 4. From Figure 3, it can be seen that the sketches synthesized by the LLE and MRF methods suffer serious block effects.The MWF method obtains better performance than the LLE and MRF methods in the CUHK student and AR datasets.However, the results are unsatisfying in the XM2VTS dataset, because it contains more face variations, such as aging, race, and hair styles.The RSLCR method generates fine textures and structures, because more candidate patches are incorporated via random sampling and the locality constraint.However, this method results in blurred outputs.The FCN and GAN methods overcome the blurring effect by using pixel-to-pixel mapping from a photo to a sketch.However, they tend to have undesirable artifacts because of instabilities in training, while generating high-resolution images.Our proposed method achieves much better performance than these six benchmarked methods in all the three datasets.More detailed information is preserved in the synthesized  [7], Markov random fields (MRF) [9], Markov weight fields (MWF) [10], random sampling and locality constraint (RSLCR) [13], fully convolutional network (FCN) [15], generative adversarial network (GAN) [17], and our proposed method, respectively.Face photos in the first two rows are from the CUHK student dataset; second two rows are from the AR dataset; and the last two rows are from the XM2VTS dataset, respectively.sketches, such as the double-fold eyelids and hair grain.This illustrates that the proposed method is capable of generating identity-preserved sketches.We also investigated the robustness of the proposed method against shape exaggeration and illumination variations on the CUFSF dataset.Figure 5 shows some synthesized face sketches from different methods used on the CUFSF dataset.The block effect still exists in the LLE and MRF results.The MWF and RSLCR methods obtain similar performance on the CUFSF dataset.The FCN method suffers from serious noise and artifacts.The GAN results show much improvement, but We also investigated the robustness of the proposed method against shape exaggeration and illumination variations on the CUFSF dataset.Figure 5 shows some synthesized face sketches from different methods used on the CUFSF dataset.The block effect still exists in the LLE and MRF results.The MWF and RSLCR methods obtain similar performance on the CUFSF dataset.The FCN method suffers from serious noise and artifacts.The GAN results show much improvement, but distortion occurs in the synthesized sketches by this method.By comparison, the proposed method achieves the most vivid and clear sketches, reflecting the robustness of our proposed method.We also investigated the robustness of the proposed method against shape exaggeration and illumination variations on the CUFSF dataset.Figure 5 shows some synthesized face sketches from different methods used on the CUFSF dataset.The block effect still exists in the LLE and MRF results.The MWF and RSLCR methods obtain similar performance on the CUFSF dataset.The FCN method suffers from serious noise and artifacts.The GAN results show much improvement, but distortion occurs in the synthesized sketches by this method.By comparison, the proposed method achieves the most vivid and clear sketches, reflecting the robustness of our proposed method., MRF [9], MWF [10], RSLCR [13], FCN [15], GAN [17], and the proposed method, respectively.
Overall, the synthesized face sketch images on the CUFS and CUFSF datasets by our method have the following superiorities compared to the benchmarked methods: 1) Less block artifacts and noise, because we calculated the target sketch patch by weighted averaging hundreds randomly sampled training sketch patches but not few nearest patches; 2) rich facial detail, due to the high-pass filtered components were adopted to build the joint training model, which is able to  [7], MRF [9], MWF [10], RSLCR [13], FCN [15], GAN [17], and the proposed method, respectively.
Overall, the synthesized face sketch images on the CUFS and CUFSF datasets by our method have the following superiorities compared to the benchmarked methods: (1) Less block artifacts and noise, because we calculated the target sketch patch by weighted averaging hundreds randomly sampled training sketch patches but not few nearest patches; (2) rich facial detail, due to the high-pass filtered components were adopted to build the joint training model, which is able to generate high-quality face sketch patches; (3) complete facial structures, as a result of the training sketch images were took into consideration when computing the reconstruction weights, which weakens the influence by the misalignment in training images.
Table 1 shows the average time consumption of different methods on different datasets.Here, only exemplar-based methods are compared, because it takes a very long time to train the neural network for deep learning-based methods, though the test time is quite fast once the model is trained.From Table 1, it can be seen that the LLE, MRF, and MWF methods have no scalability of training data.With the amplification of training data, the running time increased radically, such as the CUFSF dataset.The RSLCR and our proposed method were less susceptible to the size of training data, owing to the random sampling strategy.Although the proposed method is not the fastest, it still has comparable time consumption as the other methods.Face sketch recognition is commonly used to quantitatively evaluate the face sketch synthesis methods and to collectively compare the synthesized sketch images [8,15,26].A higher face sketch recognition rate means that the corresponding sketch synthesis method is more effective and the synthesized sketch images are better.In this study, FaceNet [27] was employed to conduct the face sketch recognition experiments.To demonstrate the recognition performance of the synthesized sketches using our proposed method, we used the sketches synthesized using different methods as probe images to match the gallery images, consisting of the corresponding artist-drawn sketches.The 338 synthesized sketches in the CUFS dataset were taken as the probe set, and the corresponding ground-truth sketches drawn by the artist were taken as the gallery set.For the CUFS dataset, 944 synthesized sketches were taken as the probe set and the corresponding sketches drawn by the artist were taken as the gallery set.
Figure 6 shows the face sketch recognition accuracies of FaceNet on the CUFS and the CUFSF dataset, respectively.The proposed method achieved the best accuracy on both datasets, 97.04% in CUFS and 87.18% in CUFS, at rank-100, respectively.Table 2 shows the rank-1, rank-5, and rank-10 recognition rates, where rank-n measures the accuracy of the top-n best matches.The FCN and GAN methods got higher recognition accuracy on CUFS dataset, but similar accuracy on CUFSF dataset compared with the traditional exemplar-based methods.It indicates that the performance of deep learning-based methods degrades when the dataset has challenging variations.The synthesized sketches of our method obtained the highest rate in rank-1, rank-5, and rank-10.The recognition results indicated that the better generated texture features and more detailed information preserved by the proposed method contributes to the face sketch recognition.This further demonstrates the superiority of our proposed method.
Appl.Sci.2019, 9, x FOR PEER REVIEW 9 of 11 generate high-quality face sketch patches; 3) complete facial structures, as a result of the training sketch images were took into consideration when computing the reconstruction weights, which weakens the influence by the misalignment in training images.
Table 1 shows the average time consumption of different methods on different datasets.Here, only exemplar-based methods are compared, because it takes a very long time to train the neural network for deep learning-based methods, though the test time is quite fast once the model is trained.From Table 1, it can be seen that the LLE, MRF, and MWF methods have no scalability of training data.With the amplification of training data, the running time increased radically, such as the CUFSF dataset.The RSLCR and our proposed method were less susceptible to the size of training data, owing to the random sampling strategy.Although the proposed method is not the fastest, it still has comparable time consumption as the other methods.

Face Sketch Recognition
Face sketch recognition is commonly used to quantitatively evaluate the face sketch synthesis methods and to collectively compare the synthesized sketch images [8,15,26].A higher face sketch recognition rate means that the corresponding sketch synthesis method is more effective and the synthesized sketch images are better.In this study, FaceNet [27] was employed to conduct the face sketch recognition experiments.To demonstrate the recognition performance of the synthesized sketches using our proposed method, we used the sketches synthesized using different methods as probe images to match the gallery images, consisting of the corresponding artist-drawn sketches.The 338 synthesized sketches in the CUFS dataset were taken as the probe set, and the corresponding ground-truth sketches drawn by the artist were taken as the gallery set.For the CUFS dataset, 944 synthesized sketches were taken as the probe set and the corresponding sketches drawn by the artist were taken as the gallery set. Figure 6 shows the face sketch recognition accuracies of FaceNet on the CUFS and the CUFSF dataset, respectively.The proposed method achieved the best accuracy on both datasets, 97.04% in CUFS and 87.18% in CUFS, at rank-100, respectively.Table 2 shows the rank-1, rank-5, and rank-10 recognition rates, where rank-n measures the accuracy of the top-n best matches.The FCN and GAN methods got higher recognition accuracy on CUFS dataset, but similar accuracy on CUFSF dataset compared with the traditional exemplar-based methods.It indicates that the performance of deep

Figure 1 .
Figure 1.Illustration of the proposed joint model for face sketch synthesis.

Figure 1 .
Figure 1.Illustration of the proposed joint model for face sketch synthesis.

Figure 2 .
Figure 2. Example of face sketch-photo pairs in the CUFS dataset (first two rows) and the CUFSF dataset (last two rows).The first and the third row are face photos and the second and the last rows are corresponding face sketches drawn by the artist.
For fair comparison, we employed the same cross validation technique used in most exemplar-based sketch synthesis works.For the CUFS dataset, 88 face photo-sketch pairs were taken as the training dataset and the remaining 100 pairs were taken for test in the CUHK student dataset; 80 pairs were chosen for training and the remaining 43 pairs for test in AR dataset; 100 pairs for training and the remaining 195 pairs for test in XM2VTS dataset.For the CUFSF dataset, 250 face photo-sketch pairs were randomly chosen for training and the rest of the 944 pairs for test.Parameters were set as follow.Patch size was p = 20, overlap size was o = 14, search length was c

Figure 2 .
Figure 2. Example of face sketch-photo pairs in the CUFS dataset (first two rows) and the CUFSF dataset (last two rows).The first and the third row are face photos and the second and the last rows are corresponding face sketches drawn by the artist.

4. 2 .
Experimental Setting In this section, the data distribution and parameter settings are introduced.For fair comparison, we employed the same cross validation technique used in most exemplar-based sketch synthesis works.For the CUFS dataset, 88 face photo-sketch pairs were taken as the training dataset and the remaining 100 pairs were taken for test in the CUHK student dataset; 80 pairs were chosen for training and the remaining 43 pairs for test in AR dataset; 100 pairs for training and the remaining 195 pairs for test in XM2VTS dataset.For the CUFSF dataset, 250 face photo-sketch pairs were randomly chosen for training and the rest of the 944 pairs for test.

Figure 4 .
Figure 4. Local cut-out effect on Figure 3; same marshalling sequence as Figure 3.

Figure 4 .
Figure 4. Local cut-out effect on Figure 3; same marshalling sequence as Figure 3.

Figure 4 .
Figure 4. Local cut-out effect on Figure 3; same marshalling sequence as Figure 3.

Figure 6 .
Figure 6.Face sketch recognition accuracies of FaceNet on the CUFS (a) and the CUFSF (b) datasets.

Figure 6 .
Figure 6.Face sketch recognition accuracies of FaceNet on the CUFS (a) and the CUFSF (b) datasets.

Table 1 .
Average running time (s) to generate one sketch by different methods.

Table 1 .
Average running time (s) to generate one sketch by different methods.