Article

Sketch Synthesis with Flowpath and VTF

Junho Kim, Heekyung Yang and Kyungha Min
1 Department of Computer Science, Sangmyung University, Seoul 03016, Republic of Korea
2 Department of Software, Sangmyung University, Cheonan 31066, Republic of Korea
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2025, 14(14), 2861; https://doi.org/10.3390/electronics14142861
Submission received: 31 May 2025 / Revised: 11 July 2025 / Accepted: 15 July 2025 / Published: 17 July 2025
(This article belongs to the Special Issue New Trends in Computer Vision and Image Processing)

Abstract

We present a novel scheme that generates a sketch from an image using the flowpath and the value through the flow (VTF). The first stage of our scheme produces a grayscale noisy sketch using a deep learning-based approach. In the second stage, the unclear contours and unwanted noise in the grayscale noisy sketch are resolved using our flowpath- and VTF-based schemes. We build a flowpath by integrating the tangent flow extracted from the input image. The integrated tangent flow provides a strong clue for the salient contour of the shape in the image. We further compute the VTF by sampling values through the flowpath to extract line segments that correspond to sketch strokes. By combining the deep learning-based approach and the VTF, we can extract salient sketch strokes from various images while suppressing unwanted noise. We demonstrate the excellence of our scheme by generating sketches from various images including portraits, landscapes, objects, animals, and animation scenes.

1. Introduction

In the long history of fine arts, artists have expressed the shape of their targets, including people, landscapes, and objects, using sketches, which convey the overall geometric structure of the targets [1]. Drawing a sketch is an act of placing proper strokes on the important shapes of the target using various artistic tools such as brushes, pens, or charcoal. Therefore, most sketches consist of a series of lines that depict the shape of their targets.
Sketches can be represented at various levels of abstraction, which are determined by the content of their targets [2,3]. The levels of abstraction range from simple and minimal forms that convey only the essential shape to highly detailed and complex drawings that capture as much information as possible. Singer et al. [4] distinguish descriptive sketches, which they call drawings, from simple sketches. Various sketch datasets reflect these levels of detail. For instance, the Quick, Draw! dataset [5] consists of simplistic, doodle-like sketches. In contrast, the dataset provided by APDrawingGAN [6] contains highly detailed and high-quality portrait sketches. The first row of Figure 1 illustrates the spectrum of sketches ranging from the doodle level to the descriptive level. The second row of Figure 1 presents datasets containing sketch images at both the doodle and descriptive levels.
From the early days of computer vision and computer graphics, various studies have focused on extracting sketches from images. The Canny edge detection algorithm [8] employs image gradients to detect edges that correspond to significant features of an image. However, the Canny edge detector produces many unwanted artifacts, since it is sensitive to noise and to the intensity of the original image. To address these limitations, subsequent research has incorporated not only gradient information but also line-based features. Kang et al. [9,10] utilized edge tangent flow (ETF) to compute the flowpath, enhancing the line components of sketches. However, the sketches produced by these methods still depend heavily on image intensity variations.
The recent progress of deep learning techniques has accelerated sketch generation studies, and a series of important studies has been presented [2,3,6,11,12,13,14,15,16,17,18,19,20]. However, many of them can only generate doodle-level outputs [2,11,12,13] or show limitations in distinguishing line components from the backgrounds [6,16,17,18,19,20].
A line drawing-based sketch, which is composed of strokes, is required to exhibit three key properties: First, the sketch should sample the most prominent regions of an input image. Second, the sketch must represent the structure of the depicted object in a linear form. Third, the sketch should be represented in a binary format, where each pixel indicates the presence or absence of a sketch component.
Recent deep learning-based approaches have successfully generated sketch-like images represented in grayscale. However, these noisy sketch images differ from our desired sketch representation, since the pixels in the noisy sketch encode the probability of belonging to a sketch within a range between 0 and 1. Consequently, noisy sketches do not conform to line drawing-based sketches and are thus insufficient for many practical applications.
Another serious drawback of deep learning-based approaches is that their results heavily depend on the training datasets of their models. For example, models trained with photographs and tonal sketches have limitations in producing line drawing-based sketches. Furthermore, they have drawbacks in producing sketches for images of certain categories if their training set does not contain sufficient images of the category. For example, they show excellent results for portraits but can show poor results on other categories such as landscapes and objects.
We present a two-stage sketch generation scheme as a process of determining whether each pixel in an input image belongs to a sketch or not. Our target sketch is defined as a binary image whose pixels are classified as sketch or non-sketch. In the first stage, we generate a noisy sketch by estimating the likelihood of belonging to a sketch for each pixel in the input image. The values of the pixels in the noisy sketch denote the probability that the pixel belongs to our target sketch. To classify whether the pixels in the noisy sketch belong to our target sketch, we devise a second stage that employs wide-pass networks to determine whether each pixel belongs to our target sketch or not.
In the first stage, a deep learning-based method is used to compute the likelihood of each pixel being part of the sketch, producing a noisy sketch as an intermediate result. In the second stage, to determine whether each pixel in the noisy sketch belongs to the final sketch, information is collected from its neighboring pixels. This process utilizes a flowpath, which samples pixels along the tangent flow of the image. The concept of flowpath was first introduced in Line Integral Convolution (LIC) by Cabral and Leedom [21]. Later, Kang et al. [9] extended this idea to construct coherent lines by computing the tangent flow, averaging pixel values along the flow, and thresholding the averaged values.
Most deep learning-based models employ isotropic filters to process their images. This property can limit their ability to generate sketches, whose geometry is inherently anisotropic, and discarding anisotropic information may cause the loss of geometric detail in the generated sketch. Therefore, we apply a deep learning-based model that relies on isotropic information to generate the noisy sketch in the first stage and apply a noise classifier that exploits anisotropic information gathered using the flowpath and VTF to finalize our sketch generation process.
While our method also employs the tangent flow and the flowpath, we adopt a novel approach by sampling pixels along the flowpath and feeding them into a multi-layered wide-pass network to determine sketch membership. This key distinction is illustrated in Figure 2. Unlike previous methods that rely on averaging pixel values along the flowpath, our approach leverages the sampled pixel values to make a direct classification decision regarding sketch membership.
The advantage of using a wide-pass network, a type of deep neural network, lies in its ability to determine sketches in a more stable manner. The method proposed by Kang et al. [9], which applies thresholding, exhibits low stability because the resulting sketch varies significantly depending on the threshold value. In contrast, the wide-pass network learns to make sketch determinations, which enables more stable and accurate sketch generation.

2. Related Work

Sketch generation, which has been a longstanding research topic, can be categorized into two approaches: traditional methods [8,9,10,14,22] and deep learning-based methods [2,3,11,13,20,23,24,25,26,27,28,29]. Deep learning-based methods typically generate sketches from input images or input texts. The generated sketches are represented as either raster sketches or vector sketches.

2.1. Raster Sketch Synthesis

Traditional methods for sketch extraction rely on gradient values from the original image. This approach was first introduced in the Canny edge detection algorithm [8] and was later extended by methods that compute gradients using the difference of Gaussians (DoG) [9,10,22]. Kang et al. [9] further enhanced this approach by employing edge tangent flow (ETF) maps based on gradients and generated sketches that consider the extracted edge flow. Arbelaez et al. [14] observed a close relationship between contour detection and image segmentation and addressed both problems simultaneously.
Studies on raster sketch generation often employ the image style transfer framework proposed by Gatys et al. [16]. They also employ models like Pix2pix [30] or CycleGAN [17]. Frameworks such as UNIT [31], MUNIT [32], and U-GAT-IT [33] provide more generalized style transfer capabilities, which can be used for sketch generation. Furthermore, models like APDrawingGAN [6], MangaGAN [18], Kim et al.’s model [19], and InfoDraw [20] are specifically designed for sketch generation. Additionally, sketch generation can employ diffusion models [34,35,36]. DALS [37] utilizes Stable Diffusion [38] and ControlNet [39] to produce landscape sketches.

2.2. Vector Sketch Synthesis

SketchRNN [11], which utilizes RNNs for sketch synthesis, is one of the most important studies in vector sketch generation. From this work, subsequent research has focused on generating vector sketches from images [24,40], as well as using reinforcement learning for vector sketch generation [26,27]. These approaches can be categorized as stroke-by-stroke methods, since they iteratively generate individual strokes to produce a final sketch.
In contrast to the stroke-by-stroke methods, optimization-based approaches generate sketches by randomly distributing strokes and refining them through optimization. They include DiffSketch [13], which employs diffusion models, as well as CLIPasso [2], CLIPDraw [25], and CLIPascene [3], which employ CLIP [12].
Additionally, studies [28,41,42,43,44] employ differentiable renderers [45] to represent images directly as vectors. Some studies [46,47] convert raster sketches into vector sketches.

2.3. Sketch-Based Application

Sketches are utilized in various applications. Sketch-based image synthesis methods [48,49,50] generate images from sketches. Sketch-based image retrieval methods [51,52,53,54] employ sketches to search images. Some studies [1] employ sketches for robot control.
A sketch represents lines with semantic information. The Segment Anything study [15] demonstrated superior performance by generating sketches from predicted semantic segmentation maps and comparing them with other sketch generation models. The sketches generated by Segment Anything contain semantic information, which closely resembles human-drawn sketches. Additionally, Wang et al. [55] showed that sketches capture salient features of an image, effectively utilizing this information within an embedding space.

2.4. Dataset

The datasets used in many sketch studies can be categorized as follows: Doodle-level datasets include Quick, Draw! [5], Sketchy [23], SketchyScene [56], SketchyCOCO [57], and FS-COCO [58]. Descriptive-level datasets are provided by Wang et al. [59], APDrawingGAN [6], and Yun et al. [60]. Datasets related to animation include Creative Flow+ [61] and Danbooru [62].

3. Method

3.1. Overview

Our sketch generation framework consists of two modules: the Noisy Sketch Generator and a noise classifier. Our framework is illustrated in Figure 3. The Noisy Sketch Generator samples prominent regions of the image to generate a noisy sketch, extracts a linear structure from the input image, and combines them to compute the value through flow (VTF). The VTF represents the probability of each pixel belonging to the sketch. The noise classifier classifies the pixels in the noisy sketch into noise pixels and sketch pixels. This module, composed of multiple wide-pass networks, is applied to the VTF to generate a line drawing-based binary-format sketch. The structure of the wide-pass networks is presented at the bottom of Figure 3. This network has two 5 × 5 convolutional layers with ReLU activation functions. The output wide-pass network is distinguished from the input wide-pass network by an additional 1 × 1 convolutional layer. In our framework, the InfoDraw method [20] is employed to extract prominent regions from the image, while the flowpath method [9], computed from the image’s tangent flow, is utilized to extract its linear structure.
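As a concrete illustration of this structure, the following PyTorch sketch shows an input wide-pass network (two 5 × 5 convolutions with ReLU) and an output wide-pass network with the additional 1 × 1 convolution; the channel width and the sigmoid output are assumptions, since the paper does not specify them.

```python
import torch
import torch.nn as nn

class InputWidePassNet(nn.Module):
    """Input wide-pass network: two 5x5 convolutions with ReLU and no bottleneck.
    The channel width is an assumption; only the kernel sizes follow the paper."""
    def __init__(self, in_channels, width=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_channels, width, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, width, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.body(x)

class OutputWidePassNet(nn.Module):
    """Output wide-pass network: the same block plus a 1x1 convolution that maps
    the features to a per-pixel sketch/non-sketch score (sigmoid is assumed)."""
    def __init__(self, in_channels, width=64):
        super().__init__()
        self.body = InputWidePassNet(in_channels, width)
        self.head = nn.Conv2d(width, 1, kernel_size=1)

    def forward(self, x):
        return torch.sigmoid(self.head(self.body(x)))
```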

3.2. Edge Tangent Flow

We employ a flowpath [9] to build the line component of an input image. The computation of the flowpath comes from the edge tangent flow (ETF) map, which builds a smooth flow map on an image by smoothing the tangent vectors of the image. The tangent vectors are estimated from gradient vectors, which are defined as follows:
$$G(\mathbf{x}) = \nabla I(\mathbf{x}),$$
where $G$ and $I$ denote the gradient and the input image, respectively, and $\mathbf{x}$ is the position of a pixel. $T_0(\mathbf{x})$, the initial edge tangent, is the tangent of the gradient, which is estimated by taking the cross product of $G(\mathbf{x})$ and the z-axis. We compute $T_N$, the $N$-th edge tangent, using the following formula:
$$T_{i+1}(\mathbf{x}) = \sum_{\mathbf{n} \in N(\mathbf{x})} D(\mathbf{x}, \mathbf{n})\, w(\mathbf{x}, \mathbf{n})\, T_i(\mathbf{n}),$$
where the number of iterations $N$ does not exceed three and $N(\mathbf{x})$ denotes the neighboring pixels of $\mathbf{x}$. $D$ and $w$ are estimated as follows:
$$D(\mathbf{x}_1, \mathbf{x}_2) = \begin{cases} 1 & \text{if } T_i(\mathbf{x}_1) \cdot T_i(\mathbf{x}_2) > 0, \\ -1 & \text{otherwise,} \end{cases}$$
$$w(\mathbf{x}_1, \mathbf{x}_2) = \frac{1}{2} \times \mathbb{1}\big(\lVert \mathbf{x}_1 - \mathbf{x}_2 \rVert < r\big) \times \Big(1 + \tanh\big(\eta\,(\hat{G}(\mathbf{x}_1) - \hat{G}(\mathbf{x}_2))\big)\Big) \times \big| T(\mathbf{x}_1) \cdot T(\mathbf{x}_2) \big|,$$
where $\hat{G}$ is the normalized gradient map. Finally, the ETF map is formulated as follows:
$$\mathrm{ETF}(I) = \hat{T}_N,$$
where $\hat{T}_N$ is the normalized vector.
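For illustration, the following NumPy sketch implements the smoothing iteration above under stated assumptions: the neighborhood radius r, the parameter η, and the square (rather than circular) neighborhood are chosen for brevity, while the direction weight D, the magnitude term, and the alignment term follow the formulas in this subsection.

```python
import numpy as np

def edge_tangent_flow(image, iterations=3, r=5, eta=1.0):
    """Smooth the edge tangent flow (ETF) of a grayscale image.
    r and eta are assumed values; the paper does not fix them."""
    gy, gx = np.gradient(image.astype(np.float64))
    g_mag = np.hypot(gx, gy)
    g_hat = g_mag / (g_mag.max() + 1e-8)             # normalized gradient magnitude

    # Initial tangent T_0: the gradient rotated by 90 degrees (cross product with z).
    t = np.stack([-gx, gy], axis=-1)
    t /= np.linalg.norm(t, axis=-1, keepdims=True) + 1e-8

    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    for _ in range(iterations):                      # N <= 3 iterations
        t_new = np.zeros_like(t)
        for dy in range(-r, r + 1):                  # square neighborhood approximating ||x1 - x2|| < r
            for dx in range(-r, r + 1):
                if dy == 0 and dx == 0:
                    continue
                ny = np.clip(ys + dy, 0, h - 1)
                nx = np.clip(xs + dx, 0, w - 1)
                t_n = t[ny, nx]                      # neighbor tangents T_i(n)
                dot = np.sum(t * t_n, axis=-1)
                d = np.where(dot > 0, 1.0, -1.0)     # direction weight D
                w_m = 0.5 * (1.0 + np.tanh(eta * (g_hat - g_hat[ny, nx])))  # magnitude term
                w_a = np.abs(dot)                    # alignment term |T(x1) . T(x2)|
                t_new += (d * w_m * w_a)[..., None] * t_n
        t = t_new / (np.linalg.norm(t_new, axis=-1, keepdims=True) + 1e-8)
    return t                                         # normalized ETF map
```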

3.3. Flowpath and VTF

The ETF map enables a pixel’s position to flow along the tangent direction at that pixel to the next coordinate. We set the step size of each movement to 1. A flowpath at pixel $\mathbf{x}$, denoted as $F(\mathbf{x})$, is constructed by repeatedly flowing to the next coordinate. To prevent the path from becoming excessively long, we constrain its maximum length $T_s$ to 21. $F(\mathbf{x})$, the flowpath at $\mathbf{x}$, is defined as follows:
$$F(\mathbf{x}) = (f_{-k}, \ldots, f_{-1}, \mathbf{x}, f_1, \ldots, f_k),$$
where $k$ is 10 and $f_i$ is a coordinate in the image. The details of this process are illustrated in Figure 4 and explained in Algorithm 1.
Algorithm 1 Algorithm for flowpath F
Require: image I, ETF map ETF
Ensure: Flowpath F
 1: initialization:
 2: ε ← 1 × 10⁻⁶
 3: Δs ← 1
 4: Set thresholds for length T_S and angle T_A
 5: Compute the normalized ETF map ÊTF
 6: for each pixel x = (y, x) ∈ I do
 7:     if ‖ÊTF(x)‖ < ε then
 8:         continue
 9:     F(x) ← {f_x(−T_S/2), …, f_x(0), …, f_x(T_S/2)} ∈ R^(T_S×2)
10:     f_x(0) ← x
        ▹ Forward Tracing
11:     for s = 0 to T_S/2 − 1 do
12:         θ ← arccos( ÊTF(f_x(s)) · ÊTF(f_x(s) + Δs · ÊTF(f_x(s))) )
13:         if θ > T_A then
14:             f_x(s + 1) ← f_x(s) + Δs · ÊTF(f_x(s))
15:         else
16:             for the remaining s do
17:                 f_x(s) ← (−1, −1)
        ▹ Backward Tracing
18:     for s = 0 downto −T_S/2 + 1 do
19:         θ ← arccos( ÊTF(f_x(s)) · ÊTF(f_x(s) − Δs · ÊTF(f_x(s))) )
20:         if θ > T_A then
21:             f_x(s − 1) ← f_x(s) − Δs · ÊTF(f_x(s))
22:         else
23:             for the remaining s do
24:                 f_x(s) ← (−1, −1)
The line components of a sketch are highly likely to be located along the computed flowpath. These components can be regarded as the intensities of the image along the flowpath. The intensity values along the flowpath are defined as the values through the flowpath (VTFs). Given the flowpath $F$ and a reference image $R$, the VTF is computed as follows:
$$\mathrm{VTF}(F(\mathbf{x}), R) = \big( R(f_{-k}), \ldots, R(f_{-1}), R(\mathbf{x}), R(f_1), \ldots, R(f_k) \big),$$
where $f_i$ belongs to $F(\mathbf{x})$. In this formula, $f_i$ is a two-dimensional vector that specifies the position of a sampled pixel, while the VTF is a $T_s$-dimensional vector whose values correspond to $R(f_i)$. The VTF is calculated by interpolating the values of the pixels in a noisy sketch through a flowpath. We apply the reflection padding scheme for the border pixels in the flowpath.
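To make the tracing and sampling concrete, the following is a minimal NumPy sketch of flowpath tracing and VTF extraction. It is a simplification under stated assumptions: tangents are looked up with nearest-neighbor rounding instead of interpolation, borders are handled by clamping and by repeating the last valid point rather than by reflection padding, tracing stops when successive tangents diverge beyond an assumed angle threshold, and the function names are ours rather than the paper's.

```python
import numpy as np

def trace_flowpath(etf, row, col, half_len=10, step=1.0, angle_thresh=np.pi / 4):
    """Trace a flowpath through pixel (row, col) along a normalized ETF map.
    etf: (H, W, 2) array of unit tangent vectors. Returns 2*half_len + 1 points,
    i.e., T_s = 21 samples when half_len = 10. angle_thresh is an assumed value."""
    h, w = etf.shape[:2]
    center = np.array([row, col], dtype=np.float64)

    def tangent_at(p):
        r = int(np.clip(round(p[0]), 0, h - 1))
        c = int(np.clip(round(p[1]), 0, w - 1))
        return etf[r, c]

    def trace(direction):
        pts, p = [], center.copy()
        for _ in range(half_len):
            t = tangent_at(p)
            nxt = p + direction * step * t
            # Stop when the tangent turns too sharply between successive samples.
            cos_a = float(np.clip(np.dot(t, tangent_at(nxt)), -1.0, 1.0))
            if np.arccos(cos_a) > angle_thresh:
                break
            p = nxt
            pts.append(p.copy())
        while len(pts) < half_len:                   # pad to a fixed length
            pts.append(pts[-1].copy() if pts else center.copy())
        return pts

    backward = trace(-1.0)[::-1]                     # f_{-k}, ..., f_{-1}
    forward = trace(+1.0)                            # f_{1}, ..., f_{k}
    return backward + [center.copy()] + forward

def vtf(flowpath, reference):
    """Sample the reference image (e.g., the noisy sketch) along a flowpath."""
    h, w = reference.shape[:2]
    coords = np.rint(np.array(flowpath)).astype(int)
    coords[:, 0] = np.clip(coords[:, 0], 0, h - 1)
    coords[:, 1] = np.clip(coords[:, 1], 0, w - 1)
    return reference[coords[:, 0], coords[:, 1]]     # T_s sampled values
```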

3.4. Noisy Sketch Generator

The advance of deep learning has accelerated various studies on sketch generation [2,3,11,13,20,23,24,25,26,27,28,29]. These studies produce results that are aesthetically and semantically superior to those of traditional sketch generation methods. However, a notable limitation of deep learning-based schemes is that their outputs are grayscale images with values ranging between 0 and 1. Note that an ideal sketch is composed of pixels valued 1, which represents a line, or 0, which represents the background. Even when we apply binarization schemes to these results, they still remain noisy.
We address this issue by generating noisy sketches using a pretrained deep learning model and then designing a classifier to filter out the noise. To generate noisy sketches, we employ InfoDraw [20], a general-purpose sketch generation model. InfoDraw is trained on an unpaired image–sketch dataset, which allows a relatively large quantity of sketches to be used. Additionally, its training process incorporates both semantic and geometric information, achieved using CLIP [12] and depth maps, respectively. To the best of our knowledge, InfoDraw represents the state-of-the-art model for generating raster sketches from images.

3.5. Noise Classifier

We generate binary sketches composed of 0s and 1s that reflect both line and semantic information from images. For this purpose, we devise a module that classifies each pixel as either part of the sketch or not using the VTF and the input image. This module is then trained on the dataset described below.
The training dataset is composed of our in-house dataset, which includes 1000 simple shape images. This dataset includes approximately 400 object sketches, 100 portrait sketches, and 500 food sketches. These sketches are created by artists. Additionally, sketches from DifferSketching [7] and SKSF-A [60] are exploited, resulting in a total of approximately 2000 image–sketch pairs used for training.
Since the sketch dataset is restricted, the architecture of our module is designed with a shallow structure to achieve better generalization performance. Consequently, we propose a wide-pass network with a non-bottleneck structure.
The input to this module includes the noisy raster sketch generated by the Noisy Sketch Generator, the VTF extracted from this noisy raster sketch, and the original image. When the VTF is extracted from the noisy raster sketch, it captures the intensity values along the flowpath of the noisy raster sketch. If the line components are strong, the VTF values will closely correspond to sketch elements; otherwise, they will represent background components. Thus, the VTF serves as an intuitive feature for determining whether a pixel along the flowpath belongs to the sketch.
Since the sketch is sparse, our module does not adopt a conventional encoder–decoder structure with a bottleneck layer. Instead, a shallow and wide architecture is employed. Features are extracted using separate CNN structures for the original image and the VTF, and these features are concatenated and processed further through the CNN.
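The following PyTorch sketch illustrates this two-branch layout: one shallow, wide branch for the original image, one for the VTF with its T_s samples stacked as channels, and a head that concatenates the features and outputs a per-pixel sketch probability. The channel widths, the head depth, and the 0.5 threshold are assumptions; only the overall non-bottleneck, two-branch design follows the description above.

```python
import torch
import torch.nn as nn

class NoiseClassifier(nn.Module):
    """Shallow, wide two-branch noise classifier (no encoder-decoder bottleneck).
    Channel widths are assumptions; only the overall layout follows the paper."""
    def __init__(self, image_channels=3, vtf_length=21, width=64):
        super().__init__()
        def wide_block(in_ch):
            return nn.Sequential(
                nn.Conv2d(in_ch, width, kernel_size=5, padding=2),
                nn.ReLU(inplace=True),
                nn.Conv2d(width, width, kernel_size=5, padding=2),
                nn.ReLU(inplace=True),
            )
        self.image_branch = wide_block(image_channels)   # features from the input image
        self.vtf_branch = wide_block(vtf_length)         # features from the per-pixel VTF
        self.head = nn.Sequential(
            nn.Conv2d(2 * width, width, kernel_size=5, padding=2),
            nn.ReLU(inplace=True),
            nn.Conv2d(width, 1, kernel_size=1),
        )

    def forward(self, image, vtf_map):
        feats = torch.cat([self.image_branch(image), self.vtf_branch(vtf_map)], dim=1)
        return torch.sigmoid(self.head(feats))           # per-pixel sketch probability

# Usage sketch: a 3-channel image and a VTF map with T_s = 21 samples per pixel.
model = NoiseClassifier()
image = torch.rand(1, 3, 256, 256)
vtf_map = torch.rand(1, 21, 256, 256)
binary_sketch = (model(image, vtf_map) > 0.5).float()    # assumed 0.5 threshold
```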

4. Implementation and Results

We implemented our model on a cloud platform with an Intel Xeon Platinum 8480 CPU and an NVIDIA H100 80 GB GPU. We produce line drawing-based sketches of drawing quality for images of various categories including portraits, landscapes, objects, animation scenes, and animals. Our results are presented in the following figures: portraits in Figure 5, landscapes in Figure 6, animals in Figure 7, animation images in Figure 8, and objects in Figure 9.

5. Comparison and Evaluation

5.1. Comparison

We compare our results with three important studies [23,63,64] and two commercial sketch generation services [65,66]. Note that the two commercial sketch generation services produce tonal sketches rather than line-drawing sketches. Important studies such as CLIPasso [2] and DiffSketcher [13] are not considered, since CLIPasso aims to generate abstract and simplified sketches, emphasizing visual simplification and artistic style rather than realistic representation or precise structural reproduction. DiffSketcher employs a text-conditioned generation approach, achieving a higher level of detail than CLIPasso while still prioritizing abstraction and conceptual expression. Due to these differences, these studies are excluded from the comparison.
We compare the generated sketches in three categories: portrait, landscape, and others. Figure 10, Figure 11 and Figure 12 illustrate the images we compare. From these comparisons, we execute two evaluations: a quantitative evaluation and a qualitative evaluation.

5.2. Quantitative Evaluation

It is very important to present proper evaluation metrics for sketch generation studies. Many sketch generation studies use the PSNR (peak signal-to-noise ratio) [63,64], FID (Fréchet inception distance) [6,19,63,64], LPIPS (Learned Perceptual Image Patch Similarity) [63,64], and CLIP (Contrastive Language-Image Pretraining) score [20]. We estimate these metrics for the images of the three categories illustrated in Figure 10, Figure 11 and Figure 12. Table 1 presents the quantitative comparison results across the existing methods. However, we are also concerned that these metrics are not perfectly suitable for sketch evaluation. Therefore, we also conduct a user study for a qualitative evaluation of our results.
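As a reference for the simplest of these metrics, a minimal PSNR implementation is sketched below; FID, LPIPS, and the CLIP score depend on pretrained networks and are typically computed with dedicated libraries, so they are not reproduced here.

```python
import numpy as np

def psnr(reference, generated, max_value=1.0):
    """Peak signal-to-noise ratio between two images scaled to [0, max_value]."""
    mse = np.mean((reference.astype(np.float64) - generated.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)
```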
As shown in Table 1, our method achieves two best scores and one second-best score in the portrait category (Figure 10), one best score and three second-best scores in the landscape category (Figure 11), and three best scores and one second-best score in the other category (Figure 12).
Overall, our framework demonstrates superior performance compared to three key recent studies [23,63,64]. While our method performs similarly to Seo et al. [64] in the portrait category, it outperforms Seo et al. [64] in the landscape and other categories. A commercial service, BeFunky [65], which produces tonal sketches, generates visually high-quality results and achieves higher scores than our method on many metrics.
Sketch generation techniques based on tonal sketches, such as BeFunky and Fotor, tend to produce more visually pleasing results compared to line drawing-based approaches including ours. Moreover, PSNR, FID, and CLIPScore, which are the evaluation metrics employed in our study, tend to assign higher scores to results that closely resemble realistic images. Consequently, the results generated by BeFunky, which utilizes both lines and tones, tend to achieve higher evaluation scores than our method, which relies solely on line drawing style.

5.3. Qualitative Evaluation

We conducted a user study to qualitatively evaluate our method by comparing our results with those of existing studies [23,63,64,65,66]. A total of 20 participants were recruited, consisting of 18 individuals in their 20s and 2 in their 30s. The participant pool included 12 females and 8 males.
The participants were asked to evaluate the following criteria using a 10-point scale:
  • Q1: Identity Preservation
    Evaluate how well the identity of the original image is maintained in the sketch image on a 10-point scale.
    A score of 10 indicates that the identity is well preserved, whereas a score of 1 suggests poor preservation.
  • Q2: Artifact Suppression
    Evaluate the level of artifact suppression in the sketch image on a 10-point scale.
    A score of 10 signifies minimal artifacts, while a score of 1 indicates a high presence of artifacts.
  • Q3: Line Drawing Appearance
    Evaluate the extent to which the sketch resembles a line drawing on a 10-point scale.
    A score of 10 implies a strong resemblance to a line drawing, whereas a score of 1 suggests that the sketch appears more like a tonal sketch than a line drawing.
Participants were presented with sketch images alongside those produced by five related studies in Figure 10, Figure 11 and Figure 12. They evaluated each technique based on the three questions described above, assigning scores accordingly. The images shown to participants were categorized into three groups: portrait (Figure 10), landscape (Figure 11), and others (Figure 12). Table 2 presents the average scores assigned by participants for each group.
The results can be summarized as follows:
  • Our method ranked third in terms of identity preservation and artifact suppression in the portrait category.
  • Our method ranked second for identity preservation and artifact suppression among the tested approaches in the landscape and other categories. The commercial tool BeFunky outperformed our method in these aspects.
  • Our method achieved the highest score across all categories in resembling line drawings. While BeFunky [65] demonstrated strong performance in various aspects, its generated sketches resemble tonal sketches rather than line drawings, making it less aligned with the objective of our study.
For the analysis of our qualitative study, we conducted an ANOVA analysis, which is commonly used for comparing multiple datasets, to evaluate the performance of our framework in three categories: portrait, landscape, and other. The obtained p-values were 0.0187 for portrait, 0.0209 for landscape, and 0.0020 for other, indicating a statistically significant difference among the six compared methods including ours.
Additionally, to assess whether the score differences between our method and the highest-performing competitors, Seo et al. [64] and BeFunky [65], are statistically significant, we performed a t-test and present its results in Table 3. Although these two methods do not produce line drawing-based sketches, which are the focus of this study, they generate the highest-quality tonal sketches, making them relevant for comparison.
In Table 3, “invalid” indicates that the difference between our results and other methods is not statistically significant, meaning that our method performs at a similar level to other techniques. Conversely, “valid” denotes a statistically significant score difference. These results demonstrate that our method achieves competitive performance when compared to the best-performing research-based method [64] and commercial service [65].
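The statistical tests themselves can be reproduced with SciPy as sketched below; the per-participant scores in this snippet are hypothetical placeholders, since the raw user-study responses are not published, and only the choice of a one-way ANOVA across the six methods followed by pairwise t-tests follows the text.

```python
from scipy import stats

# Hypothetical per-participant scores for one question in one category;
# the actual responses from the 20-participant study are not published.
ours     = [8, 9, 7, 9, 8]
seo2023  = [9, 8, 9, 7, 8]
befunky  = [9, 9, 8, 9, 10]
li2019   = [3, 4, 2, 3, 4]
ashtari  = [2, 3, 2, 4, 3]
fotor    = [6, 7, 5, 6, 7]

# One-way ANOVA across all six methods (Section 5.3).
f_stat, p_anova = stats.f_oneway(ours, seo2023, befunky, li2019, ashtari, fotor)

# Pairwise t-tests against the two strongest competitors (Table 3).
_, p_seo = stats.ttest_ind(ours, seo2023)
_, p_bef = stats.ttest_ind(ours, befunky)
print(f"ANOVA p={p_anova:.4f}, vs Seo 2023 p={p_seo:.4f}, vs BeFunky p={p_bef:.4f}")
```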
Our results demonstrate superior scores in 11 out of 18 evaluation metrics compared to these studies [64,65]. Among these, our method exhibits statistically significant improvements in eight metrics. Conversely, in the seven metrics where our scores are lower than those of the compared studies [64,65], no statistically significant differences were observed. Therefore, we conclude that our approach outperforms existing methods.

5.4. Ablation Study

We analyze the effects of removing the VTF and image information. Figure 13 illustrates the results, and Table 4 presents the results of the ablation study.
Our ablation study evaluates the impact of individual components by performing a component-wise ablation on the image presented in Figure 13a and comparing the resulting outputs. In the first ablation study, we apply thresholding to the noisy sketch generated by the Noisy Sketch Generator (see Figure 13b). In the second study, we apply the noise classifier to the input image combined with the flowpath (see Figure 13c). In the third study, we extract the VTF by applying the flowpath to the noisy sketch and then apply the noise classifier to the extracted VTF without considering the input image (see Figure 13d). In the final study, we incorporate both the input image and the VTF to generate the final result (see Figure 13e).
To quantitatively assess the impact of these variations, we measure PSNR, FID, CLIPScore, and LPIPS for the four generated images in Figure 13. The results are presented in Table 4. As shown in the table, the best performance is achieved when both the input image and the VTF are considered, demonstrating the importance of utilizing both components in the proposed method.
As shown in Table 4, the method using only the VTF (Ours (VTF-Only)) achieves higher performance than the method using only image information (Ours (Image-Only)). However, incorporating image information together with the VTF (Ours (Image + VTF)) yields improved performance across all metrics. These results demonstrate the synergistic effect of combining the VTF and image information in sketch generation.

5.5. Limitation

Our framework combines a noisy sketch generated from a deep learning-based generator with a flowpath extracted from the input image. Therefore, our framework effectively suppresses noise while generating a clear line drawing-style sketch. However, our approach has certain limitations. Since our method aims to produce a line drawing-style sketch, it tends to generate strokes that are excessively dense and complex. This limitation requires further research on optimizing stroke distribution to achieve a more balanced and visually pleasing sketch representation. Furthermore, the generated sketches do not sufficiently account for the semantic information of the input image. A more refined approach that differentiates between the foreground and background and analyzes the structural components of the image in greater detail is required to enhance sketch generation quality.

6. Conclusions and Future Work

We propose a method for generating clear and salient line drawing-based sketches while reducing the unwanted noise that appears in such sketches. Our approach employs a deep learning-based model to generate noisy sketches from input images. From the input image, the edge tangent flow is integrated to extract the flowpath. Subsequently, values are sampled along this flowpath to compute the VTF, which produces salient and noise-reduced sketches.
The sketches generated using our method have broad applications. Users can manipulate the vectorized sketches to control the shapes in the images. This capability facilitates various tasks, such as creating animations or webtoons, with greater ease. Additionally, the method can be used to provide appropriate control inputs for generative models, enabling more effective utilization in image generation workflows.

Author Contributions

Conceptualization, J.K.; Methodology, J.K.; Validation, H.Y.; Writing—original draft, K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Sangmyung University in 2023.

Data Availability Statement

The datasets presented in this article are not readily available because they are part of an ongoing study. Requests to access the datasets should be directed to yanghk@smu.ac.kr. The copyright of the photographs used in our manuscript falls under “Fair Use”.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Sakamoto, D.; Honda, K.; Inami, M.; Igarashi, T. Sketch and run: A stroke-based interface for home robots. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, Boston, MA, USA, 4–9 April 2009; pp. 197–200. [Google Scholar]
  2. Vinker, Y.; Pajouheshgar, E.; Bo, J.Y.; Bachmann, R.C.; Bermano, A.H.; Cohen-Or, D.; Zamir, A.; Shamir, A. Clipasso: Semantically-aware object sketching. ACM Trans. Graph. (TOG) 2022, 41, 86. [Google Scholar] [CrossRef]
  3. Vinker, Y.; Alaluf, Y.; Cohen-Or, D.; Shamir, A. Clipascene: Scene sketching with different types and levels of abstraction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–6 October 2023; pp. 4146–4156. [Google Scholar]
  4. Singer, J.; Seeliger, K.; Kietzmann, T.; Hebart, M. From photos to sketches—How humans and deep neural networks process objects across different levels of visual abstraction. J. Vis. 2022, 22, 4. [Google Scholar] [CrossRef] [PubMed]
  5. Jongejan, J.; Rowley, H.; Kawashima, T.; Kim, J.; Fox-Gieg, N. A.I. Experiments: Quick, Draw! Available online: https://quickdraw.withgoogle.com/ (accessed on 14 July 2025).
  6. Yi, R.; Liu, Y.J.; Lai, Y.K.; Rosin, P.L. Apdrawinggan: Generating artistic portrait drawings from face photos with hierarchical gans. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 10743–10752. [Google Scholar]
  7. Xiao, C.; Su, W.; Liao, J.; Lian, Z.; Song, Y.Z.; Fu, H. DifferSketching: How Differently Do People Sketch 3D Objects? ACM Trans. Graph. 2022, 41, 264. [Google Scholar] [CrossRef]
  8. Canny, J. A computational approach to edge detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 679–698. [Google Scholar] [CrossRef]
  9. Kang, H.; Lee, S.; Chui, C.K. Coherent line drawing. In Proceedings of the 5th International Symposium on Non-Photorealistic Animation and Rendering, San Diego, CA, USA, 4–5 August 2007; pp. 43–50. [Google Scholar]
  10. Kang, H.; Lee, S.; Chui, C.K. Flow-based image abstraction. IEEE Trans. Vis. Comput. Graph. 2008, 15, 62–76. [Google Scholar] [CrossRef] [PubMed]
  11. Ha, D.; Eck, D. A neural representation of sketch drawings. arXiv 2017, arXiv:1704.03477. [Google Scholar]
  12. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 8748–8763. [Google Scholar]
  13. Xing, X.; Wang, C.; Zhou, H.; Zhang, J.; Yu, Q.; Xu, D. DiffSketcher: Text-Guided Vector Sketch Synthesis through Latent Diffusion Models. arXiv 2023, arXiv:2306.14685. [Google Scholar]
  14. Arbelaez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 898–916. [Google Scholar] [CrossRef] [PubMed]
  15. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. arXiv 2023, arXiv:2304.02643. [Google Scholar] [PubMed]
  16. Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423. [Google Scholar]
  17. Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
  18. Su, H.; Niu, J.; Liu, X.; Li, Q.; Cui, J.; Wan, J. Mangagan: Unpaired photo-to-manga translation based on the methodology of manga drawing. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada (virtual), 2–9 February 2021; Volume 35, pp. 2611–2619. [Google Scholar]
  19. Kim, H.; Kim, J.; Yang, H. Portrait Sketch Generative Model for Misaligned Photo-to-Sketch Dataset. Mathematics 2023, 11, 3761. [Google Scholar] [CrossRef]
  20. Chan, C.; Durand, F.; Isola, P. Learning To Generate Line Drawings That Convey Geometry and Semantics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 19–24 June 2022; pp. 7915–7925. [Google Scholar]
  21. Cabral, B.; Leedom, L.C. Imaging vector fields using line integral convolution. In Proceedings of SIGGRAPH 1993, Anaheim, CA, USA, 2–6 August 1993; pp. 263–270. [Google Scholar]
  22. Winnemöller, H.; Kyprianidis, J.E.; Olsen, S.C. XDoG: An eXtended difference-of-Gaussians compendium including advanced image stylization. Comput. Graph. 2012, 36, 740–753. [Google Scholar] [CrossRef]
  23. Li, M.; Lin, Z.; Mech, R.; Yumer, E.; Ramanan, D. Photo-sketching: Inferring contour drawings from images. In Proceedings of the 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa Village, HI, USA, 7–11 January 2019; pp. 1403–1412. [Google Scholar]
  24. Chen, Y.; Tu, S.; Yi, Y.; Xu, L. Sketch-pix2seq: A model to generate sketches of multiple categories. arXiv 2017, arXiv:1709.04121. [Google Scholar]
  25. Frans, K.; Soros, L.; Witkowski, O. Clipdraw: Exploring text-to-drawing synthesis through language-image encoders. Adv. Neural Inf. Process. Syst. 2022, 35, 5207–5218. [Google Scholar]
  26. Muhammad, U.R.; Yang, Y.; Song, Y.Z.; Xiang, T.; Hospedales, T.M. Learning deep sketch abstraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 8014–8023. [Google Scholar]
  27. Zhou, T.; Fang, C.; Wang, Z.; Yang, J.; Kim, B.; Chen, Z.; Brandt, J.; Terzopoulos, D. Learning to doodle with deep q networks and demonstrated strokes. In Proceedings of the British Machine Vision Conference, Newcastle upon Tyne, UK, 3–6 September 2018; Volume 1, p. 4. [Google Scholar]
  28. Jain, A.; Xie, A.; Abbeel, P. Vectorfusion: Text-to-svg by abstracting pixel-based diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 1911–1920. [Google Scholar]
  29. Bhunia, A.K.; Khan, S.; Cholakkal, H.; Anwer, R.M.; Khan, F.S.; Laaksonen, J.; Felsberg, M. Doodleformer: Creative sketch drawing with transformers. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 338–355. [Google Scholar]
  30. Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
  31. Liu, M.Y.; Breuel, T.; Kautz, J. Unsupervised image-to-image translation networks. Adv. Neural Inf. Process. Syst. 2017, 30, 700–708. [Google Scholar]
  32. Huang, X.; Liu, M.Y.; Belongie, S.; Kautz, J. Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 10–13 September 2018; pp. 172–189. [Google Scholar]
  33. Kim, J. U-gat-it: Unsupervised generative attentional networks with adaptive layer-instance normalization for image-to-image translation. arXiv 2019, arXiv:1907.10830. [Google Scholar]
  34. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
  35. Song, J.; Meng, C.; Ermon, S. Denoising diffusion implicit models. arXiv 2020, arXiv:2010.02502. [Google Scholar]
  36. Song, Y.; Sohl-Dickstein, J.; Kingma, D.P.; Kumar, A.; Ermon, S.; Poole, B. Score-based generative modeling through stochastic differential equations. arXiv 2020, arXiv:2011.13456. [Google Scholar]
  37. Kim, J.; Yang, H.; Min, K. DALS: Diffusion-Based Artistic Landscape Sketch. Mathematics 2024, 12, 238. [Google Scholar] [CrossRef]
  38. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 10684–10695. [Google Scholar]
  39. Zhang, L.; Rao, A.; Agrawala, M. Adding conditional control to text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, France, 2–13 October 2023; pp. 3836–3847. [Google Scholar]
  40. Song, J.; Pang, K.; Song, Y.Z.; Xiang, T.; Hospedales, T.M. Learning to sketch with shortcut cycle consistency. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 801–810. [Google Scholar]
  41. Huang, Z.; Heng, W.; Zhou, S. Learning to paint with model-based deep reinforcement learning. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
  42. Ma, X.; Zhou, Y.; Xu, X.; Sun, B.; Filev, V.; Orlov, N.; Fu, Y.; Shi, H. Towards layer-wise image vectorization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 16314–16323. [Google Scholar]
  43. Reddy, P.; Gharbi, M.; Lukac, M.; Mitra, N.J. Im2vec: Synthesizing vector graphics without vector supervision. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual (Nashville, TN, USA), 19–25 June 2021; pp. 7342–7351. [Google Scholar]
  44. Liu, S.; Lin, T.; He, D.; Li, F.; Deng, R.; Li, X.; Ding, E.; Wang, H. Paint transformer: Feed forward neural painting with stroke prediction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 6598–6607. [Google Scholar]
  45. Li, T.M.; Lukáč, M.; Gharbi, M.; Ragan-Kelley, J. Differentiable vector graphics rasterization for editing and learning. ACM Trans. Graph. (TOG) 2020, 39, 1–15. [Google Scholar] [CrossRef]
  46. Bhunia, A.K.; Chowdhury, P.N.; Yang, Y.; Hospedales, T.; Xiang, T.; Song, Y.Z. Vectorization and Rasterization: Self-Supervised Learning for Sketch and Handwriting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Virtual (Nashville, TN, USA), 19–25 June 2021. [Google Scholar]
  47. Su, H.; Liu, X.; Niu, J.; Cui, J.; Wan, J.; Wu, X.; Wang, N. MARVEL: Raster Gray-level Manga Vectorization via Primitive-wise Deep Reinforcement Learning. IEEE Trans. Circuits Syst. Video Technol. 2023, 34, 2677–2693. [Google Scholar] [CrossRef]
  48. Wang, S.Y.; Bau, D.; Zhu, J.Y. Sketch your own gan. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual Conference with Main Sessions, Montreal, QC, Canada, 11–17 October 2021; pp. 14050–14060. [Google Scholar]
  49. Bashkirova, D.; Lezama, J.; Sohn, K.; Saenko, K.; Essa, I. Masksketch: Unpaired structure-guided masked image generation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 1879–1889. [Google Scholar]
  50. Koley, S.; Bhunia, A.K.; Sain, A.; Chowdhury, P.N.; Xiang, T.; Song, Y.Z. Picture that sketch: Photorealistic image generation from abstract sketches. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 6850–6861. [Google Scholar]
  51. Yu, Q.; Liu, F.; Song, Y.Z.; Xiang, T.; Hospedales, T.M.; Loy, C.C. Sketch me that shoe. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 799–807. [Google Scholar]
  52. Sangkloy, P.; Jitkrittum, W.; Yang, D.; Hays, J. A sketch is worth a thousand words: Image retrieval with text and sketch. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 251–267. [Google Scholar]
  53. Sain, A.; Bhunia, A.K.; Chowdhury, P.N.; Koley, S.; Xiang, T.; Song, Y.Z. Clip for all things zero-shot sketch-based image retrieval, fine-grained or not. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 2765–2775. [Google Scholar]
  54. Chaudhuri, A.; Bhunia, A.K.; Song, Y.Z.; Dutta, A. Data-free sketch-based image retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 12084–12093. [Google Scholar]
  55. Wang, A.; Ren, M.; Zemel, R. Sketchembednet: Learning novel concepts by imitating drawings. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual (Honolulu, HI, USA), 18–24 July 2021; pp. 10870–10881. [Google Scholar]
  56. Zou, C.; Yu, Q.; Du, R.; Mo, H.; Song, Y.Z.; Xiang, T.; Gao, C.; Chen, B.; Zhang, H. SketchyScene: Richly-Annotated Scene Sketches. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; Springer International Publishing: Berlin/Heidelberg, Germany, 2018; pp. 438–454. [Google Scholar] [CrossRef]
  57. Gao, C.; Liu, Q.; Xu, Q.; Wang, L.; Liu, J.; Zou, C. Sketchycoco: Image generation from freehand scene sketches. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 5174–5183. [Google Scholar]
  58. Chowdhury, P.N.; Sain, A.; Bhunia, A.K.; Xiang, T.; Gryaditskaya, Y.; Song, Y.Z. Fs-coco: Towards understanding of freehand sketches of common objects in context. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 253–270. [Google Scholar]
  59. Wang, Z.; Qiu, S.; Feng, N.; Rushmeier, H.; McMillan, L.; Dorsey, J. Tracing Versus Freehand for Evaluating Computer-Generated Drawings. ACM Trans. Graph. 2021, 40, 52. [Google Scholar] [CrossRef]
  60. Yun, K.; Seo, K.; Seo, C.W.; Yoon, S.; Kim, S.; Ji, S.; Ashtari, A.; Noh, J. Stylized Face Sketch Extraction via Generative Prior with Limited Data. Comput. Graph. Forum 2024, 43, e15045. [Google Scholar] [CrossRef]
  61. Shugrina, M.; Liang, Z.; Kar, A.; Li, J.; Singh, A.; Singh, K.; Fidler, S. Creative Flow+ Dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 16–20 June 2019. [Google Scholar]
  62. Danbooru2021: A Large-Scale Crowdsourced and Tagged Anime Illustration Dataset. Available online: https://gwern.net/danbooru2021 (accessed on 14 July 2025).
  63. Ashtari, A.; Seo, C.W.; Kang, C.; Cha, S.; Noh, J. Reference Based Sketch Extraction via Attention Mechanism. ACM Trans. Graph. 2022, 41, 207. [Google Scholar] [CrossRef]
  64. Seo, C.W.; Ashtari, A.; Noh, J. Semi-supervised Reference-based Sketch Extraction using a Contrastive Learning Framework. ACM Trans. Graph. 2023, 42, 56. [Google Scholar] [CrossRef]
  65. BeFunky Photo Editor. Available online: https://www.befunky.com (accessed on 14 July 2025).
  66. Fotor. Photo to Sketch: Free Image to Sketch Converter. Available online: https://www.fotor.com/ (accessed on 14 July 2025).
Figure 1. Various sketches: (a) spectrum of sketches from doodles to descriptive and (b) sketch datasets. Left one is from Quick, Draw! [5]; center is from DifferSketching [7], and right is the one we collected.
Figure 2. The key idea that distinguishes our framework from important existing studies: (a) tangent flow and a flowpath at a center pixel (the green color indicates the intensity of the noisy sketch, where darker shades represent stronger components); (b) the difference in the sketch generation processes of Cabral and Leedom [21], Kang et al. [9], and our framework.
Figure 3. The overview of our framework. Our framework is composed of two modules: the Noisy Sketch Generator and a noise classifier. The noise classifier is composed of wide-pass networks, whose structures are illustrated in the bottom.
Figure 4. Visualization of the VTF estimation process. (a) Two pixels ( x and y ) and their flowpaths ( F ( x ) and F ( y ) ); (b) the same pixels and flowpaths on the noisy sketch; (c) the result sketches; (d) VTFs computed at the x (upper) and y (lower) parts.
Figure 5. Results on portraits.
Figure 6. Results on landscapes.
Figure 7. Results on animals.
Figure 8. Results on animation images (The bottom row shows the input).
Figure 9. Results on objects (The bottom row shows the input).
Figure 10. Comparison of our results with existing studies and services for portrait. From left to right: results of ours, [23,63,64,65,66].
Figure 11. Comparison of our results with existing studies and services for landscape. From left to right: results of ours, [23,63,64,65,66].
Figure 12. Comparison of our results with existing studies and services for other images. From left to right: results of ours, [23,63,64,65,66].
Figure 13. Ablation study.
Table 1. Quantitative evaluation of our study and the existing studies. ↑ denotes that higher values are better, and ↓ denotes that lower values are better. We distinguish the figures of the best results using bold characters and the figures of the second-best results using italic characters.
Category (Figure) | Metric | Ours | Li 2019 [23] | Ashtari 2022 [63] | Seo 2023 [64] | BeFunky [65] | Fotor [66]
Portrait (Figure 10) | PSNR ↑ | 35.5045 | 35.4907 | 34.6952 | 35.4999 | 35.4729 | 33.0367
Portrait (Figure 10) | FID ↓ | 282.7476 | 423.2384 | 359.2274 | 213.9626 | 171.6401 | 219.4862
Portrait (Figure 10) | CLIPscore ↑ | 0.8947 | 0.7632 | 0.8031 | 0.8844 | 0.9108 | 0.8661
Portrait (Figure 10) | LPIPS ↓ | 0.2369 | 0.4086 | 0.3343 | 0.2827 | 0.3497 | 0.4789
Landscape (Figure 11) | PSNR ↑ | 33.2470 | 33.2460 | 32.9502 | 33.2134 | 33.2495 | 31.7239
Landscape (Figure 11) | FID ↓ | 288.9014 | 571.9037 | 353.7223 | 362.5313 | 155.8621 | 312.4798
Landscape (Figure 11) | CLIPscore ↑ | 0.9078 | 0.7304 | 0.8234 | 0.8524 | 0.9296 | 0.9015
Landscape (Figure 11) | LPIPS ↓ | 0.3574 | 0.5399 | 0.4274 | 0.3776 | 0.3703 | 0.5723
Other (Figure 12) | PSNR ↑ | 39.7271 | 39.6895 | 37.3449 | 39.6778 | 39.6720 | 37.3697
Other (Figure 12) | FID ↓ | 193.7292 | 398.5725 | 364.4751 | 215.5324 | 165.0447 | 207.1792
Other (Figure 12) | CLIPscore ↑ | 0.9540 | 0.8664 | 0.7609 | 0.8837 | 0.8915 | 0.8472
Other (Figure 12) | LPIPS ↓ | 0.1489 | 0.2450 | 0.4571 | 0.2591 | 0.2925 | 0.2941
(Ours, Li 2019, Ashtari 2022, and Seo 2023 produce line-drawing sketches; BeFunky and Fotor produce tonal sketches.)
Table 2. Qualitative evaluation of our method and the existing studies. ↑ denotes that higher values are better. We distinguish the figures of the best results using bold characters and the figures of the second-best results using italic characters.
Category (Figure) | Question | Ours | Li 2019 [23] | Ashtari 2022 [63] | Seo 2023 [64] | BeFunky [65] | Fotor [66]
Portrait (Figure 10) | Q1 ↑ | 7.67 | 1.33 | 3.67 | 9.00 | 8.00 | 6.33
Portrait (Figure 10) | Q2 ↑ | 7.33 | 6.67 | 4.00 | 9.67 | 9.67 | 1.67
Portrait (Figure 10) | Q3 ↑ | 9.67 | 6.67 | 4.67 | 9.33 | 5.00 | 4.00
Landscape (Figure 11) | Q1 ↑ | 9.25 | 3.50 | 2.00 | 3.75 | 10.00 | 8.00
Landscape (Figure 11) | Q2 ↑ | 8.25 | 5.00 | 2.75 | 4.75 | 9.50 | 7.25
Landscape (Figure 11) | Q3 ↑ | 9.25 | 9.00 | 3.75 | 8.25 | 6.25 | 4.75
Other (Figure 12) | Q1 ↑ | 9.00 | 5.25 | 1.75 | 6.75 | 9.50 | 7.50
Other (Figure 12) | Q2 ↑ | 9.25 | 9.00 | 2.25 | 6.50 | 9.50 | 7.00
Other (Figure 12) | Q3 ↑ | 9.50 | 9.25 | 1.50 | 5.50 | 4.75 | 4.50
Average | Q1 ↑ | 8.64 | 3.36 | 2.47 | 6.50 | 9.17 | 7.28
Average | Q2 ↑ | 8.28 | 6.89 | 3.00 | 6.97 | 9.58 | 5.31
Average | Q3 ↑ | 9.47 | 8.31 | 3.31 | 7.69 | 5.33 | 4.42
(Ours, Li 2019, Ashtari 2022, and Seo 2023 produce line-drawing sketches; BeFunky and Fotor produce tonal sketches.)
Table 3. p-values from the t-test on the data in Table 2. ↑ denotes that higher values are better. Bold figures denote that our model outperforms the compared studies, while normal figures do not.
Category (Figure) | Question | Ours & Seo 2023 [64]: p-Value | Valid | Ours & BeFunky [65]: p-Value | Valid
Portrait (Figure 10) | Q1 ↑ | 0.253 | invalid | 0.631 | invalid
Portrait (Figure 10) | Q2 ↑ | 0.691 | invalid | 0.073 | invalid
Portrait (Figure 10) | Q3 ↑ | 0.667 | invalid | 0.019 | valid
Landscape (Figure 11) | Q1 ↑ | 0.003 | valid | 0.058 | invalid
Landscape (Figure 11) | Q2 ↑ | 0.012 | valid | 0.079 | invalid
Landscape (Figure 11) | Q3 ↑ | 0.092 | invalid | 0.011 | valid
Other (Figure 12) | Q1 ↑ | 0.003 | valid | 0.495 | invalid
Other (Figure 12) | Q2 ↑ | 0.048 | valid | 0.638 | invalid
Other (Figure 12) | Q3 ↑ | 0.034 | valid | 0.049 | valid
Table 4. The results of the ablation study. ↑ denotes that higher values are better, and ↓ denotes that lower values are better. Bold figures denote the best results.
Metric | Noisy Sketch (Thresholding), Figure 13b | Ours (Image-Only), Figure 13c | Ours (VTF-Only), Figure 13d | Ours (Image + VTF), Figure 13e
PSNR ↑ | 36.7781 | 36.7299 | 36.7793 | 36.8004
FID ↓ | 197.6035 | 398.3203 | 91.6925 | 55.8984
CLIPscore ↑ | 0.9672 | 0.8791 | 0.9592 | 0.9732
LPIPS ↓ | 0.1796 | 0.3586 | 0.2063 | 0.0651
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


