Search Results (5)

Search Parameters:
Keywords = video stylization

20 pages, 37686 KiB  
Article
Multi-Source Training-Free Controllable Style Transfer via Diffusion Models
by Cuihong Yu, Cheng Han and Chao Zhang
Symmetry 2025, 17(2), 290; https://doi.org/10.3390/sym17020290 - 13 Feb 2025
Cited by 1 | Viewed by 2190
Abstract
Diffusion models, as representative models in the field of artificial intelligence, have made significant progress in text-to-image synthesis. However, style transfer with diffusion models typically requires a large amount of text to describe the semantic content or specific painting attributes, and the style and layout of the semantic content in synthesized images are frequently uncertain. To accomplish high-quality fixed-content style transfer, this paper adopts text-free guidance and proposes a multi-source, training-free, and controllable style transfer method that uses a single image or video as the content input and one or more style images as the style guidance. Specifically, the proposed method first fuses the inversion noise of the content image with that of one or more style images to form the initial noise of the stylized-image sampling process. It then extracts the self-attention query, key, and value vectors from the DDIM inversion of the content and style images and injects them into the stylized-image sampling process to improve the color, texture, and semantics of the stylized images. By setting the method's hyperparameters, style transfer with either a symmetric style proportion or an asymmetric style distribution can be achieved. Compared with state-of-the-art baselines, the proposed method demonstrates high fidelity and excellent stylization performance and can be applied to numerous image and video style transfer tasks.
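As a rough illustration of the two operations this abstract describes, the sketch below (plain PyTorch; the function names and the simple linear weighting are assumptions, not the authors' implementation) blends the DDIM-inversion noise of a content image with that of several style images, and performs a style-injected self-attention step in which queries come from the content branch and keys/values from a style branch.

```python
import torch

def fuse_inversion_noise(content_noise, style_noises, style_weights):
    """Blend the DDIM-inversion noise of the content image with that of one or
    more style images to form the initial noise for stylized sampling.
    content_noise: (1, C, H, W); style_noises: list of (1, C, H, W) tensors;
    style_weights: one weight per style image, the remainder goes to the content."""
    weights = torch.tensor(style_weights, dtype=content_noise.dtype)
    fused = (1.0 - weights.sum()) * content_noise
    for w, z in zip(weights, style_noises):
        fused = fused + w * z
    return fused

def style_injected_attention(q_content, k_style, v_style):
    """Self-attention with injected style features: queries from the content
    inversion, keys/values from a style inversion (shapes (B, heads, N, d))."""
    scale = q_content.shape[-1] ** -0.5
    attn = torch.softmax((q_content @ k_style.transpose(-2, -1)) * scale, dim=-1)
    return attn @ v_style
```

Under this reading, symmetric versus asymmetric style mixing would correspond to choosing equal or unequal entries in style_weights.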

27 pages, 52132 KiB  
Article
Temporally Coherent Video Cartoonization for Animation Scenery Generation
by Gustavo Rayo and Ruben Tous
Electronics 2024, 13(17), 3462; https://doi.org/10.3390/electronics13173462 - 31 Aug 2024
Viewed by 3350
Abstract
The automatic transformation of short background videos from real scenarios into other forms with a visually pleasing style, like those used in cartoons, has applications in various domains. These include animated films, video games, advertisements, and many other areas that involve visual content creation. A method or tool that can perform this task would inspire, facilitate, and streamline the work of artists and people who produce this type of content. This work proposes a method that integrates multiple components to translate short background videos into other forms that contain a particular style. We apply a fine-tuned latent diffusion model with an image-to-image setting, conditioned with the image edges (computed with holistically nested edge detection) and CLIP-generated prompts to translate the keyframes from a source video, ensuring content preservation. To maintain temporal coherence, the keyframes are translated into grids and the style is interpolated with an example-based style propagation algorithm. We quantitatively assess the content preservation and temporal coherence using CLIP-based metrics over a new dataset of 20 videos translated into three distinct styles.
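The grid step the abstract mentions could look roughly like the sketch below (NumPy; a minimal interpretation, not the authors' code): keyframes are tiled into a single image so the image-to-image model stylizes them jointly, which tends to keep their appearance consistent, then split back apart before style propagation.

```python
import numpy as np

def pack_keyframes(keyframes, rows, cols):
    """Tile keyframes into one grid image for joint stylization.
    keyframes: list of rows*cols arrays of shape (H, W, 3), all the same size."""
    row_images = [np.concatenate(keyframes[r * cols:(r + 1) * cols], axis=1)
                  for r in range(rows)]
    return np.concatenate(row_images, axis=0)

def unpack_grid(grid, rows, cols):
    """Split the stylized grid back into individual keyframes."""
    h, w = grid.shape[0] // rows, grid.shape[1] // cols
    return [grid[r * h:(r + 1) * h, c * w:(c + 1) * w]
            for r in range(rows) for c in range(cols)]
```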

27 pages, 28358 KiB  
Article
Fast Coherent Video Style Transfer via Flow Errors Reduction
by Li Wang, Xiaosong Yang and Jianjun Zhang
Appl. Sci. 2024, 14(6), 2630; https://doi.org/10.3390/app14062630 - 21 Mar 2024
Viewed by 2230
Abstract
For video style transfer, naively applying still-image techniques to process a video frame by frame independently often causes flickering artefacts. Some works adopt optical flow in the design of a temporal constraint loss to secure temporal consistency. However, these works still suffer from incoherence (including ghosting artefacts) where large motions or occlusions occur, as optical flow fails to detect object boundaries accurately. To address this problem, we propose a novel framework consisting of two stages: (1) creating new initialization images with the proposed mask techniques, which significantly reduce the flow errors; and (2) processing these initialized images iteratively with the proposed losses to obtain artefact-free stylized videos, which also raises the speed of gradient-based optimization methods from over 3 min per frame to less than 2 s per frame. Specifically, we propose a multi-scale mask fusion scheme to reduce untraceable flow errors and an incremental mask to reduce ghosting artefacts. In addition, a multi-frame mask fusion scheme is designed to reduce traceable flow errors. Among the proposed losses, the Sharpness Losses address potential image blurriness over long-range frames, and the Coherent Losses enforce temporal consistency at both the multi-frame RGB level and the feature level. Overall, our approach produces stable video stylization outputs even in large-motion or occlusion scenarios. The experiments demonstrate that the proposed method outperforms state-of-the-art video style transfer methods qualitatively and quantitatively on the MPI Sintel dataset.
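A minimal sketch of the kind of flow-guided initialization the first stage suggests, assuming a backward optical flow in pixel units with channels ordered (dx, dy) and a precomputed traceability mask; the helper names are invented and the paper's mask-fusion schemes themselves are not reproduced here.

```python
import torch
import torch.nn.functional as F

def warp_with_flow(prev_stylized, flow):
    """Backward-warp the previous stylized frame (1, 3, H, W) using an optical
    flow field (1, 2, H, W) given in pixels, channels assumed to be (dx, dy)."""
    _, _, h, w = prev_stylized.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack((xs, ys), dim=-1).float() + flow[0].permute(1, 2, 0)
    grid[..., 0] = 2.0 * grid[..., 0] / (w - 1) - 1.0   # normalise x to [-1, 1]
    grid[..., 1] = 2.0 * grid[..., 1] / (h - 1) - 1.0   # normalise y to [-1, 1]
    return F.grid_sample(prev_stylized, grid.unsqueeze(0), align_corners=True)

def init_frame(per_frame_stylized, prev_stylized, flow, traceable_mask):
    """Initialization image: keep warped pixels where the flow is traceable and
    fall back to an independently stylized frame elsewhere (mask in {0, 1})."""
    warped = warp_with_flow(prev_stylized, flow)
    return traceable_mask * warped + (1.0 - traceable_mask) * per_frame_stylized
```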

13 pages, 16987 KiB  
Article
Depth-Aware Neural Style Transfer for Videos
by Eleftherios Ioannou and Steve Maddock
Computers 2023, 12(4), 69; https://doi.org/10.3390/computers12040069 - 27 Mar 2023
Cited by 3 | Viewed by 3639
Abstract
Temporal consistency and content preservation are the prominent challenges in artistic video style transfer. To address these challenges, we present a technique that utilizes depth data and we demonstrate this on real-world videos from the web, as well as on a standard video dataset of three-dimensional computer-generated content. Our algorithm employs an image-transformation network combined with a depth encoder network for stylizing video sequences. For improved global structure preservation and temporal stability, the depth encoder network encodes ground-truth depth information which is fused into the stylization network. To further enforce temporal coherence, we employ ConvLSTM layers in the encoder, and a loss function based on calculated depth information for the output frames is also used. We show that our approach is capable of producing stylized videos with improved temporal consistency compared to state-of-the-art methods whilst also successfully transferring the artistic style of a target painting.
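The sketch below guesses at the two depth-related ingredients the abstract names: fusing depth-encoder features into the stylization network, and a loss computed from depth estimated on the output frames. The block design, the 1x1-convolution fusion, and the L1 depth loss are assumptions, not the authors' architecture.

```python
import torch
import torch.nn.functional as F

class DepthFusedBlock(torch.nn.Module):
    """Toy fusion step: concatenate depth-encoder features with the stylization
    network's features and mix them with a 1x1 convolution."""
    def __init__(self, rgb_channels, depth_channels):
        super().__init__()
        self.mix = torch.nn.Conv2d(rgb_channels + depth_channels, rgb_channels, 1)

    def forward(self, rgb_feat, depth_feat):
        return self.mix(torch.cat([rgb_feat, depth_feat], dim=1))

def depth_consistency_loss(stylized_frames, target_depths, depth_net):
    """Depth-based loss on the output frames: depth estimated from the stylized
    frames (T, 3, H, W) should match the ground-truth depth of the inputs
    (T, 1, H, W), encouraging preservation of the global scene structure."""
    return F.l1_loss(depth_net(stylized_frames), target_depths)
```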
(This article belongs to the Special Issue Selected Papers from Computer Graphics & Visual Computing (CGVC 2022))

14 pages, 5150 KiB  
Article
Facial Feature Model for a Portrait Video Stylization
by Dongxue Liang, Kyoungju Park and Przemyslaw Krompiec
Symmetry 2018, 10(10), 442; https://doi.org/10.3390/sym10100442 - 28 Sep 2018
Cited by 5 | Viewed by 2910
Abstract
With the advent of deep learning methods, portrait video stylization has become more popular. In this paper, we present a robust method for automatically stylizing portrait videos that contain small human faces. By extending Mask R-CNN (Mask Regions with Convolutional Neural Network features) with a CNN branch that detects the contour landmarks of the face, we divided the input frame into three regions: the region of facial features, the region of the inner face surrounded by 36 face contour landmarks, and the region of the outer face. Besides keeping the facial features region as it is, we used two different stroke models to render the other two regions. During the non-photorealistic rendering (NPR) of the animation video, we combined the deformable strokes and optical flow estimation between adjacent frames to follow the underlying motion coherently. The experimental results demonstrated that our method could not only effectively preserve the small and distinct facial features, but also follow the underlying motion coherently.
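A rough sketch (OpenCV/NumPy; the function name and mask conventions are invented) of how the three regions described here might be derived from the 36 contour landmarks and a facial-feature mask, with the features kept as-is and the inner and outer face handed to two different stroke renderers.

```python
import numpy as np
import cv2

def split_face_regions(frame_shape, contour_landmarks, feature_mask):
    """Split a frame into the three regions the method describes.
    contour_landmarks: (36, 2) array of (x, y) points around the face;
    feature_mask: (H, W) binary mask of the facial features, kept unstylized."""
    h, w = frame_shape[:2]
    inner = np.zeros((h, w), dtype=np.uint8)
    cv2.fillPoly(inner, [contour_landmarks.astype(np.int32)], 1)
    features = feature_mask.astype(np.uint8) & inner   # kept as-is
    inner_face = inner & (1 - features)                # rendered with stroke model A
    outer_face = 1 - inner                             # rendered with stroke model B
    return features, inner_face, outer_face
```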