Article

LocalGaussStyle: A Method for Localized Style Transfer on 3D Gaussian Splatting

Department of Electronic Convergence Engineering, Kwangwoon University, Seoul 01897, Republic of Korea
*
Author to whom correspondence should be addressed.
Electronics 2026, 15(5), 1018; https://doi.org/10.3390/electronics15051018
Submission received: 30 January 2026 / Revised: 21 February 2026 / Accepted: 27 February 2026 / Published: 28 February 2026

Abstract

Recent advances in 3D generative AI, encompassing both generation and editing technologies, have been increasingly investigated to enable immersive applications. To enrich visual aesthetics, 3D stylization techniques transfer artistic effects from reference style images to 3D scenes. However, existing 3D stylization techniques primarily focus on global style transfer, which can result in unwanted modifications to background regions and a lack of localized control. To address these limitations, we propose LocalGaussStyle, a novel approach for localized style transfer on scenes represented by 3D Gaussian splatting. The proposed pipeline consists of two phases: object localization and localized stylization. First, 2D instance segmentation masks are projected into the 3D scene to precisely localize target objects. Next, a boundary-aware optimization is designed to perform style transfer while mitigating style leakage caused by the spatial overlap of Gaussians. In addition, geometry-decoupled adaptive densification (GDAD) is employed to enhance the geometric resolution of Gaussians within the target object, thereby improving representation capacity. LocalGaussStyle facilitates high-fidelity style transfer that preserves the geometry and appearance of non-target regions. Extensive experiments on various scenes and reference style images demonstrate the effectiveness and efficiency of the proposed method in terms of style fidelity and background preservation.

1. Introduction

Visual art stylization has been used to provide users with more visually immersive experiences in content such as digital art, virtual reality (VR), and augmented reality (AR) [1]. 3D scene stylization is the process of editing the visual appearance of a 3D scene by transferring the style of a 2D reference image [2]. This process is a core technology for visual art stylization. Conventional methods have relied on explicit representations, such as meshes [3,4], voxels [5,6], and point clouds [7,8] for 3D scene stylization. However, these methods can cause visual artifacts and distortions in complex scenes due to their inherent structural constraints and discretization [9].
Implicit neural representations, such as neural radiance fields (NeRFs), have recently attracted significant attention. NeRF-based methods represent geometry and appearance as a continuous implicit function using a multilayer perceptron (MLP) [10]. These methods synthesize 3D scenes through volumetric rendering by sampling multiple points along each ray. Although NeRF-based methods have enabled higher-fidelity rendering than mesh- or point cloud-based methods, their practical applicability has been limited by computationally expensive optimization and slow rendering speeds [11].
3D Gaussian splatting (3DGS) represents 3D scenes as a set of anisotropic 3D Gaussian primitives. 3DGS employs tile-based rasterization for real-time rendering and enables direct editing of scene elements through its explicit representation [12]. Compared to implicit NeRF-based methods, 3DGS-based methods have achieved substantial improvements in rendering speed and editability. Owing to these properties, 3DGS has emerged as a promising solution for practical applications.
Most previous studies related to 3D scene stylization have focused on global scene stylization. However, localized stylization has received comparatively less attention and remains a challenging task. Applying style transfer to localized regions often leads to unintended style leakage into neighboring regions due to the spatial overlap of Gaussians during alpha-blending [13]. Moreover, the discrete representation of Gaussians can induce appearance-geometry entanglement, which can result in insufficient representation capacity for the reference style when rendering complex patterns over fine-grained structures [14]. StyleSplat [13] has been proposed for localized 3D stylization; it employs a segmentation model to mask the target object and transfers the reference style only to the masked object. However, mask-based localized stylization can lead to style leakage near object boundaries due to the overlap and blending of semi-transparent Gaussians during rasterization. Furthermore, existing densification strategies are mainly designed for geometric reconstruction fidelity rather than for enhancing style fidelity.
In this paper, a novel localized 3D stylization method called LocalGaussStyle is proposed to address the limitations of localized stylization and the lack of representation capacity. LocalGaussStyle employs an off-the-shelf 2D segmentation model to extract an instance mask of the target object. The extracted 2D mask is projected into the 3D scene to provide spatial guidance for target object localization. Style transfer is applied only within the target object identified by the mask. Whereas previous approaches depend solely on segmentation masks to confine stylization, we introduce a boundary-aware optimization mechanism that explicitly models boundary preservation. This mechanism can suppress unintended style leakage caused by Gaussian overlap and alpha-blending through a boundary control (BC) loss function that penalizes style leakage outside the mask boundary. Furthermore, geometry-decoupled adaptive densification (GDAD) is employed to enhance the geometric resolution of Gaussians in the target object. In contrast to conventional densification methods designed for geometric fidelity, GDAD is driven by style loss gradients and selectively increases geometric resolution in regions exhibiting style underfitting. The initial Gaussians are often sparse, which makes it difficult to capture fine artistic details; densifying them significantly enhances the representation capacity for style fidelity compared to the original coarse distribution. The main contributions are summarized as follows:
  • LocalGaussStyle is proposed for localized 3D stylization based on 3DGS. LocalGaussStyle consists of the following pipeline: (i) 2D instance masks of the target object are extracted using an off-the-shelf 2D instance segmentation model and projected onto the 3D scene to precisely identify the region for style transfer, (ii) boundary-aware optimization is applied to the localized region, and (iii) GDAD enhances its representation capacity. Through this process, LocalGaussStyle can achieve high style fidelity for the target object while preserving the geometry and appearance of non-target regions.
  • A boundary-aware loss function is designed to mitigate style leakage caused by the spatial overlap of Gaussians. The proposed loss function combines a masked nearest neighbor feature matching (M-NNFM) loss with a boundary control (BC) loss to preserve appearance consistency in the non-target regions. By employing a regularization strategy to balance the two losses, localized style transfer with stable boundaries can be achieved.
  • A GDAD strategy is employed to decouple the region containing the target object from the non-target regions and enhance its representation capacity through densification. This strategy can increase the geometric resolution of the target region based on view-invariant gradients derived from M-NNFM loss. This method can effectively capture intricate high-frequency style patterns and preserve the structural integrity of the non-target regions.
The remainder of this paper is organized as follows. Section 2 reviews related works on 3DGS. Section 3 describes the LocalGaussStyle in detail. The experimental results of the LocalGaussStyle are presented in Section 4. Finally, we conclude this paper in Section 5.

2. Related Works

2.1. 3D Scene Representation

3D scene representation models the geometry and appearance of a 3D environment. Conventional methods, such as polygon meshes, voxels, and point clouds, provide direct and intuitive manipulation of individual elements. However, these explicit representations are often constrained by their fixed resolution or grid-based nature. Furthermore, significant memory overhead can be incurred when capturing complex geometries and view-dependent appearance.
NeRFs implicitly represent a scene via MLPs that predict volume density and view-dependent color for each point in 3D space [10]. Compared to conventional explicit representations, a NeRF enables higher-fidelity rendering and accurately captures intricate geometric details and view-dependent effects. However, because a NeRF relies on ray marching with dense sampling along camera rays, it requires hundreds of samples per pixel, leading to long training times and slow rendering speeds.
Several studies have addressed the computational inefficiency and slow rendering speeds of NeRFs. Instant-NGP [15] proposed multiresolution hash encoding to enhance training and rendering speed, using a tiny MLP that indexes features from a hash grid to reconstruct high-resolution scenes. TensoRF [16] treated a 3D scene as a single 4D tensor and introduced a tensor decomposition technique that factorizes it into low-rank components, reducing memory usage while maintaining rendering quality. However, NeRF-based methods still have limited practical applicability due to high computational overhead and expensive per-scene optimization.
3DGS has recently emerged as a technique that represents a scene as a set of anisotropic 3D Gaussians parameterized by learnable attributes such as position, covariance, opacity, and spherical harmonics [12]. 3DGS can avoid the computationally intensive ray-marching process of NeRFs by projecting anisotropic 3D Gaussians directly onto the 2D image plane. The projected Gaussians are depth-sorted within each tile using a tile-based rasterization method. Final pixel colors are determined through alpha-blending of the sorted Gaussians. This tile-based approach can be optimized for GPU parallel processing, enabling high rendering efficiency and real-time rendering.

2.2. 3D Scene Stylization

3D scene stylization aims to transfer the visual style of a 2D reference image to a 3D scene while preserving the original scene geometry. In artistic radiance fields (ARFs) [17], a nearest neighbor feature matching (NNFM) loss is proposed to transfer detailed style patterns into an implicit field. StyleGaussian [18] presents a 3DGS-based style transfer that enables the transfer of arbitrary image styles to 3D scenes. It incorporates a K-nearest neighbor-based 3D convolutional neural network (CNN) decoder to ensure view consistency. StylizedGS [19] optimizes geometric and appearance parameters to effectively capture high-frequency details from the reference style image. These methods are designed for global scene stylization, in which the style is transferred to the entire scene. As a result, they do not support localized control that restricts style application to specific user-selected objects while preserving the original appearance of the background and other objects.
StyleSplat [13] addresses these limitations by introducing an image segmentation model for localized 3D stylization. Specifically, it uses an off-the-shelf 2D segmentation model to generate a 2D mask that identifies the boundaries of the target object. However, style leakage can occur when the style at the boundary of the target object unintentionally spills into the surrounding area.

3. Proposed Method for Localized Stylization

The objective of LocalGaussStyle is to transfer the artistic style of a 2D reference image to target regions in a 3DGS scene. The overall process consists of two main phases: (i) segmentation model-based object localization and (ii) localized stylization, which integrates boundary-aware optimization and GDAD. The overall architecture of the proposed LocalGaussStyle is illustrated in Figure 1.
First, a segmentation model is employed to obtain spatial guidance of the target region. Specifically, 2D segmentation masks of the target object are extracted for each multi-view image using off-the-shelf 2D segmentation models. To lift the 2D segmentation mask into the 3D scene, each Gaussian is assigned a mask probability by projecting it onto the multi-view 2D masks. By aggregating these view-dependent mask statistics, a consistent 3D mask representation is established, allowing for the precise identification of Gaussians that comprise the target object.
Second, the appearance parameters of Gaussians are optimized strictly within the target region. Localized style transfer is conducted by optimizing the spherical harmonic (SH) function coefficients guided by the M-NNFM loss. The M-NNFM aligns feature representations extracted from a pre-trained VGG-16 network to minimize the distance between the rendered target region and the reference style image. However, spatial overlap of semi-transparent Gaussians can cause color bleeding into adjacent regions, leading to unintended style leakage. To mitigate style leakage, a BC loss is introduced to strictly confine the stylization effects to the target mask region. The BC loss penalizes appearance deviations in the background, preventing unintended style leakage. The proposed loss function integrates the M-NNFM and BC losses to ensure high-fidelity style transfer for the target object while maintaining background consistency.
Additionally, GDAD is introduced to adaptively improve the Gaussian resolution within the target region, enhancing the representation capacity of the initially learned Gaussians. The core function of GDAD is to guide the densification process based on the gradient of the M-NNFM loss. During style transfer optimization, view-space gradient statistics derived from the M-NNFM loss are accumulated, and adaptive densification is triggered when the average gradient within the masked target region exceeds a threshold. In this process, each Gaussian targeted for densification is split into multiple child Gaussians.

3.1. 3D Gaussian Splatting

3DGS is adopted as the 3D scene representation because it enables explicit manipulation of appearance parameters for localized style transfer. 3DGS explicitly represents a scene as a set of anisotropic 3D Gaussian primitives $\mathcal{G} = \{ g_i \}_{i=1}^{|\mathcal{G}|}$. Each Gaussian is defined as
$$ g_i(\mathbf{x} \mid \boldsymbol{\mu}_i, \Sigma_i) = \exp\left( -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^{\top} \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right), $$
where $\boldsymbol{\mu}_i \in \mathbb{R}^3$ and $\Sigma_i \in \mathbb{R}^{3 \times 3}$ denote the centroid and the covariance matrix, respectively. The covariance matrix $\Sigma_i$ can be decomposed into a scaling matrix $S_i \in \mathbb{R}^{3 \times 3}$ and a rotation matrix $R_i \in \mathbb{R}^{3 \times 3}$, so that $\Sigma_i = R_i S_i S_i^{\top} R_i^{\top}$.
In addition, each Gaussian is characterized by an opacity $o_i \in [0, 1]$, and a view-dependent color $c_i$ is represented via spherical harmonics (SH) coefficients. To render the scene efficiently, 3D Gaussians are projected onto the 2D image plane by a differentiable tile-based rasterizer. Specifically, 3D Gaussians are (i) assigned to local image tiles, (ii) sorted by depth within each tile, and (iii) used to compute a pixel color by accumulation using alpha-blending. The pixel color can be computed as
$$ C_p = \sum_{i \in N} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j), $$
where $N$ denotes the set of Gaussians that overlap the pixel $p$, and $\alpha_i$ is obtained by multiplying the opacity $o_i$ with the density of the projected 2D Gaussian distribution at $p$.
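As an illustration of this compositing rule, the following minimal sketch accumulates the pixel color for a single pixel, assuming the overlapping Gaussians have already been projected, depth-sorted front to back, and reduced to per-pixel colors and alphas; the tensor layout is an assumption for illustration, not the released rasterizer.

```python
import torch

def composite_pixel(colors: torch.Tensor, alphas: torch.Tensor) -> torch.Tensor:
    """Alpha-blend N depth-sorted Gaussians overlapping one pixel.

    colors: (N, 3) view-dependent colors c_i, ordered front-to-back
    alphas: (N,)   per-Gaussian alpha_i = opacity * 2D Gaussian density at the pixel
    """
    # Transmittance reaching each Gaussian: prod_{j < i} (1 - alpha_j)
    transmittance = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alphas[:-1]]), dim=0)
    weights = alphas * transmittance                     # contribution of each Gaussian
    return (weights.unsqueeze(-1) * colors).sum(dim=0)   # (3,) final pixel color
```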

3.2. Segmentation Model-Based Object Localization

To perform style transfer on specific objects, instance masks for the target objects are required for each input view. In the LocalGaussStyle, the segment anything model (SAM) [20] is adopted to obtain these masks, as it provides robust and consistent instance segmentation without requiring scene-specific fine-tuning. SAM is an off-the-shelf segmentation model trained on large-scale image datasets and is capable of zero-shot segmentation across diverse object categories. The extracted instance masks serve as spatial guidance to precisely localize the target object within the scene. Examples of extracted instance mask images using SAM are shown in Figure 2.
Given an image sequence with $T$ frames, each input image $I_t \in \mathbb{R}^{H \times W \times 3}$ is processed using SAM to generate a set of instance masks. The set of instance masks at frame $t$ is defined as
$$ \mathcal{M}_t = \{ m_t^1, m_t^2, \ldots, m_t^K \}, $$
where $K$ is the number of instances detected in frame $t$, and $m_t^k \in \{0, 1\}^{H \times W}$ is a binary mask representing the $k$-th instance.
For each produced mask, SAM provides a predicted intersection over union (IoU) score that estimates the quality and reliability of the segmentation. A higher IoU score indicates more reliable segmentation with superior boundary fidelity. Based on these scores, masks exceeding a predefined threshold are retained, while those with low confidence are filtered out.
Following confidence-based filtering, candidate masks may still include spatial overlaps due to instance ambiguity. To resolve these ambiguities, a prioritization scheme is implemented by ranking masks according to their spatial extent. Specifically, pixels associated with multiple masks are assigned to the mask with the largest area. This process ensures mutual exclusivity, where each pixel is uniquely assigned to a single instance.
Each filtered mask is assigned to a unique instance index. These indices are aggregated to construct an instance segmentation map, where each pixel is represented by an integer value between 0 and 255. The generated instance segmentation map is used as input to a probabilistic projection scheme, in which each 2D instance mask is projected into 3D space using camera intrinsics and extrinsics. This projection enables the integration of view-dependent 2D masks into a unified 3D mask representation of the target object within the Gaussian scene.
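A minimal sketch of this mask post-processing is given below, assuming SAM has already returned binary masks and predicted-IoU scores for one frame; the function name and the area-ordered painting used to resolve overlaps are illustrative choices rather than the authors' exact implementation.

```python
import numpy as np

def build_instance_map(masks, iou_scores, iou_thresh=0.95):
    """Filter SAM masks by predicted IoU and resolve overlaps by mask area.

    masks:      list of (H, W) boolean arrays predicted by SAM for one frame
    iou_scores: predicted-IoU confidence per mask
    Returns an (H, W) uint8 map (0 = background, k = index of the k-th kept mask).
    """
    instance_map = np.zeros(masks[0].shape, dtype=np.uint8)
    # Keep only confidently segmented instances.
    kept = [m for m, s in zip(masks, iou_scores) if s >= iou_thresh]
    # Paint smaller masks first so that, for overlapping pixels, the
    # largest-area mask (painted last) wins the assignment.
    kept.sort(key=lambda m: int(m.sum()))
    for idx, m in enumerate(kept, start=1):  # at most 255 instances assumed
        instance_map[m] = idx
    return instance_map
```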

3.3. Loss Function for Localized Style Transfer

The proposed loss function consists of the M-NNFM loss and the BC loss. The M-NNFM loss is employed to achieve localized stylization of the target object by operating on pixels within the masked target region. Specifically, intermediate feature maps are extracted from the rendered image and the reference style image using a pre-trained VGG-16 network. By minimizing the cosine distance between each feature vector in the masked target region and its nearest neighbor in the reference style image, style patterns are effectively transferred onto the target object. The M-NNFM loss is defined as
$$ \mathcal{L}_{\text{style}} = \sum_{l} w_l \frac{1}{|M_{\text{render}}|} \sum_{p \in M_{\text{render}}} \left\| \phi_l(I_{\text{render}})_p - \phi_l(I_{\text{style}})_{q^*} \right\|_2^2, $$
where $M_{\text{render}}$ denotes the binary mask indicating pixels within the target object, $I_{\text{render}}$ and $I_{\text{style}}$ denote the rendered and style reference images, respectively, and $\phi_l(\cdot)$ and $w_l$ denote the feature map extracted from the $l$-th layer of the pre-trained VGG-16 network and the corresponding weight, respectively. For each feature vector at pixel $p$ in the rendered target object, the nearest neighbor $q^*$ is identified from the style reference image by minimizing the cosine distance in feature space. The nearest neighbor is calculated as
$$ q^* = \arg\min_{q} \; d_{\cos}\!\left( \phi_l(I_{\text{render}})_p, \; \phi_l(I_{\text{style}})_q \right). $$
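A minimal sketch of the masked NNFM loss for a single VGG-16 layer is shown below, assuming the feature maps and a mask already downsampled to the feature resolution are available; variable names and the matrix-based nearest-neighbor search are illustrative assumptions, and the per-layer weighting and summation over layers are omitted.

```python
import torch
import torch.nn.functional as F

def m_nnfm_loss(feat_render: torch.Tensor,
                feat_style: torch.Tensor,
                mask: torch.Tensor) -> torch.Tensor:
    """Masked nearest-neighbor feature matching loss for one VGG-16 layer.

    feat_render: (C, H, W) features of the rendered image
    feat_style:  (C, Hs, Ws) features of the reference style image
    mask:        (H, W) boolean mask of the target object at feature resolution
    """
    fr = feat_render.flatten(1)[:, mask.flatten()]   # (C, P) masked pixels only
    fs = feat_style.flatten(1)                       # (C, Q) all style features
    # Cosine distance = 1 - cosine similarity; pick the nearest style feature q*.
    cos_sim = F.normalize(fr, dim=0).t() @ F.normalize(fs, dim=0)   # (P, Q)
    nn_idx = cos_sim.argmax(dim=1)                                  # q* per masked pixel
    # Squared L2 distance to the matched style feature, averaged over the mask.
    return ((fr - fs[:, nn_idx]) ** 2).sum(dim=0).mean()
```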
The BC loss is proposed to preserve boundary integrity during localized stylization and prevent style leakage. This loss constrains pixels in the background region to remain consistent with their original appearance. The BC loss is defined as
$$ \mathcal{L}_{\text{BC}} = \frac{1}{HW} \sum_{p} \left\| I_{\text{render}}(p) - I_{\text{origin}}(p) \right\|_2^2 \cdot \left( 1 - M_{\text{render}}(p) \right), $$
where $H$ and $W$ denote the image height and width, respectively, and $I_{\text{origin}}(p)$ denotes the original rendered image before style transfer.
Since $\mathcal{L}_{\text{BC}}$ and $\mathcal{L}_{\text{style}}$ represent competing optimization objectives, their relative magnitudes must be carefully balanced. To prevent loss imbalance and ensure stable optimization, the normalized BC loss is utilized as follows:
$$ \mathcal{L}_{\text{BC}}^{\text{norm}} = \frac{\mathcal{L}_{\text{BC}}}{\bar{\mathcal{L}}_{\text{style}} + \epsilon}, $$
where $\bar{\mathcal{L}}_{\text{style}}$ denotes the mean M-NNFM loss value during the training iteration, and $\epsilon$ is a small numerical stability term. The normalization prevents the boundary loss from becoming negligible when the M-NNFM loss is large, or from over-constraining the optimization when the M-NNFM loss is small.
The total loss function is formulated as
$$ \mathcal{L}_{\text{total}} = \lambda_{\text{style}} \mathcal{L}_{\text{style}} + \lambda_{\text{BC}} \mathcal{L}_{\text{BC}}^{\text{norm}}, $$
where $\lambda_{\text{style}}$ and $\lambda_{\text{BC}}$ denote the balancing hyperparameters for the M-NNFM loss and the normalized BC loss, respectively.
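The sketch below combines the two terms with the normalized BC loss described above, assuming the mean style loss is tracked as a running average over iterations (the exact bookkeeping is not specified in the paper and is an assumption); the hyperparameter defaults follow Table 1.

```python
import torch

class LocalizedStyleLoss:
    """Combine the M-NNFM style loss with a BC loss normalized by the running
    mean style loss. The running-mean bookkeeping is an illustrative assumption."""

    def __init__(self, lambda_style=1.0, lambda_bc=10.0, eps=1e-8):
        self.lambda_style, self.lambda_bc, self.eps = lambda_style, lambda_bc, eps
        self.style_sum, self.steps = 0.0, 0

    def __call__(self, l_style, render, origin, mask):
        # Boundary-control loss: penalize appearance changes outside the target mask.
        bg = (1.0 - mask.float()).unsqueeze(0)                    # (1, H, W)
        l_bc = ((render - origin) ** 2 * bg).sum(dim=0).mean()    # mean over H*W pixels
        # Normalize the BC loss by the mean style loss observed so far.
        self.style_sum += float(l_style.detach())
        self.steps += 1
        l_bc_norm = l_bc / (self.style_sum / self.steps + self.eps)
        return self.lambda_style * l_style + self.lambda_bc * l_bc_norm
```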
The proposed loss function is formulated by combining an M-NNFM loss with a normalized BC loss to enable localized stylization while suppressing style leakage. The normalization strategy is employed to balance competing objectives, resulting in stable and robust optimization throughout training.

3.4. Geometry-Decoupled Adaptive Densification

With geometric attributes held fixed, capturing high-frequency details from the reference image during 3D scene stylization can be challenging due to insufficient density in the target region. To address this limitation, we propose a GDAD strategy that enhances local geometric granularity within masked Gaussians.
During style transfer optimization, 2D view-space gradients are accumulated for Gaussians within the target object region. These gradients quantify the contribution of each Gaussian's projected position to $\mathcal{L}_{\text{style}}$. For each Gaussian $g_i$, the magnitude of its view-space gradient with respect to $\mathcal{L}_{\text{style}}$ is calculated as
$$ \gamma_i = \left\| \frac{\partial \mathcal{L}_{\text{style}}}{\partial \mathbf{u}_i} \right\|_2, $$
where $\mathbf{u}_i \in \mathbb{R}^2$ denotes the 2D view-space coordinates of the projected Gaussian (i.e., the center of the Gaussian in the rendered image after projection). To guide GDAD, $\gamma_i$ is accumulated solely for Gaussian primitives within the masked region across all visible views. The accumulated gradient is computed as
$$ A_{\text{style}}(i) = \sum_{t \in \text{visible}} \gamma_i^{(t)}, $$
where the summation runs over the views in which $g_i$ is visible and lies within the target object mask.
To compute a view-invariant gradient magnitude, the 2D view-space gradients are accumulated over views and frames and then normalized by the number of times each Gaussian was visible within the masked region. View-invariant gradient magnitude is defined as
$$ \bar{\gamma}_i = \frac{A_{\text{style}}(i)}{C_i + \epsilon}, $$
where $C_i$ is the number of views in which $g_i$ is both visible and within the target mask, and $\epsilon$ is a small constant to avoid division by zero.
A large $\bar{\gamma}_i$ with respect to $\mathcal{L}_{\text{style}}$ indicates a region suffering from texture underfitting, where the existing Gaussians lack sufficient representation capacity to capture high-frequency style details. The GDAD process is triggered when the average gradient magnitude exceeds a predefined threshold $\tau_{\text{style}}$.
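A minimal sketch of this gradient bookkeeping is given below, assuming per-Gaussian view-space gradients, visibility flags, and mask membership are available from the rasterizer; the tensor names and update interface are assumptions, and the threshold default follows Table 1.

```python
import torch

def accumulate_style_gradients(grad_accum, visible_count,
                               viewspace_grad, in_mask, visible):
    """Accumulate the view-space gradient magnitude per Gaussian for one view.

    viewspace_grad: (G, 2) gradient of the style loss w.r.t. projected centers u_i
    in_mask:        (G,) bool, Gaussian belongs to the target object mask
    visible:        (G,) bool, Gaussian contributed to this rendered view
    """
    active = in_mask & visible
    grad_accum[active] += viewspace_grad[active].norm(dim=-1)  # gamma_i for this view
    visible_count[active] += 1
    return grad_accum, visible_count

def select_densification_candidates(grad_accum, visible_count,
                                    tau_style=5e-5, eps=1e-8):
    """View-invariant gradient magnitude thresholded by tau_style."""
    mean_grad = grad_accum / (visible_count + eps)
    return mean_grad > tau_style        # (G,) bool mask of Gaussians to split
```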
The candidate Gaussians can be densified into $N$ smaller Gaussians, consisting of one center and $N - 1$ neighbors. The densified Gaussians are illustrated in Figure 3 for $N = 9$. By populating the eight octants and the center, the original volumetric extent is densely covered by the newly generated Gaussians, thereby preventing structural deviation during the densification process [21]. The positions of the densified Gaussians are given by
$$ \tilde{\boldsymbol{\mu}}_{i,j} = \boldsymbol{\mu}_i + \frac{\mathbf{o}_j \odot \mathbf{s}_i}{4}, \quad j \in \{0, 1, \ldots, 8\}, $$
where $\boldsymbol{\mu}_i$, $\mathbf{s}_i$, and $\mathbf{o}_j$ denote the mean position of the original Gaussian, the scale vector, and the octant offset vector, respectively, and $\odot$ stands for element-wise multiplication. The set of offset vectors is defined as
$$ \mathcal{O} = \{ [0,0,0],\; [1,1,1],\; [1,1,-1],\; [1,-1,1],\; [1,-1,-1],\; [-1,1,1],\; [-1,1,-1],\; [-1,-1,1],\; [-1,-1,-1] \}. $$
The central offset [ 0 , 0 , 0 ] preserves a Gaussian at the original position. The eight non-zero offset vectors correspond to the vertices of a cube. The scale of each densified Gaussian is reduced to 1/8 of the original scale. This scaling ensures that the union of the nine densified Gaussians remains within a region comparable to the original while providing higher granularity for high-frequency representation. All intrinsic attributes of the original Gaussian are inherited by each of the densified Gaussians. The GDAD strategy is performed once at a predefined iteration during style transfer optimization.
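The following sketch illustrates this splitting rule for a single parent Gaussian, placing one child at the center and eight at the octant vertices and shrinking each child's scale to 1/8 of the parent's; the offset ordering and the returned attribute layout are illustrative assumptions.

```python
import torch

# Offset vectors: the center plus the eight cube vertices.
OCTANT_OFFSETS = torch.tensor(
    [[0, 0, 0],
     [ 1,  1,  1], [ 1,  1, -1], [ 1, -1,  1], [ 1, -1, -1],
     [-1,  1,  1], [-1,  1, -1], [-1, -1,  1], [-1, -1, -1]], dtype=torch.float32)

def densify_gaussian(mu: torch.Tensor, scale: torch.Tensor):
    """Split one Gaussian into nine children.

    mu:    (3,) mean position of the parent Gaussian
    scale: (3,) per-axis scale vector of the parent Gaussian
    Returns child positions (9, 3) and child scales (9, 3); opacity and SH
    coefficients would simply be copied from the parent.
    """
    child_mu = mu + OCTANT_OFFSETS * (scale / 4.0)   # mu_i + o_j * s_i / 4
    child_scale = (scale / 8.0).expand(9, 3)         # children are 1/8 the parent scale
    return child_mu, child_scale
```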

4. Simulation Results

4.1. Simulation Settings

Extensive experiments were conducted on multiple real-world scenes in the Tanks and Temples (T&T) dataset [22], the MipNeRF-360 dataset [23], and a custom dataset to evaluate the effectiveness of the LocalGaussStyle. The T&T and MipNeRF-360 datasets were selected because these benchmarks are widely adopted in 3D stylization. These datasets provide complex real-world scenes with diverse geometries and viewpoints, making them well-suited for evaluating localized style transfer. To evaluate the practical applicability of the proposed model, a custom dataset of nail art was constructed using data obtained from the AI Hub managed by the Korean government. The collected images were captured from a distance of approximately 30 cm using a 50 MP RGB camera with a resolution of 2124 × 2832, at various fields of view. Due to the sensitivity of biometric information in hand images, a rigorous de-identification procedure was applied. Specifically, fingerprint regions were obscured using tape and Gaussian blurring. Additionally, reference style images were drawn from the WikiArt dataset [24] to demonstrate the versatility of style transfer.
The LocalGaussStyle was implemented using PyTorch 2.1.1, and all experiments were conducted on an Intel Core i9-10940X CPU, 64 GB RAM, and three NVIDIA RTX 3090 GPUs. The comparison model was implemented using the official source code released by the authors of StyleSplat [13] on GitHub (https://github.com/bernard0047/style-splat, accessed on 29 January 2026). For each scene, the geometric structure and color of the 3D Gaussians and the Gaussian-wise classifier for object segmentation were trained for 30,000 iterations. Subsequently, the style transfer optimization was performed for 2000 iterations, during which the GDAD process was executed once at iteration 100. The key hyperparameters used in the experiments are shown in Table 1; their values were determined empirically.

4.2. Qualitative Evaluation for Localized Style Transfer

The qualitative comparisons for the T&T and MipNeRF-360 datasets are shown in Figure 4 and Figure 5. The results of StyleSplat confirm that the fine-grained details of the reference style image are not adequately captured. In contrast, LocalGaussStyle effectively reproduces not only the global color tones of the reference style image but also its high-frequency patterns.
Specifically, zoomed-in views of target regions are presented in Figure 6. They confirm that LocalGaussStyle captures high-frequency style patterns more effectively than StyleSplat. For instance, in the wooden table scene, StyleSplat mainly transfers the global color tone, which is predominantly red and green. In contrast, LocalGaussStyle faithfully reproduces not only the global color tone but also the repetitive floral patterns. Similarly, in the horse statue scene, LocalGaussStyle reproduces the fine-grained dotted stylistic patterns more effectively than StyleSplat. These results also confirm that the proposed model captures fine-grained style details while preserving the structural integrity of the non-target region.
The qualitative comparison results for the custom dataset are presented in Figure 7, and the zoomed-in views are presented in Figure 8. In the StyleSplat results, style leakage was observed near the nail boundary, and the style transfer tends to be limited to the global color tone of the reference style image. In contrast, LocalGaussStyle confines the style within the nail boundary, demonstrating high applicability to practical applications such as virtual nail art. In addition, style fidelity is improved through the more faithful reproduction of intricate texture patterns, such as swirling wind-like strokes.

4.3. Quantitative Evaluation for Localized Style Transfer

Quantitative comparison was conducted to evaluate the effectiveness of the LocalGaussStyle. The reference-based learned perceptual image patch similarity (Ref-LPIPS) [21] and contrastive language image pre-training (CLIP) score [25] were employed to assess style fidelity. Ref-LPIPS quantifies the fidelity of the stylized result in reproducing the perceptual features of the reference style image. Specifically, the perceptual distance between the stylized image and the reference style image is computed based on feature maps extracted from a pre-trained VGG network. The CLIP score measures the cosine similarity between feature embeddings extracted from the stylized image and the reference style image using the image encoder of a pre-trained CLIP model. This metric assesses the semantic consistency of transferring the abstract style and overall atmosphere. To evaluate the performance of localized stylization, Ref-LPIPS and CLIP scores are measured by comparing the stylized object region, obtained via a segmentation mask, with the reference style image.
The learned perceptual image patch similarity (LPIPS) [26] and structural similarity index measure (SSIM) [27] metrics were used to evaluate background preservation and content integrity. LPIPS measures the perceptual distance between two images based on a pre-trained VGG network, assessing the degree of visual change compared to the original scene. SSIM assesses the preservation of the structural integrity based on structural similarity. The simulations were conducted on ten scenes selected from the T&T, Mip-NeRF 360, and custom datasets. For quantitative evaluation, 30 viewpoint pairs per scene were selected, and 10 distinct reference style images were applied, resulting in a total of 300 test cases.
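For reference, a hedged sketch of how such metrics might be computed with off-the-shelf tools is shown below; it uses the lpips package for the perceptual distance and assumes a generic CLIP image encoder callable, so the masking step and the encoder handle are assumptions rather than the authors' evaluation code.

```python
import torch
import lpips  # pip install lpips

lpips_vgg = lpips.LPIPS(net='vgg')  # VGG-backbone perceptual metric

def ref_lpips(stylized_obj: torch.Tensor, style_ref: torch.Tensor) -> float:
    """Perceptual distance between the masked stylized object and the style image.
    Inputs are (1, 3, H, W) tensors in [-1, 1]; lower means higher style fidelity."""
    with torch.no_grad():
        return lpips_vgg(stylized_obj, style_ref).item()

def clip_score(encode_image, stylized_obj, style_ref) -> float:
    """Cosine similarity of CLIP image embeddings. `encode_image` is any
    pre-trained CLIP image encoder callable (an assumption, not a fixed API)."""
    with torch.no_grad():
        a = torch.nn.functional.normalize(encode_image(stylized_obj), dim=-1)
        b = torch.nn.functional.normalize(encode_image(style_ref), dim=-1)
        return (a * b).sum(dim=-1).item()
```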
Quantitative comparisons of style fidelity across multiple datasets are presented in Table 2. The proposed LocalGaussStyle achieves marginally better Ref-LPIPS and CLIP scores than the other models. In particular, larger improvements are observed for StyleSplat with GDAD and for LocalGaussStyle, demonstrating that the GDAD strategy allows richer, higher-frequency style features of the reference image to be captured.
The quantitative comparison of background preservation and content integrity across multiple datasets is summarized in Table 3. A slight improvement in background preservation can be achieved, and undesired style leakage at object boundaries can be reduced when BC is incorporated in StyleSplat. The slight improvement observed in LPIPS and SSIM reflects the fact that boundary regions occupy a relatively small portion of the image. LPIPS and SSIM were designed to evaluate average perceptual similarity and structural consistency across the entire image. Therefore, they can have limited sensitivity to localized artifacts, particularly subtle style leakage that may occur near object boundaries.
The LocalGaussStyle introduces GDAD to enhance style fidelity within the target object. While this strategy enables richer and more expressive style representation, it can also increase perceptual deviation from the original image compared to StyleSplat. In this context, the degradation in LPIPS and SSIM owing to the appearance changes caused by enhanced style representation can offset the slight improvements in global image metrics achieved by boundary control. To complement this limitation and to more intuitively evaluate style fidelity and boundary preservation in the target object region, the performance of LocalGaussStyle was additionally assessed through qualitative comparison.

4.4. Ablation Study and Analysis

By varying the IoU threshold used for SAM mask filtering, the effect of segmentation accuracy on localized stylization was analyzed, as summarized in Table 4. Lowering the IoU threshold degraded the metrics related to style fidelity and background preservation, with a particularly notable drop in background preservation when the threshold fell below 0.75. These results demonstrate that background preservation can depend strongly on mask quality and support the use of a high IoU threshold within the proposed framework.
In Figure 9, the convergence of the total loss during stylization optimization was compared with and without the BC loss normalization. The graph presents the raw loss values at each iteration and the smoothed trend line obtained by moving average (MA), allowing for simultaneous observation of optimization stability and convergence trends. With normalization, it was observed that oscillations in the early stages of training were reduced, resulting in a more stable convergence, and the loss converged to a lower value.
In Figure 10 and Table 5, ablation studies were conducted to evaluate the impact of the number of child Gaussians $N \in \{2, 5, 9\}$ on model performance. The results show that increasing $N$ improves both style fidelity and background preservation. In particular, $N = 9$ achieves the best performance in terms of Ref-LPIPS, CLIP score, LPIPS, and SSIM. From a qualitative perspective, noise-like artifacts were observed on the stylized surfaces when fewer child Gaussians ($N = 2$ and $N = 5$) were used. These artifacts can occur during the densification from parent to child Gaussians when the spatial coverage of the parent Gaussian is insufficiently preserved. This lack of coverage can lead to discontinuities in alpha-blending, resulting in rendering inconsistencies and noise-like appearance variations.
To evaluate the computational overhead introduced by GDAD, GPU memory usage, rendering time, and inference speed were measured in Table 6. In the case of an object occupying 0.85% of the scene, GPU memory usage was slightly increased from 3.87 GB to 3.95 GB, with negligible changes in rendering time and FPS. In the case of an object occupying 5.75% of the scene, GPU memory usage increased from 9.59 GB to 9.82 GB, and FPS decreased slightly from 136.5 to 125.4. Although a slight decrease in FPS was observed, real-time performance was maintained. These results demonstrate that GDAD selectively increases Gaussian density only in the target regions, thereby enhancing style representation fidelity while preserving the real-time rendering efficiency of 3D Gaussian splatting.

4.5. Discussions

As shown in Table 2 and Table 3, the proposed LocalGaussStyle offers comparable performance to StyleSplat and its variants, including StyleSplat with GDAD and StyleSplat with BC. For LPIPS and SSIM, the observed marginal performance differences can be attributed to the inherent characteristics of these metrics, which measure perceptual similarity and structural consistency at a global scale between the original and the stylized image. Moreover, the GDAD strategy improves the representation capacity within the target object, enabling the faithful reproduction of intricate high-frequency style patterns. Because the stylized object typically occupies a larger spatial extent than the boundary region, the appearance changes introduced by enhanced style fidelity may outweigh the gains in boundary preservation when evaluated using global metrics.
A trade-off between style fidelity and content integrity can be observed in the quantitative evaluations. As the representation capacity of the Gaussians increases through GDAD, the model more faithfully reproduces the complex textures of the reference style, leading to greater deviation from the original image. As style fidelity improves, the LPIPS and SSIM metrics related to content integrity tend to degrade due to larger perceptual differences from the original scene. Therefore, the slightly worse LPIPS and SSIM values of LocalGaussStyle do not necessarily indicate poor background preservation. Instead, they indicate that richer and more expressive style representations are successfully transferred within the target object.

5. Conclusions

In this paper, LocalGaussStyle was proposed for localized style transfer in scenes represented by 3DGS. Existing 3D scene stylization techniques primarily focus on global style transfer and lack precise object-level control for localized style transfer. Furthermore, style leakage caused by the spatial overlap of Gaussians remains a challenge. We proposed three key components to address these challenges: (i) segmentation model-based object localization, which projects 2D instance segmentation masks into the 3D scene to precisely localize target objects, (ii) a boundary-aware optimization that mitigates style leakage caused by Gaussian overlap, and (iii) geometry-decoupled adaptive densification, which enhances the representation capacity of target objects based on view-invariant gradients. These components enable high-fidelity localized style transfer while reliably preserving the structural integrity and appearance of the background. Simulation results on multiple public and custom datasets demonstrated that LocalGaussStyle improves the style fidelity of the target object while preserving background integrity compared to the comparison models.
While the proposed method achieves effective localized stylization, several limitations remain. First, the current framework is optimized for single objects and is sensitive to segmentation accuracy and complex boundaries. Second, it is designed for static scenes and therefore lacks mechanisms to ensure temporal consistency. Third, the optimization process may become unstable when camera viewpoints are extremely sparse.
To address these challenges, several promising research directions can be considered. Future work may extend the framework to support multi-object stylization and improve robustness against segmentation artifacts. Moreover, incorporating temporal constraints is essential for adapting the method to dynamic scenes. Finally, investigating techniques to stabilize gradient flows under sparse-view conditions will be crucial for ensuring reliable and consistent optimization.
The proposed LocalGaussStyle has potential applications in diverse fields such as AR/VR, digital content creation, and virtual production. It is expected to be used not only for services such as nail art but also for applications that require interactive digital content and personalized visual effects.

Author Contributions

Conceptualization, J.K. (Jeongho Kim); methodology, J.K. (Jeongho Kim); software, J.K. (Jeongho Kim) and B.H.; validation, J.K. (Jeongho Kim), B.H., J.K. (Jinwook Kim), and S.L.; formal analysis, J.K. (Jeongho Kim), B.H., J.K. (Jinwook Kim), S.K., and Y.S.; investigation, J.K. (Jeongho Kim), B.H., and J.K. (Jinwook Kim); resources, J.K. (Jeongho Kim), B.H., J.K. (Jinwook Kim), S.L., S.K., Y.S., and J.K. (Jinyoung Kim); data curation, J.K. (Jeongho Kim), B.H., S.K., and Y.S.; writing—original draft preparation, J.K. (Jeongho Kim); writing—review and editing, B.H., J.K. (Jinwook Kim), S.L., S.K., Y.S., and J.K. (Jinyoung Kim); visualization, J.K. (Jeongho Kim); supervision, J.K. (Jinyoung Kim); project administration, J.K. (Jinyoung Kim). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Industrial Technology Innovation Program (RS-2025-14383049, Development of spatial design technology applied with interactive kinetic media) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).

Data Availability Statement

The custom data that support the findings will be available in AI Hub Korea at https://www.aihub.or.kr/ following an embargo from the date of publication to allow for commercialization of research findings.

DURC Statement

Current research is limited to the field of computer vision and 3D scene reconstruction, which is beneficial for advancing immersive content creation and architectural visualization and does not pose a threat to public health or national security. Authors acknowledge the dual-use potential of the research involving 3D scene manipulation and confirm that all necessary precautions have been taken to prevent potential misuse. As an ethical responsibility, authors strictly adhere to relevant national and international laws about DURC. Authors advocate for responsible deployment, ethical considerations, regulatory compliance, and transparent reporting to mitigate misuse risks and foster beneficial outcomes.

Acknowledgments

This work was supported by the Industrial Technology Innovation Program (RS-2025-14383049, Development of spatial design technology applied with interactive kinetic media) funded by the Ministry of Trade, Industry & Energy (MOTIE, Korea).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
3DGS: 3D Gaussian Splatting
AR: Augmented Reality
VR: Virtual Reality
NeRF: Neural Radiance Field
MLP: Multilayer Perceptron
SAM: Segment Anything Model
GDAD: Geometry-Decoupled Adaptive Densification
M-NNFM: Masked Nearest Neighbor Feature Matching
BC: Boundary Control
SH: Spherical Harmonics
IoU: Intersection over Union
LPIPS: Learned Perceptual Image Patch Similarity
Ref-LPIPS: Reference-based Learned Perceptual Image Patch Similarity
SSIM: Structural Similarity Index Measure
PSNR: Peak Signal-to-Noise Ratio
CNN: Convolutional Neural Network
ARF: Artistic Radiance Fields
CLIP: Contrastive Language Image Pre-training
VGG: Visual Geometry Group
T&T: Tanks and Temples

References

  1. Li, K.; Masuda, M.; Schmidt, S.; Mori, S. Radiance fields in XR: A survey on how radiance fields are envisioned and addressed for XR research. IEEE Trans. Vis. Comput. Graph. 2025, 31, 9709–9719. [Google Scholar] [CrossRef]
  2. Fei, B.; Xu, J.; Zhang, R.; Zhou, Q.; Yang, W.; He, Y. 3D Gaussian splatting as a new era: A survey. IEEE Trans. Vis. Comput. Graph. 2025, 31, 4429–4449. [Google Scholar] [CrossRef] [PubMed]
  3. Kato, H.; Ushiku, Y.; Harada, T. Neural 3D mesh renderer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 3907–3916. [Google Scholar] [CrossRef]
  4. Michel, O.; Bar-On, R.; Liu, R.; Benaim, S.; Hanocka, R. Text2Mesh: Text-driven neural stylization for meshes. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 13482–13492. [Google Scholar] [CrossRef]
  5. Guo, J.; Li, M.; Zong, Z.; Liu, Y.; He, J.; Guo, Y.; Yan, L.Q. Volumetric appearance stylization with stylizing kernel prediction network. ACM Trans. Graph. 2021, 40, 1–15. [Google Scholar] [CrossRef]
  6. Klehm, O.; Ihrke, I.; Seidel, H.P.; Eisemann, E. Property and lighting manipulations for static volume stylization using a painting metaphor. IEEE Trans. Vis. Comput. Graph. 2014, 20, 983–995. [Google Scholar] [CrossRef] [PubMed]
  7. Cao, X.; Wang, W.; Nagao, K.; Nakamura, R. PSNet: A style transfer network for point cloud stylization on geometry and color. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV), Snowmass Village, CO, USA, 1–5 March 2020; pp. 3326–3334. [Google Scholar] [CrossRef]
  8. Bae, E.; Kim, J.; Lee, S. Point cloud-based free viewpoint artistic style transfer. In Proceedings of the IEEE International Conference on Multimedia and Expo Workshops (ICMEW), Brisbane, Australia, 10–14 July 2023; pp. 302–307. [Google Scholar] [CrossRef]
  9. Chen, Y.; Shao, G.; Shum, K.C.; Hua, B.S.; Yeung, S.K. Advances in 3D neural stylization: A survey. Int. J. Comput. Vis. 2025, 133, 5026–5061. [Google Scholar] [CrossRef]
  10. Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. NeRF: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
  11. Barron, J.T.; Mildenhall, B.; Tancik, M.; Hedman, P.; Martin-Brualla, R.; Srinivasan, P.P. Mip-NeRF: A multiscale representation for anti-aliasing neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 5835–5844. [Google Scholar] [CrossRef]
  12. Kerbl, B.; Kopanas, G.; Leimkuehler, T.; Drettakis, G. 3D Gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 2023, 42, 1–14. [Google Scholar] [CrossRef]
  13. Jain, S.; Kuthiala, A.; Sethi, P.S.; Saxena, P. StyleSplat: 3D object style transfer with Gaussian splatting. arXiv 2024, arXiv:cs.CV/2407.09473. Available online: http://arxiv.org/abs/2407.09473 (accessed on 29 January 2026).
  14. Galerne, B.; Wang, J.; Raad, L.; Morel, J.M. SGSST: Scaling Gaussian splatting style transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 11–15 June 2025; pp. 26535–26544. [Google Scholar] [CrossRef]
  15. Müller, T.; Evans, A.; Schied, C.; Keller, A. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. 2022, 41, 102. [Google Scholar] [CrossRef]
  16. Chen, A.; Xu, Z.; Geiger, A.; Yu, J.; Su, H. TensoRF: Tensorial radiance fields. In Proceedings of the 17th European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 333–350. [Google Scholar] [CrossRef]
  17. Zhang, K.; Kolkin, N.; Bi, S.; Luan, F.; Xu, Z.; Shechtman, E.; Snavely, N. ARF: Artistic radiance fields. In Proceedings of the 17th European Conference on Computer Vision (ECCV), Tel Aviv, Israel, 23–27 October 2022; pp. 717–733. [Google Scholar] [CrossRef]
  18. Liu, K.; Zhan, F.; Xu, M.; Theobalt, C.; Shao, L.; Lu, S. StyleGaussian: Instant 3D style transfer with Gaussian splatting. In Proceedings of the SIGGRAPH Asia 2024 Technical Communications (SA), Tokyo, Japan, 3–6 December 2024; pp. 1–4. [Google Scholar] [CrossRef]
  19. Zhang, D.; Yuan, Y.J.; Chen, Z.; Zhang, F.L.; He, Z.; Shan, S.; Gao, L. StylizedGS: Controllable stylization for 3D Gaussian splatting. IEEE Trans. Pattern Anal. Mach. Intell. 2025, 47, 11961–11973. [Google Scholar] [CrossRef]
  20. Kirillov, A.; Mintun, E.; Ravi, N.; Mao, H.; Rolland, C.; Gustafson, L.; Xiao, T.; Whitehead, S.; Berg, A.C.; Lo, W.Y.; et al. Segment anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Paris, France, 2–6 October 2023; pp. 3992–4003. [Google Scholar] [CrossRef]
  21. Mei, Y.; Xu, J.; Patel, V.M. ReGS: Reference-based controllable scene stylization with Gaussian splatting. In Proceedings of the 38th International Conference on Neural Information Processing Systems (NeurIPS), Vancouver, BC, Canada, 10–15 December 2024; pp. 4035–4049. [Google Scholar] [CrossRef]
  22. Knapitsch, A.; Park, J.; Zhou, Q.Y.; Koltun, V. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Trans. Graph. 2017, 36, 1–13. [Google Scholar] [CrossRef]
  23. Barron, J.T.; Mildenhall, B.; Verbin, D.; Srinivasan, P.P.; Hedman, P. Mip-NeRF 360: Unbounded anti-aliased neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 5460–5469. [Google Scholar] [CrossRef]
  24. Tan, W.R.; Chan, C.S.; Aguirre, H.E.; Tanaka, K. Improved ArtGAN for conditional synthesis of natural image and artwork. IEEE Trans. Image Process. 2019, 28, 394–409. [Google Scholar] [CrossRef]
  25. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning (ICML), Virtual, 18–24 July 2021; Meila, M., Zhang, T., Eds.; PMLR: Brookline, MA, USA, 2021; Volume 139, pp. 8748–8763. [Google Scholar]
  26. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595. [Google Scholar] [CrossRef]
  27. Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The overall architecture of the proposed LocalGaussStyle.
Figure 2. Segmentation results: (a) Original images used for 3DGS training. (b) Rendered images reconstructed by trained 3DGS. (c) Ground-truth masks generated by SAM. (d) Predicted masks produced by trained classifier.
Figure 3. Illustration of the geometry-decoupled adaptive densification strategy: (a) Shape of parent Gaussian. (b) Shape of the generated child Gaussians for N = 9. (c) The spatial arrangement of the densified centroids.
Figure 4. Qualitative comparisons of 3D localized stylization on T&T datasets.
Figure 5. Qualitative comparisons of 3D localized stylization on MipNeRF-360 datasets.
Figure 6. Zoomed-in qualitative comparisons of stylized details on T&T and MipNeRF-360 datasets.
Figure 7. Qualitative comparisons of 3D localized stylization on custom dataset.
Figure 8. Zoomed-in qualitative comparisons of stylized details on custom dataset.
Figure 9. Convergence analysis of the total loss with and without the BC loss normalization.
Figure 10. Qualitative comparison of the effect of the number of child Gaussians N.
Table 1. The key hyperparameters used in the simulation settings.

Phase | Hyperparameter | Value
3D Reconstruction and Object Localization | Iteration | 30,000
 | Position learning rate | 1.6 × 10^-4
 | Learning rate of SH coefficients | 0.005
 | 3D regularization interval | 2
 | IoU threshold | > 0.95
Localized Stylization | Iteration | 2000
 | Gradient threshold | 5 × 10^-5
 | λ_BC | 10
 | λ_style | 1
 | Number of child Gaussians | 9
 | GDAD activation iteration | 100
Table 2. Quantitative comparisons of style fidelity across multiple datasets.

Method | T&T Ref-LPIPS | T&T CLIP Score | MipNeRF-360 Ref-LPIPS | MipNeRF-360 CLIP Score | Custom Ref-LPIPS | Custom CLIP Score
StyleSplat [13] | 0.380 ± 0.016 | 0.805 ± 0.017 | 0.352 ± 0.016 | 0.763 ± 0.014 | 0.139 ± 0.009 | 0.959 ± 0.016
StyleSplat with GDAD | 0.365 ± 0.015 | 0.810 ± 0.016 | 0.330 ± 0.013 | 0.786 ± 0.013 | 0.126 ± 0.014 | 0.975 ± 0.014
StyleSplat with BC | 0.376 ± 0.014 | 0.809 ± 0.013 | 0.352 ± 0.016 | 0.762 ± 0.015 | 0.138 ± 0.011 | 0.959 ± 0.015
LocalGaussStyle (Proposed) | 0.363 ± 0.016 | 0.812 ± 0.014 | 0.329 ± 0.013 | 0.781 ± 0.017 | 0.126 ± 0.012 | 0.976 ± 0.017
Table 3. Quantitative comparisons of background preservation and content integrity across multiple datasets.

Method | T&T LPIPS | T&T SSIM | MipNeRF-360 LPIPS | MipNeRF-360 SSIM | Custom LPIPS | Custom SSIM
StyleSplat [13] | 0.228 ± 0.010 | 0.884 ± 0.010 | 0.104 ± 0.011 | 0.906 ± 0.008 | 0.104 ± 0.007 | 0.958 ± 0.013
StyleSplat with GDAD | 0.256 ± 0.010 | 0.857 ± 0.009 | 0.133 ± 0.013 | 0.877 ± 0.014 | 0.110 ± 0.006 | 0.945 ± 0.012
StyleSplat with BC | 0.228 ± 0.011 | 0.887 ± 0.009 | 0.102 ± 0.010 | 0.910 ± 0.008 | 0.103 ± 0.006 | 0.962 ± 0.013
LocalGaussStyle (Proposed) | 0.246 ± 0.010 | 0.857 ± 0.009 | 0.132 ± 0.014 | 0.875 ± 0.013 | 0.103 ± 0.008 | 0.946 ± 0.013
Table 4. Analysis of localized stylization performance with respect to segmentation mask quality.

IoU Threshold | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95
Ref-LPIPS | 0.353 ± 0.015 | 0.350 ± 0.014 | 0.347 ± 0.015 | 0.347 ± 0.017 | 0.344 ± 0.012 | 0.332 ± 0.015 | 0.329 ± 0.013
CLIP Score | 0.755 ± 0.012 | 0.758 ± 0.014 | 0.760 ± 0.015 | 0.764 ± 0.019 | 0.767 ± 0.018 | 0.778 ± 0.013 | 0.781 ± 0.017
LPIPS | 0.231 ± 0.017 | 0.225 ± 0.012 | 0.212 ± 0.016 | 0.155 ± 0.015 | 0.143 ± 0.013 | 0.135 ± 0.015 | 0.132 ± 0.014
SSIM | 0.748 ± 0.016 | 0.752 ± 0.015 | 0.771 ± 0.017 | 0.855 ± 0.012 | 0.861 ± 0.011 | 0.870 ± 0.016 | 0.875 ± 0.013
Table 5. Quantitative comparison of the effect of the number of child Gaussians N.

Metric | N = 2 | N = 5 | N = 9 (Proposed)
Ref-LPIPS | 0.340 ± 0.013 | 0.337 ± 0.019 | 0.329 ± 0.013
CLIP Score | 0.764 ± 0.013 | 0.768 ± 0.013 | 0.781 ± 0.017
LPIPS | 0.149 ± 0.013 | 0.144 ± 0.015 | 0.132 ± 0.014
SSIM | 0.851 ± 0.015 | 0.859 ± 0.017 | 0.875 ± 0.013
Table 6. Computational overhead analysis of GDAD for object sizes ranging from 0.85% to 5.75% of the scene.

Method | Generated Gaussians | GPU Memory | Rendering Time | Inference FPS
StyleSplat | – | 3.87–9.59 GB | 3.65–7.32 ms | 274.3–136.5 FPS
LocalGaussStyle (Proposed) | 1.82–102.1 K | 3.95–9.82 GB | 3.65–7.98 ms | 273.7–125.4 FPS
