An Improved Method for 3D Style Transfer of Cliff Carvings Based on Gaussian Splatting

Li, Yang; Ren, He; Li, Yacong; Sui, Dong; Guo, Maozu

doi:10.3390/mca31020047

Open AccessArticle

An Improved Method for 3D Style Transfer of Cliff Carvings Based on Gaussian Splatting

by

Yang Li

^1,2,*,

He Ren

^1,2,

Yacong Li

³,

Dong Sui

^1,2 and

Maozu Guo

^1,2,*

¹

School of Intelligence Science and Technology, Beijing University of Civil Engineering and Architecture, Beijing 102616, China

²

Beijing Key Laboratory for Intelligent Processing Methods of Architectural Big Data, Beijing University of Civil Engineering and Architecture, Beijing 102616, China

³

Beijing Academy of Artificial Intelligence, No. 150 Chengfu Road, Beijing 100089, China

^*

Authors to whom correspondence should be addressed.

Math. Comput. Appl. 2026, 31(2), 47; https://doi.org/10.3390/mca31020047

Submission received: 17 December 2025 / Revised: 11 February 2026 / Accepted: 7 March 2026 / Published: 11 March 2026

(This article belongs to the Special Issue Advances in Computational and Applied Mechanics (SACAM))

Download

Browse Figures

Versions Notes

Abstract

Cliff carvings, as significant art forms bearing historical, cultural, and religious connotations, face dual threats from natural weathering and human-induced damage. Their protection and restoration of the artistic style present pressing challenges. In recent years, the rapid advancement of digital technologies has offered new opportunities for preserving and reproducing cultural heritage. Particularly, 3D style transfer techniques are emerging as crucial tools for digital safeguarding. The advantages of three-dimensional style transfer in cultural heritage applications include dynamic stylized rendering, simulation of styles from multiple historical periods, alternative modes of exhibition, and facilitating a paradigm shift in conservation practices from static digital archiving to dynamic revitalization. This study proposes a novel 3D stylization method for cliff carvings by integrating 3D Gaussian Splatting (3DGS) and Nearest Neighbor Feature Matching (NNFM) loss metric. The method represents ancient cliff carvings as a set of optimizable 3D Gaussians representation, enabling efficient capture and processing of complex geometric structures and rich textural details. By integrating the textural and geometric characteristics of the target artistic style, 3DGS facilitates high-quality transfer of diverse artistic styles while effectively preserving the original intricate details of the carvings. Additionally, we employ the NNFM loss function to transfer 2D visual details into 3D representations while maintaining multi-perspective style consistency. Experimental results demonstrate that the proposed method exhibits significant advantages in texture fidelity, style consistency, and rendering efficiency. This research showcases the potential of our model for the digital preservation and presentation of cliff-carved cultural heritage, offering an innovative technological approach with theoretical value and practical significance.

Keywords:

3D style transfer; 3D Gaussian splatting; cliff carvings; cultural heritage

1. Introduction

Ancient Chinese cliff carvings and other cultural heritage bear profound historical, religious, and artistic significance. They are not only important witnesses of Chinese civilization but also important carriers of cultural heritage [1]. However, over time, these ancient artifacts have suffered from erosion, weathering, and human damage, and many of their original colors, textures, and fine details have gradually faded, making their restoration extremely challenging. Furthermore, due to the lack of detailed historical documentation, most of their original colors and styles have not been recorded and are difficult to verify. Therefore, effectively restoring and reproducing these color details has become a top priority for conservation work. In recent years, computer vision and 3D reconstruction technologies have been increasingly used in cultural heritage conservation, such as using neural radiance fields (NeRFs) to perform 3D reconstruction of cultural relics [2,3,4].

The Kongshuidong Cliff Carvings at Wanfotang in Fangshan District, Beijing, exemplify these challenges. As masterpieces of Tang Dynasty Buddhist art, they are both artistic creations and historical testaments to the value of their cultural heritage. However, relocation and renovation during the Ming Dynasty destroyed the original structure, and millennia of weathering have obscured some details. Restoring such ancient cultural heritage sites faces a common dilemma: direct intervention in the physical objects is often impractical, and hasty restoration may even cause further damage. Furthermore, the use of modern cement or synthetic materials may chemically react with the ancient materials, accelerating their deterioration. More importantly, a significant ethical debate remains: should damaged cliff carvings be restored to their presumed original state, or should they be preserved in their damaged state? Therefore, technologies that can non-invasively recreate the artifacts’ original appearance are particularly important [5].

This research proposes applying 3D style transfer techniques to cliff carvings. Unlike traditional restoration methods [6], 3D style transfer digitally reconstructs the appearance of ancient architecture and carvings while overcoming inherent limitations such as time-intensive labor and physical constraints [7]. Particularly valuable for cases lacking comprehensive historical documentation, this approach can endow ancient structures with diverse artistic styles—allowing for conjectures about their original aesthetic features, while also revitalizing them through contemporary stylistic interpretations. By leveraging style transfer algorithms, we preserve historical authenticity while infusing artworks with new cultural dimensions, revitalizing their artistic resonance within modern contexts. In short, this method enables dynamic stylization of cliff carvings, simulation of multi-period historical styles, and alternative presentations—breathing life into cultural relics while showcasing their “new” artistic reinterpretations.

Traditional 2D style transfer techniques struggle to capture 3D features [8,9], while traditional 3D modeling suffers from high computational costs and lacks real-time performance [10], making it difficult to meet the demands of virtual exhibitions, academic research, and public education. Furthermore, cultural heritage preservation requires cost-effective [3,5], environmentally friendly, and highly interactive technical solutions that balance historical authenticity with contemporary presentation requirements, ultimately achieving sustainable conservation.

Addressing these challenges, 3D Gaussian Splatting (3DGS) [11] provides an explicit and efficient 3D representation that has recently been applied to real-time rendering and stylization tasks [12]. In this study, we conduct an exploratory investigation into whether such techniques can be effectively applied to cliff-carved cultural heritage, which differs substantially from the generic scenes and objects considered in previous work [13]. Cliff carvings are characterized by large-scale, predominantly planar relief geometry, subtle depth variations, highly eroded high-frequency textures, and severely degraded chromatic information. These characteristics motivate a set of adaptation-oriented design choices rather than the direct reuse of existing pipelines. Specifically, we employ 3DGS to enable stable reconstruction from lightweight image acquisition, and integrate the Nearest Neighbor Feature Matching (NNFM) loss [14] to prioritize local texture correspondence over global statistical alignment. Furthermore, to bridge the gap between 2D style features and the explicit 3D Gaussian representation, we embed low-dimensional learnable features into each Gaussian and use a learnable affine transformation to map them into the VGG feature space. This configuration is chosen to efficiently handle large-scale relief surfaces while maintaining multi-view consistency. Through this exploratory adaptation, we aim to assess the feasibility and practical behavior of 3DGS-based style transfer in the context of cliff-carved cultural heritage.

This method outperforms traditional 3D techniques in handling complex textures and geometric structures, while also offering real-time rendering capabilities, making it particularly suitable for large-scale cultural heritage display and restoration projects. This research implements 3D style transfer on a dataset of cliff carvings from the Wanfo Hall in Kongshui Cave. This research further demonstrates that our 3D stylization method can provide effective technical support for non-invasive heritage preservation, the preservation of Chinese culture, and the virtual display of cultural relics.

The main contributions of this paper are as follows:

We propose a tailored 3D style transfer method for cliff carvings, integrating the 3D Gaussian Splatting (3DGS) model with the Nearest Neighbor Feature Matching (NNFM) loss function. This approach enables multi-period style simulation and dynamic stylized rendering, thereby enhancing the aesthetic expressiveness of cliff carvings while preserving their intricate historical details;
We optimize the 3DGS + NNFM pipeline to handle the unique planar geometry and high-frequency historical textures of cliff carvings by embedding low-dimensional features into 3D Gaussians representation and employing a learnable affine transformation, ensuring multi-view consistency and style fidelity;
We evaluate the proposed method on the Kongshuidong Cliff Carvings dataset, demonstrating its superior performance in 3D reconstruction and stylization. This provides an innovative solution for the digital preservation and interactive display of cultural heritage artifacts.

2. Related Work

2.1. 3D Reconstruction

The early digitalization of cultural heritage primarily relied on Light Detection and Ranging (LiDAR) [15,16] and photogrammetry techniques [17]. While capable of achieving high geometric accuracy, these methods suffered from drawbacks such as exorbitant equipment costs, difficulties in processing reflective surfaces, and limited real-time interactivity. The deep learning-based technology neural radiance fields (NeRFs) [18] demonstrates remarkable advantages in sparse-view 3D reconstruction by learning rich features to capture surface textures and other critical information of cultural relics. Valeria Croce et al. [19] validated NeRFs’ superiority over photogrammetry in completeness and material reconstruction using limited input images. However, neural radiance fields (NeRFs) suffer from drawbacks such as a slow training process [4,20], high rendering costs, and an insufficient capture of high-frequency details, hindering its practical applications [21]. The 3D Gaussian Splatting (3DGS) technique addresses the limitations of neural radiance fields by employing explicit [22], optimizable gaussian representations combined with tile-based rasterization technology, achieving real-time rendering while maintaining reconstruction precision [6,23].

As cliff carvings are carved from natural rock surfaces, their intricate texture details and carving techniques present significant reconstruction challenges, including capturing high-frequency variations from erosion-induced irregularities, handling planar yet complex geometries with subtle depth cues, and recovering sparse details from weathered and relocated artifacts. These challenges exceed those of standard datasets, where surfaces are often smoother or less historically degraded. 3DGS overcomes these by using adaptive Gaussian primitives that explicitly model anisotropic shapes and opacities, enabling precise density control to fill in eroded gaps and preserve fine-grained textures without the over-smoothing common in implicit methods like NeRFs. This makes 3DGS particularly suited for our dataset, achieving high-fidelity reconstruction that prior methods struggle with. Given that the 3DGS method enables high-fidelity reconstruction, this paper employs 3DGS to reconstruct damaged cliff carvings [12,24,25].

2.2. Style Transfer

Two-dimensional style transfer techniques have been widely applied in the field of virtual restoration for historic architecture. Liu et al. [26] achieved modernized restoration of dilapidated buildings by training an unsupervised CycleGAN network [27] using web-sourced image pairs of ruined and contemporary architectures. Although these methods have been relatively mature in two-dimensional applications, they still have certain limitations in three-dimensional applications due to their inability to ensure the consistency of viewing angles in three-dimensional scenes [10]. Therefore, effectively transferring diverse artistic styles to 3D targets remains a persistent research challenge in the fields of computer graphics and vision. Early 3D style transfer approaches relied on reconstructed geometry, with effectiveness constrained by underlying mesh quality and topology, struggling to maintain coherence in complex texture transfer [28].

The advent of NeRFs revolutionized 3D stylization research; Artistic Radiance Fields (ARFs) [14,29] pioneered combining NeRFs with Nearest Neighbor Feature Matching (NNFM) loss to emphasize local detail matching. The core contribution of this research lies in proposing the Nearest Neighbor Feature Matching (NNFM) loss function, which demonstrates superior capability in preserving stylistic detail features compared to the conventional Gram matrix loss function [30]. The HyperNet method [31] achieved arbitrary style transfer by encoding style information into multilayer perceptron (MLP) parameters yet exhibits notable deficiencies in rendering speed and detail quality. StyleRF [32] performed stylization in TensoRF [33] feature space, achieving partial view consistency. However, NeRFs suffer from inherent limitations including slow rendering speeds, constrained detail capture capabilities, and computationally intensive optimization processes. Consequently, these NeRF-based 3D stylization methods exhibit distortion artifacts when processing complex textures such as cliff carvings.

To address the limitations of NeRFs, researchers have proposed the 3D Gaussian Splatting model [2]. Three-dimensional Gaussian Splatting achieves real-time, high-fidelity rendering with explicit geometric control, overcoming NeRFs’ limitations in speed, detail preservation, and structured texture handling [9]. Consequently, numerous researchers have proposed 3DGS-based stylization methods to overcome the limitations of NeRF. StyleGaussian [34] explored global style transfer in the 3DGS framework using Adaptive Instance Normalization (AdaIN) [35], inheriting 3DGS’s efficiency and rendering quality. However, AdaIN’s global statistical alignment tends to produce homogenized stylization [36], while simultaneously attenuating high-frequency details and blunting geometric edges—making it suboptimal for cultural artifacts requiring precise preservation of historical traces and artistic features [37].

In this context, our choice of integrating NNFM with 3DGS is specifically optimized for cliff carving datasets, rather than a simplistic amalgamation. Cliff carvings feature high-frequency, eroded textures and irregular geometries from natural weathering, which demand local detail preservation to avoid artifacts like edge blunting or texture homogenization seen in AdaIN-based methods. NNFM is selected because it matches features at a pixel-level granularity using cosine distance, ensuring multi-perspective consistency and fidelity in transferring styles to these intricate surfaces—challenges not adequately addressed by global losses. Furthermore, we embed low-dimensional features into 3D Gaussians with a learnable affine transformation to bridge the modality gap, enabling efficient rendering of high-dimensional VGG features while preserving original nuances critical for heritage applications.

Current research on 3D style transfer for heritage conservation remains limited, with most studies focusing on generic datasets like Mip-NeRF 360 [38]. This work bridges this gap by adapting 3D style transfer technology to cultural heritage, specifically optimizing performance for ancient relief stylization. This research pioneered the application of three-dimensional stylization technology to historical relics, which is conducive to the dynamic stylized display of relics. Our method achieves artistic reproduction while retaining the original characteristics of the relics, avoiding the risks of physical intervention and expanding the possibilities of cultural communication and educational innovation.

3. Methods

3.1. 3D Gaussian Splatting

For cliff carvings exhibiting high-frequency textural details and predominantly planar geometric structures, 3D Gaussian Splatting (3DGS) [11] effectively captures both intricate surface patterns and subtle geometric variations through its adaptive Gaussian primitives. The method’s fast training and efficient rendering make it particularly suitable for small to medium-sized unordered image collections. Even with limited input images, 3DGS can produce high-fidelity reconstructions that meet the stringent detail recovery requirements essential in cliff carving preservation.

Gaussian Splatting can be regarded as a point cloud representation method. The overview structure of 3DGS is shown in Figure 1. Each 3D scene is represented by a collection of tens of thousands to millions of Gaussian distributions, with each Gaussian kernel responsible for capturing the geometric and appearance characteristics of a localized region within the scene. Specifically, each Gaussian distribution is defined by its mean

μ

(position), covariance matrix

Σ

(shape and orientation), opacity, and spherical harmonic coefficients for view-dependent color representation. Unlike traditional implicit methods, such as neural radiance fields (NeRFs), which rely on neural networks and volume rendering, 3DGS adopts an explicit representation, enabling efficient real-time rendering while maintaining visual quality.

Since the covariance matrix must be positively semi-definite to meet the requirements of differentiable optimization, it is decomposed into a rotation matrix R and a scaling matrix

S \in R^{3}

. The covariance matrix

Σ

is given by

Σ = R S S^{T} R^{T}

(1)

where R is the rotation matrix derived from a quaternion and S is the scaling matrix constructed from a 3D scaling vector.

This decomposition is designed to ensure the covariance matrix remains positive semi-definite during optimization, allowing for stable and differentiable updates in gradient-based training. In this study, it plays a crucial role by enabling anisotropic modeling of Gaussian shapes, which is essential for accurately representing the irregular, eroded surfaces and subtle depth variations in cliff carvings, thereby improving reconstruction fidelity over isotropic alternatives.

During rendering, 3DGS employs a tile-based parallel rasterizer that supports anisotropic splatting, visibility-aware ordering, and

α

-blending. The color of a pixel is computed by blending the contributions of all Gaussians along the line of sight:

C = \sum_{i \in N} c_{i} α_{i} \prod_{j - 1}^{i - 1} (1 - α_{j})

(2)

where

c_{i}

denotes the color of the i-th Gaussian and

α_{i}

denotes its opacity.

This alpha-blending formula is designed to simulate volumetric rendering efficiently without ray marching, using sorted Gaussians for ordered compositing. In our research, it facilitates real-time rendering of complex cliff carving scenes, allowing for interactive visualization and style transfer previews, which is vital for cultural heritage applications where rapid iteration on stylized models is needed.

Additionally, each Gaussian distribution contains opacity

α

and view-dependent color values represented by spherical harmonic coefficients. Once initialization is completed, these Gaussians are projected onto a 2D plane for rendering. This process is executed by a differentiable tile rasterizer. During optimization, density control is applied to these Gaussians, specifically involving the continuous removal of outdated Gaussians and the addition of novel Gaussians to the scene to maintain image quality. This is performed to ensure that the density of Gaussians remains approximately uniform across all regions of the scene, particularly in areas initially empty during Structure-from-Motion (SfM) [39] initialization. The 3D scene is optimized with respect to the following loss function:

L = (1 - λ) L_{1} + λ L_{D - S S I M}, λ = 0.2

(3)

This weighted loss function is designed to balance pixel-level accuracy (

L_{1}

) with perceptual quality (D-SSIM), where

λ

= 0.2 empirically prioritizes structural similarity. In this study, it ensures high-fidelity optimization of cliff carving reconstructions, preserving fine textural details from erosion while enabling seamless integration with subsequent style transfer steps, thus supporting the overall pipeline for dynamic heritage preservation.

3.2. Nearest Neighbor Feature Matching Loss

The Nearest Neighbor Feature Matching (NNFM) method was introduced by Artistic Radiance Fields (ARFs) [14] to fuse 3D scenes with style images. In this approach, both the style image and rendered scene image are encoded using a pre-trained VGG network [8,40]. The NNFM loss function transfers complex, high-frequency visual details from 2D style images to 3D scenes while effectively preserving local texture details.

Formally, let

F_{s}

and

F_{r}

denote the feature maps extracted from the style image and rendered image, respectively, where

F (i, j)

represents the feature vector at pixel coordinates

(i, j)

. The loss function is defined as

L_{N N F M} (F_{r}, F_{s}) = \frac{1}{N} \sum_{i, j} m i n D (F_{r} (i, j), F_{s} (i^{'}, j^{'}))

(4)

where N is the total number of pixels in the rendered image, and

D (u_{1}, u_{2})

denotes the cosine distance between vectors

u_{1}

and

u_{2}

. This mechanism effectively aligns the features of the rendered scene with those of the style image, promoting detailed and consistent style transfer. Figure 2 illustrates the computational process of the NNFM loss.

Specifically, our method uses a pre-trained VGG network to extract deep convolutional features from rendered images of a 3D Gaussian scene and reference style images, typically in intermediate layers, to capture multi-scale semantic and textural information. For each feature vector in the rendered feature map, denoted as

F_{r} (i)

, a nearest neighbor search is performed on all feature vectors in the style feature map

F_{s} (j)

by minimizing the cosine distance. This effectively aligns local blocks through content-adaptive correspondences, preserving structural fidelity while injecting style attributes. The aggregation loss is calculated as the average of these minimum distances over all pixels, thus achieving gradient-based optimization. The spherical harmonic coefficients of the segmented Gaussian are selectively fine-tuned to achieve consistent and artifact-free stylization from a new perspective.

3.3. 3D Style Transfer

Our integration of 3DGS with NNFM is tailored to the unique challenges of cliff carvings, such as their high-frequency textural details from erosion and complex planar geometries, rather than a straightforward combination of existing techniques. 3DGS excels in efficiently representing these structures through optimizable Gaussians, but alone may not ensure style consistency across views. We select NNFM over alternatives like AdaIN because it prioritizes local nearest neighbor matching, which preserves the intricate, historically significant details (e.g., subtle carvings and weathered patterns) that global methods tend to oversmooth. This optimization enables high-fidelity style transfer while maintaining multi-view coherence, critical for immersive heritage visualization.

Figure 3 illustrates the overall architecture of our method: multi-view cliff carving images are reconstructed into a 3D scene via 3D Gaussian Splatting, 2D VGG [8] features are embedded into the reconstructed Gaussians, and iterative optimization through the Nearest Neighbor Feature Matching loss yields stylized 3D Gaussian representations.

In 2D image style transfer, intermediate features from VGG network layers have been empirically validated as effective representations for texture and style attributes [40]. However, for 3D scenes, a fundamental modality gap exists between the explicit geometric representation of 3D Gaussians (position

μ_{p}

and covariance

Σ_{p}

) and the abstract semantics of style images (e.g., brushstrokes and textures). Consequently, the scalar color attribute

c_{p}

intrinsically lacks the capacity to encapsulate deep stylistic features. Direct execution of style transfer computations on Gaussian primitives is therefore infeasible.

To address this challenge, the reconstructed 3D scene representation must be mapped into a space corresponding to VGG feature dimensions. However, direct rendering of high-dimensional VGG features (e.g., 256D for ReLU3_1 and 512D for ReLU4_1) [41] exceeds typical GPU memory constraints. We therefore adopt the efficient feature rendering strategy proposed by StyleGaussian [34]. This approach first renders low-dimensional features and then maps them into a high-dimensional space.

Specifically, augmenting the original Gaussian point definition

G = \{μ_{p}, Σ_{p}, α_{p}, c_{p}\}

, each Gaussian point is assigned an additional learnable low-dimensional feature parameter

f_{p}^{'} \in R^{D^{'}}

, where

D^{'} = 32

. Utilizing the standard Gaussian Splatting rendering process, we obtain a pixel-level low-dimensional feature map

F^{'}

:

F^{'} = \sum_{i \in N} f_{i} ω_{i} .

(5)

where

ω_{i} = α_{i} \prod_{j = 1}^{i = 1} (1 - α_{j})

(6)

Subsequently, a learnable affine transformation T is designed to map the rendered low-dimensional feature

F^{'}

into the target high-dimensional VGG feature space

F \in R^{D} (D = 256)

:

F = T (F^{'}) = A F^{'} + \sum_{i \in N} b ω_{i}

(7)

where

A \in R^{D \times D^{'}}

is the linear transformation matrix and

b \in R^{D}

is the bias vector. This mapping is optimized via the loss function:

min_{A, b, \{f_{p}\}} \sum ∥F - F_{g t}∥

(8)

where

- F_{g t}

represents the target features derived from the relevant VGG layer (in practice, features extracted by VGG from the original input views serve as a practical approximation for

- F_{g t}

). It is important to note that the mapping defined by Equation (9) is mathematically equivalent to applying the affine transformation

T (f_{i}^{'}) = A f_{i}^{'} + b

to each Gaussian point’s low-dimensional feature

f_{i}^{'}

prior to rendering:

F = \sum_{i \in N} T (f_{i}) ω_{i} = \sum_{i \in N} (A f_{i}^{'} + b) ω_{i}

(9)

Equation (9) demonstrates that the high-dimensional feature F is derivable directly through per-point affine transformation of each Gaussian’s low-dimensional feature:

f_{p} = T^{'} (f_{p}^{'}) = A f_{p}^{'} + b

. The paramount advantage of this design lies in its memory efficiency: rendering high-dimensional features

(D = 256)

is reduced to rendering low-dimensional features

(D^{'} = 32)

. Memory consumption is thereby diminished to approximately 12.5% (32/256) of directly high-dimensional rendering, while preserving full feature expressivity. This establishes the groundwork for subsequent efficient style transfer computation. Having obtained the VGG-embedded 3D Gaussian representation (where each point possesses

f_{p} = T^{'} (f_{p}^{'})

), style transfer can be executed using arbitrary style images. While AdaIN [35], which is a prevalent technique in 2D and selective 3D methods, performs style transfer by globally aligning channel-wise mean and variance statistics between content and style features, our empirical analysis reveals that AdaIN’s global statistical alignment induces homogenized stylistic effects on cliff carving data. This manifests as blurring of originally intricate high-frequency textures and sculptural edges (geometric details), termed the flattening effect. Therefore, we employ the NNFM loss [14] to govern the style transfer process. Equation (10) follows the general definition given in Equation (4) and is combined with the specific formulation of Frender in Equation (11). This loss operates between the VGG-extracted content features from the rendered representation

F_{r e n d e r}

and the style image features

F_{s t y l e}

:

l_{n n f m} = \frac{1}{N} \sum_{i, j} min_{i, j} D (F_{r e n d e r} (i, j), F_{s t y l e} (i^{'}, j^{'})),

(10)

F_{r e n d e r} = \sum_{i \in N} \underset{︸}{(A f_{i} + b)} ω_{i}

(11)

The NNFM loss enforces localized feature matching by minimizing the distance between each feature vector

F_{r e n d e r} (i, j)

and its nearest neighbor

F_{s t y l e} (i^{'}, j^{'})

within the style feature map. This mechanism facilitates the effective transfer of intricate, high-frequency visual details from the 2D style image to the 3D Gaussian-parameterized scene while ensuring multi-view consistency. It surmounts the global averaging limitations inherent to AdaIN, proving particularly adept at preserving critical cultural and geometric detail fidelity in cliff carvings. Figure 2 illustrates the NNFM loss computation process.

4. Results

4.1. Dataset

The research data derives from the Wanfotang in Hebei Town, Fangshan District, Beijing. Situated northwest of Wanfotang Village, this National Key Cultural Relics Protection Unit dates back to 770 AD (the fifth year of the Dali era, Tang Dynasty), approximately 1250 years ago. The centerpiece of Wanfotang is the Manjusri, Samantabhadra, and Ten Thousand Buddhas Assembly relief, embedded across the main hall’s front and side walls. This masterpiece measures 23.8 m long and 2.47 m high and is composed of 31 rectangular white marble slabs. The central figure is Shakyamuni Buddha. The upper three tiers are carved with one thousand Buddha statues, and some damaged inscriptions of prayers remain on the south wall [42].

This study employs the Wanfotang cliff carving as the primary case for 3D reconstruction and style transfer research. Figure 4 illustrates the temple’s architectural exterior and the carving’s current condition, revealing significant surface deterioration. Millennia of weathering and historical disruption have substantially degraded its original chromatic features and artistic style. This dataset presents unique challenges for 3D reconstruction, including high-frequency textural details from eroded natural rock surfaces, irregular geometric variations due to historical damage and relocation, and sparse chromatic information from faded pigments—issues not commonly addressed in generic datasets like Mip-NeRF 360. This phenomenon of physical deterioration and loss of visual information caused by the passage of time is also commonly observed in precious cultural heritage such as the Terracotta Army and ancient murals—their surface pigments all face varying degrees of fading. In response to the various challenges in preserving endangered cultural relics, computer vision and artificial intelligence technologies can provide viable solutions.

During the data preparation phase, the cliff carvings were photographed using a consumer-grade mobile device (iPhone 13 Pro Max, Apple Inc., Cupertino, CA, USA). Images were captured from multiple viewpoints with an approximate overlap of 60–80% to ensure sufficient coverage and redundancy. After data screening, 97 images meeting the reconstruction requirements were selected for subsequent processing.

Although the images were acquired using a handheld mobile phone, the acquisition process was carefully controlled to ensure reliable reconstruction quality. During image capture, key camera parameters, including ISO (6400), aperture (f/4.5), and shutter speed (1/120 s), were fixed to maintain consistent exposure and imaging conditions across views. White balance and focus were also locked to reduce photometric variations. This controlled setup helps minimize imaging inconsistencies that could otherwise affect the reconstruction and subsequent stylization results. The use of a mobile device reflects a low-cost and accessible acquisition workflow suitable for practical cultural heritage documentation, rather than a replacement for professional surveying or restoration-grade measurement methods.

The selected images were processed using a Structure-from-Motion (SfM) pipeline implemented with COLMAP [39]. COLMAP supports automated multi-view geometric reconstruction, including feature extraction, camera pose estimation, and sparse and dense point cloud reconstruction. Before optimizing the 3D Gaussian representation, we use COLMAP to estimate camera poses and reconstruct a sparse point cloud as the input to 3DGS. The reconstructed point cloud is shown in Figure 5. Its incremental SfM strategy, combined with robust triangulation and view selection, improves reconstruction robustness for unordered multi-view image collections such as cliff carvings. The reconstructed data were exported in the Local Light Field Fusion (LLFF) [43] format, providing high-quality input for subsequent 3D Gaussian Splatting and style transfer. Figure 6 illustrates the overall reconstruction quality and local details obtained using this pipeline.

For style transfer, style images were sourced from the WikiArt dataset, predominantly featuring abstract-style artworks. These images are used as visual style references to evaluate the capability of the proposed method across diverse artistic appearances.

To evaluate the proposed method, experiments were conducted using diverse style images to test its capability in handling various artistic styles. Furthermore, a comparative analysis was performed against several state-of-the-art methods, including StyleRF [32], Artistic Radiance Fields (ARFs) [14], and StyleGaussian [34]. StyleRF employs TensoRF for radiance field representation, rendering 2D feature maps that are subsequently transformed and decoded for stylization. ARFs utilize NeRFs for 3D scene representation and apply VGG-extracted content and style features together with the NNFM loss for style transfer. StyleGaussian adopts 3D Gaussian Splatting for scene representation and performs style transfer using an AdaIN-based feature alignment strategy.

All experiments were conducted on a workstation equipped with an NVIDIA RTX 4090D, 24 GB. The implementation was developed using PyTorch 3.8 with CUDA 11.3 acceleration. For the Kongshuidong Cliff Carvings dataset, 3D reconstruction using 3D Gaussian Splatting required approximately 10 min for optimization. The subsequent 3D style transfer process took approximately 120 min per style, including feature embedding and optimization. Once optimization was completed, novel view rendering and visualization could be performed in real time at interactive frame rates approximately 120 FPS.

4.2. Quantitative Results

Evaluating style transfer tasks inherently involves a degree of subjectivity, and a unified quantitative metric is currently lacking within the field. In 3D style transfer research, multi-view consistency serves as a common evaluation perspective. The quantitative comparison results are presented in Table 1. To assess multi-view consistency, two stylized viewpoint images were first selected. Using the softmax splatting technique [44], the image from one viewpoint was warped to the other viewpoint. Masked LPIPS [45] and RMSE (Root Mean Squared Error) scores were then computed within the overlapping regions. LPIPS (Learned Perceptual Image Patch Similarity) is a metric designed to measure the perceptual similarity between two images. Unlike traditional pixel-wise error metrics, LPIPS is implemented via a deep neural network that has been trained on large-scale datasets to capture human perceptual judgments. The model takes two images as input and outputs a similarity score, with lower values indicating higher perceptual resemblance. This approach allows LPIPS to better align with human visual perception by accounting for texture, structure, and semantic content, making it particularly suitable for evaluating visually oriented tasks such as style transfer and image synthesis. The combined LPIPS [45] (lower is better) and RMSE (lower is better) scores provide a measure of multi-view consistency.

We report both Short-range Consistency and Long-range Consistency to evaluate multi-view stability under different frame intervals [9]. Short-range consistency is computed on multiple adjacent frame pairs, while long-range consistency is computed on multiple frame pairs separated by 10 frames in the input sequence. For each method and setting, the final consistency scores are obtained by averaging the results over all selected frame pairs. Experimental results demonstrate that our method achieves superior multi-view consistency metrics on the cliff carvings dataset compared to ARF, StyleRF, and StyleGaussian. The ARF and StyleRF method, relying on NeRF-based reconstruction, exhibits significantly weaker reconstruction performance than methods based on 3DGS, which is reflected in these results.

4.3. Qualitative Results

Qualitative results are presented in Figure 7. Our method demonstrates outstanding performance in the 3D style transfer task for cliff carvings. In contrast, the ARF method, constrained by the implicit representation of neural radiance fields, struggles to guarantee multi-view consistency when processing cliff carvings data, frequently exhibiting missing viewpoints, artifacts, and discontinuous 3D scenes. The StyleRF method yields improved stylization results over ARFs but still suffers from significant detail loss and artifacts stemming from underlying reconstruction issues, leading to view inconsistency. StyleGaussian, which is also based on 3DGS, generally achieves reasonable reconstruction quality with fewer geometric artifacts. However, since its style transfer relies on AdaIN global feature alignment, it tends to produce overly homogenized appearance transfer, which weakens local texture expression and results in less faithful stylization on the fine-grained carved details. Our method produces continuous 3D scenes, effectively ensures multi-view consistency, and successfully captures and preserves the rich textural details of the cliff carvings. As shown in Figure 7, in the stylized output of the Manjusri, Samantabhadra, and Ten Thousand Bodhisattvas Assembly, intricate details of hundreds of Buddha heads remain discernible.

The results are intended as exploratory, interpretative visualizations rather than semantically accurate color restorations. Due to the lack of verifiable historical pigment information, artistic style images are used as external color guidance to demonstrate the capability of the proposed method to perform geometry-aware colorization and stylization on complex cliff carvings.

We further investigated the performance of our method when using monochromatic images as style references. Conventional 3D style transfer approaches typically suffer from significant texture detail loss when processing single-color style images. In contrast, our method successfully transfers the color characteristics from monochromatic style images to the cliff carving model while remarkably preserving rich texture details. Figure 8 demonstrates the comparative results of 3D style transfer guided by monochromatic style images.

4.4. Ablation Studies

Ablation studies focused on two key aspects:

Impact of 3DGS Reconstruction: We investigated the effect of the underlying 3D reconstruction representation on the stylization outcome for cliff carvings. A comparison with the ARF method shows that although both approaches employ the NNFM loss, our method—built upon 3D Gaussian Splatting (3DGS)—significantly outperforms ARFs, which relies on NeRFs, in terms of multi-view consistency, scene continuity, and fine-grained detail preservation. As illustrated in Figure 9, under identical style transfer settings, the NeRF-based reconstruction exhibits noticeable artifacts and view inconsistencies, whereas the 3DGS-based reconstruction produces more stable, continuous, and artifact-free stylized results.

Role of NNFM Loss:We further examined the role of the NNFM loss in balancing content preservation and style application. StyleGaussian also adopts 3DGS for scene representation and thus achieves comparable reconstruction quality and scene continuity. However, it employs the AdaIN loss for style transfer. For cliff carvings, which are characterized by intricate high-frequency textures and relatively flat geometric structures, AdaIN’s global statistical alignment mechanism tends to produce homogenized stylization effects, leading to the loss of carved texture details, as shown in Figure 9. In contrast, the NNFM loss establishes localized feature correspondences between the rendered views and the style image, enabling effective style transfer while better preserving the original carved details and textural characteristics. This comparison demonstrates the positive impact of NNFM loss on 3D style transfer for cliff carvings.

4.5. Additional Results

To further validate the generalizability of the proposed 3D style transfer method within the domain of cultural heritage preservation, supplementary experiments were conducted on other representative artifacts—namely, a Buddha statue and a bronze incense burner. These artifacts also commonly suffer from issues such as weathering and damage and exhibit complex geometric structures and intricate surface decorative features. By applying the identical style transfer framework, this study aims to evaluate the adaptability of the method to diverse artifact morphologies.

The experimental settings for these supplementary experiments remained consistent with the main experiment, encompassing both data acquisition and processing procedures. The objects under study were a Seated Shakyamuni Buddha statue figurine and a bronze incense burner manufactured during the Zhengtong reign of the Ming Dynasty. Their original images (captured via smartphone) are presented in Figure 10.

We provide the 3D reconstruction result of the Buddha statue using 3D Gaussian Splatting (3DGS) in Figure 11, along with the subsequent style transfer results derived from this reconstruction in Figure 12. Similarly, the 3D reconstruction result of the incense burner and its corresponding style transfer results are shown in Figure 13 and Figure 14, respectively.

To improve the interpretability and practical relevance of the experimental results, we additionally conduct a 3D style transfer experiment on a cliff carving dataset containing densely distributed and clearly recognizable small Buddha head motifs. Compared to large-scale cliff carving scenes, these repeated and semantically meaningful elements provide more intuitive visual cues for assessing stylization behavior.

By applying the proposed method to this dataset, we examine whether stylistic patterns can be consistently transferred while preserving the geometric integrity and repetition of the Buddha head structures. The results demonstrate that the proposed approach maintains stable multi-view consistency and coherent stylization across fine-grained iconographic details. The corresponding results are shown in Figure 15 and Figure 16.

5. Conclusions

This study has introduced a novel 3D style transfer framework for cliff carvings by integrating 3D Gaussian Splatting (3DGS) with a Nearest Neighbor Feature Matching (NNFM) loss. The proposed method addresses two key challenges in cultural heritage digitization: preserving fine-grained relief details while maintaining robust multi-view consistency. By combining an explicit 3DGS representation with NNFM-based localized feature correspondence, our approach enables controllable and spatially precise stylization, transferring diverse artistic appearances onto digitized carvings while preserving their geometric structure and salient surface textures. This capability supports interactive, real-time stylized visualization and provides a flexible tool for exploring multiple stylistic interpretations of heritage artifacts.

The core advantage of the proposed method lies in leveraging the real-time rendering capability and strong multi-view consistency of 3DGS for high-quality reconstruction, together with low-dimensional feature embedding and a learnable affine mapping to bridge the modality gap to the VGG feature space. To accommodate the relatively shallow depth variations common in cliff carvings, we employ the NNFM loss to establish local feature correspondences between the rendered feature field and the style image, thereby balancing content preservation and style application. This design allows the method to retain crucial textural cues while injecting style-specific patterns in a spatially localized manner.

It should be emphasized that the stylized color results presented in this work are not intended as definitive historical reconstructions. Given that pigments on many cliff carvings have severely degraded and cannot be reliably verified, this work focuses on demonstrating the feasibility of 3D style transfer for stone reliefs under realistic documentation conditions, and the results should be interpreted as hypothesis-generating visualizations. In future work, we plan to incorporate evidence-based residual pigment information by integrating scientific analysis (e.g., hyperspectral imaging and material analysis), moving toward archaeologically grounded recoloration when such evidence is available.

Limitations and future work. First, NNFM-based stylization can be sensitive to photometric inconsistencies in in-the-wild captures. Persistent shadows in concave grooves may bias the reconstructed renderings toward low-luminance appearances; under cosine-similarity NNFM with hard nearest-neighbor assignments in the VGG feature space, such shadow-biased and low-discriminability features may repeatedly match a small subset of high-response style features, producing spatially concentrated chromatic outliers (e.g., unexpected red/blue patches). Second, NNFM may locally amplify high-frequency variations in textured regions while favoring smoother correspondences in low-texture areas, which can lead to visually noticeable over-transfer (noise) or under-transfer (over-smoothing) in challenging cases. Third, the efficacy of VGG-based feature extraction may diminish for overly complex style images, sometimes biasing the transfer toward dominant color patterns. Finally, large-scale carving scenes increase the number of Gaussians, raising memory usage, computational cost, and training time. Future research will focus on (i) photometric calibration and exposure/white-balance balancing during acquisition, (ii) shadow-aware or luminance-normalized feature matching, (iii) softer matching strategies (e.g., kNN aggregation) and chroma/gamut regularization to suppress implausible colors, and (iv) more efficient stylization pipelines suitable for larger datasets and broader artifact categories (e.g., sculptures, polychrome objects, and reliefs).

Author Contributions

Methodology, Y.L. (Yang Li); software, Y.L. (Yang Li); validation, H.R. and Y.L. (Yang Li); formal analysis, Y.L. (Yang Li) and D.S.; investigation, H.R.; resources, M.G. and Y.L. (Yang Li); data curation, M.G.; writing—original draft, H.R.; writing—review and editing, M.G., H.R., Y.L. (Yang Li), Y.L. (Yacong Li), and D.S.; supervision, Y.L. (Yang Li) and D.S.; project administration, Y.L. (Yang Li); funding acquisition, M.G. and Y.L. (Yang Li). All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China, grant numbers 62101022, 62271036; the Pyramid Talent Training Project of Beijing University of Civil Engineering and Architecture, grant number JDYC20220818; and the Young Teachers Research Ability Enhancement program of Beijing University of Civil Engineering and Architecture, grant number X21083.

Data Availability Statement

The dataset underlying the findings of this study is available from the author Y.L. (Yang Li) upon reasonable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Tweed, C.; Sutherland, M. Built cultural heritage and sustainable urban development. Landsc. Urban Plan. 2007, 83, 62–69. [Google Scholar] [CrossRef]
Dahaghin, M.; Castillo, M.; Riahidehkordi, K.; Toso, M.; Del Bue, A. Gaussian heritage: 3D digitization of cultural heritage with integrated object segmentation. arXiv 2024, arXiv:2409.19039. [Google Scholar] [CrossRef]
Jamil, O.; Brennan, A. Immersive heritage through gaussian splatting: A new visual aesthetic for reality capture. Front. Comput. Sci. 2025, 7, 1515609. [Google Scholar] [CrossRef]
Mazzacca, G.; Karami, A.; Rigon, S.; Farella, E.; Trybala, P.; Remondino, F. Nerf for heritage 3D reconstruction. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2023, 48, 1051–1058. [Google Scholar] [CrossRef]
Siliutina, I.; Tytar, O.; Barbash, M.; Petrenko, N.; Yepyk, L. Cultural preservation and digital heritage: Challenges and opportunities. Amazon. Investig. 2024, 13, 262–273. [Google Scholar] [CrossRef]
Samavati, T.; Soryani, M. Deep learning-based 3D reconstruction: A survey. Artif. Intell. Rev. 2023, 56, 9175–9219. [Google Scholar] [CrossRef]
Mandujano, R.; Maria, G. Integration of historic building information modeling and valuation approaches for managing cultural heritage sites. In Proceedings of the 27th Annual Conference of the International Group for Lean Construction (IGLC), Dublin, Ireland, 1–7 July 2019; pp. 1433–1444. [Google Scholar]
Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2016; pp. 2414–2423. [Google Scholar]
Huang, H.-P.; Tseng, H.-Y.; Saini, S.; Singh, M.; Yang, M.-H. Learning to stylize novel views. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: New York, NY, USA, 2021; pp. 13869–13878. [Google Scholar]
Mu, F.; Wang, J.; Wu, Y.; Li, Y. 3D photo stylization: Learning to generate stylized novel views from a single image. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2022; pp. 16273–16282. [Google Scholar]
Kerbl, B.; Kopanas, G.; Leimkühler, T.; Drettakis, G. 3D gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 2023, 42, 139:1–139:14. [Google Scholar] [CrossRef]
Fei, B.; Xu, J.; Zhang, R.; Zhou, Q.; Yang, W.; He, Y. 3D gaussian splatting as new era: A survey. IEEE Trans. Vis. Comput. Graph. 2024, 31, 4429–4449. [Google Scholar] [CrossRef] [PubMed]
Chen, D.; Li, H.; Ye, W.; Wang, Y.; Xie, W.; Zhai, S.; Wang, N.; Liu, H.; Bao, H.; Zhang, G. Pgsr: Planar-based gaussian splatting for efficient and high-fidelity surface reconstruction. IEEE Trans. Vis. Comput. Graph. 2024, 31, 6100–6111. [Google Scholar] [CrossRef]
Zhang, K.; Kolkin, N.; Bi, S.; Luan, F.; Xu, Z.; Shechtman, E.; Snavely, N. Arf: Artistic radiance fields. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022; pp. 717–733. [Google Scholar]
Alshawabkeh, Y.; Baik, A.; Miky, Y. Integration of laser scanner and photogrammetry for heritage bim enhancement. ISPRS Int. J. Geo-Inf. 2021, 10, 316. [Google Scholar] [CrossRef]
Reutebuch, S.E.; Andersen, H.-E.; McGaughey, R.J. Light detection and ranging (lidar): An emerging tool for multiple resource inventory. J. For. 2005, 103, 286–292. [Google Scholar] [CrossRef]
Raj, T.; HanimHashim, F.; BaseriHuddin, A.; Ibrahim, M.F.; Hussain, A. A survey on lidar scanning mechanisms. Electronics 2020, 9, 741. [Google Scholar] [CrossRef]
Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
Croce, V.; Billi, D.; Caroti, G.; Piemonte, A.; De Luca, L.; Véron, P. Comparative assessment of neural radiance fields and photogrammetry in digital heritage: Impact of varying image conditions on 3D reconstruction. Remote Sens. 2024, 16, 301. [Google Scholar] [CrossRef]
Zhang, K.; Riegler, G.; Snavely, N.; Koltun, V. Nerf++: Analyzing and improving neural radiance fields. arXiv 2020, arXiv:2010.07492. [Google Scholar]
Chen, R.; Zhao, J.; Zhang, F.-L.; Chalmers, A.; Rhee, T. Neural radiance fields for dynamic view synthesis using local temporal priors. In International Conference on Computational Visual Media; Springer: Singapore, 2024; pp. 74–90. [Google Scholar]
Lee, J.C.; Rho, D.; Sun, X.; Ko, J.H.; Park, E. Compact 3D gaussian representation for radiance field. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2024; pp. 21719–21728. [Google Scholar]
Zhu, H.; Zhang, Z.; Zhao, J.; Duan, H.; Ding, Y.; Xiao, X.; Yuan, J. Scene reconstruction techniques for autonomous driving: A review of 3D gaussian splatting. Artif. Intell. Rev. 2024, 58, 30. [Google Scholar] [CrossRef]
Liu, H.; Liu, B.; Hu, Q.; Du, P.; Li, J.; Bao, Y.; Wang, F. A review on 3D gaussian splatting for sparse view reconstruction. Artif. Intell. Rev. 2025, 58, 215. [Google Scholar] [CrossRef]
Wu, T.; Yuan, Y.-J.; Zhang, L.-X.; Yang, J.; Cao, Y.-P.; Yan, L.-Q.; Gao, L. Recent advances in 3D gaussian splatting. Comput. Vis. Media 2024, 10, 613–642. [Google Scholar] [CrossRef]
Liu, K.-H.; Liu, T.-J.; Wang, C.-C.; Liu, H.-H.; Pei, S.-C. Modern architecture style transfer for ruin or old buildings. In 2019 IEEE International Symposium on Circuits and Systems (ISCAS); IEEE: New York, NY, USA, 2019; pp. 1–5. [Google Scholar]
Zhu, J.-Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: New York, NY, USA, 2017; pp. 2223–2232. [Google Scholar]
Wang, Z.; Zhao, L.; Xing, W.; Lu, D. Glstylenet: Higher quality style transfer combining global and local pyramid features. arXiv 2018, arXiv:1811.07260. [Google Scholar] [CrossRef]
Li, W.; Wu, T.; Zhong, F.; Oztireli, C. Arf-plus: Controlling perceptual factors in artistic radiance fields for 3D scene stylization. In 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV); IEEE: New York, NY, USA, 2025; pp. 2301–2310. [Google Scholar]
Yu, A.; Li, R.; Tancik, M.; Li, H.; Ng, R.; Kanazawa, A. Plenoctrees for real-time rendering of neural radiance fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: New York, NY, USA, 2021; pp. 5752–5761. [Google Scholar]
Chiang, P.-Z.; Tsai, M.-S.; Tseng, H.-Y.; Lai, W.-S.; Chiu, W.-C. Stylizing 3D scene via implicit representation and hypernetwork. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; IEEE: New York, NY, USA, 2022; pp. 1475–1484. [Google Scholar]
Liu, K.; Zhan, F.; Chen, Y.; Zhang, J.; Yu, Y.; ElSaddik, A.; Lu, S.; Xing, E.P. Stylerf: Zero-shot 3D style transfer of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2023; pp. 8338–8348. [Google Scholar]
Chen, A.; Xu, Z.; Geiger, A.; Yu, J.; Su, H. Tensorf: Tensorial radiance fields. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022; pp. 333–350. [Google Scholar]
Liu, K.; Zhan, F.; Xu, M.; Theobalt, C.; Shao, L.; Lu, S. Stylegaussian: Instant 3D style transfer with gaussian splatting. In SIGGRAPH Asia 2024 Technical Communications; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1–4. [Google Scholar]
Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision; IEEE: New York, NY, USA, 2017. [Google Scholar]
Kotovenko, D.; Grebenkova, O.; Sarafianos, N.; Paliwal, A.; Ma, P.; Poursaeed, O.; Mohan, S.; Fan, Y.; Li, Y.; Ranjan, R.; et al. Wast-3D: Wasserstein-2 distance for scene-to-scene stylization on 3D gaussians. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2024; pp. 298–314. [Google Scholar]
Zhou, S.; Chang, H.; Jiang, S.; Fan, Z.; Zhu, Z.; Xu, D.; Chari, P.; You, S.; Wang, Z.; Kadambi, A. Feature 3Dgs: Supercharging 3D gaussian splatting to enable distilled feature fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2024; pp. 21676–21685. [Google Scholar]
Barron, J.T.; Mildenhall, B.; Verbin, D.; Srinivasan, P.P.; Hedman, P. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. arXiv 2021, arXiv:2111.12077. [Google Scholar]
Schonberger, J.L.; Frahm, J.-M. Structure-from-motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2016; pp. 4104–4113. [Google Scholar]
Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
Qin, M.; Li, W.; Zhou, J.; Wang, H.; Pfister, H. Langsplat: 3D language gaussian splatting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2024; pp. 20051–20060. [Google Scholar]
Wang, Z.; Li, Y.; Li, H. Chinese inscription restoration based on artificial intelligent models. npj Herit. Sci. 2025, 13, 326. [Google Scholar] [CrossRef]
Mildenhall, B.; Srinivasan, P.P.; Ortiz-Cayon, R.; Kalantari, N.K.; Ramamoorthi, R.; Ng, R.; Kar, A. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Trans. Graph. (ToG) 2019, 38, 29. [Google Scholar] [CrossRef]
Niklaus, S.; Liu, F. Softmax splatting for video frame interpolation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2020; pp. 5437–5446. [Google Scholar]
Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: New York, NY, USA, 2018; pp. 586–595. [Google Scholar]

Figure 1. Overview of 3D Gaussian Splatting. Optimization commences with the sparse Structure-from-Motion (SfM) point cloud, generating an initial set of 3D Gaussian representations. This set is subsequently optimized with adaptive density control parameters. During the optimization process, the tile-based accelerated renderer is employed, resulting in training time that outperforms the prevailing rapid radiance field methods.

Figure 2. NNFM loss achieves style transfer by minimizing the cosine distance between each rendered feature and its nearest neighbor in the style feature space.

Figure 3. Pipeline of the 3D style transfer for cliff carvings. We first reconstruct a 3D model using multiple photographs, generating a high-quality 3D Gaussian representation. Subsequently, we embed features from the VGG-16 network into the 3D Gaussian. We then stylize this feature-embedded reconstruction using an style image employing a Nearest Neighbor Feature Matching (NNFM) style loss. Finally, we can obtain a stylized 3D Gaussian representation.

Figure 4. Overall view of Wanfotang, and the cliff carvings within.

Figure 5. Point cloud reconstructed by COLMAP. Blue points indicate the COLMAP point cloud, whereas gray elements represent the 3DGS model.

Figure 6. The cliff carving reconstructed using 3DGS, serving as the content image for style transfer.

Figure 7. Qualitative results. Comparison of our method with two state-of-the-art 3D style transfer methods, StyleRF and ARF. The superior stylization quality and detail preservation achieved by our method are visually apparent.

Figure 8. Three-dimensional style transfer of cliff carvings using monochromatic style guidance.

Figure 9. Ablation study on the impact of reconstruction backbone and style loss. The figure compares NeRF-based and 3DGS-based reconstructions under identical style transfer settings, as well as the effects of different style losses (AdaIN vs. NNFM) within the 3DGS framework. Results show that 3DGS provides more stable geometry and improved multi-view consistency compared to NeRF, while the NNFM loss better preserves fine-grained carved textures than AdaIN, which tends to produce homogenized stylization effects.

Figure 10. Original images of experimental artifacts: (a) Seated Shakyamuni Buddha statue; (b) Ming Dynasty bronze incense burner.

Figure 11. Three-dimensional reconstruction result of the Buddha statue using 3DGS.

Figure 12. Three-dimensional style transfer results for the Buddha statue.

Figure 13. Three-dimensional reconstruction result of the bronze incense burner using 3DGS.

Figure 14. Three-dimensional style transfer results for the bronze incense burner.

Figure 15. Three-dimensional reconstruction result of the dense Buddha heads using 3DGS.

Figure 16. Three-dimensional style transfer results for the dense Buddha heads.

Table 1. Quantitative results. Our method is evaluated from the perspective of multi-view consistency using LPIPS (↓) and RMSE (↓) metrics.

	Short-Range Consistency		Long-Range Consistency
	LPIPS (↓)	RMSE (↓)	LPIPS (↓)	RMSE (↓)
ARF	0.162	0.143	0.220	0.203
StyleRF	0.104	0.117	0.122	0.149
StyleGaussian	0.052	0.067	0.109	0.112
Ours	0.044	0.041	0.079	0.072

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Li, Y.; Ren, H.; Li, Y.; Sui, D.; Guo, M. An Improved Method for 3D Style Transfer of Cliff Carvings Based on Gaussian Splatting. Math. Comput. Appl. 2026, 31, 47. https://doi.org/10.3390/mca31020047

AMA Style

Li Y, Ren H, Li Y, Sui D, Guo M. An Improved Method for 3D Style Transfer of Cliff Carvings Based on Gaussian Splatting. Mathematical and Computational Applications. 2026; 31(2):47. https://doi.org/10.3390/mca31020047

Chicago/Turabian Style

Li, Yang, He Ren, Yacong Li, Dong Sui, and Maozu Guo. 2026. "An Improved Method for 3D Style Transfer of Cliff Carvings Based on Gaussian Splatting" Mathematical and Computational Applications 31, no. 2: 47. https://doi.org/10.3390/mca31020047

APA Style

Li, Y., Ren, H., Li, Y., Sui, D., & Guo, M. (2026). An Improved Method for 3D Style Transfer of Cliff Carvings Based on Gaussian Splatting. Mathematical and Computational Applications, 31(2), 47. https://doi.org/10.3390/mca31020047

Article Menu

An Improved Method for 3D Style Transfer of Cliff Carvings Based on Gaussian Splatting

Abstract

1. Introduction

2. Related Work

2.1. 3D Reconstruction

2.2. Style Transfer

3. Methods

3.1. 3D Gaussian Splatting

3.2. Nearest Neighbor Feature Matching Loss

3.3. 3D Style Transfer

4. Results

4.1. Dataset

4.2. Quantitative Results

4.3. Qualitative Results

4.4. Ablation Studies

4.5. Additional Results

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI