1. Introduction
Ancient paintings illustrate tangible aspects of heritage, such as styles, characters, and constructions, while serving as an intangible window into the economy, politics, and customs of the past. These illustrations usually show how a peaceful and joyful era might be achieved from the artist’s point of view. Such content was presumably used to promote a widely desired ideal living environment by depicting the joyful activities of civilians. Urban and suburban district markets were painted as miniature depictions of the culture and customs of different dynasties. Although these subjects can be metaphorically exemplified within the current context, almost all of their characteristics have evolved or become obsolete, with only limited contemporary connections remaining. Their contemporary transfer should therefore inspire a new interpretation of experience in heritage preservation, and regeneration and interaction should be developed to connect tangible and intangible modern-day experiences.
1.1. Research Purpose
The aim of this research is to preserve the characters depicted in 2D paintings and to enable interaction with them in 3D using AI and AR. AI generation creates new tangible characters and enhances the comprehension of an intangible culture, while AR interaction makes the tangible characters portable on smartphones. Thus, AI and AR can be combined to activate a new paradigm of cultural heritage preservation.
This research covers the open-access digital collections of traditional Chinese paintings archived at the National Palace Museum, mainly New Year’s Market in a Time of Peace [1] (Figure 1). This painting presents a collection of vendors, street artists, and people of different ages exhibiting different behaviors on a long painted scroll in a 3D parallel perspective.
1.2. Related Studies
Ancient characters represent an intangible window to the styles of the past. Identity in contemporary culture includes interactions between urban spaces and deployed graphic statements in relation to characters. These interactions involve unique spatial structures, and the role of applications extends to the social–spatial dialectic, cultural identity, and collaborative tools and environments.
Public visibility is contingent on the urban environment [2]. Characters and graffiti share a morphological resemblance, while urban redevelopment provides a platform and opportunities for the increased visibility of graffiti [3,4]. The old painting elucidates old lifestyles and living environments through its characters. Graffiti is part of the context of the social–spatial dialectic [5].
An AR character is a tailored form of content that must be reconstructed on a virtual platform ahead of time and deployed afterward. Few studies have been conducted regarding combined AI and AR simulation. Three-dimensional reconstruction requires sufficiently well-pictured imagery to improve the modeling quality. A survey has shown that 3D Gaussian Splatting (3DGS), which represents a paradigm shift in neural rendering, has the potential to become a mainstream method for the 3D representation of characters. It effectively transforms multi-view images into 3D Gaussian representations in real time [6] and produces highly detailed 3D reconstructions. Software can estimate camera poses for arbitrarily long video sequences [7]. In a study on the surface reconstruction of large-scale scenes captured using a UAV, the quality of the surface reconstruction was ensured at heavy computational cost [8]. High-precision real-time rendering reconstruction has also been applied to cultural relics and buildings in large scenes [9], peach orchards [10], and historic architecture using 360° capture [11]. In addition to assisting in the massive 3D digitization of remains after the Notre-Dame de Paris fire [12], neural rendering can be applied to leaf structures [13], the analysis and promotion of dance heritage [14], and an ethical framework for cultural heritage and creative industries [15]. Special renderers have been developed for the effective visualization of point clouds or meshes [16].
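For reference, 3DGS renders each pixel by depth-sorting the projected Gaussians and alpha-blending their contributions; a standard formulation of this compositing step from the 3DGS literature (not a formula given in this paper) is

$$C = \sum_{i=1}^{N} c_i \, \alpha_i \prod_{j=1}^{i-1} \left(1 - \alpha_j\right),$$

where $c_i$ and $\alpha_i$ are the color and projected opacity of the $i$-th depth-sorted Gaussian covering the pixel.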
However, the 2D nature of paintings and field-access restrictions usually prevent effective 3D reconstruction. RODIN® applies a generative model for sculpting 3D digital avatars using diffusion [17], with the 3D neural radiance field (NeRF) model exhibiting computational efficiency. Three-dimensional content generation has benefited from advanced neural representations and generative models [18]. In regard to details, generative AI can be used to create accessories for a virtual costume [19].
Three-dimensional digital avatars, together with virtual costumes and accessories, may be used to reconstruct the detailed 2D characters in paintings. This reconstruction should reactivate the tangible heritage of ancient paintings by rebuilding the characters and drawing on modern-day urban social fabrics for a better-placed connection between them.
Both images and physical facilities contribute to preservation. The role of active imagination is important [20] and is missing in many intangible aspects of cultural heritage [21]. Just like the complex interaction that occurs with heritage languages, paintings are a type of visual language that exhibits a fundamental structural difference from the contemporary context. The generation of meaning through images calls for the exploration of specific mechanisms [22]. The adaptive reuse of cultural heritage can be a valid strategy to recover heritage buildings [23], while sustainable conservation can transform buildings into resources in a contemporary context [24]. Activating both images and physical facilities can be challenging in regard to interaction with the urban context. The use of augmented reality and GPS-guided applications has proven successful in reviving cultural heritage [25].
This study takes a new approach to AR in comparison to existing 3D scanning or photogrammetry [26,27]. This approach allows for fast and intuitive simulation using an image-based generative approach with AI and translates the results to AR. Digital technology has evolved alongside the positive effects of street art on the urban landscape [28], and new forms of visual language systems have emerged in the form of stickers for asynchronous communication [29]. Three-dimensional digital characters should create a new type of identity and facilitate cross-cultural communication, with the potential to increase its clarity.
2. Materials and Methods
Characters in ancient 2D paintings were usually differentiated, created, and directed to respond to the urban context through situated interactions. The subjects covered many types, such as gods, goddesses, females, artists, poets, exiled persons, children, street buskers, vendors, soldiers, animals, and ancient beasts. The market painting, New Year’s Market in a Time of Peace, is part of the digital resources archived at the National Palace Museum. The whole roll of the painting was scanned into frames at 600 dpi for open download under the image copyright of “CC BY 4.0@www.npm.gov.tw”. The painting illustrates and interprets subjects through their roles, jobs, living setups, and clothing.
Characters and urban social fabrics are interconnected. A simulation flowchart was developed, following the processes of image collection, image editing, parameterized training and generation, texture referencing, AR editing, and AR publishing (Figure 2). Image editing includes masking, AI generation of missing parts, noise-reduction filtering, and quality enhancement (a minimal sketch of this step is given below), while AR editing includes the adjustment of relative scale and deployment orientation. The scope of our project began with a test subject or a single character and expanded to a family, a vending booth, a stage with an audience, and a street block. The private-to-public classification arrangement was simulated with the support of cloud-accessed AR models and smartphones.
The platform consists of several parts (Figure 2). The humanistic experience was analyzed via deployment on a campus and in a commercial district, verified by first- and second-generation recursive 3D reconstruction, which is sustainable and environmentally friendly. The first-generation reconstruction using RODIN® relied on only one image of the old painting, instead of the general photogrammetry of 3DGS, which requires as many images as possible. The second-generation reconstruction, on the other hand, used (1) smartphones to take images or videos and (2) 3D Zephyr® or similar software to create 3D models of the characters and the urban social fabric from these images or videos. The AR platform manages AR cloud access. Field deployment was facilitated using a smartphone supporting a depth API for a drift-free and focused inspection of the displayed characters. Verification was performed using reconstructed 3D models, screenshots, and side-by-side comparisons.
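A minimal sketch of preparing smartphone video for the second-generation photogrammetric reconstruction (frame sampling before import into Zephyr® or similar software; the file name, sampling rate, and sharpness threshold are hypothetical):

```python
import cv2
import os

VIDEO = "street_walkthrough.mp4"   # hypothetical smartphone capture
OUT_DIR = "frames"
EVERY_N = 15                       # keep every 15th frame for photogrammetry

os.makedirs(OUT_DIR, exist_ok=True)
cap = cv2.VideoCapture(VIDEO)
index = saved = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if index % EVERY_N == 0:
        # Skip blurry frames: variance of the Laplacian as a sharpness proxy.
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
        if sharpness > 100.0:      # hypothetical threshold
            cv2.imwrite(os.path.join(OUT_DIR, f"frame_{saved:04d}.jpg"), frame)
            saved += 1
    index += 1
cap.release()
```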
For this painting, about 25 characters were used for training and 3D reconstruction in RODIN® as part of the collaborative tests, alongside another 550 3D models at various levels of acceptance. Field tests were conducted by selecting related screenshots, and a program was used to render outputs of the 780 images and videos taken with smartphones such as the Sony Xperia® 1 II (Sony Group Corp., Tokyo, Japan) and Xiaomi® 15 Ultra (Xiaomi Corp., Beijing, China).
2.1. The Tool for 3D Training and Generation
RODIN® from Hyper3D® (Table 1) was used to reconstruct subjects using AI. The AI-trained models should exhibit features similar to those in the original painting. RODIN® trained characters from an initiated seed benchmark of -1. Parameterized variations were performed to create symbolic or physical details at different levels of similarity, especially for clothes (including wrinkles and curvature), accessories, and animals. Inconsistencies that arose during RODIN® training may have resulted in a variety of versions, each containing certain differences.
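To make the parameterization concrete, the sketch below enumerates seed and CFG-scale combinations for comparison. The GenerationJob structure and all file names are hypothetical stand-ins (the actual RODIN® interface differs); only the seed of -1 and the CFG scale mirror the parameters discussed here and in Section 2.3.

```python
from dataclasses import dataclass

# Hypothetical job description for an image-to-3D service such as RODIN;
# only `seed` and `cfg_scale` correspond to parameters discussed in the text.
@dataclass
class GenerationJob:
    image_path: str
    seed: int = -1          # -1 = benchmark/random seed, as used initially
    cfg_scale: float = 7.0  # classifier-free guidance; higher follows the input more closely

def parameter_sweep(image_path, seeds, cfg_scales):
    """Enumerate seed/CFG combinations to compare generated 3D variants."""
    return [GenerationJob(image_path, seed=s, cfg_scale=c)
            for s in seeds for c in cfg_scales]

# Compare variants of one painted character (hypothetical crop) under
# different settings before choosing the favored seed and CFG value.
jobs = parameter_sweep("vendor_front_view.png", seeds=[-1, 42, 1337],
                       cfg_scales=[4.5, 7.0, 9.5])
for job in jobs:
    print(f"submit: {job.image_path} seed={job.seed} cfg={job.cfg_scale}")
```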
The consistency, capacity, and completeness of the generated models were of concern, especially with only one image being available. The final model featured differently oriented objects that were differentiated or clustered, with people facing forward or backward and with depth between foreground and background objects. Each cluster of people involved a unique training sequence and composition hierarchy. The allocated hierarchy and training completeness allowed for, at most, a cluster of five people with recognizable details.
Follow-up AR interactions and interpretation were conducted on the Augment® (until April 2025) and Sketchfab® platforms through cloud access to specific AR model formats on different sites. AR models with different compositions were produced to rearrange the people in the picture with regard to closeness and facial orientation. Collaborative media were used; for example, the 3D characters were also 3D-printed for physical verification.
2.2. Test Sequence
Most of the unique human figures were painted or composed using different levels of curvature. The experiment started with a simple free-form object, a symmetric object, and an asymmetric object that were or were not related to wood sculptures of Chinese-themed ornaments. In the test sequences, images of general free-form objects, such as flowers, were selected, as well as prototype and non-prototype physical sculptures, prior to creating human characters in different roles or professions. The features ranged from thin rose petals, fine hairs, and the facial features of a lion head to a rather symbolic representation of a crane. Depth layers may or may not have been correctly generated from only one image.
The source images or field photographs consisted of layers of abstract or characteristic forms and compositions, with or without plain backgrounds (Figure 3). Three-dimensional models were reconstructed with or without reference to existing physical forms. The selected urban fabric included two campuses and a business street, which featured a sequence of views that one might find on an everyday walking or driving route.
2.3. Seed Elaboration and Style Fusion
The tests ranged from general to restricted seed-led 3D reconstruction, encompassing abstract or basic shapes as well as AI-simulated characters; the latter were sourced from the painting. In order to represent the possible evolution of characters in context, images were applied, starting with a text-based description and evolving into iconic 3D modeling.
The process included three parts (Figure 4): (a) selecting the most meaningful characters; (b) selecting the preferred 3D reconstructed outcomes; and (c) deciding on the favored seed and the CFG (classifier-free guidance) scale value for material rendering. Geometry generation may assist in balancing the physical details between focused specific parts (hairs, right-hand fingers, or chains, for example) and the whole body and clothes in general (Figure 4b). Seed development was attempted by comparing the painting with 3D physical models or Stable Diffusion® (SD)-generated outcomes. The seeds were also elaborated for multiple views or under fusion, with a satisfactory proportion between different images. Gestures (body language), clothes, and accessories were adjusted under different weights for a realistic style combination; a simple 2D analogy of this weighting is sketched below.
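As a 2D analogy of the weighted fusion (the actual multi-view fusion happens inside RODIN®; this sketch, with hypothetical file names, only illustrates how a sweep of blend weights between two source images can be compared):

```python
from PIL import Image

# Two source depictions of the same character (hypothetical files):
# the painting crop and an SD-generated stylistic variant.
img_a = Image.open("character_painting.png").convert("RGB")
img_b = Image.open("character_sd_variant.png").convert("RGB").resize(img_a.size)

# Sweep the blend weight; alpha=0.0 keeps image A, alpha=1.0 keeps image B.
for alpha in (0.25, 0.50, 0.65, 0.75):
    fused = Image.blend(img_a, img_b, alpha=alpha)
    fused.save(f"fusion_w{int(alpha * 100):02d}.png")
```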
Subjective decisions had to be made during each adjustment process. As mentioned above, a straightforward decision was made regarding the similarity to the features depicted in each character. The final 3D model was not a 100% accurate replica of the painted character, as some part of the body was always missing or hidden behind other objects. The configuration of the shape and texture of clothing was still mainly interpreted and interpolated by RODIN®. In this study, we did not use the tools provided within the software to modify shape.
2.4. AR Model Categories and Field Tests
In this approach, we presume that conversion into modern-day cultural heritage should proceed through interaction between AR characters and the 3D physical urban fabric across distinct urban spaces. Augment® and Sketchfab® provided a cloud-accessed database of AR models, which were created in RODIN® and edited on the AR platforms for on-site download and field deployment (Figure 5).
3. Results
Scenes and screenshots were generated from recordings. The former were smartphone-framed 3D visual frustums; the latter were smartphone screenshots from AR apps with or without control buttons or prompts in Augment® or Sketchfab®. The composition and related poses in AR were applied at three sites on campus.
3.1. Characters
Characters consist of general individuals, vendors, workers, children, street artists, and families (Figure 6). They were purposely adapted into the current-day urban social fabric and are depicted partaking in daily experiences.
3.2. Interaction Patterns
Three-dimensional models of people were used to create a scene of a specific scenario in the spring market. People were positioned relative to each other in various compositions, alone or in a group. In Sketchfab®, three models can be joined to form one. The compositions and related poses were applied in different settings using AR (Figure 7), as presented in the painting (Table 2). This application is similar to today’s role-playing activities, where participants wear ancient costumes, but uses AR instead. Lines of sight frequently intersected and gestures were commonly used, implying interconnection between individuals or within groups.
3.3. Urban Social Fabric Adaptations
Introduction into the environment facilitates rethinking how the old social fabric was arranged in the painting. The scenarios for modern-day urban adaptation consist of themes, roles, spaces, and their interconnections. Themes range from town, community, market, and family to the seasons. For example, spring is the first season of a new year, and the market is a micro-representation of a country’s economy.
New Year’s Market in a Time of Peace features a seasonal metaphor to represent people’s various living patterns through activities occurring at or near the market. This painting by Ding Guanpeng was created during the Qing Dynasty, in 1742 [1], on a paper roll with a landscape layout.
Roles range from family gatherings and vendors to young and old customers, while space setups range from open or staged gatherings to entertainment spaces. One of the most successful scenes was set in a business district full of parked cars and complex advertising panels, in which the characters seemed to have merged with and become part of the pedestrians (Figure 7). The route, spanning from a campus to the outer streets and then back to the campus, allowed for a positive trans-context experience, especially when the scene was enriched by an elementary school field trip on the campus (Figure 8).
3.4. Reconstruction of a Character and Urban Social Fabric
A character [32] and a landscape were filmed (Figure 9a) and reconstructed as proof of presence in Zephyr® and Postshot® (Figure 9b,c).
3.5. Three Dimensions with Stable Diffusion®
Setups of ancient characters do not always merge well with general modern-day scenes. Additional AI tools, such as SD®, were applied to create consistency between characters and styles. Textural consistency relates to style and weaving patterns, such as texture or fibers, while structural consistency consists of the rational modeling of a configuration from the front to the back of invisible parts, such as makeup, hair, jewelry, clothes, accessories, or finishes. This creates a complete set extending around the body, as if it were part of it.
RODIN®, SD®, and Photoshop® have built on traditional image tools, using modern-day training models to interpret and reactivate heritage works. RODIN® is both a generation and an assembly tool capable of 3D modeling from different images for multiple views or fusion. It works with 2D imagery resources or paintings rendered in different styles (Chinese or Western) or 3D tones, from watercolors and generative renderings to unclear images or paintings. It can take abstract line drawings, text descriptions, complicated but non-matching forms, or guiding lines and references in order to reach the required level of realism. Photoshop® generation also infills outputs and diffusive outcomes.
Similarities exist between SD® and RODIN® in the convergence of diffused 3D models. SD® features a different control interface to limit or direct the process towards expected or relatively acceptable answers [33]. AI-assisted image restoration was first conducted using the SD® ControlNet paradigm (Figure 10), using the Canny, OpenPose, MLSD, Lineart, SoftEdge, Scribble/Sketch, Inpaint, InstructP2P, Reference, T2I-Adapter, or IP-Adapter options. Image diffusion with text prompts facilitated more AR interaction in terms of style across dynasties or professions, for example, between officials, workers, and emperors, while RODIN® created the other 3D parts to replace missing information. As a result, a loop was developed from a refined diffusive process constantly fed with heuristics to correct the graphics.
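A minimal sketch of one such ControlNet-conditioned diffusion pass, using the open-source diffusers library (the model IDs are public checkpoints; the input file, prompt, and parameter values are hypothetical, and Canny is just one of the conditioning options listed above):

```python
import cv2
import numpy as np
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Canny-edge conditioning extracted from a scanned character crop
# (hypothetical file name).
gray = cv2.imread("character_crop.png", cv2.IMREAD_GRAYSCALE)
edges = cv2.Canny(gray, 100, 200)
control = Image.fromarray(np.stack([edges] * 3, axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16).to("cuda")

# The text prompt steers style across dynasties or professions, while the
# edge map pins the original structure of the painted character.
result = pipe("a Qing dynasty street vendor, ink-and-color painting style",
              image=control, num_inference_steps=30,
              guidance_scale=7.5).images[0]
result.save("restored_variant.png")
```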
Mixing styles is as important as highlighting individual aspects. The former concerns issues resulting from transferring already known object syntax matched with novel patterns to either highlight or blend differences. Any research tool capable of this can provide sufficient background information after differences are discovered, followed by creating new patterns from old ones. Differences between the characters may include differences between categories and details (Table 3).
When mixing 3D styles, RODIN® combines or blends different details from two images under various weight distribution proportions (Figure 11). Details may include the one or two pieces making up the front of a Chinese jacket, including the aglet and waistband on a robe or slacks. Different weights are designated to generate 3D versions of models at different levels of similarity, which leads to a modularized creation process for the different characters. For example, mixtures of accessories and clothes are generated or trained by combining 3D forms of ancient hairstyles and garments with makeup painted by different artists from different dynasties. The balance can be adjusted and enhanced to form more varieties by choosing the dominating or key image before material generation (from a PBR temperature of 4.5/reference strength of 0.65 to full).
The combined style does not necessarily represent the evolved development of customs; rather, it is a reverse study method to distinguish the similar and different parts of two characters, providing an alternative way of establishing a dress code.
4. Discussion
Activated heritage preservation aims to redefine old subjects in a contemporary context. Seeking a matching context is the first step towards understanding living and artistic compositions. Cross-dynasty composition is useful for cross-referencing characters outside of the complex background of an individual. The tools and setups used to achieve this should also be applicable to accessories, goods, or furniture. This approach also facilitates the study of painting styles using symbolic curves, either smooth or polygonal calligraphy lines, to depict cloth wrinkles.
This novel way of viewing an ancient painting contributes to reinterpretations of the context of old paintings using AI and interaction in AR, which uses a smartphone in the basic setup. This paper is structured to present feasibility data, platforms, devices, and examples. This study intends to reactivate painted characters in a familiar modern-day social fabric. To do this, the interaction was staged in two forms: in a mixed context with mixed characters from a painting, and in a familiar context in the real world with mixed characters.
4.1. Group Rearticulation
Each scene was a picture residing within a larger picture as part of a story. Context was determined from convergent lines of sight, gestures, or orientation. People were seldom isolated or presented without some kind or level of communication with others. Even if a single person was present in the scene, they were observed by others from a distance, in contrast to a portrait of a lone person. As a result, this can be represented as a one-way stage performance surrounded by groups of audience members, for example, maids awaiting a master’s call for service, or as a conversation between two people in a homogeneous or heterogeneous background.
Lines of sight and facial expressions revealed that there was more to uncover regarding the interaction within small groups of people, rather than just considering them unrelated individuals. One scene featured two ladies emerging from a gate to see a man with two children off to the market (Figure 12). Ladies were supposed to stay at home away from the public, and it is interesting that, in the painting, there are only a few females in the field, such as a street artist, a girl, a servant, and an old lady. The traditional social value was rearranged in RODIN® using the front image first, and the missing part was filled in using Photoshop® and generative AI. Then, using AR, the two groups of family members could be augmented together on a modern campus with street artists and vendors.
4.2. Adaptation and Restructuring
Adaptive reuse of national treasures should consider the composition and restructuring of characters. AR provides an interesting immersive experience where characters can be introduced into a contrasting context. The first question considered for urban AR was the selection of a similar or differentiated urban social fabric. Since the living background is different, using AR, the characters were adapted to a new site or scenario, such as the traditional market or campus-like landscape in spring with a street artist or vendor. There are many choices when tailoring a character to a site.
The composition was inevitably restructured through adaptation due to contrasting eras or inconsistent landscapes. The interlaced timelines created differences and, at the same time, forced exchanges between characters and the urban social fabric. Similarly to the dynamics that occurred during ancient festivals, composition and contrast were replicated again in AR as an optional stage-on-demand in a city.
The introduction of an old painting into the current urban social fabric can be achieved in a divergent or convergent manner. In contrast to hanging paintings on museum walls or posting media on webpages, divergent thinking can be applied to promote or reactivate preservation. Graffiti is a good example of where the deployment can either be divergently distributed to as many places as possible or, at the same time, convergently gathered at a hotspot shared with other artists. This is why graffiti was used as a metaphor to describe the combination of AI and AR. SD® also works as an additional method of divergent activation, using controlled alternatives as a research tool to understand styles.
4.3. Three-Dimensional Rationality in Stage Effects
The inconsistency between 3D models is a valuable indicator for determining a painting’s meaning and composition. Contents that are inconsistent or need to be verified enhance inspection in terms of structural and visual details. Yakov Chernikhov, a Russian architect and graphic designer, was known for the constructivist style, frequently adopting a one-point perspective to exaggerate a building’s scale [36]. The same intention was applied in the creation of Chinese paintings, and AI was used to rationalize the pose and relative distribution between characters and accessories. It was found that the characters acted similarly to actors, using the front-facing area to maximize the stage effect. Deployment was conducted in detail to facilitate invisible conversation across regions within a limited canvas space. It is assumed that AI not only reconstructed 3D subjects based on one single image, but also rationalized the supposedly correct configuration of these characters.
The artist’s intentions can be adjusted in AR depending on how the model is deployed. The differentiation between AI-assisted generation and physical appearance was verified visually; the more images that were taken on-site of the same subject, the more consistent the reconstructed model was. The observed differences deviated between supposedly rationalized and purposely irrational deployment outcomes.
Moreover, details were purposely deformed in some way as a stage effect for audiences, as such artifacts used to serve as media in temples or as paintings with an educational purpose. However, this effect is subject to different levels of spatial restriction, such as on a 360-degree panorama stage or a theater stage; in a 2D painting in perspective or parallel perspective, 3D characters are framed or boxed in at the front, with a limited viewing depth. This deployment is important, especially where the subjects were deployed and restricted by a narrowed visual cone or viewing orientation.
4.4. Development of a Theoretical Framework
The assumption was straightforward in that the preserved or activated subjects should be introduced within a modern-day fabric, and not just as a morphological icon or commercial peripheral object. Verification should also be straightforward using an accessible device or equipment.
Theoretical and methodological levels were connected to reconstruct the framework and to reorganize the results. It can be understood as follows: images or videos taken from smartphones are used repeatedly to document and create 3D models. This process is simple and facilitates heritage preservation and activation, and it is assisted by two of the most popular technologies, AI and AR. The success of the reconstruction verifies the feasibility of the theoretical and methodological aspects.
This looped process reconstructed characters from a painting in the urban fabric of the real world, based on the assumption that a city is a museum supporting the on-demand humanistic experience of ancient characteristics. The reconstruction of both the character and the social fabric highlights that this method is a subjective and personal documentation tool for exploring urban areas, and represents a new check-in ritual for reconnecting with the environment. It merges art into our daily lives instead of being a painting on a museum wall.
5. Conclusions
AI and AR activated ubiquitous preservation and interaction. Reconnecting the past with the present was made possible and, at the same time, the context of an ancient society was visualized through the interactions between characters.
Recursive processing of diffused images and reconstructed 3D models facilitated the interaction between the painting and reality. The painting was reactivated and readapted to new urban fabrics using characters that were reconstructed, rearticulated, and recomposed in different combinations of single individuals or groups. Not only was AI able to reconstruct the characters, but AR was also used to reinterpret human relationships with the environment. AR has become a powerful tool to reinterpret traditional contexts in the modern world.
This interaction was modeled in two ways: in a mixed context with mixed characters from a painting, and in a familiar context in the real world with these mixed characters. Deployment of the technology was either sustainable and environmentally friendly, or was merged into a situated scene similar to a location-specific dress code. Using AI and AR is an appropriate method to create an intangible window to the past through either a contrasting or a consistent approach.
This comprehensive critical AI approach used to study old paintings is both a contribution and a limitation. The combination of AI and AR opens up the possibility of adapting platforms, enabling rearticulation and restructuring within the domain of ethics. The 3D rationality presents a positive way to explore detail. One of the most important findings of this study is that collaboration across different AI platforms (SD®, for example) can improve our comprehension of the past.
The theme of the new year’s market was an appropriate example. This study does not limit future studies, but rather enables a collaborative exploration of AI data and tools. Future research will follow the development of AI and AR in order to visualize and merge new character styles from different pictures.