Article

Digital Restoration of Sculpture Color and Texture Using an Improved DCGAN with Dual Attention Mechanism

College of Creative Arts, Universiti Teknologi MARA, Seri Iskandar 32610, Perak, Malaysia
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(17), 9346; https://doi.org/10.3390/app15179346
Submission received: 22 July 2025 / Revised: 14 August 2025 / Accepted: 21 August 2025 / Published: 26 August 2025

Abstract

To overcome the limitations of low texture accuracy in traditional sculpture color restoration methods, this study proposes an improved Deep Convolutional Generative Adversarial Network (DCGAN) model incorporating a dual attention mechanism (spatial and channel attention) and a channel converter to enhance restoration quality. First, the theoretical foundations of the DCGAN algorithm and its key components (generator, discriminator, etc.) are systematically introduced. Subsequently, a DCGAN-based application model for sculpture color restoration is developed. The generator employs a U-Net architecture integrated with a dual attention module and a channel converter, enhancing both local feature representation and global information capture. Meanwhile, the discriminator utilizes an image region segmentation approach to optimize the assessment of consistency between restored and original regions. The loss function follows a joint optimization strategy, combining perceptual loss, adversarial loss, and structural similarity index (SSIM) loss, ensuring superior restoration performance. In the experiments, mean square error (MSE), peak signal-to-noise ratio (PSNR), and SSIM were used as evaluation metrics, and sculpture color restoration tests were conducted on an Intel Xeon workstation. The performance of the proposed model was compared against the traditional DCGAN and other restoration models. The experimental results demonstrate that the improved DCGAN outperforms traditional methods across all evaluation metrics, and compared to traditional DCGAN, the proposed model achieves significantly higher SSIM and PSNR, while reducing MSE. Compared to other restoration models, PSNR and SSIM are further enhanced, MSE is reduced, and the visual consistency between the restored and undamaged areas is significantly improved, with richer texture details.

1. Introduction

Sculpture embodies the artistic ideas of its creators and records human development across different historical periods, regions, and cultural phenomena. As a collective cultural heritage of humanity, sculpture connects the past and the future and plays an indispensable role in human civilization [1,2,3]. Historical sculptures provide archaeologists, historians, and other cultural researchers with material evidence for religious, humanistic, customary, political, and economic studies. Although such evidence cannot supply direct information in the way written records do, it allows historians to establish a historical framework from material remains, especially for eras or events that are poorly documented [4,5]. The artist’s use of beautiful lines, shapes, and textures gives each sculpture an intrinsic artistic value; a sculpture reflects the artist’s pursuits and ambitions, confidence in life, and love of the world, and passes the creator’s spiritual expression and pursuit of life on to future generations [6,7,8]. ‘Using bronze as a mirror, one can straighten one’s clothes; using history as a mirror, one can know the rise and change of dynasties; using people as a mirror, one can reflect on successes and failures’. The exhibition of works of art in museums, heritage galleries, and other venues allows people to understand history and see the cultural achievements of a country and nation in different periods. At the individual level, visitors can draw on the wisdom of their predecessors to guide their own paths in life; at the national level, such exhibitions can strengthen national self-confidence and thereby improve national identity and cohesion [9,10]. Therefore, the protection and repair of damaged sculptures is of great significance to historians and cultural workers, the general public, and the nation as a whole.
Many sculptures are damaged by aging of their materials, unfavorable storage conditions, dust, relative humidity, and oxidation, resulting in color distortion, blurring, peeling, blistering, cracking of the surface layer, and loss of paint [11,12]. Traditional physical restoration methods are risky and irreversible. Digitizing sculptures and applying AI technology enables the visualization and adjustment of restoration results. These digital outcomes can guide physical restoration efforts while preserving the artwork’s original style and semantics [13,14]. It is therefore necessary to build systematic, high-precision digital restoration technology that recreates a sculpture’s original appearance while accounting for the artwork’s style, semantics, material characteristics, and other multi-dimensional information, providing a solid theoretical foundation and technical support for sculpture restoration work. Traditional digital restoration methods use neighboring pixels of an image for texture synthesis to fill in the damaged area, which is mainly suitable for restoring small patches where the texture does not change much. However, such methods struggle with larger damaged areas that have complex structures, and they have difficulty capturing the artist’s style and the semantics intended in the artwork [15,16,17]. Deep learning and transfer learning algorithms have been used for digital image restoration, exploiting the feature extraction and style transfer capabilities of Convolutional Neural Networks (CNNs) to detect cracks and fill in missing parts of an image. While CNN methods have advantages in restoring highly structured images, artworks such as sculptures usually have unique textures and complex structures, and their artistic style and semantics place higher demands on the restoration process; these demands limit the application of CNN methods in sculpture restoration [18,19,20]. Generative Adversarial Network (GAN) algorithms are trained in an adversarial manner, with the generator and the discriminator competing to improve the generator’s capability, and this training mode is suitable for the digital restoration of structurally complex images. To make GAN algorithms more effective for the digital restoration of damaged sculpture images, Zhao et al. proposed a high-resolution broken-image restoration method based on a convolutional autoencoder Generative Adversarial Network (DCGAN) [11]. Wang et al. suggested the use of recurrent GANs for restoring damaged sculptures in order to improve the realism of the restored images [21]. Zou et al. used the pix2pix GAN algorithm for sculpture restoration, which improved the clarity and color of the restored images [22]. Kumar et al. proposed a generative adversarial method with a five-layer encoder network, which effectively repairs point damage in sculptures [23]. Cao et al. proposed a GAN algorithm based on a pre-trained residual network, which yields better sculpture restoration results than the traditional GAN algorithm [24]. However, the network architectures of existing GAN-based sculpture restoration methods consist only of convolutional and residual layers, which can learn only the local relationships between neighboring image pixels, not the global relationships between image features, so existing techniques still suffer from local blurring and texture loss in sculpture restoration.
The DCGAN (Deep Convolutional Generative Adversarial Network) generator uses transposed convolution to gradually upsample low-dimensional noise into color images with high-dimensional features; this structure can effectively capture the color distribution and local texture in sculpture images, fully restoring the appearance of the sculpture’s color and surface texture details [25,26]. In addition, the DCGAN stabilizes the training process with batch normalization layers, which overcome gradient vanishing and accelerate model convergence, allowing clear restoration of the image within a limited number of training rounds. The introduction of nonlinear activation functions such as ReLU and LeakyReLU enhances the model’s nonlinear expression ability, further improving the structural accuracy of the restored image [27,28]. However, the traditional DCGAN architecture is dominated by convolutional layers, which learn only local image features and struggle to establish global semantic associations across regions [29,30,31]. The complex texture structures and color schemes of sculptures, as well as the variety of semantic expressions of cultural symbols, require a restoration model that can perceive the overall composition, stylistic consistency, and spatial structure globally. Since the DCGAN training process is unsupervised, the lack of high-quality paired datasets can lead to color deviation, texture repetition, and even structural distortion in the generated images, affecting the realism and artistry of the restored results [32,33,34]. In view of the above analysis, the following improvements are necessary to enhance the practical value of the DCGAN in sculpture image restoration: (1) introducing an attention mechanism to strengthen the model’s focus on key areas of the sculpture image, improving the local restoration accuracy and stylistic consistency of texture and structure; (2) integrating a Transformer structure or a global feature modeling module to capture global semantic information across image regions and enhance the understanding of complex artistic styles; (3) constructing a multi-style, multi-material data augmentation strategy to improve the model’s style transfer ability and adaptability; (4) designing a finer multi-scale loss function that accounts for pixel-level error, differences in perceptual features, and artistic style consistency, to ensure that the restoration results are accurate and consistent.
To address the limitations of existing restoration methods, this paper proposes a Generative Adversarial Network (GAN) for the digital restoration of damaged sculpture images, incorporating dual (spatial and channel) attention modules and a channel converter. To achieve high-fidelity image restoration, the proposed network integrates spatial and channel attention layers into the encoder part of the generator, enhancing the understanding of global relationships among image features across both spatial and channel dimensions, thereby enabling fine-grained restoration of sculpture images. To verify the effectiveness and superiority of the method, this study focuses on two core research questions: (RQ1) Can integrating dual attention mechanisms and a channel converter into the DCGAN framework improve the accuracy and stylistic consistency of sculpture image restoration? (RQ2) Does the proposed improved DCGAN outperform traditional DCGAN and other mainstream restoration methods in terms of SSIM, PSNR, and MSE metrics across different types of sculpture damage? For this purpose, the proposed method is compared with multiple mainstream algorithms on sculpture samples of various styles and damage types, with a comprehensive evaluation conducted with both qualitative visual performance and quantitative indicators. The results demonstrate that the proposed method achieves significant advantages in maintaining overall artistic style coherence, restoring fine texture details, and enhancing semantic consistency, highlighting its broad application prospects in digital heritage restoration and sculpture art conservation. The proposed model integrates theories and methods from multiple disciplines, including deep learning, computer vision, and art restoration, and can be widely applied in intelligent restoration workflows such as virtual museums and cultural heritage databases, with strong potential to promote the practical implementation of AI technologies in heritage conservation.

2. Related Technologies Background

2.1. DCGAN Algorithm

Artificial neural networks (ANNs) can accurately learn potential mapping relationships between input and output data without considering explicit mathematical expressions; thus, artificial neural networks are considered to be an effective method to achieve function approximation and data fitting [35,36]. Among them, as a representative ANN, the DCGAN architecture mainly consists of modules such as a convolutional layer, batch normalization layer, activation function, and loss function. The convolutional layer is the foundation and core of the DCGAN, and this module can extract data features from images. The batch normalization layer can speed up the training and convergence of the DCGAN, the activation function enables the neural network to solve nonlinear problems, and the loss function is used to evaluate the accuracy of the neural network [37,38]. These modules are combined with each other, and their synergy can achieve accurate fitting of image data, which lays the methodological foundation for digital image restoration of damaged sculptures.
The DCGAN is mainly composed of two key modules, the generator and the discriminator, as shown in Figure 1. The generator receives random noise and outputs generated samples; it learns the complex implicit distribution of the real sample data and then produces samples intended to deceive the discriminator. The discriminator receives generated samples from the generator and real samples from the sample dataset, and through continuous training learns to distinguish the two. During training, the generator and the discriminator learn and iterate against each other until the discriminator can no longer tell the difference between generated samples and real samples.

2.2. DCGAN Generator

The generator of the DCGAN consists of a series of transposed convolutional layers, each using batch normalization and the ReLU activation function except for the last transposed convolutional layer, which uses the Tanh activation function; together they progressively transform a randomly sampled low-dimensional noise vector into a high-dimensional photorealistic image [39,40]. First, a low-dimensional noise vector (usually drawn from a standard normal distribution) is input to the generator; the transposed convolutional layers then upsample it toward the target image size, and finally a target image with the same size as the training images is output. In the process of color restoration of damaged sculptures, the generator learns the color characteristics of the original sculpture and can convert random noise into an image similar in color to the original sculpture. During computation, the generator maps the hidden vector z to the data space through a series of ConvTranspose2d layers, each paired with a BatchNorm2d layer and a ReLU activation, and the output is passed through a Tanh function so that the data lie in the range [−1, 1]. Since the data represent a sculpture image, an RGB image of the same size as the real image is generated during sculpture digital image restoration. The training objective of the generator is to minimize the divergence between the distributions of the generated samples and the real samples, i.e., [41]:
G^{*} = \arg\min_{G} \mathrm{Div}\left(P_{data}(x), P_{G}\right)
where $P_{G}$ and $P_{data}$ represent the generated and real sample data, respectively.
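For concreteness, the following PyTorch sketch illustrates the generator structure described above. It is a minimal example under stated assumptions, not the configuration used in this study: the layer count, channel widths, and the 64 × 64 output size are illustrative choices.

```python
import torch
import torch.nn as nn

class DCGANGenerator(nn.Module):
    """Minimal sketch of the transposed-convolution generator described above:
    ConvTranspose2d + BatchNorm + ReLU blocks, with Tanh on the last layer so
    outputs lie in [-1, 1]. Channel widths and the 64 x 64 output are assumptions."""
    def __init__(self, z_dim: int = 100, base: int = 64, out_channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, base * 8, 4, 1, 0, bias=False),      # 1x1 -> 4x4
            nn.BatchNorm2d(base * 8), nn.ReLU(True),
            nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1, bias=False),   # 4x4 -> 8x8
            nn.BatchNorm2d(base * 4), nn.ReLU(True),
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1, bias=False),   # 8x8 -> 16x16
            nn.BatchNorm2d(base * 2), nn.ReLU(True),
            nn.ConvTranspose2d(base * 2, base, 4, 2, 1, bias=False),       # 16x16 -> 32x32
            nn.BatchNorm2d(base), nn.ReLU(True),
            nn.ConvTranspose2d(base, out_channels, 4, 2, 1, bias=False),   # 32x32 -> 64x64
            nn.Tanh(),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # z: (batch, z_dim) standard-normal noise -> (batch, 3, 64, 64) image in [-1, 1]
        return self.net(z.view(z.size(0), z.size(1), 1, 1))

# Usage: fake = DCGANGenerator()(torch.randn(16, 100))
```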

2.3. DCGAN Discriminator

The discriminator usually consists of a series of convolutional layers that progressively downsample high-dimensional images into low-dimensional feature data in order to distinguish real images from generated images [21]. First, a real or generated image is input into the module, and a series of convolutional layers downsample it into low-dimensional feature vectors. Each intermediate convolutional layer is followed by a batch normalization layer and a Leaky ReLU activation function, and the last convolutional layer uses a Sigmoid activation function to output a probability value that discriminates the authenticity of the image. In sculpture color restoration, the discriminator determines the similarity between the restored image and the original sculpture image and prompts the generator to improve continuously until a realistic restored image is obtained. The mathematical formula for the discriminator is as follows [42]:
D^{*} = \arg\min_{D} \mathrm{Div}\left(P_{G}, P_{data}\right)
To keep the distribution of the generated samples unchanged, the parameters of the generator are fixed while training the discriminator. During training of the discriminator, the labels of the generated samples and the real (true-distribution) samples are set to 0 and 1, respectively. The objective function of the discriminator is the probability it assigns to the input samples being real, which is expressed as [43]:
V(G, D) = \mathbb{E}_{x \sim P_{data}(x)}\left[\log D(x)\right] + \mathbb{E}_{z \sim P_{z}(z)}\left[\log\left(1 - D(G(z))\right)\right]
where $\mathbb{E}_{x \sim P_{data}(x)}$ and $\mathbb{E}_{z \sim P_{z}(z)}$ denote expectations over the real data $x$ and the random noise $z$, respectively; $D(x)$ is the discriminator’s output for real data $x$, and $D(G(z))$ is the discriminator’s output for the sample generated from random noise $z$.
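A corresponding PyTorch sketch of the discriminator described above is given below, again with assumed layer counts and channel widths; it is sized to match the 64 × 64 generator sketch in Section 2.2 and ends with a Sigmoid that outputs the probability of the input being real.

```python
import torch
import torch.nn as nn

class DCGANDiscriminator(nn.Module):
    """Sketch of the discriminator described above: strided convolutions with
    BatchNorm + LeakyReLU, ending in a Sigmoid that outputs P(input is real).
    Sized for 64 x 64 inputs to match the generator sketch; widths are assumptions."""
    def __init__(self, base: int = 64, in_channels: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, base, 4, 2, 1, bias=False),                              # 64 -> 32
            nn.LeakyReLU(0.2, True),
            nn.Conv2d(base, base * 2, 4, 2, 1, bias=False), nn.BatchNorm2d(base * 2),       # 32 -> 16
            nn.LeakyReLU(0.2, True),
            nn.Conv2d(base * 2, base * 4, 4, 2, 1, bias=False), nn.BatchNorm2d(base * 4),   # 16 -> 8
            nn.LeakyReLU(0.2, True),
            nn.Conv2d(base * 4, base * 8, 4, 2, 1, bias=False), nn.BatchNorm2d(base * 8),   # 8 -> 4
            nn.LeakyReLU(0.2, True),
            nn.Conv2d(base * 8, 1, 4, 1, 0, bias=False), nn.Sigmoid(),                      # 4 -> 1
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x).view(-1)  # one realness probability per image

# The value function V(G, D) above, evaluated on a mini-batch (D = DCGANDiscriminator(), G = DCGANGenerator()):
# v = torch.log(D(real)).mean() + torch.log(1 - D(G(z))).mean()
```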

3. Improved DCGAN Model Construction

Figure 2 illustrates the overall workflow of the proposed I-DCGAN model and its structural improvements over the traditional DCGAN framework. The traditional DCGAN primarily achieves local texture generation by stacking convolutional layers but lacks the capability to model global semantic structures and maintain style consistency. In this study, the generator is enhanced with a dual-attention mechanism—comprising spatial attention and channel attention—and a channel conversion module, enabling perceptual modeling of the global context while improving local detail restoration through feature/style alignment. Specifically, the generator takes a damaged sculpture image (optionally with a mask) as input, processes it through an encoder (Conv blocks + BN + ReLU/LeakyReLU), dual-attention modules, a channel converter, and a decoder (transposed convolutions with skip connections) to produce the restored image G(x). The output is optimized using reconstruction losses, including L1/SSIM, perceptual (VGG), and optional style/TV losses, with gradients back-propagated to update generator parameters. The discriminator, implemented as a PatchGAN or multi-scale structure, distinguishes between real images y and generated images G(x), computing an adversarial loss Ladv. The total generator loss LG combines adversarial, reconstruction, and perceptual losses, with both generator and discriminator parameters updated via backpropagation. These improvements in the generator, discriminator, and loss function collectively contribute to the enhancement of restoration performance. To further illustrate this process, core modules of the proposed I-DCGAN (Pseudo-Code) are presented in Appendix A.
Based on previous studies on GAN-based image restoration [11,21,22,23,24], and the theoretical advantages of dual attention mechanisms in enhancing both local and global feature learning [25,27,29], we formulate the following hypotheses: H1: The improved DCGAN (I-DCGAN) achieves significantly higher SSIM values than the traditional DCGAN in sculpture image restoration; H2: The improved DCGAN achieves significantly higher PSNR values than the traditional DCGAN in sculpture image restoration; H3: The improved DCGAN achieves significantly lower MSE values than the traditional DCGAN in sculpture image restoration; H4: The improved DCGAN consistently outperforms other mainstream restoration models (CA, EC, RN, LGNet, CTSDG) across different sculpture damage types in terms of SSIM, PSNR, and MSE. Table 1 presents the structural comparison between the T-DCGAN and I-DCGAN.
In summary, the proposed improvements to the DCGAN framework—including dual attention modules, channel conversion between encoder/decoder layers, a patch-based discriminator, and a multi-objective loss function—address the core limitations of traditional DCGAN models in sculpture restoration. These enhancements collectively improve the model’s ability to handle complex textures, preserve structural and stylistic integrity, and achieve more realistic restoration outcomes.

3.1. Generator Architecture Improvement

The architecture of the generator model is designed to progressively recover data features and create ‘fake’ sample images. Specifically, the generator first receives a 100-dimensional random noise vector and then completes the upsampling process through transposed convolutions to obtain feature maps at different scales. In this process, each upsampling operation is followed by a batch normalization layer and an activation function layer; the last layer uses the Tanh activation function, and all other activation layers use ReLU. After seven upsampling operations, the initial random noise vector is finally transformed into a grey-scale image with a resolution of 256 × 256 × 1.
In constructing the DCGAN image restoration model for sculpture color restoration, the generator adopts a U-Net encoder/decoder architecture. Both the input and the output of this architecture are images, which is a great advantage when building a machine learning model for sculpture color restoration. In addition, the convolutional modules in the encoder part facilitate the learning of local dependencies between image features, while the dual attention module, consisting of spatial and channel attention layers, helps to learn the spatial and inter-channel global dependencies between image features, as shown in Figure 3; the decoder then generates high-quality restored images from both local and global image features through its attention and convolutional layers. Furthermore, the channel converter added to the skip connections helps with multi-scale feature fusion and fills the semantic gap between the encoder and decoder layers, further improving the quality of the generated image.
The encoder of the generator contains three parts: the feature-mapping module, the convolutional module, and the dual attention module. The feature-mapping module maps the 3-channel input image to a 64-channel feature map, which provides the input data for the convolutional layers. The convolutional module downsamples the image features to learn local dependencies between them; it consists of four convolution blocks, which greatly improve learning ability and efficiency. Each convolution block is immediately followed by a dual attention module consisting of spatial and channel attention modules, which learn the spatial and channel-wise global dependencies between image features. A final feature-mapping module maps the resulting feature map back to three channels and outputs the desired image. In addition, the channel converter added to the skip connections between the encoder and decoder not only helps with multi-scale feature fusion between the two, but also fills the semantic gap between them. The convolutional encoder consists of three convolutional modules with padding of 1 and kernel sizes of 2 × 2, 3 × 3, and 2 × 2, in that order, each followed by batch normalization, a ReLU activation, dropout (random deactivation), and max pooling, which together perform the feature encoding.
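A minimal PyTorch sketch of one encoder stage described above is given below. The 3 × 3 kernel, dropout rate, and channel-growth schedule are illustrative assumptions rather than the exact configuration of the proposed model; the dual attention module applied after each stage is sketched separately in the next example.

```python
import torch
import torch.nn as nn

class EncoderStage(nn.Module):
    """Sketch of one encoder stage of the U-Net generator: Conv -> BatchNorm ->
    ReLU -> Dropout -> MaxPool. Kernel size, dropout rate, and channel growth
    are illustrative assumptions."""
    def __init__(self, in_ch: int, out_ch: int, p_drop: float = 0.1):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
            nn.Dropout2d(p_drop),          # "random deactivation" in the text
            nn.MaxPool2d(kernel_size=2),   # downsample by a factor of 2
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)

# Example encoder path mirroring the description above: a 1x1 stem maps the
# 3-channel image to 64 channels, followed by four downsampling stages.
stem = nn.Conv2d(3, 64, kernel_size=1)
stages = nn.ModuleList([EncoderStage(64 * 2**i, 64 * 2**(i + 1)) for i in range(4)])
```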
As shown in Figure 4, the dual attention module combines channel and spatial attention modules in the form of a feed-forward convolutional neural network, which exploits the global correlations and feature weights of the channels to enhance relevant features and suppress weaker ones. In the channel attention module, the input features are subjected to max pooling and average pooling and then passed through a shared multilayer perceptron; a Sigmoid activation function is then applied to generate channel attention weights for the corresponding elements of the input features. The spatial attention module generates spatial features through max pooling and average pooling operations and uses 3 × 3 dilated convolutions to efficiently aggregate contextual information; a Sigmoid function then generates the spatial attention features. Finally, the spatial and channel attention of the input features are combined to enhance the detailed feature information. In addition, a channel converter is used on the skip connections to ensure effective feature fusion from the encoder to the decoder module. The channel converter module has two subcomponents: the inter-channel cross-fusion converter and inter-channel cross-attention.
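The following sketch shows one plausible implementation of the dual attention module described above, in the style of CBAM-type attention: max-pooled and average-pooled descriptors pass through a shared MLP for channel attention, and channel-wise max and mean maps pass through a 3 × 3 dilated convolution for spatial attention. The reduction ratio, dilation rate, and the sequential ordering of the two sub-modules are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Max-pooled and average-pooled channel descriptors pass through a shared
    MLP; a Sigmoid yields per-channel weights (reduction ratio is an assumption)."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        avg = self.mlp(x.mean(dim=(2, 3)))            # average-pooled descriptor
        mx = self.mlp(x.amax(dim=(2, 3)))             # max-pooled descriptor
        w = torch.sigmoid(avg + mx).view(b, c, 1, 1)
        return x * w

class SpatialAttention(nn.Module):
    """Channel-wise mean and max maps are concatenated and passed through a
    3x3 dilated convolution; a Sigmoid yields a spatial weight map."""
    def __init__(self, dilation: int = 2):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=3, padding=dilation, dilation=dilation)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        w = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * w

class DualAttention(nn.Module):
    """Channel attention followed by spatial attention, as in Figure 4
    (the sequential ordering is an assumption)."""
    def __init__(self, channels: int):
        super().__init__()
        self.ca, self.sa = ChannelAttention(channels), SpatialAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.sa(self.ca(x))
```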
Figure 5 shows how the channel converter works, which consists of three steps. First, the skip-connection features are converted into tokens using multi-scale feature embeddings. These tokens are then passed through multiple inter-channel cross-attention layers, whose outputs are refined by a multilayer perceptron; finally, inter-channel cross-attention is applied to the refined features. The channel converter ensures multi-scale fusion of encoder and decoder features and reduces the semantic gap between them.
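Because the full multi-scale channel converter is more involved, the sketch below shows only a simplified, single-scale interpretation of the idea: encoder skip features and decoder features are pooled into channel tokens, inter-channel cross-attention and a small MLP refine them, and the result re-weights the skip features before fusion. All class and parameter names here are hypothetical, and the token size is an assumption.

```python
import torch
import torch.nn as nn

class ChannelCrossAttentionFusion(nn.Module):
    """Simplified, single-scale sketch of the channel-converter idea: encoder
    skip features attend to decoder features along the channel axis, and the
    re-weighted skip features are returned for decoder fusion."""
    def __init__(self, channels: int, token_hw: int = 8):
        super().__init__()
        d = token_hw * token_hw
        self.pool = nn.AdaptiveAvgPool2d(token_hw)        # each channel -> token of length token_hw**2
        self.to_q = nn.Linear(d, d, bias=False)
        self.to_k = nn.Linear(d, d, bias=False)
        self.to_v = nn.Linear(d, d, bias=False)
        self.mlp = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Linear(d, d))
        self.scale = d ** -0.5

    def forward(self, skip: torch.Tensor, dec: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = skip.shape
        s_tok = self.pool(skip).flatten(2)                # (B, C, d) channel tokens from the encoder skip
        d_tok = self.pool(dec).flatten(2)                 # (B, C, d) channel tokens from the decoder
        q, k, v = self.to_q(d_tok), self.to_k(s_tok), self.to_v(s_tok)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)  # (B, C, C) inter-channel attention
        fused = attn @ v                                  # (B, C, d)
        fused = fused + self.mlp(fused)                   # MLP refinement of the tokens
        gate = torch.sigmoid(fused.mean(dim=-1)).view(b, c, 1, 1)  # per-channel gate from refined tokens
        return skip * gate                                # re-weighted skip features for fusion
```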

3.2. Discriminator Improvement

When an image is input to the discriminator for classification, the traditional Generative Adversarial Network (GAN) architecture directly determines whether the entire image is real or fake. To improve the accuracy of restored image details, this study adopts a region segmentation approach: the discriminator does not judge the whole image as real or fake, but instead divides the input image into numerous blocks and classifies each block. The discriminator performs convolution operations on the image to obtain an output matrix of predicted values, where each value represents the probability that the corresponding image block is real; all the responses are then averaged to obtain the discriminator’s final real/fake prediction. The working principle is shown in Figure 6.
The architecture of the discriminator consists of a series of convolutional layers that downsample the three-channel input image and eventually transform it into a one-channel prediction to obtain the output. The initial feature-mapping block of the discriminator is a 1 × 1 convolutional layer that maps the 3-channel input image into a 64-channel feature map. The second part of the discriminator contains four contraction blocks; using multiple contraction blocks speeds up the downsampling of the image. Each contraction block consists of two convolutional layers: the first contains a 3 × 3 convolution followed by batch normalization, dropout, and a Leaky ReLU activation; the second consists of a 3 × 3 convolution followed by batch normalization, a Leaky ReLU activation, and a max pooling operation. An initial feature block precedes these contraction blocks and a final feature block follows them; the architecture is shown in Figure 7.
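A hedged PyTorch sketch of such a patch-based discriminator is shown below: strided convolutions produce a grid of per-patch realness scores, which are then averaged into a single prediction. Channel widths and depth are assumptions; when the restored image is concatenated with the damaged image (as in Section 3.3), in_channels would be 6 rather than 3.

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """Sketch of the patch-based discriminator: convolutions produce an N x N grid
    of per-patch realness scores rather than a single scalar. Depth and channel
    widths are illustrative assumptions."""
    def __init__(self, in_channels: int = 3, base: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, base, kernel_size=1),                       # initial 1x1 feature mapping
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.BatchNorm2d(base * 2),
            nn.Dropout2d(0.1), nn.LeakyReLU(0.2, True),
            nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1), nn.BatchNorm2d(base * 4),
            nn.LeakyReLU(0.2, True), nn.MaxPool2d(2),
            nn.Conv2d(base * 4, 1, 3, padding=1),                              # 1-channel patch score map
        )

    def forward(self, x: torch.Tensor):
        patch_logits = self.features(x)            # (B, 1, N, N): one logit per image patch
        patch_probs = torch.sigmoid(patch_logits)  # per-patch probability of being real
        return patch_probs, patch_probs.mean(dim=(1, 2, 3))  # patch grid + averaged prediction
```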

3.3. Loss Function Design

The generator is trained with a loss function that linearly combines the adversarial loss, the perceptual loss, and the structural similarity index loss. Figure 8 shows a schematic of the generator training process. First, the restored sculpture image is concatenated with the damaged sculpture image and input to the patch discriminator, which predicts a matrix of values based on the inputs, with each value representing the probability that a patch is real. This predicted matrix is compared with an all-ones matrix, and the difference between the two matrices is used as the adversarial loss for generator training. The perceptual loss is computed as the smooth L1 loss between the hidden-layer activations of the VGG-16 network for the original and generated sculpture images. Similarly, the structural similarity index loss measures the structural similarity between the original and generated sculpture images. The generator loss can be expressed as:
loss_{generator} = loss_{BCE} + \lambda_{1} \times loss_{smoothL1} + \lambda_{2} \times loss_{SSIM}
where $\lambda_{1}$ and $\lambda_{2}$ are empirical values, set to 200 and 10, respectively, in this study. The smooth L1 loss ($loss_{smoothL1}$) can be calculated by the following formula:
loss_{smoothL1} = \begin{cases} 0.5\,(x_{n} - y_{n})^{2} / \beta, & \left| x_{n} - y_{n} \right| < \beta \\ \left| x_{n} - y_{n} \right| - 0.5\,\beta, & \left| x_{n} - y_{n} \right| \geq \beta \end{cases}
where β is the hyperparameter with value 1, x and y represent the original and restored sculpture images, respectively, and n is the instance index.
The adversarial loss ($loss_{BCE}$) can be calculated by the following formula:
loss_{BCE} = -w_{n}\left[ y_{n} \cdot \log \sigma(x_{n}) + (1 - y_{n}) \cdot \log\left(1 - \sigma(x_{n})\right) \right]
The structural similarity index (SSIM) loss ($loss_{SSIM}$) can be calculated by the following formula:
loss_{SSIM} = 1 - \frac{(2\mu_{x}\mu_{y} + C_{1})(2\sigma_{xy} + C_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + C_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2})}
where $\mu_{x}$, $\mu_{y}$ and $\sigma_{x}^{2}$, $\sigma_{y}^{2}$ are the means and variances of $x$ and $y$, respectively, and $\sigma_{xy}$ is the covariance of $x$ and $y$.
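The combined generator loss defined above can be sketched as follows. The choice of VGG-16 layer for the perceptual term, and the use of a global (unwindowed) SSIM computed directly from the formula above, are simplifying assumptions; ImageNet normalization of the VGG inputs is omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

class GeneratorLoss(nn.Module):
    """Sketch of the combined generator loss above: adversarial BCE on the patch
    predictions, smooth-L1 perceptual loss on VGG-16 features, and a global
    (unwindowed) SSIM loss. Lambda values follow the text (200 and 10)."""
    def __init__(self, lambda_perc: float = 200.0, lambda_ssim: float = 10.0):
        super().__init__()
        self.vgg = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()  # hidden-layer activations (assumed layer)
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.l1, self.l2 = lambda_perc, lambda_ssim

    @staticmethod
    def ssim_loss(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
        # Global SSIM over each image, following the formula above.
        mu_x, mu_y = x.mean(dim=(1, 2, 3)), y.mean(dim=(1, 2, 3))
        var_x, var_y = x.var(dim=(1, 2, 3)), y.var(dim=(1, 2, 3))
        cov = ((x - mu_x.view(-1, 1, 1, 1)) * (y - mu_y.view(-1, 1, 1, 1))).mean(dim=(1, 2, 3))
        ssim = ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / \
               ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))
        return (1 - ssim).mean()

    def forward(self, fake_patch_probs, restored, original):
        # Adversarial term: patch predictions for the restored image vs. an all-ones target.
        adv = F.binary_cross_entropy(fake_patch_probs, torch.ones_like(fake_patch_probs))
        # Perceptual term: smooth L1 between VGG-16 hidden-layer activations (beta = 1).
        perc = F.smooth_l1_loss(self.vgg(restored), self.vgg(original), beta=1.0)
        return adv + self.l1 * perc + self.l2 * self.ssim_loss(restored, original)
```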
Since the output of the discriminator is the probability that each patch is real or fake, the patch-based discriminator is trained using the binary cross-entropy (BCE) loss. Its output is compared with an all-zeros or all-ones matrix corresponding to the fake or real classification, respectively. Thus, for the fake outputs produced by the generator, the predictions of the patch discriminator are compared to the all-zeros matrix, while for original artwork images, they are compared to the all-ones matrix. Figure 9 shows a schematic of the discriminator training process. Similar to the conditional Generative Adversarial Network model constructed by Mirza, the damaged artwork is first input into the generator as a condition, and the generator produces the restored image. The restored image is then fed into the discriminator for feedback. Using the binary cross-entropy (BCE) loss function, the parameters of the discriminator are updated to classify the restored image as fake and the original image as real. Based on the feedback from the discriminator, the parameters of the generator are optimized using a linear combination of the adversarial loss, perceptual loss, and structural similarity index (SSIM) loss functions.
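The alternating update described above might look as follows in PyTorch; function and variable names are illustrative rather than the authors' code, and the discriminator is assumed to accept the restored image concatenated with the damaged image as its condition.

```python
import torch
import torch.nn.functional as F

def train_step(gen, disc, g_opt, d_opt, damaged, original, gen_loss_fn):
    """One alternating update, following Figure 9: the discriminator is trained to
    classify restored images as fake (all-zeros target) and originals as real
    (all-ones target); the generator is then updated with the combined loss."""
    # ---- Discriminator update (generator parameters held fixed) ----
    with torch.no_grad():
        restored = gen(damaged)
    real_in = torch.cat([original, damaged], dim=1)   # condition on the damaged image
    fake_in = torch.cat([restored, damaged], dim=1)
    real_probs, _ = disc(real_in)
    fake_probs, _ = disc(fake_in)
    d_loss = F.binary_cross_entropy(real_probs, torch.ones_like(real_probs)) + \
             F.binary_cross_entropy(fake_probs, torch.zeros_like(fake_probs))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # ---- Generator update (combined adversarial + perceptual + SSIM loss) ----
    restored = gen(damaged)
    fake_probs, _ = disc(torch.cat([restored, damaged], dim=1))
    g_loss = gen_loss_fn(fake_probs, restored, original)
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```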

4. Experiment and Results Analysis

4.1. Experimental Program

To verify the superiority of the algorithm proposed in this study, quantitative and qualitative comparison experiments with representative image restoration models were conducted to validate the effectiveness of the proposed improvements in the image restoration process. The proposed algorithm was trained and tested on a dataset of 1000 sculpture images with pixel widths ranging from 124 to 992 pixels and a pixel height of 256 pixels. All images in the dataset were uniformly resized to 256 × 256 pixels before training and experimentation; 900 sculpture images were used for the training set and 100 for the test set. During the experiments, the network was trained by removing some regions of the original images to generate corrupted images, thereby obtaining pairs of original images and corresponding corrupted images for the experiments.
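A sketch of this data preparation is given below: images are resized to 256 × 256 and a rectangular region is zeroed out to synthesize the damaged input. The directory layout, mask shape, and hole-size range are assumptions made for illustration.

```python
import random
from pathlib import Path
from PIL import Image
import torch
from torch.utils.data import Dataset
from torchvision import transforms

class SculptureInpaintingDataset(Dataset):
    """Sketch of the data preparation described above: resize to 256 x 256 and
    remove a rectangular region to synthesize the corrupted input. Directory
    layout, mask shape, and hole-size range are illustrative assumptions."""
    def __init__(self, root: str, size: int = 256, max_hole: int = 96):
        self.paths = sorted(Path(root).glob("*.jpg"))
        self.tf = transforms.Compose([
            transforms.Resize((size, size)),
            transforms.ToTensor(),
            transforms.Normalize([0.5] * 3, [0.5] * 3),   # map to [-1, 1] to match the Tanh output range
        ])
        self.size, self.max_hole = size, max_hole

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        original = self.tf(Image.open(self.paths[idx]).convert("RGB"))
        h = random.randint(32, self.max_hole)
        w = random.randint(32, self.max_hole)
        top = random.randint(0, self.size - h)
        left = random.randint(0, self.size - w)
        mask = torch.zeros(1, self.size, self.size)
        mask[:, top:top + h, left:left + w] = 1.0          # 1 marks the removed region
        damaged = original * (1 - mask)                     # zero out the "damaged" area
        return damaged, original, mask

# Example 900/100 split:
# train_set, test_set = torch.utils.data.random_split(SculptureInpaintingDataset("data"), [900, 100])
```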
The common damage problems of sculptures fall into four major categories: water and rain erosion with weathering and salinization, peeling and layer shedding, fading and discoloration, and localized damage. To demonstrate the advantages of the improved DCGAN over other models for the restoration of different sculptures, the CNN + Attention (CA) model, Edge Connect (EC) model, Refine Net (RN) model, Coarse-to-fine + Transformer (CTSDG) model, Local and Global Refinement Network (LGNet) model, and the traditional DCGAN model were compared with the improved DCGAN model proposed in this paper on the four types of damaged sculptures, and a quantitative comparative analysis of the restoration effects was carried out.

4.2. Experimental Environment and Evaluation Index

This section provides a detailed introduction to the planned experiments, presents the obtained results, and conducts a quantitative analysis and comparison of the test outcomes. The experiments were conducted on an Intel Xeon workstation equipped with a Quadro RTX 6000-A GPU and 48 GB of memory. To evaluate the performance of different restoration methods, the mean square error (MSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) were used as quantitative evaluation metrics. The meanings of these parameters are listed in Table 2, and the mathematical formulas for their calculation are as follows [44]:
MSE = \frac{1}{mn} \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} \left[ I(i, j) - K(i, j) \right]^{2}
PSNR = 10 \log_{10}\left( p_{value}^{2} / MSE \right)
SSIM = \frac{(2\mu_{x}\mu_{y} + C_{1})(2\sigma_{xy} + C_{2})}{(\mu_{x}^{2} + \mu_{y}^{2} + C_{1})(\sigma_{x}^{2} + \sigma_{y}^{2} + C_{2})}
where $I$ and $K$ denote the original and restored images of size $m \times n$, respectively, and $p_{value}$ is the maximum possible pixel value.
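For reference, the three metrics can be computed directly from the formulas above as follows. Note that the SSIM here is the global (single-window) form of the formula, whereas practical SSIM implementations usually average it over local windows.

```python
import numpy as np

def mse(i: np.ndarray, k: np.ndarray) -> float:
    """Mean square error between the original image i and the restored image k."""
    return float(np.mean((i.astype(np.float64) - k.astype(np.float64)) ** 2))

def psnr(i: np.ndarray, k: np.ndarray, p_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB, with p_value the maximum pixel value."""
    return float(10 * np.log10(p_value ** 2 / mse(i, k)))

def ssim_global(i: np.ndarray, k: np.ndarray, p_value: float = 255.0) -> float:
    """Global (unwindowed) SSIM from the formula above; standard implementations
    average this quantity over sliding local windows."""
    c1, c2 = (0.01 * p_value) ** 2, (0.03 * p_value) ** 2
    x, y = i.astype(np.float64), k.astype(np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    return float(((2 * mu_x * mu_y + c1) * (2 * cov + c2)) /
                 ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
```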

4.3. Evaluation of Different Algorithms for Damaged Sculpture Restoration

4.3.1. Water and Rain Erosion and Weathered and Salted Sculpture Restoration

Figure 10 shows a comparison of the restoration results of the sculpture using different methods. In the restoration task for the weathered stone lion image, the seven models show different restoration characteristics and effects. Although the CA model can repair the sculpture image in a simple way, the quality of the repair clearly does not reach the ideal result. The EC model performs better than the CA model, but there is obvious blurring and there are artifacts in some areas, so it cannot accurately reproduce the edge structure of the sculpture. The RN model can restore the sculpture more clearly, but obvious restoration traces remain relative to the original image. The CTSDG model can resolve the blurring in the center of the image through the interaction of image texture and structure, but the restored edges are incoherent with the surrounding areas. The LGNet model restores the sculpture better, but some distortions remain. Although the traditional DCGAN algorithm is able to restore the color of ancient stone sculptures with complex textures and rich colors to a certain extent, the restored images still suffer from unnatural color transitions, blurred textures in some areas, and many other deficiencies in detail processing. Compared with the traditional method, the improved DCGAN, which introduces the dual attention module, the channel converter, and optimized convolution and upsampling layers, can better capture the deep features of the sculpture. The restored image is more realistic and natural in the color-transition regions and has clearer textures, overcoming the abrupt color changes of the traditional model, and the restoration results generated by the proposed method maintain overall consistency with the undamaged regions while enriching the natural texture.
Figure 11 presents the normalized comparison of different image restoration models on three metrics: structural similarity index (SSIM), mean square error (MSE), and peak signal-to-noise ratio (PSNR). The I-DCGAN model performs best on all three indexes, with the highest SSIM (0.95), the largest PSNR (53.25), and the smallest MSE (4.79), giving it a clear advantage in image structure restoration and visual quality. In contrast, traditional methods such as the CA model are at the lowest level on all indexes, showing poor restoration ability. The LGNet and T-DCGAN models, as intermediate improvement methods, perform better but are still inferior to the I-DCGAN. Compared with the T-DCGAN, the improved I-DCGAN method in this study raises the structural similarity index and the peak signal-to-noise ratio of the sculpture restoration by 5.56% and 6.22%, respectively, and reduces the mean square error significantly (by 19.67%). This shows that the I-DCGAN repairs the damaged images better than the T-DCGAN, with substantially improved restoration performance. Compared with the LGNet model, which has the next-best restoration effect, the SSIM and PSNR values of the proposed I-DCGAN method are improved by 1.06% and 10.20%, respectively, and the MSE value is decreased by 7.55%. Taken together, the improved DCGAN method, which introduces the dual attention module, the channel converter, and optimized convolution and upsampling layers, has the largest SSIM and PSNR and the smallest MSE, indicating that the improved restoration method performs better in both maintaining the consistency of the image structure and improving the visual quality.

4.3.2. Peeling and Lamellar Detachment Sculpture Restoration

Figure 12 shows a comparison of the reconstruction results of different image restoration models on a damaged artifact image (a camel sculpture). On the left is the original clear reference image (Real image), with complete structure, color, and texture information, followed by the Input image, which has undergone information loss and shows obvious structural defects, color breaks, and visual degradation. The reconstruction results of the seven restoration methods (CA, EC, RN, LGNet, CTSDG, T-DCGAN, I-DCGAN) are shown from left to right in Figure 12. As can be seen, the restored images of the CA and EC models are still blurred, with incomplete restoration of the camel’s face and torso and obvious texture loss. The RN model has a certain ability to restore contour and structure, but the colors are oversaturated and the details are not natural. The LGNet and CTSDG models improve on the traditional methods, and the main body of the image is basically reconstructed, but the level of detail is still slightly homogeneous. The T-DCGAN model performs well in maintaining the overall structure of the image, but the color reproduction is cold and some details are still blurred. The I-DCGAN model presents the reconstruction closest to the real image, with the most natural restoration of color, brightness, and texture details; in particular, the camel’s body shadows and texture are restored with high fidelity, indicating that the method has stronger modeling ability for complex textures and shapes.
Figure 13 shows the quantitative evaluation of the seven image restoration models (CA, EC, RN, CTSDG, LGNet, T-DCGAN, I-DCGAN) in the form of bar charts and line graphs for three image quality metrics: SSIM (structural similarity), MSE (mean square error), and PSNR (peak signal-to-noise ratio). A normalized histogram is used in Figure 13 to compare the relative levels of the three metrics, while a line graph of the raw values is superimposed to analyze the absolute restoration performance of the various models. As can be seen from the figure, the I-DCGAN model has the best performance on all indicators; its SSIM value is the highest (0.96), its MSE value is the lowest (5.04), and its PSNR value is the highest (56.85), indicating that its restored image is the best in terms of structural consistency, error control, and visual quality. In contrast, traditional models such as CA and EC are at a lower level on all indicators and have poorer restoration effects. The LGNet and T-DCGAN models also show some competitiveness, but their overall performance is still not as good as that of the I-DCGAN. Compared with the T-DCGAN, the I-DCGAN improves significantly on all indexes and shows stronger image generation ability. Specifically, the SSIM value of the I-DCGAN (0.96) is 5.5% higher than that of the T-DCGAN (0.91), indicating that it is closer to the real image in terms of structural similarity. The MSE value of the I-DCGAN (5.04) is 16.8% lower than that of the T-DCGAN (6.06), indicating that the I-DCGAN repairs the image with less error. The PSNR of the I-DCGAN reaches 56.85 dB, far exceeding the 46.03 dB of the T-DCGAN (by 23.5%), representing clearer and less noisy generated images. From the comparison of image clarity, color reproduction, and structural integrity, the I-DCGAN offers a greater improvement over the traditional model in image quality and generation effect and has a significant advantage in the task of cultural relic image restoration. This quantitative evaluation further supports the conclusions drawn from the visual comparison above.

4.3.3. Discolored Sculpture Restoration

Figure 14 shows a comparison of the restoration effects of multiple models on a faded cultural relic image (Terracotta Warriors). The original image is undamaged, with saturated colors, intact details, and natural light and shadow, while the input image is obviously faded and blurred, with serious loss of detail. After restoration by each model, the CA model recovers much of the color, but the edges are slightly blurred and the lighting is handled poorly. The EC model produces a brighter image, but the colors are distorted, the skin tone is whitish, and the result looks strongly artificial overall. The RN model yields relatively clearer details and improved color fidelity, but the result is still slightly dark. The LGNet model produces more natural colors, a clear silhouette, and fair detail. The CTSDG model has stronger colors and reproduces the textures of the clothes well, but the face is not natural enough. The T-DCGAN model produces slightly blurred outlines and low color saturation; it is improved but not realistic enough. The I-DCGAN model is closest to the real image, with natural colors, clear edges, reasonable shadow processing, and complete retention of details.
Figure 15 presents the quantitative comparison of the restoration effects of different models. It shows that the I-DCGAN achieves the best results in the structural similarity index (SSIM), mean square error (MSE), and peak signal-to-noise ratio (PSNR) metrics for faded artifact image restoration. Compared with the T-DCGAN model, the SSIM of the I-DCGAN is 0.95, which is 9.2% higher than the 0.87 of the T-DCGAN, indicating higher fidelity in restoring image structure, edges, and texture information. In terms of MSE, the value for the I-DCGAN is 5.54, lower than the 6.31 of the T-DCGAN, a 12.2% reduction in error, reflecting its accuracy in pixel-level reconstruction. The I-DCGAN achieves 57.53 dB in PSNR, 27.9% higher than the 44.96 dB of the T-DCGAN, showing that its generated images have clear advantages in visual clarity and noise control. Among the remaining models, the CTSDG model has the highest SSIM (0.94), but it is still lower than that of the I-DCGAN (0.95), an improvement of 1.06%. In terms of MSE, LGNet performs relatively well (6.22), yet the I-DCGAN (5.54) still reduces the error by a further 10.9%. In terms of PSNR, LGNet is the best-performing model apart from the two DCGAN variants, with a PSNR of 41.54 dB, but the I-DCGAN still improves on it by 15.99 dB, a rise of 38.5%, far exceeding the other methods. Traditional methods such as CA, EC, and RN lag significantly behind the I-DCGAN on all metrics; the CA model, for example, has an SSIM of only 0.78, an MSE as high as 7.03, and a PSNR of only 28.61 dB, which are 17.9% lower (SSIM), 26.9% higher (MSE), and 50.3% lower (PSNR) than those of the I-DCGAN, showing significant deficiencies in structural restoration, error control, and visual quality. Through this comparison, it is found that the I-DCGAN not only achieves a significant performance improvement over the T-DCGAN but also performs well on all indicators compared to the other methods. The balanced improvement in structure restoration accuracy, image clarity, and error control shows that the I-DCGAN is an efficient image restoration model with comprehensive advantages and practical application value.

4.4. Experimental Testing Results Summary

To evaluate the validity of the proposed research hypotheses (H1–H4), the experimental results presented in Section 4 were analyzed with respect to the structural similarity index (SSIM), mean square error (MSE), and peak signal-to-noise ratio (PSNR) across multiple sculpture damage types. Table 3 provides a concise summary of the hypothesis testing outcomes.
As shown in Table 3, all four hypotheses were supported by the experimental findings. Specifically, the proposed I-DCGAN demonstrated consistent superiority over the traditional DCGAN in SSIM, PSNR, and MSE across all tested damage types. Moreover, the method consistently achieved the best overall performance compared with other mainstream restoration models (CA, EC, RN, LGNet, CTSDG), confirming its robustness and generalizability for sculpture image restoration tasks. These results validate the theoretical assumptions derived from the previous literature and confirm the practical benefits of integrating dual attention mechanisms and channel converters into the DCGAN framework.

5. Discussion

5.1. Potential Applications and Implications of This Study

The proposed improved DCGAN (I-DCGAN) model, incorporating dual attention mechanisms and a channel converter, demonstrates substantial potential in advancing both academic research and practical applications within the field of digital cultural heritage restoration. Its ability to accurately reconstruct fine-grained texture details while preserving the overall stylistic integrity of artworks provides a robust foundation for multiple downstream applications.
From an academic perspective, the I-DCGAN offers a technically rigorous framework that bridges computer vision, deep learning, and art conservation. The architectural and methodological innovations introduced in this study may inform future work in other domains requiring high-fidelity texture synthesis and style preservation, such as historical document restoration, archaeological image reconstruction, and medical imaging enhancement. The model’s design also serves as a valuable reference for integrating hybrid attention mechanisms into GAN-based frameworks under data-limited conditions.
Beyond its academic significance, the I-DCGAN demonstrates considerable promise in real-world deployment scenarios. It can be directly embedded into virtual museum platforms to provide automated, high-fidelity previews of restored sculptures. Integrating the model into interactive exhibition systems would allow visitors to dynamically switch between original damaged artifacts and their digitally restored counterparts, thereby enhancing public engagement and educational outreach. In large-scale cultural heritage databases, the model could function as a backend service for automated batch restoration, annotation, and condition assessment of visual records—including sculptures, murals, and archaeological artifacts—significantly reducing manual processing time while improving the consistency of archival data. Furthermore, the I-DCGAN can form the core of a human–AI collaborative restoration workflow, where professional conservators review, refine, and approve AI-generated restorations. The system could present multiple candidate outputs ranked by structural similarity and stylistic fidelity, enabling experts to select, adjust, or merge results. This collaborative approach combines the efficiency of automated processing with the domain expertise of conservators, thereby improving both technical accuracy and cultural authenticity.
The implications of this study extend beyond technical performance gains. By enabling scalable, high-quality digital restoration, the proposed model supports the preservation of fragile cultural artifacts without physical intervention. It also provides curators, researchers, and the public with unprecedented access to historically accurate reconstructions, fostering deeper cultural understanding and appreciation while ensuring the long-term safeguarding of heritage assets.

5.2. Limitations and Constraints of This Study

While the improved DCGAN (I-DCGAN) model demonstrates notable academic and practical value, with promising applications in virtual museums, cultural heritage databases, and human–AI collaborative restoration workflows, it is equally essential to critically examine its inherent limitations. Despite the model’s clear advantages in restoring the color and texture of sculptures, several constraints remain that warrant careful consideration. The following section provides a balanced assessment of these limitations and positions the I-DCGAN within the broader landscape of alternative restoration methodologies.
First, the model’s performance is constrained by the lack of large-scale, high-quality paired datasets of damaged and original sculpture images. This scarcity limits its capacity to generalize across diverse cultural contexts, material types, and complex texture styles. Moreover, the acquisition of such datasets in the heritage restoration domain remains both labor-intensive and costly, posing a substantial barrier to further advancements. Second, while the integration of a dual attention mechanism and channel converter enhances both fine-grained local texture reconstruction and overall stylistic coherence, the model can still produce stylistic deviations or structural misinterpretations when dealing with severely degraded or semantically abstract artworks. In such cases, achieving an optimal balance between local detail fidelity and global style preservation remains challenging. Third, although Vision Transformer (ViT) architectures are recognized for their strong capability in modeling long-range dependencies, a pure ViT-based approach was not adopted in this study for three main reasons.
(1)
ViTs require extensive, high-quality paired datasets for supervised training, which are currently unavailable for sculpture restoration.
(2)
The proposed I-DCGAN already incorporates hybrid attention modules and a channel converter, enabling partial global feature modeling within a GAN framework better suited for limited-data conditions.
(3)
For restoration tasks, GAN-based methods are inherently more effective at producing high-fidelity local details, whereas unmodified ViT architectures tend to prioritize global relationships at the expense of detailed local synthesis.
Nevertheless, the I-DCGAN itself has shortcomings that alternative approaches, such as ViTs or other advanced architectures, could potentially address. While the dual attention mechanism allows for partial capture of global stylistic cues, its long-range dependency modeling capability remains inferior to that of Transformer-based methods, making it less effective for artworks with widely dispersed or highly interdependent stylistic elements. In addition, as a GAN-based model, the I-DCGAN is susceptible to training instability and mode collapse, occasionally resulting in artifacts or inconsistencies in fine detail synthesis. Beyond ViTs, other paradigms offer instructive contrasts. Traditional CNN-based inpainting methods are efficient in learning localized structural patterns but generally fail to capture the broader semantic composition required for the restoration of complex heritage artworks. Recent advances in diffusion models have delivered state-of-the-art performance in image generation and restoration, producing highly realistic outputs; however, they typically demand significantly greater computational resources and training time, and their effectiveness in fine-grained detail reconstruction for severely degraded cultural heritage data remains less explored. Within the GAN family, CycleGAN excels in unpaired image-to-image translation but can introduce structural artifacts, while StyleGAN achieves high realism but often alters the original artistic style excessively—an undesirable outcome in cultural heritage preservation. Compared with these approaches, the I-DCGAN strikes a favorable balance between local detail fidelity, global style preservation, and computational efficiency, though it still falls short of the global semantic modeling capacity and ultra-high-fidelity synthesis achieved by certain newer architectures.

5.3. Future Research Directions for AI-Based Sculpture Restoration

While the I-DCGAN model demonstrates notable strengths in balancing fidelity, coherence, and efficiency, the constraints discussed above highlight areas where its performance could be further improved. In light of the findings and limitations discussed above, several targeted research avenues can be pursued to advance the technical capabilities, cultural sensitivity, and practical applicability of AI-driven sculpture restoration.
(1)
Expanding the training corpus to include a broad spectrum of materials, artistic styles, cultural origins, and damage types would enhance both model generalization and transferability. Incorporating high-resolution imagery with standardized metadata would also facilitate reproducibility and enable cross-domain benchmarking.
(2)
Combining 2D visual data with 3D scanning, hyperspectral imagery, and textual descriptions could provide richer semantic and structural representations of artworks. Such multimodal fusion is particularly beneficial for sculptures with intricate surface details, symbolic motifs, or subtle stylistic cues.
(3)
Leveraging Transformer-based or hybrid architectures capable of modeling long-range dependencies and complex spatial compositions may improve the restoration of artworks with widely dispersed or interdependent stylistic elements. Integrating these capabilities into GAN or diffusion frameworks could offer a balanced synthesis of fine local detail and coherent global style.
(4)
Embedding the model into interactive workflows where professional conservators review, refine, and approve AI-generated outputs would ensure both technical accuracy and cultural authenticity. Iterative feedback loops between experts and the system could also serve as a mechanism for continuous improvement.
(5)
Conducting comprehensive comparative analyses against alternative state-of-the-art methods—such as diffusion models, StyleGAN, CycleGAN, and CNN–Transformer hybrids—would clarify the trade-offs between accuracy, computational efficiency, and stylistic fidelity, enabling optimal model selection for specific restoration contexts.
By pursuing these directions, future research can address current methodological gaps, broaden the applicability of AI-based restoration techniques, and contribute to the sustainable preservation and dissemination of cultural heritage.

6. Conclusions

Sculpture represents a significant component of human cultural heritage. To facilitate the digital restoration of damaged sculptures, this study proposes a novel image restoration method based on generative artificial intelligence technology. This approach enables the digital restoration of artworks while preserving their original artistic style and semantics, providing valuable guidance and reference for art conservators and museums in conducting physical restoration. In this study, a Generative Adversarial Network (GAN) model incorporating dual attention modules and channel converters is introduced to achieve digital restoration of damaged sculptures. The proposed neural network is trained using a linear combination of perceptual loss, adversarial loss, and structural similarity index (SSIM) loss, enabling high-precision image restoration.
A comparative analysis was conducted between traditional sculpture image restoration methods and the improved DCGAN approach. The results demonstrate that the proposed method outperforms existing art restoration techniques in terms of SSIM, mean squared error (MSE), and peak signal-to-noise ratio (PSNR) metrics. The enhanced DCGAN model effectively refines texture details while maintaining overall stylistic consistency between pre- and post-restoration images.
Given its superior performance in restoring sculpture color and texture with high structural and stylistic fidelity, the proposed I-DCGAN model holds strong potential for integration into digital heritage management systems. It can be embedded into web- or cloud-based platforms to support automated pre-restoration previews in virtual museums, enabling curators to visualize damaged artworks with high fidelity without physical intervention. In large-scale heritage databases containing visual records such as murals, sculptures, or archaeological images, the model may function as a backend service for batch restoration, annotation, or condition assessment. When integrated with user-friendly interfaces, it further allows restoration professionals, art historians, and museum staff to interactively explore restoration scenarios, thereby enhancing accuracy and efficiency in decision-making.

Author Contributions

Y.F.: Conceptualization; software; data curation; methodology; writing—original draft preparation; I.I.: Investigation; formal analysis; writing—original draft preparation; supervision; H.A.H.: Data curation; software; validation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received funding from Universiti Teknologi MARA (UITM).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon request. The data are not publicly available due to privacy restrictions.

Acknowledgments

We would like to express our gratitude to Universiti Teknologi MARA (UITM) for its financial support and favorable experimental conditions for this research.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Core Modules of the Proposed I-DCGAN (Pseudo-Code)

To ensure the reproducibility and transparency of the proposed method, the core structural components of the I-DCGAN model are provided in this Appendix A.
These modules correspond directly to the architectural elements and workflow illustrated in the main text (see Figure 2). The code is presented in pseudo-code form to highlight the model's logic and functional composition while omitting low-level implementation details such as data loading and optimizer settings. Due to project-related restrictions, the complete source code cannot be publicly disclosed at this stage. However, the key modules of the generator, discriminator, dual attention mechanism, channel converter, and multi-objective loss framework are provided here so that readers can understand the essential structure and functioning of the proposed model. A full executable version can be made available to authorized parties upon request to the corresponding author once the project's confidentiality constraints are lifted.
  • Module DualAttention(channels C):
# Channel Attention
Apply global average pooling and max pooling on feature map
Pass through shared MLP (Conv→ReLU→Conv)
Compute channel weights via Sigmoid
Multiply feature map by channel weights
# Spatial Attention
Compute channel-wise average and max maps
Concatenate and apply 7 × 7 convolution
Compute spatial weights via Sigmoid
Multiply result by spatial weights
Return attention-enhanced feature map
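For illustration, a minimal PyTorch-style sketch of this dual attention module is given below. It assumes a CBAM-like formulation; the reduction ratio of the shared MLP is an example value and does not reflect the exact training configuration.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAttention(nn.Module):
    # Channel attention followed by spatial attention (illustrative sketch).
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        # Shared MLP (Conv -> ReLU -> Conv) applied to pooled channel descriptors
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, kernel_size=1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, kernel_size=1, bias=False),
        )
        # 7 x 7 convolution over the concatenated average/max maps
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: weight each channel using global average/max statistics
        w_ch = torch.sigmoid(self.mlp(F.adaptive_avg_pool2d(x, 1)) + self.mlp(F.adaptive_max_pool2d(x, 1)))
        x = x * w_ch
        # Spatial attention: weight each location using channel-wise average/max maps
        avg_map = x.mean(dim=1, keepdim=True)
        max_map, _ = x.max(dim=1, keepdim=True)
        w_sp = torch.sigmoid(self.spatial(torch.cat([avg_map, max_map], dim=1)))
        return x * w_sp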
  • Module ChannelConverter(enc_channels Ce, dec_channels Cd):
Project encoder features to decoder channel size via 1 × 1 convolution
Apply squeeze-and-excitation gating
Multiply projected features by gating weights
Return fused features
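A corresponding sketch of the channel converter, i.e., a 1 × 1 projection followed by squeeze-and-excitation-style gating, is shown below; again, the reduction ratio is only an assumed example.

import torch
import torch.nn as nn

class ChannelConverter(nn.Module):
    # Project encoder skip features to the decoder channel width, then gate them.
    def __init__(self, enc_channels: int, dec_channels: int, reduction: int = 8):
        super().__init__()
        self.project = nn.Conv2d(enc_channels, dec_channels, kernel_size=1)
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                                    # squeeze to B x Cd x 1 x 1
            nn.Conv2d(dec_channels, dec_channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dec_channels // reduction, dec_channels, kernel_size=1),
            nn.Sigmoid(),                                               # excitation weights in (0, 1)
        )

    def forward(self, enc_feat: torch.Tensor) -> torch.Tensor:
        projected = self.project(enc_feat)
        return projected * self.gate(projected)                         # fused, gated skip features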
  • Module Generator(input_channels, base_channels):
# Encoder
e1 ← ConvolutionBlock(input_channels, base)
e2 ← ConvolutionBlock(base, base × 2) → DualAttention
e3 ← ConvolutionBlock(base × 2, base × 4) → DualAttention
e4 ← ConvolutionBlock(base × 4, base × 8) → DualAttention
bottleneck ← ConvolutionBlock(base × 8, base × 16)
# Channel Converter
c4, c3, c2, c1 ← ChannelConverters for each skip connection
# Decoder
up4: Upsample bottleneck; concat with c4(e4); apply ConvBlocks
up3: Upsample; concat with c3(e3); apply ConvBlocks
up2: Upsample; concat with c2(e2); apply ConvBlocks
up1: Upsample; concat with c1(e1); apply ConvBlocks
Output: 1 × 1 convolution + Tanh → Restored image
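The U-Net generator can be sketched as follows. Channel widths, 3 × 3 kernels, and transposed-convolution upsampling are assumptions consistent with the pseudo-code rather than the exact implementation; the attention and converter arguments accept the DualAttention and ChannelConverter classes sketched above, while the defaults keep this block runnable on its own.

import torch
import torch.nn as nn

def conv_block(in_ch: int, out_ch: int) -> nn.Sequential:
    # Two 3x3 convolutions with BatchNorm and ReLU (standard U-Net building block)
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
    )

class UNetGenerator(nn.Module):
    def __init__(self, in_ch: int = 3, base: int = 64, attention=None, converter=None):
        super().__init__()
        att = attention if attention is not None else (lambda c: nn.Identity())
        cvt = converter if converter is not None else (lambda ce, cd: nn.Conv2d(ce, cd, 1))
        self.pool = nn.MaxPool2d(2)
        # Encoder with dual attention on the deeper stages
        self.e1 = conv_block(in_ch, base)
        self.e2 = nn.Sequential(conv_block(base, base * 2), att(base * 2))
        self.e3 = nn.Sequential(conv_block(base * 2, base * 4), att(base * 4))
        self.e4 = nn.Sequential(conv_block(base * 4, base * 8), att(base * 8))
        self.bottleneck = conv_block(base * 8, base * 16)
        # Channel converters applied to the skip connections
        self.c1, self.c2 = cvt(base, base), cvt(base * 2, base * 2)
        self.c3, self.c4 = cvt(base * 4, base * 4), cvt(base * 8, base * 8)
        # Decoder: upsample, concatenate converted skip features, convolve
        self.up4, self.d4 = nn.ConvTranspose2d(base * 16, base * 8, 2, 2), conv_block(base * 16, base * 8)
        self.up3, self.d3 = nn.ConvTranspose2d(base * 8, base * 4, 2, 2), conv_block(base * 8, base * 4)
        self.up2, self.d2 = nn.ConvTranspose2d(base * 4, base * 2, 2, 2), conv_block(base * 4, base * 2)
        self.up1, self.d1 = nn.ConvTranspose2d(base * 2, base, 2, 2), conv_block(base * 2, base)
        self.out = nn.Sequential(nn.Conv2d(base, in_ch, 1), nn.Tanh())   # 1x1 convolution + Tanh

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        e1 = self.e1(x)
        e2 = self.e2(self.pool(e1))
        e3 = self.e3(self.pool(e2))
        e4 = self.e4(self.pool(e3))
        b = self.bottleneck(self.pool(e4))
        d4 = self.d4(torch.cat([self.up4(b), self.c4(e4)], dim=1))
        d3 = self.d3(torch.cat([self.up3(d4), self.c3(e3)], dim=1))
        d2 = self.d2(torch.cat([self.up2(d3), self.c2(e2)], dim=1))
        d1 = self.d1(torch.cat([self.up1(d2), self.c1(e1)], dim=1))
        return self.out(d1)                                              # restored image in [-1, 1]

With the two previous sketches defined in the same file, the generator would be instantiated as UNetGenerator(attention=DualAttention, converter=ChannelConverter).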
  • Module PatchDiscriminator(input_channels, base_channels):
Apply series of Conv→LeakyReLU layers, downsampling at each stage
Final Conv outputs patch-level real/fake logits map
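The patch-based discriminator can be sketched as a standard PatchGAN-style stack; the number of downsampling stages and the 4 × 4 kernels are assumed defaults rather than the reported configuration.

import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    # Strided Conv -> LeakyReLU stages, ending in a patch-level real/fake logit map.
    def __init__(self, in_ch: int = 3, base: int = 64, n_layers: int = 3):
        super().__init__()
        layers = [nn.Conv2d(in_ch, base, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True)]
        ch = base
        for _ in range(n_layers):
            layers += [nn.Conv2d(ch, ch * 2, 4, stride=2, padding=1),
                       nn.BatchNorm2d(ch * 2),
                       nn.LeakyReLU(0.2, inplace=True)]
            ch *= 2
        layers += [nn.Conv2d(ch, 1, 4, stride=1, padding=1)]   # one real/fake logit per patch
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)   # B x 1 x H' x W' map of patch logits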
  • Loss Framework:
    Adversarial loss:
      For D: BCE(real, 1) + BCE(fake, 0)
      For G: BCE(fake, 1)
    Reconstruction loss:
      L1 distance or (1 − SSIM) between restored and original images
    Perceptual loss:
      Smooth L1 between VGG feature maps of the restored and original images
    Total generator loss:
      L_G = λ_adv × L_adv + λ_rec × L_rec + λ_perc × L_perc
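A minimal sketch of this multi-objective loss in PyTorch follows. The loss weights are placeholder values, vgg_features stands for any frozen feature extractor (for example, a truncated torchvision VGG-16), and the SSIM variant of the reconstruction term is omitted for brevity.

import torch
import torch.nn as nn
import torch.nn.functional as F

bce = nn.BCEWithLogitsLoss()

def discriminator_loss(d_real_logits: torch.Tensor, d_fake_logits: torch.Tensor) -> torch.Tensor:
    # BCE(real, 1) + BCE(fake, 0) computed on the patch logit maps
    return bce(d_real_logits, torch.ones_like(d_real_logits)) + \
           bce(d_fake_logits, torch.zeros_like(d_fake_logits))

def generator_loss(d_fake_logits, restored, original, vgg_features,
                   lambda_adv: float = 1.0, lambda_rec: float = 100.0, lambda_perc: float = 10.0):
    # L_G = lambda_adv * L_adv + lambda_rec * L_rec + lambda_perc * L_perc
    l_adv = bce(d_fake_logits, torch.ones_like(d_fake_logits))            # fool the discriminator
    l_rec = F.l1_loss(restored, original)                                  # or 1 - SSIM(restored, original)
    l_perc = F.smooth_l1_loss(vgg_features(restored), vgg_features(original))
    return lambda_adv * l_adv + lambda_rec * l_rec + lambda_perc * l_perc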

References

  1. Morriss-Kay, G.M. The evolution of human artistic creativity. J. Anat. 2010, 216, 158–176. [Google Scholar] [CrossRef]
  2. Yang, Z.; Pan, W.-T. Interoperability Analysis between Traditional Chinese Sculpture and Painting Modeling from the Perspective of Big Data Management. Math. Probl. Eng. 2022, 2022, 3237282. [Google Scholar] [CrossRef]
  3. Stoean, R.; Bacanin, N.; Stoean, C.; Ionescu, L. Bridging the past and present: AI-driven 3D restoration of degraded artefacts for museum digital display. J. Cult. Herit. 2024, 69, 18–26. [Google Scholar] [CrossRef]
  4. Huang, P.-C.; Li, I.C.; Wang, C.-Y.; Shih, C.-H.; Srinivaas, M.; Yang, W.-T.; Kao, C.-F.; Su, T.-J. Integration of Artificial Intelligence in Art Preservation and Exhibition Spaces. Appl. Sci. 2025, 15, 562. [Google Scholar] [CrossRef]
  5. Shih, N.-J. Using AI to Reconstruct and Preserve 3D Temple Art with Old Images. Technologies 2025, 13, 229. [Google Scholar] [CrossRef]
  6. Wang, X. The analysis of sculpture image classification in utilization of 3D reconstruction under K-means++. Sci. Rep. 2025, 15, 18127. [Google Scholar] [CrossRef]
  7. Tong, Y.; Cai, Y.; Nevin, A.; Ma, Q. Digital technology virtual restoration of the colours and textures of polychrome Bodhidharma statue from the Lingyan Temple, Shandong, China. Herit. Sci. 2023, 11, 12. [Google Scholar] [CrossRef]
  8. Pietroni, E.; Ferdani, D. Virtual Restoration and Virtual Reconstruction in Cultural Heritage: Terminology, Methodologies, Visual Representation Techniques and Cognitive Models. Information 2021, 12, 167. [Google Scholar] [CrossRef]
  9. Xiang, L.; Unjah, T.; Abdul Halim, S. Integrating the contribution of museums for building peaceful and inclusive society and promoting justice. Discov. Sustain. 2025, 6, 24. [Google Scholar] [CrossRef]
  10. Ma, Y.; Tong, Q.; He, X.; Su, B. Exploring virtual restoration of architectural heritage through a systematic review. npj Herit. Sci. 2025, 13, 167. [Google Scholar] [CrossRef]
  11. Zhao, F.; Ren, H.; Sun, K.; Zhu, X. GAN-based heterogeneous network for ancient mural restoration. Herit. Sci. 2024, 12, 418. [Google Scholar] [CrossRef]
  12. Maitin, A.M.; Nogales, A.; Delgado-Martos, E.; Intra Sidola, G.; Pesqueira-Calvo, C.; Furnieles, G.; García-Tejedor, Á.J. Evaluating Activation Functions in GAN Models for Virtual Inpainting: A Path to Architectural Heritage Restoration. Appl. Sci. 2024, 14, 6854. [Google Scholar] [CrossRef]
  13. Lyu, Q.; Zhao, N.; Song, J.; Yang, Y.; Gong, Y. Mural inpainting via two-stage generative adversarial network. npj Herit. Sci. 2025, 13, 188. [Google Scholar] [CrossRef]
  14. Gupta, V.; Sambyal, N.; Sharma, A.; Kumar, P. Restoration of artwork using deep neural networks. Evol. Syst. 2019, 12, 439–446. [Google Scholar] [CrossRef]
  15. Guan, J.; Li, H.; Cai, X.; Chen, E.; Lin, J.; Ding, Z.; Ni, Y. Progressive generative mural image restoration based on adversarial structure learning. npj Herit. Sci. 2025, 13, 309. [Google Scholar] [CrossRef]
  16. Wei, X.; Fan, B.; Wang, Y.; Feng, Y.; Fu, L. Progressive enhancement and restoration for mural images under low-light and defective conditions based on multi-receptive field strategy. npj Herit. Sci. 2025, 13, 63. [Google Scholar] [CrossRef]
  17. Chen, Q.; Li, G.; Xie, L.; Xiao, Q.; Xiao, M. Structure guided image completion using texture synthesis and region segmentation. Optik 2019, 185, 896–909. [Google Scholar] [CrossRef]
  18. Maali Amiri, M.; Messinger, D.W. Virtual cleaning of works of art using deep convolutional neural networks. Herit. Sci. 2021, 9, 94. [Google Scholar] [CrossRef]
  19. Li, Z.; Han, N.; Wang, Y.; Zhang, Y.; Yan, J.; Du, Y.; Geng, G. Image inpainting based on CNN-Transformer framework via structure and texture restoration. Appl. Soft Comput. 2025, 170, 112671. [Google Scholar] [CrossRef]
  20. Senthil Anandhi, A.; Jaiganesh, M. An enhanced image restoration using deep learning and transformer based contextual optimization algorithm. Sci. Rep. 2025, 15, 10324. [Google Scholar] [CrossRef]
  21. Wang, X.; Wu, K.; Zhang, Y.; Xiao, Y.; Xu, P. A GAN-based Denoising Method for Chinese Stele and Rubbing Calligraphic Image. Vis. Comput. 2022, 39, 1351–1362. [Google Scholar] [CrossRef]
  22. Zou, Z.; Zhao, P.; Zhao, X. Virtual restoration of the colored paintings on weathered beams in the Forbidden City using multiple deep learning algorithms. Adv. Eng. Inform. 2021, 50, 101421. [Google Scholar] [CrossRef]
  23. Kumar, P.; Gupta, V.; Grover, M. Dual attention and channel transformer based generative adversarial network for restoration of the damaged artwork. Eng. Appl. Artif. Intell. 2024, 128, 107457. [Google Scholar] [CrossRef]
  24. Cao, M.; Feng, H.; Xiao, H. An Improved GAN-Based Image Restoration Method for Imaging Logging Images. Appl. Sci. 2023, 13, 9249. [Google Scholar] [CrossRef]
  25. Wu, Z.; Wei, C.; Xia, Y.; Ji, Z. SAITI-DCGAN: Self-Attention Based Deep Convolutional Generative Adversarial Networks for Data Augmentation of Infrared Thermal Images. Appl. Sci. 2024, 14, 11391. [Google Scholar] [CrossRef]
  26. Su, J.; Xu, B.; Yin, H. A survey of deep learning approaches to image restoration. Neurocomputing 2022, 487, 46–65. [Google Scholar] [CrossRef]
  27. Liu, B.; Lv, J.; Fan, X.; Luo, J.; Zou, T.; Kumar, M. Application of an Improved DCGAN for Image Generation. Mob. Inf. Syst. 2022, 2022, 9005552. [Google Scholar] [CrossRef]
  28. Han, X.; Ma, J.; Cheng, H.L.; Feng, L. Application Research on Image Recovery Technology Based on GAN. J. Sens. 2024, 2024, 7498160. [Google Scholar] [CrossRef]
  29. Schonfeld, E.; Schiele, B.; Khoreva, A. A U-Net Based Discriminator for Generative Adversarial Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8204–8213. [Google Scholar]
  30. Kunwar, A.; Pant, D.R.; Skön, J.-P.; Heikkonen, J.; Turjamaa, R.; Kanth, R. Ocular Disease Classification Using CNN with Deep Convolutional Generative Adversarial Network. In Advances in Computer Science and Ubiquitous Computing; Lecture Notes in Electrical Engineering; Springer: Berlin/Heidelberg, Germany, 2024; pp. 74–82. [Google Scholar]
  31. Lv, Y.; Wang, J.; Gao, G.; Li, Q. LW-DCGAN: A lightweight deep convolutional generative adversarial network for enhancing occluded face recognition. J. Electron. Imaging 2024, 33, 053057. [Google Scholar] [CrossRef]
  32. Pan, S.; Yin, X.; Ding, M.; Liu, P. SIR-DCGAN: An Attention-Guided Robust Watermarking Method for Remote Sensing Image Protection Using Deep Convolutional Generative Adversarial Networks. Electronics 2025, 14, 1853. [Google Scholar] [CrossRef]
  33. Xu, Z.; Zhang, C.; Wu, Y. Digital inpainting of mural images based on DC-CycleGAN. Herit. Sci. 2023, 11, 169. [Google Scholar] [CrossRef]
  34. Zhang, J.; Bai, S.; Zeng, X.; Liu, K.; Yuan, H. Supporting historic mural image inpainting by using coordinate attention aggregated transformations with U-Net-based discriminator. npj Herit. Sci. 2025, 13, 305. [Google Scholar] [CrossRef]
  35. Abuwatfa, W.H.; AlSawaftah, N.; Darwish, N.; Pitt, W.G.; Husseini, G.A. A Review on Membrane Fouling Prediction Using Artificial Neural Networks (ANNs). Membranes 2023, 13, 685. [Google Scholar] [CrossRef]
  36. Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-art in artificial neural network applications: A survey. Heliyon 2018, 4, e00938. [Google Scholar] [CrossRef] [PubMed]
  37. Huang, N.; He, J.; Zhu, N.; Xuan, X.; Liu, G.; Chang, C. Identification of the source camera of images based on convolutional neural network. Digit. Investig. 2018, 26, 72–80. [Google Scholar] [CrossRef]
  38. Kucheryavski, S. Extracting useful information from images. Chemom. Intell. Lab. Syst. 2011, 108, 2–12. [Google Scholar] [CrossRef]
  39. Liew, S.S.; Khalil-Hani, M.; Bakhteri, R. Bounded activation functions for enhanced training stability of deep neural networks on visual pattern recognition problems. Neurocomputing 2016, 216, 718–734. [Google Scholar] [CrossRef]
  40. Eckle, K.; Schmidt-Hieber, J. A comparison of deep networks with ReLU activation function and linear spline-type methods. Neural Netw. 2019, 110, 232–242. [Google Scholar] [CrossRef]
  41. Cordeiro, G.M.; Ortega, E.M.M.; Popović, B.V.; Pescim, R.R. The Lomax generator of distributions: Properties, minification process and regression model. Appl. Math. Comput. 2014, 247, 465–486. [Google Scholar] [CrossRef]
  42. Li, Y.; Xiao, N.; Ouyang, W. Improved generative adversarial networks with reconstruction loss. Neurocomputing 2019, 323, 363–372. [Google Scholar] [CrossRef]
  43. Plakias, S.; Boutalis, Y.S. Exploiting the generative adversarial framework for one-class multi-dimensional fault detection. Neurocomputing 2019, 332, 396–405. [Google Scholar] [CrossRef]
  44. Yang, J.; Lin, Y.; Ou, B.; Zhao, X. Image decomposition-Based structural similarity index for image quality assessment. EURASIP J. Image Video Process. 2016, 2016, 31. [Google Scholar] [CrossRef]
Figure 1. Schematic diagram of the generative adversarial network model.
Figure 2. Flowchart of the I-DCGAN.
Figure 3. Schematic diagram of the encoder working principle.
Figure 4. Schematic diagram of the dual attention module.
Figure 5. Schematic diagram of the channel converter.
Figure 6. Structure of the multi-scale feature segmentation network based on the dual attention module.
Figure 7. Schematic diagram of the discriminator architecture and working principle.
Figure 8. Schematic diagram of the generator training process.
Figure 9. Schematic diagram of the discriminator training process.
Figure 10. Comparison of the restoration effectiveness of different methods on water- and rain-eroded and weathered and salted sculptures.
Figure 11. Comparison of the restoration effects of different methods on water- and rain-eroded and weathered and salted sculptures by quantitative indicators.
Figure 12. Comparison of the restoration effectiveness of different methods on sculptures affected by peeling and lamellar detachment.
Figure 13. Comparison of the restoration effects of different methods on sculptures affected by peeling and lamellar detachment, by quantitative indicators.
Figure 14. Comparison of the restoration effectiveness of different methods on discolored sculptures.
Figure 15. Comparison of the restoration effects of different methods on discolored sculptures.
Table 1. Structural comparison between T-DCGAN and I-DCGAN.
Module | Traditional DCGAN | Improved I-DCGAN
Generator | Stack of transposed convolutions | U-Net with dual attention modules + channel converter
Discriminator | Binary real/fake judgment | Patch-level judgment with region segmentation
Attention | None | Spatial + Channel
Loss Function | Adversarial loss | Adversarial + Perceptual + SSIM (multi-objective loss)
Table 2. Image quality evaluation metrics explanation.
Name | Meaning | Value Range | Interpretation
SSIM | Measures similarity in brightness, contrast, and structure between images | [0, 1] (1 means perfectly identical) | The closer to 1, the more similar the image is to the original
MSE | Average of the squared differences between corresponding pixels of two images | [0, ∞) (0 means perfectly identical) | The smaller the better; 0 means completely identical
PSNR | Ratio between image signal strength and noise strength (measured in dB) | Typically between 20 and 40 dB | The higher the better; higher values indicate less distortion
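For reference, the three metrics listed in Table 2 can be computed with standard tooling. The sketch below assumes uint8 RGB images of identical shape and scikit-image ≥ 0.19 (earlier versions use multichannel=True instead of channel_axis).

import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_restoration(restored: np.ndarray, original: np.ndarray) -> dict:
    # MSE: mean squared pixel difference (lower is better, 0 = identical)
    mse = float(np.mean((restored.astype(np.float64) - original.astype(np.float64)) ** 2))
    # PSNR in dB (higher is better) and SSIM in [0, 1] (closer to 1 is better)
    psnr = peak_signal_noise_ratio(original, restored, data_range=255)
    ssim = structural_similarity(original, restored, data_range=255, channel_axis=-1)
    return {"MSE": mse, "PSNR": psnr, "SSIM": ssim}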
Table 3. Summary of Hypotheses Testing Results.
Hypothesis | Description | Supported? | Evidence from Results
H1 | I-DCGAN achieves higher SSIM than traditional DCGAN | Yes | SSIM ↑ 5.56% (Water and Rain Erosion), ↑ 5.5% (Peeling), ↑ 9.2% (Discoloration); see Figure 11, Figure 13 and Figure 15
H2 | I-DCGAN achieves higher PSNR than traditional DCGAN | Yes | PSNR ↑ 6.22% (Water and Rain Erosion), ↑ 23.5% (Peeling), ↑ 27.9% (Discoloration); see Figure 11, Figure 13 and Figure 15
H3 | I-DCGAN achieves lower MSE than traditional DCGAN | Yes | MSE ↓ 19.67% (Water and Rain Erosion), ↓ 16.8% (Peeling), ↓ 12.2% (Discoloration); see Figure 11, Figure 13 and Figure 15
H4 | I-DCGAN outperforms all other models across damage types | Yes | Best SSIM, highest PSNR, lowest MSE across all scenarios; see Figure 11, Figure 12, Figure 13, Figure 14 and Figure 15
Note: ↑ and ↓ denote increase and decrease, respectively.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
