1. Introduction
The breathtaking heritage of ancient Egyptian monuments continues to face deterioration from natural weather elements, air pollution, and recurrent resource shortages while battling damage from the continuous heavy foot traffic of visitors [1,2].
Recent technological developments have allowed for the generation of high-quality images that have enabled the evolution of electronic preservation systems that address modern cultural heritage needs, facilitate global access to cultural heritage through virtual monuments, create educational learning materials, and provide a valuable tool to guide monument and artifact restoration programs [3,4,5].
In particular, Generative Adversarial Networks (GANs) have been instrumental in image synthesis across multiple domains, with the style-based generator architecture for Generative Adversarial Networks (StyleGAN) as the leading architecture for controllable high-resolution image creation [6]. Past studies have illustrated the capacity of GANs to contribute to cultural heritage preservation and dissemination through visual training enhancement and Chinese calligraphy generation [7], alongside virtual artifact restoration and interactive historical scene development [8]. However, existing applications are limited in reproducing the unique characteristics of Egyptian monuments, which combine extensive spatial dimensions, highly geometric structures, ornate decoration, and centuries of natural weathering patterns. Moreover, the application of StyleGAN to generating accurate imagery of Egyptian monuments remains underexplored in computational creativity research and digital heritage conservation.
Thus, this research aims to fill this gap by developing StyleGAN architectures to generate photorealistic Egyptian monument imagery, focusing on architectural optimization and image manipulation while evaluating the performance of these architectures. Specifically, this research follows five main objectives:
To develop a specialized StyleGAN framework while evaluating its ability to generate realistic representations of Egyptian monuments that preserve cultural heritage site characteristics and visual appeal;
To evaluate the original discriminator design against enhanced versions that add noise injection, squeeze-and-excitation blocks, and an improved minibatch standard deviation layer;
To implement and evaluate sigmoid loss for language image pre-training (SigLIP) when configuring semantic alignments between text descriptions and visual representations of Egyptian monuments;
To explore the use of Differential Evolution (DE) for latent-space optimization in monument generation and compare its effectiveness against established methods;
To investigate how different truncation values, used for noise control, affect the quality–diversity trade-off when generating Egyptian monument imagery.
2. Literature Review
2.1. Generative Adversarial Networks
The generative adversarial network concept can be traced back to Goodfellow et al. [9] and has become the standard for generative tasks. GANs operate through an adversarial training scheme that pits two neural networks against each other in a minimax game. During training, the generator synthesizes fake data intended to match the target distribution, while the discriminator learns to distinguish genuine samples from fakes. A series of architectural refinements marks the evolution of GANs. For instance, Radford et al. [10] created deep convolutional GANs by combining convolutional layers with leaky ReLU activations and batch normalization, which stabilized training and improved the generated results. Further, Arjovsky et al. [11] introduced Wasserstein GANs (WGANs), replacing the Jensen–Shannon divergence of the original GAN formulation with the Wasserstein distance, which improved training stability and provided more useful gradients.
2.2. StyleGAN Architecture
Karras et al. [12] addressed major GAN architectural shortcomings in StyleGAN by incorporating style transfer principles into the generative process. The main improvement in StyleGAN is its mapping network, which maps the latent input codes to an intermediate style space that controls generation through adaptive instance normalization (AdaIN) at different image resolution levels.
Karras et al. [13] presented StyleGAN2, which resolves several issues in its predecessor. The updates in this version included a redesigned generator that eliminates the water-droplet artifacts, the addition of path length regularization, which strengthened the latent space, and a revised alternative to progressive growing. Additionally, Karras et al. [14] developed StyleGAN3 primarily to address the aliasing issues that create problematic artifacts when latent codes are manipulated.
2.3. GANs in Cultural Heritage Applications
The use of GANs in cultural heritage domains has become the focus of multiple research investigations. For instance, Lyu et al. [7] developed a GAN for realistic Chinese calligraphy synthesis, demonstrating its value for cultural heritage documentation and conservation. Later, Sabatelli et al. [15] demonstrated that GANs can reproduce characteristics of specialized artistic styles. Furthermore, Anwar [16] presented a method using generative models to restore architectural buildings that contained only fragmentary elements. Similarly, Cao et al. [17] employed GANs for digital restoration, leveraging models to rebuild missing artwork elements based on existing visual information and artistic stylistic frameworks.
2.4. Egyptian Monuments in Digital Contexts
Existing research on Egyptian monuments in digital environments has mainly concentrated on documentation and 3D reconstruction. Hawkins et al. [18] created systems to record Egyptian monuments under high-definition digital conditions, capturing illumination effects and surface characteristics. More recently, Grilli et al. [19] studied the capabilities of photogrammetry and laser scanning methods for generating high-definition digital replicas of Egyptian monuments. Furthermore, Guidi et al. [20] integrated archaeological data with procedural modeling for the virtual reconstruction of partially destroyed Egyptian temples. In addition, Gabellone [21] established research methods to produce historically accurate digital restorations of Egyptian architectural heritage.
A summary of the literature and previous contributions to the digital reconstruction of cultural artifacts is presented in Table 1.
3. Methodology
3.1. Data Collection
This study developed research data using various complementary information sources to build a broad and accurate database of Egyptian monumental structures. The primary data sources included the following:
The final combined dataset comprised approximately 5000 photos, in line with the target research requirements, spanning diverse historical periods and geographical locations. The dataset represented multiple Egyptian architectural features, such as pyramids, temples, obelisks, and sphinxes. The original image resolution varied between 256 × 256 pixels and 512 × 512 pixels.
Multiple approaches were implemented to ensure dataset quality control. First, a verification process was applied while developing the image database to confirm that every source image originated from reputable public-domain repositories and carried verifiable metadata, such as licensing information or explicit rights statements. Additionally, reverse image searches were conducted using tools such as Google Images and TinEye to cross-reference and rule out potential copyright restrictions and ambiguous usage terms.
Second, the dataset was cleaned of duplicate and near-duplicate images using perceptual hashing. Each image was represented as a compact hash, allowing visual similarity to be computed efficiently. Image pairs whose hash distance fell below a resemblance threshold T were flagged as duplicates and removed, preserving diversity and quality within the dataset [22]. Removing such redundancy reduces overfitting to repeated patterns and improves StyleGAN output quality, while maintaining dataset integrity improves overall algorithm performance [12].
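The following is a minimal sketch of this deduplication step, assuming the `imagehash` library and an illustrative Hamming-distance threshold; the paper only specifies an abstract threshold T, so the concrete value and library choice here are assumptions.

```python
# Illustrative sketch of perceptual-hash deduplication. The library choice
# (imagehash) and the threshold value are assumptions; the text only names
# a resemblance threshold T.
from pathlib import Path
from PIL import Image
import imagehash

HASH_DISTANCE_THRESHOLD = 5  # hypothetical value for T (Hamming distance)

def deduplicate(image_dir: str) -> list[Path]:
    """Return the paths of images kept after near-duplicate removal."""
    kept_hashes, kept_paths = [], []
    for path in sorted(Path(image_dir).glob("*.jpg")):
        h = imagehash.phash(Image.open(path))  # 64-bit perceptual hash
        # An image is a near-duplicate if its Hamming distance to any
        # previously kept hash falls below the threshold.
        if all(h - kh >= HASH_DISTANCE_THRESHOLD for kh in kept_hashes):
            kept_hashes.append(h)
            kept_paths.append(path)
    return kept_paths
```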
Third, a group of images (n = 500) was subjected to manual review to ensure that they maintained quality requirements. In addition, strict adherence to the terms of service stipulated by the data collection platforms was maintained throughout the research process to ensure ethical data utilization. A data processing flowchart is presented in Figure 1.
3.2. Image Preprocessing
All collected images underwent a rigorous preprocessing pipeline comprising three fundamental steps to guarantee consistent and effective model training (Figure 1). First, RGB color space normalization involved removing alpha channels and converting all images to the RGB color space. Second, all images were resized to 256 × 256 pixels using bilinear interpolation for standardization. Third, pixel values were normalized to the range [0, 1] to promote smooth training gradients.
The preprocessed dataset was converted to the TFRecords format, which provided efficient binary storage, optimized I/O performance, reduced preprocessing overhead during training, and supported parallel data loading, significantly improving training throughput by minimizing bottlenecks during the computationally intensive TensorFlow-based StyleGAN3 training.
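Below is a minimal TensorFlow sketch of the three preprocessing steps and the TFRecord export, under assumed file paths, feature keys, and single-file sharding; the authors' actual pipeline may differ in these details.

```python
# Sketch of the preprocessing pipeline and TFRecord export described above.
# File paths, the feature key, and single-file output are illustrative assumptions.
import tensorflow as tf

def preprocess(path: tf.Tensor) -> tf.Tensor:
    raw = tf.io.read_file(path)
    img = tf.io.decode_image(raw, channels=3, expand_animations=False)  # Step 1: force RGB, drop alpha
    img = tf.image.resize(img, [256, 256], method="bilinear")           # Step 2: bilinear resize
    return img / 255.0                                                  # Step 3: scale to [0, 1]

def to_example(img: tf.Tensor) -> bytes:
    png = tf.io.encode_png(tf.cast(img * 255.0, tf.uint8))
    feature = {"image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[png.numpy()]))}
    return tf.train.Example(features=tf.train.Features(feature=feature)).SerializeToString()

paths = tf.io.gfile.glob("monuments/*.jpg")
with tf.io.TFRecordWriter("monuments-256.tfrecords") as writer:
    for p in paths:
        writer.write(to_example(preprocess(tf.constant(p))))
```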
Figure 1 visualizes each of the three steps explicitly: Step 1 shows RGB color space normalization with alpha-channel removal, Step 2 demonstrates bilinear resizing from variable dimensions to the standardized 256 × 256 pixels with before/after comparison images, and Step 3 illustrates pixel-value normalization from [0, 255] to [0, 1] with a numerical representation.
3.3. StyleGAN Architecture
The StyleGAN generator followed its classic flow, with components operating in the following order:
The generator begins with a dense layer that takes random input vectors and creates the initial feature representation;
Transposed convolutional layers in the upsampling path increase the spatial dimensions of the feature maps one step at a time;
A normalization step follows every transposed convolutional layer in the sequence;
LeakyReLU activation functions are applied after normalization;
The output layer uses a tanh activation, generating pixel values suitable for image content (a simplified code sketch of this flow follows the list).
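The sketch below expresses this flow as a Keras model. Layer widths and depth are illustrative assumptions, and StyleGAN's style modulation (AdaIN/weight demodulation) is omitted for brevity, so this is a simplified stand-in rather than the authors' exact network.

```python
# Simplified Keras sketch of the generator flow listed above. Layer widths
# and depth are illustrative; StyleGAN's style modulation is omitted.
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim: int = 512) -> tf.keras.Model:
    z = layers.Input(shape=(latent_dim,))
    x = layers.Dense(4 * 4 * 512)(z)                 # initial feature representation
    x = layers.Reshape((4, 4, 512))(x)
    for filters in (512, 256, 128, 64, 32, 16):      # 4x4 -> 256x256 upsampling path
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)           # normalization after each upsample
        x = layers.LeakyReLU(0.2)(x)                 # LeakyReLU activation
    rgb = layers.Conv2D(3, 1, activation="tanh")(x)  # tanh output layer
    return tf.keras.Model(z, rgb, name="generator_sketch")
```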
The StyleGAN generator is trained in an adversarial framework in which it transforms random noise vectors $z \sim \mathcal{N}(0, I)$ into synthetic images $G(z)$. The generator's objective is to maximize the probability that the discriminator $D$ misclassifies its synthetic outputs as authentic, formalized through the loss function

$$\mathcal{L}_G = -\,\mathbb{E}_{z \sim p_z}\left[\log D\big(G(z)\big)\right].$$

To minimize this loss, backpropagation computes gradients with respect to the generator parameters $\theta_G$, and updates are performed via gradient descent: $\theta_G \leftarrow \theta_G - \eta \nabla_{\theta_G} \mathcal{L}_G$, where $\eta$ denotes the learning rate.
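A minimal TensorFlow sketch of one generator update under this non-saturating loss is shown below; the optimizer choice and learning rate are assumptions, not values reported in the paper.

```python
# Sketch of one generator update under the loss described above.
# The optimizer and learning rate are illustrative assumptions.
import tensorflow as tf

cross_entropy = tf.keras.losses.BinaryCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-3)

@tf.function
def generator_step(generator, discriminator, batch_size, latent_dim=512):
    z = tf.random.normal([batch_size, latent_dim])
    with tf.GradientTape() as tape:
        fakes = generator(z, training=True)
        logits = discriminator(fakes, training=True)
        # Non-saturating loss: push the discriminator to label fakes as real (1).
        loss = cross_entropy(tf.ones_like(logits), logits)
    grads = tape.gradient(loss, generator.trainable_variables)
    optimizer.apply_gradients(zip(grads, generator.trainable_variables))
    return loss
```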
The architecture of StyleGAN enhances this process through a mapping network that transforms input latent codes into an intermediate latent space, enabling style control across different resolutions [12].
Figure 2 illustrates the discrepancies observed in the initial training phases. The discriminator underwent the most notable structural changes, including the following enhancements:
- 2.
Squeeze-and-Excitation Blocks: a design that performs an explicit evaluation of channel dependencies through two core phases, as follows:
Squeeze: global average pooling compresses the spatial information into per-channel statistics;
Excitation: two fully connected layers determine the channel-weight values;
Scale: the learned weights rescale the original features channel-wise.
The squeeze-and-excitation (SE) blocks are integrated into the discriminator after every second convolutional layer, using a reduction ratio of 16 and sigmoid activation for the gating mechanism (a minimal code sketch appears after the list below). The improved MinibatchStdLayer is positioned before the final classification layer with a group size of 4 samples and includes numerical stability enhancements through epsilon addition (ε = 1 × 10−8) to prevent division by zero during variance calculations. Its main features include the following:
Standard deviation computed across sample groups within each minibatch;
Optional variance computation;
Adaptive group size handling;
Numerical stability improvements;
ResNet-style Skip Connections: Skip connections facilitate an improved gradient flow through the network.
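As referenced above, the following is a minimal sketch of an SE block with a reduction ratio of 16 and sigmoid gating; the exact placement and surrounding layers of the authors' discriminator are not reproduced here.

```python
# Minimal sketch of a squeeze-and-excitation block with reduction ratio 16
# and sigmoid gating, as described above. Integration into the discriminator
# (after every second convolutional layer) is left to the surrounding model.
import tensorflow as tf
from tensorflow.keras import layers

def se_block(x: tf.Tensor, reduction: int = 16) -> tf.Tensor:
    channels = x.shape[-1]
    s = layers.GlobalAveragePooling2D()(x)                    # squeeze: spatial -> per-channel stats
    e = layers.Dense(channels // reduction, activation="relu")(s)
    e = layers.Dense(channels, activation="sigmoid")(e)       # excitation: channel weights in (0, 1)
    e = layers.Reshape((1, 1, channels))(e)
    return x * e                                              # scale: channel-wise reweighting
```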
The entire enhanced discriminator architecture is illustrated in Figure 3, and a comparison of the images generated by the default and custom discriminators is shown in Figure 4.
3.4. SigLIP for Image–Text Alignment
Throughout our evaluation, SigLIP served as the primary tool for aligning textual descriptions with generated images of Egyptian monuments. In particular, the SigLIP model checkpoint siglip-base-patch16-224 was used for the captioning and imaging tasks alongside the semantic assessment framework.
To assess image–text alignment, the Euclidean distance between the L2-normalized embedding vectors was computed as follows:

$$d(x, t) = \left\lVert \frac{f(x)}{\lVert f(x)\rVert_2} - \frac{g(t)}{\lVert g(t)\rVert_2} \right\rVert_2,$$

where $f(\cdot)$ and $g(\cdot)$ denote the SigLIP image and text encoders, respectively.
The SigLIP implementation used the siglip-base-patch16-224 pre-trained model, including the following system components:
The SigLIP model was integrated without additional fine-tuning, loading the pre-trained siglip-base-patch16-224 weights directly.
Generated images were preprocessed to 224 × 224 pixels and normalized to [−1, 1], while text descriptions were tokenized with a 77-token maximum length. As per Equation (4), images generated by StyleGAN were compared against the provided descriptions, and the evaluation was performed iteratively. As noted above, the alignment scores served as the fitness function that guided latent-code optimization through differential evolution.
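A minimal sketch of this alignment scoring with the named checkpoint, via Hugging Face transformers (assuming a recent version that ships SigLIP), is shown below; the preprocessing details may differ from the authors' pipeline.

```python
# Sketch of computing an image-text alignment distance with the pre-trained
# SigLIP checkpoint named in the text, using Hugging Face transformers.
# The distance mirrors Equation (4) on L2-normalized embeddings; preprocessing
# details here may differ from the authors' exact setup.
import torch
from PIL import Image
from transformers import AutoModel, AutoProcessor

model = AutoModel.from_pretrained("google/siglip-base-patch16-224")
processor = AutoProcessor.from_pretrained("google/siglip-base-patch16-224")

def alignment_distance(image: Image.Image, text: str) -> float:
    inputs = processor(text=[text], images=image, padding="max_length", return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    img = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    txt = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    return torch.dist(img, txt).item()   # Euclidean distance between unit vectors
```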
3.5. Differential Evolution Optimization
We used the DE algorithm for latent-space optimization to systematically search for seed values that generate images with the strongest semantic relationship to target descriptions. The selection of DE over alternative optimization strategies was motivated by several factors specific to latent-space optimization in StyleGAN. Unlike gradient-based methods (Adam, SGD), DE does not require differentiable objective functions, making it suitable for our semantic alignment task, which involves discrete image–text matching.
The DE algorithm executes a self-guided sequence of operations: population initialization, mutation, crossover, and selection. Among the candidate seeds evaluated for inclusion in the next generation, the one with the lowest alignment error was chosen.
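The following is an illustrative latent-space search using SciPy's differential evolution, with the SigLIP alignment distance as the fitness function. The bounds, population size, iteration count, and the `generate_image` wrapper are assumptions introduced only for this sketch.

```python
# Illustrative latent-space search with SciPy's differential evolution.
# Bounds, population size, and iteration count are assumptions; the fitness
# is the SigLIP alignment distance (lower is better) between the generated
# image and the target description.
import numpy as np
from scipy.optimize import differential_evolution

LATENT_DIM = 512

def fitness(z: np.ndarray) -> float:
    image = generate_image(z)  # hypothetical StyleGAN wrapper returning a PIL image
    return alignment_distance(image, "a limestone sphinx in front of a pyramid")

result = differential_evolution(
    fitness,
    bounds=[(-3.0, 3.0)] * LATENT_DIM,   # search box around the support of N(0, 1)
    maxiter=50,
    popsize=15,
    mutation=(0.5, 1.0),                 # dithered mutation factor
    recombination=0.7,                   # crossover probability
    seed=0,
    polish=False,                        # keep the search purely evolutionary
)
best_latent = result.x
```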
3.6. Truncation Analysis for Noise Control
We investigated how noise impacts image quality and diversity using different truncation values from 0.1 to 1.0. In StyleGAN, truncation works by altering the latent-space sampling distribution according to

$$w' = \bar{w} + \psi\,(w - \bar{w}),$$

where $\bar{w}$ is the mean intermediate latent vector and the truncation parameter $\psi$ controls how far sampled latents deviate from this mean.
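A minimal numerical sketch of this truncation trick, with assumed variable names, follows directly from the equation above.

```python
# Minimal sketch of the truncation trick from the equation above: sampled
# intermediate latents are pulled toward the mean latent w_avg by a factor psi.
import numpy as np

def truncate(w: np.ndarray, w_avg: np.ndarray, psi: float) -> np.ndarray:
    """Blend a sampled latent w toward the dataset-average latent w_avg."""
    return w_avg + psi * (w - w_avg)

# psi = 0.1 -> near-average, high-fidelity but low-diversity images;
# psi = 1.0 -> untruncated sampling, maximum diversity.
```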
3.7. Evaluation Metrics
The following metrics were included in the model assessment:
The Fréchet Inception Distance (FID) was used to measure the statistical similarity between the distributions of the real and generated images:

$$\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \mathrm{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right),$$

where $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$ are the means and covariances of the Inception features of the real and generated images, respectively (a numerical sketch of this computation follows the list below);
The Inception Score (IS) was used to measure the dual aspects of generated image quality and diversity:

$$\mathrm{IS} = \exp\!\left(\mathbb{E}_{x \sim p_g}\left[D_{\mathrm{KL}}\!\left(p(y \mid x)\,\Vert\,p(y)\right)\right]\right);$$
Precision and recall were evaluated separately to determine the extent to which the learned distribution covered the target distribution and accurately represented sample quality;
Architectural accuracy and historical fidelity, diversity between designs and the existence of artifacts, and material rendering quality were evaluated through visual inspections by expert evaluators.
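As referenced above, a numerical sketch of the FID formula is given below; it assumes the Inception-v3 feature means and covariances have already been estimated and shows only the closed-form distance.

```python
# Numerical sketch of the FID formula above. Feature extraction with
# Inception-v3 is assumed to have been done elsewhere; only the closed-form
# distance between the two Gaussians is shown.
import numpy as np
from scipy import linalg

def frechet_inception_distance(mu_r, sigma_r, mu_g, sigma_g):
    diff = mu_r - mu_g
    covmean, _ = linalg.sqrtm(sigma_r @ sigma_g, disp=False)   # matrix square root
    if np.iscomplexobj(covmean):
        covmean = covmean.real                                 # discard numerical imaginary parts
    return diff @ diff + np.trace(sigma_r + sigma_g - 2.0 * covmean)
```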
4. Results
4.1. Discriminator Architecture Comparison
The investigation of discriminator network designs revealed major performance differences, with the enhanced designs surpassing the baseline framework (Table 2).
The results demonstrate that SE blocks contributed the largest individual improvement (FID reduction of 2.9 points), followed by noise injection (2.5 points), improved MinibatchStdLayer (2.2 points), and skip connections (1.49 points). Skip connections improved the training gradient flow, subsequently decreasing training time by about 15% while still contributing to the overall performance improvements.
The refined model based on the enhanced discriminator architecture yielded better results across all evaluation standards. The FID rating improved by 27.5% (33.3 → 24.21) compared with the initial model (Figure 5). The IS improved by 12% (4.98 → 5.58), indicating improved image quality and diversity (Figure 6). The enhanced design also significantly increased precision (0.34 → 0.41) and recall (0.13 → 0.19), indicating better quality and a wider diversity range, with a 38.1% better F1 score (Figure 7 and Figure 8).
Adding noise injection, squeeze-and-excitation blocks, and the improved MinibatchStdLayer to the discriminator architecture allows the generation of realistic Egyptian monuments while maintaining the diversity of outputs.
4.2. SigLIP Image–Text Alignment
The SigLIP implementation measured and improved the semantic match between text descriptions and generated Egyptian monument images. During testing, SigLIP achieved 71% better alignment accuracy than the baseline CLIP models. DE optimization further decreased the alignment error by 8% to 15% across different monument types. Images produced with truncation values between 0.5 and 0.7 provided the most accurate alignment between synthesized images and written descriptions, confirming that moderate truncation yielded the best results.
Furthermore, the tuned StyleGAN3 model yielded accurate outputs of the Egyptian architectural elements found in text inputs, producing effective representations of famous structures, including pyramids and sphinxes (Figure 9; Table 3). SigLIP proved its merit by accomplishing visual concept recognition from natural language descriptions with an accuracy of 95.5%.
4.3. Truncation Parameter Analysis
Systematically varying the truncation parameter (ψ) within the range of 0.1 to 1.0 led to the following conclusions:
Low Truncation Values (0.1–0.3) produced highly photorealistic monument images, enhanced visual quality, reduced artifacts, and exhibited limited architectural diversity;
Medium Truncation Values (0.4–0.7) achieved an optimal balance between fidelity and preservation of architectural diversity. The Egyptian monuments dataset reached its best results at ψ = 0.7 (Figure 10);
High Truncation Values (0.8–1.0) maximized architectural diversity and creative variations and increased the likelihood of artifacts and unrealistic features, with the occasional appearance of novel architectural combinations.
4.4. Differential Evolution Performance
Implementing DE as an evolutionary optimization algorithm helped establish better semantic links between Egyptian monument images and specific textual descriptions during latent-space exploration. The DE-optimized seeds generated images with 15% less alignment error for specific Egyptian monuments than randomly selected seeds (Figure 11).
The optimization process generated accurate imagery for Egyptian monument architectural features, such as obelisks, along with an accurate reproduction of sphinx characteristics and pyramid designs. The DE algorithm executed an organized search to find the best seed values by generating images with the highest target description conformity using sequential processes of initial seed selection, followed by seed alteration, a crossover step, and a final selection process.
This work demonstrates that generative models and evolutionary optimization methods can be paired to produce targeted cultural heritage visual content. The DE optimization method delivered high-quality images of the targeted monument types while preserving architectural legitimacy.
5. Discussion
This research investigated the application of the StyleGAN architecture for realistic Egyptian monument image generation, as it fills an important void between generative AI and cultural heritage preservation. Systematic research confirms that properly developed generative adversarial networks can successfully represent the distinct architectural characteristics, proportional properties, and stylistic motifs found in Egyptian monumental architecture.
This study offers several new advances. For better generation results, we developed an improved discriminator design that includes noise injection, squeeze-and-excitation blocks, and an additional MinibatchStdLayer module. Using SigLIP, we performed semantically guided generation of specific monument types through image–text alignment. DE was used for optimization in the latent space, identifying the best seeds for each problem setting. In the last stage, a systematic evaluation of the truncation parameter effects on the quality–diversity balance of monument generation was performed.
The findings of this study are constrained by several limitations, the most significant being the scope of the training dataset. Although the dataset contained a considerable number of examples, it omitted numerous historical periods and regions associated with Egyptian monumental structures. While our architectural modifications improved discrimination performance, the models still struggled to reliably represent fine details, including, but not limited to, damage patterns and hieroglyphics. Finally, the extensive computational resources required to train and optimize these models remain a barrier to widespread adoption.
6. Future Research Directions
Several promising research directions stem from this study:
Extending differential evolution optimization to multi-objective scenarios using our optimal truncation range (ψ = 0.4–0.7) as constraints;
Developing specialized loss functions incorporating squeeze-and-excitation attention patterns to improve hieroglyphic detail generation;
Implementing few-shot learning adaptations of our enhanced discriminator for underrepresented monument types;
Integrating SigLIP-guided generation with 3D data for controlled viewpoint synthesis.
7. Conclusions
This study demonstrated that StyleGAN frameworks, in various configurations, can generate diverse renderings of Egyptian architecture. Both the technical field of generative modeling and the applied domain of cultural heritage preservation benefit from the architectural enhancements, optimization techniques, and evaluation methods developed here. Generative AI opens new horizons for safeguarding heritage collections, supporting education, and improving access to cultural heritage sites.
From a technical perspective, we successfully integrated noise injection, squeeze-and-excitation blocks, and improved MinibatchStdLayer into the StyleGAN discriminator, demonstrated effective semantic control through SigLIP-guided generation, established a systematic truncation analysis framework for quality–diversity balance, and validated differential evolution as an effective optimization strategy for latent space exploration. These technical achievements enabled the generation of diverse, historically accurate representations of Egyptian monuments, including pyramids, sphinxes, temples, and obelisks, with 95.5% accuracy in visual concept recognition from natural language descriptions, producing images suitable for educational and archaeological visualization applications.
Author Contributions
Conceptualization, D.A. and M.A.Z.; methodology, D.A. and M.A.Z.; software, D.A.; validation, D.A. and M.A.Z.; formal analysis, D.A. and M.A.Z.; investigation, D.A. and M.A.Z.; resources, D.A. and M.A.Z.; writing—original draft preparation, D.A.; writing—review and editing, D.A.; visualization, D.A. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data are publicly available, as referenced in [2].
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Salem, A.E.; Eissa, A.T.; Hassan, T.H.; Saleh, M.I. Preserving the Past: A Dynamic Analysis of Heritage Tourism and Land Conservation in Mamluk Cairo. Heritage 2025, 8, 30. [Google Scholar] [CrossRef]
- Hassan, M.A.; Hamdy, A.; Nasr, M. Egypt Monuments Dataset Version 1: A scalable benchmark for image classification and monument recognition. Int. J. Adv. Comput. Sci. Appl. 2023, 14. [Google Scholar] [CrossRef]
- Gîrbacia, F. An Analysis of Research Trends for Using Artificial Intelligence in Cultural Heritage. Electronics 2024, 13, 3378. [Google Scholar] [CrossRef]
- Hsieh, K.; Tsaur, T.S.S.; Chao, M.; Li, I.C.; Huang, P.C.; Lu, M.J. Cultural Heritage Meets AI: Advanced Text-to-Image Models for Digital Reconstruction and Preservation. In Proceedings of the 2024 6th International Conference on Control and Robotics (ICCR), Yokohama, Japan, 5–7 December 2024; pp. 265–269. [Google Scholar]
- Muradov, M.; Gardyński, A.; Markiewicz, J. Integration of multi-temporal photogrammetric images in conservation work at the Royal Castle in Warsaw. J. Mod. Technol. Cult. Herit. Preserv. 2024, 3. [Google Scholar] [CrossRef] [PubMed]
- Dash, A.; Ye, J.; Wang, G. A Review of Generative Adversarial Networks (GANs) and Its Applications in a Wide Variety of Disciplines: From Medical to Remote Sensing. IEEE Access 2024, 12, 18330–18357. [Google Scholar] [CrossRef]
- Lyu, P.; Bai, X.; Yao, C.; Zhu, X.; Huang, T.; Liu, G. Auto-Encoder Guided GAN for Chinese Calligraphy Synthesis. In Proceedings of the 2017 14th IAPR International Conference on Document Analysis and Recognition (ICDAR), Kyoto, Japan, 9–15 November 2017; pp. 1095–1100. [Google Scholar]
- Noel, V.A.A. New Technologies in the Preservation of Cultural Artifacts with Spatial, Temporal, Corporeal, Kinetic Dimensions: Artifacts in the Trinidad Carnival. Stud. Digit. Herit. 2017, 1, 251–268. [Google Scholar] [CrossRef][Green Version]
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; MIT Press: Cambridge, MA, USA, 2014; Volume 2, pp. 2672–2680. [Google Scholar]
- Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar] [CrossRef]
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; JMLR: Brookline, MA, USA, 2017; Volume 70, pp. 214–223. [Google Scholar]
- Karras, T.; Laine, S.; Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4396–4405. [Google Scholar]
- Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 8107–8116. [Google Scholar]
- Karras, T.; Aittala, M.; Laine, S.; Härkönen, E.; Hellsten, J.; Lehtinen, J.; Aila, T. Alias-free generative adversarial networks. In Proceedings of the 35th International Conference on Neural Information Processing Systems, Online, 6–14 December 2021; p. 66. [Google Scholar]
- Sabatelli, M.; Kestemont, M.; Daelemans, W.; Geurts, P. Deep Transfer Learning for Art Classification Problems. In Proceedings of the Computer Vision—ECCV 2018 Workshops, Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2019; pp. 631–646. [Google Scholar]
- Anwar, M. Practical Techniques for Restoration of Architectural Formation Elements in Historical Buildings. World J. Eng. Technol. 2019, 7, 193–207. [Google Scholar] [CrossRef]
- Cao, J.; Zhang, Z.; Zhao, A.; Cui, H.; Zhang, Q. Ancient mural restoration based on a modified generative adversarial network. Herit. Sci. 2020, 8, 7. [Google Scholar] [CrossRef]
- Hawkins, T.; Cohen, J.; Debevec, P. A photometric approach to digitizing cultural artifacts. In Proceedings of the 2001 Conference on Virtual Reality, Archeology, and Cultural Heritage, Glyfada, Greece, 28–30 November 2001; pp. 333–342. [Google Scholar]
- Grilli, E.; Remondino, F. Machine Learning Generalisation across Different 3D Architectural Heritage. ISPRS Int. J. Geo-Inf. 2020, 9, 379. [Google Scholar] [CrossRef]
- Guidi, G.; Frischer, B.D. 3D Digitization of Cultural Heritage. In 3D Imaging, Analysis and Applications; Liu, Y., Pears, N., Rosin, P.L., Huber, P., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 631–697. [Google Scholar]
- Gabellone, F. Digital Twin: A new perspective for cultural heritage management and fruition. Acta Imeko 2022, 11, 7. [Google Scholar] [CrossRef]
- Zauner, C. Implementation and Benchmarking of Perceptual Image Hash Functions. Master’s Thesis, University of Applied Sciences, Hagenberg, Austria, 2010. [Google Scholar]