Physically Informed Synthetic Data Generation and U-Net Generative Adversarial Network for Palimpsest Reconstruction
Abstract
1. Introduction
- A physically informed synthetic generator modeling parchment degradation.
- A novel GAN architecture with asymmetric skip connections.
- The first comparative analysis of generative models for palimpsest reconstruction.
2. Related Works
- Ink diffusion: parameterized by Fick’s second law of diffusion [16].
- Parchment structure: biomechanical fiber modeling.
- Spectral superposition: wavelength-dependent layer interactions.
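As a rough illustration of the ink-diffusion component, Fick's second law can be simulated with an explicit finite-difference scheme. This is a minimal sketch, not the authors' generator; the diffusion coefficient, step count, and boundary handling are assumptions:

```python
import numpy as np

def diffuse_ink(ink, d_coeff=0.2, steps=50):
    """Simulate ink bleed into parchment via Fick's second law,
    dC/dt = D * laplacian(C), using an explicit finite-difference step.
    `ink` is a 2D concentration map in [0, 1]; d_coeff must satisfy the
    stability bound d_coeff <= 0.25 for unit grid spacing."""
    c = ink.astype(np.float64).copy()
    for _ in range(steps):
        # 5-point Laplacian with zero-flux (edge-replicated) boundaries
        padded = np.pad(c, 1, mode="edge")
        lap = (padded[:-2, 1:-1] + padded[2:, 1:-1]
               + padded[1:-1, :-2] + padded[1:-1, 2:] - 4.0 * c)
        c += d_coeff * lap
    return np.clip(c, 0.0, 1.0)

# A single dark stroke gradually spreads into neighbouring pixels,
# fading its peak while conserving total ink mass.
glyph = np.zeros((32, 32))
glyph[14:18, 8:24] = 1.0
faded = diffuse_ink(glyph, d_coeff=0.2, steps=25)
```

Because the boundaries are zero-flux, the total ink mass is conserved while the stroke's peak intensity decays, which is the qualitative behaviour one would want from a fading-ink model.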
3. Methodological Approach
3.1. Synthetic Data Generation Framework
Algorithm 1: Physically Informed Palimpsest Sample Generation
Input: image size, script list, font map, corpus, degradation parameters
Output: RGB image with two text layers and degradation: combined, underlying, overwritten
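Algorithm 1's interface can be mimicked in a toy form. The sketch below substitutes random stroke masks for the rendered script layers and a simple fade-plus-noise model for the degradation parameters; the function name and all constants are hypothetical:

```python
import numpy as np

def generate_palimpsest_sample(size=256, fade=0.35, noise_std=0.05, rng=None):
    """Toy stand-in for Algorithm 1: compose a synthetic palimpsest from
    two text layers. Real text rendering (fonts, corpora) is replaced by
    random stroke masks; `fade` attenuates the underlying layer and
    `noise_std` adds parchment texture noise."""
    rng = np.random.default_rng(rng)
    # Stand-ins for rendered underlying / overwritten script layers
    underlying = (rng.random((size, size)) < 0.03).astype(np.float64)
    overwritten = (rng.random((size, size)) < 0.03).astype(np.float64)
    # Parchment base colour with mild texture noise
    combined = 0.85 + noise_std * rng.standard_normal((size, size, 3))
    # Faded underlying ink (reddish tint) beneath darker overwritten ink
    combined -= fade * underlying[..., None] * np.array([0.3, 0.5, 0.6])
    combined -= overwritten[..., None] * np.array([0.6, 0.6, 0.6])
    combined = np.clip(combined, 0.0, 1.0)
    return combined, underlying, overwritten
```

The three returned arrays correspond to the algorithm's stated outputs: the combined RGB image plus the underlying and overwritten binary layers.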
3.2. Model Architecture
3.2.1. Baseline Variational Autoencoder (VAE)
- Reconstruction loss: Binary Cross-Entropy over pixels: $\mathcal{L}_{\mathrm{rec}} = -\sum_{i} \left[ x_i \log \hat{x}_i + (1 - x_i) \log (1 - \hat{x}_i) \right]$
- KL divergence: regularizes the latent space: $\mathcal{L}_{\mathrm{KL}} = -\tfrac{1}{2} \sum_{j} \left( 1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2 \right)$
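The two VAE loss terms can be sketched numerically. This is a NumPy illustration rather than the authors' training code; the helper name and the epsilon clamp are assumptions:

```python
import numpy as np

def vae_loss(x_hat, x, mu, log_var, kld_weight=1.0, eps=1e-7):
    """Standard VAE objective (Kingma & Welling [4]): pixel-wise binary
    cross-entropy plus the closed-form KL divergence between the learned
    Gaussian N(mu, sigma^2) and the standard normal prior."""
    x_hat = np.clip(x_hat, eps, 1.0 - eps)  # avoid log(0)
    bce = -np.sum(x * np.log(x_hat) + (1.0 - x) * np.log(1.0 - x_hat))
    kld = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
    return bce + kld_weight * kld
```

With a perfect reconstruction at 0.5 intensity and a latent code matching the prior (mu = 0, log_var = 0), the KL term vanishes and only the BCE floor remains.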
3.2.2. Enhanced VAE with Attention Mechanisms
3.2.3. Proposed Adversarial Architecture for Palimpsest Reconstruction
3.2.4. Component-Wise Contribution Analysis
3.3. Proposed Methodological Enhancements
3.3.1. Domain-Specific Innovations
3.3.2. Synthetic Dataset and Comprehensive Evaluation Framework
- A combined palimpsest image: A three-channel (RGB) tensor of fixed spatial dimensions representing the visually degraded manuscript, with pixel values normalized to [0, 1]. This serves as the input to the reconstruction models.
- The underlying text layer: A single-channel tensor of the same spatial dimensions representing the ground truth of the obscured text. This is the target output of the reconstruction task, against which model performance is evaluated.
- The overwritten text layer: A single-channel tensor of the same spatial dimensions representing the visible inscription layer. Although not the primary reconstruction target, its inclusion enables multitask learning and a deeper analysis of layer interference.
- Script metadata: Categorical labels indicating the historical scripts (e.g., Greek, Latin, Gothic, and Syriac) used for both the underlying and overwritten layers, providing valuable contextual information for dataset characterization and script-specific performance analysis.
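Under the description above, one dataset record could be packaged as follows. This is a sketch only: the 256-pixel resolution and the field names are assumptions, since the exact dimensions are not reproduced in this excerpt:

```python
import numpy as np

# Script labels taken from the examples listed in the text
SCRIPTS = ("greek", "latin", "gothic", "syriac")

def make_sample(combined, underlying, overwritten,
                under_script, over_script, size=256):
    """Package one dataset record: a 3-channel combined image plus two
    single-channel layers, with categorical script labels for both the
    underlying and the overwritten text."""
    assert combined.shape == (size, size, 3)
    assert underlying.shape == overwritten.shape == (size, size)
    return {
        "combined": combined.astype(np.float32),      # model input in [0, 1]
        "underlying": underlying.astype(np.float32),  # reconstruction target
        "overwritten": overwritten.astype(np.float32),
        "scripts": (SCRIPTS.index(under_script), SCRIPTS.index(over_script)),
    }
```

Keeping the overwritten layer and both script labels in each record is what makes the multitask and script-specific analyses described above possible.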
3.3.3. Robust Training Frameworks
- Adaptive optimization: An AdamW optimizer is applied, chosen for its effectiveness in deep learning contexts and its inherent weight-decay regularization. In addition, the ReduceLROnPlateau scheduler dynamically adjusts the learning rate based on the validation loss, reducing it by a fixed factor when the loss fails to improve for five consecutive epochs. This adaptive approach fosters robust convergence.
- Kullback–Leibler Divergence (KLD) Warm-up: The total VAE loss comprises a reconstruction term (Binary Cross-Entropy) and a regularization term derived from the Kullback–Leibler Divergence between the learned latent distribution and a standard Gaussian prior. To mitigate the issue of posterior collapse, a KLD warm-up strategy is implemented. The weight applied to the KLD term linearly increases from 0 to its full value (e.g., 0.01) over the initial training epochs. This allows the encoder to develop meaningful latent representations before being heavily constrained by the regularization.
- Gradient clipping: To prevent issues with exploding gradients, a common challenge in training deep networks, gradient norms are clipped to a maximum value of 1.0. This technique contributes significantly to training stability.
- Rigorous evaluation cycle: Each training epoch is followed by a dedicated validation phase, during which the model’s performance is assessed on an unseen dataset. This ensures an unbiased and comprehensive assessment of its generalization capabilities.
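The three training safeguards above (KLD warm-up, gradient-norm clipping, and plateau-based learning-rate decay) can be sketched framework-free as follows. The decay factor of 0.5 is an assumption, as the paper's value is not reproduced in this excerpt:

```python
def kld_weight(epoch, warmup_epochs=10, full_weight=0.01):
    """Linear KLD warm-up: the weight rises from 0 to full_weight over
    the first warmup_epochs, then stays constant."""
    return full_weight * min(1.0, epoch / warmup_epochs)

def clip_gradient_norm(grads, max_norm=1.0):
    """Rescale a list of gradient values so their global L2 norm does
    not exceed max_norm (the clipping described above)."""
    norm = sum(g * g for g in grads) ** 0.5
    scale = min(1.0, max_norm / max(norm, 1e-12))
    return [g * scale for g in grads]

class PlateauScheduler:
    """Minimal ReduceLROnPlateau: cut the LR by `factor` once the
    validation loss fails to improve for `patience` consecutive epochs."""
    def __init__(self, lr, factor=0.5, patience=5):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best, self.bad_epochs = float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr
```

In a real training loop, `kld_weight(epoch)` would scale the KL term of the VAE loss each epoch, gradients would be clipped before the optimizer step, and `PlateauScheduler.step` would run once per validation pass.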
3.4. Training Protocol
3.5. Evaluation Metrics
- Mean-Squared Error (MSE). MSE quantifies pixel-wise reconstruction fidelity [38]. While sensitive to global intensity shifts, it fails to capture structural preservation. We report MSE in a normalized intensity space [0, 1], with lower values indicating better performance.
- Intersection over Union (IoU). IoU measures spatial overlap between binarized text regions [39]. Critical for palimpsests, it evaluates character localization independent of stroke intensity. Values are in the range of [0, 1], with 1 indicating perfect segmentation.
- F1-Score. The harmonic mean of precision and recall balances false positives and false negatives [40]. For highly imbalanced text/background distributions, F1 more reliably reflects performance than accuracy.
- Precision. Precision measures reconstruction specificity—the proportion of detected content that is actual text. High precision minimizes false attributions, critical in historical analysis [3].
- Recall. Recall quantifies sensitivity to faint text elements, and is essential for recovering degraded scripts where missing characters alter meaning [1].
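The five metrics can be computed from one prediction/target pair as below. This is a sketch: binarization here uses a fixed 0.5 threshold, which may differ from the paper's binarization procedure:

```python
import numpy as np

def text_metrics(pred, target, threshold=0.5):
    """Compute MSE, IoU, F1, precision, and recall for one
    reconstruction. `pred` and `target` are arrays in [0, 1]; the
    overlap metrics are evaluated on masks binarized at `threshold`."""
    mse = float(np.mean((pred - target) ** 2))
    p, t = pred >= threshold, target >= threshold
    tp = float(np.sum(p & t))    # text predicted and present
    fp = float(np.sum(p & ~t))   # false attributions
    fn = float(np.sum(~p & t))   # missed faint strokes
    union = tp + fp + fn
    iou = tp / union if union else 1.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"mse": mse, "iou": iou, "f1": f1,
            "precision": precision, "recall": recall}
```

Reporting all five together reflects the trade-off discussed above: MSE tracks intensity fidelity, while IoU, F1, precision, and recall track character-level localization under the heavy text/background imbalance.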
4. Results
4.1. Metrics’ Performance
4.2. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Emery, D.; Easton, R. Spectral Imaging and Analytical Approaches for Palimpsest Research. J. Cult. Herit. 2021, 48, 129–138. [Google Scholar]
- Seales, W.B.; Parker, C.; Segal, M.; Tov, E.; Shor, P.; Porath, Y. From Damage to Discovery: Virtual Unwrapping of Damaged Manuscripts. IEEE Signal Process. Mag. 2016, 33, 28–37. [Google Scholar]
- Jampour, M. Revealing Palimpsests with Latent Diffusion Models: A Generative Approach to Image Inpainting and Handwriting Reconstruction. In Proceedings of the 2025 IEEE/CVF Winter Conference on Applications of Computer Vision Workshops (WACVW), Tucson, AZ, USA, 28 February–4 March 2025; pp. 242–249. [Google Scholar]
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:1312.6114. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661. [Google Scholar] [CrossRef]
- Mitra, A.; Roy, S.; Bhattacharya, U. Multispectral Document Imaging: A Survey. Comput. Vis. Image Underst. 2021, 210, 103245. [Google Scholar]
- Perino, M.; Pronti, L.; Moffa, C.; Rosellini, M.; Felic, A.C. New Frontiers in the Digital Restoration of Hidden Texts in Manuscripts: A Review of the Technical Approaches. Heritage 2024, 7, 683–696. [Google Scholar] [CrossRef]
- Chen, J.; Yu, W.; Sun, K.; Li, C.; Wang, J. Document Image Enhancement using Generative Adversarial Networks. Pattern Recognit. Lett. 2021, 152, 82–88. [Google Scholar]
- Bhowmik, S. Document Image Binarization. In Document Layout Analysis; SpringerBriefs in Computer Science; Springer: Singapore, 2019; pp. 11–30. [Google Scholar] [CrossRef]
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A. Image-to-Image Translation with Conditional Adversarial Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of Medical Image Computing and Computer-Assisted Intervention (MICCAI), 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
- Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the European Conference on Computer Vision, Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
- Zhang, H.; Goodfellow, I.; Metaxas, D.; Odena, A. Self-Attention Generative Adversarial Networks. In Proceedings of the 36th International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 7354–7363. [Google Scholar]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30, 6629–6640. [Google Scholar]
- Starynska, A.; Messinger, D.; Kong, Y. Revealing a history: Palimpsest text separation with generative networks. Int. J. Doc. Anal. Recognit. 2021, 24, 181–195. [Google Scholar] [CrossRef]
- Bird, R.B.; Stewart, W.E.; Lightfoot, E.N. Transport Phenomena, 2nd ed.; Contains Derivation and Discussion of Fick’s Second Law of Diffusion; John Wiley & Sons: New York, NY, USA, 2002. [Google Scholar]
- Han, Y.; Hamon, F.P.; Jiang, S.; Durlofsky, L.J. Surrogate model for geological CO2 storage and its use in hierarchical MCMC history matching. Adv. Water Resour. 2024, 187, 104678. [Google Scholar] [CrossRef]
- Zhang, Y.; Araya-Polo, M.; Mukerji, T. Synthetic Data Generation for Deep Learning-Based Seismic Inversion: From 1D to Complex 2D Models. Geophysics 2022, 87, R507–R522. [Google Scholar]
- Karniadakis, G.E.; Kevrekidis, I.G.; Lu, L.; Perdikaris, P.; Wang, S.; Yang, L. Physics-informed machine learning. Nat. Rev. Phys. 2021, 3, 422–440. [Google Scholar] [CrossRef]
- Arridge, S.; de Hoop, M.; Maass, P.; Öktem, O.; Schönlieb, C.; Unser, M. Deep Learning and Inverse Problems. Snapshots Mod. Math. Oberwolfach 2019, 15. [Google Scholar] [CrossRef]
- Zheng, X.; Xu, Z.; Yin, Q.; Bao, Z.; Chen, Z.; Wang, S. A Transformer-Unet Generative Adversarial Network for the Super-Resolution Reconstruction of DEMs. Remote Sens. 2024, 16, 3676. [Google Scholar] [CrossRef]
- Burgess, C.P.; Higgins, I.; Pal, A.; Matthey, L.; Watters, N.; Desjardins, G.; Lerchner, A. Understanding disentangling in β-VAE. arXiv 2018, arXiv:1804.03599. [Google Scholar]
- Razavi, A.; van den Oord, A.; Vinyals, O. Preventing Posterior Collapse with δ-VAEs. In Proceedings of the International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Zhao, S.; Song, J.; Ermon, S. Towards Deeper Understanding of Variational Autoencoding Models. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 3981–3990. [Google Scholar]
- Seo, H.j.; Kim, D.; Chung, H.; Lee, S. Handwritten text segmentation via end-to-end learning of convolutional neural network. Pattern Recognit. 2020, 107, 107473. [Google Scholar]
- Ulyanov, D.; Vedaldi, A.; Lempitsky, V. Instance normalization: The missing ingredient for fast stylization. arXiv 2016, arXiv:1607.08022. [Google Scholar]
- Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Bowman, S.R.; Vilnis, L.; Vinyals, O.; Dai, A.M.; Jozefowicz, R.; Bengio, S. Generating sentences from a continuous space. In Proceedings of the CoNLL, Beijing, China, 30–31 July 2015. [Google Scholar]
- Sudre, C.H.; Li, W.; Vercauteren, T.; Ourselin, S.; Cardoso, M.J. Generalised dice overlap as a deep learning loss function for highly unbalanced segmentations. In Proceedings of the International Conference on Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, DLMIA ML-CDS 2017, Québec City, QC, Canada, 14 September 2017; pp. 240–248. [Google Scholar] [CrossRef]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All You Need. In Proceedings of the Advances in Neural Information Processing Systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
- Mao, X.; Li, Q.; Xie, H.; Lau, R.; Wang, Z. Least Squares Generative Adversarial Networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017. [Google Scholar]
- Gulrajani, I.; Ahmed, F.; Arjovsky, M.; Dumoulin, V.; Courville, A. Improved Training of Wasserstein GANs. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
- You, Y.; Gitman, I.; Ginsburg, B. Large batch training of convolutional networks. arXiv 2017, arXiv:1708.03888. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Delving deep into rectifiers: Surpassing human-level performance on ImageNet classification. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Washington, DC, USA, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
- Wang, Z.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
- Rabaev, I.; Litvak, M. Recent advances in text line segmentation and baseline detection in historical document images: A systematic review. Int. J. Doc. Anal. Recognit. 2025. [Google Scholar] [CrossRef]
- Powers, D.M. Evaluation: From precision, recall and F-measure to ROC, informedness, markedness and correlation. J. Mach. Learn. Technol. 2011, 2, 37–63. [Google Scholar]
- Otsu, N. A threshold selection method from gray-level histograms. IEEE Trans. Syst. Man Cybern. 1979, 9, 62–66. [Google Scholar] [CrossRef]
| Model | MSE ↓ | IoU ↑ | F1 ↑ | Precision ↑ | Recall ↑ |
|---|---|---|---|---|---|
| Baseline VAE | 0.0170 ** | 0.3500 ** | 0.5181 ** | 0.5407 ** | 0.4981 ** |
| Improved VAE | 0.0139 * | 0.4323 ** | 0.6030 ** | 0.5537 * | 0.6645 ** |
| Proposed GAN | 0.0110 | 0.5823 | 0.7357 | 0.6808 | 0.8006 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Salmeron, J.L.; Fernandez-Palop, E. Physically Informed Synthetic Data Generation and U-Net Generative Adversarial Network for Palimpsest Reconstruction. Mathematics 2025, 13, 2304. https://doi.org/10.3390/math13142304