Artificial intelligence technologies, particularly machine learning and computer vision, are increasingly used to preserve, restore, and create immersive virtual experiences with cultural artifacts and sites, thus aiding in conserving cultural heritage and making it accessible to a global audience. This paper examines the performance of Generative Adversarial Networks (GANs), especially the Style-Based Generator Architecture (StyleGAN), as a deep learning approach for producing realistic images of Egyptian monuments. We used Sigmoid loss for Language–Image Pre-training (SigLIP) as a unique image–text alignment system to guide monument generation through semantic elements. We also studied truncation methods to regulate the generated image noise and to identify the parameter settings that best balance architectural fidelity against output diversity. An improved discriminator design that combined noise addition with squeeze-and-excitation blocks and a modified MinibatchStdLayer produced 27.5% better Fréchet Inception Distance performance than the original discriminator models. Moreover, differential evolution for latent-space optimization reduced alignment errors during specific monument construction tasks by about 15%. We evaluated truncation values from 0.1 to 1.0 and found that values between 0.4 and 0.7 worked best, allowing good accuracy while retaining many different architectural elements. Our findings indicate that specific model optimization strategies produce superior outcomes by creating better-quality and historically correct representations of diverse Egyptian monuments. Thus, the developed technology may be instrumental in generating educational and archaeological visualization assets while adding virtual tourism capabilities.
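The truncation setting discussed in this abstract can be illustrated with a minimal sketch. It assumes the commonly described StyleGAN truncation trick, in which a latent vector is pulled toward the mean latent by a factor psi; the abstract does not state the exact procedure the authors used, and the function name `truncate_latent` and the toy vectors are hypothetical.

```python
# Minimal sketch of latent truncation (an assumption, not the paper's code):
# w' = w_mean + psi * (w - w_mean). psi = 1.0 leaves w unchanged,
# psi = 0.0 collapses every sample onto the mean latent.

def truncate_latent(w, w_mean, psi):
    """Interpolate latent vector w toward w_mean by truncation factor psi."""
    return [m + psi * (x - m) for x, m in zip(w, w_mean)]

# Toy 3-dimensional latents (hypothetical values for illustration only).
w_mean = [0.0, 1.0, -1.0]
w = [2.0, 3.0, -5.0]

# Sweep a few values from the 0.1 to 1.0 range mentioned in the abstract.
for psi in (0.1, 0.4, 0.7, 1.0):
    print(psi, truncate_latent(w, w_mean, psi))
```

Lower psi values keep samples closer to the average latent, which tends to trade diversity for fidelity; this is consistent with the abstract reporting a middle range as the best compromise.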

(This article belongs to the Special Issue Novel Applications of Machine Learning and Bayesian Optimization)

26 pages, 9773 KiB

Open Access Review

A Narrative Review of the Clinical Applications of Echocardiography in Right Heart Failure

by North J. Noelck, Heather A. Perry, Phyllis L. Talley and D. Elizabeth Le

J. Clin. Med. 2025, 14(15), 5505; https://doi.org/10.3390/jcm14155505 - 5 Aug 2025

Abstract

Background/Objectives: Historically, echocardiographic imaging of the right heart has been challenging because its abnormal geometry is not conducive to reproducible anatomical and functional assessment. With the development of advanced echocardiographic techniques, it is now possible to complete an integrated assessment of the right heart that has fewer assumptions, resulting in increased accuracy and precision. Echocardiography continues to be the first-line imaging modality for diagnostic analysis and the management of acute and chronic right heart failure because of its portability, versatility, and affordability compared to cardiac computed tomography, magnetic resonance imaging, nuclear scintigraphy, and positron emission tomography. Virtually all echocardiographic parameters have been well-validated and have demonstrated prognostic significance. The goal of this narrative review of the echocardiographic parameters of the right heart chambers and hemodynamic alterations associated with right ventricular dysfunction is to present information that must be acquired during each examination to deliver a comprehensive assessment of the right heart and to discuss their clinical significance in right heart failure. Methods: We searched the PubMed database from 1985 to 2025 and the Cochrane database using terminology that included, but was not limited to, terms descriptive of right heart anatomy and function, disease states involving acute and chronic right heart failure and pulmonary hypertension, and the application of conventional and advanced echocardiographic modalities that strive to elucidate the pathophysiology of right heart failure. We reviewed randomized controlled trials, observational retrospective and prospective cohort studies, societal guidelines, and systematic review articles.
Conclusions: In addition to the conventional 2-dimensional echocardiography and color, spectral, and tissue Doppler measurements, a contemporary echocardiographic assessment of a patient with suspected or proven right heart failure must include 3-dimensional echocardiographic-derived measurements, speckle-tracking echocardiography strain analysis, and hemodynamics parameters to not only characterize the right heart anatomy but to also determine the underlying pathophysiology of right heart failure. Complete and point-of-care echocardiography is available in virtually all clinical settings for routine care, but this imaging tool is particularly indispensable in the emergency department, intensive care units, and operating room, where it can provide an immediate assessment of right ventricular function and associated hemodynamic changes to assist with real-time management decisions. Full article

(This article belongs to the Special Issue Cardiac Imaging in the Diagnosis and Management of Heart Failure)

14 pages, 221 KiB

Open Access Review

Metabolic Dysfunction-Associated Steatotic Liver Disease in People with Type 1 Diabetes

by Brynlee Vermillion and Yuanjie Mao

J. Clin. Med. 2025, 14(15), 5502; https://doi.org/10.3390/jcm14155502 - 5 Aug 2025

Abstract

Metabolic dysfunction-associated steatotic liver disease (MASLD) is increasingly recognized as a significant comorbidity in individuals with type 1 diabetes (T1D), despite its historical association with type 2 diabetes. This review focuses on summarizing current findings regarding the role of insulin resistance in the development of MASLD in T1D, as well as examining the relationship between MASLD and diabetes-related complications. We will also briefly discuss the prevalence, diagnostic challenges, associated complications, and potential mechanisms underlying MASLD in T1D. Although insulin resistance is well established in MASLD among those with type 2 diabetes, its role in T1D requires further clarification. Emerging markers, such as the estimated glucose disposal rate, offer early insight into this relationship. MASLD in T1D is linked to both microvascular and macrovascular complications, including nephropathy, retinopathy, neuropathy, and cardiovascular disease. Variability in prevalence estimates reflects inconsistencies among imaging modalities, emphasizing the need for standardized, non-invasive diagnostic approaches. Recognizing and addressing MASLD and its links to insulin resistance and diabetes complications in T1D is vital for mitigating long-term complications and enhancing clinical outcomes. Full article

(This article belongs to the Section Endocrinology & Metabolism)

16 pages, 1618 KiB

Open Access Article

Multimodal Temporal Knowledge Graph Embedding Method Based on Mixture of Experts for Recommendation

by Bingchen Liu, Guangyuan Dong, Zihao Li, Yuanyuan Fang, Jingchen Li, Wenqi Sun, Bohan Zhang, Changzhi Li and Xin Li

Mathematics 2025, 13(15), 2496; https://doi.org/10.3390/math13152496 - 3 Aug 2025

Viewed by 225

Abstract

Knowledge-graph-based recommendation aims to provide personalized recommendation services to users based on their historical interaction information, which is of great significance for shopping transaction rates and other aspects. With the rapid growth of online shopping, the knowledge graph constructed from users’ historical interaction data now incorporates multiattribute information, including timestamps, images, and textual content. Information from multiple modalities is difficult to use effectively because the modalities differ in representation structure and space. Existing methods attempt to use this information through simple embedding representation and aggregation, but they neglect targeted representation learning for information with different attributes and do not learn effective weights for aggregation. In addition, existing methods do not model temporal information effectively. In this article, we propose MTR, a knowledge graph recommendation framework based on a mixture-of-experts network. We use the mixture-of-experts network to learn targeted representations and weights of different product attributes for effective modeling and utilization. In addition, we effectively model the temporal information during the user shopping process. A thorough experimental study on popular benchmarks validates that MTR can achieve competitive results.

(This article belongs to the Special Issue Data-Driven Decentralized Learning for Future Communication Networks)

23 pages, 1693 KiB

Open Access Review

From Vision to Illumination: The Promethean Journey of Optical Coherence Tomography in Cardiology

by Angela Buonpane, Giancarlo Trimarchi, Francesca Maria Di Muro, Giulia Nardi, Marco Ciardetti, Michele Alessandro Coceani, Luigi Emilio Pastormerlo, Umberto Paradossi, Sergio Berti, Carlo Trani, Giovanna Liuzzo, Italo Porto, Antonio Maria Leone, Filippo Crea, Francesco Burzotta, Rocco Vergallo and Alberto Ranieri De Caterina

J. Clin. Med. 2025, 14(15), 5451; https://doi.org/10.3390/jcm14155451 - 2 Aug 2025

Viewed by 257

Abstract

Optical Coherence Tomography (OCT) has evolved from a breakthrough ophthalmologic imaging tool into a cornerstone technology in interventional cardiology. After its initial applications in retinal imaging in the early 1990s, OCT was subsequently envisioned for cardiovascular use. In 1995, its ability to visualize atherosclerotic plaques was demonstrated in an in vitro study, and the following year marked the acquisition of the first in vivo OCT image of a human coronary artery. A major milestone followed in 2000, with the first intracoronary imaging in a living patient using time-domain OCT. However, the real inflection point came in 2006 with the advent of frequency-domain OCT, which dramatically improved acquisition speed and image quality, enabling safe and routine imaging in the catheterization lab. With the advent of high-resolution, second-generation frequency-domain systems, OCT has become clinically practical and widely adopted in catheterization laboratories. OCT progressively entered interventional cardiology, first proving its safety and feasibility, then demonstrating superiority over angiography alone in guiding percutaneous coronary interventions and improving outcomes. Today, it plays a central role not only in clinical practice but also in cardiovascular research, enabling precise assessment of plaque biology and response to therapy. With the advent of artificial intelligence and hybrid imaging systems, OCT is now evolving into a true precision-medicine tool—one that not only guides today’s therapies but also opens new frontiers for discovery, with vast potential still waiting to be explored. Tracing its historical evolution from ophthalmology to cardiology, this narrative review highlights the key technological milestones, clinical insights, and future perspectives that position OCT as an indispensable modality in contemporary interventional cardiology. 
As a guiding thread, the myth of Prometheus is used to symbolize the evolution of OCT—from its illuminating beginnings in ophthalmology to its transformative role in cardiology—as a metaphor for how light, innovation, and knowledge can reveal what was once hidden and redefine clinical practice. Full article

(This article belongs to the Section Cardiology)

20 pages, 4847 KiB

Open Access Article

FCA-STNet: Spatiotemporal Growth Prediction and Phenotype Extraction from Image Sequences for Cotton Seedlings

by Yiping Wan, Bo Han, Pengyu Chu, Qiang Guo and Jingjing Zhang

Plants 2025, 14(15), 2394; https://doi.org/10.3390/plants14152394 - 2 Aug 2025

Viewed by 234

Abstract

To address the limitations of the existing cotton seedling growth prediction methods in field environments, specifically, poor representation of spatiotemporal features and low visual fidelity in texture rendering, this paper proposes an algorithm for the prediction of cotton seedling growth from images based on FCA-STNet. The model leverages historical sequences of cotton seedling RGB images to generate an image of the predicted growth at time t + 1 and extracts 37 phenotypic traits from the predicted image. A novel STNet structure is designed to enhance the representation of spatiotemporal dependencies, while an Adaptive Fine-Grained Channel Attention (FCA) module is integrated to capture both global and local feature information. This attention mechanism focuses on individual cotton plants and their textural characteristics, effectively reducing the interference from common field-related challenges such as insufficient lighting, leaf fluttering, and wind disturbances. The experimental results demonstrate that the predicted images achieved an MSE of 0.0086, MAE of 0.0321, SSIM of 0.8339, and PSNR of 20.7011 on the test set, representing improvements of 2.27%, 0.31%, 4.73%, and 11.20%, respectively, over the baseline STNet. The method outperforms several mainstream spatiotemporal prediction models. Furthermore, the majority of the predicted phenotypic traits exhibited correlations with actual measurements with coefficients above 0.8, indicating high prediction accuracy. The proposed FCA-STNet model enables visually realistic prediction of cotton seedling growth in open-field conditions, offering a new perspective for research in growth prediction. Full article

(This article belongs to the Special Issue Advances in Artificial Intelligence for Plant Research)

26 pages, 4349 KiB

Open Access Article

Palazzo Farnese and Dong’s Fortified Compound: An Art-Anthropological Cross-Cultural Analysis of Architectural Form, Symbolic Ornamentation, and Public Perception

by Liyue Wu, Qinchuan Zhan, Yanjun Li and Chen Chen

Buildings 2025, 15(15), 2720; https://doi.org/10.3390/buildings15152720 - 1 Aug 2025

Viewed by 124

Abstract

This study presents a cross-cultural comparison of two fortified residences—Palazzo Farnese in Italy and Dong’s Fortified Compound in China—through a triadic analytical framework encompassing architectural form, symbolic ornamentation, and public perception. By combining field observation, iconographic interpretation, and digital ethnography, the research investigates how heritage meaning is constructed, encoded, and reinterpreted across distinct sociocultural contexts. Empirical materials include architectural documentation, decorative analysis, and a curated dataset of 4947 user-generated images and 1467 textual comments collected from Chinese and international platforms between 2020 and 2024. Methods such as CLIP-based visual clustering and BERTopic-enabled sentiment modelling were applied to extract patterns of perception and symbolic emphasis. The findings reveal contrasting representational logics: Palazzo Farnese encodes dynastic authority and Renaissance cosmology through geometric order and immersive frescoes, while Dong’s Compound conveys Confucian ethics and frontier identity via nested courtyards and traditional ornamentation. Digital responses diverge accordingly: international users highlight formal aesthetics and photogenic elements; Chinese users engage with symbolic motifs, family memory, and ritual significance. This study illustrates how historically fortified residences are reinterpreted through culturally specific digital practices, offering an interdisciplinary approach that bridges architectural history, symbolic analysis, and digital heritage studies. Full article

(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

20 pages, 3857 KiB

Open Access Review

Utility of Enabling Technologies in Spinal Deformity Surgery: Optimizing Surgical Planning and Intraoperative Execution to Maximize Patient Outcomes

by Nora C. Kim, Eli Johnson, Christopher DeWald, Nathan Lee and Timothy Y. Wang

J. Clin. Med. 2025, 14(15), 5377; https://doi.org/10.3390/jcm14155377 - 30 Jul 2025

Viewed by 398

Abstract

The management of adult spinal deformity (ASD) has evolved dramatically over the past century, transitioning from external bracing and in situ fusion to complex, technology-driven surgical interventions. This review traces the historical development of spinal deformity correction and highlights contemporary enabling technologies that are redefining the surgical landscape. Advances in stereoradiographic imaging now allow for precise, low-dose three-dimensional assessment of spinopelvic parameters and segmental bone density, facilitating individualized surgical planning. Robotic assistance and intraoperative navigation improve the accuracy and safety of instrumentation, while patient-specific rods and interbody implants enhance biomechanical conformity and alignment precision. Machine learning and predictive modeling tools have emerged as valuable adjuncts for risk stratification, surgical planning, and outcome forecasting. Minimally invasive deformity correction strategies, including anterior column realignment and circumferential minimally invasive surgery (cMIS), have demonstrated equivalent clinical and radiographic outcomes to traditional open surgery with reduced perioperative morbidity in select patients. Despite these advancements, complications such as proximal junctional kyphosis and failure remain prevalent. Adjunctive strategies—including ligamentous tethering, modified proximal fixation, and vertebral cement augmentation—offer promising preventive potential. Collectively, these innovations signal a paradigm shift toward precision spine surgery, characterized by data-informed decision-making, individualized construct design, and improved patient-centered outcomes in spinal deformity care. Full article

(This article belongs to the Special Issue Clinical New Insights into Management of Scoliosis)

23 pages, 7839 KiB

Open Access Article

Automated Identification and Analysis of Cracks and Damage in Historical Buildings Using Advanced YOLO-Based Machine Vision Technology

by Kui Gao, Li Chen, Zhiyong Li and Zhifeng Wu

Buildings 2025, 15(15), 2675; https://doi.org/10.3390/buildings15152675 - 29 Jul 2025

Viewed by 195

Abstract

Structural cracks significantly threaten the safety and longevity of historical buildings, which are essential parts of cultural heritage. Conventional inspection techniques, which depend heavily on manual visual evaluations, tend to be inefficient and subjective. This research introduces an automated framework for crack and damage detection using advanced YOLO (You Only Look Once) models, aiming to improve both the accuracy and efficiency of monitoring heritage structures. A dataset comprising 2500 high-resolution images was gathered from historical buildings and categorized into four levels of damage: no damage, minor, moderate, and severe. Following preprocessing and data augmentation, a total of 5000 labeled images were utilized to train and evaluate four YOLO variants: YOLOv5, YOLOv8, YOLOv10, and YOLOv11. The models’ performances were measured using metrics such as precision, recall, mAP@50, mAP@50–95, as well as losses related to bounding box regression, classification, and distribution. Experimental findings reveal that YOLOv10 surpasses other models in multi-target detection and identifying minor damage, achieving higher localization accuracy and faster inference speeds. YOLOv8 and YOLOv11 demonstrate consistent performance and strong adaptability, whereas YOLOv5 converges rapidly but shows weaker validation results. Further testing confirms YOLOv10’s effectiveness across different structural components, including walls, beams, and ceilings. This study highlights the practicality of deep learning-based crack detection methods for preserving building heritage. Future advancements could include combining semantic segmentation networks (e.g., U-Net) with attention mechanisms to further refine detection accuracy in complex scenarios. Full article

(This article belongs to the Special Issue Structural Safety Evaluation and Health Monitoring)

21 pages, 3448 KiB

Open Access Article

A Welding Defect Detection Model Based on Hybrid-Enhanced Multi-Granularity Spatiotemporal Representation Learning

by Chenbo Shi, Shaojia Yan, Lei Wang, Changsheng Zhu, Yue Yu, Xiangteng Zang, Aiping Liu, Chun Zhang and Xiaobing Feng

Sensors 2025, 25(15), 4656; https://doi.org/10.3390/s25154656 - 27 Jul 2025

Viewed by 388

Abstract

Real-time quality monitoring using molten pool images is a critical focus in researching high-quality, intelligent automated welding. To address interference problems in molten pool images under complex welding scenarios (e.g., reflected laser spots from spatter misclassified as porosity defects) and the limited interpretability of deep learning models, this paper proposes a multi-granularity spatiotemporal representation learning algorithm based on the hybrid enhancement of handcrafted and deep learning features. A MobileNetV2 backbone network integrated with a Temporal Shift Module (TSM) is designed to progressively capture the short-term dynamic features of the molten pool and integrate temporal information across both low-level and high-level features. A multi-granularity attention-based feature aggregation module is developed to select key interference-free frames using cross-frame attention, generate multi-granularity features via grouped pooling, and apply the Convolutional Block Attention Module (CBAM) at each granularity level. Finally, these multi-granularity spatiotemporal features are adaptively fused. Meanwhile, an independent branch utilizes the Histogram of Oriented Gradient (HOG) and Scale-Invariant Feature Transform (SIFT) features to extract long-term spatial structural information from historical edge images, enhancing the model’s interpretability. The proposed method achieves an accuracy of 99.187% on a self-constructed dataset. Additionally, it attains a real-time inference speed of 20.983 ms per sample on a hardware platform equipped with an Intel i9-12900H CPU and an RTX 3060 GPU, thus effectively balancing accuracy, speed, and interpretability. Full article

(This article belongs to the Topic Applied Computing and Machine Intelligence (ACMI))

31 pages, 15992 KiB

Open Access Article

Multi-Temporal Mineral Mapping in Two Torrential Basins Using PRISMA Hyperspectral Imagery

by Inés Pereira, Eduardo García-Meléndez, Montserrat Ferrer-Julià, Harald van der Werff, Pablo Valenzuela and Juncal A. Cruz

Remote Sens. 2025, 17(15), 2582; https://doi.org/10.3390/rs17152582 - 24 Jul 2025

Viewed by 291

Abstract

The Sierra Minera de Cartagena-La Unión, located in the southeast of the Iberian Peninsula, has been significantly impacted by historical mining activities, which resulted in environmental degradation, including acid mine drainage (AMD) and heavy metal contamination. This study evaluates the potential of PRISMA hyperspectral imagery for multi-temporal mapping of AMD-related minerals in two mining-affected drainage basins: Beal and Gorguel. Key minerals indicative of AMD, namely iron oxides and hydroxides (hematite, jarosite, goethite), gypsum, and aluminium-bearing clays, were identified and mapped using band ratios applied to PRISMA data acquired on five dates between 2020 and 2024. Additionally, Sentinel-2 data were incorporated into the analysis because their higher temporal resolution complements the iron oxide and hydroxide evolution derived from PRISMA. Results reveal distinct temporal and spatial patterns in mineral distribution, influenced by seasonal precipitation and climatic factors. Jarosite was predominant after torrential precipitation events, reflecting recent AMD deposition, while gypsum exhibited seasonal variability linked to evaporation cycles. Goethite and hematite increased in drier conditions, indicating transitions in oxidation states. Validation using X-ray diffraction (XRD), laboratory spectral curves, and a larger time series of Sentinel-2 imagery demonstrated strong correlations, confirming PRISMA’s effectiveness for identifying and monitoring iron oxides, hydroxides, and gypsum. However, challenges such as noise, striping effects, and limited image availability affected the accuracy of aluminium-bearing clay mapping and limited long-term trend analysis.

(This article belongs to the Section Remote Sensing in Geology, Geomorphology and Hydrology)

17 pages, 3726 KiB

Open Access Article

LEAD-Net: Semantic-Enhanced Anomaly Feature Learning for Substation Equipment Defect Detection

by Linghao Zhang, Junwei Kuang, Yufei Teng, Siyu Xiang, Lin Li and Yingjie Zhou

Processes 2025, 13(8), 2341; https://doi.org/10.3390/pr13082341 - 23 Jul 2025

Viewed by 270

Abstract

Substation equipment defect detection is a critical aspect of ensuring the reliability and stability of modern power grids. However, existing deep-learning-based detection methods often face significant challenges in real-world deployment, primarily due to low detection accuracy and inconsistent anomaly definitions across different substation environments. To address these limitations, this paper proposes the Language-Guided Enhanced Anomaly Power Equipment Detection Network (LEAD-Net), a novel framework that leverages text-guided learning during training to significantly improve defect detection performance. Unlike traditional methods, LEAD-Net integrates textual descriptions of defects, such as historical maintenance records or inspection reports, as auxiliary guidance during training. A key innovation is the Language-Guided Anomaly Feature Enhancement Module (LAFEM), which refines channel attention using these text features. Crucially, LEAD-Net operates solely on image data during inference, ensuring practical applicability. Experiments on a real-world substation dataset, comprising 8307 image–text pairs and encompassing a diverse range of defect categories encountered in operational substation environments, demonstrate that LEAD-Net significantly outperforms state-of-the-art object detection methods (Faster R-CNN, YOLOv9, DETR, and Deformable DETR), achieving a mean Average Precision (mAP) of 79.51%. Ablation studies confirm the contributions of both LAFEM and the training-time text guidance. The results highlight the effectiveness and novelty of using training-time defect descriptions to enhance visual anomaly detection without requiring text input at inference. Full article

(This article belongs to the Special Issue Smart Optimization Techniques for Microgrid Management)

19 pages, 2016 KiB

Open Access Article

A Robust and Energy-Efficient Control Policy for Autonomous Vehicles with Auxiliary Tasks

by Yabin Xu, Chenglin Yang and Xiaoxi Gong

Electronics 2025, 14(15), 2919; https://doi.org/10.3390/electronics14152919 - 22 Jul 2025

Viewed by 268

Abstract

We present a lightweight autonomous driving method that uses a low-cost camera, a simple end-to-end convolutional neural network architecture, and smoother driving techniques to achieve energy-efficient vehicle control. Instead of directly constructing a mapping from raw sensory input to the action, our network takes the frame-to-frame visual difference as one of the crucial inputs to produce control commands, including the steering angle and the speed value at each time step. This choice of input highlights the most relevant parts of raw image pairs and reduces the unnecessary visual complexity caused by different road and weather conditions. Additionally, our network predicts the vehicle’s upcoming control commands by incorporating a view synthesis component into the model. The view synthesis, as an auxiliary task, aims to infer a novel view for the future from the historical environment transformation cue. By combining both the current and upcoming control commands, our framework achieves driving smoothness, which is highly associated with energy efficiency. We perform experiments on benchmarks to evaluate the reliability under different driving conditions in terms of control accuracy. We deploy a mobile robot outdoors to evaluate the power consumption of different control policies. The quantitative results demonstrate that our method can achieve energy efficiency in the real world.

(This article belongs to the Special Issue Simultaneous Localization and Mapping (SLAM) of Mobile Robots)

28 pages, 2518 KiB

Open Access Article

Enhancing Keyword Spotting via NLP-Based Re-Ranking: Leveraging Semantic Relevance Feedback in the Handwritten Domain

by Stergios Papazis, Angelos P. Giotis and Christophoros Nikou

Electronics 2025, 14(14), 2900; https://doi.org/10.3390/electronics14142900 - 20 Jul 2025

Viewed by 341

Abstract

Handwritten Keyword Spotting (KWS) remains a challenging task, particularly in segmentation-free scenarios where word images must be retrieved and ranked based on their similarity to a query without relying on prior page-level segmentation. Traditional KWS methods primarily focus on visual similarity, often overlooking the underlying semantic relationships between words. In this work, we propose a novel NLP-driven re-ranking approach that refines the initial ranked lists produced by state-of-the-art KWS models. By leveraging semantic embeddings from pre-trained BERT-like Large Language Models (LLMs, e.g., RoBERTa, MPNet, and MiniLM), we introduce a relevance feedback mechanism that improves both verbatim and semantic keyword spotting. Our framework operates in two stages: (1) projecting retrieved word image transcriptions into a semantic space via LLMs and (2) re-ranking the retrieval list using a weighted combination of semantic and exact relevance scores based on pairwise similarities with the query. We evaluate our approach on the widely used George Washington (GW) and IAM collections using two cutting-edge segmentation-free KWS models, which are further integrated into our proposed pipeline. Our results show consistent gains in Mean Average Precision (mAP), with improvements of up to 2.3% (from 94.3% to 96.6%) on GW and 3% (from 79.15% to 82.12%) on IAM. Even when mAP gains are smaller, qualitative improvements emerge: semantically relevant but inexact matches are retrieved more frequently without compromising exact match recall. We further examine the effect of fine-tuning transformer-based OCR (TrOCR) models on historical GW data to align textual and visual features more effectively. Overall, our findings suggest that semantic feedback can enhance retrieval effectiveness in KWS pipelines, paving the way for lightweight hybrid vision-language approaches in handwritten document analysis.
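The second stage described in this abstract, re-ranking by a weighted combination of semantic and exact relevance scores, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the scoring formula, so the weight `alpha`, the use of cosine similarity as the semantic score, string equality as the exact score, and the toy two-dimensional embeddings are all hypothetical stand-ins (a real pipeline would take embeddings from a language model).

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rerank(query_text, query_vec, candidates, alpha=0.5):
    """Sort candidate transcriptions by alpha * semantic + (1 - alpha) * exact.

    candidates: list of (transcription, embedding) pairs.
    alpha: hypothetical weight between semantic and exact relevance.
    """
    scored = []
    for text, vec in candidates:
        semantic = cosine(query_vec, vec)             # assumed semantic score
        exact = 1.0 if text == query_text else 0.0    # assumed exact-match score
        scored.append((alpha * semantic + (1.0 - alpha) * exact, text))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [text for _, text in scored]

# Toy retrieval list with made-up embeddings (illustration only).
candidates = [
    ("orders", [0.0, 1.0]),
    ("letters", [0.9, 0.1]),
    ("letter", [1.0, 0.0]),
]
print(rerank("letter", [1.0, 0.0], candidates, alpha=0.5))
```

With this kind of scoring, an exact match stays at the top while a semantically close but inexact transcription moves ahead of unrelated words, which mirrors the qualitative behaviour the abstract reports.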

(This article belongs to the Special Issue AI Synergy: Vision, Language, and Modality)

21 pages, 2143 KiB

Open Access Feature Paper Article

Physically Informed Synthetic Data Generation and U-Net Generative Adversarial Network for Palimpsest Reconstruction

by Jose L. Salmeron and Eva Fernandez-Palop

Mathematics 2025, 13(14), 2304; https://doi.org/10.3390/math13142304 - 18 Jul 2025

Viewed by 244

Abstract

This paper introduces a novel adversarial learning framework for reconstructing hidden layers in historical palimpsests. Recovering text hidden in historical palimpsests is complicated by various artifacts, such as ink diffusion, degradation of the writing substrate, and interference between overlapping layers. To address these challenges, the authors of this paper combine a synthetic data generator grounded in physical modeling with three generative architectures: a baseline VAE, an improved variant with stronger regularization, and a U-Net-based GAN that incorporates residual pathways and a mixed loss strategy. The synthetic data engine aims to emulate key degradation effects—such as ink bleeding, the irregularity of parchment fibers, and multispectral layer interactions—using stochastic approximations of underlying physical processes. The quantitative results suggest that the U-Net-based GAN architecture outperforms the VAE-based models by a notable margin, particularly in scenarios with heavy degradation or overlapping ink layers. By relying on synthetic training data, the proposed method facilitates the non-invasive recovery of lost text in culturally important documents, and does so without requiring costly or specialized imaging setups. Full article

(This article belongs to the Section E1: Mathematics and Computer Science)
