Search Results (687)

Search Parameters:
Keywords = Generative Pre-training Transformer

24 pages, 2158 KB  
Review
Augmenting Large Language Models with External Data Sources: A Systematic Review of Methodologies, Performance Metrics, and Information Fidelity
by Soham Mukherjee, John Le and Chau Nguyen
Knowledge 2026, 6(3), 13; https://doi.org/10.3390/knowledge6030013 (registering DOI) - 25 Jun 2026
Abstract
Large Language Models (LLMs) have emerged as transformative tools across various domains, exhibiting remarkable capabilities in natural language processing and generation. However, their reliance on static pre-training data limits their ability to access up-to-date and domain-specific information. Existing research often treats augmentation strategies in isolation, and few efforts have systematically compared them through the lens of information integrity. This review focuses specifically on Retrieval-Augmented Generation (RAG) and fine-tuning, identifying them as the two dominant paradigms for integrating external knowledge: RAG for retrieval-based context injection and fine-tuning for parametric knowledge adaptation. While existing surveys predominantly focus on performance metrics such as accuracy or latency, this paper addresses the critical gap of data fidelity: the preservation of truthfulness, integrity, and fairness during augmentation. We systematically synthesize empirical findings from diverse methodologies to determine how each approach mitigates hallucinations and bias. By comparing the trade-offs between retrieval-based context injection and parametric knowledge adaptation, this survey offers a structured taxonomy, a unified evaluation framework, and actionable insights to guide future research and the practical deployment of robust, high-fidelity LLMs.
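To make "retrieval-based context injection" concrete, the following is a toy illustration, not code from the review: a bag-of-words retriever ranks passages by cosine similarity to the query and prepends the top-k passages to the prompt. The corpus and the helpers `tokenize`, `retrieve`, and `build_prompt` are hypothetical; production RAG systems typically use dense embeddings and a vector index rather than raw term counts.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase alphanumeric tokens; a stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Rank passages by similarity to the query and keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(corpus, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus: list[str], k: int = 2) -> str:
    # Inject the retrieved passages as context ahead of the question.
    context = "\n".join(f"- {p}" for p in retrieve(query, corpus, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"


# Hypothetical toy corpus.
corpus = [
    "The 2026 rate limit for the API is 500 requests per minute.",
    "Fine-tuning updates model weights on domain data.",
    "Retrieval-augmented generation adds retrieved text to the prompt.",
]
print(build_prompt("What is the API rate limit?", corpus, k=1))
```

Updating what the model "knows" here only means editing `corpus`; a fine-tuned model would instead need retraining to absorb the same change.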

25 pages, 8611 KB  
Article
Enhancing Plunger Lift Anomaly Detection: A Vision Transformer-Based Approach Leveraging Pretrained Models and Graphic Data Augmentation
by Jianjun Zhu, Yujun Liu, Haoyu Wang, Mai Chen, Nan Li, Guangqiang Cao, Ruizhi Zhong and Haiwen Zhu
Processes 2026, 14(13), 2045; https://doi.org/10.3390/pr14132045 (registering DOI) - 24 Jun 2026
Abstract
Plunger lift systems are vital for optimizing production in gas wells, but their performance can be compromised by various operational anomalies. Traditional diagnostic methods and conventional convolutional neural network (CNN) approaches often struggle with the complex, transient data from these systems, particularly in capturing long-range temporal dependencies and generalizing from limited, imbalanced datasets. This study presents an enhanced diagnostic framework for plunger lift anomaly detection that leverages the strengths of a pre-trained Vision Transformer (ViT). The methodology transforms one-dimensional time-series pressure data into two-dimensional images using the element-wise sum of the Gramian Angular Summation Field (GASF) and the Gramian Angular Difference Field (GADF), which simultaneously preserves global operational trends and local transient dynamics for analysis by a vision model. The ViT, initialized with pre-trained weights, is further tuned using Bayesian optimization (BO) of its hyperparameters, and a tailored data augmentation pipeline is employed to improve robustness. Comparative evaluations demonstrate that the proposed ViT-based approach, particularly the ViT + GAF + BO model, significantly outperforms baseline CNN models and their optimized variants, achieving the highest precision, recall, and F1-score (F1 = 0.93). Visualizations using t-SNE confirm the ViT’s superior capability in learning discriminative features, showing well-separated clusters for different operational conditions compared with CNNs. This research underscores the potential of pre-trained ViTs combined with appropriate data representation and optimization techniques for accurate and reliable anomaly detection in plunger lift systems.
(This article belongs to the Special Issue Hybrid Artificial Intelligence for Smart Process Control)
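A minimal NumPy sketch of the GASF + GADF image encoding, assuming the standard definitions: the series is min-max rescaled to [-1, 1], mapped to angles φ = arccos(x̃), and encoded as GASF[i, j] = cos(φ_i + φ_j) and GADF[i, j] = sin(φ_i − φ_j). The pressure values are hypothetical, and the paper's exact rescaling, GADF sign convention, and image resolution are not stated in the abstract.

```python
import numpy as np


def gramian_angular_fields(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (GASF, GADF) for a 1-D series using the polar-coordinate encoding."""
    x = np.asarray(x, dtype=float)
    # Min-max rescale to [-1, 1] so that arccos is defined; a constant series maps to 0.
    lo, hi = x.min(), x.max()
    scaled = 2.0 * (x - lo) / (hi - lo) - 1.0 if hi > lo else np.zeros_like(x)
    scaled = np.clip(scaled, -1.0, 1.0)  # guard against rounding just outside [-1, 1]
    phi = np.arccos(scaled)
    gasf = np.cos(phi[:, None] + phi[None, :])  # cos(phi_i + phi_j): symmetric
    gadf = np.sin(phi[:, None] - phi[None, :])  # sin(phi_i - phi_j): antisymmetric
    return gasf, gadf


def gaf_sum_image(x: np.ndarray) -> np.ndarray:
    # Element-wise sum of GASF and GADF, values in [-2, 2].
    gasf, gadf = gramian_angular_fields(x)
    return gasf + gadf


pressure = np.array([1.0, 1.4, 2.2, 3.1, 2.5, 1.8, 1.2, 1.0])  # hypothetical pressure trace
img = gaf_sum_image(pressure)
print(img.shape)  # (8, 8)
```

In a full pipeline the summed image would typically be resized to the ViT's input resolution before inference; that step is omitted here.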

23 pages, 1471 KB  
Article
Transformer-Based Clinical Annotation of Lung Cancer Reports: A Benchmark and Fine-Tuning Study on a Novel Tunisian Corpus
by Ranim Yahyaoui, Ismail Dergaa, Jean Noël Nikiema, Halil İbrahim Ceylan, Nicola Luigi Bragazzi, Saoussen Hantous-Zannad and Hanene Boussi Rahmouni
Bioengineering 2026, 13(7), 724; https://doi.org/10.3390/bioengineering13070724 (registering DOI) - 24 Jun 2026
Abstract
Background: Lung cancer causes more deaths than any other malignancy worldwide, accounting for 2.2 million new cases and 1.8 million deaths in 2020. Extracting structured clinical knowledge from unstructured French-language oncology records remains an unresolved methodological problem in Tunisian and Francophone healthcare systems, where validated natural language processing tools do not yet exist. This study examined the effectiveness of transformer-based named-entity recognition for the automated clinical annotation of Tunisian lung cancer reports. Aim: The study aimed to (i) establish performance baselines for four transformer-based models on a publicly available thoracic radiology dataset, (ii) evaluate five models, including a French biomedical specialist, on a newly constructed Tunisian clinical corpus, and (iii) demonstrate the feasibility of prototype deployment for structured clinical decision support. Methods: An initial comparative study evaluated BERT, RoBERTa, BioClinicalBERT, and CamemBERT using the official RadGraph dataset partitions, which comprise 600 annotated thoracic radiology reports in a standardized 80/10/10 split. Subsequently, five models were evaluated on 200 manually annotated diagnostic reports from Mami Pneumo-Phthisiology Hospital, Tunis. For the Tunisian corpus, five-fold cross-validation was used to obtain robust performance estimates, followed by a final evaluation on a dedicated hold-out test set. All models were trained for at most 10 epochs with a learning rate of 5 × 10⁻⁵ and a batch size of 16. Results: In the initial comparative study on RadGraph, RoBERTa was the top performer, achieving the highest F1-score of 0.873 (precision: 0.869, recall: 0.877); we therefore evaluated its specialized biomedical variant, DR-BERT, on our Tunisian clinical dataset. DR-BERT generalized well on the hold-out test set with an F1-score of 0.824, outperforming the baseline RoBERTa (test F1: 0.791) and performing competitively with multilingual BERT (0.843 ± 0.005 in five-fold cross-validation). A prototype interface generated structured clinical summaries covering prior conditions, imaging modalities, and TNM staging. Conclusion: Language- and domain-adapted transformer models effectively extract structured clinical entities from French-language Tunisian lung cancer reports. DR-BERT’s superior generalization on unseen data confirms that biomedical pretraining in the target language is a key driver of robust performance on specialized French oncology text. This work establishes foundational infrastructure for NLP-driven oncology data management in Tunisia and comparable Francophone settings.
(This article belongs to the Special Issue Biomedical Data Mining: Emerging Methods and Applications)
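As a quick consistency check on the RadGraph result, the reported F1-score is the harmonic mean of the reported precision P and recall R:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.869 \times 0.877}{0.869 + 0.877}
    = \frac{1.524226}{1.746}
    \approx 0.873
```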

17 pages, 14712 KB  
Article
LLM-Integrated Semantic Deep Learning Framework for Automated Floor Plan Analysis, Area Estimation, and Compliance Assessment of Existing Buildings
by Yuxuan Guo, Xiaodeng Zhou and Su-Kit Tang
Appl. Sci. 2026, 16(13), 6290; https://doi.org/10.3390/app16136290 (registering DOI) - 23 Jun 2026
Viewed by 65
Abstract
The digitization of existing building stock often depends on legacy 2D raster floor plans (scanned drawings, PDF exports, or photographs) because structured building information models are frequently unavailable for older properties. Manual measurement and visual inspection of such documents are time-consuming and error-prone. This paper presents an integrated deep learning pipeline that extracts semantic information from unstructured two-dimensional floor plan images of existing structures and supports preliminary compliance screening via locally deployed large language models. The pipeline employs YOLOv8 for the localization and classification of 18 architectural symbols and furniture items, and a U-Net with a ResNet34 encoder for the semantic segmentation of walls and interior room spaces. To translate pixel-level predictions into physical metrics, we implement an area calculation module based on user-defined reference scale calibration. An LLM evaluation module, deployed locally via Ollama with a retrieval-augmented generation pipeline, interprets extracted room metrics and flags potential non-compliance against referenced residential design guidelines; it is intended for assessing existing layouts rather than for generative co-design. We expand a core dataset of 101 manually annotated source floor plans to 303 augmented instances using label-aligned geometric transformations, while reporting generalization in terms of the 101 unique source plans. On the held-out validation split (10 source plans), YOLOv8 achieves 92.3% mAP50 versus 87.2% for a Faster R-CNN reference model on the same data split (the detection baselines differ in training epochs and pretraining; see Experiments); U-Net achieves 95.71% mIoU, surpassing DeepLabv3+ (93.2%) under matched segmentation training settings. The system is deployed as an interactive web application for legacy building surveys and preliminary regulatory review when only two-dimensional documentation is available.
(This article belongs to the Topic AI Agents: Progress, Architecture, and Applications)
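A minimal sketch of reference-scale area calculation, assuming the segmentation step yields an integer room-label mask (0 = background) and the user marks a reference segment of known length; `meters_per_pixel`, `room_areas_m2`, and all numbers are hypothetical. Each pixel covers s² square metres, where s is the calibrated metres-per-pixel scale, so a room's area is its pixel count times s². This ignores perspective distortion, so it only holds for orthographic scans or plans that have already been rectified.

```python
import numpy as np


def meters_per_pixel(ref_length_m: float, ref_length_px: float) -> float:
    # Scale factor from a user-marked reference of known physical length.
    if ref_length_px <= 0:
        raise ValueError("reference length in pixels must be positive")
    return ref_length_m / ref_length_px


def room_areas_m2(label_mask: np.ndarray, scale_m_per_px: float) -> dict[int, float]:
    # Each pixel covers scale^2 square metres; sum pixels per room label, skipping background 0.
    px_area = scale_m_per_px ** 2
    labels, counts = np.unique(label_mask, return_counts=True)
    return {int(l): float(c) * px_area for l, c in zip(labels, counts) if l != 0}


mask = np.zeros((300, 300), dtype=int)
mask[0:200, 0:100] = 1      # hypothetical room 1: 200 x 100 px
mask[0:100, 150:250] = 2    # hypothetical room 2: 100 x 100 px
scale = meters_per_pixel(5.0, 100.0)  # user marks a 5 m wall spanning 100 px
areas = room_areas_m2(mask, scale)
print(areas)
```

With a 5 m wall spanning 100 px, s = 0.05 m/px, and the two hypothetical rooms come out at approximately 50 m² and 25 m².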

14 pages, 4300 KB  
Article
DeepFlare: Weakly Supervised Cross-Modality Translation and Segmentation for Immunohistochemistry and Immunofluorescence Imaging
by Md. Tamim, Aditto Rahman, Redwan Hossain, Tausib Abrar and Riasat Khan
BioMedInformatics 2026, 6(3), 37; https://doi.org/10.3390/biomedinformatics6030037 (registering DOI) - 22 Jun 2026
Viewed by 327
Abstract
Immunohistochemistry (IHC) is a widely used method for detecting specific proteins in tissue samples, helping diagnose diseases such as cancer. Traditional analysis methods rely heavily on human interpretation, which can lead to inconsistencies. In this study, we propose DeepFlare, a weakly supervised deep learning framework for cross-modality translation and segmentation of immunofluorescence and immunohistochemistry images. The proposed method utilizes multiplex immunofluorescence (mpIF) and co-registered IHC images, combined with preprocessing techniques such as affine transformation, stain normalization, noise reduction, and artifact removal. Multiple imaging channels, including hematoxylin, DAPI, Lap2, and nuclear envelope signals, are leveraged to generate segmentation masks using a U-Net++ architecture. The final segmentation mask is obtained through weighted fusion of modality-specific outputs. A generative adversarial network (GAN) is employed to measure translation fidelity between generated and real images. Weakly supervised learning techniques, including image-level supervision and consistency constraints, are applied to enhance performance under limited annotation scenarios. Pretrained pathology foundation encoders such as UNI and Virchow are integrated to extract multi-scale morphological and contextual features. Explainable AI techniques are incorporated to highlight critical regions and refine model attention. Experimental results demonstrate strong performance, achieving an SSIM of 0.7077 for image translation and a Dice score of 0.7424 for segmentation. The integration of the UNI encoder provides marginal improvement over the baseline (0.72 Dice score), indicating limited domain adaptation without fine-tuning on the dataset of 1264 training samples.
(This article belongs to the Section Imaging Informatics)
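The abstract's two segmentation ingredients, weighted fusion of modality-specific outputs and the Dice score, can be sketched in NumPy as follows. The per-modality probability maps, fusion weights, and 0.5 threshold are hypothetical; the paper's actual fusion weights are not given in the abstract.

```python
import numpy as np


def weighted_fusion(prob_maps: list[np.ndarray], weights: list[float], threshold: float = 0.5) -> np.ndarray:
    # Weighted average of per-modality foreground probabilities, then binarize.
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    fused = sum(wi * p for wi, p in zip(w, prob_maps))
    return fused >= threshold


def dice(pred: np.ndarray, target: np.ndarray, eps: float = 1e-8) -> float:
    # Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; eps avoids 0/0 on empty masks.
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return float((2.0 * inter + eps) / (pred.sum() + target.sum() + eps))


target = np.zeros((4, 4), dtype=bool)
target[1:3, 1:3] = True
p_hema = np.where(target, 0.9, 0.1)    # hypothetical hematoxylin-channel output
p_dapi = np.where(target, 0.6, 0.3)    # hypothetical DAPI-channel output
p_noisy = np.full((4, 4), 0.4)         # hypothetical weak channel with one spurious peak
p_noisy[0, 0] = 0.95
mask = weighted_fusion([p_hema, p_dapi, p_noisy], [0.5, 0.3, 0.2])
print(dice(mask, target))
```

In this toy case the fused mask matches the target exactly, so Dice is 1.0; the spurious peak in the weak channel is outvoted by the other two.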

30 pages, 7012 KB  
Article
TerrainFormer: World Model-Guided Decision Transformer for Autonomous Off-Road Navigation
by Yongzhi Yang and Kenneth Ricks
Sensors 2026, 26(12), 3795; https://doi.org/10.3390/s26123795 - 14 Jun 2026
Viewed by 427
Abstract
Autonomous navigation in unstructured off-road environments presents fundamental challenges due to terrain heterogeneity, the absence of structured road markings, and the necessity for real-time traversability reasoning from raw sensory observations. We present TerrainFormer, a hierarchical framework that integrates a world model for terrain dynamics prediction with a temporal decision transformer for action selection. Our methodology employs a two-phase training paradigm: (1) self-supervised world model pretraining on LiDAR point clouds to learn terrain representations encompassing traversability, elevation, and semantic segmentation; (2) behavioral cloning of the decision transformer conditioned on frozen world model features with temporally derived goal directions. The world model processes raw 3D LiDAR point clouds through a PointPillars encoder for real-time bird’s-eye-view (BEV) projection, followed by a Vision Transformer backbone that produces latent terrain representations. A principal contribution is our cross-dataset generalization paradigm: the world model is trained on separate datasets while the decision transformer is trained on separate sequences, ensuring zero data overlap between training phases. We introduce automatic goal direction computation from vehicle pose trajectories, enabling the model to learn directionally conditioned navigation policies. To address the class imbalance inherent in off-road driving data, we employ focal loss with inverse-frequency class weighting and action-chunk supervision. Experimental evaluation on the RELLIS-3D dataset achieves 87.31% test accuracy with 0.7948 macro F1 across all 12 action classes. The world model’s predicted future frames produce only a 0.79% accuracy drop versus ground-truth observations, with 98.82% action agreement, demonstrating effective cross-dataset generalization for real-time off-road navigation.
(This article belongs to the Special Issue Intelligent Sensors for Smart and Autonomous Vehicles: 2nd Edition)
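A NumPy sketch of multi-class focal loss with inverse-frequency class weights, FL = −α_y (1 − p_y)^γ log p_y with α_c = N / (K · n_c), where N is the number of samples, K the number of classes, and n_c the count of class c. The γ = 2 default, this particular weighting formula, and the toy labels are assumptions for illustration; the paper's exact weighting and its action-chunk supervision are not reproduced here.

```python
import numpy as np


def inverse_frequency_weights(labels: np.ndarray, num_classes: int) -> np.ndarray:
    # alpha_c = N / (K * n_c); classes absent from the data get weight 0.
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    weights = np.zeros(num_classes)
    present = counts > 0
    weights[present] = len(labels) / (num_classes * counts[present])
    return weights


def focal_loss(logits: np.ndarray, labels: np.ndarray, alpha: np.ndarray, gamma: float = 2.0) -> float:
    # Mean over samples of -alpha_y * (1 - p_y)^gamma * log(p_y).
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerically stable log-softmax
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    log_pt = log_probs[np.arange(len(labels)), labels]
    pt = np.exp(log_pt)
    return float(np.mean(-alpha[labels] * (1.0 - pt) ** gamma * log_pt))


labels = np.array([0, 0, 0, 0, 0, 0, 1, 1, 2])  # hypothetical, imbalanced action labels
alpha = inverse_frequency_weights(labels, num_classes=3)
rng = np.random.default_rng(0)
logits = rng.normal(size=(len(labels), 3))      # hypothetical model outputs
print(alpha)  # [0.5 1.5 3. ]
print(focal_loss(logits, labels, alpha))
```

Setting γ = 0 with unit weights recovers ordinary cross-entropy, which is a convenient sanity check; γ > 0 down-weights samples the model already classifies confidently.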

18 pages, 6871 KB  
Article
Series Arc Fault Detection Using Differential Higher-Order Cumulants and Symmetric Stacked Autoencoder
by Zhicong Su, Schweitzer Patrick, Haoyong Chen and Ruobo Chu
Symmetry 2026, 18(6), 1003; https://doi.org/10.3390/sym18061003 - 11 Jun 2026
Viewed by 191
Abstract
In low-voltage distribution systems, series arc faults caused by poor contact and loose connections are a leading cause of electrical fires. Because of the negative resistance characteristics of arcs, such faults are difficult to detect using conventional overcurrent or leakage protectors. Existing detection methods predominantly rely on wavelet-based feature extraction or threshold-based classifiers. Wavelet transforms require predefined basis functions and lack adaptability to non-stationary current signals from appliances such as induction cookers. Threshold-based classifiers produce excessive false alarms under varying load conditions, as normal non-stationary load waveforms share high-frequency characteristics with arc fault signatures. As a result, existing arc fault protectors exhibit high false alarm rates, limiting practical deployment. To address these limitations, this study proposes a method for diagnosing low-voltage series arc faults based on differential-sliding window higher-order cumulants (HoCs) and stacked autoencoders (SAEs). The method first employs a differential-sliding time window approach to extract HoC features from current signals across seven typical loads, establishing a feature vector database for arc fault patterns. A symmetric SAE is then constructed and trained with layer-wise pretraining, and its hyperparameters are optimized to select the model with the best generalization performance. Experimental results demonstrate that the proposed method achieves a detection accuracy of 96.4% with a false alarm rate of 0% across all tested loads.
(This article belongs to the Special Issue Symmetry in Fault Detection and Diagnosis for Dynamic Systems)
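The following sketch illustrates one plausible reading of differential-sliding-window HoC features: first-difference the current signal, slide a fixed window over it, and compute the second-, third-, and fourth-order cumulants of each window (κ2 = m2, κ3 = m3, κ4 = m4 − 3m2², with m_k the k-th central moment). The interpretation of "differential", the window and step sizes, and the synthetic signals are all assumptions, not the authors' settings.

```python
import numpy as np


def cumulants_234(window: np.ndarray) -> tuple[float, float, float]:
    # Zero-lag cumulants: k2 = variance, k3 = third central moment, k4 = m4 - 3 * m2^2.
    c = window - window.mean()
    m2 = np.mean(c ** 2)
    m3 = np.mean(c ** 3)
    m4 = np.mean(c ** 4)
    return float(m2), float(m3), float(m4 - 3.0 * m2 ** 2)


def differential_hoc_features(current: np.ndarray, win: int, step: int) -> np.ndarray:
    # Difference the signal to suppress the slowly varying load component,
    # then compute cumulants over sliding windows -> array of shape (n_windows, 3).
    d = np.diff(np.asarray(current, dtype=float))
    starts = range(0, len(d) - win + 1, step)
    return np.array([cumulants_234(d[s:s + win]) for s in starts])


fs, f0 = 10_000, 50                 # hypothetical sampling rate (Hz) and mains frequency (Hz)
t = np.arange(2000) / fs            # 0.2 s of samples
normal = np.sin(2 * np.pi * f0 * t)
rng = np.random.default_rng(1)
# Toy "arcing" current: broadband noise added away from zero crossings (illustrative only).
arcing = normal + 0.05 * rng.standard_normal(t.size) * (np.abs(normal) > 0.2)
feats_normal = differential_hoc_features(normal, win=200, step=100)
feats_arc = differential_hoc_features(arcing, win=200, step=100)
print(feats_normal.shape)  # (18, 3)
```

The per-window feature vectors are the kind of input a stacked autoencoder would then compress and classify; the SAE itself is not shown.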

21 pages, 21987 KB  
Article
A Spatial Distribution Probability-Guided Detection Framework for Underwater Sonar Imagery
by Dayu Jia, Yan Huang, Jianan Qiao, Zhenyu Wang, Hao Feng and Jiancheng Yu
Remote Sens. 2026, 18(12), 1906; https://doi.org/10.3390/rs18121906 - 9 Jun 2026
Viewed by 183
Abstract
Underwater target detection via side-scan sonar is vital for defense and economic applications but is hindered by sparse targets, high data costs, and feature extraction difficulties caused by textureless acoustic data and limited samples. To overcome these limitations, particularly for few-shot, small-object detection, we propose a Spatial Distribution Probability-Guided Detection Framework to aid Unmanned Underwater Vehicles (UUVs) in precise localization and clustering. The framework features a novel module that leverages a pre-trained Vision Foundation Model (DINOv3) to generate spatial distribution probability maps, which guide a Transformer-based network toward accurate detection with scarce data. It also incorporates a Target Position Calculation Module, which determines global geographic coordinates, and a DBSCAN-based post-processing module, which clusters discrete detection points. Experiments were conducted on both a Public Mine Detection Dataset and a self-collected dataset containing simulated mines and buoys. Ablation studies and comparison experiments demonstrated that the proposed guidance mechanism significantly improves detection performance. Furthermore, two comb-search missions verified that the system could accurately locate and cluster targets, distinguishing real targets from false detections (noise). These results confirm the framework’s efficacy in enabling high-precision perception and autonomous operations for complex underwater inspection tasks.
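A dependency-free sketch of DBSCAN-based post-processing: repeated detections of the same object collapse into one cluster, isolated points are labelled noise (−1), and each cluster's centroid serves as a single target position. It assumes detections have already been converted to a local metric frame (for example, metres east/north); `eps`, `min_samples`, and the coordinates are hypothetical. scikit-learn's `sklearn.cluster.DBSCAN` implements the same algorithm with spatial indexing.

```python
import math


def dbscan(points: list[tuple[float, float]], eps: float, min_samples: int) -> list[int]:
    """Label each point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels: list = [None] * n  # None = not yet visited

    def neighbors(i: int) -> list[int]:
        # Indices within eps of point i (including i itself); brute force O(n).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = 0
    for i in range(n):
        if labels[i] is not None:
            continue
        nbrs = neighbors(i)
        if len(nbrs) < min_samples:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in nbrs if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_samples:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
        cluster += 1
    return labels


def cluster_centroids(points, labels) -> dict[int, tuple[float, float]]:
    # Mean position of each cluster, ignoring noise points.
    sums: dict[int, tuple[float, float, int]] = {}
    for (x, y), lab in zip(points, labels):
        if lab == -1:
            continue
        sx, sy, c = sums.get(lab, (0.0, 0.0, 0))
        sums[lab] = (sx + x, sy + y, c + 1)
    return {lab: (sx / c, sy / c) for lab, (sx, sy, c) in sums.items()}


detections = [(0.0, 0.0), (0.5, 0.2), (0.3, -0.4),      # repeated hits on hypothetical target A
              (20.0, 20.0), (20.4, 19.8), (19.7, 20.3),  # repeated hits on hypothetical target B
              (50.0, -10.0)]                             # isolated false detection
print(dbscan(detections, eps=1.0, min_samples=2))  # [0, 0, 0, 1, 1, 1, -1]
```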

22 pages, 8252 KB  
Article
Event-Based Sentiment Analysis of Financial News Using Large Language Models: A Comprehensive Framework Integrating RAG, GNNs, and Multi-Agent Systems
by Amit Kulkarni and Varun Dogra
Information 2026, 17(6), 558; https://doi.org/10.3390/info17060558 - 5 Jun 2026
Viewed by 333
Abstract
The proliferation of digital financial news offers unprecedented opportunities for automated analysis of market-moving events. This paper presents a framework for event-based sentiment analysis of financial news that leverages Large Language Models (LLMs). The approach brings together three complementary ideas: Retrieval-Augmented Generation (RAG) for contextual enhancement, Graph Neural Networks (GNNs) for modeling relationships between events, and a multi-agent ensemble for orchestrated reasoning. The methodology targets well-known difficulties in financial text processing, including domain-specific terminology, implicit event detection, and temporal reasoning, and it combines transformer-based event extraction with sentiment classification enhanced by external knowledge retrieval. We evaluate six model configurations on an aggregated corpus of 14,851 financial news samples. On the event-detection task, every configuration reaches a weighted F1-score of 100%; we show that this is a ceiling effect produced by a binary event/no-event formulation over a highly imbalanced dataset rather than evidence of a difficult problem being solved, and we discuss what it implies for how such systems should be evaluated. On three-way sentiment classification, the strongest configuration, the multi-agent ensemble, reaches 87.4% accuracy, narrowly ahead of a RoBERTa (Robustly Optimized BERT Pretraining Approach) baseline at 87.2%; however, because the gaps between models are small and we did not run significance tests, we report them as indicative rather than definitive. The GNN component is described as part of the proposed design but has not yet been validated experimentally, and we state this limitation explicitly. The framework produces interpretable, structured outputs suited to downstream use in algorithmic trading, risk assessment, and investment decision support, and the paper contributes a reusable financial NLP pipeline together with a candid account of where the current evidence is, and is not, convincing.
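Why weighted F1 is a weak signal on an imbalanced binary task can be illustrated with hypothetical labels (this does not reproduce the paper's data or its 100% result): weighted F1 averages per-class F1 by class support, so it is dominated by the majority class. A degenerate model that always predicts the majority class already scores about 0.93 weighted F1, while macro F1 exposes its complete failure on the minority class.

```python
def f1_per_class(y_true: list[int], y_pred: list[int], cls: int) -> float:
    # One-vs-rest F1 = 2TP / (2TP + FP + FN); defined as 0 when TP = 0.
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def weighted_and_macro_f1(y_true: list[int], y_pred: list[int]) -> tuple[float, float]:
    classes = sorted(set(y_true))
    f1s = {c: f1_per_class(y_true, y_pred, c) for c in classes}
    support = {c: y_true.count(c) for c in classes}
    weighted = sum(f1s[c] * support[c] for c in classes) / len(y_true)  # support-weighted
    macro = sum(f1s.values()) / len(classes)                            # unweighted mean
    return weighted, macro


y_true = [1] * 95 + [0] * 5   # hypothetical: 95% "event", 5% "no event"
y_pred = [1] * 100            # degenerate model: always predicts "event"
weighted, macro = weighted_and_macro_f1(y_true, y_pred)
print(round(weighted, 3), round(macro, 3))  # 0.926 0.487
```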

14 pages, 1805 KB  
Proceeding Paper
Sentiment Analysis on Platform X Regarding the Impact of Generative AI
by Ronald Sukwadi, Riana Magdalena Silitonga, Kil Dong A, Davin Givson Saptianus, Jason Adrian Gotama, Samuel, Nicholas Evan Gunawan and Eka Rizqy Mahardika
Eng. Proc. 2026, 141(1), 6; https://doi.org/10.3390/engproc2026141006 - 4 Jun 2026
Viewed by 117
Abstract
With the rapid advancement of AI technology in education, Chat Generative Pre-trained Transformer (ChatGPT) is widely used to help students simplify the learning process, making learning more efficient and relevant. This study analyzed sentiment on social media platforms such as X to determine the impact of ChatGPT use in higher education in Indonesia. Data were collected by crawling the X platform using the RapidMiner application. The sentiment analysis aims to identify trends in positive, negative, and neutral sentiment toward the use of ChatGPT in higher education in Indonesia and Thailand, using a Naive Bayes classifier and following the Cross-Industry Standard Process for Data Mining (CRISP-DM) to design, execute, and evaluate the data analytics project. This analysis is expected to provide an initial overview of emerging sentiment trends as well as insights into how ChatGPT is perceived in the higher education environment. Overall, the results provide an overview of public perception regarding the influence of ChatGPT in higher education in Indonesia and serve as a foundation for developing policies on more responsible AI implementation in the academic environment.
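The abstract names a Naive Bayes classifier; the following is a minimal from-scratch multinomial Naive Bayes sentiment classifier with add-one smoothing, trained on a handful of hypothetical posts. It is not the RapidMiner implementation, and `NaiveBayesSentiment` is a hypothetical class; because the abstract does not describe preprocessing or features, whitespace tokenization and raw word counts are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes over whitespace tokens, with add-one (Laplace) smoothing."""

    def fit(self, docs: list[str], labels: list[str]) -> "NaiveBayesSentiment":
        self.classes = sorted(set(labels))
        # Log prior: log P(class) from label frequencies.
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.word_counts = defaultdict(Counter)
        for doc, c in zip(docs, labels):
            self.word_counts[c].update(doc.lower().split())
        self.vocab = set().union(*self.word_counts.values())
        self.totals = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_likelihood(self, word: str, c: str) -> float:
        # log P(word | class), smoothed so an unseen (word, class) pair is not zero.
        return math.log((self.word_counts[c][word] + 1) / (self.totals[c] + len(self.vocab)))

    def predict(self, doc: str) -> str:
        words = [w for w in doc.lower().split() if w in self.vocab]  # drop out-of-vocabulary words
        scores = {
            c: self.log_prior[c] + sum(self.log_likelihood(w, c) for w in words)
            for c in self.classes
        }
        return max(scores, key=scores.get)


train_docs = [  # hypothetical posts
    "chatgpt helps me study faster",
    "chatgpt makes learning easier and efficient",
    "chatgpt gives wrong answers and encourages cheating",
    "students cheating with chatgpt is worrying",
    "chatgpt was used in class today",
    "the lecture mentioned chatgpt today",
]
train_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model = NaiveBayesSentiment().fit(train_docs, train_labels)
print(model.predict("chatgpt makes study easier"))  # positive
```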

32 pages, 601 KB  
Article
BioHARP: A Feasibility Framework Toward Bio-Adaptive Human Risk Profiling for Phishing with Cost-Sensitive Learning and Scenario-Based Physiological Fusion Design
by Seydanur Ahi Duman, Rukiye Hayran and Ibrahim Sogukpinar
Appl. Sci. 2026, 16(11), 5665; https://doi.org/10.3390/app16115665 - 4 Jun 2026
Viewed by 199
Abstract
Phishing susceptibility reflects both stable psychological traits and transient user states, but confirmed victim cases remain rare in survey studies. This study evaluated BioHARP, a feasibility framework that pairs an outcome-independent psychometric prior with a prospective bio-adaptive fusion design. Using N = 136 anonymized respondents (12 strict victims), we constructed 69 pre-incident predictors after excluding administrative metadata, exposure indicators, and post-incident response items. A cost-sensitive TabTransformer was trained without synthetic minority generation and benchmarked against six conventional tabular baselines and FT-Transformer under identical splits, unified preprocessing, and model-appropriate cost-sensitive imbalance handling. Out-of-sample performance was primarily assessed with a 60-seed repeated stratified hold-out protocol with a fixed test composition of four positives and thirty negatives. Across the sixty splits, TabTransformer yielded a mean AUC of 0.534 ± 0.157, whereas CatBoost yielded 0.736 ± 0.108. On fixed Seed 100, TabTransformer reached AUC = 0.8167 and CatBoost AUC = 0.775; for the single-initialization TabTransformer, this was the best-observed split and was therefore interpreted as an optimistic upper-end point estimate. Threshold-dependent metrics were reported separately as an exploratory analysis with explicit leakage labeling. The physiological fusion layer was evaluated as an outcome-informed oracle upper bound, reaching AUC = 0.944 on Seed 100 and 0.878 ± 0.058 (range [0.73, 0.98]) across 70 alternative scenario RNG seeds. This result was interpreted strictly as theoretical headroom rather than deployment-calibrated performance. Overall, BioHARP was framed as a feasibility framework with a clearly bounded physiological-fusion design and explicit calibration and sensor requirements for future deployment-ready bio-adaptive detectors.
(This article belongs to the Section Computing and Artificial Intelligence)
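A sketch of the repeated hold-out protocol described in the abstract: each of 60 seeded splits draws a test set with exactly four positives and thirty negatives, and AUC is computed as the Mann-Whitney probability that a random positive outscores a random negative (ties count one half). The label vector mirrors the abstract's counts (N = 136, 12 victims), but `risk` is synthetic and stands in for out-of-sample scores from a model fit on each training split; the fitting step and the paper's actual seed values are not reproduced.

```python
import numpy as np


def fixed_composition_splits(y: np.ndarray, n_pos_test: int, n_neg_test: int, seeds):
    """Yield (train_idx, test_idx) with a fixed number of positives/negatives in each test set."""
    pos, neg = np.flatnonzero(y == 1), np.flatnonzero(y == 0)
    for seed in seeds:
        rng = np.random.default_rng(seed)
        test = np.concatenate([rng.choice(pos, n_pos_test, replace=False),
                               rng.choice(neg, n_neg_test, replace=False)])
        train = np.setdiff1d(np.arange(len(y)), test)
        yield train, test


def auc_mann_whitney(y_true: np.ndarray, scores: np.ndarray) -> float:
    # AUC = P(score of a random positive > score of a random negative), ties count 1/2.
    pos, neg = scores[y_true == 1], scores[y_true == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


y = np.array([1] * 12 + [0] * 124)          # N = 136 with 12 positives, as in the abstract
rng = np.random.default_rng(0)
risk = rng.normal(size=y.size) + 0.8 * y    # synthetic scores, not the paper's model
aucs = [auc_mann_whitney(y[te], risk[te]) for _, te in fixed_composition_splits(y, 4, 30, range(60))]
print(f"AUC {np.mean(aucs):.3f} ± {np.std(aucs, ddof=1):.3f} over {len(aucs)} splits")
```

With only four positives per test set, each AUC rests on just 4 × 30 = 120 positive-negative pairs, which is one reason single-split AUCs (such as a best-observed seed) can sit far from the multi-seed mean.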

36 pages, 10912 KB  
Article
Waterbody Extraction from the Perspective of RGB+X Semantic Segmentation
by Zhechen Yang, Wangrui Zhang, Qi Zhang, Zongbao Hong, Danjie Cheng, Qiao Xu, Yan Meng, Yangjie Sun and Yuxuan Liu
Remote Sens. 2026, 18(11), 1824; https://doi.org/10.3390/rs18111824 - 3 Jun 2026
Viewed by 397
Abstract
Waterbody extraction is of great significance for water resource investigation and monitoring. In addition to RGB bands, most common satellite images have a near-infrared (NIR) band. By combining these RGB-NIR bands, certain water, vegetation, and shadow indices can be calculated. The near-infrared band [...] Read more.
Waterbody extraction is of great significance for water resource investigation and monitoring. In addition to RGB bands, most common satellite images have a near-infrared (NIR) band. By combining these RGB-NIR bands, certain water, vegetation, and shadow indices can be calculated. The near-infrared band and these indices are very similar to the X modality in RGB+X data (common examples include RGB-D and RGB-Thermal). However, at present, no studies have thoroughly examined multimodal feature fusion from the RGB+X perspective in order to extract waterbodies with high precision. As a result, existing algorithms do not fully utilize satellite image information and have limited generalization ability. To overcome this limitation, we propose a dual-complexity backbone for waterbody extraction from the perspective of RGB+X data semantic segmentation. Its complex Transformer branch is used to extract RGB modality features, while its simple CNN branch is used to extract X modality features. This network structure can effectively capture multimodal, global, and local features in remote sensing images. It can also fully leverage the fact that the scale of RGB image datasets in computer vision is significantly larger than that of remote sensing waterbody extraction datasets. If a large pretrained model is used in the RGB branch, it is unnecessary to freeze the weights. Instead, both branches can be trained jointly, allowing the RGB branch to better adapt to the remote sensing waterbody extraction task without raising concerns that fine-tuning might undermine the pretrained model’s strong representation capability. We also propose two X modality configurations with strong generalization performance. To fully fuse multimodal features, we design a hybrid fusion module combining a CNN and a cross-attention mechanism. To integrate the multi-scale features, we employ a multi-scale Transformer structure in the RGB branch and design a multi-scale decoder. 
Our algorithm achieves state-of-the-art performance on the GID-5 dataset and competitive performance on the S1S2-Water dataset. Furthermore, it significantly outperforms existing methods in cross-dataset zero-shot transfer between the two datasets, with IoU/F1-score gains of 26.08%/27.33% on GID-5 and 38.74%/31.37% on S1S2-Water over previous state-of-the-art (SOTA) methods. Our processing paradigm, which models RGB-NIR remote sensing images as RGB+X data, shows potential for generalization to other multimodal remote sensing tasks. The dual-complexity backbone we design could also be extended to other tasks that transfer large pretrained RGB models to remote sensing imagery with four RGB-NIR bands or even more spectral bands. We have open-sourced the code and trained models used in this research.
(This article belongs to the Special Issue Foundation Model-Based Multi-Modal Data Fusion in Remote Sensing)
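The abstract notes that water, vegetation, and shadow indices can be computed from the RGB-NIR bands and treated as the X modality. As a minimal sketch, assuming two standard indices, NDWI (McFeeters, (Green - NIR)/(Green + NIR)) and NDVI ((NIR - Red)/(NIR + Red)), and a hypothetical X stack of [NIR, NDWI, NDVI], the per-pixel computation could look like this. The abstract does not state which indices the two proposed X configurations use, so this choice is illustrative only, and all function names are hypothetical.

```python
from typing import List

Band = List[List[float]]  # one spectral band as rows of reflectance values


def normalized_difference(a: float, b: float) -> float:
    """Return (a - b) / (a + b), or 0.0 where both bands are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total


def band_index(a: Band, b: Band) -> Band:
    """Apply the normalized difference pixel by pixel to two same-shaped bands."""
    return [[normalized_difference(x, y) for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def build_x_modality(green: Band, red: Band, nir: Band) -> List[Band]:
    """Hypothetical X-modality stack: [NIR, NDWI, NDVI].

    NDWI (McFeeters) = (Green - NIR) / (Green + NIR): tends to be positive over open water.
    NDVI = (NIR - Red) / (NIR + Red): tends to be strongly positive over vegetation.
    """
    ndwi = band_index(green, nir)
    ndvi = band_index(nir, red)
    return [nir, ndwi, ndvi]


if __name__ == "__main__":
    # Toy 1x2 image: a water-like pixel (low NIR) and a vegetation-like pixel (high NIR).
    green, red, nir = [[0.10, 0.08]], [[0.05, 0.05]], [[0.02, 0.40]]
    _, ndwi, ndvi = build_x_modality(green, red, nir)
    print(ndwi, ndvi)
```

In this toy image, NDWI is positive for the water-like pixel and negative for the vegetation-like one, which is the kind of contrast that makes such indices useful as an extra input for a lightweight CNN branch.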

35 pages, 5194 KB  
Article
GRASP: Graph-Enhanced Retrieval for Accurate Schema Pruning in Text-to-SQL
by Xiangjun Cheng, Hongmei Zhang, Chao Li and Sining Xu
ISPRS Int. J. Geo-Inf. 2026, 15(6), 248; https://doi.org/10.3390/ijgi15060248 - 2 Jun 2026
Viewed by 220
Abstract
Recent advances in land system research depend heavily on efficient access to large-scale, multi-source remote sensing spatiotemporal databases. Although Text-to-SQL provides natural language interfaces, the scale and spatial complexity of remote sensing schemas introduce significant noise for large language models, increasing inference costs and latency. This study presents graph-enhanced retrieval for accurate schema pruning (GRASP), a graph-based framework for schema pruning in remote sensing information systems. GRASP frames schema pruning as a semantic retrieval task and constructs a heterogeneous graph that represents both question semantics and database structure. By integrating a relation-aware transformer, a relational graph attention network, and pre-trained BERT representations, GRASP enhances schema understanding and supports joint table-column prediction through entity-level cross-attention. A dual-task objective combining contrastive learning with dynamic-threshold prediction mitigates class imbalance, while database value sampling and demonstration retrieval optimize inference performance. Experiments show that GRASP substantially improves schema pruning in spatiotemporal query scenarios: a 7B open-source LLM with GRASP surpasses an unaugmented 32B model on Spider, and the framework also yields promising results on SpatialSQL, achieving a favorable balance among accuracy, cost, and deployment flexibility. GRASP provides a practical pathway for interdisciplinary researchers to query remote sensing databases in natural language, aiding spatiotemporal analysis.
(This article belongs to the Special Issue LLM4GIS: Large Language Models for GIS)
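GRASP's end product is a pruned schema that is passed to the LLM in place of the full schema. The sketch below illustrates only that last step, assuming per-column relevance scores in [0, 1] have already been produced by some model. The threshold rule (keep columns scoring at least half of the best score, with a fixed floor), the helper names, and the toy table and column names are all hypothetical stand-ins: the abstract does not specify how GRASP's dynamic threshold is computed, and GRASP predicts tables and columns jointly rather than deriving kept tables from kept columns as done here.

```python
from typing import Dict, List

ColumnScores = Dict[str, Dict[str, float]]  # table -> column -> relevance score


def dynamic_threshold(scores: List[float], ratio: float = 0.5, floor: float = 0.1) -> float:
    """Hypothetical rule: at least `ratio` times the best score, never below `floor`."""
    return max(floor, ratio * max(scores)) if scores else floor


def prune_schema(column_scores: ColumnScores, ratio: float = 0.5,
                 floor: float = 0.1) -> Dict[str, List[str]]:
    """Keep columns at or above the threshold; drop tables with no surviving columns."""
    all_scores = [s for cols in column_scores.values() for s in cols.values()]
    tau = dynamic_threshold(all_scores, ratio, floor)
    pruned: Dict[str, List[str]] = {}
    for table, cols in column_scores.items():
        kept = [col for col, score in cols.items() if score >= tau]
        if kept:
            pruned[table] = kept
    return pruned


def render_schema(pruned: Dict[str, List[str]]) -> str:
    """Serialize the pruned schema as compact `table(col, ...)` lines for a prompt."""
    return "\n".join(f"{table}({', '.join(cols)})" for table, cols in pruned.items())


if __name__ == "__main__":
    # Toy scores for a question such as "Which land-cover classes grew in area after 2015?"
    scores = {
        "landcover": {"year": 0.9, "class_name": 0.8, "area_km2": 0.7},
        "sensor_meta": {"sensor_id": 0.2, "orbit": 0.05},
        "region": {"name": 0.6, "geom": 0.3},
    }
    print(render_schema(prune_schema(scores)))
```

With these toy scores the threshold is 0.45, so only `landcover(year, class_name, area_km2)` and `region(name)` reach the prompt, while `sensor_meta` is dropped entirely. A shorter prompt of this kind is what reduces the noise, cost, and latency the abstract attributes to large schemas.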

36 pages, 30361 KB  
Article
From Local Training to Large-Scale Mapping: A Comparative Assessment of Machine Learning and Deep Learning for Transferable Satellite-Derived Bathymetry
by Hsiao-Jou Hsu and Joachim Moortgat
Remote Sens. 2026, 18(11), 1768; https://doi.org/10.3390/rs18111768 - 1 Jun 2026
Viewed by 316
Abstract
Satellite-derived bathymetry (SDB) provides a cost-effective means for mapping shallow-water depths, yet its scalability and cross-regional generalizability remain challenging in optically complex coastal environments. This study systematically evaluates machine learning (ML) and deep learning (DL) approaches for transferable SDB over the 0–20 m depth range using multispectral Sentinel-2 imagery. A Random Forest model and four deep learning architectures (ResNet-50, ResNet-101, EfficientNet-B4, and ConvNeXt-Large) are developed and trained using data from Pratas Island (South China Sea) and selected reef regions of the Great Barrier Reef (GBR), and subsequently evaluated on spatially independent intra-regional and cross-regional test areas to assess generalization performance. Model sensitivity is investigated with respect to key training configurations, including loss-function design and data-splitting strategy. To enhance shallow-water learning, we introduce a Smooth Weight Function (SWF)-weighted RMSE loss that emphasizes near-surface depths and compare it with conventional RMSE and relative percentage error (RPE) objectives. Regarding data splitting, preserving spatial continuity in the training data substantially improves both the numerical accuracy and the structural consistency of predictions compared with random patch splitting. While the Random Forest model performs competitively in intra-regional tests, its accuracy degrades under cross-regional transfer (RMSE increasing from 1.53 m to 2.99–3.78 m). Deep learning models, although not always outperforming Random Forest in intra-regional settings, exhibit greater robustness to geographic shift. Using the spatially continuous training strategy, intra-regional RMSE ranges from 1.15 to 1.92 m over the full 0–20 m range, with shallow-water RMSE as low as 0.26 m for depths ≤ 3 m.
Cross-regional transfer to geographically independent reefs yields moderate RMSE values of approximately 2.46–2.98 m (0–20 m range), indicating that geographic transfer remains challenging despite meaningful improvements over Random Forest. We further benchmark the proposed architectures against a task-specific bathymetry network using the public MagicBathyNet dataset. Under a unified 0–16 m shallow-water configuration using aerial RGB imagery, the proposed models achieve RMSE values between 0.19 and 0.22 m, outperforming both the baseline U-Net and the transformer-based bathymetry architecture while using substantially fewer parameters. In addition, we exploit multi-temporal repeat imagery for both training and inference, which increases training diversity and improves robustness to temporal variability arising from changing sun angles, atmospheric conditions, water properties, and tides. During inference, predictions from multiple repeat images are aggregated using the median to reduce noise and improve stability. Finally, we release optimized network architectures and pretrained weights to facilitate scalable application to new sites. This work demonstrates a practical pathway toward transferable, large-area SDB from multispectral satellite imagery using deep learning.
(This article belongs to the Special Issue Underwater Remote Sensing: Status, New Challenges and Opportunities)
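Two steps in the bathymetry pipeline are simple enough to sketch: a depth-weighted RMSE that emphasizes shallow pixels, and per-pixel median aggregation of predictions from repeat images. The abstract does not give the form of the Smooth Weight Function, so the weight w(d) = 1 + exp(-d/d0) used below is a hypothetical stand-in that decays smoothly from 2 at the surface toward 1 in deep water; d0 = 5 m is likewise an assumption. Treating cloud-masked pixels as missing (None) and all function names are also assumptions. For training, the same weighting would be written with a deep learning framework's tensor operations so it can be differentiated; plain Python is used here only to show the arithmetic.

```python
import math
from statistics import median
from typing import List, Optional


def shallow_weight(depth_m: float, d0: float = 5.0) -> float:
    """Hypothetical smooth weight: 2.0 at the surface, decaying toward 1.0 with depth."""
    return 1.0 + math.exp(-depth_m / d0)


def weighted_rmse(pred: List[float], true: List[float], d0: float = 5.0) -> float:
    """RMSE in which each pixel's squared error is weighted by its reference depth."""
    weights = [shallow_weight(t, d0) for t in true]
    num = sum(w * (p - t) ** 2 for w, p, t in zip(weights, pred, true))
    return math.sqrt(num / sum(weights))


def median_composite(stack: List[List[Optional[float]]]) -> List[Optional[float]]:
    """Per-pixel median across repeat-image predictions, ignoring missing (None) values."""
    composite: List[Optional[float]] = []
    for values in zip(*stack):  # iterate pixel by pixel across all images
        valid = [v for v in values if v is not None]
        composite.append(median(valid) if valid else None)
    return composite


if __name__ == "__main__":
    # The same 1 m error costs more at 0 m depth than at 20 m depth.
    print(weighted_rmse([1.0, 20.0], [0.0, 20.0]), weighted_rmse([0.0, 21.0], [0.0, 20.0]))
    # Three repeat predictions for four pixels; the 30.0 outlier does not survive the median.
    print(median_composite([[2.0, 10.0, None, None],
                            [2.4, 30.0, None, None],
                            [2.2, 11.0, 5.0, None]]))
```

The median is used rather than the mean because a single bad acquisition (glint, residual cloud, turbid water) can pull a mean arbitrarily far, whereas the median of three or more repeats ignores one outlier, as the 30.0 value in the toy stack shows.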

37 pages, 1956 KB  
Article
Causality-Aware and Explainable Self-Supervised Spatio-Temporal Graph Learning for Hardware Trojan Detection
by Khalil M. Abdelnaby
Symmetry 2026, 18(6), 939; https://doi.org/10.3390/sym18060939 - 29 May 2026
Viewed by 195
Abstract
As hardware Trojans (HTs) become increasingly stealthy in global semiconductor supply chains, the need for detection methods that are both robust and explainable is pressing. Deep learning models (e.g., Siamese networks, Transformer models) applied to side-channel signals have shown promising detection accuracy. However, they are black-box, data-intensive, and do not expose the causal, structural, and temporal relationships that indicate the presence of HTs. In this paper, we present a causality-focused and explainable detection framework that goes beyond pattern matching. We develop a Self-Supervised Spatio-Temporal Graph Neural Network (SST-GNN) that embeds spatio-temporal side-channel information. Our approach builds a graph that models gate-level components as nodes with temporal power and electromagnetic (EM) features, and functional and physical connections as edges. To address label scarcity, a common problem in real-world applications, we leverage a self-supervised pretraining approach. In particular, a context-aware contrastive loss allows the model to differentiate valid augmentations of benign subgraphs and their side-channel signatures, thus capturing general representations of benign components without Trojan labels. The framework also incorporates a Causality-Aware GNN (CA-GNN) layer, which embeds differentiable causal discovery into graph learning. This layer decouples correlation from causation, identifying the pathways potentially affected by the HT trigger and payload. To explain its decisions, a gradient-based graph explainer localizes minimal decisive subcircuits and pivotal time windows, generating intuitive detection reports. We evaluated our method on the IEEE Hardware Trojan Side-Channel Dataset (with netlist data), achieving state-of-the-art results (F1 > 0.98).
Notably, compared with Transformer-based approaches, the model improves Trojan localization precision and false-positive rate by over 60%, with high label efficiency and adversarial robustness.
(This article belongs to the Section Computer)
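The abstract describes a context-aware contrastive loss but does not give its form. For orientation only, the sketch below implements the generic NT-Xent (normalized temperature-scaled cross-entropy) loss used in SimCLR-style self-supervised pretraining, not the author's context-aware variant. The vectors stand in for hypothetical SST-GNN embeddings of two augmentations of each benign subgraph; the temperature of 0.5 and all names are assumptions.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nt_xent(view_a: List[Vector], view_b: List[Vector], tau: float = 0.5) -> float:
    """Generic NT-Xent loss over N positive pairs (view_a[i], view_b[i]).

    Each embedding's positive is the other view of the same subgraph; the remaining
    2N - 2 embeddings in the batch act as negatives.
    """
    z = view_a + view_b
    n = len(view_a)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)
        logits = {k: cosine(z[i], z[k]) / tau for k in range(2 * n) if k != i}
        m = max(logits.values())  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(v - m) for v in logits.values()))
        total += log_denominator - logits[positive]
    return total / (2 * n)


if __name__ == "__main__":
    a = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[0.9, 0.1], [0.1, 0.9]]     # augmentations close to their originals
    mismatched = [[0.1, 0.9], [0.9, 0.1]]  # augmentations paired with the wrong subgraph
    print(nt_xent(a, matched), nt_xent(a, mismatched))
```

The matched batch yields a much lower loss than the mismatched one, which is the behavior contrastive pretraining relies on: two views of the same benign subgraph are pulled together, and views of different subgraphs are pushed apart, without any Trojan labels.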