Search Results (90)

Search Parameters:
Keywords = scene parsing

25 pages, 3434 KB  
Article
Large Language Model with Integrated Ontology and Inference Chain Constraints for Generative Information Extraction from Metallurgical Lifting Equipment Failure Reports
by Bin Zhou, Xingwang Shen and Jinsong Bao
Appl. Sci. 2026, 16(12), 6178; https://doi.org/10.3390/app16126178 - 18 Jun 2026
Viewed by 295
Abstract
Metallurgical lifting equipment operates under prolonged heavy-load, high-impact, and complex working conditions. The resulting failure reports contain rich field knowledge applicable to fault diagnosis and predictive maintenance. Nevertheless, reliably extracting traceable, structured knowledge from procedural and implicit maintenance records remains a significant challenge. To address this, the paper proposes generative constrained information extraction for operations and maintenance (GCIE-OM), a generative information extraction method for large language models (LLMs) that integrates an ontology schema with inference chain constraints to extract knowledge and construct knowledge graphs from the failure reports of metallurgical lifting equipment. A domain ontology schema is first constructed, defining seven entity types and nine relation types to establish explicit knowledge boundaries for structured LLM generation. An inference chain-assisted structured parsing method, termed IC-ASP, is then designed to guide the model through a sequential extraction pipeline comprising scene identification, entity boundary scoping, relation type inference, evidence tracing and localization, and triple output. This stepwise process strengthens the model’s capacity to comprehend equipment hierarchies, fault evolution chains, and maintenance action logic. Building on this, ChatGLM or LLaMA serves as the backbone model and is adapted to the target domain via LoRA fine-tuning. Entity alignment and character-level source localization mechanisms are further introduced to establish precise mappings between generated outputs and their textual evidence in the source documents. The extracted results are ultimately converted into standardized knowledge triples and stored in a Neo4j graph database.
Based on this, a prototype system for generative information extraction is designed and implemented to demonstrate the practical effectiveness and adaptability of the proposed method. Experimental results show that the proposed method outperforms baseline methods across entity recognition, relation extraction, and structured output quality, providing robust knowledge support for fault tracing and predictive maintenance of metallurgical lifting equipment.
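The abstract describes two safeguards on LLM output: triples constrained by an ontology schema, and character-level localization of each triple in the source report. A minimal sketch of both follows, assuming exact string matching for localization. The entity types, relation names, and the `Triple`, `conforms_to_schema`, `locate`, and `ground` helpers are hypothetical illustrations, not the paper's seven entity types, nine relation types, or implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical ontology fragment (NOT the paper's schema): allowed entity
# types and the (head_type, tail_type) pair each relation may connect.
ENTITY_TYPES = {"Component", "FaultPhenomenon", "MaintenanceAction"}
RELATION_SCHEMA = {
    "has_fault": ("Component", "FaultPhenomenon"),
    "repaired_by": ("FaultPhenomenon", "MaintenanceAction"),
}

@dataclass
class Triple:
    head: str
    head_type: str
    relation: str
    tail: str
    tail_type: str
    head_span: Optional[Tuple[int, int]] = None
    tail_span: Optional[Tuple[int, int]] = None

def conforms_to_schema(t: Triple) -> bool:
    """Reject triples whose entity types or relation fall outside the ontology."""
    if t.head_type not in ENTITY_TYPES or t.tail_type not in ENTITY_TYPES:
        return False
    return RELATION_SCHEMA.get(t.relation) == (t.head_type, t.tail_type)

def locate(mention: str, source: str) -> Optional[Tuple[int, int]]:
    """Assumed localization rule: first exact occurrence as (start, end) offsets."""
    start = source.find(mention)
    return None if start < 0 else (start, start + len(mention))

def ground(triples, source):
    """Keep schema-conformant triples whose head and tail both occur in the source."""
    kept = []
    for t in triples:
        if not conforms_to_schema(t):
            continue
        t.head_span = locate(t.head, source)
        t.tail_span = locate(t.tail, source)
        if t.head_span and t.tail_span:
            kept.append(t)
    return kept
```

Rejecting unsupported triples before storage means every edge written to the graph carries offsets that point back to its evidence in the report.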

23 pages, 7625 KB  
Article
MultiDecNet: An Ensemble-Based Semantic Segmentation Architecture for Urban Scene Understanding
by Büşra Emek Soylu and Mehmet Serdar Güzel
Information 2026, 17(6), 540; https://doi.org/10.3390/info17060540 - 1 Jun 2026
Viewed by 362
Abstract
Semantic segmentation is a fundamental task in computer vision that aims to assign a categorical label to each pixel in an image, facilitating dense and detailed scene understanding. This pixel-level classification is especially crucial in autonomous driving, where accurate environmental perception is vital for dependable object detection and safe decision-making. In this study, we propose MultiDecNet, a novel multi-decoder semantic segmentation framework designed to capture both macroscopic scene layouts and fine-grained spatial boundaries in complex urban environments. Drawing inspiration from classical networks, MultiDecNet incorporates a parallel dual-branch decoding strategy that simultaneously leverages the multi-scale context modeling of the Pyramid Pooling Module (PPM) and the structural refinement capabilities of Atrous Spatial Pyramid Pooling (ASPP). To explore the impact of modern backbone representations, we modernize the feature extraction pipeline by introducing the contemporary ConvNeXt convolutional architecture as an alternative to traditional ResNet101 backbones. We extensively evaluate the baseline configurations alongside our proposed MultiDecNet using both ResNet101 and ConvNeXt-Large backbones on the Cityscapes benchmark dataset. The quantitative assessments demonstrate that the MultiDecNet architecture consistently provides highly competitive performance within the scope of this comparative study, with the MultiDecNet-ConvNeXt variant achieving favorable overall scores among the evaluated methods. Furthermore, a granular class-wise IoU and training dynamics analysis reveals that while traditional networks retain competitive boundaries for localized minority targets, the modern ConvNeXt backbone converges faster and more stably and models context in a more balanced way across large-scale driving layouts.
Ultimately, these findings offer critical insights into architectural synergy and backbone selection, presenting a robust, scalable, and well-balanced solution for advanced autonomous navigation systems.
(This article belongs to the Special Issue Computer Vision for Security Applications, 2nd Edition)
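The PPM branch mentioned above pools a feature map at several grid sizes so the decoder sees both global and regional context. The sketch below shows only that pooling step for a single channel stored as a 2D list, assuming the bin sizes (1, 2, 3, 6) commonly used in PSPNet; MultiDecNet's actual bin sizes, the 1x1 convolutions, upsampling, concatenation, and the parallel ASPP branch are not reproduced. `adaptive_avg_pool2d` and `pyramid_pool` are hypothetical helper names.

```python
import math

def adaptive_avg_pool2d(grid, out_size):
    """Average-pool a 2D list of floats into an out_size x out_size grid.

    Bin edges follow the usual adaptive-pooling rule:
    start = floor(i * H / out), end = ceil((i + 1) * H / out).
    """
    h, w = len(grid), len(grid[0])
    pooled = []
    for i in range(out_size):
        r0, r1 = (i * h) // out_size, math.ceil((i + 1) * h / out_size)
        row = []
        for j in range(out_size):
            c0, c1 = (j * w) // out_size, math.ceil((j + 1) * w / out_size)
            cells = [grid[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(cells) / len(cells))
        pooled.append(row)
    return pooled

def pyramid_pool(grid, bins=(1, 2, 3, 6)):
    """Pool one feature channel at several bin sizes (PSPNet-style pyramid)."""
    return {b: adaptive_avg_pool2d(grid, b) for b in bins}
```

The 1x1 bin is a global average (whole-scene context), while finer bins keep coarse layout such as where the road or sky lies in the frame.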

28 pages, 7499 KB  
Article
HOSG-Nav: Hierarchical Open-Vocabulary Semantic Graph Navigation for Language-Guided Global Planning in 3D Gaussian Scenes
by Yuchen Li, Kai Qin, Weiyi Chen and Haitao Wu
Electronics 2026, 15(10), 2179; https://doi.org/10.3390/electronics15102179 - 19 May 2026
Viewed by 450
Abstract
Natural-language-driven robot navigation in complex indoor environments requires the joint capability of high-fidelity scene representation, structured semantic reasoning, and executable path planning. To address this challenge, this paper proposes HOSG-Nav, a unified framework for natural-language-driven global navigation that integrates open-vocabulary 3D Gaussian scene representation, hierarchical semantic scene graph construction, and large-language-model-driven planning. First, an open-vocabulary 3D Gaussian field is constructed to jointly encode scene geometry, appearance, and semantic information, where compressed CLIP features are lifted into continuous 3D space and depth supervision is introduced to enhance geometric stability and metric-scale consistency. Second, the optimized Gaussian primitives are further abstracted into a semantic scene graph with a region–object hierarchical structure and traversable topological relations to support structured environment understanding. Finally, for natural language instructions, hierarchical semantic parsing is performed with the assistance of a large language model, and executable global navigation paths are generated through cross-modal target retrieval and graph-search-based planning. Experimental results on the Replica dataset demonstrate that HOSG-Nav achieves competitive performance in scene representation, semantic target retrieval, and global navigation, validating the effectiveness of jointly integrating multimodal 3D representation, hierarchical semantic abstraction, and language-guided planning.
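The last planning step (resolve the target object to a node in the region–object hierarchy, then search the traversable region graph) can be sketched as follows. This is a minimal sketch under stated assumptions: the regions, edge costs, and object assignments are hypothetical, the CLIP-based cross-modal retrieval is replaced by a plain dictionary lookup, and Dijkstra is used as a generic stand-in for the paper's graph-search planner.

```python
import heapq

# Hypothetical region-level topology: traversable edges with metric costs.
REGION_EDGES = {
    "bedroom": {"hallway": 3.0},
    "hallway": {"bedroom": 3.0, "kitchen": 4.0, "living_room": 2.0},
    "living_room": {"hallway": 2.0, "kitchen": 5.0},
    "kitchen": {"hallway": 4.0, "living_room": 5.0},
}
# Hypothetical object layer: each object hangs off one region node.
OBJECT_TO_REGION = {"sofa": "living_room", "fridge": "kitchen", "bed": "bedroom"}

def shortest_region_path(start, goal, edges=REGION_EDGES):
    """Dijkstra over the region graph; returns (cost, list of regions)."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, w in edges.get(node, {}).items():
            if nxt not in visited:
                heapq.heappush(frontier, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

def plan_to_object(start_region, target_object):
    """Stand-in for retrieval: look up the object's region, then plan to it."""
    goal = OBJECT_TO_REGION[target_object]
    return shortest_region_path(start_region, goal)
```

For example, `plan_to_object("bedroom", "fridge")` yields a cost of 7.0 through the hallway rather than the 10.0 route via the living room. Searching at the region level keeps the graph small; a real system would then refine each region-to-region hop into metric waypoints.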

20 pages, 6641 KB  
Article
Topology-Aware Road Extraction from Remote Sensing Images Using Deep Learning and Graph-Based Connectivity Refinement
by Zixuan Teng, Zezhong Zheng, Xiangyang Sun and Hao Xue
ISPRS Int. J. Geo-Inf. 2026, 15(5), 208; https://doi.org/10.3390/ijgi15050208 - 9 May 2026
Viewed by 783
Abstract
Road networks are fundamental components of transportation infrastructure and play a crucial role in various geospatial applications. Although deep learning-based semantic segmentation models have achieved promising results in extracting roads from high-resolution remote sensing imagery, the resulting networks often suffer from topological fragmentation due to occlusions and shadows. To address this issue, we propose a topology-aware road extraction method that integrates deep learning-based segmentation with a graph-based connectivity refinement strategy. Specifically, a Pyramid Scene Parsing Network (PSPNet) is first employed to generate initial road probability maps. Subsequently, a connectivity-oriented post-processing pipeline is introduced, which incorporates a multi-source cost function strategy and a direction-aware Dijkstra search algorithm. By using endpoint tangent vectors as inertial weights, the algorithm reconstructs fragmented segments while ensuring geometric smoothness and topological consistency. Furthermore, a dynamic road width restoration strategy is applied to transform refined skeletons into physically consistent road entities. Experiments conducted on two publicly available datasets, CHN6-CUG and DeepGlobe, demonstrate the effectiveness of the proposed method. Quantitative results show that the refinement process significantly enhances road connectivity with a minimal trade-off in pixel-level accuracy. Specifically, the Conn metric increases by 0.1989 on the CHN6-CUG dataset and 0.3055 on the DeepGlobe dataset, while MIoU remains high with only marginal decreases of 1.07% and 0.45%, respectively. These findings indicate that the method effectively restores structural continuity, supporting reliable road network generation and subsequent integration into Geographic Information System (GIS)-based applications such as urban planning and autonomous navigation.
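The direction-aware search can be illustrated as Dijkstra on an 8-connected pixel grid, where every move also pays a penalty for deviating from the tangent at the broken road endpoint. This is a simplified sketch, not the paper's algorithm: it assumes a single fixed tangent, a per-cell cost such as 1 minus road probability (standing in for the multi-source cost function), and a penalty `lam * (1 - cos(theta))`. `direction_aware_dijkstra` and `lam` are hypothetical names, and the tangent must be non-zero.

```python
import heapq
import math

STEPS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def direction_aware_dijkstra(cost, start, goal, tangent, lam=1.0):
    """Dijkstra on an 8-connected grid with a directional (inertial) penalty.

    cost[r][c] is the price of entering a cell (e.g. 1 - road probability).
    Each move also pays lam * (1 - cos(theta)), where theta is the angle
    between the move and the endpoint tangent vector. The penalty is never
    negative, so Dijkstra's correctness is preserved.
    """
    tr, tc = tangent
    tnorm = math.hypot(tr, tc)
    rows, cols = len(cost), len(cost[0])
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, cell = heapq.heappop(heap)
        if cell == goal:
            break
        if d > dist.get(cell, math.inf):
            continue  # stale heap entry
        r, c = cell
        for dr, dc in STEPS:
            nr, nc = r + dr, c + dc
            if not (0 <= nr < rows and 0 <= nc < cols):
                continue
            length = math.hypot(dr, dc)
            cos_t = (dr * tr + dc * tc) / (length * tnorm)
            nd = d + length * cost[nr][nc] + lam * (1.0 - cos_t)
            if nd < dist.get((nr, nc), math.inf):
                dist[(nr, nc)] = nd
                prev[(nr, nc)] = cell
                heapq.heappush(heap, (nd, (nr, nc)))
    # Walk predecessors back from the goal to recover the bridging path.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]
```

Raising `lam` makes the bridge follow the endpoint direction more strictly, which is the geometric-smoothness effect the abstract attributes to the inertial weights.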

45 pages, 11221 KB  
Article
MS-DARNet: A Lightweight Multi-Scale Selective Dilated Attention Residual Network for Remote Sensing Scene Classification
by Jiawei Huang and Chengjun Xu
Remote Sens. 2026, 18(8), 1235; https://doi.org/10.3390/rs18081235 - 19 Apr 2026
Viewed by 443
Abstract
High-resolution remote sensing image (HRRSI) scene classification faces challenges such as significant target scale variations, complex background interference, and difficulty in spatially parsing dense objects (such as tightly packed buildings in dense residential areas or scattered aircraft on aprons), while existing models struggle to balance computational efficiency and classification accuracy. To address these issues, this paper proposes a lightweight Multi-Scale Selective Dilated Attention Residual Network (MS-DARNet). The model uses a Multi-branch Dilated Feature Extraction (MDFE) module, employing parallel convolutional branches with varying dilation rates to dynamically expand the receptive field and collaboratively extract multi-scale features without increasing parameter counts. Furthermore, a Context-Position Aware Attention (CPAA) module is introduced, combining a large-kernel decomposition strategy to suppress irrelevant background noise with direction-aware feature aggregation to retain precise spatial coordinates for dense objects. Extensive experiments on the AID, NWPU-RESISC45, and RSD-WHU46 datasets show that MS-DARNet achieves superior classification accuracies of 97.78%, 94.53%, and 94.55%, respectively. Concurrently, it maintains a low complexity of only 2.50 M parameters and 0.5940 GMACs. These findings demonstrate that MS-DARNet achieves an effective balance between a lightweight architecture and high classification performance for complex remote sensing scenes.

9 pages, 1667 KB  
Proceeding Paper
Cost-Effective Device with Semantic Segmentation Capability for Real-Time Detection and Classification of Marine Litter in Benthic Coastal Areas
by John Paul T. Cruz, Josiah Izaak D. Lopez, Marlon V. Maddara, Karl Justin B. Nacito, Marites B. Tabanao, Vladimer B. Kobayashi and Roben A. Juanatas
Eng. Proc. 2026, 134(1), 34; https://doi.org/10.3390/engproc2026134034 - 7 Apr 2026
Viewed by 482
Abstract
Anthropogenic marine debris (AMD) in shallow coastal benthic areas poses serious threats to ecosystems, human health, and the economy. Addressing this issue is hindered by limited data on AMD distribution and classification. We explored the use of semantic segmentation, specifically the Pyramid Scene Parsing Network (PSPNet) and Deep Convolutional Neural Network for Semantic Image Segmentation, Version 3 (DeepLabV3) models, for automated AMD detection and classification. Performance was evaluated using mean intersection over union (mIoU), pixel accuracy, and frames per second (FPS). PSPNet achieved a higher mIoU (77.03%) than DeepLabV3 (75.98%), indicating better object identification. However, DeepLabV3 outperformed PSPNet in pixel accuracy (92.24% vs. 92.01%) and FPS (8.83 vs. 6.92), making it more appropriate for real-time applications. To enable real-time identification and classification of AMD, the models are deployed on a minicomputer with adequate processing power, significantly enhancing the models’ frame rate during real-time image processing. While both models are effective, DeepLabV3 is recommended for real-time AMD segmentation. The study contributes to improving AMD monitoring and management in coastal environments through AI-driven solutions.
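Pixel accuracy and mIoU can rank models differently because pixel accuracy is dominated by the largest class (usually background), while mIoU gives every class equal weight. The sketch below computes both from a confusion matrix on toy labels that are invented for illustration, not taken from the study. `confusion_matrix`, `pixel_accuracy`, and `mean_iou` are hypothetical helper names, and classes absent from both ground truth and prediction are skipped when averaging.

```python
def confusion_matrix(gt, pred, num_classes):
    """cm[g][p] counts pixels with ground-truth class g predicted as class p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm

def pixel_accuracy(cm):
    """Fraction of all pixels whose predicted class is correct."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / sum(map(sum, cm))

def mean_iou(cm):
    """Average of TP / (TP + FP + FN) over classes present in gt or prediction."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)

# Toy scene: 8 background pixels (class 0), 2 litter pixels (class 1),
# and a model that predicts background everywhere.
cm = confusion_matrix([0] * 8 + [1] * 2, [0] * 10, num_classes=2)
print(pixel_accuracy(cm), mean_iou(cm))
```

The degenerate model still scores 80% pixel accuracy but only 40% mIoU because it never finds the litter, which is why the two metrics are reported side by side. FPS is a separate wall-clock measurement and is not modeled here.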

19 pages, 2147 KB  
Article
Dual-Mamba-ResNet: A Novel Vision State Space Network for Aero-Engine Ablation Detection
by Xin Wang, Hai Shu, Yaxi Xu, Qiang Fu and Jide Qian
Aerospace 2026, 13(3), 273; https://doi.org/10.3390/aerospace13030273 - 15 Mar 2026
Viewed by 481
Abstract
With the rapid development of the aviation industry, engines operate under extreme conditions of high temperature, high pressure, and high vibration, making them prone to surface damage such as ablation. Ablation not only affects the structural integrity of engine components but also threatens flight safety, making efficient and accurate detection of paramount importance. Traditional detection methods rely on manual visual inspection and non-destructive testing, which suffer from high subjectivity and low efficiency. In recent years, deep learning has achieved significant progress in industrial defect detection. However, conventional CNN- and Transformer-based architectures still suffer from substantial computational overhead and inadequate boundary segmentation accuracy in aero-engine ablation detection. This paper proposes the Visual State-Space Residual Neural Network (VSS-ResNet), a novel Mamba-based dual-pathway network that combines Visual State Space (VSS) modules with ResNet50. This architecture leverages the global modeling capability of VSS modules and the local feature extraction capability of CNNs, enhancing the accuracy and robustness of ablation boundary detection with the support of multi-scale feature fusion modules. Experimental results demonstrate that the proposed method achieves superior performance in mIoU, mPA, and Acc compared to mainstream segmentation models such as U-Net, Pyramid Scene Parsing Network (PSPNet), and DeepLab V3+ on a self-constructed engine endoscopic ablation dataset, validating its potential in intelligent aero-engine inspection.
(This article belongs to the Section Aeronautics)
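The "global modeling capability" of VSS modules rests on a state-space recurrence that carries a hidden state across an entire flattened token sequence in one linear pass. Below is a minimal scalar sketch of that recurrence, h_t = a * h_{t-1} + b * x_t and y_t = c * h_t, with fixed, hypothetical parameters `a`, `b`, and `c`. It deliberately omits what makes Mamba "selective" (input-dependent step size and B/C projections), the multi-directional 2D scanning used in visual variants, and the hardware-aware parallel scan, so it illustrates the principle rather than VSS-ResNet itself.

```python
def ssm_scan(xs, a, b, c, h0=0.0):
    """Scalar discretized linear state-space recurrence over a sequence.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)
    Every output depends on all earlier inputs, at O(length) cost.
    """
    h, ys = h0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys

# Impulse response: the first input keeps influencing later outputs,
# decaying by a factor of a per step.
print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
```

Because each step is constant-time, one scan covers the whole sequence in linear time, in contrast to the quadratic cost of full self-attention that the abstract cites as Transformer overhead.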

17 pages, 4021 KB  
Article
Dangerous Goods Detection in X-Ray Security Inspection Images Based on Improved YOLOv8-seg
by Ting Wang, Pengfei Yuan and Aili Wang
Electronics 2026, 15(5), 1112; https://doi.org/10.3390/electronics15051112 - 7 Mar 2026
Viewed by 681
Abstract
In X-ray security inspection imagery, hazardous object detection is challenged by severe object overlap/occlusion, ambiguous boundaries of small objects, and complex texture representations caused by material diversity. Although YOLOv8-seg provides real-time instance segmentation capability, it still has clear limitations in this application scenario. Specifically, the original SPPF module has limited ability to model long-range spatial dependencies, making it difficult to accurately separate boundaries of densely overlapped objects, while the C2f module is insufficient for multi-scale feature parsing of hazardous items with diverse sizes and materials and introduces feature redundancy, which degrades segmentation accuracy in occluded scenes. To address these issues, this paper proposes an improved YOLOv8-seg framework for X-ray hazardous object detection, termed LM-YOLOv8. For feature enhancement, an SPPF-LSKA module is constructed by integrating large-kernel separable attention with dynamic receptive-field adjustment, thereby improving global contextual modeling and alleviating boundary ambiguity. For multi-scale feature fusion, a C2f-MSC module is designed by combining multi-branch dilated convolutions with the C2f structure to enhance complex contour parsing and cross-scale feature interaction. Experiments on the PIDray dataset show that the proposed method achieves 84.8% mAP50 in instance segmentation, representing an improvement of approximately 4.0 percentage points over the baseline YOLOv8-seg. In addition, the method demonstrates stronger robustness on challenging hard/hidden subsets, validating its effectiveness for X-ray security inspection hazardous object detection.
(This article belongs to the Special Issue Image Processing, Target Tracking and Recognition System Design)
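In mAP50 for instance segmentation, a predicted mask counts as a true positive only if its mask IoU with a not-yet-matched ground-truth mask is at least 0.5; average precision is then computed from the resulting precision–recall curve. The sketch below covers only the matching step, with masks stored as sets of pixel coordinates. It is a simplified greedy matcher under stated assumptions (single class, no crowd regions, no precision–recall integration), and `mask_iou` and `match_at_threshold` are hypothetical helper names rather than the evaluation code used with PIDray.

```python
def mask_iou(a, b):
    """IoU of two binary masks given as sets of (row, col) pixel coordinates."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def match_at_threshold(preds, gts, thr=0.5):
    """Greedily match predictions, highest score first, to unmatched ground truths.

    preds: list of (score, mask) pairs; gts: list of masks.
    Returns one boolean per prediction in score order (True = true positive).
    """
    matched = set()
    flags = []
    for _, pmask in sorted(preds, key=lambda p: p[0], reverse=True):
        best_iou, best_j = 0.0, None
        for j, g in enumerate(gts):
            if j in matched:
                continue
            iou = mask_iou(pmask, g)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= thr:
            matched.add(best_j)
            flags.append(True)
        else:
            flags.append(False)  # duplicate or poorly overlapping mask
    return flags
```

Duplicate masks on the same object become false positives, which is one way dense overlap in X-ray images depresses mAP50 even when every item is detected.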

12 pages, 1584 KB  
Article
Deep Learning Segmentation Models for UAV-Based Detection of Crop Damage in Rapeseed Using RGB Imagery
by Barbara Dobosz, Dariusz Gozdowski, Jerzy Koronczok, Jan Žukovskis and Elżbieta Wójcik-Gront
Agriculture 2026, 16(5), 536; https://doi.org/10.3390/agriculture16050536 - 27 Feb 2026
Cited by 1 | Viewed by 771
Abstract
The objective of this study was to evaluate the accuracy of detecting crop damage caused by wild boar in rapeseed fields using UAV (unmanned aerial vehicle)-derived RGB (red, green and blue) imagery and deep learning segmentation models. The experiments were conducted on rapeseed crops at full maturity shortly before harvest in central-western Poland in 2021. Four convolutional neural network architectures—U-Net (U-shaped network), U-Net++, DeepLabV3+ (deep learning + labelling), and PSPNet (Pyramid Scene Parsing Network)—were benchmarked using two input configurations: RGB imagery alone and RGB combined with the topographic position index (TPI) derived from a digital surface model (DSM). Model performance was assessed using overall accuracy, F1-score (harmonic mean of precision and recall), and Intersection over Union (IoU), with class-specific metrics reported to provide a realistic evaluation of damaged-area detection. For RGB-only data, overall accuracy ranged from 0.957 to 0.972, while damaged-class F1 and IoU reached 0.752 and 0.603, respectively, for the best-performing model (U-Net). When RGB data were supplemented with TPI, overall accuracy and damaged-class metrics changed only slightly, indicating limited benefit from the topographic feature under these field conditions. Non-damaged crop areas were consistently well-classified (F1 > 0.977, IoU > 0.955). These results confirm that UAV-based RGB imagery enables reliable late-season assessment of wildlife-induced crop damage, and that reporting class-specific metrics in spatially independent test sets is essential for realistic performance evaluation.
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
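For a single class computed from the same pixel counts, F1 and IoU are linked by IoU = F1 / (2 - F1). Writing S = FP + FN gives F1 = 2TP / (2TP + S), so F1 / (2 - F1) = 2TP / (2TP + 2S) = TP / (TP + S). The reported damaged-class pair (F1 0.752, IoU 0.603) is consistent with this identity, as is the non-damaged bound (0.977 / 1.023 ≈ 0.955). This check assumes both metrics were aggregated over the same pixels; the helper names below are hypothetical.

```python
def f1_iou(tp, fp, fn):
    """Class-specific F1 (equivalently Dice) and IoU from pixel counts."""
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou

def iou_from_f1(f1):
    """For the same class and counts, IoU = F1 / (2 - F1)."""
    return f1 / (2 - f1)

print(round(iou_from_f1(0.752), 3))  # damaged class, U-Net, RGB only
```

Because the mapping is monotonic, F1 and IoU always rank models identically for a given class; reporting both mainly helps comparison with other studies.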

25 pages, 62812 KB  
Article
From Prompts to Self-Prompts: Parameter-Efficient Multi-Label Remote Sensing via Mask-Guided Classification
by Ge Qu, Xiongwei Guan, Fei Wen and Xinyu Zou
Remote Sens. 2026, 18(3), 518; https://doi.org/10.3390/rs18030518 - 5 Feb 2026
Viewed by 820
Abstract
Multi-label remote sensing scene classification (MLRSSC) requires autonomous discovery of all relevant land-cover categories without human guidance. Conventional expert classifiers return only label vectors without spatial evidence, while foundation segmenters (e.g., SAM, RemoteSAM) remain passively dependent on external prompts—misaligned with autonomous interpretation. We introduce SAFI-XRS, a parameter-efficient self-prompted framework that transforms passive prompting into active scene parsing. By training only <2% of a 332M-parameter segmenter (∼2.4M parameters), SAFI-XRS generates class-aligned queries from images via a Semantic Query Generator (SQR), replacing external prompts with self-generated conditioning. A Mask-Guided Classifier (MGC) aggregates spatial evidence into label confidences, enabling mask-based explainability. Experiments on UCM-ML, DFC15-ML, and AID-ML show SAFI-XRS surpasses text-prompted foundation segmenters (+3.9/+3.8 mAP on balanced datasets) while achieving 6.8× parameter efficiency compared to expert models, validating a practical path toward autonomous, explainable RS scene understanding.
(This article belongs to the Section AI Remote Sensing)
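A mask-guided classifier turns per-class segmentation masks into image-level label scores, so each predicted label comes with the mask that justifies it. The sketch below assumes a simple fixed aggregation rule (mean of the top-k pixel probabilities per class) followed by a threshold. The paper's MGC is likely learned and may aggregate differently, so `label_confidences`, `predict_labels`, `top_k`, and `threshold` are hypothetical.

```python
def label_confidences(class_masks, top_k=3):
    """Aggregate per-class soft masks into image-level label confidences.

    class_masks maps a class name to a flat list of per-pixel probabilities.
    Assumed rule: confidence = mean of the top_k highest pixel probabilities,
    so a small but confidently segmented region can still trigger its label.
    """
    conf = {}
    for cls, probs in class_masks.items():
        top = sorted(probs, reverse=True)[:top_k]
        conf[cls] = sum(top) / len(top)
    return conf

def predict_labels(class_masks, threshold=0.5, top_k=3):
    """Multi-label output: every class whose confidence reaches the threshold."""
    conf = label_confidences(class_masks, top_k)
    return sorted(c for c, s in conf.items() if s >= threshold)

masks = {
    "buildings": [0.9, 0.8, 0.7, 0.1, 0.0, 0.0],
    "water": [0.2, 0.1, 0.1, 0.0, 0.0, 0.0],
    "trees": [0.6, 0.6, 0.6, 0.0, 0.0, 0.0],
}
print(predict_labels(masks))
```

Top-k pooling sits between max pooling (sensitive to single noisy pixels) and mean pooling (which dilutes small objects in large scenes); the mask behind each label remains available as its spatial explanation.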

15 pages, 1262 KB  
Article
Structured Scene Parsing with a Hierarchical CLIP Model for Images
by Yunhao Sun, Xiaoao Chen, Heng Chen, Yiduo Liang and Ruihua Qi
Appl. Sci. 2026, 16(2), 788; https://doi.org/10.3390/app16020788 - 12 Jan 2026
Viewed by 618
Abstract
Visual Relationship Prediction (VRP) is crucial for advancing structured scene understanding, yet existing methods struggle with ineffective multimodal fusion, static relationship representations, and a lack of logical consistency. To address these limitations, this paper proposes a Hierarchical CLIP model (H-CLIP) for structured scene parsing. Our approach leverages a pre-trained CLIP backbone to extract aligned visual, textual, and spatial features for entities and their union regions. A multi-head self-attention mechanism then performs deep, dynamic multimodal fusion. The core innovation is a consistency and reversibility verification mechanism, which imposes algebraic constraints as a regularization loss to enforce logical coherence in the learned relation space. Extensive experiments on the Visual Genome dataset demonstrate the superiority of the proposed method. H-CLIP significantly outperforms state-of-the-art baselines on the predicate classification task, achieving a Recall@50 score of 64.31% and a Mean Recall@50 of 36.02%, thereby validating its effectiveness in generating accurate and logically consistent scene graphs even under long-tailed distributions.
(This article belongs to the Section Computing and Artificial Intelligence)
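Recall@K and Mean Recall@K differ in how they treat the long tail. Recall@K is the fraction of ground-truth (subject, predicate, object) triplets that appear among the top-K predictions. Mean Recall@K first computes recall separately for each predicate and then averages across predicates, so a rare predicate counts as much as "on". The sketch follows that common definition and operates on exact triplet matches. It assumes the predicate classification setting, where ground-truth boxes and labels are given; the helper names and the toy data are hypothetical.

```python
from collections import defaultdict

def recall_at_k(gt_triplets, ranked_preds, k):
    """Fraction of ground-truth triplets found among the top-k predictions."""
    top = set(ranked_preds[:k])
    return sum(t in top for t in gt_triplets) / len(gt_triplets)

def mean_recall_at_k(images, k):
    """Per-predicate recall, averaged over images, then over predicates.

    images: list of (gt_triplets, ranked_preds) pairs.
    """
    per_pred = defaultdict(list)
    for gt, ranked in images:
        top = set(ranked[:k])
        hits_by_pred = defaultdict(list)
        for t in gt:
            hits_by_pred[t[1]].append(t in top)
        for pred, hits in hits_by_pred.items():
            per_pred[pred].append(sum(hits) / len(hits))
    means = [sum(v) / len(v) for v in per_pred.values()]
    return sum(means) / len(means)

images = [
    ([("man", "on", "horse"), ("man", "wearing", "hat")],
     [("man", "on", "horse"), ("horse", "near", "tree")]),
    ([("cup", "on", "table")], [("cup", "on", "table")]),
]
print(mean_recall_at_k(images, k=1))
```

In the toy data, "on" is always recovered but the rarer "wearing" never is, so mean recall drops to 0.5 even though image-averaged recall is 0.75. This gap is what the abstract's long-tail discussion refers to.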

21 pages, 3379 KB  
Article
KORIE: A Multi-Task Benchmark for Detection, OCR, and Information Extraction on Korean Retail Receipts
by Mahmoud SalahEldin Kasem, Mohamed Mahmoud, Mostafa Farouk Senussi, Mahmoud Abdalla and Hyun Soo Kang
Mathematics 2026, 14(1), 187; https://doi.org/10.3390/math14010187 - 4 Jan 2026
Viewed by 3399
Abstract
We introduce KORIE, a curated benchmark of 748 Korean retail receipts designed to evaluate scene text detection, Optical Character Recognition (OCR), and Information Extraction (IE) under challenging digitization conditions. Unlike existing large-scale repositories, KORIE consists exclusively of receipts digitized via flatbed scanning (HP LaserJet MFP), specifically selected to preserve complex thermal printing artifacts such as ink fading, banding, and mechanical creases. We establish rigorous baselines across three tasks: (1) Detection, comparing Weakly Supervised Object Localization (WSOL) against state-of-the-art fully supervised models (YOLOv9, YOLOv10, YOLOv11, and DINO-DETR); (2) OCR, benchmarking Tesseract, EasyOCR, PaddleOCR, and a custom Attention-based BiGRU; and (3) Information Extraction, evaluating the zero-shot capabilities of Large Language Models (Llama-3, Qwen-2.5) on structured field parsing. Our results identify YOLOv11 as the optimal detector for dense receipt layouts and demonstrate that while PaddleOCR achieves the lowest Character Error Rate (15.84%), standard LLMs struggle in zero-shot settings due to domain mismatch with noisy Korean receipt text, particularly for price-related fields (F1 scores ≈ 25%). We release the dataset, splits, and evaluation code to facilitate reproducible research on degraded Hangul document understanding.
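Character Error Rate is conventionally the Levenshtein edit distance between the OCR output and the reference, divided by the reference length. The sketch below uses that standard definition. Whether KORIE normalizes whitespace or Unicode before scoring is not stated in the abstract. Precomposed Hangul syllables (NFC) count as one character each, whereas decomposed jamo (NFD) would inflate the counts, so normalization matters. The receipt string is an invented example.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of character insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (or match)
        prev = cur
    return prev[-1]

def cer(reference: str, hypothesis: str) -> float:
    """Character Error Rate = edit distance / number of reference characters."""
    return levenshtein(reference, hypothesis) / len(reference)

# One substituted syllable and one dropped digit in a 10-character field.
print(cer("합계 12,000원", "합게 12,00원"))
```

A dropped zero in a price field costs only one character in CER but changes the value tenfold. This helps explain why field-level F1 for prices can be far lower than CER alone would suggest.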

19 pages, 4893 KB  
Article
LLMs in Staging: An Orchestrated LLM Workflow for Structured Augmentation with Fact Scoring
by Giuseppe Trimigno, Gianfranco Lombardo, Michele Tomaiuolo, Stefano Cagnoni and Agostino Poggi
Future Internet 2025, 17(12), 535; https://doi.org/10.3390/fi17120535 - 24 Nov 2025
Cited by 1 | Viewed by 988
Abstract
Retrieval-augmented generation (RAG) enriches prompts with external knowledge, but it often relies on additional infrastructure that may be impractical in resource-constrained or offline settings. In addition, updating the internal knowledge of a language model through retraining is costly and inflexible. To address these limitations, we propose an explainable and structured prompt augmentation pipeline that enhances inputs using pre-trained models and rule-based extractors, without requiring external sources. We describe this approach as an orchestrated LLM workflow: a structured sequence in which lightweight LLM modules assume specialized roles. Specifically, (1) an extractor module identifies factual triples from input prompts by combining dependency parsing with a rule-based extraction algorithm; (2) a scorer module, based on a generic lightweight LLM, evaluates the importance of each triple via its self-attention patterns, leveraging internal beliefs to promote explainability and trustworthy cooperation with the downstream model; (3) a performer module processes the augmented prompt for downstream tasks in supervised fine-tuning or zero-shot settings. Much as in a theatrical staging, each module operates transparently behind the scenes to support and elevate the performer’s final output. We evaluate this approach across multiple performer architectures (encoder-only, encoder-decoder, and decoder-only) and NLP tasks (multiple-choice QA, open-book QA, and summarization). Our results show that this structured augmentation with scored facts yields consistent improvements compared to baseline prompting: up to a 28.78% accuracy improvement for multiple-choice QA, up to a 9.42% BLEURT improvement for open-book QA, and up to an 18.14% ROUGE-L improvement for summarization. By decoupling knowledge scoring from task execution, our method provides a practical, interpretable, and low-cost alternative to RAG in static or knowledge-limited environments.
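The extractor-then-augment flow can be sketched with one simple rule: for each verb, pair its nominal subject with its direct object. In the sketch, the dependency parse is written by hand (in practice it would come from a parser such as spaCy, whose English models use labels like `nsubj` and `dobj`), and the importance scores are also supplied by hand in place of the attention-based scorer. The `Token` type, `extract_triples`, `augment_prompt`, the single extraction rule, and the "Facts:" prompt format are all hypothetical, not the paper's algorithm.

```python
from typing import NamedTuple

class Token(NamedTuple):
    text: str
    dep: str   # dependency label, e.g. "nsubj", "dobj", "ROOT"
    head: int  # index of the syntactic head (the ROOT token points to itself)

def extract_triples(tokens):
    """Rule: for each head token, pair its nsubj children with its dobj children."""
    triples = []
    for i, tok in enumerate(tokens):
        subjects = [t.text for t in tokens if t.head == i and t.dep == "nsubj"]
        objects = [t.text for t in tokens if t.head == i and t.dep == "dobj"]
        for s in subjects:
            for o in objects:
                triples.append((s, tok.text, o))
    return triples

def augment_prompt(prompt, scored_triples, top_n=2):
    """Prepend the top_n highest-scoring facts to the prompt."""
    best = sorted(scored_triples, key=lambda st: st[1], reverse=True)[:top_n]
    facts = "; ".join(f"{s} {r} {o}" for (s, r, o), _ in best)
    return f"Facts: {facts}\n{prompt}"

# Hand-written parse of "Marie discovered radium".
parse = [Token("Marie", "nsubj", 1), Token("discovered", "ROOT", 1), Token("radium", "dobj", 1)]
print(extract_triples(parse))
```

Keeping extraction rule-based and scoring separate mirrors the decoupling the abstract describes: either stage can be replaced without retraining the performer model.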

23 pages, 5751 KB  
Article
Automatic Diagnosis, Classification, and Segmentation of Abdominal Aortic Aneurysm and Dissection from Computed Tomography Images
by Hakan Baltaci, Sercan Yalcin, Muhammed Yildirim and Harun Bingol
Diagnostics 2025, 15(19), 2476; https://doi.org/10.3390/diagnostics15192476 - 27 Sep 2025
Viewed by 1900
Abstract
Background/Objectives: Diagnosing abdominal aortic aneurysm (AAA) and abdominal aortic dissection (AAD) is of strategic importance because cardiovascular disease has fatal consequences worldwide. This study presents a novel deep learning-based approach for the accurate and efficient diagnosis of AAAs and AADs from CT images. Methods: Our proposed convolutional neural network (CNN) architecture effectively extracts relevant features from CT scans and classifies regions as normal or diseased. The model also accurately delineates the boundaries of detected aneurysms and dissections, aiding clinical decision-making. A pyramid scene parsing network is incorporated into a hybrid architecture. The layers following the classification layer are divided into two branches: one determines whether an AAA or AAD region is present in the abdominal CT image, and the other determines the borders of the detected diseased region. Results: Both detection and segmentation are therefore performed for AAA and AAD. The accuracy and performance of the proposed strategy were evaluated in Python. Average accuracy rates of 83.48%, 86.9%, 88.25%, and 89.64% were achieved by ResDenseUNet, INet, C-Net, and the proposed strategy, respectively, with corresponding intersection over union (IoU) values of 79.24%, 81.63%, 82.48%, and 83.76%. Conclusions: The proposed strategy is a promising technique for automatically diagnosing AAA and AAD, which could reduce the workload of cardiovascular surgeons.
(This article belongs to the Special Issue Artificial Intelligence and Computational Methods in Cardiology 2026)
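The segmentation results above are reported as intersection over union (IoU). The following minimal sketch shows how foreground IoU and pixel accuracy are computed for binary masks. The abstract does not say whether IoU is averaged per image, pooled over the dataset, or computed per class, so the per-image averaging in `average_iou_over_images` is only one assumed protocol. The treatment of two empty masks as a perfect match is also an assumed convention. All function names are hypothetical.

```python
from typing import List

Mask = List[List[int]]  # binary mask: 1 = diseased region, 0 = background


def _pairs(pred: Mask, target: Mask):
    """Yield (prediction, ground-truth) pixel pairs after checking shapes match."""
    if len(pred) != len(target) or any(len(p) != len(t) for p, t in zip(pred, target)):
        raise ValueError("prediction and target masks must have the same shape")
    for prow, trow in zip(pred, target):
        yield from zip(prow, trow)


def iou(pred: Mask, target: Mask) -> float:
    """Foreground IoU = |pred AND target| / |pred OR target|."""
    pairs = list(_pairs(pred, target))
    inter = sum(1 for p, t in pairs if p and t)
    union = sum(1 for p, t in pairs if p or t)
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union


def pixel_accuracy(pred: Mask, target: Mask) -> float:
    """Fraction of pixels whose predicted label matches the ground truth."""
    pairs = list(_pairs(pred, target))
    return sum(1 for p, t in pairs if p == t) / len(pairs)


def average_iou_over_images(preds: List[Mask], targets: List[Mask]) -> float:
    """Per-image IoU averaged over a set of images (one possible protocol)."""
    scores = [iou(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


pred = [[0, 1, 1],
        [0, 1, 0],
        [0, 0, 0]]
target = [[0, 1, 0],
          [0, 1, 1],
          [0, 0, 0]]
```

Here the two masks overlap on 2 pixels and their union covers 4, so `iou(pred, target)` is 0.5, while pixel accuracy is 7/9 because background pixels also count. The gap between the two metrics explains why accuracy figures typically exceed IoU figures for the same model.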

23 pages, 4776 KB  
Article
Category-Guided Transformer for Semantic Segmentation of High-Resolution Remote Sensing Images
by Yue Ni, Jiahang Liu, Hui Zhang, Weijian Chi and Ji Luan
Remote Sens. 2025, 17(17), 3054; https://doi.org/10.3390/rs17173054 - 2 Sep 2025
Cited by 7 | Viewed by 3113
Abstract
High-resolution remote sensing images exhibit large intra-class variance, high inter-class similarity, and significant scale variations, which lead to incomplete segmentation and imprecise boundaries. Transformer-based methods offer strong global modeling capability but often suffer from feature confusion, weak detail representation, and high computational cost. Moreover, existing multi-scale fusion mechanisms are prone to semantic misalignment across levels, which hinders effective information integration and reduces boundary clarity. To address these issues, a Category-Guided Transformer (CIGFormer) is proposed. Specifically, the Category-Information-Guided Transformer Module (CIGTM) integrates global and local branches: the global branch combines window-based self-attention (WSAM) and window adaptive pooling self-attention (WAPSAM), using class predictions to enhance global context modeling and reduce intra-class and inter-class confusion; the local branch extracts multi-scale structural features to refine semantic representations and boundaries. In addition, an Adaptive Wavelet Fusion Module (AWFM) is designed, which leverages wavelet decomposition and joint channel-spatial attention for dynamic multi-scale fusion while preserving structural details. Extensive experiments on the ISPRS Vaihingen and Potsdam datasets demonstrate that CIGFormer, with only 21.50 M parameters, performs strongly in small-object recognition, boundary refinement, and complex scene parsing, showing strong potential for practical applications.
(This article belongs to the Section AI Remote Sensing)
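The AWFM is described as leveraging wavelet decomposition, but the abstract does not name the wavelet family or the number of decomposition levels. The sketch below assumes a single-level orthonormal 2D Haar transform to show how a feature map splits into a low-frequency approximation and three high-frequency detail subbands, which a fusion module could then weight separately. The function names are hypothetical, and subband naming (LH vs HL) differs between libraries.

```python
from typing import List, Tuple

Grid = List[List[float]]


def haar_dwt2(x: Grid) -> Tuple[Grid, Grid, Grid, Grid]:
    """One level of the orthonormal 2D Haar transform on an even-sized map.

    Returns (LL, LH, HL, HH) at half resolution: the low-frequency
    approximation plus three detail subbands.
    """
    h, w = len(x), len(x[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = [], [], [], []
    for r in range(0, h, 2):
        rows = ([], [], [], [])
        for c in range(0, w, 2):
            tl, tr = x[r][c], x[r][c + 1]          # top-left, top-right
            bl, br = x[r + 1][c], x[r + 1][c + 1]  # bottom-left, bottom-right
            rows[0].append((tl + tr + bl + br) / 2)  # low-frequency average
            rows[1].append((tl + tr - bl - br) / 2)  # top minus bottom: horizontal edges
            rows[2].append((tl - tr + bl - br) / 2)  # left minus right: vertical edges
            rows[3].append((tl - tr - bl + br) / 2)  # diagonal detail
        for band, row in zip((ll, lh, hl, hh), rows):
            band.append(row)
    return ll, lh, hl, hh


def haar_idwt2(ll: Grid, lh: Grid, hl: Grid, hh: Grid) -> Grid:
    """Inverse of haar_dwt2: reconstructs each 2x2 block exactly."""
    out = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            avg, rdiff, cdiff, diag = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(avg + rdiff + cdiff + diag) / 2, (avg + rdiff - cdiff - diag) / 2]
            bottom += [(avg - rdiff + cdiff - diag) / 2, (avg - rdiff - cdiff + diag) / 2]
        out += [top, bottom]
    return out


x = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
ll, lh, hl, hh = haar_dwt2(x)
```

For this smooth ramp, the diagonal subband is all zeros and the two edge subbands are constant, while the approximation subband keeps the coarse structure. Because the transform is orthonormal and invertible, splitting features this way loses no information, which is consistent with the abstract's aim of fusing scales while preserving structural details.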