Search Results (499)

Search Parameters:
Keywords = query strategy

25 pages, 2831 KB  
Article
DualGraphRAG: A Dual-View Graph-Enhanced Retrieval-Augmented Generation Framework for Reliable and Efficient Question Answering
by Mengqi Li and Rufu Qin
Appl. Sci. 2026, 16(5), 2221; https://doi.org/10.3390/app16052221 (registering DOI) - 25 Feb 2026
Abstract
Graph-enhanced Retrieval-Augmented Generation (RAG) frameworks, such as GraphRAG, improve large language model (LLM)-based question answering (QA) by constructing and leveraging structured, knowledge-condensed graph information. However, they still face challenges in complex multi-hop reasoning tasks and often incur substantial time and resource costs, resulting in low efficiency. To address these limitations, we propose DualGraphRAG, a dual-view graph-enhanced RAG framework designed to achieve both high QA performance and computational efficiency for complex reasoning over open-domain corpora. Specifically, DualGraphRAG constructs a knowledge graph (KG) by automatically extracting triples from unstructured text using LLMs, and embeds KG nodes with unified text embeddings. For each query, multiple types of KG nodes are generated through a dedicated query enhancement module. Based on these nodes, DualGraphRAG employs a dual-view retrieval strategy to retrieve both one-hop triples that capture local context and shortest paths that compress global connectivity information, thereby facilitating answer generation. Experimental results show that, compared with NaiveRAG, GraphRAG, and LightRAG, DualGraphRAG achieves the best or competitive performance on benchmark datasets and significantly improves efficiency. Overall, DualGraphRAG organizes and exploits KG information in a dual-view manner, leveraging triples and shortest paths to offer a reliable and efficient framework for open-domain QA with complex multi-hop reasoning. Full article
(This article belongs to the Special Issue Large Language Models and Knowledge Computing)
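The dual-view retrieval described in the abstract above can be sketched in miniature. This is a hedged illustration only: the triples, entity names, and plain BFS path search below are invented stand-ins, whereas the paper builds its KG from LLM-extracted triples and matches query nodes via text embeddings.

```python
from collections import deque

# Toy knowledge graph as (head, relation, tail) triples -- stand-ins for
# the triples an LLM would extract from unstructured text in the pipeline.
TRIPLES = [
    ("Einstein", "born_in", "Ulm"),
    ("Ulm", "located_in", "Germany"),
    ("Einstein", "developed", "Relativity"),
    ("Relativity", "field_of", "Physics"),
]

def one_hop(entity):
    """Local view: every triple touching the entity (one-hop context)."""
    return [t for t in TRIPLES if entity in (t[0], t[2])]

def shortest_path(src, dst):
    """Global view: BFS shortest entity path over the undirected KG,
    compressing multi-hop connectivity into a single evidence chain."""
    adj = {}
    for h, _, t in TRIPLES:
        adj.setdefault(h, set()).add(t)
        adj.setdefault(t, set()).add(h)
    queue, seen = deque([[src]]), {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nxt in adj.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```

Querying "Einstein" for local context returns its one-hop triples, while the shortest path to "Germany" hands the generator the multi-hop connection it would otherwise have to infer.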
18 pages, 4500 KB  
Article
Localizing Perceptual Artifacts in Synthetic Images for Image Quality Assessment via Deep-Learning-Based Anomaly Detection
by Zijin Yin
Electronics 2026, 15(5), 916; https://doi.org/10.3390/electronics15050916 - 24 Feb 2026
Abstract
While deep generative models, such as text-to-image diffusion, demonstrate strong capabilities in synthesizing photorealistic images, they frequently produce perceptual artifacts (e.g., distorted structures or unnatural textures) that require manual correction. Existing artifact localization methods typically rely on fully supervised training with large-scale pixel-level annotations, which suffer from high labeling costs. To address these challenges, we propose a novel framework based on the core insight that perceptual artifacts can be fundamentally modeled as “semantic outliers”—regions that inherently fail to match any pre-defined semantic categories. Instead of learning specific artifact features, we introduce a Mask-based Semantic Rejection (MSR) mechanism within a semantic segmentation architecture. This mechanism leverages the “one-vs-all” property of object queries to identify regions that are consistently rejected by all pre-trained semantic categories. Furthermore, we design a flexible adaptation strategy that supports both zero-shot inference using pre-trained semantic knowledge and fine-tuning with a margin-based suppression objective to explicitly optimize the rejection boundary using minimal supervision. Comprehensive experiments across 11 synthesis tasks demonstrate that MSR significantly outperforms state-of-the-art methods, particularly in data-efficient scenarios. Specifically, the framework achieves mIoU improvements of 6.52% and 13.06% on the text-to-image task using only 10% and 50% of labeled samples, respectively, underscoring its superior capability. Full article
(This article belongs to the Special Issue Computer Vision and AI Algorithms for Diverse Scenarios)
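The "semantic outlier" idea in the abstract above — flag regions that no pre-defined category claims — can be sketched per pixel. This is purely illustrative: the class scores and threshold are invented, and the actual MSR mechanism operates on mask-level object-query logits inside a query-based segmentation architecture, not on raw per-pixel dictionaries.

```python
# Hypothetical rejection threshold; the paper tunes its boundary with a
# margin-based suppression objective rather than a fixed constant.
THRESHOLD = 0.5

def reject_outliers(score_map):
    """score_map: 2D list of per-pixel dicts {class_name: score}.
    A pixel whose best class score stays below the threshold is rejected
    by all categories and marked True (candidate perceptual artifact)."""
    return [
        [max(pixel.values()) < THRESHOLD for pixel in row]
        for row in score_map
    ]
```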

25 pages, 3276 KB  
Article
SIDWA: Synthetic Image Detection Based on Discrete Wavelet Transform Stem and Deformable Sliding Window Cross-Attention
by Luo Li, Tianyi Lu, Jiaxin Song and Ke Cheng
Electronics 2026, 15(4), 891; https://doi.org/10.3390/electronics15040891 - 21 Feb 2026
Abstract
With the rapid evolution of Generative Adversarial Networks (GANs) and diffusion models (DMs), the detection of synthetic images faces significant challenges due to non-rigid artifacts and complex frequency biases. In this paper, we propose SIDWA, a novel dual-branch detection framework that leverages the synergy between frequency and spatial domains. Within the spatial branch, we design a Deformable Sliding Window Cross-Attention (DSWA) module, which utilizes a learnable offset mechanism to dynamically warp the receptive field, effectively capturing distorted edges and non-linear texture features. Simultaneously, the Discrete Wavelet Transform (DWT) Stem decomposes input images into multi-scale sub-bands to preserve crucial high-frequency residues. Through a Frequency-Semantic Resonance Projector (FSRP) strategy, the semantic priors from the spatial branch act as queries to guide the model toward localized frequency anomalies, achieving a unified “where to look” and “how to analyze” approach. Experimental results on the SIDataset (SIDset) benchmark demonstrate that SIDWA achieves superior performance, with an average accuracy exceeding 95% and a competitive inference time of 18.2 ms on an NVIDIA A100 GPU. Ablation studies further validate the critical role of learnable offsets and frequency integration in enhancing robustness and generalization. SIDWA offers an efficient and reliable forensic solution for combating the growing threats of sophisticated generative forgeries. Full article
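The sub-band split performed by the DWT stem can be illustrated with a single-level 1-D Haar transform. This is only a sketch: the paper applies 2-D wavelet decomposition to images, and the averaging (rather than orthonormal 1/√2) normalization here is a simplification.

```python
def haar_dwt_1d(signal):
    """Single-level Haar DWT of an even-length sequence: pairwise
    averages give the low-frequency approximation sub-band, pairwise
    half-differences give the high-frequency detail sub-band where
    forgery residues tend to live."""
    approx = [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / 2 for i in range(0, len(signal), 2)]
    return approx, detail
```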

23 pages, 1201 KB  
Article
Comparative Read Performance Analysis of PostgreSQL and MongoDB in E-Commerce: An Empirical Study of Filtering and Analytical Queries
by Jovita Urnikienė, Vaida Steponavičienė and Svetoslav Atanasov
Big Data Cogn. Comput. 2026, 10(2), 66; https://doi.org/10.3390/bdcc10020066 - 19 Feb 2026
Abstract
This paper presents a comparative analysis of read performance for PostgreSQL and MongoDB in e-commerce scenarios, using identical datasets in a resource-constrained single-host environment. The results demonstrate that PostgreSQL executes complex analytical queries 1.6–15.1 times faster, depending on the query type and data volume. The study employed synthetic data generation with the Faker library across three stages, processing up to 300,000 products and executing each of 6 query types 15 times. Both filtering and analytical queries were tested on non-indexed data in a controlled localhost environment with PostgreSQL 17.5 and MongoDB 7.0.14, using default configurations. PostgreSQL showed 65–80% shorter execution times for multi-criteria queries, while MongoDB required approximately 33% less disk space. These findings suggest that normalized relational schemas are advantageous for transactional e-commerce systems where analytical queries dominate the workload. The results are directly applicable to small and medium e-commerce developers operating in budget-constrained, single-host deployment environments when choosing between relational and document-oriented databases for structured transactional data with read-heavy analytical workloads. A minimal indexed validation confirms that the baseline trends remain consistent under a simple indexing configuration. Future work will examine broader indexing strategies, write-intensive workloads, and distributed deployment scenarios. Full article
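The measurement protocol described above (each query type executed 15 times) can be sketched with a generic timing harness. `run_query` is a placeholder for a real psycopg or pymongo call, which this sketch deliberately does not include; only the timing scaffold is shown.

```python
import statistics
import time

def benchmark(run_query, repeats=15):
    """Execute a query callable `repeats` times (15, matching the study's
    protocol) and report the median latency in milliseconds. Median is
    used here as a robust summary; the paper may aggregate differently."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)
```

`time.perf_counter()` is used rather than `time.time()` because it is a monotonic high-resolution clock, which matters when individual queries finish in microseconds.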

25 pages, 8032 KB  
Article
Knowledge-Based Approach for the Digitalization and Analysis of Historic Built Heritage: Application in a Calabrian Context (Italy)
by Serena Buglisi, Livio De Luca, Massimo Lauria and Angela Quattrocchi
Heritage 2026, 9(2), 75; https://doi.org/10.3390/heritage9020075 - 15 Feb 2026
Abstract
The conservation process is iterative and interactive, and periodic updates stratify data across disciplines and time. Still, the transition from raw data to structured knowledge is often slowed by procedural gaps and tooling limitations, creating a semantic divide between abundant digital resources and truly intelligible data. This article proposes a methodological and operational approach for managing the continuity of the information flow within a digitalization process that supports a conservation strategy for the Historical Built Heritage. A graph-structured semantic knowledge base was developed; it is fed by data from heterogeneous sources (Building Information Modeling, reality-based annotation platforms, and graph databases), organized according to an explicit conceptual model for representing the building’s diachronic evolution. Interaction and querying are mediated by a prototypical multidimensional visualization environment. The experimentation has proven effective in anticipating contextualization, rationalizing mapping, harmonizing heterogeneous resources, and formalizing knowledge for sharing and querying. Calabrian heritage, which is part of the region’s identity and subject to natural and anthropogenic risks, is the case of interest. Application scenarios are exemplified in the experiment on San Giovannello, Gerace (RC). Full article

14 pages, 3762 KB  
Article
An IF-MPWM Algorithm to Extend the Clean Bandwidth for All-Digital Transmitters
by Yutong Liu, Qiang Zhou, Jie Yang, Lei Zhu and Haoyang Fu
Electronics 2026, 15(4), 800; https://doi.org/10.3390/electronics15040800 - 13 Feb 2026
Abstract
In all-digital transmitters (ADTx), the in-band quantization noise generated by pulse coding provides only limited clean bandwidth (CBW), significantly increasing the difficulty of analog filter design. To address the constrained CBW of RF pulse sequences in ADTx, this paper proposes an optimization strategy for suppressing noise across a broader frequency domain. Distinguished from traditional schemes with limited noise suppression range, the expansion of CBW is innovatively achieved by setting multiple groups of frequency observation points near the carrier frequency, enabling more comprehensive constraints of in-band noise. Meanwhile, aiming at the problems of large look-up table scale and slow query speed, a partitioned look-up strategy is proposed. During a look-up, traversal is confined only to the partition containing the input point, eliminating the need to scan all elements. This strategy substantially reduces the number of error calculations and comparisons, significantly improving the real-time performance of mapping look-up and lowering the computational demands on digital processing devices. Through the collaborative optimization of noise suppression and query efficiency, this study highlights its breakthrough contributions and provides technical support for the optimization of RF pulse sequences in ADTx. Full article
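The partitioned look-up idea above — confine the scan to the partition containing the input point instead of traversing the whole table — can be sketched as follows. The bin edges and entries are invented for illustration; the paper's table holds RF mapping points, not these scalars.

```python
import bisect

# Illustrative partition boundaries and (sorted) look-up entries.
EDGES = [0.0, 1.0, 2.0, 3.0]
ENTRIES = [0.1, 0.4, 1.2, 1.9, 2.3, 2.8]

def build_partitions(entries, edges):
    """Bucket entries by the value range they fall into."""
    parts = [[] for _ in range(len(edges) - 1)]
    for e in entries:
        idx = min(bisect.bisect_right(edges, e) - 1, len(parts) - 1)
        parts[idx].append(e)
    return parts

def nearest_in_partition(x, parts, edges):
    """Scan only the partition containing x for its nearest entry,
    skipping the error calculations a full-table traversal would need."""
    idx = min(max(bisect.bisect_right(edges, x) - 1, 0), len(parts) - 1)
    candidates = parts[idx]
    return min(candidates, key=lambda e: abs(e - x)) if candidates else None
```

Restricting the scan to one partition trades a possible near-boundary miss for far fewer comparisons per look-up, which is exactly the efficiency the abstract targets.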

24 pages, 1642 KB  
Article
ProbeSpec: Robust Model Fingerprinting via Dynamic Perturbation Response Spectrum
by Shanshan Lou, Hanzhe Yu and Qi Xuan
Electronics 2026, 15(4), 729; https://doi.org/10.3390/electronics15040729 - 9 Feb 2026
Abstract
Deep neural networks (DNNs) represent critical intellectual property that model owners urgently need to protect. With the increasing value of models, malicious attackers increasingly attempt to extract model functionality through techniques such as fine-tuning, distillation, and pruning. Model fingerprinting has emerged as a mainstream protection strategy. However, existing fingerprinting methods either exhibit vulnerability to model modifications due to reliance on decision boundary features or require prohibitively large query budgets for accurate verification. This paper proposes ProbeSpec, which captures model fingerprints through dynamic behavioral analysis rather than static output matching. We discover that a model’s response patterns under multi-level perturbations form a unique “behavioral spectrum”, originating from implicit decision mechanisms learned during training and preserved even after various attacks. ProbeSpec employs three complementary probe types to elicit this characteristic and leverages DCT frequency-domain transformation for efficient fingerprint extraction. Extensive experiments show that ProbeSpec achieves 100% detection rate in the majority of attack scenarios, with an overall accuracy exceeding 95% across all tested architectures. Meanwhile, it effectively distinguishes independently trained models and requires only 80 probe samples for fingerprint extraction. Full article
(This article belongs to the Section Artificial Intelligence)
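The frequency-domain fingerprinting step described above can be sketched with a plain DCT-II over a model's perturbation-response curve. Hedged heavily: the response values, coefficient count, and the pure-Python DCT below are illustrative stand-ins for whatever transform configuration ProbeSpec actually uses.

```python
import math

def dct_ii(x):
    """Unnormalized DCT-II of a 1-D sequence (no SciPy dependency)."""
    N = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (n + 0.5) * k / N) for n in range(N))
        for k in range(N)
    ]

def fingerprint(responses, keep=4):
    """Keep the first few low-frequency DCT coefficients of the
    perturbation-response curve as a compact behavioral fingerprint;
    `keep` is an invented truncation size."""
    return [round(c, 6) for c in dct_ii(responses)[:keep]]
```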

20 pages, 682 KB  
Article
Semantic Search for System Dynamics Models Using Vector Embeddings in a Cloud Microservices Environment
by Pavel Kyurkchiev, Anton Iliev and Nikolay Kyurkchiev
Future Internet 2026, 18(2), 86; https://doi.org/10.3390/fi18020086 - 5 Feb 2026
Abstract
Efficient retrieval of mathematical and structural similarities in System Dynamics models remains a significant challenge for traditional lexical systems, which often fail to capture the contextual dependencies of simulation processes. This paper presents an architectural approach and implementation of a semantic search module integrated into an existing cloud-based modeling and simulation system. The proposed method employs a strategy for serializing graph structures into textual descriptions, followed by the generation of vector embeddings via local ONNX inference and indexing within a vector database (Qdrant). Experimental validation performed on a diverse corpus of complex dynamic models, compares the proposed approach against traditional information retrieval methods (Full-Text Search, Keyword Search in PostgreSQL, and Apache Lucene with Standard and BM25 scoring). The results demonstrate the distinct advantage of semantic search, achieving high precision (over 90%) within the scope of the evaluated corpus and effectively eliminating information noise. In comparison, keyword search exhibited only 24.8% precision with a significant rate of false positives, while standard full-text analysis failed to identify relevant models for complex conceptual queries (0 results). Despite a recorded increase in latency (~2 s), the study proves that the vector-based approach is a significantly more robust solution for detecting hidden semantic connections in mathematical model databases, providing a foundation for future developments toward multi-vector indexing strategies. Full article
(This article belongs to the Special Issue Intelligent Agents and Their Application)
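The core ranking operation the abstract describes — a vector database returning the stored models most similar to a query embedding — can be sketched with plain cosine similarity. The 3-D vectors and model names below are toy stand-ins for the real ONNX-generated sentence embeddings indexed in Qdrant.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semantic_search(query_vec, index, top_k=2):
    """Rank stored model embeddings by cosine similarity to the query
    and return the names of the top_k matches -- the operation a vector
    database performs with an approximate index at scale."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]
```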

27 pages, 20135 KB  
Article
Seeing Like Argus: Multi-Perspective Global–Local Context Learning for Remote Sensing Semantic Segmentation
by Hongbing Chen, Yizhe Feng, Kun Wang, Mingrui Liao, Haoting Zhai, Tian Xia, Yubo Zhang, Jianhua Jiao and Changji Wen
Remote Sens. 2026, 18(3), 521; https://doi.org/10.3390/rs18030521 - 5 Feb 2026
Abstract
Accurate semantic segmentation of high-resolution remote sensing imagery is crucial for applications such as land cover mapping, urban development monitoring, and disaster response. However, remote sensing data still present inherent challenges, including complex spatial structures, significant intra-class variability, and diverse object scales, which demand models capable of capturing rich contextual information from both local and global regions. To address these issues, we propose ArgusNet, a novel segmentation framework that enhances multi-scale representations through a series of carefully designed fusion mechanisms. At the core of ArgusNet lies the synergistic integration of Adaptive Windowed Additive Attention (AWAA) and 2D Selective Scan (SS2D). Specifically, our AWAA extends additive attention into a window-based structure with a dynamic routing mechanism, enabling multi-perspective local feature interaction via multiple global query vectors. Furthermore, we introduce a decoder optimization strategy incorporating three-stage feature fusion and a Macro Guidance Module (MGM) to improve spatial detail preservation and semantic consistency. Experiments on benchmark remote sensing datasets demonstrate that ArgusNet achieves competitive and improved segmentation performance compared to state-of-the-art methods, particularly in scenarios requiring fine-grained object delineation and robust multi-scale contextual understanding. Full article

32 pages, 3731 KB  
Article
A Comparative Study of RQA-Guided Attention Mechanisms with LSTM Autoencoder for Bearing Anomaly Detection
by Ayşenur Hatipoğlu and Ersen Yılmaz
Sensors 2026, 26(3), 1015; https://doi.org/10.3390/s26031015 - 4 Feb 2026
Abstract
Accurate anomaly detection in rotating machinery under noisy conditions remains challenging in Prognostics and Health Management (PHM). Existing deep learning autoencoders and attention mechanisms rely primarily on data-driven similarity measures and fail to explicitly incorporate nonlinear dynamical characteristics of degradation. In this study, we propose a Recurrence Quantification Analysis-Aware Attention (RQAA) framework that systematically injects chaos-theoretic descriptors into the attention mechanism of LSTM-based autoencoders for unsupervised anomaly detection. Specifically, RQA metrics including recurrence rate, determinism, laminarity, entropy, and trapping time are computed at the window level and embedded into the query-key-value attention scoring to guide the model toward dynamically informative temporal patterns. Three attention variants are developed to investigate different fusion strategies between learned representations and RQA-driven structural cues. The proposed framework is evaluated on three widely used bearing vibration datasets, which are IMS, CWRU, and HUST. Experimental results demonstrate that RQAA consistently outperforms conventional LSTM autoencoders and classical attention-based models, achieving up to 99.85% F1-score and 99.00% AUC while exhibiting superior robustness in low signal-to-noise scenarios. Further analysis reveals that explicit dynamical guidance enhances anomaly separability and reduces false alarms, particularly in early-stage fault detection. These findings indicate that integrating nonlinear dynamical information directly into attention scoring offers a principled and effective pathway for advancing unsupervised anomaly detection in rotating machinery and safety-critical industrial systems. Full article
(This article belongs to the Special Issue Sensor-Based Fault Diagnosis and Prognosis)
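Of the RQA metrics the abstract lists, the recurrence rate is the simplest to sketch: the fraction of point pairs that lie within a tolerance of each other. This is a 1-D illustrative version; the paper computes the metrics per window over embedded vibration signals and injects them into attention scoring alongside determinism, laminarity, entropy, and trapping time.

```python
def recurrence_rate(series, epsilon):
    """Fraction of (i, j) pairs, self-pairs included, whose values lie
    within epsilon of each other -- the density of the recurrence plot."""
    n = len(series)
    recurrent = sum(
        1
        for i in range(n)
        for j in range(n)
        if abs(series[i] - series[j]) <= epsilon
    )
    return recurrent / (n * n)
```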

22 pages, 10079 KB  
Article
FS2-DETR: Transformer-Based Few-Shot Sonar Object Detection with Enhanced Feature Perception
by Shibo Yang, Xiaoyu Zhang and Panlong Tan
J. Mar. Sci. Eng. 2026, 14(3), 304; https://doi.org/10.3390/jmse14030304 - 4 Feb 2026
Abstract
In practical underwater object detection tasks, imbalanced sample distribution and the scarcity of samples for certain classes often lead to insufficient model training and limited generalization capability. To address these challenges, this paper proposes FS2-DETR (Few-Shot Detection Transformer for Sonar Images), a transformer-based few-shot object detection network tailored for sonar imagery. Considering that sonar images generally contain weak, small, and blurred object features, and that data scarcity in some classes can hinder effective feature learning, the proposed FS2-DETR introduces the following improvements over the baseline DETR model. (1) Feature Enhancement Compensation Mechanism: A decoder-prediction-guided feature resampling module (DPGFRM) is designed to process the multi-scale features and subsequently enhance the memory representations, thereby strengthening the exploitation of key features and improving detection performance for weak and small objects. (2) Visual Prompt Enhancement Mechanism: Discriminative visual prompts are generated to jointly enhance object queries and memory, thereby highlighting distinctive image features and enabling more effective feature capture for few-shot objects. (3) Multi-Stage Training Strategy: Adopting a progressive training strategy to strengthen the learning of class-specific layers, effectively mitigating misclassification in few-shot scenarios and enhancing overall detection accuracy. Extensive experiments conducted on the improved UATD sonar image dataset demonstrate that the proposed FS2-DETR achieves superior detection accuracy and robustness under few-shot conditions, outperforming existing state-of-the-art detection algorithms. Full article
(This article belongs to the Section Ocean Engineering)

24 pages, 5060 KB  
Article
Eagle-YOLO: Enhancing Real-Time Small Object Detection in UAVs via Multi-Granularity Feature Aggregation
by Yan Du, Zifeng Dai, Teng Wu, Quan Zhu, Changzhen Hu and Shengjun Wei
Drones 2026, 10(2), 112; https://doi.org/10.3390/drones10020112 - 3 Feb 2026
Abstract
Real-time object detection in Unmanned Aerial Vehicle (UAV) imagery presents unique challenges, primarily characterized by extreme scale variations and intense background clutter. Existing detectors often suffer from spectral homogenization in which the critical high-frequency details of minute targets are washed out by dominant background signals during feature downsampling. To address this, we propose Eagle-YOLO, a dynamic feature aggregation framework designed to master these complexities without compromising inference speed. We introduce three core innovations: (1) the Hierarchical Granularity Block (HG-Block), which employs a residual granularity injection pathway to function as a detail anchor for tiny objects while simultaneously accumulating semantics for large structures; (2) the Cross-Stage Context Modulation (CSCM) mechanism, which leverages a global context query to filter background redundancy and recalibrate features across network stages; and (3) the Scale-Adaptive Heterogeneous Convolution (SAHC) strategy, which dynamically aligns receptive fields with the inherent scale distribution of aerial data. Extensive experiments on the DUT Anti-UAV dataset demonstrate that Eagle-YOLO achieves a remarkable balance between accuracy and latency. Specifically, our lightweight Eagle-YOLO-T variant achieves 74.62% AP, surpassing the robust baseline RTMDet-T by 1.67% while maintaining a real-time inference speed of 141 FPS on an NVIDIA RTX 4090 GPU. Furthermore, on the challenging Anti-UAV dataset, our Eagle-YOLOv8-M variant reaches an impressive 94.38% AP50val, outperforming the standard YOLOv8-M by 2.83% and proving its efficacy for edge-deployed aerial surveillance applications. Full article

26 pages, 6390 KB  
Article
Image Captioning Using Enhanced Cross-Modal Attention with Multi-Scale Aggregation for Social Hotspot and Public Opinion Monitoring
by Shan Jiang, Yingzhao Chen, Rilige Chaomu and Zheng Liu
Inventions 2026, 11(1), 13; https://doi.org/10.3390/inventions11010013 - 2 Feb 2026
Abstract
Large volumes of images shared on social media have made image captioning an important tool for social hotspot identification and public opinion monitoring, where accurate visual–language alignment is essential for reliable analysis. However, existing image captioning models based on BLIP-2 (Bootstrapped Language–Image Pre-training) often struggle with complex, context-rich, and socially meaningful images in real-world social media scenarios, mainly due to insufficient cross-modal interaction, redundant visual token representations, and an inadequate ability to capture multi-scale semantic cues. As a result, the generated captions tend to be incomplete or less informative. To address these limitations, this paper proposes ECMA (Enhanced Cross-Modal Attention), a lightweight module integrated into the Querying Transformer (Q-Former) of BLIP-2. ECMA enhances cross-modal interaction through bidirectional attention between visual features and query tokens, enabling more effective information exchange, while a multi-scale visual aggregation strategy is introduced to model semantic representations at different levels of abstraction. In addition, a semantic residual gating mechanism is designed to suppress redundant information while preserving task-relevant features. ECMA can be seamlessly incorporated into BLIP-2 without modifying the original architecture or fine-tuning the vision encoder or the large language model, and is fully compatible with OPT (Open Pre-trained Transformer)-based variants. Experimental results on the COCO (Common Objects in Context) benchmark demonstrate consistent performance improvements, where ECMA improves the CIDEr (Consensus-based Image Description Evaluation) score from 144.6 to 146.8 and the BLEU-4 score from 42.5 to 43.9 on the OPT-6.7B model, corresponding to relative gains of 1.52% and 3.29%, respectively, while also achieving competitive METEOR (Metric for Evaluation of Translation with Explicit Ordering) scores. 
Further evaluations on social media datasets show that ECMA generates more coherent, context-aware, and socially informative captions, particularly for images involving complex interactions and socially meaningful scenes. Full article

29 pages, 477 KB  
Article
Sem4EDA: A Knowledge-Graph and Rule-Based Framework for Automated Fault Detection and Energy Optimization in EDA-IoT Systems
by Antonios Pliatsios and Michael Dossis
Computers 2026, 15(2), 103; https://doi.org/10.3390/computers15020103 - 2 Feb 2026
Abstract
This paper presents Sem4EDA, an ontology-driven and rule-based framework for automated fault diagnosis and energy-aware optimization in Electronic Design Automation (EDA) and Internet of Things (IoT) environments. The escalating complexity of modern hardware systems, particularly within IoT and embedded domains, presents formidable challenges for traditional EDA methodologies. While EDA tools excel at design and simulation, they often operate as siloed applications, lacking the semantic context necessary for intelligent fault diagnosis and system-level optimization. Sem4EDA addresses this gap by providing a comprehensive ontological framework developed in OWL 2, creating a unified, machine-interpretable model of hardware components, EDA design processes, fault modalities, and IoT operational contexts. We present a rule-based reasoning system implemented through SPARQL queries, which operates atop this knowledge base to automate the detection of complex faults such as timing violations, power inefficiencies, and thermal issues. A detailed case study, conducted via a large-scale trace-driven co-simulation of a smart city environment, demonstrates the framework’s practical efficacy: by analyzing simulated temperature sensor telemetry and Field-Programmable Gate Array (FPGA) configurations, Sem4EDA identified specific energy inefficiencies and overheating risks, leading to actionable optimization strategies that resulted in a 23.7% reduction in power consumption and 15.6% decrease in operating temperature for the modeled sensor cluster. This work establishes a foundational step towards more autonomous, resilient, and semantically-aware hardware design and management systems. Full article
(This article belongs to the Special Issue Advances in Semantic Multimedia and Personalized Digital Content)
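The rule-based detection the abstract describes can be sketched as pattern matching over a triple-structured knowledge base. The following is a minimal illustrative sketch, not Sem4EDA's actual ontology or SPARQL rules: the predicate names, the sensor/board identifiers, and the 85 °C threshold are all hypothetical stand-ins for the paper's OWL 2 vocabulary.

```python
# Toy knowledge base of (subject, predicate, object) triples,
# mimicking the kind of facts a Sem4EDA-style ontology would hold.
KB = [
    ("sensor_12", "hasType", "TemperatureSensor"),
    ("sensor_12", "mountedOn", "fpga_3"),
    ("sensor_12", "reportsCelsius", 91.4),
    ("sensor_7",  "hasType", "TemperatureSensor"),
    ("sensor_7",  "mountedOn", "fpga_1"),
    ("sensor_7",  "reportsCelsius", 62.0),
]

def subjects_with(kb, pred, obj):
    """Return all subjects s such that (s, pred, obj) is in the KB."""
    return [s for s, p, o in kb if p == pred and o == obj]

def objects_of(kb, subj, pred):
    """Return all objects o such that (subj, pred, o) is in the KB."""
    return [o for s, p, o in kb if s == subj and p == pred]

def overheating_risks(kb, threshold=85.0):
    """Rule: a temperature sensor reading above the threshold flags
    the board it is mounted on as an overheating risk."""
    risks = []
    for sensor in subjects_with(kb, "hasType", "TemperatureSensor"):
        readings = objects_of(kb, sensor, "reportsCelsius")
        boards = objects_of(kb, sensor, "mountedOn")
        if readings and boards and max(readings) > threshold:
            risks.append((boards[0], sensor, max(readings)))
    return risks

print(overheating_risks(KB))  # -> [('fpga_3', 'sensor_12', 91.4)]
```

In the real framework this pattern-plus-filter shape would be expressed as a SPARQL query against the OWL 2 knowledge base rather than in-memory list comprehensions.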
24 pages, 469 KB  
Article
Cross-Lingual Adaptation for Multilingual Table Question Answering and Comparative Evaluation with Large Language Models
by Sanghyun Cho, Minho Kim, Hye-Lynn Kim, Jung-Hun Lee, Hyuk-Chul Kwon and Soo-Jong Lim
Computers 2026, 15(2), 92; https://doi.org/10.3390/computers15020092 - 1 Feb 2026
Viewed by 297
Abstract
Table question answering has been studied using datasets drawn from a variety of tabular sources and task formats. However, most publicly available resources have been created in high-resource languages such as English. For low-resource languages, researchers are often required to construct new datasets or translate existing ones, which incurs substantial time, effort, and financial cost. In contrast to natural language text, table data consists of structured entries whose interpretation is less affected by language-specific syntax or word order. In this work, we present a cost-effective strategy for multilingual table QA that relies on selectively translating only the questions of existing datasets. Leveraging the language-agnostic structure of tables, our approach maintains the original table content while translating queries into multiple target languages. To address possible performance drops caused by using table data in the source language rather than the target language, we apply cross-lingual adaptation techniques using contrastive learning and adversarial training. In addition, to strengthen reasoning ability while avoiding degradation in languages not seen during pre-training, we perform supplementary pre-training of a RoBERTa-based multilingual encoder with SQL-derived table data. Finally, we extend our investigation beyond encoder-based architectures and evaluate decoder-only large language models under the same multilingual table QA setting. The experiments show that LLaMA-3 models exhibit strong cross-lingual generalization even without using translated table context and often achieve competitive performance using only Korean table data. Moreover, the performance gap among training configurations such as translated queries or translated datasets is notably smaller compared to encoder-based models, highlighting the inherent multilingual robustness of modern LLMs. We further evaluate LLaMA-3 models on domain-specific table datasets and observe that domain knowledge acquired from Korean tables transfers effectively across languages even without multilingual supervision, underscoring the potential of LLMs for specialized multilingual table reasoning. These findings demonstrate that LLMs can serve as an effective alternative for multilingual table QA, particularly in low-resource or partially translated environments. Full article
(This article belongs to the Special Issue Advances in Semantic Multimedia and Personalized Digital Content)
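The question-only translation strategy the abstract describes can be sketched as a small dataset-expansion step: the table stays in its source language verbatim, and only the question is rendered into each target language. This is an illustrative sketch under assumed field names; the toy `translate` lookup stands in for a real machine-translation system, and none of the data comes from the paper's actual resources.

```python
# Stand-in for a machine-translation call; a real pipeline would
# invoke an MT model or API here.
def translate(question, target_lang):
    toy_mt = {
        ("How many rows match?", "ko"): "몇 개의 행이 일치합니까?",
        ("How many rows match?", "de"): "Wie viele Zeilen stimmen überein?",
    }
    return toy_mt[(question, target_lang)]

def expand_example(example, target_langs):
    """Create one training example per target language, translating
    only the question and reusing the original table verbatim."""
    return [
        {
            "table": example["table"],  # kept in the source language
            "question": translate(example["question"], lang),
            "answer": example["answer"],
            "lang": lang,
        }
        for lang in target_langs
    ]

src = {
    "table": [["name", "city"], ["Kim", "Busan"], ["Lee", "Seoul"]],
    "question": "How many rows match?",
    "answer": "2",
}
multilingual = expand_example(src, ["ko", "de"])
print(len(multilingual), multilingual[0]["lang"])  # -> 2 ko
```

Because the table is shared across all target languages, the translation cost scales with the number of questions rather than with the (typically much larger) table content, which is the cost saving the paper's strategy relies on.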