Search Results (108)

Search Parameters:
Keywords = semantic-guided proposal sampling

32 pages, 7135 KB  
Article
Evolutionary Multi-Objective Prompt Learning for Synthetic Text Data Generation with Black-Box Large Language Models
by Diego Pastrián, Nicolás Hidalgo, Víctor Reyes and Erika Rosas
Appl. Sci. 2026, 16(8), 3623; https://doi.org/10.3390/app16083623 - 8 Apr 2026
Abstract
High-quality training data are essential for the performance and generalization of artificial intelligence systems, particularly in dynamic environments such as adaptive stream processing for disaster response. However, constructing large and representative datasets remains costly and time-consuming, especially in domains where real data are scarce or difficult to obtain. Large Language Models (LLMs) provide powerful capabilities for synthetic text generation, yet the quality of generated data strongly depends on the design of input prompts. Prompt engineering is therefore critical, but it remains largely manual and difficult to scale, particularly in black-box settings where model internals are inaccessible. This work introduces EVOLMD-MO, a multi-objective evolutionary framework for automated prompt learning aimed at generating high-quality synthetic text datasets using black-box LLMs. The proposed approach formulates prompt optimization as a multi-objective search problem in which candidate prompts evolve through genetic operators guided by two complementary objectives: semantic fidelity to reference data and generative diversity of the produced samples. To support scalable optimization, the framework integrates a modular multi-agent architecture that decouples prompt evolution, LLM interaction, and evaluation mechanisms. The evolutionary process is implemented using the NSGA-II algorithm, enabling the discovery of diverse Pareto-optimal prompts that balance semantic preservation and diversity. Experimental evaluation using large-scale disaster-related social media data demonstrates that the proposed approach consistently improves prompt quality across generations while maintaining a stable trade-off between fidelity and diversity. Compared with a single-objective baseline, EVOLMD-MO explores a significantly broader semantic search space and produces more diverse yet semantically coherent synthetic datasets. 
These results indicate that multi-objective evolutionary prompt learning constitutes a promising strategy for black-box LLM-driven data generation, with potential applicability to adaptive data analytics and real-time decision-support systems in highly dynamic environments, pending broader validation across domains and models. Full article
(This article belongs to the Special Issue Resource Management for AI-Centric Computing Systems)
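The Pareto selection at the heart of this kind of NSGA-II-based prompt search can be sketched in a few lines, assuming each candidate prompt has already been scored on the two objectives named in the abstract (semantic fidelity, generative diversity). The function names and scores below are illustrative stand-ins, not the authors' implementation:

```python
def dominates(a, b):
    """True if score pair a Pareto-dominates b (both objectives maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def non_dominated_front(scores):
    """Indices of the Pareto-optimal (fidelity, diversity) candidates."""
    return [i for i, a in enumerate(scores)
            if not any(dominates(b, a) for j, b in enumerate(scores) if j != i)]

# Illustrative scores for five candidate prompts: (semantic fidelity, diversity).
scores = [(0.9, 0.2), (0.7, 0.6), (0.5, 0.9), (0.6, 0.5), (0.8, 0.1)]
front = [scores[i] for i in non_dominated_front(scores)]  # selection survivors
```

Full NSGA-II additionally ranks successive fronts and breaks ties by crowding distance; this sketch shows only the non-domination test that yields the diverse Pareto-optimal prompts the abstract describes.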

27 pages, 26065 KB  
Article
AEFOP: Adversarial Energy Field Optimization for Adversarial Example Purification
by Heqi Peng, Shengpeng Xiao and Yuanfang Guo
Appl. Sci. 2026, 16(7), 3588; https://doi.org/10.3390/app16073588 - 7 Apr 2026
Abstract
As AI-driven educational systems increasingly rely on deep neural networks, their vulnerability to adversarial perturbations raises concerns about assessment integrity, fairness, and reliability. Adversarial example purification is attractive for such deployments because it removes input perturbations without modifying the already deployed models. However, most existing purification methods are inherently goal-free: denoising-based approaches apply blind heuristic operators, while reconstruction-based methods rely on stochastic sampling guided by natural image priors. These methods typically suppress perturbations at the cost of weakening semantic details or inducing structural distortions. To address this limitation, we propose a novel goal-directed purification framework, termed adversarial energy field optimization for adversarial example purification (AEFOP). AEFOP formulates purification as a constrained optimization problem by defining a learnable adversarial energy which quantifies how far an input deviates from the benign region. This allows adversarial examples to be explicitly pushed from high-energy regions toward low-energy benign regions along an interpretable descent trajectory. Specifically, we build an adversarial energy network and optimize the energy field via a two-stage strategy: adversarial energy field shaping, which enforces distance-like energy behavior and correct gradient directions, and task-driven energy field calibration, which unrolls the descent process to calibrate the field with classification-consistency and semantic-preservation objectives. Extensive experiments across multiple attack scenarios demonstrate that AEFOP achieves superior purification accuracy and high visual quality while requiring only a few gradient steps during inference, offering a practical and efficient robustness layer for vision-based AI services in education. Full article
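The goal-directed descent described here can be illustrated on a toy energy field. The quadratic energy and fixed step size below are stand-ins for the learned adversarial energy network, chosen only to make the few-step descent trajectory concrete:

```python
def energy(x, center):
    """Toy adversarial energy: squared distance from the benign region's center."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center))

def energy_grad(x, center):
    return [2.0 * (xi - ci) for xi, ci in zip(x, center)]

def purify(x, center, steps=5, lr=0.25):
    """Goal-directed purification: a few gradient steps down the energy field,
    pushing the input from a high-energy region toward the benign one."""
    for _ in range(steps):
        x = [xi - lr * gi for xi, gi in zip(x, energy_grad(x, center))]
    return x

adv = [1.0, -1.0]                       # perturbed input (high energy)
clean = purify(adv, center=[0.0, 0.0])  # descended toward the benign region
```

In AEFOP the energy is learned and shaped so that its gradient points back toward benign inputs; the "few gradient steps during inference" claim corresponds to the short loop in `purify`.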

21 pages, 13964 KB  
Article
Towards Generalizable Deepfake Detection via Facial Landmark-Guided Convolution and Local Structure Awareness
by Hao Chen, Zhengxu Zhang, Qin Li and Chunhui Feng
Algorithms 2026, 19(4), 270; https://doi.org/10.3390/a19040270 - 1 Apr 2026
Abstract
As deepfakes become increasingly realistic, there is a growing need for robust and highly accurate facial forgery detection algorithms. Existing studies show that global feature modeling approaches (Transformer, VMamba) are effective in capturing long-range dependencies, yet they often lack sufficient sensitivity to localized facial tampering artifacts. Meanwhile, traditional convolutional methods excel at extracting local image features but struggle to incorporate prior knowledge about facial anatomy, resulting in limited representational capability. To address these limitations, this paper proposes LGMamba, a novel detection framework that integrates facial guidance focusing on key facial components and fine-grained detail regions commonly manipulated in deepfakes with global modeling. First, we introduce an innovative Landmark-Guided Convolution (LGConv), which adaptively adjusts convolutional sampling positions using facial landmark information. This allows the model to attend to forgery-prone facial regions, such as the eyes and mouth. Second, we design a parallel Facial Structure Awareness Block (FSAB) to operate alongside the VMamba-based visual State-Space Model. Equipped with a multi-stage residual design and a CBAM attention mechanism, FSAB enhances the model’s sensitivity to subtle facial artifacts, enabling joint exploitation of global semantic consistency and fine-grained forgery cues within a unified architecture. The proposed LGMamba achieves superior performance compared to existing mainstream approaches. In cross-dataset evaluations, it attains AUC scores of 92.34% on CD1 and 96.01% on CD2, outperforming all compared methods. Full article
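The idea behind landmark-guided sampling, shifting a convolution's sampling grid toward forgery-prone landmarks, can be sketched in plain coordinates. This is a hypothetical simplification of LGConv: real deformable sampling would also bilinearly interpolate feature values at the shifted positions:

```python
def landmark_guided_positions(cx, cy, landmark, w, h, pull=0.5):
    """Shift a 3x3 sampling grid centered at (cx, cy) toward a facial landmark.

    pull in [0, 1] sets how strongly samples move toward the landmark;
    all positions are clamped to the w x h image bounds. Illustrative only:
    the actual offset prediction in LGConv is learned, not a fixed pull.
    """
    lx, ly = landmark
    dx, dy = pull * (lx - cx), pull * (ly - cy)
    offsets = [(ox, oy) for oy in (-1, 0, 1) for ox in (-1, 0, 1)]
    positions = []
    for ox, oy in offsets:
        x = min(max(cx + ox + dx, 0), w - 1)
        y = min(max(cy + oy + dy, 0), h - 1)
        positions.append((x, y))
    return positions
```

A landmark at the eye or mouth thus drags the whole sampling grid toward that region, which is how the model "attends to forgery-prone facial regions" in the abstract's terms.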

29 pages, 6909 KB  
Article
MDE-UNet: A Physically Guided Asymmetric Fusion Network for Multi-Source Meteorological Data Lightning Identification
by Yihua Chen, Yuanpeng Han, Yujian Zhang, Yi Liu, Lin Song, Jialei Wang, Xinjue Wang and Qilin Zhang
Remote Sens. 2026, 18(7), 1027; https://doi.org/10.3390/rs18071027 - 29 Mar 2026
Abstract
Utilizing multi-source meteorological data for lightning identification is crucial for monitoring severe convective weather. However, several key challenges persist in this field: dimensional imbalance and modal competition among multi-source heterogeneous data, model training bias caused by the extreme sparsity of lightning samples, and an imbalance between false alarms and missed detections resulting from complex background noise. To address these challenges, this paper proposes a lightning identification network guided by physical priors and constrained by supervision. First, to tackle the issue of modal competition in fusing satellite (high-dimensional) and radar (low-dimensional) data, a physical prior-guided asymmetric radar information enhancement mechanism is introduced. This mechanism uses radar physical features as contextual guidance to selectively enhance the latent weak radar signatures. Second, at the architectural level, a multi-source multi-scale feature fusion module and a weighted sliding window–multilayer perceptron (MLP) enhanced decoding unit are constructed. The former achieves the coupling of multi-scale physical features at a 2 km grid scale through cross-level semantic alignment, building a highly consistent feature field that effectively improves the model’s ability to detect lightning signals. The latter leverages adaptive receptive fields and the nonlinear modeling capability of MLPs to effectively smooth spatially discrete noise, ensuring spatial continuity in the reconstructed results. Finally, to address the model bias caused by severe class imbalance between positive and negative samples—resulting from the extreme sparsity of lightning events—an asymmetrically weighted BCE-DICE loss function is designed. Its “asymmetric” characteristic is implemented by assigning different penalty weights to false-positive and false-negative predictions. 
This loss function balances pixel-level accuracy and inter-class equilibrium while imposing high-weight penalties on false-positive predictions, achieving synergistic optimization of feature enhancement and directional suppression. Experimental results show that the proposed method effectively increases the hit rate while substantially reducing the false alarm rate, enabling efficient utilization of multi-source data and high-precision identification of lightning strike areas. Full article
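An asymmetrically weighted BCE-Dice loss of the kind described can be sketched as follows. The weights `w_fp` and `w_fn` and the mixing factor `alpha` are illustrative; the paper's exact weighting scheme may differ, but the abstract's key property (false positives penalized more heavily than false negatives) is preserved here with `w_fp > w_fn`:

```python
import math

def asymmetric_bce_dice(y_true, y_pred, w_fp=4.0, w_fn=1.0, alpha=0.5, eps=1e-7):
    """Asymmetrically weighted BCE + Dice loss for sparse positive pixels.

    w_fp > w_fn imposes a high-weight penalty on false alarms, mirroring
    the asymmetric design described in the abstract (values illustrative).
    """
    bce = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        # missed-detection term weighted by w_fn, false-alarm term by w_fp
        bce += -(w_fn * t * math.log(p) + w_fp * (1 - t) * math.log(1 - p))
    bce /= len(y_true)
    inter = sum(t * p for t, p in zip(y_true, y_pred))
    dice = 1 - (2 * inter + eps) / (sum(y_true) + sum(y_pred) + eps)
    return alpha * bce + (1 - alpha) * dice
```

The Dice term counters the extreme sparsity of lightning pixels (it is insensitive to the dominant negative class), while the asymmetric BCE term steers the false-alarm/missed-detection balance.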

37 pages, 6776 KB  
Article
Semantic Mapping and Cross-Model Data Integration in BIM: A Lightweight and Scalable Schedule-Level Workflow
by Tianjiao Zhao and Ri Na
Buildings 2026, 16(7), 1347; https://doi.org/10.3390/buildings16071347 - 28 Mar 2026
Abstract
Despite the widespread adoption of BIM, information exchange across disciplines remains hindered by heterogeneous structures at the tabular data level, particularly when integrating data across multiple discipline-specific models. Manual mapping, rigid templates, or one-off programming scripts are labor-intensive and difficult to scale, limiting automated querying, cross-model aggregation, and schedule-level analytics. This study proposes a lightweight, workflow-driven approach for semantic normalization and cross-model integration of BIM schedule data, with optional script-supported workflow configuration used only to assist the configuration of deterministic, rule-guided mapping logic, rather than serving as a core analytical method. By introducing a customizable subcategory layer, the workflow enables fine-grained semantic alignment and efficient normalization across diverse schedule datasets, implemented through lightweight Python scripting and rule-guided semantic matching used solely as a supporting mechanism for deterministic field mapping. Using structural, architectural, and HVAC models, we demonstrate a stepwise process including data cleaning, hierarchical classification, consistency checking, batch analytics, and automated computation of cross-model metrics such as opening-to-wall ratios. Sample-based validation confirms the workflow’s reliability, achieving semantic mapping agreement rates above 95% and reducing manual processing time by more than 85%. The workflow is readily extensible to other disciplines and modeling conventions, supporting high-throughput data integration for tasks such as design coordination, semantic alignment, RFI reduction, accelerated design reviews, and data-driven decision making. 
Overall, rather than introducing a new algorithm, the contribution of this work lies in formalizing a reusable, schedule-level workflow abstraction that enables consistent semantic alignment and automated cross-model aggregation without relying on rigid ontologies or training-intensive learning-based models. Any optional tooling used during workflow configuration is auxiliary and does not constitute a standalone learning-based method requiring model training or performance benchmarking. This provides a reusable methodological foundation for scalable, schedule-level BIM data integration and cross-model analytics. Full article
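The deterministic, rule-guided field mapping such a workflow relies on can be sketched with a small alias table. The headers, canonical names, and rule format below are invented examples for illustration, not the paper's schema:

```python
# Hypothetical rule table: raw schedule headers -> normalized field names.
RULES = [
    (("wall type", "walltype", "type of wall"), "wall_type"),
    (("area (m2)", "area m²", "surface area"), "area_m2"),
    (("level", "storey", "floor"), "level"),
]

def normalize_header(raw):
    """Deterministic, rule-guided mapping of a raw header to a canonical field."""
    key = raw.strip().lower()
    for aliases, canonical in RULES:
        if key in aliases:
            return canonical
    return None  # unmapped headers are flagged for manual review

def normalize_schedule(rows):
    """Rename mapped columns in a list-of-dicts schedule; keep unmapped as-is."""
    return [{normalize_header(k) or k: v for k, v in row.items()} for row in rows]
```

Because the mapping is a pure lookup, it stays auditable and reproducible across models, which is the point the abstract makes in contrast to training-intensive learned mappers.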

22 pages, 26802 KB  
Article
Attention-Guided Semantic Segmentation and Scan-to-Model Geometric Reconstruction of Underground Tunnels from Mobile Laser Scanning
by Yingjia Huang, Jiang Ye, Xiaohui Li and Jingliang Du
Appl. Sci. 2026, 16(6), 3042; https://doi.org/10.3390/app16063042 - 21 Mar 2026
Abstract
Mobile Laser Scanning (MLS) integrated with Simultaneous Localization and Mapping (SLAM) has emerged as a key technology for digitizing GNSS-denied environments, such as underground mines. However, the automated interpretation of unstructured, high-density point clouds into semantic engineering models remains challenging due to extreme geometric anisotropy in point distributions and severe class imbalance inherent to narrow tunnel environments. To address these issues, this study proposes a highly automated scan-to-model framework for precise semantic segmentation and vectorized two-dimensional (2D) profile reconstruction. First, an enhanced hierarchical deep learning network tailored for point clouds is introduced. The architecture incorporates a context-aware sampling strategy with an expanded receptive field of up to 10 m to preserve axial continuity, coupled with a spatial–geometric dual-attention mechanism to refine boundary delineation. In addition, a composite Focal–Dice loss function is employed to alleviate the dominance of wall points during network training. Experimental validation on a field-collected dataset comprising 16 mine tunnels demonstrates that the proposed model achieves a mean Intersection over Union (mIoU) of 85.15% (±0.29%) and an Overall Accuracy (OA) of 95.13% (±0.13%). Building on this semantic foundation, a robust geometric modeling pipeline is established using curvature-guided filtering and density-adaptive B-spline fitting. The reconstructed profiles accurately recover the geometric mean surface of the tunnel wall, yielding an overall filtered Root Mean Square Error (RMSE) of 4.96 ± 0.48 cm. The proposed framework provides an efficient end-to-end solution for deformation analysis and digital twinning of underground mining infrastructure. Full article
(This article belongs to the Special Issue Artificial Intelligence Applications in Underground Space Technology)
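A composite Focal-Dice loss of the kind used here to counter wall-point dominance can be sketched in its generic form; the focusing parameter `gamma` and mixing weight `alpha` are illustrative defaults, not the authors' exact values:

```python
import math

def focal_dice(y_true, y_pred, gamma=2.0, alpha=0.5, eps=1e-7):
    """Composite Focal-Dice loss: the focal term down-weights easy, abundant
    points (tunnel walls) so minority classes drive the gradient."""
    focal = 0.0
    for t, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)
        pt = p if t == 1 else 1 - p          # probability of the true class
        focal += -((1 - pt) ** gamma) * math.log(pt)
    focal /= len(y_true)
    inter = sum(t * p for t, p in zip(y_true, y_pred))
    dice = 1 - (2 * inter + eps) / (sum(y_true) + sum(y_pred) + eps)
    return alpha * focal + (1 - alpha) * dice
```

The `(1 - pt) ** gamma` factor is what "alleviates the dominance of wall points": confidently classified wall points contribute almost nothing, leaving the loss dominated by the rarer classes.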

25 pages, 29036 KB  
Article
Task-Oriented Unsupervised SAR Image Enhancement with Semantic Preservation for Robust Target Recognition
by Chengyu Wan, Siqian Zhang, Lingjun Zhao, Tao Tang and Gangyao Kuang
Remote Sens. 2026, 18(6), 930; https://doi.org/10.3390/rs18060930 - 19 Mar 2026
Abstract
Synthetic aperture radar (SAR) images often suffer from coupled degradations such as speckle noise, background clutter, and system disturbances, which distort target structure and reduce feature discriminability for target recognition. Most existing enhancement methods typically optimize perceptual quality and may produce visually appealing yet recognition-inconsistent results, especially when paired supervision is unavailable. To address this, an unsupervised SAR image quality enhancement framework is proposed in this study, formulating the degradation as a domain shift problem between low- and high-quality SAR data. A DualGAN-based architecture is adopted to learn bidirectional mappings with reconstruction regularization, enabling enhancement without paired samples. To explicitly preserve task-relevant features and enforce structural consistency, a segmentation-guided recognition-oriented constraint is introduced to embed task awareness into the enhancement process. Furthermore, to mitigate semantic drift during unpaired translation, a semantic preservation constraint based on contrastive learning is proposed to align the enhanced, original, and smoothed images, which can maintain semantic fidelity and reinforce structural cues. Experimental results demonstrate that the proposed framework effectively bridges the domain gap between low- and high-quality SAR images, producing semantically consistent enhancement and improving robustness in target recognition. Evaluations on the GMVT dataset show that the proposed method achieves an average recognition accuracy improvement of over 10% across six recognition networks and four imaging conditions. Full article
(This article belongs to the Special Issue SAR Images Processing and Analysis (3rd Edition))
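The contrastive semantic-preservation idea, pulling the enhanced image's embedding toward the original's while pushing it away from negatives, can be sketched InfoNCE-style. The embeddings and temperature below are illustrative; the paper's exact positive/negative construction (original and smoothed images) is only approximated:

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def semantic_preservation_loss(enhanced, original, negatives, tau=0.1):
    """InfoNCE-style contrastive loss: low when the enhanced embedding aligns
    with the original (semantic fidelity), high when it drifts toward
    negatives (semantic drift)."""
    pos = math.exp(cosine(enhanced, original) / tau)
    neg = sum(math.exp(cosine(enhanced, n) / tau) for n in negatives)
    return -math.log(pos / (pos + neg))
```

Minimizing this loss during unpaired translation is what keeps the DualGAN mapping from producing "visually appealing yet recognition-inconsistent" outputs.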

26 pages, 4173 KB  
Article
Physics-Guided Variational Causal Intervention Network for Few-Shot Radar Jamming Recognition
by Dong Xia, Liming Lv, Youjian Zhang, Yanxi Lu, Fang Li, Lin Liu, Xiang Liu, Yajun Zeng and Zhan Ge
Sensors 2026, 26(6), 1900; https://doi.org/10.3390/s26061900 - 18 Mar 2026
Abstract
Rapid and accurate recognition of radar active jamming is a prerequisite for cognitive electronic countermeasures. However, under complex electromagnetic environments with scarce training samples, existing deep learning models are prone to capturing spurious correlations induced by environmental confounders, resulting in notable performance degradation. To address this causal confounding issue, we propose a physics-guided variational causal intervention network (PG-VCIN). First, we reconstruct a structured causal model of jamming signal generation, decoupling observations into robust physical statistical features and sensitive time–frequency image representations. Physical priors are then leveraged to perform dynamic precision-weighted modulation of visual feature extraction, enforcing physical consistency at the representation learning stage. Second, we formulate deconfounding within an active inference framework and introduce a variational information bottleneck to optimize mutual information, thereby filtering out high-complexity redundant information attributable to confounders while preserving the essential causal semantics. Finally, we numerically approximate the causal effect by imposing dual intervention constraints in the latent space, including intra-class invariance and confounder invariance. Experiments on a semi-physical simulation dataset demonstrate that the proposed method achieves substantially higher recognition accuracy than several representative few-shot baselines in extremely low-sample regimes, validating the effectiveness of integrating physical mechanisms with causal inference. Full article
(This article belongs to the Section Radar Sensors)
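The variational information bottleneck penalty mentioned here is, in its standard form, the KL divergence between the latent posterior and a standard normal prior. A minimal sketch over diagonal-Gaussian parameters (the generic VIB term, not necessarily the paper's exact objective):

```python
import math

def vib_kl(mu, log_var):
    """KL( N(mu, sigma^2) || N(0, I) ) summed over latent dimensions: the
    information-bottleneck penalty that squeezes out high-complexity
    redundant (confounder-driven) information from the latent code."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv
                     for m, lv in zip(mu, log_var))
```

Adding this term to the task loss trades off predictive mutual information against latent complexity, which is the "filtering out high-complexity redundant information" step the abstract describes.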

43 pages, 2166 KB  
Article
Research on Root Cause Analysis Method for Certain Civil Aircraft Based on Ensemble Learning and Large Language Model Reasoning
by Wenyou Du, Jingtao Du, Haoran Zhang and Dongsheng Yang
Machines 2026, 14(3), 322; https://doi.org/10.3390/machines14030322 - 12 Mar 2026
Abstract
To address the challenges commonly encountered in civil aircraft operating under multi-mode, strongly coupled closed-loop control—namely scarce fault samples, pronounced distribution shift, and root-cause explanations that are easily confounded by covariates—this paper proposes a root-cause analysis method that integrates ensemble learning with constraint-guided reasoning by large language models (LLMs). First, for Full Authority Digital Engine Control (FADEC) monitoring sequences, a feature system comprising environment-normalized ratios, mechanism-informed mixing indices, and multi-scale temporal statistics is constructed, thereby improving cross-mode comparability and enhancing engineering-semantic expressiveness. Second, in the anomaly detection stage, a cost-sensitive LightGBM model is adopted and a validation-set-based adaptive thresholding strategy is introduced to achieve robust identification under highly imbalanced fault conditions. Furthermore, for Root Cause Analysis (RCA), a “computation–reasoning decoupling” framework is developed: Shapley Additive exPlanations (SHAP) are used to generate segment-level contribution evidence, while causal chains, engineering prohibitions, and structured output templates are injected into prompts to constrain the LLM, enabling it to infer root-cause candidates and produce structured explanations under mechanism-consistency constraints. Experiments on real flight data demonstrate that our method yields an anomaly detection F1-score of 0.9577 and improves overall RCA accuracy to 97.1% (versus 62.3% for a pure SHAP baseline). Practically, by translating complex high-dimensional data into actionable natural language diagnostic reports, the proposed method provides reliable and interpretable decision support for rapid RCA. Full article
(This article belongs to the Section Automation and Control Systems)
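The "computation-reasoning decoupling" step, injecting SHAP evidence, causal chains, and engineering prohibitions into a structured prompt, might be assembled roughly as below. The field names, rule strings, and JSON schema are hypothetical; the paper's actual templates are not reproduced here:

```python
def build_rca_prompt(shap_evidence, causal_chains, prohibitions):
    """Assemble a constraint-injected root-cause-analysis prompt.

    shap_evidence: list of (feature, contribution) pairs from the detector;
    causal_chains / prohibitions: domain rules bounding the LLM's reasoning.
    All names here are illustrative, not the paper's schema.
    """
    evidence = "\n".join(f"- {feat}: SHAP contribution {c:+.3f}"
                         for feat, c in shap_evidence)
    rules = "\n".join(f"- {r}" for r in causal_chains + prohibitions)
    return ("Segment-level evidence:\n" + evidence + "\n"
            "Constraints (must not be violated):\n" + rules + "\n"
            'Return JSON: {"root_cause": str, "confidence": float}')
```

The structured output template is what lets downstream tooling parse the LLM's answer deterministically instead of scraping free text.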

23 pages, 1299 KB  
Article
Target-Guided Asymmetric Path Modeling in Equipment Maintenance Knowledge Graphs
by Meng Chen and Yuming Bo
Symmetry 2026, 18(3), 439; https://doi.org/10.3390/sym18030439 - 3 Mar 2026
Abstract
Knowledge graph completion via link prediction is critical for intelligent equipment maintenance systems to support accurate fault diagnosis and maintenance decision making. However, existing approaches struggle to simultaneously capture local structural dependencies and perform effective multi-hop reasoning due to limited receptive fields or inefficient path exploration mechanisms. Traditional path-based methods implicitly assume path symmetry, treating all reasoning chains equally without considering their task-specific relevance. To address this issue, we propose a Graph Attention Network (GAT)-guided semantic path reasoning framework that breaks this symmetry through attention-driven asymmetric weighting, integrating local structural encoding with global multi-hop inference. The key innovation lies in a target-guided biased path sampling strategy, which transforms GAT attention weights into probabilistic transition biases, enabling adaptive exploration of high-quality semantic paths relevant to specific prediction targets. GATs learn importance-aware local representations, which guide biased random walks to efficiently sample task-relevant reasoning paths. The sampled paths are encoded and aggregated to form global semantic context representations, which are then fused with local embeddings through a gating mechanism for final link prediction. Experimental evaluations on FB15k-237, WN18RR, and a real-world equipment maintenance knowledge graph demonstrate that the proposed method consistently outperforms state-of-the-art baselines, achieving an MRR of 0.614 on the maintenance dataset and 0.485 on WN18RR. Further analysis shows that the learned path attention weights provide interpretable asymmetric reasoning evidence, enhancing transparency for safety-critical maintenance applications. Full article
(This article belongs to the Special Issue Symmetry and Asymmetry Study in Graph Theory)
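Converting GAT attention weights into probabilistic transition biases for target-guided walks could look like the following sketch. The power-sharpening form and the `bias` value are assumptions made for illustration, not the authors' exact mapping:

```python
def transition_probs(attention, bias=2.0):
    """Map a node's GAT attention weights over its neighbors to biased
    random-walk transition probabilities.

    bias > 1 sharpens the walk toward high-attention (task-relevant)
    neighbors; with normalized attention, bias = 1 leaves it unchanged.
    """
    raised = {nbr: w ** bias for nbr, w in attention.items()}
    total = sum(raised.values())
    return {nbr: w / total for nbr, w in raised.items()}
```

Sampling walk steps from these probabilities instead of uniformly is what makes the path exploration asymmetric: chains through high-attention neighbors are visited far more often than structurally equivalent but low-relevance ones.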

32 pages, 2876 KB  
Article
CCNETS: A Modular Causal Learning Framework for Pattern Recognition in Imbalanced Datasets
by Hanbeot Park, Yunjeong Cho and Hunhee Kim
Appl. Sci. 2026, 16(4), 1998; https://doi.org/10.3390/app16041998 - 17 Feb 2026
Abstract
Handling class imbalance remains a central challenge in machine learning, particularly in pattern recognition tasks where identifying rare but critical anomalies is of paramount importance. Traditional generative models often decouple data synthesis from classification, leading to a distribution mismatch that limits their practical benefit. To address these shortcomings, we introduce Causal Cooperative Networks (CCNETS), a modular framework that establishes a functional causal link between generation, inference, and reconstruction. CCNETS is composed of three specialized cooperative modules: an Explainer for latent feature abstraction, a Reasoner for probabilistic label prediction, and a Producer for context-aware data synthesis. These components interact through a dynamic causal feedback loop, where classification outcomes directly guide targeted sample synthesis to adaptively reinforce vulnerable decision boundaries. A key innovation, our proposed Zoint mechanism, enables the adaptive fusion of latent and observable features, enhancing semantic richness and decision-making robustness under uncertainty. We evaluated CCNETS on two distinct real-world datasets: the Credit Card Fraud Detection dataset, characterized by extreme imbalance (fraud rate < 0.2%), and the AI4I 2020 Predictive Maintenance dataset (failure rate < 4%). Across comprehensive experimental setups, CCNETS consistently outperformed baseline methods, achieving superior F1-scores and AUPRC. Furthermore, data synthesized by CCNETS demonstrated enhanced generalization and learning stability under limited data conditions. These results establish CCNETS as a scalable, interpretable, and hybrid soft computing framework that effectively aligns synthetic data with classifier objectives, advancing robust imbalanced learning. Full article
(This article belongs to the Special Issue Machine Learning and Its Application for Anomaly Detection)

20 pages, 2405 KB  
Article
Confidence-Guided Adaptive Diffusion Network for Medical Image Classification
by Yang Yan, Zhuo Xie and Wenbo Huang
J. Imaging 2026, 12(2), 80; https://doi.org/10.3390/jimaging12020080 - 14 Feb 2026
Abstract
Medical image classification is a fundamental task in medical image analysis and underpins a wide range of clinical applications, including dermatological screening, retinal disease assessment, and malignant tissue detection. In recent years, diffusion models have demonstrated promising potential for medical image classification owing to their strong representation learning capability. However, existing diffusion-based classification methods often rely on oversimplified prior modeling strategies, which fail to adequately capture the intrinsic multi-scale semantic information and contextual dependencies inherent in medical images. As a result, the discriminative power and stability of feature representations are constrained in complex scenarios. In addition, fixed noise injection strategies neglect variations in sample-level prediction confidence, leading to uniform perturbations being imposed on samples with different levels of semantic reliability during the diffusion process, which in turn limits the model’s discriminative performance and generalization ability. To address these challenges, this paper proposes a Confidence-Guided Adaptive Diffusion Network (CGAD-Net) for medical image classification. Specifically, a hybrid prior modeling framework is introduced, consisting of a Hierarchical Pyramid Context Modeling (HPCM) module and an Intra-Scale Dilated Convolution Refinement (IDCR) module. These two components jointly enable the diffusion-based feature modeling process to effectively capture fine-grained structural details and global contextual semantic information. Furthermore, a Confidence-Guided Adaptive Noise Injection (CG-ANI) strategy is designed to dynamically regulate noise intensity during the diffusion process according to sample-level prediction confidence. 
Without altering the underlying discriminative objective, CG-ANI stabilizes model training and enhances robust representation learning for semantically ambiguous samples. Experimental results on multiple public medical image classification benchmarks, including HAM10000, APTOS2019, and Chaoyang, demonstrate that CGAD-Net achieves competitive performance in terms of classification accuracy, robustness, and training stability. These results validate the effectiveness and application potential of confidence-guided diffusion modeling for two-dimensional medical image classification tasks, and provide valuable insights for further research on diffusion models in the field of medical image analysis. Full article
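The confidence-guided noise idea described above can be sketched in a few lines: noise intensity for each sample is scaled inversely with its prediction confidence, so confident samples are perturbed weakly and ambiguous ones more strongly. This is a minimal NumPy sketch assuming a linear schedule between `sigma_min` and `sigma_max` driven by the max softmax probability; the function name and schedule are illustrative, not the paper's exact CG-ANI formulation.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def confidence_guided_noise(features, logits, sigma_min=0.05, sigma_max=0.5, seed=None):
    """Inject per-sample Gaussian noise scaled inversely with confidence.

    High-confidence samples receive weak perturbation; semantically
    ambiguous samples receive stronger noise. Illustrative schedule only.
    """
    rng = np.random.default_rng(seed)
    conf = softmax(logits).max(axis=-1)                          # (B,) max class probability
    sigma = sigma_min + (sigma_max - sigma_min) * (1.0 - conf)   # (B,) per-sample noise scale
    eps = rng.standard_normal(features.shape)
    return features + sigma[:, None] * eps, sigma
```

A confident sample (peaked logits) thus gets a smaller noise scale than an ambiguous one (flat logits), which is the behavior CG-ANI relies on.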
(This article belongs to the Section Medical Imaging)
24 pages, 2623 KB  
Article
CD-Mosaic: A Context-Aware and Domain-Consistent Data Augmentation Method for PCB Micro-Defect Detection
by Sifan Lai, Shuangchao Ge, Xiaoting Guo, Jie Li and Kaiqiang Feng
Electronics 2026, 15(4), 767; https://doi.org/10.3390/electronics15040767 - 11 Feb 2026
Abstract
Detecting minute defects, such as spurs on the surface of a Printed Circuit Board (PCB), is extremely challenging due to their small size (average size < 20 pixels), sparse features, and high dependence on circuit topology context. The original Mosaic data augmentation method faces significant challenges with semantic adaptability when dealing with such tasks. Its unrestricted random cropping mechanism easily disrupts the topological structure of minute defects attached to the circuits, leading to the loss of key features. Moreover, a splicing strategy without domain constraints struggles to simulate real texture interference in industrial settings, making it difficult for the model to adapt to the complex and variable industrial inspection environment. To address these issues, this paper proposes a Context-aware and Domain-consistent Mosaic (CD-Mosaic) augmentation algorithm. This algorithm abandons pure randomness and constructs an adaptive augmentation framework that synergizes feature fidelity, geometric generalization, and texture perturbation. Geometrically, an intelligent sampling and dynamic integrity verification mechanism, driven by “utilization-centrality”, is designed to establish a controlled sample quality distribution. This prioritizes the preservation of the topological semantics of dominant samples to guide feature convergence. Meanwhile, an appropriate number of edge-truncated samples are strategically retained as geometric hard examples to enhance the model’s robustness against local occlusion. For texture, a dual-granularity visual perturbation strategy is proposed. Using a homologous texture library, a hard mask is generated in the background area to simulate foreign object interference, and a local transparency soft mask is applied in the defect area to simulate low signal-to-noise ratio imaging. This strategy synthesizes visual hard examples while maintaining photometric consistency. 
Experiments on an industrial-grade PCB dataset containing 2331 images demonstrate that the YOLOv11m model equipped with CD-Mosaic achieves a significant performance improvement. Compared with the native Mosaic baseline, the core metrics mAP@0.5 and Recall reach 0.923 and 86.1%, respectively, with net increases of 8.3% and 8.8%; mAP@0.5:0.95 and APsmall, which characterize high-precision localization and small target detection capabilities, are improved to 0.529 (+3.0%) and 0.534 (+3.3%), respectively; the comprehensive metric F1-score rises to 0.903 (+6.2%). The experiments prove that this method effectively solves the problem of missed detections of industrial minute defects by balancing sample quality and detection difficulty. Moreover, the inference speed of 84.9 FPS fully meets the requirements of industrial real-time detection. Full article
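The dynamic integrity verification described in the abstract can be illustrated with a small sketch: defect boxes that survive a mosaic crop almost intact are kept as "dominant" samples, partially truncated ones are retained as geometric hard examples, and heavily cut ones are discarded. The thresholds and function names here are hypothetical; the paper's utilization-centrality criterion is more elaborate.

```python
def box_visibility(box, crop):
    """Fraction of a defect box (x1, y1, x2, y2) that survives a crop window."""
    x1 = max(box[0], crop[0]); y1 = max(box[1], crop[1])
    x2 = min(box[2], crop[2]); y2 = min(box[3], crop[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = (box[2] - box[0]) * (box[3] - box[1])
    return inter / area if area > 0 else 0.0

def integrity_filter(boxes, crop, keep_thresh=0.9, hard_thresh=0.3):
    """Split boxes into preserved 'dominant' samples and edge-truncated
    geometric hard examples; discard boxes cut below hard_thresh.
    Thresholds are illustrative, not taken from the paper.
    """
    dominant, hard = [], []
    for b in boxes:
        v = box_visibility(b, crop)
        if v >= keep_thresh:
            dominant.append(b)
        elif v >= hard_thresh:
            hard.append(b)
    return dominant, hard
```

In a CD-Mosaic-style pipeline, a candidate crop whose `dominant` list is empty would be resampled, while a controlled share of `hard` boxes is kept to build robustness against local occlusion.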
15 pages, 3040 KB  
Article
CGA-ViT: Channel-Guided Additive Attention for Efficient Vision Recognition
by Yayue Zhao, Jingli Miao, Zhenping Li, Baiyang Li, Anqi Zhuo and Yingxiao Zhao
Appl. Sci. 2026, 16(4), 1740; https://doi.org/10.3390/app16041740 - 10 Feb 2026
Abstract
Vision transformers (ViTs) excel at global context modeling with self-attention. However, standard self-attention leads to quadratic computational complexity, which restricts its practical use in high-resolution or latency-sensitive tasks. Existing methods achieve linear complexity via local window constraints or additive approximations. However, they often compromise long-range dependency modeling. To address this issue, we propose the channel-guided additive attention vision transformer (CGA-ViT), which achieves synergistic optimization of multi-scale feature extraction and efficient global context modeling. First, we propose multi-scale dilated feature embedding (MDFE). By designing multi-scale sampling and spatial feature embedding, we can expand the receptive field and capture fine-grained features simply by adjusting the dilation rate in the early stages; second, we design channel-guided additive attention (CGA), dynamically modulating key vectors using query-derived descriptors, enabling long-range semantic interactions while maintaining linear complexity growth. We adopt a hierarchical structure: in the shallow layers, we use CGA to carry out local-global interactions, and in the deep layers, we use efficient additive attention for global integration. Evaluations on ImageNet-1K show that CGA-ViT achieves 84.0% Top-1 accuracy with 4.7 GFLOPs, outperforming Swin-T (81.3%) and ConvNeXt-T (82.1%) by 2.7 and 1.9 percentage points under comparable computational costs. Ablation experiments verify the contributions of MDFE and CGA, which together account for 65.0% of the performance gains, with the remainder coming from token-level supervision. Overall, CGA-ViT effectively balances the intrinsic tradeoff between efficiency and global modeling capability, significantly boosts visual recognition performance without extra computational overhead, and provides an efficient solution for lightweight ViT design. Full article
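The linear-complexity pattern that CGA builds on can be sketched in NumPy: a query-derived global descriptor modulates every key elementwise, so no N×N score matrix is ever formed and cost stays O(N·d). The exact channel-guidance mechanism of CGA is simplified away here, and `w_a` is an assumed learned projection vector, not a detail from the paper.

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def additive_attention(Q, K, w_a):
    """O(N*d) additive attention sketch.

    A single global query descriptor, pooled from per-token scores,
    modulates every key elementwise -- avoiding the quadratic N x N
    attention matrix of standard self-attention.
    """
    d = Q.shape[-1]
    alpha = softmax(Q @ w_a / np.sqrt(d), axis=0)   # (N,) token importance weights
    g = (alpha[:, None] * Q).sum(axis=0)            # (d,) global query descriptor
    return g[None, :] * K                           # (N, d) keys modulated channelwise
```

The design choice is the usual one in additive-attention variants: token interactions are mediated through a single pooled vector rather than all pairwise dot products, trading some expressiveness for linear scaling in sequence length.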
(This article belongs to the Section Computing and Artificial Intelligence)
22 pages, 1012 KB  
Article
DeltaVLM: Interactive Remote Sensing Image Change Analysis via Instruction-Guided Difference Perception
by Pei Deng, Wenqian Zhou and Hanlin Wu
Remote Sens. 2026, 18(4), 541; https://doi.org/10.3390/rs18040541 - 8 Feb 2026
Abstract
The accurate interpretation of land cover changes in multi-temporal satellite imagery is critical for Earth observation. However, existing methods typically yield static outputs—such as binary masks or fixed captions—lacking interactivity and user guidance. To address this limitation, we introduce remote sensing image change analysis (RSICA), a novel paradigm that enables the instruction-guided, multi-turn exploration of temporal differences in bi-temporal images through visual question answering. To realize RSICA, we propose DeltaVLM, a vision language model specifically designed for interactive change understanding. DeltaVLM comprises three key components: (1) a fine-tuned bi-temporal vision encoder that independently extracts semantic features from each image in the input pair; (2) a visual difference perception module with a cross-semantic relation measuring (CSRM) mechanism to interpret changes; and (3) an instruction-guided Q-former that selects query-relevant change features and aligns them with a frozen large language model to generate context-aware responses. We also present ChangeChat-105k, a large-scale instruction-following dataset containing over 105k diverse samples. Extensive experiments show that DeltaVLM achieves state-of-the-art performance in both single-turn captioning and multi-turn interactive change analysis, surpassing both general multimodal models and specialized remote sensing vision language models. Full article
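A toy version of difference perception between bi-temporal feature maps illustrates the general idea: per-location cosine similarity measures semantic agreement between the two time steps, alongside a signed difference map. This is illustrative only; CSRM in DeltaVLM is a learned module, and the function name and 0.5 threshold here are assumptions.

```python
import numpy as np

def change_relation(f_t1, f_t2, thresh=0.5, eps=1e-8):
    """Toy bi-temporal difference perception.

    f_t1, f_t2: (H, W, C) feature maps from the two acquisition dates.
    Returns per-location cosine similarity, a signed difference map,
    and a crude changed-region proposal where similarity drops.
    """
    num = (f_t1 * f_t2).sum(axis=-1)
    den = np.linalg.norm(f_t1, axis=-1) * np.linalg.norm(f_t2, axis=-1) + eps
    sim = num / den                 # (H, W) semantic agreement in [-1, 1]
    diff = f_t2 - f_t1              # (H, W, C) signed change features
    return sim, diff, sim < thresh  # boolean changed-region mask
```

In an instruction-guided setup like RSICA, such change features would then be filtered by the user's query before being passed to the language model, rather than emitted as a static mask.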
(This article belongs to the Section Remote Sensing Image Processing)