Search Results (40)

Search Parameters:
Keywords = cross-modality re-identification

17 pages, 4081 KB  
Article
Neural Network-Based Prediction of Residual Paravalvular Leak in Bicuspid Aortic Valve TAVI Using CT-Derived Anatomical Features
by Yijun Yao, Weili Jiang, Xinyue Yang, Jianyong Wang, Ruisi Tang, Yuan Feng, Yiming Li and Mao Chen
Biomedicines 2026, 14(4), 946; https://doi.org/10.3390/biomedicines14040946 - 21 Apr 2026
Viewed by 168
Abstract
Background/Objectives: Transcatheter aortic valve implantation (TAVI) in patients with bicuspid aortic valve (BAV) remains associated with higher rates of residual paravalvular leak (PVL), which confers a two-fold increase in mortality. Despite procedural optimization including balloon post-dilatation, a subset of patients exhibit residual ≥ moderate PVL. Pre-procedural identification of these patients could guide procedural planning. Methods: We retrospectively analyzed 402 BAV patients who underwent TAVI with self-expanding valves and balloon post-dilatation between January 2016 and June 2024. A multi-modal deep learning model (Model B) was developed, integrating a 3D ResNet encoder for computed tomography (CT) imaging features with a multilayer perceptron (MLP) for clinical variables, fused via a cross-attention mechanism. Its performance was compared against a conventional model (Model A) combining clinical variables with manually derived CT measurements. Both models were evaluated on identical test folds using 5-fold stratified cross-validation. Results: Of 402 patients, 36 (9.0%) had residual ≥ moderate PVL, associated with significantly larger aortic root dimensions at all anatomical levels and greater aortic valve calcification volume (median 887.6 vs. 559.2 mm³; p = 0.004). Model A achieved a mean AUC of 0.694 (95% CI: 0.596–0.792). Model B achieved a mean AUC of 0.822 (95% CI: 0.680–0.964), with a specificity of 0.971, accuracy of 0.881, and PPV of 0.860, while sensitivity was 0.429, reflecting the limited number of outcome events in this cohort. Conclusions: A multi-modal deep learning model integrating expert-segmented CT imaging with clinical variables demonstrated significantly improved discrimination over the conventional approach in this internal cohort for predicting residual PVL in BAV-TAVI, supporting the integration of segmentation-guided deep learning into pre-procedural TAVI planning. However, given the modest number of outcome events, external validation is required to confirm the generalizability of these findings. Full article
(This article belongs to the Section Molecular and Translational Medicine)
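As a rough illustration of the fusion described above, a cross-attention block in which clinical features query flattened CT-encoder tokens could be sketched as follows; the module name, dimensions, and single-query design are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of multi-modal fusion via cross-attention (not the authors' code).
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Fuse CT-image tokens (from a 3D CNN encoder) with clinical features.

    The clinical embedding acts as the query; flattened CT feature-map
    positions act as keys/values, so the clinical context selects
    relevant anatomical regions.
    """
    def __init__(self, ct_dim=512, clin_dim=32, embed_dim=256, n_heads=4):
        super().__init__()
        self.ct_proj = nn.Linear(ct_dim, embed_dim)      # project CT tokens
        self.clin_mlp = nn.Sequential(                   # MLP branch for tabular data
            nn.Linear(clin_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, embed_dim)
        )
        self.attn = nn.MultiheadAttention(embed_dim, n_heads, batch_first=True)
        self.head = nn.Linear(embed_dim, 1)              # binary PVL-risk logit

    def forward(self, ct_tokens, clin):
        # ct_tokens: (B, N, ct_dim) flattened 3D feature-map positions
        # clin:      (B, clin_dim) clinical variables
        q = self.clin_mlp(clin).unsqueeze(1)             # (B, 1, E) query
        kv = self.ct_proj(ct_tokens)                     # (B, N, E) keys/values
        fused, _ = self.attn(q, kv, kv)                  # clinical query attends to CT
        return self.head(fused.squeeze(1))               # (B, 1) logit

model = CrossAttentionFusion()
logit = model(torch.randn(2, 64, 512), torch.randn(2, 32))
```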

20 pages, 4497 KB  
Article
Remote Sensing Identification of Benggang Using a Two-Stream Network with Multimodal Feature Enhancement and Sparse Attention
by Xuli Rao, Qihao Chen, Kexin Zhu, Zhide Chen, Jinshi Lin and Yanhe Huang
Electronics 2026, 15(6), 1331; https://doi.org/10.3390/electronics15061331 - 23 Mar 2026
Viewed by 249
Abstract
Benggang, a severely eroded landform and geohazard typical of the red-soil hilly regions of southern China, is characterized by a fragmented texture, irregular boundaries, and high similarity to background objects such as bare soil and roads, which poses a dual challenge of “multiscale variability + strong noise” for automated identification at regional scales. To address insufficient information from a single modality and the limited representation of cross-scale features, this study proposes a dual-stream feature-fusion network (DF-Net) for multisource data consisting of a digital orthophoto map (DOM) and a digital elevation model (DEM). The method adopts ResNeSt50d as the backbone of the two branches: on the DOM side, a Canny-edge channel is stacked to enhance high-frequency boundary information; on the DEM side, derived terrain factors, including slope, aspect, curvature, and hillshade, are introduced to provide morphological constraints. In the cross-modal fusion stage, a multiscale sparse attention fusion module is designed, which acquires contextual information via multiwindow average pooling and suppresses noise interference through top-K sparsification. In the decision stage, a multibranch ensemble is employed to improve classification stability. Taking Anxi County, Fujian Province, as the study area, a coregistered dataset of GF-2 (1 m) DOM and ALOS (12.5 m) DEMs is constructed, and a zonal partitioning strategy is adopted to evaluate the model’s generalization ability. The experimental results show that DF-Net achieves 97.44% accuracy, 85.71% recall, and an 82.98% F1 score in the independent test zone, outperforming multiple mainstream CNN/transformer classification models. This study indicates that the strategy of “multimodal feature enhancement + sparse attention fusion” tailored to Benggang erosional landforms can significantly improve recognition performance under complex backgrounds, providing technical support for rapid Benggang surveys and governance-effectiveness assessments. Full article
(This article belongs to the Section Artificial Intelligence)
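The top-K sparsification described above is a standard recipe; below is a minimal sketch assuming dot-product attention, with multiwindow average pooling supplying multi-scale context tokens (window sizes and K are illustrative, not DF-Net's exact module).

```python
# Illustrative top-K sparsified attention (an assumption; not DF-Net's exact module).
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, topk=8):
    """Dot-product attention where each query only attends to its top-K
    highest-scoring keys; remaining scores are masked to -inf, suppressing
    low-relevance (noisy-background) positions after softmax."""
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5   # (B, Nq, Nk)
    kth = scores.topk(topk, dim=-1).values[..., -1:]        # K-th largest per query
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Multiwindow average pooling could supply keys/values at several context scales:
x = torch.randn(2, 256, 64)                                 # (B, HW, C) feature tokens
ctx = torch.cat([F.avg_pool1d(x.transpose(1, 2), w, w).transpose(1, 2)
                 for w in (2, 4, 8)], dim=1)                # pooled multi-scale tokens
out = topk_sparse_attention(x, ctx, ctx, topk=8)
```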

28 pages, 5658 KB  
Article
A Multimodule Collaborative Framework for Unsupervised Visible–Infrared Person Re-Identification with Channel Enhancement Modality
by Baoshan Sun, Yi Du and Liqing Gao
Sensors 2026, 26(6), 1770; https://doi.org/10.3390/s26061770 - 11 Mar 2026
Viewed by 433
Abstract
Unsupervised visible–infrared person re-identification (USL-VI-ReID) plays a pivotal role in cross-modal computer vision applications for intelligent surveillance and public safety. However, the task remains hampered by large modality gaps and limited granularity in feature representations. In particular, channel augmentation (CA) is typically used only for data augmentation, and its potential as an independent input modality remains unexplored. To address these shortcomings, we present a multimodule collaborative USL-VI-ReID framework that explicitly treats CA as a separate input modality. The framework combines four complementary modules. The Person-ReID Adaptive Convolutional Block Attention Module (PA-CBAM) extracts discriminative features using a two-level attention mechanism that refines salient spatial and channel cues. The Varied Regional Alignment (VRA) module performs cross-modal regional alignment and leverages Multimodal Assisted Adversarial Learning (MAAL) to reinforce region-level correspondence. The Varied Regional Neighbor Learning (VRNL) module implements reliable neighborhood learning via multi-region association to stabilize pseudo-labels and capture local structure. Finally, the Uniform Merging (UM) module merges split clusters through alternating contrastive learning to improve cluster consistency. We evaluate the proposed method on SYSU-MM01 and RegDB. On RegDB’s visible-to-infrared setting, the approach achieves Rank-1 = 93.34%, mean Average Precision (mAP) = 87.55%, and mean Inverse Negative Penalty (mINP) = 76.08%. These results indicate that our method effectively reduces modal discrepancies and increases feature discriminability. It outperforms most existing unsupervised baselines and several supervised approaches, thereby advancing the practical applicability of USL-VI-ReID. Full article
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems—2nd Edition)
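Treating channel augmentation as an input modality presumably starts from the usual CA transform; here is a minimal sketch of random channel selection (the paper's exact CA variant is not specified, so this is an assumed common form).

```python
# Minimal sketch of channel augmentation (CA) used as an extra input modality
# (an assumption about the general CA recipe common in VI-ReID work).
import torch

def channel_augment(rgb: torch.Tensor) -> torch.Tensor:
    """Replace all three channels of an RGB batch with one randomly chosen
    channel, yielding a color-free image that sits 'between' the visible
    and infrared modalities. rgb: (B, 3, H, W)."""
    idx = torch.randint(0, 3, (rgb.size(0),))
    chosen = rgb[torch.arange(rgb.size(0)), idx]      # (B, H, W) selected channel
    return chosen.unsqueeze(1).expand(-1, 3, -1, -1)  # back to (B, 3, H, W)

vis = torch.rand(4, 3, 288, 144)
ca = channel_augment(vis)   # treated as a third input stream alongside VIS and IR
```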

20 pages, 1379 KB  
Article
Hybrid Vision Transformer–CNN Framework for Alzheimer’s Disease Cell Type Classification: A Comparative Study with Vision–Language Models
by Md Easin Hasan, Md Tahmid Hasan Fuad, Omar Sharif and Amy Wagler
J. Imaging 2026, 12(3), 98; https://doi.org/10.3390/jimaging12030098 - 25 Feb 2026
Viewed by 874
Abstract
Accurate identification of Alzheimer’s disease (AD)-related cellular characteristics from microscopy images is essential for understanding neurodegenerative mechanisms at the cellular level. While most computational approaches focus on macroscopic neuroimaging modalities, cell type classification from microscopy remains relatively underexplored. In this study, we propose a hybrid vision transformer–convolutional neural network (ViT–CNN) framework that integrates DeiT-Small and EfficientNet-B7 to classify three AD-related cell types—astrocytes, cortical neurons, and SH-SY5Y neuroblastoma cells—from phase-contrast microscopy images. We perform a comparative evaluation against conventional CNN architectures (DenseNet, ResNet, InceptionNet, and MobileNet) and prompt-based multimodal vision–language models (GPT-5, GPT-4o, and Gemini 2.5-Flash) using zero-shot, few-shot, and chain-of-thought prompting. Experiments conducted with stratified fivefold cross-validation show that the proposed hybrid model achieves a test accuracy of 61.03% and a macro F1 score of 61.85, outperforming standalone CNN baselines and prompt-only LLM approaches under data-limited conditions. These results suggest that combining convolutional inductive biases with transformer-based global context modeling can improve generalization for cellular microscopy classification. While constrained by dataset size and scope, this work serves as a proof of concept and highlights promising directions for future research in domain-specific pretraining, multimodal data integration, and explainable AI for AD-related cellular analysis. Full article
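A hybrid of DeiT-Small and EfficientNet-B7 could be assembled by concatenating the two branch embeddings before a shared classifier; the sketch below uses timm model names and simple concatenation as assumptions, since the abstract does not give the authors' fusion details.

```python
# One plausible ViT-CNN hybrid (a sketch, not the authors' released model).
import timm
import torch
import torch.nn as nn

class HybridViTCNN(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        # num_classes=0 makes timm return pooled embeddings instead of logits
        self.vit = timm.create_model("deit_small_patch16_224", pretrained=False, num_classes=0)
        self.cnn = timm.create_model("efficientnet_b7", pretrained=False, num_classes=0)
        feat_dim = self.vit.num_features + self.cnn.num_features
        self.classifier = nn.Linear(feat_dim, n_classes)  # astrocyte / neuron / SH-SY5Y

    def forward(self, x):
        z = torch.cat([self.vit(x), self.cnn(x)], dim=-1)  # concatenate branch embeddings
        return self.classifier(z)

logits = HybridViTCNN()(torch.randn(2, 3, 224, 224))
```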

23 pages, 15029 KB  
Article
LPDiag: LLM-Enhanced Multimodal Prototype Learning Framework for Intelligent Tomato Leaf Disease Diagnosis
by Heng Dong, Xuemei Qiu, Dawei Fan, Mingyue Han, Jiaming Yu, Changcai Yang, Jinghu Li, Ruijun Liu, Riqing Chen and Qiufeng Chen
Agriculture 2026, 16(4), 419; https://doi.org/10.3390/agriculture16040419 - 12 Feb 2026
Viewed by 682
Abstract
Tomato leaf diseases exhibit subtle inter-class differences and substantial intra-class variability, making accurate identification challenging for conventional deep learning models, especially under real-world conditions with diverse lighting, occlusion, and growth stages. Moreover, most existing approaches rely solely on visual features and lack the ability to incorporate semantic descriptions or expert knowledge, limiting their robustness and interpretability. To address these issues, we propose LPDiag, a multimodal prototype-attention diagnostic framework that integrates large language models (LLMs) for fine-grained recognition of tomato diseases. The framework first employs an LLM-driven semantic understanding module to encode symptom-aware textual embeddings from disease descriptions. These embeddings are then aligned with multi-scale visual features extracted by an enhanced Res2Net backbone, enabling cross-modal representation learning. A set of learnable prototype vectors, combined with a knowledge-enhanced attention mechanism, further strengthens the interaction between visual patterns and LLM prior knowledge, resulting in more discriminative and interpretable representations. Additionally, we develop an interactive diagnostic system that supports natural-language querying and image-based identification, facilitating practical deployment in heterogeneous agricultural environments. Extensive experiments on three widely used datasets demonstrate that LPDiag achieves a mean accuracy of 98.83%, outperforming state-of-the-art models while offering improved explanatory capability. The proposed framework offers a promising direction for integrating LLM-based semantic reasoning with visual perception to enhance intelligent and trustworthy plant disease diagnostics. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
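The prototype component can be pictured as learnable class vectors scored against knowledge-fused features; in the hedged sketch below, the residual text-visual fusion and cosine scoring are illustrative assumptions, not LPDiag's released code.

```python
# Hedged sketch: learnable class prototypes scored against LLM-fused features.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeHead(nn.Module):
    def __init__(self, n_classes=10, dim=256):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(n_classes, dim))  # one per disease

    def forward(self, visual_feat, text_feat):
        # Knowledge-enhanced feature: visual features modulated by the LLM's
        # symptom-aware text embedding (simple residual fusion assumed here).
        fused = F.normalize(visual_feat + text_feat, dim=-1)         # (B, dim)
        protos = F.normalize(self.prototypes, dim=-1)
        return fused @ protos.t()                                    # cosine logits

logits = PrototypeHead()(torch.randn(4, 256), torch.randn(4, 256))
```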

25 pages, 7527 KB  
Article
Heterogeneous Multi-Domain Dataset Synthesis to Facilitate Privacy and Risk Assessments in Smart City IoT
by Matthew Boeding, Michael Hempel, Hamid Sharif and Juan Lopez
Electronics 2026, 15(3), 692; https://doi.org/10.3390/electronics15030692 - 5 Feb 2026
Viewed by 566
Abstract
The emergence of the Smart Cities paradigm and the rapid expansion and integration of Internet of Things (IoT) technologies within this context have created unprecedented opportunities for high-resolution behavioral analytics, urban optimization, and context-aware services. However, this same proliferation intensifies privacy risks, particularly those arising from cross-modal data linkage across heterogeneous sensing platforms. To address these challenges, this paper introduces a comprehensive, statistically grounded framework for generating synthetic, multimodal IoT datasets tailored to Smart City research. The framework produces behaviorally plausible synthetic data suitable for preliminary privacy risk assessment and as a benchmark for future re-identification studies, as well as for evaluating algorithms in mobility modeling, urban informatics, and privacy-enhancing technologies. As part of our approach, we formalize probabilistic methods for synthesizing three heterogeneous and operationally relevant data streams—cellular mobility traces, payment terminal transaction logs, and Smart Retail nutrition records—capturing the behaviors of a large number of synthetically generated urban residents over a 12-week period. The framework integrates spatially explicit merchant selection using K-Dimensional (KD)-tree nearest-neighbor algorithms, temporally correlated anchor-based mobility simulation reflective of daily urban rhythms, and dietary-constraint filtering to preserve ecological validity in consumption patterns. In total, the system generates approximately 116 million mobility pings, 5.4 million transactions, and 1.9 million itemized purchases, yielding a reproducible benchmark for evaluating multimodal analytics, privacy-preserving computation, and secure IoT data-sharing protocols. To show the validity of this dataset, the underlying distributions of these residents were successfully validated against reported distributions in published research. We present preliminary uniqueness and cross-modal linkage indicators; comprehensive re-identification benchmarking against specific attack algorithms is planned as future work. This framework can be easily adapted to various scenarios of interest in Smart Cities and other IoT applications. By aligning methodological rigor with the operational needs of Smart City ecosystems, this work fills critical gaps in synthetic data generation for privacy-sensitive domains, including intelligent transportation systems, urban health informatics, and next-generation digital commerce infrastructures. Full article
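Spatially explicit merchant selection with a KD-tree is straightforward with SciPy; the following sketch (synthetic coordinates and a distance-decaying choice among the k nearest merchants) illustrates the idea rather than the framework's exact parameters.

```python
# Sketch of KD-tree-based merchant selection (coordinates and counts are made up).
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
merchants = rng.uniform(0, 10, size=(500, 2))    # merchant locations (km grid)
residents = rng.uniform(0, 10, size=(1000, 2))   # current resident positions

tree = cKDTree(merchants)
dist, idx = tree.query(residents, k=5)           # 5 nearest merchants per resident

# Pick among the k candidates with distance-decaying probability, so trips
# favor nearby merchants without being deterministic.
weights = np.exp(-dist)
weights /= weights.sum(axis=1, keepdims=True)
pick = (rng.random((len(residents), 1)) < weights.cumsum(axis=1)).argmax(axis=1)
chosen = idx[np.arange(len(residents)), pick]    # chosen merchant per resident
```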

20 pages, 20306 KB  
Article
Robust Tracker: Integrating CPM-YOLO and BOTSORT for Cross-Modal Vessel Tracking
by Feng Lv and Ying Zhang
Sensors 2026, 26(3), 983; https://doi.org/10.3390/s26030983 - 3 Feb 2026
Viewed by 513
Abstract
This paper presents a high-accuracy and robust multi-object tracking method for maritime vessel detection and tracking in complex marine environments, characterized by dense targets, large-scale variations, and frequent occlusions. The proposed approach adopts an enhanced YOLOv8-based detector with lightweight feature enhancement and attention mechanisms to improve its capability in detecting small-scale vessels and complex scenes. Furthermore, a tracking framework combining BOTSORT with an OSNet-based re-identification (ReID) model is employed to achieve stable and reliable vessel association. Experimental results on the Near-Infrared On-Shore (NIR) dataset demonstrate that the proposed method improves Precision, Recall, mAP@0.5, and mAP@0.5:0.95 by approximately 4.0%, 5.0%, 5.1%, and 5.4%, respectively, compared with the baseline YOLOv8, while reducing parameter count and model size by 11.6% and 6.5%. On the Visible On-Shore (VIS) dataset, the proposed method outperforms state-of-the-art approaches in detection accuracy and robustness, further validating its effectiveness and cross-modal generalization capability. In multi-object tracking tasks, the proposed CPM-YOLO and BOTSORT framework demonstrates clear advantages in trajectory continuity, occlusion handling, and identity preservation compared with mainstream tracking algorithms. On the NIR dataset, the proposed method achieves a competitive inference speed of 188 FPS, while running at 187 FPS on the VIS dataset, demonstrating that the accuracy improvements are achieved without sacrificing real-time performance. Overall, the proposed method achieves a favorable balance between detection accuracy, tracking robustness, and model efficiency, making it well-suited for practical maritime applications. Full article
(This article belongs to the Section Sensing and Imaging)
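A comparable detection-plus-BOTSORT pipeline can be run with the off-the-shelf Ultralytics API; in this sketch, stock YOLOv8 weights stand in for the paper's CPM-YOLO detector, and "vessels.mp4" is a placeholder path.

```python
# Approximation of the tracking pipeline with Ultralytics' built-in BOT-SORT
# (not the authors' CPM-YOLO weights, which are not assumed to be public).
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                    # stand-in for the enhanced detector
results = model.track(
    source="vessels.mp4",                     # placeholder maritime video
    tracker="botsort.yaml",                   # BOT-SORT association
    persist=True,                             # keep track IDs across frames
    conf=0.25,
)
for r in results:
    if r.boxes.id is not None:
        print(r.boxes.id.int().tolist())      # per-frame track identities
```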

20 pages, 33907 KB  
Article
GLCN: Graph-Aware Locality-Enhanced Cross-Modality Re-ID Network
by Junjie Cao, Yuhang Yu, Rong Rong and Xing Xie
J. Imaging 2026, 12(1), 42; https://doi.org/10.3390/jimaging12010042 - 13 Jan 2026
Viewed by 538
Abstract
Cross-modality person re-identification faces challenges such as illumination discrepancies, local occlusions, and inconsistent modality structures, leading to misalignment and sensitivity issues. We propose GLCN, a framework that addresses these problems by enhancing representation learning through locality enhancement, cross-modality structural alignment, and intra-modality compactness. Key components include the Locality-Preserved Cross-branch Fusion (LPCF) module, which combines Local–Positional–Channel Gating (LPCG) for local region and positional sensitivity; Cross-branch Context Interpolated Attention (CCIA) for stable cross-branch consistency; and Graph-Enhanced Center Geometry Alignment (GE-CGA), which aligns class-center similarity structures across modalities to preserve category-level relationships. We also introduce Intra-Modal Prototype Discrepancy Mining Loss (IPDM-Loss) to reduce intra-class variance and improve inter-class separation, thereby creating more compact identity structures in both RGB and IR spaces. Extensive experiments on SYSU-MM01, RegDB, and other benchmarks demonstrate the effectiveness of our approach. Full article
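The center-geometry alignment idea, matching class-center similarity structures across modalities, might be realized as below; the MSE objective over normalized center-similarity matrices is an assumed formulation consistent with the description, not GLCN's exact loss.

```python
# Hedged sketch of center-geometry alignment between RGB and IR feature spaces.
import torch
import torch.nn.functional as F

def center_geometry_loss(feat_rgb, feat_ir, labels_rgb, labels_ir, n_classes):
    def centers(feat, labels):
        c = torch.zeros(n_classes, feat.size(1), device=feat.device)
        c.index_add_(0, labels, feat)                       # sum features per class
        cnt = torch.bincount(labels, minlength=n_classes).clamp(min=1)
        return F.normalize(c / cnt.unsqueeze(1), dim=1)     # normalized class centers
    sim_rgb = centers(feat_rgb, labels_rgb) @ centers(feat_rgb, labels_rgb).t()
    sim_ir = centers(feat_ir, labels_ir) @ centers(feat_ir, labels_ir).t()
    return F.mse_loss(sim_rgb, sim_ir)   # align category-level relation matrices

loss = center_geometry_loss(torch.randn(32, 256), torch.randn(32, 256),
                            torch.randint(0, 8, (32,)), torch.randint(0, 8, (32,)), 8)
```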

17 pages, 1552 KB  
Article
Adaptive Pseudo Text Augmentation for Noise-Robust Text-to-Image Person Re-Identification
by Lian Xiong, Wangdong Li, Huaixin Chen and Yuxi Feng
Sensors 2025, 25(23), 7157; https://doi.org/10.3390/s25237157 - 24 Nov 2025
Cited by 1 | Viewed by 865
Abstract
Text-to-image person re-identification (T2I-ReID) aims to retrieve pedestrians from images/videos based on textual descriptions. However, most methods implicitly assume that training image–text pairs are correctly aligned, while in practice, issues such as under-correlated and falsely correlated image–text pairs arise due to coarse-grained text annotations and erroneous textual descriptions. To address this problem, we propose a T2I-ReID method based on noise identification and pseudo-text generation. Our method first extracts image–text features using the Contrastive Language–Image Pre-Training (CLIP) model, then employs the token fusion model to select and fuse informative local token features, resulting in token fusion embedding (TFE) for fine-grained representations. To identify noisy image–text pairs, we apply the two-component Gaussian mixture model (GMM) to fit the per-sample loss distributions computed by the predictions of basic feature embedding (BFE) and TFE. Finally, when the noise identification tends to stabilize, we employ a multimodal large language model (MLLM) to generate pseudo-texts that replace the noisy text, facilitating the learning of more reliable visual–semantic associations and cross-modal alignment under noisy conditions. Extensive experiments on the CUHK-PEDES, ICFG-PEDES, and RSTPReid datasets demonstrate the effectiveness of our proposed model and the good compatibility with other baselines. Full article
(This article belongs to the Section Sensing and Imaging)
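Fitting a two-component GMM to per-sample losses is a well-established noisy-label recipe; below is a minimal scikit-learn sketch with an illustrative clean-probability threshold (the paper's exact fitting schedule may differ).

```python
# Per-sample-loss GMM split for noisy-pair identification (generic recipe).
import numpy as np
from sklearn.mixture import GaussianMixture

def split_clean_noisy(per_sample_loss: np.ndarray, clean_prob=0.5):
    """Fit a 2-component 1-D GMM to per-sample losses; the low-mean
    component is treated as the clean set."""
    losses = per_sample_loss.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, max_iter=100).fit(losses)
    clean_comp = int(np.argmin(gmm.means_.ravel()))      # component with smaller loss
    p_clean = gmm.predict_proba(losses)[:, clean_comp]
    return p_clean > clean_prob                          # boolean mask: True = clean pair

mask = split_clean_noisy(np.random.exponential(1.0, 1024))
```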

25 pages, 3819 KB  
Article
Cross-Modal and Contrastive Optimization for Explainable Multimodal Recognition of Predatory and Parasitic Insects
by Mingyu Liu, Liuxin Wang, Ruihao Jia, Shiyu Ji, Yalin Wu, Yuxin Wu, Luozehan Xie and Min Dong
Insects 2025, 16(12), 1187; https://doi.org/10.3390/insects16121187 - 22 Nov 2025
Cited by 1 | Viewed by 1170
Abstract
Natural enemies play a vital role in pest suppression and ecological balance within agricultural ecosystems. However, conventional vision-based recognition methods are highly susceptible to illumination variation, occlusion, and background noise in complex field environments, making it difficult to accurately distinguish morphologically similar species. To address these challenges, a multimodal natural enemy recognition and ecological interpretation framework, termed MAVC-XAI, is proposed to enhance recognition accuracy and ecological interpretability in real-world agricultural scenarios. The framework employs a dual-branch spatiotemporal feature extraction network for deep modeling of both visual and acoustic signals, introduces a cross-modal sampling attention mechanism for dynamic inter-modality alignment, and incorporates cross-species contrastive learning to optimize inter-class feature boundaries. Additionally, an explainable generation module is designed to provide ecological visualizations of the model’s decision-making process in both visual and acoustic domains. Experiments conducted on multimodal datasets collected across multiple agricultural regions confirm the effectiveness of the proposed approach. The MAVC-XAI framework achieves an accuracy of 0.938, a precision of 0.932, a recall of 0.927, an F1-score of 0.929, an mAP@50 of 0.872, and a Top-5 recognition rate of 97.8%, all significantly surpassing unimodal models such as ResNet, Swin-T, and VGGish, as well as multimodal baselines including MMBT and ViLT. Ablation experiments further validate the critical contributions of the cross-modal sampling attention and contrastive learning modules to performance enhancement. The proposed framework not only enables high-precision natural enemy identification under complex ecological conditions but also provides an interpretable and intelligent foundation for AI-driven ecological pest management and food security monitoring. Full article
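Cross-species contrastive learning plausibly follows an InfoNCE-style supervised contrastive objective; the sketch below is an assumed formulation (it expects at least two samples per species in a batch), not the paper's exact loss.

```python
# Minimal supervised-contrastive loss sketch for cross-species feature learning.
import torch
import torch.nn.functional as F

def contrastive_loss(feats, labels, temperature=0.1):
    """Pull same-species embeddings together, push different species apart.
    Assumes each species has at least two samples in the batch."""
    z = F.normalize(feats, dim=1)
    sim = z @ z.t() / temperature                         # (B, B) similarities
    sim.fill_diagonal_(float("-inf"))                     # exclude self-pairs
    pos = labels.unsqueeze(0) == labels.unsqueeze(1)      # same-class mask
    pos.fill_diagonal_(False)
    log_prob = sim - sim.logsumexp(dim=1, keepdim=True)   # log-softmax per anchor
    return -(log_prob[pos]).mean()                        # average over positive pairs

loss = contrastive_loss(torch.randn(16, 128), torch.randint(0, 4, (16,)).repeat_interleave(1))
```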

20 pages, 59706 KB  
Article
Learning Hierarchically Consistent Disentanglement with Multi-Channel Augmentation for Public Security-Oriented Sketch Person Re-Identification
by Yu Ye, Zhihong Sun and Jun Chen
Sensors 2025, 25(19), 6155; https://doi.org/10.3390/s25196155 - 4 Oct 2025
Cited by 1 | Viewed by 989
Abstract
Sketch re-identification (Re-ID) aims to retrieve pedestrian photographs in the gallery dataset by a query sketch image drawn by professionals, which is crucial for criminal investigations and missing person searches in the field of public security. The main challenge of this task lies in bridging the significant modality gap between sketches and photos while extracting discriminative modality-invariant features. However, information asymmetry between sketches and RGB photographs, particularly the differences in color information, severely interferes with cross-modal matching processes. To address this challenge, we propose a novel network architecture that integrates multi-channel augmentation with hierarchically consistent disentanglement learning. Specifically, a multi-channel augmentation module is developed to mitigate the interference of color bias in cross-modal matching. Furthermore, a modality-disentangled prototype (MDP) module is introduced to decompose pedestrian representations at the feature level into modality-invariant structural prototypes and modality-specific appearance prototypes. Additionally, a cross-layer decoupling consistency constraint is designed to ensure the semantic coherence of disentangled prototypes across different network layers and to improve the stability of the whole decoupling process. Extensive experimental results on two public datasets demonstrate the superiority of our proposed approach over state-of-the-art methods. Full article
(This article belongs to the Special Issue Advances in Security for Emerging Intelligent Systems)

25 pages, 54500 KB  
Article
Parking Pattern Guided Vehicle and Aircraft Detection in Aligned SAR-EO Aerial View Images
by Zhe Geng, Shiyu Zhang, Yu Zhang, Chongqi Xu, Linyi Wu and Daiyin Zhu
Remote Sens. 2025, 17(16), 2808; https://doi.org/10.3390/rs17162808 - 13 Aug 2025
Viewed by 1668
Abstract
Although SAR systems can provide high-resolution aerial view images all-day, all-weather, the aspect and pose-sensitivity of the SAR target signatures, which defies the Gestalt perceptual principles, sets a frustrating performance upper bound for SAR Automatic Target Recognition (ATR). Therefore, we propose a network to support context-guided ATR by using aligned Electro-Optical (EO)-SAR image pairs. To realize EO-SAR image scene grammar alignment, the stable context features highly correlated to the parking patterns of the vehicle and aircraft targets are extracted from the EO images as prior knowledge, which is used to assist SAR-ATR. The proposed network consists of a Scene Recognition Module (SRM) and an instance-level Cross-modality ATR Module (CATRM). The SRM is based on a novel light-condition-driven adaptive EO-SAR decision weighting scheme, and the Outlier Exposure (OE) approach is employed for SRM training to realize Out-of-Distribution (OOD) scene detection. Once the scene depicted in the cut of interest is identified with the SRM, the image cut is sent to the CATRM for ATR. Considering that the EO-SAR images acquired from diverse observation angles often feature unbalanced quality, a novel class-incremental learning method based on the Context-Guided Re-Identification (ReID)-based Key-view (CGRID-Key) exemplar selection strategy is devised so that the network is capable of continuous learning in the open-world deployment environment. Vehicle ATR experimental results based on the UNICORN dataset, which consists of 360-degree EO-SAR images of an army base, show that the CGRID-Key exemplar strategy offers a classification accuracy 29.3% higher than the baseline model for the incremental vehicle category, SUV. Moreover, aircraft ATR experimental results based on the aligned EO-SAR images collected over several representative airports and the Arizona aircraft boneyard show that the proposed network achieves an F1 score of 0.987, which is 9% higher than YOLOv8. Full article
(This article belongs to the Special Issue Applications of SAR for Environment Observation Analysis)
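The light-condition-driven EO-SAR decision weighting can be pictured as a small gate that blends branch logits; the gating form and scalar brightness input below are assumptions from the abstract, not the paper's implementation.

```python
# Toy sketch of light-condition-driven decision weighting between EO and SAR logits.
import torch
import torch.nn as nn

class EOSARWeighting(nn.Module):
    def __init__(self):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                                  nn.Linear(16, 1), nn.Sigmoid())

    def forward(self, eo_logits, sar_logits, brightness):
        # brightness: (B, 1) light-condition estimate of the EO image;
        # dark scenes shift the decision toward the SAR branch.
        w = self.gate(brightness)                     # (B, 1) EO confidence weight
        return w * eo_logits + (1 - w) * sar_logits

fused = EOSARWeighting()(torch.randn(4, 10), torch.randn(4, 10), torch.rand(4, 1))
```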

22 pages, 7733 KB  
Article
Parsing-Guided Differential Enhancement Graph Learning for Visible-Infrared Person Re-Identification
by Xingpeng Li, Huabing Liu, Chen Xue, Nuo Wang and Enwen Hu
Electronics 2025, 14(15), 3118; https://doi.org/10.3390/electronics14153118 - 5 Aug 2025
Cited by 2 | Viewed by 1231
Abstract
Visible-Infrared Person Re-Identification (VI-ReID) is of crucial importance in applications such as monitoring and security. However, challenges arising from intra-class variations and cross-modal differences are often exacerbated by inaccurate infrared parsing and insufficient structural modeling. To address these issues, we propose Parsing-guided Differential Enhancement Graph Learning (PDEGL), a novel framework that learns discriminative representations through a dual-branch architecture synergizing global feature refinement with part-based structural graph analysis. In particular, we introduce a Differential Infrared Part Enhancement (DIPE) module to correct infrared parsing errors and a Parsing Structural Graph (PSG) module to model high-order topological relationships between body parts for structural consistency matching. Furthermore, we design a Position-sensitive Spatial-Channel Attention (PSCA) module to enhance global feature discriminability. Extensive evaluations on the SYSU-MM01, RegDB, and LLCM datasets demonstrate that our PDEGL method achieves competitive performance. Full article
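Graph reasoning over parsed body parts might use a single GCN-style layer with a learnable part adjacency; the sketch below is illustrative (part count, adjacency initialization, and layer form are assumptions, not the PSG module's exact design).

```python
# Hedged sketch of graph propagation over human-parsing part features.
import torch
import torch.nn as nn

class PartGraphLayer(nn.Module):
    def __init__(self, n_parts=6, dim=256):
        super().__init__()
        self.adj = nn.Parameter(torch.eye(n_parts) + 0.1)  # learnable part topology
        self.fc = nn.Linear(dim, dim)

    def forward(self, parts):                   # parts: (B, n_parts, dim)
        a = torch.softmax(self.adj, dim=-1)     # row-normalized adjacency
        return torch.relu(self.fc(a @ parts))   # propagate features between parts

out = PartGraphLayer()(torch.randn(2, 6, 256))
```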

22 pages, 2514 KB  
Article
High-Accuracy Recognition Method for Diseased Chicken Feces Based on Image and Text Information Fusion
by Duanli Yang, Zishang Tian, Jianzhong Xi, Hui Chen, Erdong Sun and Lianzeng Wang
Animals 2025, 15(15), 2158; https://doi.org/10.3390/ani15152158 - 22 Jul 2025
Viewed by 1236
Abstract
Poultry feces, a critical biomarker for health assessment, requires timely and accurate pathological identification for food safety. Conventional visual-only methods face limitations due to environmental sensitivity and high visual similarity among feces from different diseases. To address this, we propose MMCD (Multimodal Chicken-feces Diagnosis), a ResNet50-based multimodal fusion model leveraging semantic complementarity between images and descriptive text to enhance diagnostic precision. Key innovations include the following: (1) Integrating MASA (Manhattan self-attention) and DSconv (Depthwise Separable convolution) into the backbone network to mitigate feature confusion. (2) Utilizing a pre-trained BERT to extract textual semantic features, reducing annotation dependency and cost. (3) Designing a lightweight Gated Cross-Attention (GCA) module for dynamic multimodal fusion, achieving a 41% parameter reduction versus cross-modal transformers. Experiments demonstrate that MMCD significantly outperforms single-modal baselines in Accuracy (+8.69%), Recall (+8.72%), Precision (+8.67%), and F1 score (+8.72%). It surpasses simple feature concatenation by 2.51–2.82% and reduces parameters by 7.5M and computations by 1.62 GFLOPs versus the base ResNet50. This work validates multimodal fusion’s efficacy in pathological fecal detection, providing a theoretical and technical foundation for agricultural health monitoring systems. Full article
(This article belongs to the Section Animal Welfare)
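A lightweight gated cross-attention block fusing a ResNet50 image feature with BERT token embeddings could look like the following; dimensions and the sigmoid gating form are assumptions based on the abstract, not the MMCD code.

```python
# Hedged sketch of gated cross-attention (GCA) for image + text fusion.
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    def __init__(self, img_dim=2048, txt_dim=768, dim=512, heads=4):
        super().__init__()
        self.qi = nn.Linear(img_dim, dim)
        self.kt = nn.Linear(txt_dim, dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())

    def forward(self, img_feat, txt_tokens):
        # img_feat: (B, img_dim) pooled ResNet50 feature
        # txt_tokens: (B, T, txt_dim) BERT token embeddings
        q = self.qi(img_feat).unsqueeze(1)                 # (B, 1, dim) image query
        kv = self.kt(txt_tokens)                           # (B, T, dim) text keys/values
        ctx, _ = self.attn(q, kv, kv)                      # text context for this image
        g = self.gate(torch.cat([q, ctx], dim=-1))         # how much text to admit
        return (q + g * ctx).squeeze(1)                    # gated fused feature

fused = GatedCrossAttention()(torch.randn(2, 2048), torch.randn(2, 32, 768))
```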

25 pages, 1669 KB  
Article
Zero-Shot Infrared Domain Adaptation for Pedestrian Re-Identification via Deep Learning
by Xu Zhang, Yinghui Liu, Liangchen Guo and Huadong Sun
Electronics 2025, 14(14), 2784; https://doi.org/10.3390/electronics14142784 - 10 Jul 2025
Cited by 1 | Viewed by 1375
Abstract
In computer vision, the performance of detectors trained under optimal lighting conditions is significantly impaired when applied to infrared domains due to the scarcity of labeled infrared target domain data and the inherent degradation in infrared image quality. Progress in cross-domain pedestrian re-identification is hindered by the lack of labeled infrared image data. To address the degradation of pedestrian recognition in infrared environments, we propose a framework for zero-shot infrared domain adaptation. This integrated approach is designed to mitigate the challenges of pedestrian recognition in infrared domains while enabling zero-shot domain adaptation. Specifically, an advanced reflectance representation learning module and an exchange–re-decomposition–coherence process are employed to learn illumination invariance and to enhance the model’s effectiveness, respectively. Additionally, the CLIP (Contrastive Language–Image Pretraining) image encoder and DINO (Distillation with No Labels) are fused for feature extraction, improving model performance under infrared conditions and enhancing its generalization capability. To further improve model performance, we introduce the Non-Local Attention (NLA) module, the Instance-based Weighted Part Attention (IWPA) module, and the Multi-head Self-Attention module. The NLA module captures global feature dependencies, particularly long-range feature relationships, effectively mitigating issues such as blurred or missing image information in feature degradation scenarios. The IWPA module focuses on localized regions to enhance model accuracy in complex backgrounds and unevenly lit scenes. Meanwhile, the Multi-head Self-Attention module captures long-range dependencies between cross-modal features, further strengthening environmental understanding and scene modeling. The key innovation of this work lies in the skillful combination and application of existing technologies to new domains, overcoming the challenges posed by vision in infrared environments. Experimental results on the SYSU-MM01 dataset show that, under the single-shot setting, Rank-1 Accuracy (Rank-1) and mean Average Precision (mAP) values of 37.97% and 37.25%, respectively, were achieved, while in the multi-shot setting, values of 34.96% and 34.14% were attained. Full article
(This article belongs to the Special Issue Deep Learning in Image Processing and Computer Vision)
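Fusing CLIP and DINO image embeddings for feature extraction can be approximated with public checkpoints; the sketch below assumes simple L2-normalized concatenation, which the abstract does not specify, and uses OpenAI's clip package plus torch.hub DINO weights as stand-ins.

```python
# Rough sketch of CLIP + DINO feature fusion (assumed concatenation; public weights).
import torch
import clip

device = "cpu"  # single-device setup assumed for simplicity
clip_model, preprocess = clip.load("ViT-B/16", device=device)
dino = torch.hub.load("facebookresearch/dino:main", "dino_vits16").to(device).eval()

@torch.no_grad()
def extract(img_batch):                      # img_batch: (B, 3, 224, 224), preprocessed
    f_clip = clip_model.encode_image(img_batch).float()   # (B, 512) CLIP embedding
    f_dino = dino(img_batch)                              # (B, 384) DINO embedding
    f = torch.cat([f_clip, f_dino], dim=-1)               # fused (B, 896) feature
    return torch.nn.functional.normalize(f, dim=-1)

feats = extract(torch.randn(2, 3, 224, 224))
```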
