Search Results (1,619)

Search Parameters:
Keywords = texture transformer

28 pages, 43590 KB  
Article
TreeSpecViT: Fine-Grained Tree Species Classification from UAV RGB Imagery for Campus-Scale Human–Vegetation Coupling Analysis
by Yinghui Yuan, Yunfeng Yang, Zhulin Chen and Sheng Xu
Remote Sens. 2026, 18(6), 928; https://doi.org/10.3390/rs18060928 - 18 Mar 2026
Abstract
On university campuses, trees and green spaces shape how students and staff move and use outdoor spaces. To support planning, tree species information is needed at the level of individual trees. Tree species classification from UAV RGB imagery remains difficult in complex campus scenes because roads, buildings, shadows and subtle inter-species differences degrade recognition. To address background interference, the loss of subtle fine-grained cues before tokenization, and insufficient local structure modeling in lightweight transformer-based classification, we propose TreeSpecViT for tree species classification. It uses a MobileViT backbone and a Background Suppression Module (BSM) to reduce clutter from non-canopy regions. A Fine-Grained Feature Guidance (FGF) module is inserted before the unfold operation to enhance canopy details and guide tokenization toward key regions. 1×1 convolutional neck layers align channels, and a Global and Local Fusion (GLF) module jointly models overall crown semantics and local textures for species recognition. From the predicted masks and species labels, we build an individual-tree digital archive. The archive stores per-tree geometric attributes and can be linked with grids of campus activity intensity to analyze how activity patterns relate to vegetation structure. TreeSpecViT achieves an Accuracy of 87.88% (+6.06%) and an F1 score of 76.48% (+5.08%) on the SZUTreeDataset. On our self-constructed NJFUDataset, it reaches 76.30% (+5.10%) in Accuracy and 70.10% (+7.20%) in F1. These results surpass mainstream models. Ablation experiments show that the modules jointly reduce background clutter and enhance canopy features. Overall, TreeSpecViT supports campus-scale analyses that link human activity intensity to vegetation patterns and provides a practical basis for planning and adjusting campus green spaces.
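The abstract describes the BSM only at a high level; the sketch below shows the general idea, a learned sigmoid gate that down-weights non-canopy pixels. The class name, channel sizes, and gating form are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BackgroundSuppressionGate(nn.Module):
    """Hypothetical sketch of a background-suppression gate: predict a
    per-pixel foreground probability and down-weight background regions."""
    def __init__(self, channels: int):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),  # mask values in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.mask_head(x)  # suppress features where the mask is low

feats = torch.randn(2, 96, 32, 32)                 # e.g. one backbone stage output
print(BackgroundSuppressionGate(96)(feats).shape)  # torch.Size([2, 96, 32, 32])
```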
19 pages, 7323 KB  
Article
Mathematical Benchmarking of Convolutional Neural Networks for Thai Dialect Recognition: A Spectrogram Texture Classification Approach
by Porawat Visutsak, Duongduen Ongrungruaeng, Surapong Wiriya and Keun Ho Ryu
Electronics 2026, 15(6), 1271; https://doi.org/10.3390/electronics15061271 - 18 Mar 2026
Abstract
This study rigorously evaluates 13 Convolutional Neural Network (CNN) architectures for Thai dialect recognition. By treating Automatic Speech Recognition (ASR) as a computer vision texture classification task, we processed an extensive 840-h dataset from the Spoken Language Systems, Chulalongkorn University (SLSCU) corpus. Raw audio from four major dialects—Central, Northern (Khummuang), Northeastern (Korat), and Southern (Pattani)—was transformed into 2D Mel-spectrograms using the Short-Time Fourier Transform (STFT). We analyzed a diverse range of architectures, including the VGG, Inception, ResNet, DenseNet, and MobileNet families, to establish the optimal trade-off between mathematical complexity and spectral feature extraction. Our experimental results identify NASNet-Mobile as the most effective model, achieving a macro-average F1-score of 0.9425. The analysis suggests that NASNet’s search-optimized cell structure is uniquely capable of capturing the multiscale texture of phonetic formants. In contrast, we observed a catastrophic mode collapse in VGG16 (32.97% accuracy), likely due to excessive parameter bloat, while Xception and MobileNetV2 maintained robust generalization. Confusion matrix analysis reveals high acoustic distinctiveness for Southern Thai (96.7% recall), whereas Northern Thai exhibits significant spectral overlap with Central Thai. These results support the hypothesis that CNNs interpret spectrograms as textures rather than discrete objects, positioning NASNet-Mobile as a high-performance, low-latency baseline for edge-device deployment in resource-constrained environments.
(This article belongs to the Special Issue Advances in Machine Learning for Image Classification)
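The STFT-to-Mel-spectrogram preprocessing described above is a standard pipeline; a sketch using librosa follows. The sampling rate, FFT size, and hop length are illustrative choices, not parameters taken from the paper.

```python
import librosa
import numpy as np

def audio_to_mel_image(path: str, sr: int = 16000, n_mels: int = 128) -> np.ndarray:
    """Load audio and convert it to a log-Mel spectrogram 'image' for a CNN."""
    y, sr = librosa.load(path, sr=sr)
    # STFT magnitude -> Mel filter bank -> power spectrogram
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)   # compress dynamic range
    # min-max normalise to [0, 1] so it behaves like a grayscale image
    return (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
```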
14 pages, 3141 KB  
Article
Enhanced Real-Time Detector for Industrial Vision-Based Corn Impurity Detection
by Xiao Zhang, Yuhang Bian, Xiangdong Li, Haoze Yu, Dong Li and Min Wu
Foods 2026, 15(6), 1065; https://doi.org/10.3390/foods15061065 - 18 Mar 2026
Abstract
The effective cleaning of corn prior to storage is crucial for ensuring grain quality and safety. Traditional Convolutional Neural Network (CNN)-based detection methods often struggle to maintain accuracy in scenarios with dense occlusions. Furthermore, limitations in image quality and feature representation hinder their generalization to diverse impurity types. To address these challenges, this paper proposes an enhanced real-time detector transformer model named RT-DETR-CD (Real-Time Detector Transformer with Convolution and Dynamic Upsampling) for corn impurity detection based on industrial vision. This approach integrates Receptive Field Attention Convolutions (RFAConv) to enhance sensitivity to local texture details and employs the dynamic upsampling operator DySample to restore high-frequency edge information. Additionally, a novel Inner-Shape-IoU loss function is introduced to accelerate bounding box regression for objects with varying aspect ratios. Images were captured using FLIR industrial cameras under controllable annular LED illumination. Experiments on a self-built dataset demonstrate that the proposed model achieves a 4.7% improvement in mean average precision (mAP) and operates at 68 frames per second (FPS), outperforming the original RT-DETR model in both accuracy and speed. This work provides a practical solution for real-time, high-precision impurity detection on grain processing lines.
(This article belongs to the Section Food Analytical Methods)
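The abstract does not spell out the Inner-Shape-IoU formulation; for orientation, the sketch below shows a plain IoU loss for axis-aligned boxes, i.e., the generic baseline that such variants extend, not the proposed loss itself.

```python
import torch

def iou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """1 - IoU for boxes given as (x1, y1, x2, y2); both tensors shaped (N, 4)."""
    # intersection rectangle corners
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)                  # zero if boxes do not overlap
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1.0 - iou).mean()
```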
23 pages, 13050 KB  
Article
BAWSeg: A UAV Multispectral Benchmark for Barley Weed Segmentation
by Haitian Wang, Xinyu Wang, Muhammad Ibrahim, Dustin Severtson and Ajmal Mian
Remote Sens. 2026, 18(6), 915; https://doi.org/10.3390/rs18060915 - 17 Mar 2026
Abstract
Accurate weed mapping in cereal fields requires pixel-level segmentation from unmanned aerial vehicle (UAV) imagery that remains reliable across fields, seasons, and illumination. Existing multispectral pipelines often depend on thresholded vegetation indices, which are brittle under radiometric drift and mixed crop–weed pixels, or on single-stream convolutional neural network (CNN) and Transformer backbones that ingest stacked bands and indices, where radiance cues and normalized index cues interfere and reduce sensitivity to small weed clusters embedded in crop canopy. We propose VISA (Vegetation Index and Spectral Attention), a two-stream segmentation network that decouples these cues and fuses them at native resolution. The radiance stream learns from calibrated five-band reflectance using local residual convolutions, channel recalibration, spatial gating, and skip-connected decoding, which preserve fine textures, row boundaries, and small weed structures that are often weakened after ratio-based index compression. The index stream operates on vegetation-index maps with windowed self-attention to model local structure efficiently, state-space layers to propagate field-scale context without quadratic attention cost, and Slot Attention to form stable region descriptors that improve discrimination of sparse weeds under canopy mixing. To support supervised training and deployment-oriented evaluation, we introduce BAWSeg, a four-year UAV multispectral dataset collected over commercial barley paddocks in Western Australia, providing radiometrically calibrated blue, green, red, red edge, and near-infrared orthomosaics, derived vegetation indices, and dense crop, weed, and other labels with leakage-free block splits. On BAWSeg, VISA achieves 75.6% mean Intersection over Union (mIoU) and 63.5% weed Intersection over Union (IoU) with 22.8 M parameters, outperforming a multispectral SegFormer-B1 baseline by 1.2 mIoU and 1.9 weed IoU. Under cross-plot and cross-year protocols, VISA maintains 71.2% and 69.2% mIoU, respectively. The full BAWSeg benchmark dataset, VISA code, trained model weights, and protocol files will be released upon publication.
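The index stream starts from standard band ratios; a sketch of computing NDVI and a red-edge variant from the five calibrated bands follows, assuming band order blue, green, red, red edge, NIR as in the dataset description.

```python
import numpy as np

def vegetation_indices(img: np.ndarray) -> np.ndarray:
    """img: (H, W, 5) calibrated reflectance, bands = B, G, R, RE, NIR.
    Returns a (H, W, 2) stack of NDVI and NDRE."""
    red, red_edge, nir = img[..., 2], img[..., 3], img[..., 4]
    ndvi = (nir - red) / (nir + red + 1e-8)            # canopy greenness
    ndre = (nir - red_edge) / (nir + red_edge + 1e-8)  # red-edge variant
    return np.stack([ndvi, ndre], axis=-1)
```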
28 pages, 12746 KB  
Article
PSTNet: A Hyperspectral Image Classification Method Based on Adaptive Spectral–Spatial Tokens and Parallel Attention
by Shaokang Yu, Yong Mei, Xiangsuo Fan, Song Guo, Wujun Xu and Jinlong Fan
Remote Sens. 2026, 18(6), 901; https://doi.org/10.3390/rs18060901 - 15 Mar 2026
Abstract
Hyperspectral image classification holds significant applications across multiple domains due to its rich spectral and spatial information. However, it faces challenges such as spectral variation within the same object, spectral variation across different objects, and noise interference. Existing methods like convolutional neural networks perform well in local feature extraction but inadequately model long-range dependencies. While Transformers can capture global relationships, they struggle to effectively coordinate spectral and spatial information modeling. To address these limitations, this paper proposes a dual-branch collaborative Transformer network (PST-Net). This architecture integrates an adaptive spectral–spatial token (ASST) module, a Parallel Attention-Augmented lightweight CNN branch (PA-SSCNN), and a collaborative fusion layer. The ASST constructs joint representation tokens through local spectral smoothing and learnable spatial embedding. PA-SSCNN employs 3D-2D cascaded convolutions and channel–spatial attention mechanisms to enhance local texture and spatial feature extraction, and the collaborative fusion layer (CHIB) enables deep interaction and synergistic fusion of dual-branch features across different levels and scales. Experimental results demonstrate that with only 2% labeled samples, PST-Net achieves overall classification accuracies of 96.31%, 96.59%, 95.27%, and 89.06% on the Salinas and Whuhh datasets and the two complex urban-scene datasets Qingyun and Houston, respectively. It exhibits strong robustness, especially on fine-grained categories and in complex scenes. Ablation experiments further validate the effectiveness and complementarity of each module. This study provides an efficient collaborative modeling framework for hyperspectral image classification that balances global dependencies and local details.
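A rough sketch of the kind of token construction the ASST describes, local spectral smoothing followed by a learnable spatial embedding, is given below; the smoothing kernel size, patch size, and token dimension are assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

class SpectralSpatialTokens(nn.Module):
    """Hypothetical sketch: smooth each pixel's spectrum, project it to a
    token, and add a learnable spatial embedding per patch position."""
    def __init__(self, bands: int, patch: int = 9, dim: int = 64):
        super().__init__()
        self.smooth = nn.Conv1d(1, 1, kernel_size=5, padding=2, bias=False)  # spectral smoothing
        self.proj = nn.Linear(bands, dim)
        self.pos = nn.Parameter(torch.zeros(1, patch * patch, dim))  # learnable spatial embedding

    def forward(self, cube: torch.Tensor) -> torch.Tensor:
        # cube: (B, bands, patch, patch) hyperspectral patch
        b, c, h, w = cube.shape
        spectra = cube.permute(0, 2, 3, 1).reshape(b * h * w, 1, c)
        spectra = self.smooth(spectra).reshape(b, h * w, c)
        return self.proj(spectra) + self.pos   # (B, patch*patch, dim) tokens
```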
28 pages, 5420 KB  
Article
HEMS-RTDETR: A Lightweight Edge-Enhanced and Deformation-Aware Detector for Floating Debris in Complex Water Environments
by Yiwei Cui, Xinyi Jiang, Haiting Yu, Meizhen Lei and Jia Ren
Electronics 2026, 15(6), 1226; https://doi.org/10.3390/electronics15061226 - 15 Mar 2026
Abstract
Floating debris detection in complex aquatic environments holds significant importance for water resource protection and maritime safety monitoring. However, this task faces three core challenges: severe background interference leading to blurred target textures, significant non-rigid deformations, and the frequent loss of small targets at long distances. To address these issues, we propose a high-performance lightweight detection algorithm, termed High-Efficiency Edge-Aware Multi-Scale Real-Time Detection Transformer (HEMS-RTDETR), built upon the Real-Time Detection Transformer (RT-DETR) architecture. First, to suppress disturbances induced by water surface ripples and specular reflections, a Cross-Stage Partial Multi-Scale Edge Information Enhancement (CSP-MSEIE) module is introduced to reconstruct the backbone network. By removing computational redundancy while incorporating explicit edge enhancement, feature extraction capability and noise robustness for weak-texture targets are significantly improved. Second, to handle irregular debris morphology, a Deformable Attention Transformer (DAT) module is integrated, enabling adaptive attention focusing on geometrically deformed regions. Finally, an Efficient Multi-Scale Bidirectional Feature Pyramid Network (EMBSFPN) is constructed to enhance cross-scale semantic interaction and alleviate small-target signal loss. Experimental results demonstrate that, compared with RTDETR-r18, HEMS-RTDETR reduces parameters to 12.57 M, improves mAP@0.5 and mAP@0.5:0.95 by 2.44% and 3.05%, respectively, and maintains real-time inference at 93 FPS, indicating strong robustness and application potential in dynamic aquatic environments.
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
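CSP-MSEIE's internals are not given in the abstract; as a point of reference, the sketch below shows one common form of explicit edge enhancement, adding fixed Sobel gradient magnitudes back onto the feature map. It is a generic illustration, not the proposed module.

```python
import torch
import torch.nn.functional as F

def sobel_edge_boost(x: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Add Sobel gradient magnitude back onto each channel of x (B, C, H, W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]], device=x.device)
    ky = kx.t()
    c = x.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)   # depthwise kernels
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(x, kx, padding=1, groups=c)
    gy = F.conv2d(x, ky, padding=1, groups=c)
    edges = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)  # per-channel edge strength
    return x + alpha * edges
```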
30 pages, 3316 KB  
Article
A Novel Hybrid CNN-ViT-Based Bi-Directional Cross-Guidance Fusion-Driven Breast Cancer Detection Model
by Abdul Rahaman Wahab Sait and Yazeed Alkhurayyif
Life 2026, 16(3), 474; https://doi.org/10.3390/life16030474 - 14 Mar 2026
Abstract
Accurate and early identification of breast cancer from mammography is key to reducing breast cancer mortality, yet automated analysis is challenging due to subtle lesion appearances, heterogeneous breast density, and modality-induced variance. Standard Convolutional Neural Networks (CNNs) are excellent at capturing localized textures, whereas Vision Transformers (ViTs) capture long-range dependencies; however, both often struggle to produce a unified representation that consistently supports diagnostic decision-making. To address these limitations, this study presents a dual-stream framework integrating ConvNeXt for high-fidelity local feature extraction with Swin Transformer V2 for hierarchical global context modeling. A Bi-Directional Cross-Guidance (BDCG) mechanism is added to harmonize interactions between the two feature domains and ensure mutual information learning in the representations. Furthermore, a Prototype-Anchored Similarity Head (PASH) is used to stabilize classification through distance-based reasoning instead of linear separation. Comprehensive experiments on two benchmark datasets show the effectiveness of the proposed method. On Dataset 1, the model achieves 98.8% accuracy, 98.7% precision, 98.6% recall, and a 97.2% F1-score, outperforming existing CNN, ViT, and hybrid architectures while providing a lower inference time (8.3 ms/image). On the more heterogeneous Dataset 2, the model maintains strong performance, with an accuracy of 97.0%, precision of 95.4%, recall of 94.8%, and F1-score of 95.1%, demonstrating its resilience to domain shift and imaging variability. These results underscore the value of structural multi-scale feature interaction and prototype-driven classification for robust mammographic analysis. The consistent performance across internal and external evaluations indicates the potential for the proposed framework to be reliably applied in computer-aided screening systems.
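PASH is described only as distance-based reasoning; the sketch below shows the general pattern of classifying by negative squared distance to one learnable prototype per class. It is a generic prototype head, not the authors' module.

```python
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    """Classify by (negative squared) distance to a learnable prototype per class."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, feat_dim) -> logits: (B, num_classes)
        d2 = torch.cdist(feats, self.prototypes) ** 2
        return -d2   # closer prototype => larger logit; train with cross-entropy
```

Compared with a plain linear layer, the decision here is anchored to explicit class centers, which is the stabilizing property the abstract alludes to.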
16 pages, 7270 KB  
Article
Multi-Domain Fusion for UAV Image Super-Resolution Based on Tiny-Transformer
by Qiaoyue Man, Seok-Jeong Gee and Young-Im Cho
Drones 2026, 10(3), 204; https://doi.org/10.3390/drones10030204 - 14 Mar 2026
Abstract
Unmanned Aerial Vehicle imagery often suffers from severe spatial detail degradation due to sensor limitations and motion blur, hindering downstream vision tasks. To address this, we propose a lightweight super-resolution framework leveraging a Tiny-Transformer backbone enhanced by a multi-domain feature fusion strategy. Specifically, we jointly model spatial structural semantics and frequency domain texture priors via a cross-domain fusion attention mechanism, enabling coordinated restoration of global consistency and local details. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches on standard benchmarks, achieving significant gains in Peak Signal-to-Noise Ratio and structural similarity index while maintaining low computational cost. Notably, the model exhibits superior robustness in reconstructing high-frequency textures common in aerial scenes. This work provides an efficient, deployable solution for enhancing visual fidelity in resource-constrained applications such as urban planning and precision agriculture.
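A minimal illustration of extracting a frequency-domain texture prior with a 2D FFT, of the sort a cross-domain fusion block could consume, follows; the log-amplitude/phase split is an assumption, not the paper's design.

```python
import torch

def frequency_features(x: torch.Tensor) -> torch.Tensor:
    """x: (B, C, H, W) features -> concatenated log-amplitude and phase maps."""
    spec = torch.fft.rfft2(x, norm="ortho")     # complex, (B, C, H, W//2 + 1)
    amplitude = torch.log1p(spec.abs())         # compress dynamic range
    phase = torch.angle(spec)
    return torch.cat([amplitude, phase], dim=1) # (B, 2C, H, W//2 + 1)
```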
23 pages, 2994 KB  
Article
Texturally Modified Zirconia–Tungstophosphoric Acid Catalysts for Efficient Lignocellulosic Pyrolysis
by Jose L. Buitrago, Leticia Jésica Méndez, Mónica Laura Casella, Juan Antonio Cecilia, Enrique Rodríguez-Castellón, Ileana D. Lick and Luis R. Pizzio
Reactions 2026, 7(1), 21; https://doi.org/10.3390/reactions7010021 - 14 Mar 2026
Abstract
This work presents the synthesis, characterization, and application of zirconium oxide (ZrO₂)-based catalysts, modified with macropore (silica nanospheres, NSP-SiO₂) and mesopore templates (Pluronic 123) and impregnated with tungstophosphoric acid (TPA), in the catalytic pyrolysis of tomato agro-industrial residues. The NSP-SiO₂ (SXX) and P123 (PYY) amounts mainly influence the ZrO₂SXXPYY specific surface area (SBET) and average pore diameter (Dp). ³¹P MAS NMR and FT-IR characterization results show that TPA (H₃PW₁₂O₄₀) was partially transformed into [P₂W₂₁O₇₁]⁶⁻ and [PW₁₁O₃₉]⁷⁻ during the synthesis steps. The acidic properties of ZrO₂SXXPYY samples containing 25 and 50 wt% TPA (ZrO₂SXXPYYT25 and ZrO₂SXXPYYT50, respectively) depend on both the TPA content and the nature of the support. Bio-oil composition and product selectivity were strongly influenced by the textural and acidic properties of the catalysts. Notably, non-catalytic pyrolysis favored pathways leading to C2 compounds, with a high content of acetic acid and hydroxyacetone. In contrast, the use of catalysts promoted the formation of higher-molecular-weight oxygenated compounds (C5–C6), specifically furans, aldehydes, and ketones.
17 pages, 18019 KB  
Article
Knit-Edit: A Unified Multi-Task Editing Framework for Knitted Garments
by Zhiping Wu, Qiang Fu, Jing Li and Jiajun Liu
Electronics 2026, 15(6), 1208; https://doi.org/10.3390/electronics15061208 - 13 Mar 2026
Abstract
Generative Artificial Intelligence has shown immense potential in industrial design. However, applying Diffusion Transformers to precision manufacturing faces a critical bottleneck: the trade-off between flexible multi-task editing and high-fidelity texture preservation. Existing methods often suffer from “texture collapse” when merging multiple adapters, failing to maintain the intricate topological structures required for industrial standards. To address this, we present Knit-Edit, a unified framework for high-precision knitted garment editing. Our core contribution is EditLoRI, a novel task decoupling mechanism utilizing orthogonal Low-Rank Adaptation. By projecting task-specific gradients into orthogonal subspaces, EditLoRI enables the interference-free composition of multiple editing capabilities within a single lightweight model. Furthermore, we introduce a structure-preserving spatial guidance strategy using Bounding Boxes to resolve the localization ambiguity of text prompts. Validated on our constructed KnitEdit dataset, the proposed method significantly outperforms state-of-the-art baselines in controllability and structural fidelity, offering a robust solution for intelligent generative manufacturing.
(This article belongs to the Section Artificial Intelligence)
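EditLoRI is only outlined in the abstract; the sketch below illustrates the underlying idea of composing low-rank adapters after projecting each update off the subspaces used by earlier tasks. It is an illustration of orthogonal LoRA composition in general, not the paper's algorithm.

```python
import torch

def merge_orthogonal_loras(pairs, weight: torch.Tensor) -> torch.Tensor:
    """pairs: list of (B_i, A_i) LoRA factors, B_i: (out, r), A_i: (r, in).
    Each update is projected off the row-space of earlier updates before
    being added, so later tasks cannot overwrite earlier ones."""
    basis = None                     # orthonormal rows of already-used input directions
    merged = weight.clone()
    for b, a in pairs:
        if basis is not None:
            a = a - (a @ basis.t()) @ basis       # remove overlap with earlier subspaces
        merged = merged + b @ a                    # apply this task's low-rank update
        stack = a if basis is None else torch.cat([basis, a])
        q, _ = torch.linalg.qr(stack.t())          # re-orthonormalise the running basis
        basis = q.t()
    return merged
```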
14 pages, 17510 KB  
Article
Engineering Polymorphic Phase Boundary in Aerosol-Deposited Ba(ZrₓTi₁₋ₓ)O₃ Thick Films for Large Transverse Piezoelectricity
by Jinlin Yang, Long Teng, Zhenwei Shen, Wenjia Zhang, Shuping Li, Hanfei Zhu, Hongbo Cheng and Yongguang Xiao
Nanomaterials 2026, 16(6), 352; https://doi.org/10.3390/nano16060352 - 13 Mar 2026
Abstract
Conventional deposition techniques hinder the integration of high-performance lead-free piezoelectric thick films on silicon substrates due to slow growth kinetics and complex processing. Herein, dense, crack-free Ba(ZrₓTi₁₋ₓ)O₃ (BZT, x = 0–0.10) thick films (~2 μm) were fabricated via aerosol deposition (AD) followed by annealing, forming a nanocrystalline microstructure with an average grain size of ~78 nm. Compositional tuning showed optimal electromechanical performance at x = 0.03, attributed to the coexistence of tetragonal and orthorhombic phases near room temperature that reduce the phase transformation energy barrier. The optimized BZT films exhibit excellent electrical properties: saturation polarization of 31.3 μC/cm², relative permittivity of 430, dielectric tunability figure of merit (FOM) of 155, and a large transverse piezoelectric coefficient |e31,f| of 1.01 C/m², comparable to textured magnetron-sputtered BaTiO₃ films but with higher deposition efficiency. This work provides a high-throughput route for fabricating piezoelectric thick films, highlighting the potential of compositionally engineered AD-processed BZT in lead-free MEMS applications.
(This article belongs to the Special Issue Advances in Ferroelectric and Multiferroic Nanostructures)
30 pages, 10668 KB  
Article
MambaLIC: State-Space Models for Efficient Remote Sensing Image Compression
by Haobo Xiong, Kai Liu, Huachao Xiao, Chongyang Ding and Feiyang Wang
Remote Sens. 2026, 18(6), 881; https://doi.org/10.3390/rs18060881 - 12 Mar 2026
Abstract
Remote sensing (RS) images, characterized by their large size and rich texture, require algorithms capable of effectively integrating both global and local features for compression. However, existing Learned Image Compression (LIC) approaches face distinct bottlenecks. While Transformer-based architectures typically suffer from heavy computational loads, standard State Space Models (SSMs) often incur prohibitive memory costs when processing high-resolution inputs. To address these limitations, we propose MambaLIC, a novel RS image compression network that integrates the efficient long-range modeling of SSMs with the local modeling ability of CNNs. In this paper, we introduce an innovative Remote Sensing State Space Model (RS-SSM) module, which combines visual SSM with dynamic convolution for remote sensing image compression. This integration facilitates effective interaction between local and global information, thereby enhancing the performance of RS image compression. Furthermore, we propose an SSM attention-based (SSA-based) spatial-channel context model for better entropy modeling. Compared to Transformer-CNN mixed architectures, MambaLIC reduces computational complexity by 63.9% and achieves superior rate-distortion (RD) performance. Consequently, compared to the latest SS2D-based method MambaIC, MambaLIC achieves substantial efficiency gains, saving 78.8% in memory usage. Experimental results demonstrate that MambaLIC achieves state-of-the-art (SOTA) performance, outperforming VVC (VTM-17.0) by 14.22%, 18.48%, and 17.47% in BD-rate on UC-Merced, LoveDA, and xView datasets, respectively.
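Learned image compression of this kind is trained on the standard rate-distortion objective L = R + λD; a generic training-loss sketch follows (the λ weighting convention is borrowed from CompressAI-style codecs, not from MambaLIC itself).

```python
import torch
import torch.nn.functional as F

def rate_distortion_loss(x, x_hat, likelihoods, lam: float = 0.01):
    """x, x_hat: (B, C, H, W); likelihoods: per-symbol probabilities from the
    entropy model. Rate is estimated in bits per pixel, distortion as MSE."""
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    bpp = -torch.log2(likelihoods).sum() / num_pixels   # estimated rate R
    mse = F.mse_loss(x_hat, x)                          # distortion D
    return bpp + lam * 255.0 ** 2 * mse                 # L = R + lambda * D
```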
30 pages, 5823 KB  
Article
Complex Weather Highway Aerial Vehicle Detection Network with Feature Enhancement and Grid-Based Feature Fusion
by Ningzhi Zeng and Jinzheng Lu
Appl. Sci. 2026, 16(6), 2710; https://doi.org/10.3390/app16062710 - 12 Mar 2026
Abstract
In highway aerial imagery, complex weather conditions such as rain, fog, snow, and low illumination often lead to severe appearance degradation and feature loss of vehicle targets, posing significant challenges for vehicle detection. Existing research faces two major challenges: first, the lack of large-scale, high-quality annotated datasets tailored for complex weather scenarios; second, the difficulty traditional detectors encounter in effectively extracting feature information and performing multi-scale feature fusion under conditions of severe feature degradation and dense distribution of small objects. To address these issues, this paper investigates both data construction and algorithm design. Firstly, a Complex Weather Highway Vehicle Dataset (CWHVD) is established to provide a benchmark for related research. Secondly, a Feature-Enhanced Grid-Based Feature Fusion Complex-Weather Vehicle Detection Network (FGCV-Det) is proposed. A wavelet transform-based Feature Enhancement Module (FEWT) is introduced at the input stage to strengthen edge and texture representation. In the backbone, Adaptive Pinwheel Convolution (APConv) and a C3K2-HD module based on Hidden State Mixer-Based State Space Duality (HSM-SSD) are employed to enhance semantic modeling. Furthermore, a Complex Weather Grid Feature Pyramid Network (CWG-FPN) is designed to achieve weighted cross-scale fusion. FGCV-Det significantly outperforms YOLO11s on CWHVD, achieving 63.4% precision, 48.6% recall, 51.7% mAP50, and 28.2% mAP50:95. It also generalizes well, reaching 47.1% and 49.6% mAP50 on VisDrone2019 and UAVDT, respectively, surpassing baseline and mainstream detectors and demonstrating strong robustness and generalization capability.
(This article belongs to the Section Computing and Artificial Intelligence)
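FEWT's details are not in the abstract; the basic wavelet-domain move, amplifying high-frequency sub-bands before inverting the transform, can be sketched with PyWavelets as below. The wavelet choice and gain are illustrative, not the paper's settings.

```python
import numpy as np
import pywt

def wavelet_edge_enhance(gray: np.ndarray, gain: float = 1.5) -> np.ndarray:
    """Single-level Haar DWT: boost horizontal/vertical/diagonal detail
    coefficients, keep the approximation, and invert the transform."""
    cA, (cH, cV, cD) = pywt.dwt2(gray, "haar")
    enhanced = pywt.idwt2((cA, (gain * cH, gain * cV, gain * cD)), "haar")
    return np.clip(enhanced, 0.0, 1.0)   # assume input normalised to [0, 1]
```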
16 pages, 22406 KB  
Article
Isotropic Reconstruction of Anisotropic vEM Volumes with ViT-Guided Diffusion
by Junchao Qiu, Guojia Wan, Zhengyun Zhou, Minghui Liao, Xiangdong Liu, Xinyuan Li and Bo Du
Electronics 2026, 15(6), 1181; https://doi.org/10.3390/electronics15061181 - 12 Mar 2026
Abstract
Volume electron microscopy (vEM) provides nanometer-scale 3D imaging, yet its axial (z) resolution is often much lower than the in-plane (xy) resolution, yielding anisotropic volumes that hinder segmentation and connectomic reconstruction. We present a two-stage cross-axial super-resolution framework for isotropic reconstruction that combines a conditional diffusion model and domain-specific self-supervised pretraining of a vision transformer (ViT). First, the student–teacher self-distillation paradigm of DINOv3 is adopted to learn representations from large sets of high-resolution xy sections, capturing vEM-specific texture statistics and ultrastructural patterns. Second, a conditional diffusion denoiser is trained with supervised anisotropic degradation simulated by z-downsampling, while a perceptual loss based on frozen ViT feature distances constrains generated slices to match real-section distributions. These constraints recover axial high-frequency details and reduce hallucinated textures and inter-slice drift, improving cross-slice consistency. Experiments on two public vEM datasets show improved fidelity, perceptual quality, and membrane-boundary continuity over interpolation and learning-based baselines.
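The supervised degradation the authors simulate, reducing axial resolution by downsampling along z, is straightforward to reproduce; a sketch follows, with the downsampling factor chosen for illustration.

```python
import numpy as np

def simulate_anisotropy(volume: np.ndarray, factor: int = 4) -> np.ndarray:
    """volume: (Z, Y, X) isotropic stack. Average consecutive z-slices in
    blocks of `factor` to mimic a low axial-resolution acquisition."""
    z = (volume.shape[0] // factor) * factor
    blocks = volume[:z].reshape(-1, factor, *volume.shape[1:])
    return blocks.mean(axis=1)   # (Z // factor, Y, X): the training input;
                                 # the original volume serves as the target
```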
21 pages, 23671 KB  
Article
Zero-Shot Polarization-Intensity Physical Fusion Monocular Depth Estimation for High Dynamic Range Scenes
by Renhao Rao, Zhizhao Ouyang, Shuang Chen, Liang Chen, Guoqin Huang and Changcai Cui
Photonics 2026, 13(3), 268; https://doi.org/10.3390/photonics13030268 - 11 Mar 2026
Abstract
Monocular 3D reconstruction remains a persistent challenge for autonomous driving systems in Degraded Visual Environments (DVEs) with extreme glare and low illumination, such as highway tunnels, due to the lack of reliable texture cues. This paper proposes a physics-aware deep learning framework that overcomes these limitations by fusing polarization sensing with conventional intensity imaging. Unlike traditional end-to-end data-driven fusion strategies, we propose a Modality-Aligned Parameter Injection strategy. By remapping the weight space of the input layer, this strategy achieves a smooth transfer of the pre-trained Vision Transformer (i.e., MiDaS) to multi-modal inputs. Its core advantage lies in the seamless integration of four-channel polarization geometric information while fully preserving the pre-trained semantic representation capabilities of the backbone network, thereby avoiding the overfitting risk associated with training from scratch on small-sample data. Furthermore, we design a Reliability-Aware Gating mechanism that dynamically re-weights appearance and geometric cues based on intensity saturation and the physical validity of polarization signals as measured by the Degree of Linear Polarization (DoLP). We validate the proposed method on our self-constructed POLAR-GLV benchmark, a real-world dataset collected specifically for high dynamic range tunnel scenarios. Extensive experiments demonstrate that our method consistently outperforms intensity-only baselines, reducing geometric reconstruction error by 24.2% in high-glare tunnel exit zones and 10.0% at tunnel entrances. Crucially, compared to multi-stream fusion architectures, these performance gains come with negligible additional computational cost, making the framework highly suitable for resource-constrained onboard inference environments.
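The Degree of Linear Polarization used by the gating mechanism has a standard closed form from intensities measured at four polarizer angles; a sketch:

```python
import numpy as np

def degree_of_linear_polarization(i0, i45, i90, i135):
    """Stokes parameters from intensities at 0/45/90/135 degrees, then
    DoLP = sqrt(S1^2 + S2^2) / S0, clipped to the physical range [0, 1]."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + 1e-8)
    return np.clip(dolp, 0.0, 1.0)
```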