Search Results (1,619)

Search Parameters:
Keywords = texture transformer

28 pages, 43590 KB  
Article
TreeSpecViT: Fine-Grained Tree Species Classification from UAV RGB Imagery for Campus-Scale Human–Vegetation Coupling Analysis
by Yinghui Yuan, Yunfeng Yang, Zhulin Chen and Sheng Xu
Remote Sens. 2026, 18(6), 928; https://doi.org/10.3390/rs18060928 - 18 Mar 2026
Abstract
On university campuses, trees and green spaces shape how students and staff move and use outdoor spaces. To support planning, tree species information is needed at the level of individual trees. Tree species classification from UAV RGB imagery remains difficult in complex campus scenes because roads, buildings, shadows and subtle inter-species differences degrade recognition. To address background interference, the loss of subtle fine-grained cues before tokenization, and insufficient local structure modeling in lightweight transformer-based classification, we propose TreeSpecViT for tree species classification. It uses a MobileViT backbone and a Background Suppression Module (BSM) to reduce clutter from non-canopy regions. A Fine-Grained Feature Guidance (FGF) module is inserted before the unfold operation to enhance canopy details and guide tokenization toward key regions. 1×1 convolutional neck layers align channels, and a Global and Local Fusion (GLF) module jointly models overall crown semantics and local textures for species recognition. From the predicted masks and species labels, we build an individual-tree digital archive. The archive stores per-tree geometric attributes and can be linked with grids of campus activity intensity to analyze how activity patterns relate to vegetation structure. TreeSpecViT achieves an Accuracy of 87.88% (+6.06%) and an F1 score of 76.48% (+5.08%) on the SZUTreeDataset. On our self-constructed NJFUDataset, it reaches 76.30% (+5.10%) in Accuracy and 70.10% (+7.20%) in F1. These results surpass mainstream models. Ablation experiments show that the modules jointly reduce background clutter and enhance canopy features. Overall, TreeSpecViT supports campus-scale analyses that link human activity intensity to vegetation patterns and provides a practical basis for planning and adjusting campus green spaces.
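The abstract describes the BSM only at a high level; the sketch below shows the general idea, a learned sigmoid gate that down-weights non-canopy pixels. The class name, channel sizes, and gating form are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BackgroundSuppressionGate(nn.Module):
    """Hypothetical sketch of a background-suppression gate: predict a
    per-pixel foreground probability and down-weight background regions."""
    def __init__(self, channels: int):
        super().__init__()
        self.mask_head = nn.Sequential(
            nn.Conv2d(channels, channels // 4, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // 4, 1, kernel_size=1),
            nn.Sigmoid(),  # mask values in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * self.mask_head(x)  # suppress features where the mask is low

feats = torch.randn(2, 96, 32, 32)                 # e.g. one backbone stage output
print(BackgroundSuppressionGate(96)(feats).shape)  # torch.Size([2, 96, 32, 32])
```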
19 pages, 7323 KB  
Article
Mathematical Benchmarking of Convolutional Neural Networks for Thai Dialect Recognition: A Spectrogram Texture Classification Approach
by Porawat Visutsak, Duongduen Ongrungruaeng, Surapong Wiriya and Keun Ho Ryu
Electronics 2026, 15(6), 1271; https://doi.org/10.3390/electronics15061271 - 18 Mar 2026
Abstract
This study rigorously evaluates 13 Convolutional Neural Network (CNN) architectures for Thai dialect recognition. By treating Automatic Speech Recognition (ASR) as a computer vision texture classification task, we processed an extensive 840-h dataset from the Spoken Language Systems, Chulalongkorn University (SLSCU) corpus. Raw audio from four major dialects—Central, Northern (Khummuang), Northeastern (Korat), and Southern (Pattani)—was transformed into 2D Mel-spectrograms using the Short-Time Fourier Transform (STFT). We analyzed a diverse range of architectures, including the VGG, Inception, ResNet, DenseNet, and MobileNet families, to establish the optimal trade-off between mathematical complexity and spectral feature extraction. Our experimental results identify NASNet-Mobile as the most effective model, achieving a macro-average F1-score of 0.9425. The analysis suggests that NASNet’s search-optimized cell structure is uniquely capable of capturing the multiscale texture of phonetic formants. In contrast, we observed a catastrophic mode collapse in VGG16 (32.97% accuracy), likely due to excessive parameter bloat, while Xception and MobileNetV2 maintained robust generalization. Confusion matrix analysis reveals high acoustic distinctiveness for Southern Thai (96.7% recall), whereas Northern Thai exhibits significant spectral overlap with Central Thai. These results support the hypothesis that CNNs interpret spectrograms as textures rather than discrete objects, positioning NASNet-Mobile as a high-performance, low-latency baseline for edge-device deployment in resource-constrained environments.
(This article belongs to the Special Issue Advances in Machine Learning for Image Classification)
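The STFT-to-Mel-spectrogram preprocessing described above is a standard pipeline; a sketch using librosa follows. The sampling rate, FFT size, and hop length are illustrative choices, not parameters taken from the paper.

```python
import librosa
import numpy as np

def audio_to_mel_image(path: str, sr: int = 16000, n_mels: int = 128) -> np.ndarray:
    """Load audio and convert it to a log-Mel spectrogram 'image' for a CNN."""
    y, sr = librosa.load(path, sr=sr)
    # STFT magnitude -> Mel filter bank -> power spectrogram
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                         hop_length=256, n_mels=n_mels)
    mel_db = librosa.power_to_db(mel, ref=np.max)   # compress dynamic range
    # min-max normalise to [0, 1] so it behaves like a grayscale image
    return (mel_db - mel_db.min()) / (mel_db.max() - mel_db.min() + 1e-8)
```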
14 pages, 3141 KB  
Article
Enhanced Real-Time Detector for Industrial Vision-Based Corn Impurity Detection
by Xiao Zhang, Yuhang Bian, Xiangdong Li, Haoze Yu, Dong Li and Min Wu
Foods 2026, 15(6), 1065; https://doi.org/10.3390/foods15061065 - 18 Mar 2026
Abstract
The effective cleaning of corn prior to storage is crucial for ensuring grain quality and safety. Traditional Convolutional Neural Network (CNN)-based detection methods often struggle to maintain accuracy in scenarios with dense occlusions. Furthermore, limitations in image quality and feature representation hinder their generalization to diverse impurity types. To address these challenges, this paper proposes an enhanced real-time detector transformer model named RT-DETR-CD (Real-Time Detector Transformer with Convolution and Dynamic Upsampling) for corn impurity detection based on industrial vision. This approach integrates Receptive Field Attention Convolutions (RFAConv) to enhance sensitivity to local texture details and employs the dynamic upsampling operator DySample to restore high-frequency edge information. Additionally, a novel Inner-Shape-IoU loss function is introduced to accelerate bounding box regression for objects with varying aspect ratios. Images were captured using FLIR industrial cameras under controllable annular LED illumination. Experiments on a self-built dataset demonstrate that the proposed model achieves a 4.7% improvement in mean average precision (mAP) and operates at 68 frames per second (FPS), outperforming the original RT-DETR model in both accuracy and speed. This work provides a practical solution for real-time, high-precision impurity detection on grain processing lines.
(This article belongs to the Section Food Analytical Methods)
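The abstract does not spell out the Inner-Shape-IoU formulation; for orientation, the sketch below shows a plain IoU loss for axis-aligned boxes, i.e., the generic baseline that such variants extend, not the proposed loss itself.

```python
import torch

def iou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7) -> torch.Tensor:
    """1 - IoU for boxes given as (x1, y1, x2, y2); both tensors shaped (N, 4)."""
    # intersection rectangle corners
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)                  # zero if boxes do not overlap
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1.0 - iou).mean()
```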
23 pages, 13050 KB  
Article
BAWSeg: A UAV Multispectral Benchmark for Barley Weed Segmentation
by Haitian Wang, Xinyu Wang, Muhammad Ibrahim, Dustin Severtson and Ajmal Mian
Remote Sens. 2026, 18(6), 915; https://doi.org/10.3390/rs18060915 - 17 Mar 2026
Abstract
Accurate weed mapping in cereal fields requires pixel-level segmentation from unmanned aerial vehicle (UAV) imagery that remains reliable across fields, seasons, and illumination. Existing multispectral pipelines often depend on thresholded vegetation indices, which are brittle under radiometric drift and mixed crop–weed pixels, or on single-stream convolutional neural network (CNN) and Transformer backbones that ingest stacked bands and indices, where radiance cues and normalized index cues interfere and reduce sensitivity to small weed clusters embedded in crop canopy. We propose VISA (Vegetation Index and Spectral Attention), a two-stream segmentation network that decouples these cues and fuses them at native resolution. The radiance stream learns from calibrated five-band reflectance using local residual convolutions, channel recalibration, spatial gating, and skip-connected decoding, which preserve fine textures, row boundaries, and small weed structures that are often weakened after ratio-based index compression. The index stream operates on vegetation-index maps with windowed self-attention to model local structure efficiently, state-space layers to propagate field-scale context without quadratic attention cost, and Slot Attention to form stable region descriptors that improve discrimination of sparse weeds under canopy mixing. To support supervised training and deployment-oriented evaluation, we introduce BAWSeg, a four-year UAV multispectral dataset collected over commercial barley paddocks in Western Australia, providing radiometrically calibrated blue, green, red, red edge, and near-infrared orthomosaics, derived vegetation indices, and dense crop, weed, and other labels with leakage-free block splits. On BAWSeg, VISA achieves 75.6% mean Intersection over Union (mIoU) and 63.5% weed Intersection over Union (IoU) with 22.8 M parameters, outperforming a multispectral SegFormer-B1 baseline by 1.2 mIoU and 1.9 weed IoU. Under cross-plot and cross-year protocols, VISA maintains 71.2% and 69.2% mIoU, respectively. The full BAWSeg benchmark dataset, VISA code, trained model weights, and protocol files will be released upon publication.
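The index stream starts from standard band ratios; a sketch of computing NDVI and a red-edge variant from the five calibrated bands follows, assuming band order blue, green, red, red edge, NIR as in the dataset description.

```python
import numpy as np

def vegetation_indices(img: np.ndarray) -> np.ndarray:
    """img: (H, W, 5) calibrated reflectance, bands = B, G, R, RE, NIR.
    Returns a (H, W, 2) stack of NDVI and NDRE."""
    red, red_edge, nir = img[..., 2], img[..., 3], img[..., 4]
    ndvi = (nir - red) / (nir + red + 1e-8)            # canopy greenness
    ndre = (nir - red_edge) / (nir + red_edge + 1e-8)  # red-edge variant
    return np.stack([ndvi, ndre], axis=-1)
```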
28 pages, 12746 KB  
Article
PSTNet: A Hyperspectral Image Classification Method Based on Adaptive Spectral–Spatial Tokens and Parallel Attention
by Shaokang Yu, Yong Mei, Xiangsuo Fan, Song Guo, Wujun Xu and Jinlong Fan
Remote Sens. 2026, 18(6), 901; https://doi.org/10.3390/rs18060901 - 15 Mar 2026
Abstract
Hyperspectral image classification holds significant applications across multiple domains due to its rich spectral and spatial information. However, it faces challenges such as spectral variation within the same object, spectral variation across different objects, and noise interference. Existing methods like convolutional neural networks perform well in local feature extraction but inadequately model long-range dependencies. While Transformers can capture global relationships, they struggle to effectively coordinate spectral and spatial information modeling. To address these limitations, this paper proposes a dual-branch collaborative Transformer network (PST-Net). This architecture integrates an adaptive spectral–spatial token (ASST) module, a Parallel Attention-Augmented lightweight CNN branch (PA-SSCNN), and a collaborative fusion layer. The ASST constructs joint representation tokens through local spectral smoothing and learnable spatial embedding. PA-SSCNN employs 3D-2D cascaded convolutions and channel–spatial attention mechanisms to enhance local texture and spatial feature extraction, and the collaborative fusion layer (CHIB) enables deep interaction and synergistic fusion of dual-branch features across different levels and scales. Experimental results demonstrate that with only 2% labeled samples, PST-Net achieves overall classification accuracies of 96.31%, 96.59%, 95.27%, and 89.06% on the Salinas and Whuhh datasets and the two complex urban-scene datasets Qingyun and Houston, respectively. It exhibits strong robustness, especially on fine-grained categories and in complex scenes. Ablation experiments further validate the effectiveness and complementarity of each module. This study provides an efficient collaborative modeling framework for hyperspectral image classification that balances global dependencies and local details.
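A rough sketch of the kind of token construction the ASST describes, local spectral smoothing followed by a learnable spatial embedding, is given below; the smoothing kernel size, patch size, and token dimension are assumptions, not the paper's values.

```python
import torch
import torch.nn as nn

class SpectralSpatialTokens(nn.Module):
    """Hypothetical sketch: smooth each pixel's spectrum, project it to a
    token, and add a learnable spatial embedding per patch position."""
    def __init__(self, bands: int, patch: int = 9, dim: int = 64):
        super().__init__()
        self.smooth = nn.Conv1d(1, 1, kernel_size=5, padding=2, bias=False)  # spectral smoothing
        self.proj = nn.Linear(bands, dim)
        self.pos = nn.Parameter(torch.zeros(1, patch * patch, dim))  # learnable spatial embedding

    def forward(self, cube: torch.Tensor) -> torch.Tensor:
        # cube: (B, bands, patch, patch) hyperspectral patch
        b, c, h, w = cube.shape
        spectra = cube.permute(0, 2, 3, 1).reshape(b * h * w, 1, c)
        spectra = self.smooth(spectra).reshape(b, h * w, c)
        return self.proj(spectra) + self.pos   # (B, patch*patch, dim) tokens
```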
28 pages, 5420 KB  
Article
HEMS-RTDETR: A Lightweight Edge-Enhanced and Deformation-Aware Detector for Floating Debris in Complex Water Environments
by Yiwei Cui, Xinyi Jiang, Haiting Yu, Meizhen Lei and Jia Ren
Electronics 2026, 15(6), 1226; https://doi.org/10.3390/electronics15061226 - 15 Mar 2026
Abstract
Floating debris detection in complex aquatic environments holds significant importance for water resource protection and maritime safety monitoring. However, this task faces three core challenges: severe background interference leading to blurred target textures, significant non-rigid deformations, and the frequent loss of small targets at long distances. To address these issues, we propose a high-performance lightweight detection algorithm, termed High-Efficiency Edge-Aware Multi-Scale Real-Time Detection Transformer (HEMS-RTDETR), built upon the Real-Time Detection Transformer (RT-DETR) architecture. First, to suppress disturbances induced by water surface ripples and specular reflections, a Cross-Stage Partial Multi-Scale Edge Information Enhancement (CSP-MSEIE) module is introduced to reconstruct the backbone network. By removing computational redundancy while incorporating explicit edge enhancement, feature extraction capability and noise robustness for weak-texture targets are significantly improved. Second, to handle irregular debris morphology, a Deformable Attention Transformer (DAT) module is integrated, enabling adaptive attention focusing on geometrically deformed regions. Finally, an Efficient Multi-Scale Bidirectional Feature Pyramid Network (EMBSFPN) is constructed to enhance cross-scale semantic interaction and alleviate small-target signal loss. Experimental results demonstrate that, compared with RTDETR-r18, HEMS-RTDETR reduces parameters to 12.57 M, improves mAP@0.5 and mAP@0.5:0.95 by 2.44% and 3.05%, respectively, and maintains real-time inference at 93 FPS, indicating strong robustness and application potential in dynamic aquatic environments.
(This article belongs to the Topic Computer Vision and Image Processing, 3rd Edition)
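CSP-MSEIE's internals are not given in the abstract; as a point of reference, the sketch below shows one common form of explicit edge enhancement, adding fixed Sobel gradient magnitudes back onto the feature map. It is a generic illustration, not the proposed module.

```python
import torch
import torch.nn.functional as F

def sobel_edge_boost(x: torch.Tensor, alpha: float = 0.5) -> torch.Tensor:
    """Add Sobel gradient magnitude back onto each channel of x (B, C, H, W)."""
    kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]], device=x.device)
    ky = kx.t()
    c = x.shape[1]
    kx = kx.view(1, 1, 3, 3).repeat(c, 1, 1, 1)   # depthwise kernels
    ky = ky.view(1, 1, 3, 3).repeat(c, 1, 1, 1)
    gx = F.conv2d(x, kx, padding=1, groups=c)
    gy = F.conv2d(x, ky, padding=1, groups=c)
    edges = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)  # per-channel edge strength
    return x + alpha * edges
```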
30 pages, 3316 KB  
Article
A Novel Hybrid CNN-ViT-Based Bi-Directional Cross-Guidance Fusion-Driven Breast Cancer Detection Model
by Abdul Rahaman Wahab Sait and Yazeed Alkhurayyif
Life 2026, 16(3), 474; https://doi.org/10.3390/life16030474 - 14 Mar 2026
Abstract
Accurate and early identification of breast cancer from mammography is key to reducing breast cancer mortality, yet automated analysis is challenging due to subtle lesion appearances, heterogeneous breast density, and modality-induced variance. Standard Convolutional Neural Networks (CNNs) are excellent at capturing localized textures, whereas Vision Transformers (ViTs) capture long-range dependencies; however, both often struggle to produce a unified representation that consistently supports diagnostic decision-making. To address these limitations, this study presents a dual-stream framework integrating ConvNeXt for high-fidelity local feature extraction with Swin Transformer V2 for hierarchical global context modeling. A Bi-Directional Cross-Guidance (BDCG) mechanism is added to harmonize interactions between the two feature domains and ensure mutual information learning in the representations. Furthermore, a Prototype-Anchored Similarity Head (PASH) is used to stabilize classification through distance-based reasoning instead of linear separation. Comprehensive experiments on two benchmark datasets show the effectiveness of the proposed method. On Dataset 1, the model achieves 98.8% accuracy, 98.7% precision, 98.6% recall, and a 97.2% F1-score, outperforming existing CNN, ViT, and hybrid architectures while providing a lower inference time (8.3 ms/image). On the more heterogeneous Dataset 2, the model maintains strong performance, with an accuracy of 97.0%, precision of 95.4%, recall of 94.8%, and F1-score of 95.1%, demonstrating its resilience to domain shift and imaging variability. These results underscore the value of structural multi-scale feature interaction and prototype-driven classification for robust mammographic analysis. The consistent performance across internal and external evaluations indicates the potential for the proposed framework to be reliably applied in computer-aided screening systems.
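PASH is described only as distance-based reasoning; the sketch below shows the general pattern of classifying by negative squared distance to one learnable prototype per class. It is a generic prototype head, not the authors' module.

```python
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    """Classify by (negative squared) distance to a learnable prototype per class."""
    def __init__(self, feat_dim: int, num_classes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, feat_dim))

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, feat_dim) -> logits: (B, num_classes)
        d2 = torch.cdist(feats, self.prototypes) ** 2
        return -d2   # closer prototype => larger logit; train with cross-entropy
```

Compared with a plain linear layer, the decision here is anchored to explicit class centers, which is the stabilizing property the abstract alludes to.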
16 pages, 7270 KB  
Article
Multi-Domain Fusion for UAV Image Super-Resolution Based on Tiny-Transformer
by Qiaoyue Man, Seok-Jeong Gee and Young-Im Cho
Drones 2026, 10(3), 204; https://doi.org/10.3390/drones10030204 - 14 Mar 2026
Abstract
Unmanned Aerial Vehicle imagery often suffers from severe spatial detail degradation due to sensor limitations and motion blur, hindering downstream vision tasks. To address this, we propose a lightweight super-resolution framework leveraging a Tiny-Transformer backbone enhanced by a multi-domain feature fusion strategy. Specifically, we jointly model spatial structural semantics and frequency domain texture priors via a cross-domain fusion attention mechanism, enabling coordinated restoration of global consistency and local details. Extensive experiments demonstrate that our method outperforms state-of-the-art approaches on standard benchmarks, achieving significant gains in Peak Signal-to-Noise Ratio and structural similarity index while maintaining low computational cost. Notably, the model exhibits superior robustness in reconstructing high-frequency textures common in aerial scenes. This work provides an efficient, deployable solution for enhancing visual fidelity in resource-constrained applications such as urban planning and precision agriculture.
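A minimal illustration of extracting a frequency-domain texture prior with a 2D FFT, of the sort a cross-domain fusion block could consume, follows; the log-amplitude/phase split is an assumption, not the paper's design.

```python
import torch

def frequency_features(x: torch.Tensor) -> torch.Tensor:
    """x: (B, C, H, W) features -> concatenated log-amplitude and phase maps."""
    spec = torch.fft.rfft2(x, norm="ortho")     # complex, (B, C, H, W//2 + 1)
    amplitude = torch.log1p(spec.abs())         # compress dynamic range
    phase = torch.angle(spec)
    return torch.cat([amplitude, phase], dim=1) # (B, 2C, H, W//2 + 1)
```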
23 pages, 2994 KB  
Article
Texturally Modified Zirconia–Tungstophosphoric Acid Catalysts for Efficient Lignocellulosic Pyrolysis
by Jose L. Buitrago, Leticia Jésica Méndez, Mónica Laura Casella, Juan Antonio Cecilia, Enrique Rodríguez-Castellón, Ileana D. Lick and Luis R. Pizzio
Reactions 2026, 7(1), 21; https://doi.org/10.3390/reactions7010021 - 14 Mar 2026
Abstract
This work presents the synthesis, characterization, and application of zirconium oxide (ZrO₂)-based catalysts, modified with macropore (silica nanospheres, NSP-SiO₂) and mesopore templates (Pluronic 123) and impregnated with tungstophosphoric acid (TPA), in the catalytic pyrolysis of tomato agro-industrial residues. The NSP-SiO₂ (SXX) and P123 (PYY) amounts mainly influence the ZrO₂SXXPYY specific surface area (SBET) and average pore diameter (Dp). ³¹P MAS NMR and FT-IR characterization results show that TPA (H₃PW₁₂O₄₀) was partially transformed into [P₂W₂₁O₇₁]⁶⁻ and [PW₁₁O₃₉]⁷⁻ during the synthesis steps. The acidic properties of ZrO₂SXXPYY samples containing 25 and 50 wt% TPA (ZrO₂SXXPYYT25 and ZrO₂SXXPYYT50, respectively) depend on both the TPA content and the nature of the support. Bio-oil composition and product selectivity were strongly influenced by the textural and acidic properties of the catalysts. Notably, non-catalytic pyrolysis favored pathways leading to C2 compounds, with a high content of acetic acid and hydroxyacetone. In contrast, the use of catalysts promoted the formation of higher-molecular-weight oxygenated compounds (C5–C6), specifically furans, aldehydes, and ketones.
17 pages, 18019 KB  
Article
Knit-Edit: A Unified Multi-Task Editing Framework for Knitted Garments
by Zhiping Wu, Qiang Fu, Jing Li and Jiajun Liu
Electronics 2026, 15(6), 1208; https://doi.org/10.3390/electronics15061208 - 13 Mar 2026
Abstract
Generative Artificial Intelligence has shown immense potential in industrial design. However, applying Diffusion Transformers to precision manufacturing faces a critical bottleneck: the trade-off between flexible multi-task editing and high-fidelity texture preservation. Existing methods often suffer from “texture collapse” when merging multiple adapters, failing to maintain the intricate topological structures required for industrial standards. To address this, we present Knit-Edit, a unified framework for high-precision knitted garment editing. Our core contribution is EditLoRI, a novel task decoupling mechanism utilizing orthogonal Low-Rank Adaptation. By projecting task-specific gradients into orthogonal subspaces, EditLoRI enables the interference-free composition of multiple editing capabilities within a single lightweight model. Furthermore, we introduce a structure-preserving spatial guidance strategy using Bounding Boxes to resolve the localization ambiguity of text prompts. Validated on our constructed KnitEdit dataset, the proposed method significantly outperforms state-of-the-art baselines in controllability and structural fidelity, offering a robust solution for intelligent generative manufacturing.
(This article belongs to the Section Artificial Intelligence)
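EditLoRI is only outlined in the abstract; the sketch below illustrates the underlying idea of composing low-rank adapters after projecting each update off the subspaces used by earlier tasks. It is an illustration of orthogonal LoRA composition in general, not the paper's algorithm.

```python
import torch

def merge_orthogonal_loras(pairs, weight: torch.Tensor) -> torch.Tensor:
    """pairs: list of (B_i, A_i) LoRA factors, B_i: (out, r), A_i: (r, in).
    Each update is projected off the row-space of earlier updates before
    being added, so later tasks cannot overwrite earlier ones."""
    basis = None                     # orthonormal rows of already-used input directions
    merged = weight.clone()
    for b, a in pairs:
        if basis is not None:
            a = a - (a @ basis.t()) @ basis       # remove overlap with earlier subspaces
        merged = merged + b @ a                    # apply this task's low-rank update
        stack = a if basis is None else torch.cat([basis, a])
        q, _ = torch.linalg.qr(stack.t())          # re-orthonormalise the running basis
        basis = q.t()
    return merged
```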
14 pages, 17510 KB  
Article
Engineering Polymorphic Phase Boundary in Aerosol-Deposited Ba(ZrₓTi₁₋ₓ)O₃ Thick Films for Large Transverse Piezoelectricity
by Jinlin Yang, Long Teng, Zhenwei Shen, Wenjia Zhang, Shuping Li, Hanfei Zhu, Hongbo Cheng and Yongguang Xiao
Nanomaterials 2026, 16(6), 352; https://doi.org/10.3390/nano16060352 - 13 Mar 2026
Abstract
Conventional deposition techniques hinder the integration of high-performance lead-free piezoelectric thick films on silicon substrates due to slow growth kinetics and complex processing. Herein, dense, crack-free Ba(ZrₓTi₁₋ₓ)O₃ (BZT, x = 0–0.10) thick films (~2 μm) were fabricated via aerosol deposition (AD) followed by annealing, forming a nanocrystalline microstructure with an average grain size of ~78 nm. Compositional tuning showed optimal electromechanical performance at x = 0.03, attributed to the coexistence of tetragonal and orthorhombic phases near room temperature that reduce the phase transformation energy barrier. The optimized BZT films exhibit excellent electrical properties: saturation polarization of 31.3 μC/cm², relative permittivity of 430, dielectric tunability figure of merit (FOM) of 155, and a large transverse piezoelectric coefficient |e31,f| of 1.01 C/m², comparable to textured magnetron-sputtered BaTiO₃ films but with higher deposition efficiency. This work provides a high-throughput route for fabricating piezoelectric thick films, highlighting the potential of compositionally engineered AD-processed BZT in lead-free MEMS applications.
(This article belongs to the Special Issue Advances in Ferroelectric and Multiferroic Nanostructures)
30 pages, 10668 KB  
Article
MambaLIC: State-Space Models for Efficient Remote Sensing Image Compression
by Haobo Xiong, Kai Liu, Huachao Xiao, Chongyang Ding and Feiyang Wang
Remote Sens. 2026, 18(6), 881; https://doi.org/10.3390/rs18060881 - 12 Mar 2026
Abstract
Remote sensing (RS) images, characterized by their large size and rich texture, require algorithms capable of effectively integrating both global and local features for compression. However, existing Learned Image Compression (LIC) approaches face distinct bottlenecks. While Transformer-based architectures typically suffer from heavy computational loads, standard State Space Models (SSMs) often incur prohibitive memory costs when processing high-resolution inputs. To address these limitations, we propose MambaLIC, a novel RS image compression network that integrates the efficient long-range modeling of SSMs with the local modeling ability of CNNs. In this paper, we introduce an innovative Remote Sensing State Space Model (RS-SSM) module, which combines visual SSM with dynamic convolution for remote sensing image compression. This integration facilitates effective interaction between local and global information, thereby enhancing the performance of RS image compression. Furthermore, we propose an SSM attention-based (SSA-based) spatial-channel context model for better entropy modeling. Compared to Transformer-CNN mixed architectures, MambaLIC reduces computational complexity by 63.9% and achieves superior rate-distortion (RD) performance. Consequently, compared to the latest SS2D-based method MambaIC, MambaLIC achieves substantial efficiency gains, saving 78.8% in memory usage. Experimental results demonstrate that MambaLIC achieves state-of-the-art (SOTA) performance, outperforming VVC (VTM-17.0) by 14.22%, 18.48%, and 17.47% in BD-rate on UC-Merced, LoveDA, and xView datasets, respectively.
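Learned image compression of this kind is trained on the standard rate-distortion objective L = R + λD; a generic training-loss sketch follows (the λ weighting convention is borrowed from CompressAI-style codecs, not from MambaLIC itself).

```python
import torch
import torch.nn.functional as F

def rate_distortion_loss(x, x_hat, likelihoods, lam: float = 0.01):
    """x, x_hat: (B, C, H, W); likelihoods: per-symbol probabilities from the
    entropy model. Rate is estimated in bits per pixel, distortion as MSE."""
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    bpp = -torch.log2(likelihoods).sum() / num_pixels   # estimated rate R
    mse = F.mse_loss(x_hat, x)                          # distortion D
    return bpp + lam * 255.0 ** 2 * mse                 # L = R + lambda * D
```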
30 pages, 5823 KB  
Article
Complex Weather Highway Aerial Vehicle Detection Network with Feature Enhancement and Grid-Based Feature Fusion
by Ningzhi Zeng and Jinzheng Lu
Appl. Sci. 2026, 16(6), 2710; https://doi.org/10.3390/app16062710 - 12 Mar 2026
Abstract
In highway aerial imagery, complex weather conditions such as rain, fog, snow, and low illumination often lead to severe appearance degradation and feature loss of vehicle targets, posing significant challenges for vehicle detection. Existing research faces two major challenges: first, the lack of large-scale, high-quality annotated datasets tailored for complex weather scenarios; second, the difficulty traditional detectors encounter in effectively extracting feature information and performing multi-scale feature fusion under conditions of severe feature degradation and dense distribution of small objects. To address these issues, this paper investigates both data construction and algorithm design. Firstly, a Complex Weather Highway Vehicle Dataset (CWHVD) is established to provide a benchmark for related research. Secondly, a Feature-Enhanced Grid-Based Feature Fusion Complex-Weather Vehicle Detection Network (FGCV-Det) is proposed. A wavelet transform-based Feature Enhancement Module (FEWT) is introduced at the input stage to strengthen edge and texture representation. In the backbone, Adaptive Pinwheel Convolution (APConv) and a C3K2-HD module based on Hidden State Mixer-Based State Space Duality (HSM-SSD) are employed to enhance semantic modeling. Furthermore, a Complex Weather Grid Feature Pyramid Network (CWG-FPN) is designed to achieve weighted cross-scale fusion. FGCV-Det significantly outperforms YOLO11s on CWHVD, achieving 63.4% precision, 48.6% recall, 51.7% mAP50, and 28.2% mAP50:95. It also generalizes well, reaching 47.1% and 49.6% mAP50 on VisDrone2019 and UAVDT, respectively, surpassing baseline and mainstream detectors and demonstrating strong robustness and generalization capability.
(This article belongs to the Section Computing and Artificial Intelligence)
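FEWT's details are not in the abstract; the basic wavelet-domain move, amplifying high-frequency sub-bands before inverting the transform, can be sketched with PyWavelets as below. The wavelet choice and gain are illustrative, not the paper's settings.

```python
import numpy as np
import pywt

def wavelet_edge_enhance(gray: np.ndarray, gain: float = 1.5) -> np.ndarray:
    """Single-level Haar DWT: boost horizontal/vertical/diagonal detail
    coefficients, keep the approximation, and invert the transform."""
    cA, (cH, cV, cD) = pywt.dwt2(gray, "haar")
    enhanced = pywt.idwt2((cA, (gain * cH, gain * cV, gain * cD)), "haar")
    return np.clip(enhanced, 0.0, 1.0)   # assume input normalised to [0, 1]
```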
16 pages, 22406 KB  
Article
Isotropic Reconstruction of Anisotropic vEM Volumes with ViT-Guided Diffusion
by Junchao Qiu, Guojia Wan, Zhengyun Zhou, Minghui Liao, Xiangdong Liu, Xinyuan Li and Bo Du
Electronics 2026, 15(6), 1181; https://doi.org/10.3390/electronics15061181 - 12 Mar 2026
Abstract
Volume electron microscopy (vEM) provides nanometer-scale 3D imaging, yet its axial (z) resolution is often much lower than the in-plane (xy) resolution, yielding anisotropic volumes that hinder segmentation and connectomic reconstruction. We present a two-stage cross-axial super-resolution framework for isotropic reconstruction that combines a conditional diffusion model and domain-specific self-supervised pretraining of a vision transformer (ViT). First, the student–teacher self-distillation paradigm of DINOv3 is adopted to learn representations from large sets of high-resolution xy sections, capturing vEM-specific texture statistics and ultrastructural patterns. Second, a conditional diffusion denoiser is trained with supervised anisotropic degradation simulated by z-downsampling, while a perceptual loss based on frozen ViT feature distances constrains generated slices to match real-section distributions. These constraints recover axial high-frequency details and reduce hallucinated textures and inter-slice drift, improving cross-slice consistency. Experiments on two public vEM datasets show improved fidelity, perceptual quality, and membrane-boundary continuity over interpolation and learning-based baselines.
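The supervised degradation the authors simulate, reducing axial resolution by downsampling along z, is straightforward to reproduce; a sketch follows, with the downsampling factor chosen for illustration.

```python
import numpy as np

def simulate_anisotropy(volume: np.ndarray, factor: int = 4) -> np.ndarray:
    """volume: (Z, Y, X) isotropic stack. Average consecutive z-slices in
    blocks of `factor` to mimic a low axial-resolution acquisition."""
    z = (volume.shape[0] // factor) * factor
    blocks = volume[:z].reshape(-1, factor, *volume.shape[1:])
    return blocks.mean(axis=1)   # (Z // factor, Y, X): the training input;
                                 # the original volume serves as the target
```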
21 pages, 23671 KB  
Article
Zero-Shot Polarization-Intensity Physical Fusion Monocular Depth Estimation for High Dynamic Range Scenes
by Renhao Rao, Zhizhao Ouyang, Shuang Chen, Liang Chen, Guoqin Huang and Changcai Cui
Photonics 2026, 13(3), 268; https://doi.org/10.3390/photonics13030268 - 11 Mar 2026
Abstract
Monocular 3D reconstruction remains a persistent challenge for autonomous driving systems in Degraded Visual Environments (DVEs) with extreme glare and low illumination, such as highway tunnels, due to the lack of reliable texture cues. This paper proposes a physics-aware deep learning framework that overcomes these limitations by fusing polarization sensing with conventional intensity imaging. Unlike traditional end-to-end data-driven fusion strategies, we propose a Modality-Aligned Parameter Injection strategy. By remapping the weight space of the input layer, this strategy achieves a smooth transfer of the pre-trained Vision Transformer (i.e., MiDaS) to multi-modal inputs. Its core advantage lies in the seamless integration of four-channel polarization geometric information while fully preserving the pre-trained semantic representation capabilities of the backbone network, thereby avoiding the overfitting risk associated with training from scratch on small-sample data. Furthermore, we design a Reliability-Aware Gating mechanism that dynamically re-weights appearance and geometric cues based on intensity saturation and the physical validity of polarization signals as measured by the Degree of Linear Polarization (DoLP). We validate the proposed method on our self-constructed POLAR-GLV benchmark, a real-world dataset collected specifically for high dynamic range tunnel scenarios. Extensive experiments demonstrate that our method consistently outperforms intensity-only baselines, reducing geometric reconstruction error by 24.2% in high-glare tunnel exit zones and 10.0% at tunnel entrances. Crucially, compared to multi-stream fusion architectures, these performance gains come with negligible additional computational cost, making the framework highly suitable for resource-constrained onboard inference environments.
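The Degree of Linear Polarization used by the gating mechanism has a standard closed form from intensities measured at four polarizer angles; a sketch:

```python
import numpy as np

def degree_of_linear_polarization(i0, i45, i90, i135):
    """Stokes parameters from intensities at 0/45/90/135 degrees, then
    DoLP = sqrt(S1^2 + S2^2) / S0, clipped to the physical range [0, 1]."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)   # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + 1e-8)
    return np.clip(dolp, 0.0, 1.0)
```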