Search Results (592)

Search Parameters:
Keywords = pixel-encoded

19 pages, 1948 KB  
Article
Graph-MambaRoadDet: A Symmetry-Aware Dynamic Graph Framework for Road Damage Detection
by Zichun Tian, Xiaokang Shao and Yuqi Bai
Symmetry 2025, 17(10), 1654; https://doi.org/10.3390/sym17101654 - 5 Oct 2025
Viewed by 251
Abstract
Road-surface distress poses a serious threat to traffic safety and imposes a growing burden on urban maintenance budgets. While modern detectors based on convolutional networks and Vision Transformers achieve strong frame-level performance, they often overlook an essential property of road environments—structural symmetry within road networks and damage patterns. We present Graph-MambaRoadDet (GMRD), a symmetry-aware and lightweight framework that integrates dynamic graph reasoning with state–space modeling for accurate, topology-informed, and real-time road damage detection. Specifically, GMRD employs an EfficientViM-T1 backbone and two DefMamba blocks, whose deformable scanning paths capture sub-pixel crack patterns while preserving geometric symmetry. A superpixel-based graph is constructed by projecting image regions onto OpenStreetMap road segments, encoding both spatial structure and symmetric topological layout. We introduce a Graph-Generating State–Space Model (GG-SSM) that synthesizes sparse sample-specific adjacency in O(M) time, further refined by a fusion module that combines detector self-attention with prior symmetry constraints. A consistency loss promotes smooth predictions across symmetric or adjacent segments. The full INT8 model contains only 1.8 M parameters and 1.5 GFLOPs, sustaining 45 FPS at 7 W on a Jetson Orin Nano—eight times lighter and 1.7× faster than YOLOv8-s. On RDD2022, TD-RD, and RoadBench-100K, GMRD surpasses strong baselines by up to +6.1 mAP50:95 and, on the new RoadGraph-RDD benchmark, achieves +5.3 G-mAP and +0.05 consistency gain. Qualitative results demonstrate robustness under shadows, reflections, back-lighting, and occlusion. By explicitly modeling spatial and topological symmetry, GMRD offers a principled solution for city-scale road infrastructure monitoring under real-time and edge-computing constraints.
(This article belongs to the Section Computer)
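As a rough sketch of the segment-level consistency loss mentioned in the abstract: given per-segment damage predictions and a road-graph adjacency matrix, it penalizes prediction differences across connected (symmetric or adjacent) segments. The tensor shapes and the name consistency_loss are assumptions for illustration, not the authors' code.

```python
import torch

def consistency_loss(pred, adj):
    """Encourage smooth predictions across adjacent or symmetric road
    segments. pred: (N, C) per-segment damage scores; adj: (N, N)
    0/1 adjacency from the road graph. Hypothetical helper."""
    diff = pred.unsqueeze(0) - pred.unsqueeze(1)          # (N, N, C) pairwise gaps
    return (adj.unsqueeze(-1) * diff.pow(2)).sum() / adj.sum().clamp(min=1)
```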

27 pages, 3948 KB  
Article
Fully Automated Segmentation of Cervical Spinal Cord in Sagittal MR Images Using Swin-Unet Architectures
by Rukiye Polattimur, Emre Dandıl, Mehmet Süleyman Yıldırım and Utku Şenol
J. Clin. Med. 2025, 14(19), 6994; https://doi.org/10.3390/jcm14196994 - 2 Oct 2025
Viewed by 329
Abstract
Background/Objectives: The spinal cord is a critical component of the central nervous system that transmits neural signals between the brain and the body’s peripheral regions through its nerve roots. Despite being partially protected by the vertebral column, the spinal cord remains highly vulnerable to trauma, tumors, infections, and degenerative or inflammatory disorders. These conditions can disrupt neural conduction, resulting in severe functional impairments, such as paralysis, motor deficits, and sensory loss. Therefore, accurate and comprehensive spinal cord segmentation is essential for characterizing its structural features and evaluating neural integrity. Methods: In this study, we propose a fully automated method for segmentation of the cervical spinal cord in sagittal magnetic resonance (MR) images. This method facilitates rapid clinical evaluation and supports early diagnosis. Our approach uses a Swin-Unet architecture, which integrates vision transformer blocks into the U-Net framework. This enables the model to capture both local anatomical details and global contextual information. This design improves the delineation of the thin, curved, low-contrast cervical cord, resulting in more precise and robust segmentation. Results: In experimental studies, the proposed Swin-Unet model (SWU1), which uses transformer blocks in the encoder layer, achieved Dice Similarity Coefficient (DSC) and Hausdorff Distance 95 (HD95) scores of 0.9526 and 1.0707 mm, respectively, for cervical spinal cord segmentation. These results confirm that the model can consistently deliver precise, pixel-level delineations that are structurally accurate, which supports its reliability for clinical assessment. Conclusions: The attention-enhanced Swin-Unet architecture demonstrated high accuracy in segmenting thin and complex anatomical structures, such as the cervical spinal cord. Its ability to generalize with limited data highlights its potential for integration into clinical workflows to support diagnosis, monitoring, and treatment planning.
(This article belongs to the Special Issue Artificial Intelligence and Deep Learning in Medical Imaging)
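The DSC and HD95 reported above are standard segmentation metrics. A minimal reference computation of the Dice coefficient on binary masks (not the authors' evaluation code) is shown below; HD95 is analogously the 95th percentile of symmetric boundary distances.

```python
import numpy as np

def dice_coefficient(pred, gt, eps=1e-7):
    """Dice Similarity Coefficient between two binary masks:
    2 * |A and B| / (|A| + |B|)."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)
```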

20 pages, 162180 KB  
Article
Annotation-Efficient and Domain-General Segmentation from Weak Labels: A Bounding Box-Guided Approach
by Ammar M. Okran, Hatem A. Rashwan, Sylvie Chambon and Domenec Puig
Electronics 2025, 14(19), 3917; https://doi.org/10.3390/electronics14193917 - 1 Oct 2025
Viewed by 296
Abstract
Manual pixel-level annotation remains a major bottleneck in deploying deep learning models for dense prediction and semantic segmentation tasks across domains. This challenge is especially pronounced in applications involving fine-scale structures, such as cracks in infrastructure or lesions in medical imaging, where annotations are time-consuming, expensive, and subject to inter-observer variability. To address these challenges, this work proposes a weakly supervised and annotation-efficient segmentation framework that integrates sparse bounding-box annotations with a limited subset of strong (pixel-level) labels to train robust segmentation models. The fundamental element of the framework is a lightweight Bounding Box Encoder that converts weak annotations into multi-scale attention maps. These maps guide a ConvNeXt-Base encoder, and a lightweight U-Net–style convolutional neural network (CNN) decoder—using nearest-neighbor upsampling and skip connections—reconstructs the final segmentation mask. This design enables the model to focus on semantically relevant regions without relying on full supervision, drastically reducing annotation cost while maintaining high accuracy. We validate our framework on two distinct domains, road crack detection and skin cancer segmentation, demonstrating that it achieves performance comparable to fully supervised segmentation models using only 10–20% of strong annotations. Given the ability of the proposed framework to generalize across varied visual contexts, it has strong potential as a general annotation-efficient segmentation tool for domains where strong labeling is costly or infeasible.
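A minimal sketch of the bounding-box-guidance idea: rasterize the weak box annotations into an attention map and resize it to each encoder stage. The strides, box format (x1, y1, x2, y2 in pixel coordinates), and helper name are assumptions; this is not the paper's Bounding Box Encoder.

```python
import torch
import torch.nn.functional as F

def boxes_to_attention(boxes, hw):
    """Rasterize axis-aligned boxes into a (1, 1, H, W) attention map."""
    h, w = hw
    attn = torch.zeros(1, 1, h, w)
    for x1, y1, x2, y2 in boxes:
        attn[..., y1:y2, x1:x2] = 1.0
    return attn

# Multi-scale guidance maps, e.g. for encoder stages at strides 4/8/16/32:
attn = boxes_to_attention([(40, 32, 120, 96)], (256, 256))
pyramid = [F.interpolate(attn, scale_factor=1 / s, mode="bilinear")
           for s in (4, 8, 16, 32)]
```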

35 pages, 20327 KB  
Article
Automated Detection of Beaver-Influenced Floodplain Inundations in Multi-Temporal Aerial Imagery Using Deep Learning Algorithms
by Evan Zocco, Chandi Witharana, Isaac M. Ortega and William Ouimet
ISPRS Int. J. Geo-Inf. 2025, 14(10), 383; https://doi.org/10.3390/ijgi14100383 - 30 Sep 2025
Viewed by 150
Abstract
Remote sensing provides a viable alternative for understanding landscape modifications attributed to beaver activity. The central objective of this study is to integrate multi-source remote sensing observations in tandem with a deep learning (DL) (convolutional neural net or transformer) model to automatically map beaver-influenced floodplain inundations (BIFI) over large geographical extents. We trained, validated, and tested eleven different model configurations in three architectures using five ResNet and five B-Finetuned encoders. The training dataset consisted of >25,000 manually annotated aerial image tiles of BIFIs in Connecticut. The YOLOv8 architecture outperformed competing configurations and achieved an F1 score of 80.59% and pixel-based map accuracy of 98.95%. SegFormer and U-Net++’s highest-performing models had F1 scores of 68.98% and 78.86%, respectively. The YOLOv8l-seg model was deployed at a statewide scale based on 1 m resolution multi-temporal aerial imagery acquired from 1990 to 2019 under leaf-on and leaf-off conditions. Our results suggest a variety of inferences when comparing leaf-on and leaf-off conditions of the same year. The model exhibits limitations in identifying BIFIs in panchromatic imagery in occluded environments. Study findings demonstrate the potential of harnessing historical and modern aerial image datasets with state-of-the-art DL models to increase our understanding of beaver activity across space and time.

22 pages, 6436 KB  
Article
Face Morphing Attack Detection Using Similarity Score Patterns Between De-Morphed and Live Images
by Thi Thuy Hoang, Bappy Md Siful Islam and Heejune Ahn
Electronics 2025, 14(19), 3851; https://doi.org/10.3390/electronics14193851 - 28 Sep 2025
Viewed by 198
Abstract
Face morphing attacks have become a serious threat to Face Recognition Systems (FRSs). A de-morphing-based morphing attack detection method, which uses a suspect image and a live capture, has been proposed and studied, but the unknown parameters of the morphing algorithm used make applying de-morphing methods challenging. This paper proposes a robust face morphing attack detection (FMAD) method (pipeline) leveraging deep learning de-morphing networks. Inspired by differences in similarity score (i.e., cosine similarity between feature vectors) variations between morphed and non-morphed images, the detection pipeline was proposed to learn the variation patterns of similarity scores between live capture and de-morphed face/bona fide images with different de-morphing factors. An effective deep de-morphing network based on StyleGAN and the pSp (pixel2style2pixel) encoder was developed. The network generates de-morphed images from suspect and live images with multiple de-morphing factors and calculates similarity scores between feature vectors from the ArcFace network, which are then classified by the detection network. Experiments on morphing datasets from the Color FERET, FRGCv2, and SYS-MAD databases, including landmark-based and deep learning attacks, demonstrate that the proposed method achieves high accuracy in detecting unseen morphing attacks across different databases. It attains an Equal Error Rate (EER) of less than 1–4% and a Bona Fide Presentation Classification Error Rate (BPCER) of approximately 11% at an Attack Presentation Classification Error Rate (APCER) of 0.1%, outperforming previous methods.
(This article belongs to the Topic Recent Advances in Security, Privacy, and Trust)
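The detection cue described above is a pattern of cosine similarities. A sketch of building that pattern follows, with demorph and embed standing in for the StyleGAN/pSp de-morpher and the ArcFace feature network; the de-morphing factors are placeholder values.

```python
import torch
import torch.nn.functional as F

def similarity_pattern(suspect, live, demorph, embed, factors=(0.2, 0.4, 0.6)):
    """Cosine similarities between the live embedding and embeddings of
    images de-morphed at several factors; the resulting vector is the
    input pattern for the detection classifier."""
    live_feat = embed(live)
    scores = [F.cosine_similarity(embed(demorph(suspect, live, f)), live_feat, dim=-1)
              for f in factors]
    return torch.stack(scores, dim=-1)
```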

18 pages, 9355 KB  
Article
Two-Dimensional Image Lempel–Ziv Complexity Calculation Method and Its Application in Defect Detection
by Jiancheng Yin, Wentao Sui, Xuye Zhuang, Yunlong Sheng and Yongbo Li
Entropy 2025, 27(10), 1014; https://doi.org/10.3390/e27101014 - 27 Sep 2025
Viewed by 269
Abstract
Although Lempel–Ziv complexity (LZC) can reflect changes in object characteristics by measuring changes in independent patterns in the signal, it can only be applied to one-dimensional time series and cannot be directly applied to two-dimensional images. To address this issue, this paper proposes a two-dimensional Lempel–Ziv complexity by combining the concept of the local receptive field in convolutional neural networks. This extends the application scenario of LZC from one-dimensional time series to two-dimensional images, further broadening the scope of application of LZC. First, the pixels and size of the image are normalized. Then, the image is encoded according to the sorting of normalized values within each 4 × 4 region. Next, the encoding result of the image is rearranged into a vector by row. Finally, the Lempel–Ziv complexity of the image can be obtained from the rearranged vector. The proposed method was further used for defect detection in conjunction with the dilation operator and Sobel operator, and validated on two practical cases. The results show that the proposed method can effectively identify independent pattern changes in images and can be used for defect detection, with a defect detection accuracy of up to 100%.
(This article belongs to the Special Issue Complexity and Synchronization in Time Series)
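The four steps in the abstract translate almost directly into code. Below is a sketch using the stated 4 × 4 regions, rank ordering within each region, row-wise flattening, and the classic Kaspar–Schuster LZ76 pattern count; details such as rank tie-breaking are assumptions, not the paper's implementation.

```python
import numpy as np

def lz_complexity(seq):
    """Kaspar–Schuster LZ76 count of distinct patterns in a sequence."""
    n = len(seq)
    if n < 2:
        return n
    i, k, l, c, k_max = 0, 1, 1, 1, 1
    while True:
        if seq[i + k - 1] == seq[l + k - 1]:
            k += 1
            if l + k > n:
                c += 1
                break
        else:
            k_max = max(k, k_max)
            i += 1
            if i == l:                 # new pattern found
                c += 1
                l += k_max
                if l + 1 > n:
                    break
                i, k, k_max = 0, 1, 1
            else:
                k = 1
    return c

def lz_complexity_2d(img, block=4):
    """2D LZC sketch: normalize, rank-encode each block x block region,
    flatten row by row, then count LZ patterns."""
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)
    h, w = (img.shape[0] // block) * block, (img.shape[1] // block) * block
    codes = []
    for r in range(0, h, block):
        for s in range(0, w, block):
            patch = img[r:r + block, s:s + block].ravel()
            codes.extend(np.argsort(np.argsort(patch)))  # ranks within the region
    return lz_complexity(codes)

print(lz_complexity_2d(np.random.rand(64, 64)))
```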

14 pages, 3620 KB  
Article
Lung Opacity Segmentation in Chest CT Images Using Multi-Head and Multi-Channel U-Nets with Partially Supervised Learning
by Shingo Mabu, Takuya Hamada, Satoru Ikebe and Shoji Kido
Appl. Sci. 2025, 15(19), 10373; https://doi.org/10.3390/app151910373 - 24 Sep 2025
Viewed by 190
Abstract
There has been a large amount of research applying deep learning to the medical field. However, obtaining sufficient training data is challenging in the medical domain because annotation requires specialized knowledge and significant effort. This is especially true for segmentation tasks, where preparing fully annotated data for every pixel within an image is difficult. To address this, we propose methods to extract useful features for segmentation using two types of U-net-based networks and partially supervised learning with incomplete annotated data. This research specifically focuses on the segmentation of diffuse lung disease opacities in chest CT images. In our dataset, each image is partially annotated with a single type of lung opacity. To tackle this, we designed two distinct U-net architectures: a multi-head U-net, which utilizes a shared encoder and separated decoders for each opacity type, and a multi-channel U-net, which shares the encoder and decoder layers for more efficient feature learning. Furthermore, we integrated partially supervised learning with these networks. This involves employing distinct loss functions to both bring annotated regions (ground truth) and segmented regions (predictions) closer, and to push them apart, thereby suppressing erroneous predictions. In our experiments, we trained the models on partially annotated data and subsequently tested them on fully annotated data to compare the segmentation performance of each method. The results show that the multi-channel model applying partially supervised learning achieved the best performance while also reducing the number of weight parameters.
(This article belongs to the Special Issue Pattern Recognition Applications of Neural Networks and Deep Learning)
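The two opposing loss terms described above (pulling predictions toward annotated regions while pushing down unsupported predictions) can be illustrated with a masked push-pull loss. This is a sketch of the principle, not the paper's exact loss functions; lam is a placeholder weight.

```python
import torch
import torch.nn.functional as F

def partial_supervision_loss(logits, target, annotated, lam=0.1):
    """logits/target/annotated share one shape; `annotated` marks pixels
    whose label for this opacity type is known. Hypothetical sketch."""
    m = annotated.float()
    bce = F.binary_cross_entropy_with_logits(logits, target.float(), reduction="none")
    pull = (bce * m).sum() / m.sum().clamp(min=1)      # fit annotated pixels
    push = (torch.sigmoid(logits) * (1 - m)).mean()    # suppress unsupported positives
    return pull + lam * push
```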

25 pages, 17562 KB  
Article
SGFNet: Redundancy-Reduced Spectral–Spatial Fusion Network for Hyperspectral Image Classification
by Boyu Wang, Chi Cao and Dexing Kong
Entropy 2025, 27(10), 995; https://doi.org/10.3390/e27100995 - 24 Sep 2025
Viewed by 323
Abstract
Hyperspectral image classification (HSIC) involves analyzing high-dimensional data that contain substantial spectral redundancy and spatial noise, which increases the entropy and uncertainty of feature representations. Reducing such redundancy while retaining informative content in spectral–spatial interactions remains a fundamental challenge for building efficient and accurate HSIC models. Traditional deep learning methods often rely on redundant modules or lack sufficient spectral–spatial coupling, limiting their ability to fully exploit the information content of hyperspectral data. To address these challenges, we propose SGFNet, a spectral-guided fusion network designed from an information-theoretic perspective to reduce feature redundancy and uncertainty. First, we designed a Spectral-Aware Filtering Module (SAFM) that suppresses noisy spectral components and reduces redundant entropy, encoding the raw pixel-wise spectrum into a compact spectral representation accessible to all encoder blocks. Second, we introduced a Spectral–Spatial Adaptive Fusion (SSAF) module, which strengthens spectral–spatial interactions and enhances the discriminative information in the fused features. Finally, we developed a Spectral Guidance Gated CNN (SGGC), a lightweight gated convolutional module that uses spectral guidance to more effectively extract spatial representations while avoiding unnecessary sequence modeling overhead. We conducted extensive experiments on four widely used hyperspectral benchmarks and compared SGFNet with eight state-of-the-art models. The results demonstrate that SGFNet consistently achieves superior performance across multiple metrics. From an information-theoretic perspective, SGFNet implicitly balances redundancy reduction and information preservation, providing an efficient and effective solution for HSIC.
(This article belongs to the Section Multidisciplinary Applications)
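A convolution gated by an external spectral vector, in the spirit of the spectral guidance described above, can be sketched as follows (a hypothetical layer, not the SGGC implementation):

```python
import torch
import torch.nn as nn

class GatedConv(nn.Module):
    """3x3 convolution whose output channels are gated by a per-sample
    guidance vector (e.g. a compact spectral representation)."""
    def __init__(self, c_in, c_out, c_guide):
        super().__init__()
        self.feat = nn.Conv2d(c_in, c_out, 3, padding=1)
        self.gate = nn.Linear(c_guide, c_out)

    def forward(self, x, guide):
        g = torch.sigmoid(self.gate(guide))[:, :, None, None]  # (B, c_out, 1, 1)
        return self.feat(x) * g
```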

14 pages, 3062 KB  
Article
Self-Supervised Monocular Depth Estimation Based on Differential Attention
by Ming Zhou, Hancheng Yu, Zhongchen Li and Yupu Zhang
Algorithms 2025, 18(9), 590; https://doi.org/10.3390/a18090590 - 19 Sep 2025
Viewed by 352
Abstract
Depth estimation algorithms are widely applied in various fields, including 3D reconstruction, autonomous driving, and industrial robotics. Monocular self-supervised algorithms for depth prediction offer a cost-effective alternative to acquiring depth through hardware devices such as LiDAR. However, current depth prediction networks, predominantly based on conventional encoder–decoder architectures, often encounter two critical limitations: insufficient feature fusion mechanisms during the upsampling phase and constrained receptive fields. These limitations result in the loss of high-frequency details in the predicted depth maps. To overcome these issues, we introduce differential attention operators to enhance global feature representation and refine locally upsampled features within the depth decoder. Furthermore, we equip the decoder with a deformable bin-structured prediction head; this lightweight design enables per-pixel dynamic aggregation of local depth distributions via adaptive receptive field modulation and deformable sampling, enhancing the decoder’s fine-grained detail processing by capturing local geometry and holistic structures. Experimental results on the KITTI and Make3D datasets demonstrate that our proposed method produces more accurate depth maps with finer details compared to existing approaches.
(This article belongs to the Special Issue Algorithms for Feature Selection (3rd Edition))

27 pages, 28041 KB  
Article
A Unified GAN-Based Framework for Unsupervised Video Anomaly Detection Using Optical Flow and RGB Cues
by Seung-Hun Kang and Hyun-Soo Kang
Sensors 2025, 25(18), 5869; https://doi.org/10.3390/s25185869 - 19 Sep 2025
Viewed by 458
Abstract
Video anomaly detection in unconstrained environments remains a fundamental challenge due to the scarcity of labeled anomalous data and the diversity of real-world scenarios. To address this, we propose a novel unsupervised framework that integrates RGB appearance and optical flow motion via a unified GAN-based architecture. The generator features a dual encoder and a GRU–attention temporal bottleneck, while the discriminator employs ConvLSTM layers and residual-enhanced MLPs to evaluate temporal coherence. To improve training stability and reconstruction quality, we introduce DASLoss—a composite loss that incorporates pixel, perceptual, temporal, and feature consistency terms. Experiments were conducted on three benchmark datasets. On XD-Violence, our model achieves an Average Precision (AP) of 80.5%, outperforming other unsupervised methods such as MGAFlow and Flashback. On Hockey Fight, it achieves an AUC of 0.92 and an F1-score of 0.85, demonstrating strong performance in detecting short-duration violent events. On UCSD Ped2, our model attains an AUC of 0.96, matching several state-of-the-art models despite using no supervision. These results confirm the effectiveness and generalizability of our approach in diverse anomaly detection settings.
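A composite reconstruction loss combining the pixel, perceptual/feature, and temporal terms named above might look like this sketch; the weights and the feature extractor feat_fn are placeholders, and this is not the exact DASLoss formulation.

```python
import torch.nn.functional as F

def composite_loss(recon, target, feat_fn, w=(1.0, 0.1, 0.5)):
    """recon/target: (B, T, C, H, W) clips; feat_fn: any frozen feature
    extractor used for the perceptual term. Illustrative sketch."""
    pixel = F.l1_loss(recon, target)
    feat = F.mse_loss(feat_fn(recon), feat_fn(target))
    # temporal coherence: match frame-to-frame differences along T
    temporal = F.l1_loss(recon[:, 1:] - recon[:, :-1],
                         target[:, 1:] - target[:, :-1])
    return w[0] * pixel + w[1] * feat + w[2] * temporal
```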

13 pages, 2020 KB  
Article
Substrate Orientation-Dependent Synaptic Plasticity and Visual Memory in Sol–Gel-Derived ZnO Optoelectronic Devices
by Dabin Jeon, Seung Hun Lee, JungBeen Cho, Kyoung-Bo Kim and Sung-Nam Lee
Materials 2025, 18(18), 4377; https://doi.org/10.3390/ma18184377 - 19 Sep 2025
Viewed by 396
Abstract
We report Al/ZnO/Al optoelectronic synaptic devices fabricated on c-plane and m-plane sapphire substrates using a sol–gel process. The devices exhibit essential synaptic behaviors such as excitatory postsynaptic current modulation, paired-pulse facilitation, and long-term learning–forgetting dynamics described by Wickelgren’s power law. Comparative analysis reveals that substrate orientation strongly influences memory performance: devices on m-plane consistently show higher EPSCs, slower decay rates, and superior retention compared to c-plane counterparts. These characteristics are attributed to crystallographic effects that enhance carrier trapping and persistent photoconductivity. To demonstrate their practical applicability, 3 × 3-pixel arrays of adjacent devices were constructed, where a “T”-shaped optical pattern was successfully encoded, learned, and retained across repeated stimulation cycles. These results highlight the critical role of substrate orientation in tailoring synaptic plasticity and memory retention, offering promising prospects for ZnO-based optoelectronic synaptic arrays in in-sensor neuromorphic computing and artificial visual memory systems.
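Wickelgren's power law cited above is commonly written as

```latex
m(t) = \lambda \, (1 + \beta t)^{-\psi}
```

where m(t) is the memory strength at time t, \lambda the initial strength, \beta a time-scaling constant, and \psi the forgetting rate; the slower decay reported for m-plane devices would correspond to a smaller fitted \psi (an interpretation of the abstract, not a reported value).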

25 pages, 4796 KB  
Article
Vision-Language Guided Semantic Diffusion Sampling for Small Object Detection in Remote Sensing Imagery
by Jian Ma, Mingming Bian, Fan Fan, Hui Kuang, Lei Liu, Zhibing Wang, Ting Li and Running Zhang
Remote Sens. 2025, 17(18), 3203; https://doi.org/10.3390/rs17183203 - 17 Sep 2025
Viewed by 621
Abstract
Synthetic aperture radar (SAR), with its all-weather and all-day active imaging capability, has become indispensable for geoscientific analysis and socio-economic applications. Despite advances in deep learning–based object detection, the rapid and accurate detection of small objects in SAR imagery remains a major challenge due to their extremely limited pixel representation, blurred boundaries in dense distributions, and the imbalance of positive–negative samples during training. Recently, vision–language models such as Contrastive Language-Image Pre-Training (CLIP) have attracted widespread research interest for their powerful cross-modal semantic modeling capabilities. Nevertheless, their potential to guide precise localization and detection of small objects in SAR imagery has not yet been fully exploited. To overcome these limitations, we propose the CLIP-Driven Adaptive Tiny Object Detection Diffusion Network (CDATOD-Diff). This framework introduces a CLIP image–text encoding-guided dynamic sampling strategy that leverages cross-modal semantic priors to alleviate the scarcity of effective positive samples. Furthermore, a generative diffusion-based module reformulates the sampling process through iterative denoising, enhancing contextual awareness. To address regression instability, we design a Balanced Corner–IoU (BC-IoU) loss, which decouples corner localization from scale variation and reduces sensitivity to minor positional errors, thereby stabilizing bounding box predictions. Extensive experiments conducted on multiple SAR and optical remote sensing datasets demonstrate that CDATOD-Diff achieves state-of-the-art performance, delivering significant improvements in detection robustness and localization accuracy under challenging small-object scenarios with complex backgrounds and dense distributions.
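As a loose illustration of balancing an overlap term with a scale-normalized corner term (the paper defines its own BC-IoU, which differs), a generic corner-aware IoU loss for axis-aligned boxes (x1, y1, x2, y2) can be sketched as:

```python
import torch

def corner_iou_loss(pred, gt, alpha=0.5, eps=1e-7):
    """1 - IoU plus corner distances normalized by the ground-truth
    diagonal, decoupling corner localization from box scale.
    Illustrative only; NOT the paper's Balanced Corner-IoU."""
    ix1 = torch.max(pred[..., 0], gt[..., 0])
    iy1 = torch.max(pred[..., 1], gt[..., 1])
    ix2 = torch.min(pred[..., 2], gt[..., 2])
    iy2 = torch.min(pred[..., 3], gt[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_g = (gt[..., 2] - gt[..., 0]) * (gt[..., 3] - gt[..., 1])
    iou = inter / (area_p + area_g - inter + eps)
    diag = ((gt[..., 2] - gt[..., 0]) ** 2 + (gt[..., 3] - gt[..., 1]) ** 2).sqrt() + eps
    tl = ((pred[..., :2] - gt[..., :2]) ** 2).sum(-1).sqrt()   # top-left corner gap
    br = ((pred[..., 2:] - gt[..., 2:]) ** 2).sum(-1).sqrt()   # bottom-right corner gap
    return 1 - iou + alpha * (tl + br) / (2 * diag)
```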

28 pages, 13374 KB  
Article
Low-Light Remote Sensing Image Enhancement via Priors Guided End-to-End Latent Residual Diffusion
by Bing Ding, Bei Sun and Xiaoyong Sun
Remote Sens. 2025, 17(18), 3193; https://doi.org/10.3390/rs17183193 - 15 Sep 2025
Viewed by 517
Abstract
Low-light image enhancement, especially for remote sensing images, remains a challenging task due to issues like low brightness, high noise, color distortion, and the unique complexities of remote sensing scenes, such as uneven illumination and large coverage. Existing methods often struggle to balance efficiency, accuracy, and robustness. Diffusion models have shown potential in image restoration, but they often rely on multi-step noise estimation, leading to inefficiency. To address these issues, this study proposes an enhancement framework based on a lightweight encoder–decoder and a physical-prior-guided end-to-end single-step residual diffusion model. The lightweight encoder–decoder, tailored for low-light scenarios, reduces computational redundancy while preserving key features, ensuring efficient mapping between pixel and latent spaces. Guided by physical priors, the end-to-end trained single-step residual diffusion model simplifies the process by eliminating multi-step noise estimation through end-to-end training, accelerating inference without sacrificing quality. Illumination-invariant priors guide the inference process, alleviating blurriness from missing details and ensuring structural consistency. Experimental results show that the proposed framework not only demonstrates superiority over mainstream methods in quantitative metrics and visual effects but also achieves a 20× speedup compared with an advanced diffusion-based method.

14 pages, 954 KB  
Article
A Benchmark for Symbolic Reasoning from Pixel Sequences: Grid-Level Visual Completion and Correction
by Lei Kang, Xuanshuo Fu, Mohamed Ali Souibgui, Andrey Barsky, Lluis Gomez, Javier Vazquez-Corral, Alicia Fornés, Ernest Valveny and Dimosthenis Karatzas
Mathematics 2025, 13(17), 2851; https://doi.org/10.3390/math13172851 - 4 Sep 2025
Viewed by 561
Abstract
Grid-structured visual data such as forms, tables, and game boards require models that pair pixel-level perception with symbolic consistency under global constraints. Recent Pixel Language Models (PLMs) map images to token sequences with promising flexibility, yet we find they generalize poorly when observable evidence becomes sparse or corrupted. We present GridMNIST-Sudoku, a benchmark that renders large numbers of Sudoku instances with style-diverse handwritten digits and provides parameterized stress tracks for two tasks: Completion (predict missing cells) and Correction (detect and repair incorrect cells) across difficulty levels ranging from 1 to 90 altered positions in a 9 × 9 grid. Attention diagnostics on PLMs trained with conventional one-dimensional positional encodings reveal weak structure awareness outside the natural Sudoku sparsity band. Motivated by these findings, we propose a lightweight Row-Column-Box (RCB) positional prior that injects grid-aligned coordinates and combine it with simple sparsity and corruption augmentations. Trained only on the natural distribution, the resulting model substantially improves out-of-distribution accuracy across wide sparsity and corruption ranges while maintaining strong in-distribution performance.
(This article belongs to the Section E1: Mathematics and Computer Science)
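The Row-Column-Box prior can be sketched as three small embedding tables indexed by each cell's row, column, and 3 × 3 box; the embedding dimension and names are assumptions, not the paper's code.

```python
import torch
import torch.nn as nn

class RCBPrior(nn.Module):
    """Grid-aligned positional prior for a 9x9 board: each of the 81 cell
    tokens receives the sum of a row, a column, and a box embedding."""
    def __init__(self, dim):
        super().__init__()
        self.row = nn.Embedding(9, dim)
        self.col = nn.Embedding(9, dim)
        self.box = nn.Embedding(9, dim)

    def forward(self):
        idx = torch.arange(81)
        r, c = idx // 9, idx % 9
        b = (r // 3) * 3 + (c // 3)            # index of the containing 3x3 box
        return self.row(r) + self.col(c) + self.box(b)  # (81, dim)
```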

23 pages, 5998 KB  
Article
An Enhanced Feature Extraction and Multi-Branch Occlusion Discrimination Network for Road Detection from Satellite Imagery
by Ruixiang Wu, Lun Zhang, Longkai Guan, Xiangrong Ni and Jianxing Gong
Remote Sens. 2025, 17(17), 3037; https://doi.org/10.3390/rs17173037 - 1 Sep 2025
Viewed by 840
Abstract
Extracting road network information from satellite remote sensing images is an effective method of dealing with dynamic changes in road networks. At present, the use of deep learning methods to automatically segment road networks from remote sensing images has become mainstream. However, existing methods often produce fragmented extraction results. This is usually caused by insufficient feature extraction and occlusion. To solve these problems, we propose an enhanced feature extraction and multi-branch occlusion discrimination network (EFMOD-Net) based on an encoder–decoder architecture. Firstly, a multi-directional feature extraction (MFE) module was proposed as the input for the network, which utilizes multi-directional strip convolution for feature extraction to better capture the linear features of the road. Subsequently, an enhanced feature extraction (EFE) module was designed to enhance the performance of the model in the feature extraction stage by using a dual-branch structure. The proposed multi-branch occlusion discrimination (MOD) module combines the attention mechanism and strip convolution to learn the topological relationship between pixels, enhance the network’s detection of occlusion and complex backgrounds, and reduce the generation of road debris. On the public datasets, the proposed method is compared with other SOTA methods. The experimental results show that the network designed in this paper achieves an IoU of 64.73 and 63.58 on the DeepGlobe and CHN6-CUG datasets, respectively, which is 1.66% and 1.84% higher than the IoU of the best-performing baseline methods. The proposed method combines multi-directional strip convolution and a multi-branch structure for road extraction, which provides a new idea for linear object segmentation in complex backgrounds and could be applied directly to urban renewal, disaster assessment, and other application scenarios.
(This article belongs to the Section Remote Sensing Image Processing)
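Multi-directional strip convolution, the core of the MFE module described above, can be sketched with horizontal and vertical strips (further directions would need rotated or custom kernels); this is an illustration of the idea, not the paper's module.

```python
import torch.nn as nn

class StripConv(nn.Module):
    """Parallel 1xk and kx1 convolutions that respond to elongated,
    road-like structures; outputs are summed with the input."""
    def __init__(self, c, k=9):
        super().__init__()
        self.h = nn.Conv2d(c, c, (1, k), padding=(0, k // 2))  # horizontal strip
        self.v = nn.Conv2d(c, c, (k, 1), padding=(k // 2, 0))  # vertical strip

    def forward(self, x):
        return x + self.h(x) + self.v(x)
```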
