Search Results (2,794)

Search Parameters:
Keywords = pixel-scale

18 pages, 3641 KB  
Article
A Wavelet-Enhanced Detector for Tiny Objects in Remote-Sensing Images
by Weifan Xu and Yong Hu
Remote Sens. 2026, 18(8), 1109; https://doi.org/10.3390/rs18081109 - 8 Apr 2026
Abstract
Accurate and efficient detection is pivotal for tiny objects in remote sensing. However, achieving a favorable accuracy-efficiency trade-off remains challenging due to the few informative pixels of small targets, frequent occlusions, cluttered backgrounds, and detail degradation introduced by downsampling and multi-scale fusion. To address these challenges, we propose WEYOLO, a wavelet-enhanced detector that explicitly models frequency components and adaptively strengthens high-frequency cues to improve tiny-object robustness while maintaining competitive efficiency in inference speed and model size for remote-sensing deployment. To preserve edges and textures when spatial resolution is reduced, we design a Frequency-Aware Lifting Haar (FaLH) backbone that decomposes features into directional sub-bands and retains them during downsampling, preventing the loss of high-frequency information. Next, to address the blurring and detail loss caused by conventional pooling during multi-scale fusion, we introduce a Frequency-Domain Pyramid-Pooling (FDPP) module that performs wavelet-based multi-resolution analysis for frequency-aware feature-pyramid fusion. Additionally, we propose a stable size-aware quality focal regression loss that unifies Focaler-CIoU and size-aware DFL into a single objective, improving robustness and overall accuracy for small objects. Comprehensive experiments show that WEYOLO improves precision and recall over the baseline by 3.2%/4.2% on VisDrone and 2.6%/9.7% on TT100K; on AI-TOD, it achieves 47.5% mAP@0.5 and 21.3% mAP@0.5:0.95. Meanwhile, it reduces the parameter count by 60%, achieving a strong accuracy-efficiency balance for practical aerial sensing deployment. Full article
(This article belongs to the Section AI Remote Sensing)
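
The FaLH idea above rests on a standard property of the 2D Haar transform: a 2x spatial downsampling can be made lossless by keeping the three high-frequency sub-bands alongside the low-frequency one. The NumPy sketch below illustrates only that generic sub-band split; the paper's actual lifting-scheme backbone and its sub-band re-weighting are not specified here, and the function name is ours.

```python
import numpy as np

def haar_subband_downsample(x):
    """Split a feature map (C, H, W) into Haar sub-bands (4C, H/2, W/2).

    Keeping LH/HL/HH alongside LL makes the 2x downsampling lossless,
    so high-frequency edge/texture cues survive the resolution drop.
    """
    a = x[:, 0::2, 0::2]  # top-left of each 2x2 block
    b = x[:, 0::2, 1::2]  # top-right
    c = x[:, 1::2, 0::2]  # bottom-left
    d = x[:, 1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0   # low-frequency approximation
    lh = (a - b + c - d) / 2.0   # horizontal detail
    hl = (a + b - c - d) / 2.0   # vertical detail
    hh = (a - b - c + d) / 2.0   # diagonal detail
    return np.concatenate([ll, lh, hl, hh], axis=0)

feat = np.random.rand(16, 64, 64).astype(np.float32)
print(haar_subband_downsample(feat).shape)  # (64, 32, 32)
```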

16 pages, 9801 KB  
Article
Monitoring Koyna Dam Displacements Using Persistent Scatterer Interferometry
by Sara Zouriq, Gehan Hamdy, Amr Fawzy, Rejoice Thomas, Hesham El-Askary, Eehab Khalil, Mohamed ElSayad and Tarik El-Salawaky
Hydropower 2026, 1(1), 3; https://doi.org/10.3390/hydropower1010003 - 7 Apr 2026
Abstract
Monitoring dam stability is critical to ensure structural safety and operational reliability. This study integrates Persistent Scatterer Interferometry (PSI) based on Sentinel-1 SAR imagery (2020–2023) with Finite Element Method (FEM) simulations to assess the behavior of the Koyna Dam in India. PSI detected crest displacements between −1.0 and −1.8 mm yr⁻¹, while FEM simulations predicted a maximum vertical displacement of approximately −3.2 mm at the crest. Although these results represent different quantities (time-averaged displacement rates versus peak static displacement), both approaches indicate millimeter-scale deformation and a consistent pattern of settlement at the dam crest, supporting the interpretation of hydrologically driven structural response. The observed differences are primarily attributed to differences in spatial resolution and methodology between point-based FEM outputs and pixel-averaged satellite observations. The study demonstrates that combining satellite-based monitoring with numerical simulations provides a robust and cost-effective framework for dam safety assessment. This integrated approach supports improved interpretation of deformation behavior and offers practical value in extreme conditions, such as during flood events or climate-driven hydrological changes. Furthermore, continued advances in remote sensing and numerical modeling are expected to enhance the reliability of such approaches, making this methodology a transferable and sustainable solution for dam management worldwide. Full article
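
For readers unfamiliar with how a PSI velocity such as −1.0 to −1.8 mm yr⁻¹ is obtained, the rate is typically a linear fit to the displacement time series of each persistent scatterer. A minimal sketch with made-up numbers (not Koyna data) follows.

```python
import numpy as np

# Hypothetical PSI line-of-sight displacement time series for one persistent
# scatterer on a dam crest: time in decimal years, displacement in mm.
t = np.array([2020.0, 2020.5, 2021.0, 2021.5, 2022.0, 2022.5, 2023.0])
d = np.array([0.0, -0.6, -1.3, -1.7, -2.4, -3.1, -3.6])  # illustrative values only

# Least-squares fit d = v*t + c gives the mean annual displacement rate v.
v, c = np.polyfit(t, d, 1)
print(f"estimated rate: {v:.2f} mm/yr")  # negative values indicate settlement
```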

25 pages, 7467 KB  
Article
Double Cost-Volume Stereo Matching with Entropy-Difference-Guided Fusion
by Huanchun Yang, Hongshe Dang, Xuande Zhang and Quanping Chen
Electronics 2026, 15(7), 1525; https://doi.org/10.3390/electronics15071525 - 6 Apr 2026
Viewed by 64
Abstract
To address the reduced accuracy of stereo matching networks near object boundaries and disparity discontinuities, a double cost–volume stereo matching network with entropy-difference-guided fusion is proposed. The proposed network is built on RAFT-Stereo. It employs a pretrained backbone to extract multi-scale features and uses deformable attention for cross-scale feature fusion. A shallow image-guided branch generates pixel-wise constraint information to limit the magnitude of sampling offsets and alleviate cross-structure sampling. Based on the extracted features, a group-wise correlation cost–volume and a normalized correlation cost–volume are constructed. Both cost–volumes are regularized by 3D Hourglass networks, and a structure-consistent intra-scale aggregation module is introduced during the regularization of the group-wise correlation cost–volume. The two aggregated results are then fused by the entropy-difference-guided fusion module to obtain the final cost–volume. The experimental results show the effectiveness of the proposed network on the Scene Flow, KITTI, and ETH3D datasets, achieving an endpoint error of 0.45 px and a >3 px error rate of 2.41% on the Scene Flow dataset. Full article
(This article belongs to the Section Artificial Intelligence)
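
The abstract does not give the exact fusion rule, but an entropy-difference-guided fusion can be illustrated generically: convert each cost volume to a matching distribution over disparities, measure its per-pixel entropy, and weight the more confident (lower-entropy) volume more heavily. The NumPy sketch below is our illustration of that idea, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def entropy_guided_fusion(cost_a, cost_b):
    """Fuse two cost volumes (D, H, W) using per-pixel disparity entropy.

    Lower entropy along the disparity axis indicates a more peaked (confident)
    matching distribution, so that volume receives a larger fusion weight.
    """
    p_a = softmax(-cost_a, axis=0)            # costs -> matching probabilities
    p_b = softmax(-cost_b, axis=0)
    h_a = -(p_a * np.log(p_a + 1e-8)).sum(0)  # (H, W) entropy maps
    h_b = -(p_b * np.log(p_b + 1e-8)).sum(0)
    w_a = np.exp(-h_a) / (np.exp(-h_a) + np.exp(-h_b))  # entropy-difference weight
    return w_a[None] * cost_a + (1.0 - w_a)[None] * cost_b

ca = np.random.rand(48, 32, 64).astype(np.float32)
cb = np.random.rand(48, 32, 64).astype(np.float32)
print(entropy_guided_fusion(ca, cb).shape)  # (48, 32, 64)
```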

24 pages, 766 KB  
Article
Systematic Evaluation of YOLOv8 Variants for UAV-Based Object Detection
by Chieh-Min Liu and Jyh-Ching Juang
Appl. Sci. 2026, 16(7), 3559; https://doi.org/10.3390/app16073559 - 6 Apr 2026
Viewed by 156
Abstract
Detecting small objects in drone imagery remains challenging because of extreme object scale variations, dense scenes, and limited pixel information. Although recent YOLOv8 variants provide multiple model scales and architectural options, systematic guidance on their practical use in UAV-based detection remains limited. Rather than proposing novel network architectures, this study provides a quantitative cost–benefit analysis and empirical deployment guidelines by comprehensively evaluating the complete YOLOv8 family on the VisDrone dataset to assess the effects of the model capacity, input resolution, and architectural modifications on the small-object detection performance. The results showed that increasing the model capacity exhibited diminishing returns: YOLOv8l achieved the best overall accuracy (15.9% mAP50), while the larger YOLOv8x model exhibited a substantial performance degradation (7.32% mAP50) owing to training instability under data-constrained conditions. Scaling the input resolution from 640 to 1280 yielded a 25% improvement in detection performance, substantially exceeding the gains obtained through architectural modifications, such as adding a P2 detection layer (+6%). The optimal configuration (YOLOv8l @ 1280) achieved a 488% improvement compared to the YOLOv5 baseline. These findings demonstrate that, for UAV-based small-object detection, prioritizing an appropriate model capacity and input resolution is more effective than increasing the architectural complexity. Full article
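
As a practical takeaway, the recommended configuration (YOLOv8l at 1280-pixel input) can be reproduced roughly as follows. This is a sketch assuming the Ultralytics package and its bundled VisDrone.yaml dataset definition; the epochs and batch size are placeholders, not the study's training settings.

```python
# Minimal sketch, not the authors' training script.
from ultralytics import YOLO

model = YOLO("yolov8l.pt")                 # large variant, pretrained weights
model.train(data="VisDrone.yaml",          # dataset config shipped with Ultralytics
            imgsz=1280,                    # resolution scaling drove the largest gains
            epochs=100, batch=8)           # placeholder hyperparameters
metrics = model.val()                      # reports mAP50 / mAP50-95 on the val split
```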

17 pages, 12185 KB  
Article
Adjustable Complexity Transformer Architecture for Image Denoising
by Jan-Ray Liao, Wen Lin and Li-Wen Chang
Signals 2026, 7(2), 33; https://doi.org/10.3390/signals7020033 - 6 Apr 2026
Viewed by 115
Abstract
In recent years, image denoising has seen a shift from traditional non-local self-similarity methods like BM3D to deep-learning-based approaches that use learnable convolutions and attention mechanisms. While pixel-level attention is effective at capturing long-range relationships similar to non-local self-similarity-based methods, it incurs extremely high computational costs that scale quadratically with image resolution. As an alternative, channel-wise attention is resolution-independent and computationally efficient but may miss crucial spatial details. In this paper, an adjustable attention mechanism is introduced that bridges the gap between pixel and channel attention. In the proposed model, average pooling and variable-size convolutions are added before attention calculation to adjust spatial resolution and, thus, allow dynamic adjustment of computational complexity. This adjustable attention is applied in a transformer-based U-Net architecture and achieves performance comparable to state-of-the-art methods in both real and Gaussian blind denoising tasks. Concretely, the proposed method achieves a Peak Signal-to-Noise Ratio of 39.65 dB and a Structural Similarity Index Measure of 0.913 on the Smartphone Image Denoising Dataset. The proposed method therefore demonstrates a balance between efficiency and denoising quality. Full article
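
The adjustable mechanism can be pictured as attention whose keys and values are spatially pooled before the similarity computation, so the cost shrinks with the pooling factor. The NumPy sketch below shows that generic pooled-attention idea only; the paper's variable-size convolutions and transformer integration are omitted, and the function name is ours.

```python
import numpy as np

def pooled_attention(x, pool=4):
    """Self-attention over a feature map (C, H, W) with keys/values average-pooled
    by `pool` in each spatial dimension, so the attention matrix is
    (H*W) x (H*W / pool**2) instead of (H*W) x (H*W). A larger `pool` trades
    spatial detail for lower cost, which is the adjustable knob described above.
    """
    c, h, w = x.shape
    q = x.reshape(c, h * w).T                                   # (HW, C) queries
    kv = x.reshape(c, h // pool, pool, w // pool, pool).mean(axis=(2, 4))
    kv = kv.reshape(c, -1).T                                    # pooled keys/values
    attn = q @ kv.T / np.sqrt(c)
    attn = np.exp(attn - attn.max(axis=1, keepdims=True))
    attn /= attn.sum(axis=1, keepdims=True)                     # row-wise softmax
    return (attn @ kv).T.reshape(c, h, w)

x = np.random.rand(8, 32, 32).astype(np.float32)
print(pooled_attention(x).shape)  # (8, 32, 32)
```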

32 pages, 43664 KB  
Article
MVFF: Multi-View Feature Fusion Network for Small UAV Detection
by Kunlin Zou, Haitao Zhao, Xingwei Yan, Wei Wang, Yan Zhang and Yaxiu Zhang
Drones 2026, 10(4), 264; https://doi.org/10.3390/drones10040264 - 4 Apr 2026
Viewed by 287
Abstract
With the widespread adoption of various types of Unmanned Aerial Vehicles (UAVs), their non-compliant operations pose a severe challenge to public safety, necessitating the urgent identification and detection of UAV targets. However, in complex backgrounds, UAV targets exhibit small-scale dimensions and low contrast, coupled with extremely low signal-to-noise ratios. This forces conventional target detection methods to confront issues such as feature convergence, missed detections, and false alarms. To address these challenges, we propose a Multi-View Feature Fusion Network (MVFF) that achieves precise identification of small, low-contrast UAV targets by leveraging complementary multi-view information. First, we design a collaborative view alignment fusion module. This module employs a cross-map feature fusion attention mechanism to establish pixel-level mapping relationships and perform deep fusion, effectively resolving geometric distortion and semantic overlap caused by imaging angle differences. Furthermore, we introduce a view feature smoothing module that employs displacement operators to construct a lightweight long-range modeling mechanism. This overcomes the limitations of traditional convolutional local receptive fields, effectively eliminating ghosting artifacts and response discontinuities arising from multi-view fusion. Additionally, we developed a small object binary cross-entropy loss function. By incorporating scale-adaptive gain factors and confidence-aware weights, this function enhances the learning capability of edge features in small objects, significantly reducing prediction uncertainty caused by background noise. Comparative experiments conducted on a multi-perspective UAV dataset demonstrate that our approach consistently outperforms existing state-of-the-art methods across multiple performance metrics. Specifically, it achieves a Structure-measure of 91.50% and an F-measure of 85.14%, validating the effectiveness and superiority of the proposed method. Full article
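
The abstract names scale-adaptive gain factors and confidence-aware weights but does not give their formulas. Purely as an illustration of the stated intent (small, low-confidence targets get up-weighted), a hypothetical weighted binary cross-entropy might look like the sketch below; none of these coefficients come from the paper.

```python
import numpy as np

def small_object_bce(pred, target, box_areas, img_area, conf, alpha=2.0):
    """Illustrative size- and confidence-weighted binary cross-entropy.

    Smaller relative box area and lower prediction confidence increase the
    per-sample weight; the exact gains in the paper are not reproduced here.
    """
    eps = 1e-7
    bce = -(target * np.log(pred + eps) + (1 - target) * np.log(1 - pred + eps))
    scale_gain = 1.0 + alpha * (1.0 - np.sqrt(box_areas / img_area))  # small boxes -> larger gain
    conf_weight = 1.0 + (1.0 - conf)                                  # uncertain samples -> larger weight
    return float(np.mean(scale_gain * conf_weight * bce))

pred = np.array([0.7, 0.2, 0.9])
target = np.array([1.0, 0.0, 1.0])
areas = np.array([100.0, 250.0, 64.0])   # pixel areas of the candidate boxes
conf = np.array([0.8, 0.5, 0.9])
print(small_object_bce(pred, target, areas, 640 * 640, conf))
```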

18 pages, 3975 KB  
Technical Note
SAS-SemiUNet++: A Stochastic Consistency Regularized Framework with Scale-Aware Semantic Recalibration for Cardiac MRI Segmentation
by Jie Rao, Xinhao Ma and Xiang Li
Appl. Sci. 2026, 16(7), 3507; https://doi.org/10.3390/app16073507 - 3 Apr 2026
Viewed by 148
Abstract
Precise segmentation of cardiac substructures in magnetic resonance imaging is pivotal for diagnosis and treatment planning but remains impeded by anatomical scale heterogeneity and the scarcity of high-quality pixel-level annotations. Existing deep learning paradigms often struggle to simultaneously resolve the global geometry of ventricular cavities and the fine-grained boundaries of the myocardium, particularly in low-data regimes. To address these challenges, we propose SAS-SemiUNet++, a holistic semi-supervised segmentation framework. This architecture incorporates two novel mechanisms: (1) The Scale-Aware Semantic Recalibration (SASR) unit, which functions as a dynamic semantic gate to adaptively adjust receptive fields, mimicking a radiologist’s variable-focus mechanism to capture multi-scale anatomical details, and (2) Stochastic Consistency Regularization (SCR), a dual-path perturbation strategy that enforces geometric invariance on unlabeled data, thereby mitigating overfitting to noisy pseudo-labels. Comprehensive evaluations on the ACDC benchmark demonstrate that SAS-SemiUNet++ significantly outperforms state-of-the-art methods, achieving superior segmentation accuracy and boundary fidelity, particularly in reducing the 95% Hausdorff distance. This study presents a data-efficient and robust solution for cardiac image analysis, offering potential for scalable clinical deployment. Full article
(This article belongs to the Special Issue Cardiac Imaging and Heart Diseases: Recent Progress)
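
Stochastic consistency regularization of this kind usually penalizes disagreement between predictions made under two independent perturbations of the same unlabeled image. The sketch below uses additive noise and a stand-in model purely for illustration; the paper's dual-path geometric perturbations are not reproduced.

```python
import numpy as np

def consistency_loss(model, x_unlabeled, rng):
    """Sketch of dual-path consistency regularization: the same unlabeled image
    is perturbed twice (Gaussian noise here, as a stand-in for the paper's
    perturbations) and the two predictions are pulled together with MSE.
    """
    x1 = x_unlabeled + rng.normal(0, 0.05, x_unlabeled.shape)
    x2 = x_unlabeled + rng.normal(0, 0.05, x_unlabeled.shape)
    p1, p2 = model(x1), model(x2)
    return float(np.mean((p1 - p2) ** 2))

rng = np.random.default_rng(0)
dummy_model = lambda x: 1.0 / (1.0 + np.exp(-x))   # stand-in for the segmentation network
x = rng.normal(size=(1, 128, 128))
print(consistency_loss(dummy_model, x, rng))
```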

18 pages, 692 KB  
Review
From Pixels to Prediction: Developing Integrated AI Foundation Models for Personalized Thyroid Cancer Care
by Jae Hyun Park, Younghyun Park, Yong Moon Lee, Sejung Yang and Jong Ho Yoon
Cancers 2026, 18(7), 1155; https://doi.org/10.3390/cancers18071155 - 3 Apr 2026
Viewed by 197
Abstract
Background: Thyroid cancer incidence continues to rise globally, yet current diagnostic methods, reliant on ultrasound-guided fine-needle aspiration, suffer from substantial inter-observer variability and indeterminate results. Objective: This review explores the transformative potential of integrated artificial intelligence (AI) foundation models in thyroid cancer management. We propose a paradigm shift using foundation models—large-scale, multimodal architectures pre-trained on diverse datasets—to bridge the gap between initial pixels and long-term prognostic prediction. Proposed Models: We introduce two integrated conceptual frameworks: ThyroSight-Prognos for high-precision assessment in specialized tertiary settings and SonoPredict-AI for cost-effective screening in primary care. Key Innovations: By synthesizing data from ultrasound, pathology (WSI), genomics, and clinical parameters through explainable AI (XAI), these models aim to reduce unnecessary surgeries and personalize treatment pathways. Challenges and Outlook: This paper addresses critical implementation challenges, including data heterogeneity, hardware requirements, and regulatory trust, ultimately providing a strategic blueprint for future multi-center prospective clinical validation to revolutionize thyroid care through precision oncology. Full article
(This article belongs to the Special Issue The Changing Paradigms in the Management of Thyroid Cancer)

23 pages, 8466 KB  
Article
Spatiotemporal Variation in Understory Litter Coverage Based on Multi-Angle Remote Sensing Inversion Using Sentinel-2 and MODIS BRDF Imagery
by Zhujun Gu, Jiasheng Wu, Qinghua Fu, Xiaofeng Yue, Guanghui Liao, Yanzi He, Xianzhi Mai, Jia Liu, Qiuyin He and Quanman Lin
Remote Sens. 2026, 18(7), 1070; https://doi.org/10.3390/rs18071070 - 2 Apr 2026
Viewed by 245
Abstract
The forest understory litter fraction (FVCy) is a critical indicator for evaluating the effectiveness of “understory erosion” control in red soil regions; however, its high-precision, large-scale monitoring remains challenging due to canopy occlusion. This study proposes an FVCy inversion framework that integrates high-spatial-resolution Sentinel-2 imagery with multi-angular prior knowledge from MODIS BRDF products. First, a linear mapping model between multi-band reflectances at 0° and 45° view angles was constructed using 500 m MODIS MCD43A1 products (R² > 0.8). This model was subsequently employed as a physical prior for anisotropic characterization and transferred to 10 m Sentinel-2 imagery to generate a long-term, dual-angle reflectance dataset. Subsequently, the four-scale geometric-optical model was utilized to decouple canopy and understory background signals, followed by quantitative FVCy inversion using a pixel-based dimidiate model. Validation results confirmed the reliability of the framework (R² = 0.74, RMSE = 0.1073). Spatiotemporal evolution analysis indicated a significant upward trend in FVCy across Changting County from 2016 to 2025, with over 90% of the area showing improvement. The proportion of high-coverage areas (FVCy > 0.75) increased from 10% to 38%, exhibiting a “high in the center, low in the periphery” spatial pattern that aligns closely with core ecological restoration zones. Stability and persistence analyses further revealed that 61.18% of the study area reached moderate-to-high stability, and 70% of pixels exhibited a “positive persistence-improvement” trend, highlighting a pronounced inertia-driven enhancement in ecological recovery. This study provides a refined technical pathway for assessing soil and water conservation benefits in red soil regions. Full article
(This article belongs to the Section Ecological Remote Sensing)
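
The dimidiate pixel model mentioned above is the standard two-endmember linear mixing formula: the fractional cover is the pixel's index value rescaled between a fully bare and a fully covered endmember. A minimal sketch with illustrative numbers follows; the canopy/background decoupling step is assumed to have been done already.

```python
import numpy as np

def dimidiate_fraction(index, index_bare, index_full):
    """Standard dimidiate (two-endmember) pixel model: fraction of a pixel
    covered by the target component, given index values for fully bare and
    fully covered endmember pixels. Applied per pixel to the decoupled
    understory signal in the framework described above.
    """
    f = (index - index_bare) / (index_full - index_bare)
    return np.clip(f, 0.0, 1.0)

understory_index = np.array([0.12, 0.35, 0.58])  # illustrative decoupled background values
print(dimidiate_fraction(understory_index, index_bare=0.05, index_full=0.65))
```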

21 pages, 13827 KB  
Article
An Integrated Model Based on CNN-Transformer and PLUS for Urban Expansion Simulation in the Yangtze River Delta, China
by Linyu Ma, Jue Xiao, Gan Teng, Ting Zhang and Longqian Chen
Remote Sens. 2026, 18(7), 1071; https://doi.org/10.3390/rs18071071 - 2 Apr 2026
Viewed by 272
Abstract
Land use changes within urban agglomerations exhibit significant spatiotemporal heterogeneity and regional diversity. In urban agglomeration land simulation, traditional models often struggle to systematically capture these variations. We introduce GCTP, a novel framework that integrates guided Geographical zoning, a Convolutional Neural Network (CNN)-Transformer, and the Patch-generating Land Use Simulation (PLUS) model. Initially, guided K-means clustering was employed for geographic zoning to characterize regional spatial non-stationarity. Then, a CNN-Transformer network leveraged self-attention mechanisms to capture multi-scale spatial correlations, obtaining pixel-level development probabilities. Finally, these probabilities were fused with the outputs of the PLUS Land Expansion Analysis Strategy (LEAS) to drive the PLUS Cellular Automata with multi-type Random Seeds (CARS) module for patch-level simulation. The results demonstrate the following: (1) The embedding of guided zoning enabled the model to achieve an Overall Accuracy (OA) of 0.941, effectively mitigating global simulation bias. (2) The optimal simulation performance occurred at a fusion weight of 0.81, yielding a Kappa of 0.8917 and a Figure of Merit (FoM) of 0.3830, significantly exceeding those of a single model. (3) The 2030 simulation indicates that the GCTP model effectively reduces isolated pixels at urban fringes. The GCTP generates neighborhood patterns with high spatial compactness and geographic consistency. This study highlights the significant advantages of integrating long-range spatial perception with geographical heterogeneity constraints in the land expansion simulation of urban agglomerations. The findings support more precise territorial spatial planning practices. Full article
(This article belongs to the Special Issue Machine Learning of Remote Sensing Imagery for Land Cover Mapping)
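
The abstract reports an optimal fusion weight of 0.81 but not the exact fusion formula; a plausible reading is a per-pixel convex combination of the two development-probability surfaces, as sketched below with random stand-in arrays.

```python
import numpy as np

# Illustrative fusion of the two per-pixel development-probability surfaces;
# 0.81 is the weight the abstract reports as optimal, the arrays are stand-ins.
w = 0.81
p_cnn_transformer = np.random.rand(256, 256)   # stand-in for the CNN-Transformer output
p_leas = np.random.rand(256, 256)              # stand-in for the PLUS-LEAS output
p_fused = w * p_cnn_transformer + (1.0 - w) * p_leas   # drives the PLUS-CARS simulation
print(p_fused.shape, p_fused.mean())
```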

14 pages, 5017 KB  
Article
Calibrated Feature Fusion: Enhancing Few-Shot Industrial Anomaly Detection via Cross-Stage Representation Alignment
by Shuangjun Zheng, Songtao Zhang, Zhihuan Huang, Kuoteng Sun, Yuzhong Gong, Jiayan Wen and Eryun Liu
Sensors 2026, 26(7), 2164; https://doi.org/10.3390/s26072164 - 31 Mar 2026
Viewed by 324
Abstract
Few-shot industrial anomaly detection has received increasing attention because it does not require a large number of abnormal samples for training. Recent few-shot industrial anomaly detection methods commonly fuse multi-stage features from frozen vision transformers for anomaly scoring. However, we find that such direct fusion suffers from cross-stage representation misalignment: shallow and deep features differ significantly in scale and semantic granularity, leading to inconsistent anomaly maps and degraded localization. To address this problem, we propose Calibrated Feature Fusion (CFF), a lightweight adapter that enhances feature fusion via cross-stage representation alignment. The CFF module can be integrated into existing state-of-the-art frameworks and operates effectively in few-shot settings. Experiments on MVTec AD and VisA show that CFF consistently improves the state-of-the-art method across 1/2/4-shot settings, achieving gains of up to +1.6% AUROC and +4.1% AP in pixel-level segmentation. Notably, CFF enhances both precision and recall in four-shot scenarios. Ablation studies confirm that cross-stage alignment is key to stable multi-stage fusion. Full article
(This article belongs to the Section Fault Diagnosis & Sensors)
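
Cross-stage misalignment of the kind described above is often reduced by normalizing each stage's features and bringing them to a common resolution before fusion. The sketch below shows only that generic alignment step; the learned CFF adapter itself is not detailed in the abstract and is not reproduced here.

```python
import numpy as np

def calibrated_fuse(shallow, deep):
    """Sketch of cross-stage alignment before fusion: both feature maps are
    L2-normalized per location (removing the scale mismatch between stages)
    and the deeper, coarser map is upsampled to the shallow resolution.
    """
    def l2norm(f):
        return f / (np.linalg.norm(f, axis=0, keepdims=True) + 1e-8)
    c, h, w = shallow.shape
    scale = h // deep.shape[1]
    deep_up = np.repeat(np.repeat(deep, scale, axis=1), scale, axis=2)  # nearest-neighbour upsample
    return np.concatenate([l2norm(shallow), l2norm(deep_up)], axis=0)

s = np.random.rand(64, 32, 32)
d = np.random.rand(64, 16, 16)
print(calibrated_fuse(s, d).shape)  # (128, 32, 32)
```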

21 pages, 6938 KB  
Article
IllumiSIFT: A Cascade Framework for DoG Pyramid Learning in Darkness
by Dewan Fahim Noor, Mohammed Rashid Chowdhury and Sadia Sikder
Sensors 2026, 26(7), 2147; https://doi.org/10.3390/s26072147 - 31 Mar 2026
Viewed by 221
Abstract
In visual object recognition problems, low light exposure and low-quality images present significant challenges in navigation, surveillance, and image retrieval applications, where reliable feature detection is critical. Although recent deep learning–based image enhancement methods improve visual quality in the pixel domain, these improvements often do not translate to downstream machine vision performance, as important local gradient structures required for stable key point detection are frequently suppressed. In this work, we propose IllumiSIFT, a task-driven dark image enhancement framework that focuses on preserving Scale-Invariant Feature Transform (SIFT) key points by directly learning the Difference-of-Gaussian (DoG) pyramid from low-light image inputs. Unlike conventional pixel-level recovery approaches, the proposed method employs a cascaded residual learning architecture to predict Gaussian-blurred representations at multiple scales, enabling the generation of enhanced DoG images that are inherently aligned with the SIFT detection process. Extensive experiments conducted on the CDVS, Oxford Buildings, and Paris datasets demonstrate that the proposed approach consistently outperforms state-of-the-art enhancement methods in downstream SIFT matching performance under severe low-light conditions. These results confirm that gradient-domain, task-aligned enhancement provides a more effective and practical solution for recognition-centric low-light imaging applications. Full article
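
For context, the Difference-of-Gaussian pyramid that IllumiSIFT learns to predict is the representation SIFT builds by subtracting successively blurred copies of the image. A minimal single-octave sketch follows, using SciPy's Gaussian filter; the sigma and level count are conventional SIFT defaults, not values from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def dog_pyramid(img, sigma0=1.6, levels=4, k=2 ** 0.5):
    """Difference-of-Gaussian stack for one octave: successive Gaussian blurs
    are subtracted pairwise, which is the representation IllumiSIFT predicts
    directly from the low-light input instead of enhancing pixels first.
    """
    blurred = [gaussian_filter(img, sigma0 * (k ** i)) for i in range(levels + 1)]
    return [blurred[i + 1] - blurred[i] for i in range(levels)]

img = np.random.rand(64, 64).astype(np.float32)
dogs = dog_pyramid(img)
print(len(dogs), dogs[0].shape)  # 4 (64, 64)
```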

14 pages, 2547 KB  
Article
A Real Maritime Infrared Image Denoising Network Based on Joint Spatial and Wavelet Domains
by He Xu, Lili Dong, Mengge Wang, Yingjie Ji and Fang Tang
J. Mar. Sci. Eng. 2026, 14(7), 644; https://doi.org/10.3390/jmse14070644 - 31 Mar 2026
Viewed by 144
Abstract
High-quality maritime infrared images are crucial for accurate object detection, classification, and segmentation in maritime environments. However, maritime infrared images are often degraded by various types of noise, including non-uniform noise and detector non-uniformity-induced fixed-pattern noise (e.g., vertical stripe noise), which pose significant challenges for the aforementioned high-level vision tasks. A novel network, termed SWDNet (Spatial–Wavelet Joint Denoising Network), is proposed to jointly model spatial- and wavelet-domain features, enabling the effective enhancement of maritime infrared image quality while preserving fine image details. Two parallel sub-networks with distinct architectures are employed to extract complementary information for maritime infrared image denoising. In the upper branch, hierarchical spatial attention aggregation (HSAA) modules are employed at multiple scales to extract spatial features and adaptively assign importance weights to different spatial locations. The lower branch employs a Haar-based DWT for sub-band decomposition, a pixel-grouped self-attention module for boundary refinement, and parallel multi-scale horizontal convolutions to suppress vertical stripe noise in the HL sub-band. Finally, the directional edge enhancement (DEE) module employs learnable Sobel operators in conjunction with multi-layer convolutions to effectively extract and enhance directional edge features. Experimental results demonstrate that, compared with state-of-the-art methods, the proposed SWDNet achieves superior denoising performance on both synthetic and real maritime infrared datasets. Full article
(This article belongs to the Section Ocean Engineering)

16 pages, 2115 KB  
Article
Multi-Scale Structural Response in Calligraphic Layout Deviation Detection
by Xun Shen, Zhanyang Xu, Liangchen Dai and Yaohui Niu
Appl. Sci. 2026, 16(7), 3346; https://doi.org/10.3390/app16073346 - 30 Mar 2026
Viewed by 180
Abstract
Structural deviation detection in calligraphic layout is an important problem in intelligent calligraphy tutoring systems. Existing approaches typically rely on isolated geometric or pixel-level statistics and lack a unified representation across spatial levels and scales. To address this issue, this study formulated a layout analysis for hard-pen regular script written in Tianzigē grids as a structural deviation detection task. A continuous writing density field was first constructed from the binary stroke foreground, and a three-level spatial partition consisting of page level, row-column level, and single cell level regions was established. Multi-scale structural responses (MSRs) were then computed within these regions to characterize layout deviations in a unified manner. Under controlled parametric perturbations, an original dataset of 1200 pages was evaluated to assess detection performance. In repeated experiments, the joint MSR features achieved an AUC of 0.94 and an F1-score of 0.90, outperforming geometric, pixel-statistical, page-level structural, and traditional machine-learning baselines. The results indicate that multi-level MSRs provide complementary structural information for reliable layout deviation detection and offer a useful basis for hierarchical diagnostic feedback in intelligent calligraphy tutoring systems. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
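
The first two steps described above (a continuous writing density field computed from the binary stroke foreground, then aggregated over a spatial partition) can be sketched as follows; the kernel width, page size, and 3x3 partition are illustrative assumptions, not the paper's parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Turn a binary stroke foreground into a continuous writing density field,
# then aggregate it over a coarse row-column partition of the page.
strokes = (np.random.rand(96, 96) > 0.9).astype(float)   # stand-in binary page
density = gaussian_filter(strokes, sigma=4.0)             # continuous density field
cells = density.reshape(3, 32, 3, 32).mean(axis=(1, 3))   # row-column level responses
print(cells.round(3))
```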

41 pages, 22723 KB  
Article
Parameter-Efficient Adaptation of Generative-Foundation (Flux, Qwen) vs. Zero-Shot (Gemini, SAM3) Models for Aerial Image Segmentation
by Dina Shata, Simon Denman, Sara Omrani, Robin Drogemuller, Hend Ali and Ayman Wagdy
Buildings 2026, 16(7), 1369; https://doi.org/10.3390/buildings16071369 - 30 Mar 2026
Viewed by 358
Abstract
Accurate rooftop segmentation from aerial imagery is essential for large-scale urban analysis, including applications such as solar potential assessment and urban monitoring. However, it remains constrained by the high cost of dense annotation and the limited generalisation of supervised models across heterogeneous urban morphologies. This study investigates binary rooftop segmentation by fine-tuning large image-editing foundation models with parameter-efficient Low-Rank Adaptation (LoRA). Using part of a Brisbane metropolitan dataset (split 80/20 into 97 training and 24 testing tiles), three paradigms were evaluated under a unified protocol: zero-shot image-editing models (including Gemini 3 Pro), a segmentation-first baseline (Segment Anything Model 3, SAM3), and LoRA-adapted diffusion models (FLUX.1 Kontext, FLUX.2, and Qwen Image Edit 2509) fine-tuned in 250-step increments up to 5000 steps. Evaluated under zero-shot conditions, the generative models demonstrated varying levels of boundary fidelity. The Gemini model achieved a strong zero-shot baseline with [IoU, Dice] scores of [85%, 91%], followed by the SAM3 baseline, which also achieved a stable [84%, 91%] but exhibited increased false negatives in visually complex scenes. The tested diffusion models (FLUX.1 Kontext, FLUX.2, and Qwen) showed more limited initial spatial overlap, scoring [45%, 55%], [67%, 78%], and [33%, 46%], respectively. Following LoRA adaptation, the FLUX and Qwen models showed substantial improvements, with their respective [IoU, Dice] metrics increasing to [89%, 94%], [82%, 90%], and [87%, 93%]. FLUX.1 Kontext achieved the strongest overall performance at step 4250, yielding a mean IoU of 89% (SD = 3.16%) and a pixel accuracy exceeding 96%. These results demonstrate that parameter-efficient fine-tuning, combined with rigorous evaluation under class-imbalanced conditions, can transform general-purpose generative models into competitive, scalable spatial analysis tools that match or exceed both dedicated segmentation baselines and strong zero-shot multimodal models. Full article
(This article belongs to the Topic Application of Smart Technologies in Buildings)
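
The [IoU, Dice] pairs quoted above are the standard overlap metrics for binary masks; for reference, a minimal implementation is sketched below with a toy prediction and ground truth.

```python
import numpy as np

def iou_dice(pred, gt):
    """IoU and Dice for binary rooftop masks, the two scores reported above."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    iou = inter / union if union else 1.0
    dice = 2 * inter / (pred.sum() + gt.sum()) if (pred.sum() + gt.sum()) else 1.0
    return iou, dice

pred = np.zeros((64, 64), dtype=int); pred[10:40, 10:40] = 1
gt = np.zeros((64, 64), dtype=int); gt[12:42, 12:42] = 1
print(iou_dice(pred, gt))
```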
