Search Results (2,500)

Search Parameters:
Keywords = image semantic segmentation

18 pages, 10514 KB  
Article
Hierarchical Compositional Alignment for Zero-Shot Part-Level Segmentation
by Shan Yang, Shujie Ji, Zhendong Xiao, Xiongding Liu and Wu Wei
Sensors 2026, 26(7), 2130; https://doi.org/10.3390/s26072130 - 30 Mar 2026
Abstract
In robotic fine-grained tasks (e.g., grasping and assembly), precise interaction requires a detailed understanding of object components. While Visual Language Models (VLMs) excel at object-level recognition, they struggle with part-level segmentation (e.g., knife handles), limiting performance in complex scenarios. VLMs face three key challenges: (1) Visual granularity mismatch—object-level features lack part-level details; (2) Semantic hierarchy gaps—parts and objects differ significantly in semantics; (3) Cross-modal bias—CLIP’s text–image alignment favors global over local features. To address these, we propose a one-stage VLM-based part segmentation method. First, the Hierarchy-Aware Feature Selection mechanism analyzes Transformer features at different hierarchies to enhance spatial and semantic precision for part segmentation. Second, the Multi-Hierarchy Feature Adapter bridges object-to-part feature granularity via hierarchical adaptation. Finally, the Hierarchical Multimodal Alignment Module harmonizes classification accuracy and mask integrity via hierarchical vision–language alignment, mitigating the bias of CLIP’s object-level prior knowledge. Experiments show the proposed method improves zero-shot part segmentation performance, achieving 25.86% on Pascal-Part and 13.09% on ADE20K-Part (gains of +0.81% hIoU and +2.96% hIoU over the baseline). This work advances robotic visual perception, with applications in intelligent manufacturing and intelligent services. Full article
(This article belongs to the Section Sensors and Robotics)
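As a rough illustration of the pixel-text alignment that CLIP-style zero-shot segmentation builds on (not the authors' hierarchical method), the sketch below scores dense visual features against one text embedding per part name and takes the arg-max; the random tensors are stand-ins for real CLIP features.

```python
# Minimal sketch: per-pixel cosine similarity between visual features and part
# text embeddings, followed by arg-max. Feature extraction is omitted; the
# tensors below are random stand-ins, not real CLIP outputs.
import torch
import torch.nn.functional as F

def pixel_text_segmentation(pixel_feats: torch.Tensor,
                            text_embeds: torch.Tensor) -> torch.Tensor:
    """pixel_feats: (C, H, W) dense features; text_embeds: (K, C), one per part prompt."""
    C, H, W = pixel_feats.shape
    feats = F.normalize(pixel_feats.reshape(C, -1).T, dim=-1)   # (H*W, C)
    texts = F.normalize(text_embeds, dim=-1)                    # (K, C)
    logits = feats @ texts.T                                    # (H*W, K) similarity scores
    return logits.argmax(dim=-1).reshape(H, W)                  # (H, W) part index map

seg = pixel_text_segmentation(torch.randn(512, 32, 32), torch.randn(5, 512))
print(seg.shape)  # torch.Size([32, 32])
```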

19 pages, 1666 KB  
Article
MTLL: A Novel Multi-Task Learning Approach for Lymphocytic Leukemia Classification and Nucleus Segmentation
by Cuisi Ou, Zhigang Hu, Xinzheng Wang, Kaiwen Cao and Yipei Wang
Electronics 2026, 15(7), 1419; https://doi.org/10.3390/electronics15071419 - 28 Mar 2026
Abstract
Bone marrow cell classification and nucleus segmentation in microscopic images are fundamental tasks for computer-aided diagnosis of lymphocytic leukemia. However, bone marrow cells from different subtypes exhibit high morphological similarity, and structural information is often constrained under optical microscopic imaging, posing challenges for stable and effective feature representation. To address this issue, we propose MTLL (Multitask Model on Lymphocytic Leukemia), a novel multitask approach that performs cell classification and nucleus segmentation within a unified network to exploit their complementary information. The model constructs a hybrid backbone for shared feature representation based on a CNN-Transformer architecture, in which Fuse-MBConv modules are tightly integrated with multilayer multi-scale transformers to enable deep fusion of local texture and global semantic information. For the segmentation branch, we design an AM (Atrous Multilayer Perceptron) decoder that combines atrous spatial pyramid pooling with multilayer perceptrons to fuse multi-scale information and accurately delineate nucleus boundaries. The classification branch incorporates prior knowledge of cell nuclei structures to capture subtle variations in cellular morphology and texture, thereby enhancing the model’s ability to distinguish between leukemia subtypes. Experimental results demonstrate that the MTLL model significantly outperforms existing advanced single-task and multi-task models in both lymphocytic leukemia classification and cell nucleus segmentation. These results validate the effectiveness of the multi-task feature-sharing strategy for lymphocytic leukemia diagnosis using bone marrow microscopic images. Full article
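A minimal sketch, under assumptions, of the kind of joint objective such a multi-task model optimizes: a cross-entropy term for subtype classification plus a Dice term for nucleus segmentation over a shared backbone. The weighting and the binary Dice form are illustrative, not the paper's exact losses.

```python
# Hypothetical joint loss for a shared-backbone classification + segmentation model.
import torch
import torch.nn.functional as F

def dice_loss(seg_logits: torch.Tensor, seg_target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Binary Dice loss; seg_logits and seg_target are (N, 1, H, W), target values in {0, 1}."""
    probs = torch.sigmoid(seg_logits)
    inter = (probs * seg_target).sum(dim=(1, 2, 3))
    union = probs.sum(dim=(1, 2, 3)) + seg_target.sum(dim=(1, 2, 3))
    return 1.0 - ((2 * inter + eps) / (union + eps)).mean()

def multitask_loss(cls_logits, cls_labels, seg_logits, seg_target, w_seg: float = 1.0):
    """Cross-entropy for subtype classification + Dice for nucleus masks."""
    return F.cross_entropy(cls_logits, cls_labels) + w_seg * dice_loss(seg_logits, seg_target)
```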

33 pages, 172200 KB  
Article
HDCGAN+: A Low-Illumination UAV Remote Sensing Image Enhancement and Evaluation Method Based on WPID
by Kelly Chen Ke, Min Sun, Xinyi Wang, Dong Liu and Hanjun Yang
Remote Sens. 2026, 18(7), 999; https://doi.org/10.3390/rs18070999 - 26 Mar 2026
Viewed by 143
Abstract
Remote sensing images acquired by UAVs under nighttime or low-illumination conditions suffer from insufficient illumination, leading to degraded image quality, detail loss, and noise, which restrict their application in public security and disaster emergency scenarios. Although existing machine learning-based enhancement methods can recover part of the missing information, they often cause color distortion and texture inconsistency. This study proposes an improved low-illumination image enhancement method based on a Weakly Paired Image Dataset (WPID), combining the Hierarchical Deep Convolutional Generative Adversarial Network (HDCGAN) with a low-rank image fusion strategy to enhance the quality of low-illumination UAV remote sensing images. First, YCbCr color channel separation is applied to preserve color information from visible images. Then, a Low-Rank Representation Fusion Network (LRRNet) is employed to perform structure-aware fusion between thermal infrared (TIR) and visible images, thereby enabling effective preservation of structural details and realistic color appearance. Furthermore, a weakly paired training mechanism is incorporated into HDCGAN to enhance detail restoration and structural fidelity. To achieve objective evaluation, a structural consistency assessment framework is constructed based on semantic segmentation results from the Segment Anything Model (SAM). Experimental results demonstrate that the proposed method outperforms state-of-the-art approaches in both visual quality and application-oriented evaluation metrics. Full article
(This article belongs to the Section Remote Sensing Image Processing)
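A minimal sketch of the colour-channel separation step described above, assuming OpenCV: the luminance channel of the visible image is enhanced (here with CLAHE as a simple stand-in for the learned enhancer) while the chrominance channels are kept, which is how colour information is preserved.

```python
import cv2
import numpy as np

def enhance_luminance(bgr: np.ndarray) -> np.ndarray:
    """Enhance only the luma channel of a low-light 8-bit BGR image; CLAHE is an illustrative stand-in."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)              # OpenCV orders channels Y, Cr, Cb
    y, cr, cb = cv2.split(ycrcb)
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    y_enhanced = clahe.apply(y)                                  # enhance luminance only
    return cv2.cvtColor(cv2.merge([y_enhanced, cr, cb]), cv2.COLOR_YCrCb2BGR)
```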

24 pages, 4289 KB  
Article
Floor Plan Generation of Existing Buildings Based on Deep Learning and Stereo Vision
by Dejiang Wang and Taoyu Peng
Buildings 2026, 16(7), 1310; https://doi.org/10.3390/buildings16071310 - 26 Mar 2026
Viewed by 200
Abstract
The reinforcement and renovation of existing buildings constitute an important component of the future development of the civil engineering industry. Such projects typically require the original construction drawings of the building. However, for older structures, the original paper-based drawings may be damaged or lost. Moreover, traditional manual surveying and mapping methods are time-consuming, labor-intensive, and limited in accuracy. To address these issues, this paper proposes a floor plan generation method for existing buildings that integrates deep learning and stereo vision based on a fusion of synthetic and real data. First, collaborative modeling and automated rendering between a large language model and Blender are implemented based on the Model Context Protocol (MCP), enabling indoor scene modeling and image acquisition to construct a synthetic dataset containing structural components such as doors, windows, and walls. Meanwhile, manually annotated real indoor images are incorporated. Synthetic and real data are mixed in different proportions to form multiple dataset configurations for model training and validation. Subsequently, the SegFormer model is employed to perform semantic segmentation of indoor components. Combined with stereo camera calibration results, disparity computation is conducted to extract the three-dimensional spatial coordinates of component corner points. On this basis, the architectural floor plan is generated according to the spatial geometric relationships among structural components. Experimental results demonstrate that the proposed method effectively reduces the need for manual annotation and on-site measurement, providing an efficient technical solution for indoor floor plan generation of existing buildings. Full article
(This article belongs to the Topic Application of Smart Technologies in Buildings)
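The stereo step reduces to the standard pinhole relations: depth Z = f·B/d from disparity d, then back-projection of the corner pixel to camera coordinates. The sketch below shows this calculation; the intrinsics and baseline are made-up example values, not calibration results from the paper.

```python
import numpy as np

def backproject(u: float, v: float, disparity: float,
                fx: float, fy: float, cx: float, cy: float, baseline: float) -> np.ndarray:
    """Return (X, Y, Z) camera-frame coordinates of pixel (u, v) given its disparity."""
    Z = fx * baseline / disparity        # depth from disparity
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    return np.array([X, Y, Z])

# Example with hypothetical calibration values (fx, fy, cx, cy in pixels, baseline in metres).
print(backproject(700.0, 380.0, 12.5, fx=1050.0, fy=1050.0, cx=640.0, cy=360.0, baseline=0.12))
```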

26 pages, 3329 KB  
Article
Multi-Class Weed Quantification Based on U-Net Convolutional Neural Networks Using UAV Imagery
by Lucía Sandoval-Pillajo, Marco Pusdá-Chulde, Jorge Pazos-Morillo, Pedro Granda-Gudiño and Iván García-Santillán
Appl. Sci. 2026, 16(7), 3149; https://doi.org/10.3390/app16073149 - 25 Mar 2026
Viewed by 445
Abstract
Weed identification and quantification are processes that are usually manual, subjective, and error-prone. Weeds compete with crops for nutrients, minerals, physical space, sunlight, and water. Thus, weed identification is a crucial component of precision agriculture for autonomous removal, site-specific treatments, efficient weed control, and sustainability. Convolutional Neural Networks (CNNs) are widely used in weed identification. This work implemented CNN models for semantic segmentation based on the U-Net architecture to automatically segment and quantify weeds in potato crops using RGB images acquired by a drone at 9–10 m height, flying at 1 m/s. Remote sensing images are affected by factors that degrade image quality and the model’s accuracy. Five U-Net variants were evaluated: the original U-Net, Residual U-Net, Double U-Net, Modified U-Net, and AU-Net. The models were trained using the TensorFlow/Keras frameworks on Google Colab Pro+, following the Knowledge Discovery in Databases (KDD) methodology for image analysis. Each model was trained on a diverse custom dataset collected in uncontrolled environments, considering six classes: background, Broadleaf dock (Rumex obtusifolius), Dandelion (Taraxacum officinale), Kikuyu grass (Cenchrus clandestinum), other weed species, and the potato crop (Solanum tuberosum L.). Segmentation performance was assessed using the Mean Dice Coefficient, Mean IoU, and Dice Loss metrics. The results showed that the Residual U-Net model performed the best in multi-class segmentation, achieving a Mean IoU of 0.8021, a performance comparable to or superior to that reported by other authors. Additionally, a Student’s t-test was applied to complement the data analysis, suggesting that the model is reliable for weed quantification. Full article
(This article belongs to the Collection Agriculture 4.0: From Precision Agriculture to Smart Agriculture)
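For reference, the Mean IoU reported above is computed per class and averaged; a small sketch (not the authors' evaluation code) over integer label maps with the six classes:

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int = 6) -> float:
    """pred, gt: integer label maps of equal shape; classes absent from both maps are skipped."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 6, size=(256, 256))
gt = np.random.randint(0, 6, size=(256, 256))
print(round(mean_iou(pred, gt), 4))
```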

25 pages, 13685 KB  
Article
Vision and Language Reference for a Segment Anything Model for Few-Shot Segmentation
by Kosuke Sakurai, Ryotaro Shimizu and Masayuki Goto
J. Imaging 2026, 12(4), 143; https://doi.org/10.3390/jimaging12040143 - 24 Mar 2026
Viewed by 215
Abstract
Segment Anything Model (SAM)-based few-shot segmentation models traditionally rely solely on annotated reference images as prompts, which inherently limits their accuracy due to an over-reliance on visual cues and a lack of semantic context. This reliance leads to incorrect segmentation, where visually similar objects from different categories are incorrectly identified as the target object. We propose Vision and Language Reference Prompt into SAM (VLP-SAM), a novel few-shot segmentation model that integrates both visual information of reference images and semantic information of text labels into SAM. VLP-SAM introduces a vision-language model (VLM) with pixel–text matching into the prompt encoder for SAM, effectively leveraging textual semantic consistency while preserving SAM’s extensive segmentation knowledge. By incorporating task-specific structures such as an attention mask, our model achieves superior few-shot segmentation performance with only 1.4 M learnable parameters. Evaluations on PASCAL-5i and COCO-20i datasets demonstrate that VLP-SAM significantly outperforms previous methods by 6.8% and 9.3% in mIoU, respectively. Furthermore, VLP-SAM exhibits strong generalization across unseen objects and cross-domain scenarios, highlighting the robustness provided by textual semantic guidance. This study offers an effective and scalable framework for few-shot segmentation with multimodal prompts. Full article

18 pages, 6071 KB  
Article
DFENet: A Novel Dual-Path Feature Extraction Network for Semantic Segmentation of Remote Sensing Images
by Li Cao, Zishang Liu, Yan Wang and Run Gao
J. Imaging 2026, 12(3), 141; https://doi.org/10.3390/jimaging12030141 - 23 Mar 2026
Viewed by 205
Abstract
Semantic segmentation of remote sensing images (RSIs) is a fundamental task in geoscience research. However, designing efficient feature fusion modules remains challenging for existing dual-branch or multi-branch architectures. Furthermore, existing deep learning-based architectures predominantly concentrate on spatial feature modeling and context capturing while neglecting the exploration and utilization of critical frequency-domain features, which are crucial for addressing semantic confusion and blurred boundaries in complex remote sensing scenes. To address the challenges of feature fusion and the lack of frequency-domain information, we propose a novel dual-path feature extraction network (DFENet) in this paper. Specifically, a dual-path module (DPM) is developed in DFENet to extract global and local features, respectively. In the global path, after applying a channel splitting strategy, four feature extraction strategies are integrated to extract global features at different granularities. To supplement frequency-domain information, a frequency-domain feature extraction block (FFEB) built around the discrete wavelet transform (DWT) is designed to effectively capture both high- and low-frequency components. Experimental results show that our method outperforms existing state-of-the-art methods in terms of segmentation performance, achieving a mean intersection over union (mIoU) of 83.09% on the ISPRS Vaihingen dataset and 86.05% on the ISPRS Potsdam dataset. Full article
(This article belongs to the Section Image and Video Processing)
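A minimal sketch of the frequency-domain idea, assuming PyWavelets: a single-level 2-D DWT splits an image (or feature map) into a low-frequency approximation and three high-frequency detail sub-bands; the FFEB itself is a learned module built around this decomposition.

```python
import numpy as np
import pywt

image = np.random.rand(256, 256).astype(np.float32)    # stand-in for one input channel
cA, (cH, cV, cD) = pywt.dwt2(image, 'haar')             # approximation + horizontal/vertical/diagonal details
print(cA.shape, cH.shape)                                # each sub-band is half-resolution: (128, 128)
```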

30 pages, 2362 KB  
Article
SGCAD: A SAR-Guided Confidence-Gated Distillation Framework of Optical and SAR Images for Water-Enhanced Land-Cover Semantic Segmentation
by Junjie Ma, Zhiyi Wang, Yanyi Yuan and Fengming Hu
Remote Sens. 2026, 18(6), 962; https://doi.org/10.3390/rs18060962 - 23 Mar 2026
Viewed by 186
Abstract
Multimodal fusion of synthetic aperture radar (SAR) and optical imagery is widely used in Earth observation for applications such as land-cover mapping and surface-water mapping (including post-event flood mapping under near-synchronous acquisitions) and land-use inventory. Optical images provide rich spectral and texture cues, whereas SAR offers all-weather structural information that is complementary but heterogeneous. In practice, this heterogeneity often introduces fusion conflicts in multi-class segmentation, causing critical categories such as water bodies to be under-optimized. To address this issue, this paper presents a SAR-guided class-aware knowledge distillation (SGCAD) method for multimodal semantic segmentation. First, a SAR-only HRNet is trained as a water-expert teacher to learn discriminative backscattering and boundary priors for water extraction. Second, a lightweight multimodal student model (LightMCANet) is optimized using a class-aware distillation strategy that transfers teacher knowledge only within high-confidence water regions, thereby suppressing noisy supervision and reducing interference to other classes. Third, a SAR edge guidance module (SEGM) is introduced in the decoder to enhance boundary continuity for slender structures such as water bodies and roads. Overall, SGCAD improves targeted category learning while maintaining stable performance across the remaining classes. Experiments on a self-built dataset from GF-1 optical and LuTan-1 SAR imagery demonstrate higher overall accuracy and more coherent water/road predictions than representative baselines. Future work will extend the proposed distillation scheme to additional categories and broader geographic scenes. Full article
(This article belongs to the Section Remote Sensing Image Processing)
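A hedged sketch of the class-aware, confidence-gated distillation idea: the teacher's soft predictions supervise the student only at pixels where the teacher is highly confident about the water class. The temperature, threshold, and class index are illustrative assumptions, not the paper's settings.

```python
import torch
import torch.nn.functional as F

def gated_distill_loss(student_logits, teacher_logits, water_idx: int = 1,
                       tau: float = 2.0, conf_thresh: float = 0.9) -> torch.Tensor:
    """student_logits, teacher_logits: (N, C, H, W). KL distillation masked to confident water pixels."""
    teacher_prob = teacher_logits.softmax(dim=1)
    gate = (teacher_prob[:, water_idx] > conf_thresh).float()            # (N, H, W) confidence mask
    kl = F.kl_div(F.log_softmax(student_logits / tau, dim=1),
                  F.softmax(teacher_logits / tau, dim=1),
                  reduction="none").sum(dim=1)                            # per-pixel KL divergence
    return (kl * gate).sum() / gate.sum().clamp(min=1.0) * tau * tau
```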

30 pages, 18176 KB  
Article
CRECA-Net: Class Representation-Enhanced Class-Aware Network for Semantic Segmentation of High-Resolution Remote Sensing Images
by Ruolan Liu, Bingcai Chen, Lin Yu and Shaodong Zhang
Remote Sens. 2026, 18(6), 950; https://doi.org/10.3390/rs18060950 - 21 Mar 2026
Viewed by 152
Abstract
High-resolution remote sensing (RS) images exhibit complex backgrounds, large intra-class variability, and low inter-class differences, posing substantial challenges for semantic segmentation. Although existing class-level contextual modeling methods partially alleviate these issues, they often overlook the importance of accurate and discriminative class representations and fail to effectively handle hard samples during training. To address these limitations, we propose CRECA-Net, a class representation-enhanced class-aware network designed from two complementary perspectives: class prototype refinement and difficulty-aware learning. Specifically, we introduce a class prototype refinement (CPR) module that improves class representations through pixel selection, confidence-aware contribution weighting, and an inter-class prototype separation loss, yielding more reliable and discriminative class centers. In addition, class-level context aggregation (CLCA) modules capture pixel-to-class prototype correlations via cross-attention to inject class-aware semantics into decoder features, thereby reducing interference from cluttered backgrounds and visually similar categories. Furthermore, a difficulty-aware (DA) loss dynamically estimates pixel-wise difficulty and redistributes the loss weights within each image, gradually shifting the learning focus from easy to hard samples while maintaining training stability. Extensive experiments on two benchmark RS segmentation datasets demonstrate that CRECA-Net consistently outperforms state-of-the-art methods across multiple evaluation metrics. Full article
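The basic operation behind class prototypes is masked average pooling of decoder features under per-class probability maps; the CPR and CLCA modules add pixel selection, confidence weighting, and cross-attention on top of this. A minimal sketch of that pooling step, under assumptions:

```python
import torch

def class_prototypes(feats: torch.Tensor, soft_masks: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """feats: (N, C, H, W); soft_masks: (N, K, H, W) per-class probabilities.
    Returns (N, K, C): one prototype vector per class per image."""
    num = torch.einsum("nchw,nkhw->nkc", feats, soft_masks)    # mask-weighted feature sums
    den = soft_masks.sum(dim=(2, 3)).unsqueeze(-1) + eps        # per-class mask areas
    return num / den

protos = class_prototypes(torch.randn(2, 64, 32, 32), torch.rand(2, 6, 32, 32).softmax(dim=1))
print(protos.shape)  # torch.Size([2, 6, 64])
```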

31 pages, 12141 KB  
Article
A Reliability-Guided Unsupervised Domain Adaptation Framework for Robust Semantic Segmentation Under Adverse Driving Conditions
by Nan Xia and Guoqing Hu
Appl. Sci. 2026, 16(6), 3036; https://doi.org/10.3390/app16063036 - 20 Mar 2026
Viewed by 143
Abstract
Adverse weather and low illumination remain major challenges for autonomous driving perception, where semantic segmentation must stay reliable despite severe appearance degradation. In unsupervised domain adaptation without target annotations, self-training is widely used, but it is often limited by the inconsistent quality of teacher-generated pseudo labels across samples, regions, and training stages. This paper presents RaDA, a reliability-aware self-training framework that regulates pseudo supervision at three levels. First, a progressive exposure strategy determines which target images are admitted for training. Second, spatial reliability weighting suppresses gradients from degraded regions while retaining informative supervision. Third, adaptive teacher update scheduling stabilizes pseudo label generation over time. Experiments on real-world adverse driving benchmarks show that RaDA improves robustness, training stability, and cross-dataset generalization compared with strong baselines. Compared with the previous state-of-the-art method MIC, RaDA achieves mIoU gains of 10.6 percentage points on Foggy Zurich and 8.8 percentage points on the Foggy Driving benchmark. These results indicate that explicit reliability regulation can strengthen self-training domain adaptation for semantic segmentation in autonomous driving under challenging environmental conditions. Full article
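Self-training of this kind typically keeps a slowly updated teacher that generates the pseudo labels; RaDA's adaptive scheduling would vary how that update behaves over training. A minimal sketch of a plain exponential-moving-average (EMA) teacher update, with a fixed illustrative momentum:

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module, momentum: float = 0.999) -> None:
    """Move each teacher parameter slightly toward the corresponding student parameter."""
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
```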

23 pages, 7102 KB  
Article
Detection of Uniform Corrosion in Steel Pipes Using a Mobile Artificial Vision System
by Rafael Antonio Rodríguez Ospino, Cristhian Manuel Durán Acevedo and Jeniffer Katerine Carrillo Gómez
Corros. Mater. Degrad. 2026, 7(1), 21; https://doi.org/10.3390/cmd7010021 - 20 Mar 2026
Viewed by 232
Abstract
Corrosion in steel pipelines can cause critical failures in industrial systems, while conventional inspection methods such as radiography and ultrasonic testing are costly and require specialized personnel. This study presents a mobile computer vision system for automated corrosion detection inside steel pipes using deep learning-based visual analysis. The proposed system consists of a Raspberry Pi 4-based mobile robot equipped with a high-resolution camera for internal inspection. Acquired images were processed using color-space transformations (RGB–HSV), filtering, and segmentation. Convolutional neural networks and semantic segmentation models, including YOLOv8-seg (Instance segmentation) and DeepLabV3 (Semantic segmentation), were trained on a custom corrosion image dataset to identify corroded regions. Real-time visualization was implemented via Flask-based video streaming. Experimental results demonstrated high detection accuracy for uniform corrosion, achieving a mean Intersection over Union (mIoU) above 0.98 and a precision of 0.99 with the YOLOv8-seg model. These results indicate that the proposed system enables reliable and automated corrosion inspection, with the potential to reduce inspection costs and improve operational efficiency. Future work will focus on enhancing real-time performance through hardware optimization. Full article
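A minimal sketch of the RGB–HSV preprocessing mentioned above, assuming OpenCV: converting to HSV and thresholding reddish-brown hues yields a coarse corrosion mask before the learned segmentation models are applied. The hue, saturation, and value bounds are hypothetical, not the study's values.

```python
import cv2
import numpy as np

def coarse_corrosion_mask(bgr: np.ndarray) -> np.ndarray:
    """Return a binary mask of rust-like hues (illustrative thresholds only)."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    lower = np.array([5, 60, 40], dtype=np.uint8)      # hypothetical lower HSV bound
    upper = np.array([25, 255, 200], dtype=np.uint8)   # hypothetical upper HSV bound
    return cv2.inRange(hsv, lower, upper)
```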

15 pages, 20835 KB  
Article
A Boundary-Assisted Multi-Scale Transformer for Object-Level Building Extraction from Satellite Remote Sensing Imagery
by Suju Li, Haoran Wang, Jing Yao, Zhaoming Wu and Zhengchao Chen
Electronics 2026, 15(6), 1301; https://doi.org/10.3390/electronics15061301 - 20 Mar 2026
Viewed by 189
Abstract
Building extraction is a core task in the semantic segmentation of satellite remote sensing imagery. Conventional pixel-level segmentation methods often prioritize texture over geometric structure, resulting in suboptimal performance in complex scenes affected by illumination variations, shadows, and scale changes. In this article, an innovative object-level building extraction approach is introduced to better capture the geometric structure of buildings, which incorporates superpixel segmentation to represent images as a set of adjacent regions. The proposed model consists of a cascade multi-scale fusion module (CMSFM) that progressively integrates contextual information across different receptive fields, along with a boundary-assisted loss function designed to enhance edge delineation and improve object-level accuracy. The experimental results on the WHU building dataset and the Massachusetts Buildings Dataset show that the proposed method notably outperforms other representative semantic segmentation approaches, such as FCN, UNet, DeepLab V3, and SETR. On the WHU dataset, MRLNet achieves the highest MIoU of 90.14% and the highest F1 score of 92.47%. On the Massachusetts Buildings Dataset, MRLNet attains the highest MIoU of 83.14% and the highest F1 score of 90.46%. In addition, our building extraction model achieves a substantial performance improvement after the addition of the CMSFM module and the boundary-assisted loss function, demonstrating the effectiveness of these two enhancements. It is expected that this research can provide a promising tool for the accurate extraction of buildings from satellite remote sensing images, which is indispensable in urban planning, disaster assessment, and other fields. Full article
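A hedged sketch of the superpixel step that turns pixel-level predictions into object-level regions, assuming scikit-image's SLIC; the segment count and compactness are arbitrary example parameters, not the paper's configuration.

```python
import numpy as np
from skimage.segmentation import slic

tile = np.random.rand(256, 256, 3)                      # stand-in for a satellite image tile
segments = slic(tile, n_segments=300, compactness=10.0, start_label=0)
print(int(segments.max()) + 1, "superpixel regions")    # label map of adjacent regions
```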

46 pages, 33541 KB  
Article
AIFloodSense: A Global Aerial Imagery Dataset for Semantic Segmentation and Understanding of Flooded Environments
by Georgios Simantiris, Konstantinos Bacharidis, Apostolos Papanikolaou, Petros Giannakakis and Costas Panagiotakis
Remote Sens. 2026, 18(6), 938; https://doi.org/10.3390/rs18060938 - 19 Mar 2026
Viewed by 198
Abstract
Accurate flood detection is critical for disaster response, yet the scarcity of diverse annotated datasets hinders robust model development. Existing resources typically suffer from limited geographic scope and insufficient annotation granularity, restricting the generalization capabilities of computer vision methods. To bridge this gap, we introduce AIFloodSense, a comprehensive evaluation benchmark designed to advance domain-generalized Artificial Intelligence for climate resilience. The dataset comprises 470 high-resolution aerial images capturing 230 distinct flood events across 64 countries and six continents. Unlike prior benchmarks, AIFloodSense ensures exceptional global diversity and temporal relevance (2022–2024), supporting three complementary tasks: (i) Image Classification, featuring novel sub-tasks for environment type, camera angle, and continent recognition; (ii) Semantic Segmentation, providing precise pixel-level masks for flood, sky, buildings, and background; and (iii) Visual Question Answering (VQA), enabling natural language reasoning for disaster assessment. We provide baseline benchmarks for all tasks using state-of-the-art architectures, demonstrating the dataset’s complexity and its utility in fostering robust AI tools for environmental monitoring. Crucially, we show that despite its compact size, AIFloodSense enables better generalization on external test sets than much larger alternatives, validating the premise that rigorous diversity is more effective than scale for training robust flood detection models, and is made publicly available to accelerate further research in the field. Full article

25 pages, 233246 KB  
Article
Seamlessly Natural: Image Stitching with Natural Appearance Preservation
by Gaetane Lorna N. Tchana, Damaris Belle M. Fotso, Antonio Hendricks and Christophe Bobda
Technologies 2026, 14(3), 186; https://doi.org/10.3390/technologies14030186 - 19 Mar 2026
Viewed by 171
Abstract
Conventional image stitching pipelines predominantly rely on homographic alignment, whose planar assumption often breaks down in dual-camera configurations capturing non-planar scenes, producing geometric warping, bulging, and structural distortion. To address these limitations, this paper presents SENA (Seamlessly Natural), a geometry-driven image stitching approach with three complementary contributions. First, we propose a hierarchical affine-based warping strategy that combines global affine initialization, local affine refinement, and a smooth free-form deformation field regulated by seamguard adaptive smoothing. This multi-scale design preserves local shape, parallelism, and aspect ratios, thereby reducing the hallucinated distortions commonly associated with homography-based models. Second, SENA incorporates a geometry-driven adequate zone detection mechanism that identifies regions with reduced parallax directly from the disparity consistency of correspondences filtered by RANSAC, without relying on semantic segmentation or depth estimation. Third, within this zone, anchor-based seamline cutting and segmentation enforce one-to-one geometric correspondence between image pairs, reducing ghosting and smearing artifacts. Extensive experiments demonstrate that SENA achieves 26.2 dB PSNR and 0.84 SSIM, obtains the lowest BRISQUE score (33.4) among compared methods, and reduces runtime by 79% on average across resolutions. These results confirm improved structural fidelity and computational efficiency while maintaining competitive alignment accuracy. Full article
(This article belongs to the Special Issue Image Analysis and Processing)
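A small sketch (not the SENA pipeline) of the global affine initialization step: a 2-D affine transform is fitted to matched correspondences with RANSAC, whose inlier mask is also the kind of information disparity-consistency filtering works from. The correspondences here are synthetic.

```python
import cv2
import numpy as np

src = (np.random.rand(50, 2) * 500).astype(np.float32)                      # synthetic matched points
A_true = np.array([[0.98, 0.05, 12.0], [-0.04, 1.01, -7.0]], dtype=np.float32)
dst = src @ A_true[:, :2].T + A_true[:, 2]

A_est, inliers = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC, ransacReprojThreshold=3.0)
print(A_est)                           # recovered 2 x 3 affine matrix
print(int(inliers.sum()), "inliers")   # RANSAC inlier count
```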

22 pages, 8609 KB  
Article
Integrating SimAM Attention and S-DRU Feature Reconstruction for Sentinel-2 Imagery-Based Soybean Planting Area Extraction
by Haotong Wu, Xinwen Wan, Rong Qian, Chao Ruan, Jinling Zhao and Chuanjian Wang
Agriculture 2026, 16(6), 693; https://doi.org/10.3390/agriculture16060693 - 19 Mar 2026
Viewed by 221
Abstract
Accurate and stable acquisition of the spatial distribution of soybean planting areas is essential for supporting precision agricultural monitoring and ensuring food security. However, crop remote-sensing mapping for specific regions still faces critical data bottlenecks: high-precision, large-scale pixel-level annotation is costly, resulting in scarce available labeled samples that make it difficult to construct large-scale training datasets. Although parameter-intensive models such as FCN and SegNet can achieve sufficient end-to-end training on large-scale public remote sensing datasets like LoveDA, when directly applied to the data-limited dataset in this study area, the models are prone to overfitting, leading to a significant decline in generalization ability. To address these issues, this study proposes a lightweight U-shaped semantic segmentation model, SimSDRU-Net. The model utilizes a pre-trained VGG-16 backbone to extract shallow texture and deep semantic features. The pre-trained weights mitigate the impact of overfitting in data-limited settings. In the decoding stage, a parameter-free lightweight SimAM attention module enhances effective soybean features and suppresses soil background redundancy, while an embedded S-DRU unit fuses multi-scale features for deep complementary reconstruction to improve edge detail capture. A label dataset was constructed using Sentinel-2 images as the data source and Menard County (USA) as the study area. The USDA CDL was used as a foundation for the dataset, with Google high-resolution images serving as visual interpretation aids. In the context of the experiment, Deeplabv3+ and U-Net++ were compared with U-Net under identical conditions. The results demonstrated that SimSDRU-Net exhibited optimal performance, with MIoU of 89.03%, MPA of 93.81%, and OA of 95.96%. Specifically, SimSDRU-Net uses the SimAM attention module to generate spatial attention weights by analyzing feature statistical differences through an energy function, so as to adaptively enhance soybean texture features. Meanwhile, the S-DRU unit groups, dynamically weights, and cross-branch reconstructs multi-scale convolutional features to preserve fine boundary details and achieve accurate segmentation of soybean plots. The present study demonstrates that SimSDRU-Net integrates lightweight design and high precision in data-limited scenarios, thereby providing effective technical support for the rapid extraction of soybean planting areas in North America. Full article
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)
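For reference, the parameter-free SimAM attention mentioned above weights each activation by an energy-based importance score derived from its deviation from the channel mean; a minimal sketch of the standard formulation (e_lambda is the usual stabilizing constant):

```python
import torch

def simam(x: torch.Tensor, e_lambda: float = 1e-4) -> torch.Tensor:
    """x: (N, C, H, W). Returns SimAM-reweighted features of the same shape."""
    n = x.shape[2] * x.shape[3] - 1
    d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)        # squared deviation from channel mean
    v = d.sum(dim=(2, 3), keepdim=True) / n                   # channel variance estimate
    e_inv = d / (4 * (v + e_lambda)) + 0.5                    # inverse energy per activation
    return x * torch.sigmoid(e_inv)

out = simam(torch.randn(1, 16, 32, 32))
print(out.shape)  # torch.Size([1, 16, 32, 32])
```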
