Search Results (533)

Search Parameters:
Keywords = infrared and visible dataset

22 pages, 2485 KiB  
Article
Infrared and Visible Image Fusion Using a State-Space Adversarial Model with Cross-Modal Dependency Learning
by Qingqing Hu, Yiran Peng, KinTak U and Siyuan Zhao
Mathematics 2025, 13(15), 2333; https://doi.org/10.3390/math13152333 - 22 Jul 2025
Abstract
Infrared and visible image fusion plays a critical role in multimodal perception systems, particularly under challenging conditions such as low illumination, occlusion, or complex backgrounds. However, existing approaches often struggle with global feature modelling, cross-modal dependency learning, and preserving structural details in the fused images. In this paper, we propose a novel adversarial fusion framework driven by a state-space modelling paradigm to address these limitations. In the feature extraction phase, a computationally efficient state-space model is utilized to capture global semantic context from both infrared and visible inputs. A cross-modality state-space architecture is then introduced in the fusion phase to model long-range dependencies between heterogeneous features effectively. Finally, a multi-class discriminator, trained under an adversarial learning scheme, enhances the structural fidelity and detail consistency of the fused output. Extensive experiments conducted on publicly available infrared–visible fusion datasets demonstrate that the proposed method achieves superior performance in terms of information retention, contrast enhancement, and visual realism. The results confirm the robustness and generalizability of our framework for complex scene understanding and downstream tasks such as object detection under adverse conditions.
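As a concrete illustration of the adversarial scheme described above, here is a minimal PyTorch sketch of one training step with a multi-class discriminator that labels images as fused, infrared, or visible. The generator here is a placeholder convolutional stack standing in for the paper's state-space model; all layer sizes, class labels, and loss terms are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FusionGenerator(nn.Module):
    """Toy stand-in for the state-space fusion network (plain convs here)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Sigmoid())

    def forward(self, ir, vis):
        return self.net(torch.cat([ir, vis], dim=1))

class MultiClassDiscriminator(nn.Module):
    """Scores an image as fused / infrared / visible (3 classes)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 3))

    def forward(self, x):
        return self.net(x)

gen, disc = FusionGenerator(), MultiClassDiscriminator()
opt_g = torch.optim.Adam(gen.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=1e-4)
ce = nn.CrossEntropyLoss()
FUSED, IR, VIS = 0, 1, 2  # hypothetical class indices

ir, vis = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
fused = gen(ir, vis)

# Discriminator step: learn to tell the three image types apart.
d_loss = (ce(disc(fused.detach()), torch.full((4,), FUSED)) +
          ce(disc(ir), torch.full((4,), IR)) +
          ce(disc(vis), torch.full((4,), VIS)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: push the fused image toward both source classes.
g_loss = ce(disc(fused), torch.full((4,), IR)) + ce(disc(fused), torch.full((4,), VIS))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```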

18 pages, 2570 KiB  
Article
Applicability of Visible–Near-Infrared Spectroscopy to Predicting Water Retention in Japanese Forest Soils
by Rando Sekiguchi, Tatsuya Tsurita, Masahiro Kobayashi and Akihiro Imaya
Forests 2025, 16(7), 1182; https://doi.org/10.3390/f16071182 - 17 Jul 2025
Abstract
This study assessed the applicability of visible–near-infrared (vis-NIR) spectroscopy to predicting the water retention characteristics of forest soils in Japan, which vary widely owing to the presence of volcanic ash. Soil samples were collected from 34 sites, and the volumetric water content was measured at eight levels of matric suction. Spectral data were processed using the second derivative of the absorbance, and regression models were developed using an explainable boosting machine (EBM), an interpretable machine learning method. Although the prediction accuracy was limited owing to the small sample size and soil heterogeneity, EBM performed better under saturated conditions (R2 = 0.30), which suggests that vis-NIR spectroscopy can capture water-related features, especially under wet conditions. Importance analysis consistently selected wavelengths associated with organic matter and hydrated clay minerals, and the important wavelengths clearly shifted from free-water bands in wet soils to mineral-related absorption bands in dry soils. These findings highlight the potential of coupling vis-NIR spectroscopy with interpretable models like EBM for estimating the hydraulic properties of forest soils. Improved accuracy is expected with larger datasets and models stratified by soil type, which would facilitate more efficient soil monitoring in forests.
(This article belongs to the Section Forest Soil)
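For readers unfamiliar with the pipeline named above, the following sketch pairs second-derivative preprocessing with an Explainable Boosting Machine regressor from the interpret library. The spectra and water contents are synthetic stand-ins, and the Savitzky–Golay window settings are assumptions rather than the study's actual parameters.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.metrics import r2_score
from interpret.glassbox import ExplainableBoostingRegressor

rng = np.random.default_rng(0)
X = rng.random((34, 200))   # 34 sites x 200 absorbance bands (synthetic)
y = rng.random(34)          # volumetric water content (synthetic)

# Second derivative of absorbance via Savitzky-Golay filtering
X_d2 = savgol_filter(X, window_length=11, polyorder=3, deriv=2, axis=1)

ebm = ExplainableBoostingRegressor(random_state=0)
ebm.fit(X_d2, y)
print(r2_score(y, ebm.predict(X_d2)))  # in-sample fit; use cross-validation in practice
```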

23 pages, 9575 KiB  
Article
Infrared and Visible Image Fusion via Residual Interactive Transformer and Cross-Attention Fusion
by Liquan Zhao, Chen Ke, Yanfei Jia, Cong Xu and Zhijun Teng
Sensors 2025, 25(14), 4307; https://doi.org/10.3390/s25144307 - 10 Jul 2025
Abstract
Infrared and visible image fusion combines infrared and visible images of the same scene to produce a more informative and comprehensive fused image. Existing deep learning-based fusion methods fail to establish dependencies between global and local information during feature extraction, which leads to unclear scene texture details and low contrast of infrared thermal targets in the fused image. This paper proposes an infrared and visible image fusion network that addresses this issue using a residual interactive transformer and cross-attention fusion. The network first introduces a residual dense module to extract shallow features from the input infrared and visible images. Next, the residual interactive transformer extracts global and local features from the source images and establishes interactions between them; two identical residual interactive transformers are used for further feature extraction. A cross-attention fusion module is then designed to fuse the infrared and visible feature maps extracted by the residual interactive transformer. Finally, an image reconstruction network generates the fused image. The proposed method is evaluated on the RoadScene, TNO, and M3FD datasets. The experimental results show that the fused images produced by the proposed method contain more visible texture details and infrared thermal information, and the approach achieves superior fusion performance compared to nine other methods.
(This article belongs to the Section Sensing and Imaging)
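A minimal PyTorch sketch of the cross-attention fusion idea, in which each modality queries the other for complementary features, is shown below. The channel counts and head counts are illustrative assumptions, not the paper's actual module.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, channels=64, heads=4):
        super().__init__()
        self.ir_to_vis = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.vis_to_ir = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.proj = nn.Conv2d(2 * channels, channels, kernel_size=1)

    def forward(self, ir_feat, vis_feat):
        b, c, h, w = ir_feat.shape
        ir = ir_feat.flatten(2).transpose(1, 2)    # (B, HW, C)
        vis = vis_feat.flatten(2).transpose(1, 2)
        # Each modality attends to the other for complementary information.
        ir_att, _ = self.ir_to_vis(ir, vis, vis)
        vis_att, _ = self.vis_to_ir(vis, ir, ir)
        fused = torch.cat([ir_att, vis_att], dim=2)            # (B, HW, 2C)
        fused = fused.transpose(1, 2).reshape(b, 2 * c, h, w)
        return self.proj(fused)

fusion = CrossAttentionFusion()
out = fusion(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
print(out.shape)  # torch.Size([1, 64, 32, 32])
```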

20 pages, 2516 KiB  
Article
Visual Attention Fusion Network (VAFNet): Bridging Bottom-Up and Top-Down Features in Infrared and Visible Image Fusion
by Yaochen Liu, Yunke Wang and Zixuan Jing
Symmetry 2025, 17(7), 1104; https://doi.org/10.3390/sym17071104 - 9 Jul 2025
Abstract
Infrared and visible image fusion aims to integrate useful information from the source images to obtain a fused image that not only has excellent visual perception but also promotes the performance of subsequent object detection tasks. However, due to the asymmetry between image fusion and object detection tasks, obtaining superior visual effects while facilitating object detection remains challenging in real-world applications. To address this issue, we propose a novel visual attention fusion network for infrared and visible image fusion (VAFNet), which bridges bottom-up and top-down features to achieve high-quality visual perception while improving the performance of object detection tasks. The core idea is that bottom-up visual attention is utilized to extract multi-layer bottom-up features that ensure superior visual perception, while top-down visual attention determines object attention signals related to the detection task. A bidirectional attention integration mechanism is then designed to naturally integrate the two forms of attention into the fused image. Experiments on public and self-collected datasets demonstrate that VAFNet not only outperforms seven state-of-the-art (SOTA) fusion methods in qualitative and quantitative evaluation but also has advantages in facilitating object detection tasks.
(This article belongs to the Special Issue Symmetry in Next-Generation Intelligent Information Technologies)

21 pages, 2471 KiB  
Article
Attention-Based Mask R-CNN Enhancement for Infrared Image Target Segmentation
by Liang Wang and Kan Ren
Symmetry 2025, 17(7), 1099; https://doi.org/10.3390/sym17071099 - 9 Jul 2025
Abstract
Image segmentation is an important method in the field of image processing, and infrared (IR) image segmentation is one of the challenges in this field owing to the unique characteristics of IR data. Infrared imaging uses the infrared radiation emitted by objects to produce images, which can supplement visible-light images under adverse lighting conditions to some extent. However, the low spatial resolution and limited texture details of IR images hinder high-precision segmentation. To address these issues, an attention mechanism based on symmetrical cross-channel interaction, motivated by symmetry principles in computer vision, was integrated into a Mask Region-Based Convolutional Neural Network (Mask R-CNN) framework. A Bottleneck-enhanced Squeeze-and-Attention (BNSA) module was incorporated into the backbone network, and novel loss functions were designed for both the bounding box (Bbox) regression and mask prediction branches to enhance segmentation performance. Furthermore, a dedicated infrared image dataset was constructed to validate the proposed method. The experimental results show that the optimized model achieves higher accuracy and better overall segmentation performance than the original network and other mainstream segmentation models on our dataset, demonstrating how symmetrical design principles can effectively improve complex vision tasks.
(This article belongs to the Special Issue Symmetry and Its Applications in Computer Vision)
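The BNSA module itself is not reproduced here, but the following PyTorch sketch shows a generic bottleneck block with squeeze-and-excitation style channel attention, the family of mechanism the abstract describes; all dimensions and the reduction ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SEBottleneck(nn.Module):
    """Residual conv block whose output channels are reweighted by channel attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.se = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                               # squeeze: global context
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())  # excite: per-channel gates

    def forward(self, x):
        y = self.body(x)
        return x + y * self.se(y)  # residual connection with channel reweighting

print(SEBottleneck(64)(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```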

14 pages, 2742 KiB  
Article
Non-Invasive Painting Pigment Classification Through Supervised Machine Learning
by Michal Piotr Markowski, Solongo Gansukh, Mateusz Madry, Robert Borowiec, Jaroslaw Rogoz and Boguslaw Szczupak
Appl. Sci. 2025, 15(13), 7594; https://doi.org/10.3390/app15137594 - 7 Jul 2025
Abstract
Accurate pigment classification is essential for the analysis and conservation of historical paintings. This study presents a non-invasive approach based on supervised machine learning for classifying pigments using image data acquired under three distinct spectral illumination conditions: visible-light reflectography (VIS), ultraviolet false-color imaging (UVFC), and infrared false-color imaging (IRFC). A dataset was constructed by extracting 32 × 32 pixel patches from pigment samples, resulting in 200 classes representing 40 unique pigments with five preparation variants each. A total of 600 initial raw images were acquired, from which 4000 image patches were extracted for feature engineering. Feature vectors were obtained from the VIS, UVFC, and IRFC images using statistical descriptors derived from the RGB channels. This study demonstrates that accurate pigment classification can be achieved with a minimal set of three illumination types, offering a practical and cost-effective alternative to hyperspectral imaging in real-world conservation practice. Among the evaluated classifiers, the random forest model achieved the highest accuracy (99.30 ± 0.15%). The trained model was subsequently validated on annotated regions of historical paintings, demonstrating its robustness and applicability. The proposed framework offers a lightweight, interpretable, and scalable solution for non-invasive pigment analysis in cultural heritage research that can be implemented with accessible imaging hardware and minimal post-processing.
(This article belongs to the Section Computing and Artificial Intelligence)
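As a rough sketch of this kind of pipeline, the snippet below computes per-channel mean and standard deviation descriptors for patches from three illumination types and trains a scikit-learn random forest. The data are synthetic and the feature set is a simplification of whatever descriptors the study actually used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def describe(patch):
    """Mean and std per RGB channel -> 6 features per illumination type."""
    return np.concatenate([patch.mean(axis=(0, 1)), patch.std(axis=(0, 1))])

# 300 synthetic samples: 3 illuminations x 32x32 RGB patches, 10 pigment classes
X = np.array([np.concatenate([describe(rng.random((32, 32, 3)))
                              for _ in range(3)]) for _ in range(300)])
y = rng.integers(0, 10, 300)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # near chance on random data, by design
```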

39 pages, 18642 KiB  
Article
SDRFPT-Net: A Spectral Dual-Stream Recursive Fusion Network for Multispectral Object Detection
by Peida Zhou, Xiaoyong Sun, Bei Sun, Runze Guo, Zhaoyang Dang and Shaojing Su
Remote Sens. 2025, 17(13), 2312; https://doi.org/10.3390/rs17132312 - 5 Jul 2025
Abstract
Multispectral object detection faces challenges in effectively integrating complementary information from different modalities under complex environmental conditions. This paper proposes SDRFPT-Net (Spectral Dual-stream Recursive Fusion Perception Target Network), a novel architecture that integrates three key innovative modules: (1) the Spectral Hierarchical Perception Architecture (SHPA), which adopts a dual-stream separated structure with independently parameterized feature extraction paths for the visible and infrared modalities; (2) the Spectral Recursive Fusion Module (SRFM), which combines hybrid attention mechanisms with recursive progressive fusion strategies to achieve deep feature interaction through parameter-sharing multi-round recursive processing; and (3) the Spectral Target Perception Enhancement Module (STPEM), which adaptively enhances target region representation and suppresses background interference. Extensive experiments on the VEDAI, FLIR-aligned, and LLVIP datasets demonstrate that SDRFPT-Net significantly outperforms state-of-the-art methods, with improvements of 2.5% in mAP50 and 5.4% in mAP50:95 on VEDAI, 11.5% in mAP50 on FLIR-aligned, and 9.5% in mAP50:95 on LLVIP. Ablation studies further validate the effectiveness of each proposed module. The proposed method provides an efficient and robust solution for multispectral object detection in remote sensing image interpretation, making it particularly suitable for all-weather monitoring from aerial and satellite platforms, as well as for intelligent surveillance and autonomous driving.
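The parameter-sharing recursion in the SRFM can be illustrated with a toy PyTorch module that applies one shared fusion block for several rounds; the block internals, channel sizes, and round count below are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn

class RecursiveFusion(nn.Module):
    def __init__(self, channels=32, rounds=3):
        super().__init__()
        self.rounds = rounds
        self.block = nn.Sequential(  # one block, weights shared across all rounds
            nn.Conv2d(2 * channels, channels, 3, padding=1), nn.ReLU())

    def forward(self, vis, ir):
        fused = ir
        for _ in range(self.rounds):  # progressive refinement with shared parameters
            fused = self.block(torch.cat([vis, fused], dim=1))
        return fused

out = RecursiveFusion()(torch.randn(1, 32, 32, 32), torch.randn(1, 32, 32, 32))
print(out.shape)  # torch.Size([1, 32, 32, 32])
```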

20 pages, 1935 KiB  
Article
Residual Attention Network with Atrous Spatial Pyramid Pooling for Soil Element Estimation in LUCAS Hyperspectral Data
by Yun Deng, Yuchen Cao, Shouxue Chen and Xiaohui Cheng
Appl. Sci. 2025, 15(13), 7457; https://doi.org/10.3390/app15137457 - 3 Jul 2025
Abstract
Visible and near-infrared (Vis–NIR) spectroscopy enables the rapid prediction of soil properties but faces three limitations when paired with conventional machine learning: information loss and overfitting from high-dimensional spectral features; inadequate modeling of nonlinear soil–spectra relationships; and failure to integrate multi-scale spatial features. To address these challenges, we propose ReSE-AP Net, a multi-scale attention residual network with spatial pyramid pooling. Built on convolutional residual blocks, the model incorporates a squeeze-and-excitation channel attention mechanism to recalibrate feature weights and an atrous spatial pyramid pooling (ASPP) module to extract multi-resolution spectral features. This architecture synergistically represents both weak absorption peaks (400–1000 nm) and broad spectral bands (1000–2500 nm), overcoming the limitations of single-scale modeling. Validation on the LUCAS2009 dataset demonstrated that ReSE-AP Net outperformed conventional machine learning by improving R2 by 2.8–36.5% and reducing RMSE by 14.2–69.2%. Compared with existing deep learning methods, it increased R2 by 0.4–25.5% for clay, silt, sand, organic carbon, calcium carbonate, and phosphorus predictions, and decreased RMSE by 0.7–39.0%. Our contributions include a statistical analysis of the LUCAS2009 spectra, identification of the limitations of conventional methods, development of the ReSE-AP Net model, ablation studies, and comprehensive comparisons with alternative approaches.
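To illustrate the ASPP idea adapted to one-dimensional spectra, the sketch below runs parallel dilated convolutions at several rates and merges them, so one layer sees both narrow peaks and broad bands; channel sizes and dilation rates are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ASPP1d(nn.Module):
    def __init__(self, in_ch, out_ch, rates=(1, 2, 4, 8)):
        super().__init__()
        # padding == dilation keeps the spectral length unchanged for kernel size 3
        self.branches = nn.ModuleList([
            nn.Conv1d(in_ch, out_ch, 3, padding=r, dilation=r) for r in rates])
        self.merge = nn.Conv1d(out_ch * len(rates), out_ch, 1)

    def forward(self, x):                       # x: (B, C, bands)
        feats = [b(x) for b in self.branches]   # same length, varying receptive field
        return self.merge(torch.cat(feats, dim=1))

aspp = ASPP1d(16, 32)
print(aspp(torch.randn(2, 16, 420)).shape)      # torch.Size([2, 32, 420])
```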

25 pages, 2723 KiB  
Article
A Human-Centric, Uncertainty-Aware Event-Fused AI Network for Robust Face Recognition in Adverse Conditions
by Akmalbek Abdusalomov, Sabina Umirzakova, Elbek Boymatov, Dilnoza Zaripova, Shukhrat Kamalov, Zavqiddin Temirov, Wonjun Jeong, Hyoungsun Choi and Taeg Keun Whangbo
Appl. Sci. 2025, 15(13), 7381; https://doi.org/10.3390/app15137381 - 30 Jun 2025
Abstract
Face recognition systems often falter when deployed in uncontrolled settings, grappling with low light, unexpected occlusions, motion blur, and degraded sensor signals. Most contemporary algorithms chase raw accuracy yet overlook the pragmatic need for uncertainty estimation and multispectral reasoning rolled into a single framework. This study introduces HUE-Net, a Human-centric, Uncertainty-aware, Event-fused Network designed specifically to thrive under severe environmental stress. HUE-Net marries the visible RGB band with near-infrared (NIR) imagery and high-temporal-event data through an early-fusion pipeline, which proved more responsive than serial approaches. A custom hybrid backbone that couples convolutional networks with transformers keeps the model nimble enough for edge devices. Central to the architecture is the perturbed multi-branch variational module, which distills probabilistic identity embeddings while delivering calibrated confidence scores. Complementing this, an Adaptive Spectral Attention mechanism dynamically reweights each stream to amplify the most reliable facial features in real time. Unlike previous efforts that compartmentalize uncertainty handling, spectral blending, or computational thrift, HUE-Net unites all three in a lightweight package. Benchmarks on the IJB-C and N-SpectralFace datasets illustrate that the system not only secures state-of-the-art accuracy but also exhibits unmatched spectral robustness and reliable probability calibration. The results indicate that HUE-Net is well positioned for forensic missions and humanitarian scenarios where trustworthy identification cannot be deferred.

23 pages, 2463 KiB  
Article
MCDet: Target-Aware Fusion for RGB-T Fire Detection
by Yuezhu Xu, He Wang, Yuan Bi, Guohao Nie and Xingmei Wang
Forests 2025, 16(7), 1088; https://doi.org/10.3390/f16071088 - 30 Jun 2025
Abstract
Forest fire detection is vital for ecological conservation and disaster management. Existing visual detection methods are unstable in smoke-obscured or illumination-variable environments. Although multimodal fusion has demonstrated potential, effectively resolving inconsistencies in smoke features across modalities remains a significant challenge, stemming from the inherent ambiguity between high-temperature regions in infrared imagery and high-brightness regions in visible-light imagery. In this paper, we propose MCDet, an RGB-T forest fire detection framework incorporating target-aware fusion. To alleviate cross-modal feature ambiguity, we design a Multidimensional Representation Collaborative Fusion module (MRCF), which constructs global feature interactions via a state-space model and enhances local detail perception through deformable convolution. A content-guided attention network (CGAN) is then introduced to aggregate multidimensional features via a dynamic gating mechanism. Building on this foundation, the integration of WIoU further suppresses interference from vegetation occlusion and illumination changes, thereby reducing the false detection rate. Evaluated on three forest fire datasets and one pedestrian dataset, MCDet achieves a mean detection accuracy of 77.5%, surpassing advanced methods. This performance makes MCDet a practical solution for enhancing the reliability of early warning systems.
(This article belongs to the Special Issue Advanced Technologies for Forest Fire Detection and Monitoring)

23 pages, 10930 KiB  
Article
Geospatial Analysis of Patterns and Trends of Mangrove Forest in Saudi Arabia: Identifying At-Risk Zone-Based Land Use
by Amal H. Aljaddani
Sustainability 2025, 17(13), 5957; https://doi.org/10.3390/su17135957 - 28 Jun 2025
Abstract
Mangrove ecosystems are crucial coastal habitats that support life and regulate the Earth's atmosphere. However, these ecosystems face prominent threats from anthropogenic activities and environmental constraints. The Saudi Arabian coast is particularly vulnerable to species extinction and biodiversity loss owing to the fragility of its ecosystem, which highlights the need to understand the spatial and temporal dynamics of mangrove forests in desert environments. This is the first national study to quantify mangrove forests and analyze at-risk zones based on land use along the Saudi Arabian coasts over 40 years. Its primary objectives were (1) to produce a new long-term dataset covering the entire Saudi coastline, (2) to identify patterns, analyze trends, and quantify change in mangrove areas, and (3) to determine vulnerability zoning of mangrove areas based on land use and transportation networks. The study used Landsat satellite imagery via Google Earth Engine for national-scale mangrove mapping of Saudi Arabia between 1985 and 2024. Visible and infrared bands and seven spectral indices were employed as input features for a random forest classifier. The two classes were mangrove and non-mangrove; the latter included all non-mangrove land-use and land-cover areas. The resulting maps were then used to delineate vulnerable mangrove forests based on land use. The results show a substantial increase in mangrove area, from 27.74 km2 to 59.31 km2 along the Red Sea and from 1.05 km2 to 8.65 km2 along the Arabian Gulf between 1985 and 2024, although there were noticeable periods of decline within this trend. The spatial coverage of mangroves was larger on Saudi Arabia's western coasts, especially the southwestern coasts, than on its eastern coasts. The overall accuracy, assessed annually, ranged between 91.00% and 98.50%. The results also show that expanding land uses and transportation networks within at-risk zones may strongly affect mangrove forests. This study is intended to benefit the government, conservation agencies, coastal planners, and policymakers concerned with the preservation of mangrove habitats.
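The classification step can be sketched outside Google Earth Engine with scikit-learn: spectral indices computed from band values feed a two-class random forest. The bands, indices, and labels below are synthetic placeholders, not the study's data or its actual seven indices.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
nir, red, green = (rng.random(1000) for _ in range(3))  # synthetic band reflectances

ndvi = (nir - red) / (nir + red + 1e-8)     # vegetation index
ndwi = (green - nir) / (green + nir + 1e-8) # water index

X = np.column_stack([nir, red, green, ndvi, ndwi])
y = rng.integers(0, 2, 1000)                # 1 = mangrove, 0 = non-mangrove (synthetic)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict(X[:5]))
```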

21 pages, 4961 KiB  
Article
Application of Vis/NIR Spectroscopy in the Rapid and Non-Destructive Prediction of Soluble Solid Content in Milk Jujubes
by Yinhai Yang, Shibang Ma, Feiyang Qi, Feiyue Wang and Hubo Xu
Agriculture 2025, 15(13), 1382; https://doi.org/10.3390/agriculture15131382 - 27 Jun 2025
Abstract
Milk jujube has become an increasingly popular tropical fruit. Its sugar content, commonly represented by the soluble solid content (SSC), is a key indicator of the flavor, internal quality, and market value of milk jujubes. Traditional methods for assessing SSC are time-consuming, labor-intensive, and destructive, and thus fail to meet the practical demands of the fruit market. Here, a rapid, stable, and effective non-destructive detection method based on visible/near-infrared (Vis/NIR) spectroscopy is proposed. A Vis/NIR reflectance spectroscopy system covering 340–1031 nm was constructed to detect SSC in milk jujubes. A structured spectral modeling framework was adopted, consisting of outlier elimination, dataset partitioning, spectral preprocessing, feature selection, and model construction, with comparative experiments conducted at each step. Special emphasis was placed on the impact of outlier detection and dataset partitioning strategies on modeling accuracy. A data-augmentation-based unsupervised anomaly sample elimination (DAUASE) strategy was proposed to enhance data validity. Multiple data partitioning strategies were evaluated, including random selection (RS), Kennard–Stone (KS), and SPXY; the KS method best preserved the original data distribution, improving model generalization. Several spectral preprocessing and feature selection methods were used to enhance modeling performance. Regression models, including support vector regression (SVR), partial least squares regression (PLSR), multiple linear regression (MLR), and a backpropagation neural network (BP), were compared. Based on a comprehensive analysis of these results, the DAUASE + KS + SG + SNV + CARS + SVR model exhibited the highest prediction performance, achieving an average precision (APp) of 99.042% on the prediction set, a high coefficient of determination (Rp2) of 0.976, and a low root-mean-square error of prediction (RMSEP) of 0.153. These results indicate that Vis/NIR spectroscopy is effective and reliable for the rapid, non-destructive detection of SSC in milk jujubes and may provide a basis for extending such detection to other jujube varieties.
(This article belongs to the Section Agricultural Product Quality and Safety)
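The winning preprocessing chain lends itself to a short SciPy/scikit-learn sketch: Savitzky–Golay smoothing, standard normal variate (SNV) scaling, then SVR. Synthetic spectra, a plain random split (rather than Kennard–Stone), and illustrative hyperparameters are used here purely for demonstration.

```python
import numpy as np
from scipy.signal import savgol_filter
from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.random((120, 256))  # reflectance spectra over 340-1031 nm (synthetic)
y = rng.random(120) * 10    # soluble solid content (synthetic)

X = savgol_filter(X, 11, 3, axis=1)  # SG smoothing: window 11, polynomial order 3
X = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)  # SNV per spectrum

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = SVR(C=10.0, gamma="scale").fit(X_tr, y_tr)
print(r2_score(y_te, model.predict(X_te)))
```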

20 pages, 5393 KiB  
Article
A Semantic Segmentation Dataset and Real-Time Localization Model for Anti-UAV Applications
by Sang-Chul Kim and Yeong Min Jang
Appl. Sci. 2025, 15(13), 7183; https://doi.org/10.3390/app15137183 - 26 Jun 2025
Abstract
With the rapid development of the unmanned aerial vehicle (UAV) industry and its applications, the integration of UAVs into daily life has increased significantly. However, this growing presence raises security concerns, leading to the emergence of anti-UAV technologies. Most existing anti-UAV systems rely on object detection techniques, yet these methods often struggle to detect small UAVs accurately. Semantic segmentation, which predicts object locations at the pixel level, offers improved localization for such small targets. Given the lack of existing datasets for anti-UAV semantic segmentation, we propose a new dataset comprising both infrared (IR) and visible-light (VL) images, with a total of 605,045 paired UAV images and corresponding segmentation masks. To enhance object diversity and improve model robustness, the dataset integrates multiple existing sources. In addition to the dataset, we evaluate the performance of several baseline models on the semantic segmentation task and propose a lightweight model that demonstrates the feasibility of real-time UAV localization using semantic segmentation on VL and IR data.
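A paired IR/visible segmentation dataset of this kind might be loaded as below; the directory layout (ir/, vis/, masks/) and the early-fused channel stacking are hypothetical assumptions, not the released dataset's actual structure.

```python
from pathlib import Path
import torch
from torch.utils.data import Dataset
from torchvision.io import read_image

class PairedUAVSegDataset(Dataset):
    """Loads same-named IR, visible, and mask images from a hypothetical layout."""
    def __init__(self, root):
        self.root = Path(root)
        self.names = sorted(p.name for p in (self.root / "ir").glob("*.png"))

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        name = self.names[i]
        ir = read_image(str(self.root / "ir" / name)).float() / 255    # (1, H, W)
        vis = read_image(str(self.root / "vis" / name)).float() / 255  # (3, H, W)
        mask = read_image(str(self.root / "masks" / name)).long()      # (1, H, W)
        # Early fusion: stack visible and IR channels into a 4-channel input
        return torch.cat([vis, ir], dim=0), mask.squeeze(0)
```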

28 pages, 11793 KiB  
Article
Unsupervised Multimodal UAV Image Registration via Style Transfer and Cascade Network
by Xiaoye Bi, Rongkai Qie, Chengyang Tao, Zhaoxiang Zhang and Yuelei Xu
Remote Sens. 2025, 17(13), 2160; https://doi.org/10.3390/rs17132160 - 24 Jun 2025
Abstract
Cross-modal image registration for unmanned aerial vehicle (UAV) platforms presents significant challenges due to large-scale deformations, distinct imaging mechanisms, and pronounced modality discrepancies. This paper proposes a novel multi-scale cascaded registration network based on style transfer that achieves superior performance: up to a 67% reduction in mean squared error (from 0.0106 to 0.0068), a 9.27% enhancement in normalized cross-correlation, a 26% improvement in local normalized cross-correlation, and an 8% increase in mutual information compared to state-of-the-art methods. The architecture integrates a cross-modal style transfer network (CSTNet) that transforms visible images into pseudo-infrared representations to unify modality characteristics, and a multi-scale cascaded registration network (MCRNet) that performs progressive spatial alignment across multiple resolution scales, using diffeomorphic deformation modeling to ensure smooth and invertible transformations. A self-supervised learning paradigm based on image reconstruction eliminates reliance on manually annotated data while maintaining registration accuracy through synthetic deformation generation. Extensive experiments on the LLVIP dataset demonstrate the method's robustness under challenging conditions involving large-scale transformations, with ablation studies confirming that style transfer contributes a 28% MSE improvement and that diffeomorphic registration prevents a 10.6% performance degradation. The proposed approach provides a robust solution for cross-modal image registration in dynamic UAV environments, with significant implications for downstream applications such as target detection, tracking, and surveillance.
(This article belongs to the Special Issue Advances in Deep Learning Approaches: UAV Data Analysis)
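The core operation behind learned registration, warping a moving image by a predicted displacement field, can be sketched with PyTorch's grid_sample as follows; the cascaded network and diffeomorphic integration are omitted, so this is only the final resampling step under assumed conventions.

```python
import torch
import torch.nn.functional as F

def warp(image, flow):
    """image: (B, C, H, W); flow: (B, 2, H, W) displacement in pixels (x, y order)."""
    b, _, h, w = image.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    grid = torch.stack([xs, ys], dim=0).float().unsqueeze(0) + flow
    # Normalize coordinates to [-1, 1] as grid_sample expects
    grid[:, 0] = 2 * grid[:, 0] / (w - 1) - 1
    grid[:, 1] = 2 * grid[:, 1] / (h - 1) - 1
    return F.grid_sample(image, grid.permute(0, 2, 3, 1), align_corners=True)

# Zero displacement returns (approximately) the input image
out = warp(torch.randn(1, 1, 64, 64), torch.zeros(1, 2, 64, 64))
print(out.shape)  # torch.Size([1, 1, 64, 64])
```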

18 pages, 3118 KiB  
Article
AetherGeo: A Spectral Analysis Interface for Geologic Mapping
by Gonçalo Santos, Joana Cardoso-Fernandes and Ana C. Teodoro
Algorithms 2025, 18(7), 378; https://doi.org/10.3390/a18070378 - 21 Jun 2025
Abstract
AetherGeo is standalone software (current version 1.0) that enables the user to analyze raster data, with a special focus on processing multi- and hyperspectral images. Developed in Python 3.12.4, the application is a free, open-source alternative for spectral analysis, which benefits researchers by offering a flexible way to start working on the topic without acquiring proprietary software licenses. It provides a set of tools for spectral data analysis through classical approaches, such as band ratios and RGB combinations, as well as more elaborate techniques, such as endmember extraction and unsupervised image classification with partial spectral unmixing. While it has been tested on visible and near-infrared (VNIR), short-wave infrared (SWIR), and VNIR-SWIR datasets, the implemented functions can potentially be applied to other spectral ranges. All results can be visualized within the software, and some tools allow for the inspection and comparison of spectra and spectral libraries. Providing these capabilities in a unified platform has the potential to positively impact research and education, as students and educators usually have limited access to proprietary software.
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
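The classical operations AetherGeo exposes, band ratios and RGB composites, reduce to a few NumPy lines; the cube and band indices below are synthetic illustrations, not tied to any particular sensor or to AetherGeo's internals.

```python
import numpy as np

rng = np.random.default_rng(0)
cube = rng.random((100, 100, 50))  # rows x cols x bands (synthetic reflectance)

def band_ratio(cube, num, den, eps=1e-8):
    """Element-wise ratio of two bands, guarded against division by zero."""
    return cube[..., num] / (cube[..., den] + eps)

ratio = band_ratio(cube, num=30, den=10)  # e.g., an iron-oxide-style ratio
rgb = np.stack([cube[..., 40], cube[..., 25], cube[..., 5]], axis=-1)  # false-color composite
print(ratio.shape, rgb.shape)  # (100, 100) (100, 100, 3)
```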
