Search Results (1,039)

Search Parameters:
Keywords = multimodal deep learning

28 pages, 3531 KiB  
Review
Review of Acoustic Emission Detection Technology for Valve Internal Leakage: Mechanisms, Methods, Challenges, and Application Prospects
by Dongjie Zheng, Xing Wang, Lingling Yang, Yunqi Li, Hui Xia, Haochuan Zhang and Xiaomei Xiang
Sensors 2025, 25(14), 4487; https://doi.org/10.3390/s25144487 - 18 Jul 2025
Abstract
Internal leakage within the valve body constitutes a severe potential safety hazard in industrial fluid control systems, owing to its high concealment and the resulting difficulty of detection by conventional methods. Acoustic emission (AE) technology, as an efficient non-destructive testing approach, can capture the transient stress waves induced by leakage, thereby providing an effective means for the real-time monitoring and quantitative assessment of internal leakage within the valve body. This paper presents a systematic review of the theoretical foundations, signal-processing methodologies, and latest research advances in acoustic emission-based detection of internal valve leakage. First, grounded in Lighthill's acoustic analogy theory, the generation mechanism of acoustic emission signals arising from valve body leakage is elucidated. Second, diverse signal-processing techniques and their corresponding optimization strategies are analyzed in detail, encompassing parameter analysis, time–frequency analysis, nonlinear dynamics methods, and intelligent algorithms. Moreover, this paper summarizes the current challenges encountered by this technology and outlines future research directions, such as the fusion of multi-modal sensors, the deployment of lightweight deep learning models, and integration with the Internet of Things. This study provides a systematic reference for the engineering application and theoretical development of acoustic emission-based detection of internal valve leakage. Full article
(This article belongs to the Topic Advances in Non-Destructive Testing Methods, 3rd Edition)
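
As a rough illustration of two of the signal-processing families this review surveys (parameter analysis and time–frequency analysis), the sketch below computes classical AE waveform descriptors and a spectrogram for a synthetic leak-like signal. The sampling rate and signal are assumptions for demonstration only, not values from the paper.

```python
# Hypothetical AE-signal example: waveform parameters + spectrogram (not the paper's data).
import numpy as np
from scipy.signal import spectrogram

fs = 1_000_000                                  # assumed 1 MHz AE sampling rate
t = np.arange(0, 0.01, 1 / fs)
ae = 0.05 * np.random.randn(t.size) + 0.2 * np.sin(2 * np.pi * 150_000 * t)  # toy leak-like signal

# Parameter analysis: classical AE waveform descriptors
rms = np.sqrt(np.mean(ae ** 2))
peak = np.max(np.abs(ae))
crest_factor = peak / rms

# Time-frequency analysis: short-time Fourier spectrogram of the same window
freqs, frames, Sxx = spectrogram(ae, fs=fs, nperseg=1024, noverlap=512)

print(f"RMS={rms:.4f}, peak={peak:.4f}, crest factor={crest_factor:.2f}")
print("spectrogram bins:", Sxx.shape)           # (frequency bins, time frames)
```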

26 pages, 6798 KiB  
Article
Robust Optical and SAR Image Matching via Attention-Guided Structural Encoding and Confidence-Aware Filtering
by Qi Kang, Jixian Zhang, Guoman Huang and Fei Liu
Remote Sens. 2025, 17(14), 2501; https://doi.org/10.3390/rs17142501 - 18 Jul 2025
Abstract
Accurate feature matching between optical and synthetic aperture radar (SAR) images remains a significant challenge in remote sensing due to substantial modality discrepancies in texture, intensity, and geometric structure. In this study, we proposed an attention-context-aware deep learning framework (ACAMatch) for robust and efficient optical–SAR image registration. The proposed method integrates a structure-enhanced feature extractor, RS2FNet, which combines dual-stage Res2Net modules with a bi-level routing attention mechanism to capture multi-scale local textures and global structural semantics. A context-aware matching module refines correspondences through self- and cross-attention, coupled with a confidence-driven early-exit pruning strategy to reduce computational cost while maintaining accuracy. Additionally, a match-aware multi-task loss function jointly enforces spatial consistency, affine invariance, and structural coherence for end-to-end optimization. Experiments on public datasets (SEN1-2 and WHU-OPT-SAR) and a self-collected Gaofen (GF) dataset demonstrated that ACAMatch significantly outperformed existing state-of-the-art methods in terms of the number of correct matches, matching accuracy, and inference speed, especially under challenging conditions such as resolution differences and severe structural distortions. These results indicate the effectiveness and generalizability of the proposed approach for multimodal image registration, making ACAMatch a promising solution for remote sensing applications such as change detection and multi-sensor data fusion. Full article
(This article belongs to the Special Issue Advancements of Vision-Language Models (VLMs) in Remote Sensing)
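
A hedged sketch of the general pattern this abstract describes, self- and cross-attention over keypoint descriptors followed by confidence-based pruning, is given below. Module names, the shared attention weights, and the threshold are illustrative assumptions rather than the ACAMatch implementation.

```python
# Toy attention-based optical-SAR matcher with confidence filtering (assumed sizes).
import torch
import torch.nn as nn

class ToyCrossMatcher(nn.Module):
    def __init__(self, dim=128, heads=4, conf_thresh=0.2):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conf_thresh = conf_thresh

    def forward(self, desc_opt, desc_sar):
        # desc_*: (B, N, dim) keypoint descriptors from each modality
        opt, _ = self.self_attn(desc_opt, desc_opt, desc_opt)
        sar, _ = self.self_attn(desc_sar, desc_sar, desc_sar)
        opt, _ = self.cross_attn(opt, sar, sar)            # optical attends to SAR context
        scores = torch.einsum("bnd,bmd->bnm", opt, sar)    # similarity matrix
        conf = scores.softmax(dim=-1)
        matches = conf.argmax(dim=-1)                      # best SAR index per optical keypoint
        keep = conf.max(dim=-1).values > self.conf_thresh  # confidence-aware filtering
        return matches, keep

matcher = ToyCrossMatcher()
matches, keep = matcher(torch.randn(1, 64, 128), torch.randn(1, 64, 128))
print(matches.shape, keep.float().mean().item())
```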

25 pages, 732 KiB  
Article
Accuracy-Aware MLLM Task Offloading and Resource Allocation in UAV-Assisted Satellite Edge Computing
by Huabing Yan, Hualong Huang, Zijia Zhao, Zhi Wang and Zitian Zhao
Drones 2025, 9(7), 500; https://doi.org/10.3390/drones9070500 - 16 Jul 2025
Viewed by 67
Abstract
This paper presents a novel framework for optimizing multimodal large language model (MLLM) inference through task offloading and resource allocation in UAV-assisted satellite edge computing (SEC) networks. MLLMs leverage transformer architectures to integrate heterogeneous data modalities for IoT applications, particularly real-time monitoring in remote areas. However, cloud computing dependency introduces latency, bandwidth, and privacy challenges, while IoT device limitations require efficient distributed computing solutions. SEC, utilizing low-earth orbit (LEO) satellites and unmanned aerial vehicles (UAVs), extends mobile edge computing to provide ubiquitous computational resources for remote IoT devices (IoTDs). We formulate the joint optimization of MLLM task offloading and resource allocation as a mixed-integer nonlinear programming (MINLP) problem, minimizing latency and energy consumption while optimizing offloading decisions, power allocation, and UAV trajectories. To address the dynamic SEC environment characterized by satellite mobility, we propose an action-decoupled soft actor–critic (AD-SAC) algorithm with discrete–continuous hybrid action spaces. The simulation results demonstrate that our approach significantly outperforms conventional deep reinforcement learning baselines in both convergence and system cost reduction. Full article
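
The hybrid discrete–continuous action space mentioned above can be pictured with the following sketch of a decoupled policy head: a categorical head for the offloading decision and a Gaussian head for continuous power allocation. Network sizes and action semantics are assumptions, not the AD-SAC code.

```python
# Hypothetical hybrid policy head: discrete offloading choice + continuous power levels.
import torch
import torch.nn as nn

class HybridPolicy(nn.Module):
    def __init__(self, obs_dim=32, n_offload_targets=4, n_power_dims=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.offload_logits = nn.Linear(128, n_offload_targets)   # discrete branch
        self.power_mu = nn.Linear(128, n_power_dims)              # continuous branch
        self.power_log_std = nn.Linear(128, n_power_dims)

    def forward(self, obs):
        h = self.trunk(obs)
        offload_dist = torch.distributions.Categorical(logits=self.offload_logits(h))
        std = self.power_log_std(h).clamp(-5, 2).exp()
        power_dist = torch.distributions.Normal(self.power_mu(h), std)
        return offload_dist, power_dist

policy = HybridPolicy()
offload_dist, power_dist = policy(torch.randn(8, 32))
offload = offload_dist.sample()              # which node executes the MLLM task
power = torch.tanh(power_dist.rsample())     # squashed transmit-power fraction
print(offload.shape, power.shape)
```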

26 pages, 6371 KiB  
Article
Growth Stages Discrimination of Multi-Cultivar Navel Oranges Using the Fusion of Near-Infrared Hyperspectral Imaging and Machine Vision with Deep Learning
by Chunyan Zhao, Zhong Ren, Yue Li, Jia Zhang and Weinan Shi
Agriculture 2025, 15(14), 1530; https://doi.org/10.3390/agriculture15141530 - 15 Jul 2025
Viewed by 78
Abstract
To noninvasively and precisely discriminate among the growth stages of multiple cultivars of navel oranges simultaneously, the fusion of near-infrared (NIR) hyperspectral imaging (HSI) with machine vision (MV) and deep learning is employed. NIR reflectance spectra and hyperspectral and RGB images for 740 Gannan navel oranges of five cultivars are collected. Based on preprocessed spectra, optimally selected hyperspectral images, and registered RGB images, a dual-branch multi-modal feature fusion convolutional neural network (CNN) model is established. In this model, a spectral branch is designed to extract spectral features reflecting internal compositional variations, while the image branch is utilized to extract external color and texture features from the integration of hyperspectral and RGB images. Finally, growth stages are determined via the fusion of features. To validate the effectiveness of the proposed method, various machine-learning and deep-learning models are compared for single-modal and multi-modal data. The results demonstrate that multi-modal feature fusion of HSI and MV combined with the constructed dual-branch CNN deep-learning model yields excellent growth-stage discrimination in navel oranges, achieving an accuracy, recall rate, precision, F1 score, and kappa coefficient on the testing set of 95.95%, 96.66%, 96.76%, 96.69%, and 0.9481, respectively, providing a prominent way to precisely monitor the growth stages of fruits. Full article
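
A minimal sketch of a dual-branch fusion network in the spirit of this abstract follows: a 1D branch for NIR spectra and a 2D branch for stacked hyperspectral/RGB image planes, concatenated before a five-class growth-stage head. All channel counts and layer sizes are illustrative assumptions.

```python
# Toy dual-branch (spectral + image) fusion CNN; dimensions are assumptions.
import torch
import torch.nn as nn

class DualBranchCNN(nn.Module):
    def __init__(self, n_bands=256, img_channels=6, n_stages=5):
        super().__init__()
        self.spectral = nn.Sequential(            # internal composition cues
            nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(), nn.AdaptiveAvgPool1d(8), nn.Flatten())
        self.image = nn.Sequential(               # external color/texture cues
            nn.Conv2d(img_channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.head = nn.Linear(16 * 8 + 16 * 16, n_stages)

    def forward(self, spectrum, image):
        feats = torch.cat([self.spectral(spectrum), self.image(image)], dim=1)
        return self.head(feats)

model = DualBranchCNN()
logits = model(torch.randn(2, 1, 256), torch.randn(2, 6, 64, 64))
print(logits.shape)  # (2, 5) growth-stage scores
```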

30 pages, 2023 KiB  
Review
Fusion of Computer Vision and AI in Collaborative Robotics: A Review and Future Prospects
by Yuval Cohen, Amir Biton and Shraga Shoval
Appl. Sci. 2025, 15(14), 7905; https://doi.org/10.3390/app15147905 - 15 Jul 2025
Viewed by 128
Abstract
The integration of advanced computer vision and artificial intelligence (AI) techniques into collaborative robotic systems holds the potential to revolutionize human–robot interaction, productivity, and safety. Despite substantial research activity, a systematic synthesis of how vision and AI are jointly enabling context-aware, adaptive cobot capabilities across perception, planning, and decision-making remains lacking (especially in recent years). Addressing this gap, our review unifies the latest advances in visual recognition, deep learning, and semantic mapping within a structured taxonomy tailored to collaborative robotics. We examine foundational technologies such as object detection, human pose estimation, and environmental modeling, as well as emerging trends including multimodal sensor fusion, explainable AI, and ethically guided autonomy. Unlike prior surveys that focus narrowly on either vision or AI, this review uniquely analyzes their integrated use for real-world human–robot collaboration. Highlighting industrial and service applications, we distill the best practices, identify critical challenges, and present key performance metrics to guide future research. We conclude by proposing strategic directions—from scalable training methods to interoperability standards—to foster safe, robust, and proactive human–robot partnerships in the years ahead. Full article

14 pages, 1509 KiB  
Article
A Multi-Modal Deep Learning Approach for Predicting Eligibility for Adaptive Radiation Therapy in Nasopharyngeal Carcinoma Patients
by Zhichun Li, Zihan Li, Sai Kit Lam, Xiang Wang, Peilin Wang, Liming Song, Francis Kar-Ho Lee, Celia Wai-Yi Yip, Jing Cai and Tian Li
Cancers 2025, 17(14), 2350; https://doi.org/10.3390/cancers17142350 - 15 Jul 2025
Viewed by 143
Abstract
Background: Adaptive radiation therapy (ART) can improve prognosis for nasopharyngeal carcinoma (NPC) patients. However, the inter-individual variability in anatomical changes, along with the resulting extension of treatment duration and increased workload for the radiologists, makes the selection of eligible patients a persistent challenge in clinical practice. The purpose of this study was to predict eligible ART candidates prior to radiation therapy (RT) for NPC patients using a classification neural network. By leveraging the fusion of medical imaging and clinical data, this method aimed to save time and resources in clinical workflows and improve treatment efficiency. Methods: We collected retrospective data from 305 NPC patients who received RT at Hong Kong Queen Elizabeth Hospital. Each patient sample included pre-treatment computed tomographic (CT) images, T1-weighted magnetic resonance imaging (MRI) data, and T2-weighted MRI images, along with clinical data. We developed and trained a novel multi-modal classification neural network that combines ResNet-50, cross-attention, multi-scale features, and clinical data for multi-modal fusion. The patients were categorized into two labels based on their re-plan status: patients who received ART during RT treatment, as determined by the radiation oncologist, and those who did not. Results: The experimental results demonstrated that the proposed multi-modal deep prediction model outperformed other commonly used deep learning networks, achieving an area under the curve (AUC) of 0.9070. These results indicated the ability of the model to accurately classify and predict ART eligibility for NPC patients. Conclusions: The proposed method showed good performance in predicting ART eligibility among NPC patients, highlighting its potential to enhance clinical decision-making, optimize treatment efficiency, and support more personalized cancer care. Full article
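
The fusion pattern described above can be sketched as imaging embeddings cross-attended by an embedded clinical vector before a binary eligibility logit. Here a single toy 3D encoder, shared across CT, T1, and T2, stands in for the ResNet-50 backbone, and every dimension is an assumption for illustration.

```python
# Hypothetical multi-modal ART-eligibility classifier sketch (not the paper's network).
import torch
import torch.nn as nn

class ARTEligibilityNet(nn.Module):
    def __init__(self, dim=64, n_clinical=12):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                                     nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, dim))
        self.clin_enc = nn.Linear(n_clinical, dim)
        self.cross = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, ct, t1, t2, clinical):
        imgs = torch.stack([self.img_enc(x) for x in (ct, t1, t2)], dim=1)  # (B, 3, dim)
        clin = self.clin_enc(clinical).unsqueeze(1)                         # (B, 1, dim)
        fused, _ = self.cross(clin, imgs, imgs)   # clinical query attends over modalities
        return self.head(fused.squeeze(1))        # logit; sigmoid > 0.5 -> ART candidate

net = ARTEligibilityNet()
logit = net(torch.randn(2, 1, 16, 32, 32), torch.randn(2, 1, 16, 32, 32),
            torch.randn(2, 1, 16, 32, 32), torch.randn(2, 12))
print(logit.shape)
```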

20 pages, 5700 KiB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Viewed by 186
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has exhibited increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains a high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation. Full article
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)
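
A hedged sketch of per-modality Self-Attention followed by late fusion, as outlined above, is shown below. The upstream Wav2Vec2, skeleton, BERT, and Doc2Vec extractors are assumed to have produced the input sequences, and all dimensions are illustrative.

```python
# Toy late-fusion personality model: one attention block per modality, then concatenation.
import torch
import torch.nn as nn

class ModalityBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, 4, batch_first=True)

    def forward(self, x):                 # x: (B, T, dim) one modality's sequence
        h, _ = self.attn(x, x, x)
        return h.mean(dim=1)              # temporal summary vector

class LateFusionAPR(nn.Module):
    def __init__(self, dim=64, n_traits=5):
        super().__init__()
        self.audio, self.video, self.text = (ModalityBlock(dim) for _ in range(3))
        self.head = nn.Linear(3 * dim, n_traits)   # one score per Big Five trait

    def forward(self, a, v, t):
        return self.head(torch.cat([self.audio(a), self.video(v), self.text(t)], dim=1))

model = LateFusionAPR()
out = model(torch.randn(2, 50, 64), torch.randn(2, 30, 64), torch.randn(2, 20, 64))
print(out.shape)  # (2, 5)
```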

21 pages, 4147 KiB  
Article
AgriFusionNet: A Lightweight Deep Learning Model for Multisource Plant Disease Diagnosis
by Saleh Albahli
Agriculture 2025, 15(14), 1523; https://doi.org/10.3390/agriculture15141523 - 15 Jul 2025
Viewed by 163
Abstract
Timely and accurate identification of plant diseases is critical to mitigating crop losses and enhancing yield in precision agriculture. This paper proposes AgriFusionNet, a lightweight and efficient deep learning model designed to diagnose plant diseases using multimodal data sources. The framework integrates RGB and multispectral drone imagery with IoT-based environmental sensor data (e.g., temperature, humidity, soil moisture), recorded over six months across multiple agricultural zones. Built on the EfficientNetV2-B4 backbone, AgriFusionNet incorporates Fused-MBConv blocks and Swish activation to improve gradient flow, capture fine-grained disease patterns, and reduce inference latency. The model was evaluated using a comprehensive dataset composed of real-world and benchmarked samples, showing superior performance with 94.3% classification accuracy, 28.5 ms inference time, and a 30% reduction in model parameters compared to state-of-the-art models such as Vision Transformers and InceptionV4. Extensive comparisons with both traditional machine learning and advanced deep learning methods underscore its robustness, generalization, and suitability for deployment on edge devices. Ablation studies and confusion matrix analyses further confirm its diagnostic precision, even in visually ambiguous cases. The proposed framework offers a scalable, practical solution for real-time crop health monitoring, contributing toward smart and sustainable agricultural ecosystems. Full article
(This article belongs to the Special Issue Computational, AI and IT Solutions Helping Agriculture)
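
As a rough illustration of the Fused-MBConv building block with Swish (SiLU) activation cited above, the following sketch implements a simplified block without squeeze-and-excitation; the expansion ratio and channel counts are assumptions, not the AgriFusionNet configuration. In the full framework, the IoT sensor readings would presumably be embedded separately and fused with pooled image features before classification.

```python
# Simplified Fused-MBConv block with Swish (SiLU); configuration values are assumed.
import torch
import torch.nn as nn

class FusedMBConv(nn.Module):
    def __init__(self, in_ch, out_ch, expand=4, stride=1):
        super().__init__()
        mid = in_ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 3, stride=stride, padding=1, bias=False),  # fused 3x3 expansion
            nn.BatchNorm2d(mid), nn.SiLU(),                                  # Swish activation
            nn.Conv2d(mid, out_ch, 1, bias=False),                           # 1x1 projection
            nn.BatchNorm2d(out_ch))
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_residual else y   # residual path aids gradient flow

blk = FusedMBConv(32, 32)
print(blk(torch.randn(1, 32, 56, 56)).shape)
```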

23 pages, 3492 KiB  
Article
A Multimodal Deep Learning Framework for Accurate Biomass and Carbon Sequestration Estimation from UAV Imagery
by Furkat Safarov, Ugiloy Khojamuratova, Misirov Komoliddin, Xusinov Ibragim Ismailovich and Young Im Cho
Drones 2025, 9(7), 496; https://doi.org/10.3390/drones9070496 - 14 Jul 2025
Viewed by 126
Abstract
Accurate quantification of above-ground biomass (AGB) and carbon sequestration is vital for monitoring terrestrial ecosystem dynamics, informing climate policy, and supporting carbon neutrality initiatives. However, conventional methods—ranging from manual field surveys to remote sensing techniques based solely on 2D vegetation indices—often fail to capture the intricate spectral and structural heterogeneity of forest canopies, particularly at fine spatial resolutions. To address these limitations, we introduce ForestIQNet, a novel end-to-end multimodal deep learning framework designed to estimate AGB and associated carbon stocks from UAV-acquired imagery with high spatial fidelity. ForestIQNet combines dual-stream encoders for processing multispectral UAV imagery and a voxelized Canopy Height Model (CHM), fused via a Cross-Attentional Feature Fusion (CAFF) module, enabling fine-grained interaction between spectral reflectance and 3D structure. A lightweight Transformer-based regression head then performs multitask prediction of AGB and CO2e, capturing long-range spatial dependencies and enhancing generalization. The proposed method achieves an R2 of 0.93 and an RMSE of 6.1 kg for AGB prediction, compared to an R2 of 0.78 and an RMSE of 11.7 kg for XGBoost and an R2 of 0.73 and an RMSE of 13.2 kg for Random Forest. Despite its architectural complexity, ForestIQNet maintains a low inference cost (27 ms per patch) and generalizes well across species, terrain, and canopy structures. These results establish a new benchmark for UAV-enabled biomass estimation and provide scalable, interpretable tools for climate monitoring and forest management. Full article
(This article belongs to the Special Issue UAVs for Nature Conservation Tasks in Complex Environments)
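
For readers who want to score their own predictions the same way the headline results above are reported, the snippet below shows the standard computation of R2 and RMSE; the arrays are placeholders, not the paper's data.

```python
# Illustrative R2/RMSE computation on hypothetical AGB values (kg).
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

y_true = np.array([120.0, 95.5, 143.2, 88.1, 110.4])   # hypothetical reference AGB
y_pred = np.array([118.2, 99.0, 139.8, 90.5, 108.9])   # hypothetical model output

r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"R2 = {r2:.3f}, RMSE = {rmse:.2f} kg")
```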

23 pages, 3008 KiB  
Article
Quantitative Analysis of Sulfur Elements in Mars-like Rocks Based on Multimodal Data
by Yuhang Dong, Zhengfeng Shi, Junsheng Yao, Li Zhang, Yongkang Chen and Junyan Jia
Sensors 2025, 25(14), 4388; https://doi.org/10.3390/s25144388 - 14 Jul 2025
Viewed by 186
Abstract
The Zhurong rover of the Tianwen-1 mission has detected sulfates in its landing area. The analysis of these sulfates provides scientific evidence for exploring past hydration conditions and atmospheric evolution on Mars. As a non-contact technique with long-range detection capability, Laser-Induced Breakdown Spectroscopy (LIBS) is widely used for elemental identification on Mars. However, quantitative analysis of anionic elements using LIBS remains challenging due to the weak characteristic spectral lines of evaporite salt elements, such as sulfur, in LIBS spectra, which provide limited quantitative information. This study proposes a quantitative analysis method for sulfur in sulfate-containing Martian analogs by leveraging spectral line correlations, full-spectrum information, and prior knowledge, aiming to address the challenges of sulfur identification and quantification in Martian exploration. To enhance the accuracy of sulfur quantification, two analytical models for high and low sulfur concentrations were developed. Samples were classified using infrared spectroscopy based on sulfur content levels. Subsequently, multimodal deep learning models were developed for quantitative analysis by integrating LIBS and infrared spectra, based on varying concentrations. Compared to traditional unimodal models, the multimodal method simultaneously utilizes elemental chemical information from LIBS spectra and molecular structural and vibrational characteristics from infrared spectroscopy. Considering that sulfur exhibits distinct absorption bands in infrared spectra but demonstrates weak characteristic lines in LIBS spectra due to its low ionization energy, the combination of both spectral techniques enables the model to capture complementary sample features, thereby effectively improving prediction accuracy and robustness. To validate the advantages of the multimodal approach, comparative analyses were conducted against unimodal methods. Furthermore, to optimize model performance, different feature selection algorithms were evaluated. Ultimately, an XGBoost-based feature selection method incorporating prior knowledge was employed to identify optimal LIBS spectral features, and the selected feature subsets were utilized in multimodal modeling to enhance stability. Experimental results demonstrate that, compared to the BPNN, SVR, and Inception unimodal methods, the proposed multimodal approach achieves at least a 92.36% reduction in RMSE and a 46.3% improvement in R2. Full article
(This article belongs to the Section Sensing and Imaging)
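
A hedged sketch of XGBoost-driven feature selection over LIBS channels, followed by regression on the fused LIBS + infrared features, loosely mirrors the pipeline described above. Data shapes, the number of retained channels, and the final regressor are assumptions for illustration.

```python
# Toy XGBoost feature selection on LIBS channels + fused LIBS/IR regression (synthetic data).
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
libs = rng.normal(size=(200, 1000))     # 200 samples x 1000 LIBS channels (toy)
ir = rng.normal(size=(200, 150))        # 200 samples x 150 IR features (toy)
sulfur = rng.uniform(0, 10, size=200)   # toy sulfur concentrations

# Step 1: rank LIBS channels by XGBoost feature importance and keep the top 50
selector = XGBRegressor(n_estimators=100, max_depth=4).fit(libs, sulfur)
top = np.argsort(selector.feature_importances_)[::-1][:50]

# Step 2: fuse selected LIBS channels with IR features and fit the final model
fused = np.hstack([libs[:, top], ir])
model = XGBRegressor(n_estimators=200, max_depth=4).fit(fused, sulfur)
print("train R2:", model.score(fused, sulfur))
```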

19 pages, 709 KiB  
Article
Fusion of Multimodal Spatio-Temporal Features and 3D Deformable Convolution Based on Sign Language Recognition in Sensor Networks
by Qian Zhou, Hui Li, Weizhi Meng, Hua Dai, Tianyu Zhou and Guineng Zheng
Sensors 2025, 25(14), 4378; https://doi.org/10.3390/s25144378 - 13 Jul 2025
Viewed by 141
Abstract
Sign language is a complex and dynamic visual language that requires the coordinated movement of various body parts, such as the hands, arms, and limbs—making it an ideal application domain for sensor networks to capture and interpret human gestures accurately. To address the intricate task of precise and efficient sign language recognition (SLR) from raw videos, this study introduces a novel deep learning approach by devising a multimodal framework for SLR. Specifically, feature extraction models are built based on two modalities: skeleton and RGB images. We first propose a Multi-Stream Spatio-Temporal Graph Convolutional Network (MSGCN) that relies on three modules: a decoupling graph convolutional network, a self-emphasizing temporal convolutional network, and a spatio-temporal joint attention module. These modules are combined to capture the spatio-temporal information in multi-stream skeleton features. Second, we propose a 3D ResNet model based on deformable convolution (D-ResNet) to model complex spatial and temporal sequences in the original raw images. Finally, a gating mechanism-based Multi-Stream Fusion Module (MFM) is employed to merge the results of the two modalities. Extensive experiments are conducted on the public datasets AUTSL and WLASL, achieving competitive results compared to state-of-the-art systems. Full article
(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)
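
The gating mechanism-based fusion of the two modality streams can be pictured with the sketch below, where a learned sigmoid gate mixes skeleton and RGB features before classification. Feature sizes and the 226-class output are illustrative assumptions, not the paper's MFM implementation.

```python
# Toy gated fusion of skeleton and RGB stream features; sizes and class count assumed.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=512, n_classes=226):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.head = nn.Linear(dim, n_classes)

    def forward(self, skeleton_feat, rgb_feat):
        g = self.gate(torch.cat([skeleton_feat, rgb_feat], dim=1))  # per-feature gate in [0, 1]
        fused = g * skeleton_feat + (1 - g) * rgb_feat              # convex modality mix
        return self.head(fused)

fusion = GatedFusion()
logits = fusion(torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)  # (4, 226) sign-class scores
```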

16 pages, 2169 KiB  
Article
Leveraging Feature Fusion of Image Features and Laser Reflectance for Automated Fish Freshness Classification
by Caner Balım, Nevzat Olgun and Mücahit Çalışan
Sensors 2025, 25(14), 4374; https://doi.org/10.3390/s25144374 - 12 Jul 2025
Viewed by 223
Abstract
Fish is important for human health due to its high nutritional value. However, it is prone to spoilage owing to its structural characteristics. Traditional freshness assessment methods, such as visual inspection, are subjective and prone to inconsistency. This study proposes a novel, cost-effective hybrid methodology for automated three-level fish freshness classification (Day 1, Day 2, Day 3) by integrating single-wavelength laser reflectance data with deep learning-based image features. A comprehensive dataset was created by collecting visual and laser data from 130 mackerel specimens over three consecutive days under controlled conditions. Image features were extracted using four pre-trained CNN architectures and fused with laser features to form a unified representation. The combined features were classified using SVM, MLP, and RF algorithms. The experimental results demonstrated that the proposed multimodal approach significantly outperformed single-modality methods, achieving an average classification accuracy of 88.44%. This work presents an original contribution by demonstrating, for the first time, the effectiveness of combining low-cost laser sensing and deep visual features for freshness prediction, with potential for real-time mobile deployment. Full article
(This article belongs to the Section Sensing and Imaging)
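
A minimal sketch of the feature-level fusion and SVM classification described above follows; the arrays are synthetic stand-ins for the CNN image embeddings and laser-reflectance descriptors, not the mackerel dataset.

```python
# Toy early fusion of CNN image features + laser features, classified with an SVM.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
cnn_feats = rng.normal(size=(130, 1280))        # placeholder pooled CNN embeddings per fish
laser_feats = rng.normal(size=(130, 16))        # placeholder reflectance descriptors
labels = rng.integers(0, 3, size=130)           # Day 1 / Day 2 / Day 3

X = np.hstack([cnn_feats, laser_feats])         # feature-level fusion
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(X_tr, y_tr)
print("toy accuracy:", clf.score(X_te, y_te))
```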

11 pages, 2054 KiB  
Article
Polarization-Enhanced Multi-Target Underwater Salient Object Detection
by Jiayi Song, Peikai Zhao, Jiangtao Li, Liming Zhu, Khian-Hooi Chew and Rui-Pin Chen
Photonics 2025, 12(7), 707; https://doi.org/10.3390/photonics12070707 - 12 Jul 2025
Viewed by 118
Abstract
Salient object detection (SOD) plays a critical role in underwater exploration systems. Traditional SOD approaches encounter notable constraints in underwater image analysis, primarily stemming from light scattering and absorption effects induced by suspended particulate matter in complex underwater environments. In this work, we propose a deep learning-based multimodal method guided by multi-polarization parameters that integrates polarization de-scattering mechanisms with the powerful feature learning capability of neural networks to achieve adaptive multi-target SOD in an underwater turbid scattering environment. The proposed polarization-enhanced salient object detection network (PESODNet) employs a multi-polarization-parameter-guided, material-aware attention mechanism and a contrastive feature calibration unit, significantly enhancing its multi-material, multi-target detection capabilities in underwater scattering environments. The experimental results confirm that the proposed method achieves substantial performance improvements in multi-target underwater SOD tasks, outperforming state-of-the-art models of salient object detection in detection accuracy. Full article
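
As background for the multi-polarization-parameter guidance mentioned above, the snippet below computes the degree and angle of linear polarization from Stokes components derived from 0°/45°/90°/135° intensity images. The images are synthetic placeholders; the paper's network, which consumes such parameter maps, is not reproduced here.

```python
# Standard polarization parameters (DoLP, AoP) from four polarizer-angle images.
import numpy as np

rng = np.random.default_rng(2)
I0, I45, I90, I135 = (rng.uniform(0.1, 1.0, size=(64, 64)) for _ in range(4))

S0 = I0 + I90                    # total intensity
S1 = I0 - I90                    # horizontal vs. vertical preference
S2 = I45 - I135                  # diagonal preference

dolp = np.sqrt(S1 ** 2 + S2 ** 2) / (S0 + 1e-8)   # degree of linear polarization
aop = 0.5 * np.arctan2(S2, S1)                    # angle of polarization (radians)
print(dolp.mean(), aop.shape)
```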

30 pages, 8143 KiB  
Article
An Edge-Deployable Multi-Modal Nano-Sensor Array Coupled with Deep Learning for Real-Time, Multi-Pollutant Water Quality Monitoring
by Zhexu Xi, Robert Nicolas and Jiayi Wei
Water 2025, 17(14), 2065; https://doi.org/10.3390/w17142065 - 10 Jul 2025
Viewed by 233
Abstract
Real-time, high-resolution monitoring of chemically diverse water pollutants remains a critical challenge for smart water management. Here, we report a fully integrated, multi-modal nano-sensor array, combining graphene field-effect transistors, Ag/Au-nanostar surface-enhanced Raman spectroscopy substrates, and CdSe/ZnS quantum dot fluorescence, coupled to an edge-deployable CNN-LSTM architecture that fuses raw electrochemical, vibrational, and photoluminescent signals without manual feature engineering. The 45 mm × 20 mm microfluidic manifold enables continuous flow-through sampling, while 8-bit-quantised inference executes in 31 ms at <12 W. Laboratory calibration over 28,000 samples achieved limits of detection of 12 ppt (Pb2+), 17 pM (atrazine) and 87 ng L−1 (nanoplastics), with R2 ≥ 0.93 and a mean absolute percentage error <6%. A 24 h deployment in the Cherwell River reproduced natural concentration fluctuations with field R2 ≥ 0.92. SHAP and Grad-CAM analyses reveal that the network bases its predictions on Dirac-point shifts, characteristic Raman bands, and early-time fluorescence-quenching kinetics, providing mechanistic interpretability. The platform therefore offers a scalable route to smart water grids, point-of-use drinking water sentinels, and rapid environmental incident response. Future work will address sensor drift through antifouling coatings, enhance cross-site generalisation via federated learning, and create physics-informed digital twins for self-calibrating global monitoring networks. Full article
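
A hedged sketch of a CNN-LSTM that fuses three raw signal streams of the kind named above (electrochemical, Raman, and fluorescence traces) into multi-pollutant concentration estimates follows. Channel counts, sequence length, and the three-target head are assumptions, not the deployed architecture.

```python
# Toy CNN-LSTM over stacked raw sensor channels with a multi-target regression head.
import torch
import torch.nn as nn

class ToyCNNLSTM(nn.Module):
    def __init__(self, in_ch=3, hidden=64, n_targets=3):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(in_ch, 32, 5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_targets)    # e.g., Pb2+, atrazine, nanoplastics (assumed)

    def forward(self, x):                 # x: (B, 3, T) stacked raw sensor channels
        h = self.conv(x).transpose(1, 2)  # -> (B, T, 32) for the LSTM
        out, _ = self.lstm(h)
        return self.head(out[:, -1])      # concentration estimates from the last step

model = ToyCNNLSTM()
print(model(torch.randn(2, 3, 256)).shape)  # (2, 3)
```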

30 pages, 5474 KiB  
Article
WHU-RS19 ABZSL: An Attribute-Based Dataset for Remote Sensing Image Understanding
by Mattia Balestra, Marina Paolanti and Roberto Pierdicca
Remote Sens. 2025, 17(14), 2384; https://doi.org/10.3390/rs17142384 - 10 Jul 2025
Viewed by 197
Abstract
The advancement of artificial intelligence (AI) in remote sensing (RS) increasingly depends on datasets that offer rich and structured supervision beyond traditional scene-level labels. Although existing benchmarks for aerial scene classification have facilitated progress in this area, their reliance on single-class annotations limits their usefulness for more flexible, interpretable and generalisable learning frameworks. In this study, we introduce WHU-RS19 ABZSL: an attribute-based extension of the widely adopted WHU-RS19 dataset. This new version comprises 1005 high-resolution aerial images across 19 scene categories, each annotated with a vector of 38 features. These cover objects (e.g., roads and trees), geometric patterns (e.g., lines and curves) and dominant colours (e.g., green and blue), and are defined through expert-guided annotation protocols. To demonstrate the value of the dataset, we conduct baseline experiments using deep learning models that had been adapted for multi-label classification—ResNet18, VGG16, InceptionV3, EfficientNet and ViT-B/16—designed to capture the semantic complexity characteristic of real-world aerial scenes. The results, which are measured in terms of macro F1-score, range from 0.7385 for ResNet18 to 0.7608 for EfficientNet-B0. In particular, EfficientNet-B0 and ViT-B/16 are the top performers in terms of the overall macro F1-score and consistency across attributes, while all models show a consistent decline in performance for infrequent or visually ambiguous categories. This confirms that it is feasible to accurately predict semantic attributes in complex scenes. By enriching a standard benchmark with detailed, image-level semantic supervision, WHU-RS19 ABZSL supports a variety of downstream applications, including multi-label classification, explainable AI, semantic retrieval, and attribute-based zero-shot learning (ZSL). It thus provides a reusable, compact resource for advancing the semantic understanding of remote sensing and multimodal AI. Full article
(This article belongs to the Special Issue Remote Sensing Datasets and 3D Visualization of Geospatial Big Data)
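
The multi-label setting this dataset targets can be sketched as a 38-way sigmoid head trained with binary cross-entropy and scored with a macro F1, as below. The backbone features, decision threshold, and random tensors are illustrative assumptions.

```python
# Toy multi-label attribute head with BCE loss and macro-F1 scoring (synthetic tensors).
import torch
import torch.nn as nn
from sklearn.metrics import f1_score

n_attributes = 38
backbone_dim = 512
head = nn.Linear(backbone_dim, n_attributes)       # stands in for pooled backbone features -> attributes
criterion = nn.BCEWithLogitsLoss()

features = torch.randn(16, backbone_dim)           # toy pooled image features
targets = torch.randint(0, 2, (16, n_attributes)).float()

logits = head(features)
loss = criterion(logits, targets)

preds = (torch.sigmoid(logits) > 0.5).int().numpy()
macro_f1 = f1_score(targets.int().numpy(), preds, average="macro", zero_division=0)
print(f"loss={loss.item():.3f}, macro F1={macro_f1:.3f}")
```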
