Search Results (1,039)

Search Parameters:
Keywords = multimodal deep learning

28 pages, 3531 KiB  
Review
Review of Acoustic Emission Detection Technology for Valve Internal Leakage: Mechanisms, Methods, Challenges, and Application Prospects
by Dongjie Zheng, Xing Wang, Lingling Yang, Yunqi Li, Hui Xia, Haochuan Zhang and Xiaomei Xiang
Sensors 2025, 25(14), 4487; https://doi.org/10.3390/s25144487 - 18 Jul 2025
Abstract
Internal leakage within the valve body constitutes a severe potential safety hazard in industrial fluid control systems, owing to its high concealment and the resulting difficulty of detection by conventional methods. Acoustic emission (AE) technology, as an efficient non-destructive testing approach, can capture the transient stress waves induced by leakage, thereby providing an effective means for the real-time monitoring and quantitative assessment of internal leakage within the valve body. This paper presents a systematic review of the theoretical foundations, signal-processing methodologies, and latest research advances in acoustic emission-based detection of internal valve leakage. First, grounded in Lighthill's acoustic analogy theory, the generation mechanism of acoustic emission signals arising from valve body leakage is elucidated. Second, diverse signal-processing techniques and their corresponding optimization strategies are analyzed in detail, encompassing parameter analysis, time–frequency analysis, nonlinear dynamics methods, and intelligent algorithms. Moreover, this paper summarizes the current challenges encountered by this technology and outlines future research directions, such as the fusion of multi-modal sensors, the deployment of lightweight deep learning models, and integration with the Internet of Things. This study provides a systematic reference for the engineering application and theoretical development of acoustic emission-based detection of internal valve leakage. Full article
(This article belongs to the Topic Advances in Non-Destructive Testing Methods, 3rd Edition)
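
As a rough illustration of two of the signal-processing families this review surveys (parameter analysis and time–frequency analysis), the sketch below computes classical AE waveform descriptors and a spectrogram for a synthetic leak-like signal. The sampling rate and signal are assumptions for demonstration only, not values from the paper.

```python
# Hypothetical AE-signal example: waveform parameters + spectrogram (not the paper's data).
import numpy as np
from scipy.signal import spectrogram

fs = 1_000_000                                  # assumed 1 MHz AE sampling rate
t = np.arange(0, 0.01, 1 / fs)
ae = 0.05 * np.random.randn(t.size) + 0.2 * np.sin(2 * np.pi * 150_000 * t)  # toy leak-like signal

# Parameter analysis: classical AE waveform descriptors
rms = np.sqrt(np.mean(ae ** 2))
peak = np.max(np.abs(ae))
crest_factor = peak / rms

# Time-frequency analysis: short-time Fourier spectrogram of the same window
freqs, frames, Sxx = spectrogram(ae, fs=fs, nperseg=1024, noverlap=512)

print(f"RMS={rms:.4f}, peak={peak:.4f}, crest factor={crest_factor:.2f}")
print("spectrogram bins:", Sxx.shape)           # (frequency bins, time frames)
```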

26 pages, 6798 KiB  
Article
Robust Optical and SAR Image Matching via Attention-Guided Structural Encoding and Confidence-Aware Filtering
by Qi Kang, Jixian Zhang, Guoman Huang and Fei Liu
Remote Sens. 2025, 17(14), 2501; https://doi.org/10.3390/rs17142501 - 18 Jul 2025
Abstract
Accurate feature matching between optical and synthetic aperture radar (SAR) images remains a significant challenge in remote sensing due to substantial modality discrepancies in texture, intensity, and geometric structure. In this study, we proposed an attention-context-aware deep learning framework (ACAMatch) for robust and efficient optical–SAR image registration. The proposed method integrates a structure-enhanced feature extractor, RS2FNet, which combines dual-stage Res2Net modules with a bi-level routing attention mechanism to capture multi-scale local textures and global structural semantics. A context-aware matching module refines correspondences through self- and cross-attention, coupled with a confidence-driven early-exit pruning strategy to reduce computational cost while maintaining accuracy. Additionally, a match-aware multi-task loss function jointly enforces spatial consistency, affine invariance, and structural coherence for end-to-end optimization. Experiments on public datasets (SEN1-2 and WHU-OPT-SAR) and a self-collected Gaofen (GF) dataset demonstrated that ACAMatch significantly outperformed existing state-of-the-art methods in terms of the number of correct matches, matching accuracy, and inference speed, especially under challenging conditions such as resolution differences and severe structural distortions. These results indicate the effectiveness and generalizability of the proposed approach for multimodal image registration, making ACAMatch a promising solution for remote sensing applications such as change detection and multi-sensor data fusion. Full article
(This article belongs to the Special Issue Advancements of Vision-Language Models (VLMs) in Remote Sensing)
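
A hedged sketch of the general pattern this abstract describes, self- and cross-attention over keypoint descriptors followed by confidence-based pruning, is given below. Module names, the shared attention weights, and the threshold are illustrative assumptions rather than the ACAMatch implementation.

```python
# Toy attention-based optical-SAR matcher with confidence filtering (assumed sizes).
import torch
import torch.nn as nn

class ToyCrossMatcher(nn.Module):
    def __init__(self, dim=128, heads=4, conf_thresh=0.2):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conf_thresh = conf_thresh

    def forward(self, desc_opt, desc_sar):
        # desc_*: (B, N, dim) keypoint descriptors from each modality
        opt, _ = self.self_attn(desc_opt, desc_opt, desc_opt)
        sar, _ = self.self_attn(desc_sar, desc_sar, desc_sar)
        opt, _ = self.cross_attn(opt, sar, sar)            # optical attends to SAR context
        scores = torch.einsum("bnd,bmd->bnm", opt, sar)    # similarity matrix
        conf = scores.softmax(dim=-1)
        matches = conf.argmax(dim=-1)                      # best SAR index per optical keypoint
        keep = conf.max(dim=-1).values > self.conf_thresh  # confidence-aware filtering
        return matches, keep

matcher = ToyCrossMatcher()
matches, keep = matcher(torch.randn(1, 64, 128), torch.randn(1, 64, 128))
print(matches.shape, keep.float().mean().item())
```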

25 pages, 732 KiB  
Article
Accuracy-Aware MLLM Task Offloading and Resource Allocation in UAV-Assisted Satellite Edge Computing
by Huabing Yan, Hualong Huang, Zijia Zhao, Zhi Wang and Zitian Zhao
Drones 2025, 9(7), 500; https://doi.org/10.3390/drones9070500 - 16 Jul 2025
Viewed by 67
Abstract
This paper presents a novel framework for optimizing multimodal large language model (MLLM) inference through task offloading and resource allocation in UAV-assisted satellite edge computing (SEC) networks. MLLMs leverage transformer architectures to integrate heterogeneous data modalities for IoT applications, particularly real-time monitoring in remote areas. However, cloud computing dependency introduces latency, bandwidth, and privacy challenges, while IoT device limitations require efficient distributed computing solutions. SEC, utilizing low-earth orbit (LEO) satellites and unmanned aerial vehicles (UAVs), extends mobile edge computing to provide ubiquitous computational resources for remote IoT devices (IoTDs). We formulate the joint optimization of MLLM task offloading and resource allocation as a mixed-integer nonlinear programming (MINLP) problem, minimizing latency and energy consumption while optimizing offloading decisions, power allocation, and UAV trajectories. To address the dynamic SEC environment characterized by satellite mobility, we propose an action-decoupled soft actor–critic (AD-SAC) algorithm with discrete–continuous hybrid action spaces. The simulation results demonstrate that our approach significantly outperforms conventional deep reinforcement learning baselines in both convergence and system cost reduction. Full article
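
The hybrid discrete–continuous action space mentioned above can be pictured with the following sketch of a decoupled policy head: a categorical head for the offloading decision and a Gaussian head for continuous power allocation. Network sizes and action semantics are assumptions, not the AD-SAC code.

```python
# Hypothetical hybrid policy head: discrete offloading choice + continuous power levels.
import torch
import torch.nn as nn

class HybridPolicy(nn.Module):
    def __init__(self, obs_dim=32, n_offload_targets=4, n_power_dims=2):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(obs_dim, 128), nn.ReLU())
        self.offload_logits = nn.Linear(128, n_offload_targets)   # discrete branch
        self.power_mu = nn.Linear(128, n_power_dims)              # continuous branch
        self.power_log_std = nn.Linear(128, n_power_dims)

    def forward(self, obs):
        h = self.trunk(obs)
        offload_dist = torch.distributions.Categorical(logits=self.offload_logits(h))
        std = self.power_log_std(h).clamp(-5, 2).exp()
        power_dist = torch.distributions.Normal(self.power_mu(h), std)
        return offload_dist, power_dist

policy = HybridPolicy()
offload_dist, power_dist = policy(torch.randn(8, 32))
offload = offload_dist.sample()              # which node executes the MLLM task
power = torch.tanh(power_dist.rsample())     # squashed transmit-power fraction
print(offload.shape, power.shape)
```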

26 pages, 6371 KiB  
Article
Growth Stages Discrimination of Multi-Cultivar Navel Oranges Using the Fusion of Near-Infrared Hyperspectral Imaging and Machine Vision with Deep Learning
by Chunyan Zhao, Zhong Ren, Yue Li, Jia Zhang and Weinan Shi
Agriculture 2025, 15(14), 1530; https://doi.org/10.3390/agriculture15141530 - 15 Jul 2025
Viewed by 78
Abstract
To noninvasively and precisely discriminate among the growth stages of multiple cultivars of navel oranges simultaneously, the fusion of near-infrared (NIR) hyperspectral imaging (HSI) with machine vision (MV) and deep learning is employed. NIR reflectance spectra and hyperspectral and RGB images for 740 Gannan navel oranges of five cultivars are collected. Based on preprocessed spectra, optimally selected hyperspectral images, and registered RGB images, a dual-branch multi-modal feature fusion convolutional neural network (CNN) model is established. In this model, a spectral branch is designed to extract spectral features reflecting internal compositional variations, while the image branch is utilized to extract external color and texture features from the integration of hyperspectral and RGB images. Finally, growth stages are determined via the fusion of features. To validate the effectiveness of the proposed method, various machine-learning and deep-learning models are compared for single-modal and multi-modal data. The results demonstrate that multi-modal feature fusion of HSI and MV combined with the constructed dual-branch CNN deep-learning model yields excellent growth-stage discrimination in navel oranges, achieving an accuracy, recall rate, precision, F1 score, and kappa coefficient on the testing set of 95.95%, 96.66%, 96.76%, 96.69%, and 0.9481, respectively, providing a prominent way to precisely monitor the growth stages of fruits. Full article
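
A minimal sketch of a dual-branch fusion network in the spirit of this abstract follows: a 1D branch for NIR spectra and a 2D branch for stacked hyperspectral/RGB image planes, concatenated before a five-class growth-stage head. All channel counts and layer sizes are illustrative assumptions.

```python
# Toy dual-branch (spectral + image) fusion CNN; dimensions are assumptions.
import torch
import torch.nn as nn

class DualBranchCNN(nn.Module):
    def __init__(self, n_bands=256, img_channels=6, n_stages=5):
        super().__init__()
        self.spectral = nn.Sequential(            # internal composition cues
            nn.Conv1d(1, 16, 7, padding=3), nn.ReLU(), nn.AdaptiveAvgPool1d(8), nn.Flatten())
        self.image = nn.Sequential(               # external color/texture cues
            nn.Conv2d(img_channels, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten())
        self.head = nn.Linear(16 * 8 + 16 * 16, n_stages)

    def forward(self, spectrum, image):
        feats = torch.cat([self.spectral(spectrum), self.image(image)], dim=1)
        return self.head(feats)

model = DualBranchCNN()
logits = model(torch.randn(2, 1, 256), torch.randn(2, 6, 64, 64))
print(logits.shape)  # (2, 5) growth-stage scores
```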

30 pages, 2023 KiB  
Review
Fusion of Computer Vision and AI in Collaborative Robotics: A Review and Future Prospects
by Yuval Cohen, Amir Biton and Shraga Shoval
Appl. Sci. 2025, 15(14), 7905; https://doi.org/10.3390/app15147905 - 15 Jul 2025
Viewed by 128
Abstract
The integration of advanced computer vision and artificial intelligence (AI) techniques into collaborative robotic systems holds the potential to revolutionize human–robot interaction, productivity, and safety. Despite substantial research activity, a systematic synthesis of how vision and AI are jointly enabling context-aware, adaptive cobot capabilities across perception, planning, and decision-making remains lacking (especially in recent years). Addressing this gap, our review unifies the latest advances in visual recognition, deep learning, and semantic mapping within a structured taxonomy tailored to collaborative robotics. We examine foundational technologies such as object detection, human pose estimation, and environmental modeling, as well as emerging trends including multimodal sensor fusion, explainable AI, and ethically guided autonomy. Unlike prior surveys that focus narrowly on either vision or AI, this review uniquely analyzes their integrated use for real-world human–robot collaboration. Highlighting industrial and service applications, we distill the best practices, identify critical challenges, and present key performance metrics to guide future research. We conclude by proposing strategic directions—from scalable training methods to interoperability standards—to foster safe, robust, and proactive human–robot partnerships in the years ahead. Full article

14 pages, 1509 KiB  
Article
A Multi-Modal Deep Learning Approach for Predicting Eligibility for Adaptive Radiation Therapy in Nasopharyngeal Carcinoma Patients
by Zhichun Li, Zihan Li, Sai Kit Lam, Xiang Wang, Peilin Wang, Liming Song, Francis Kar-Ho Lee, Celia Wai-Yi Yip, Jing Cai and Tian Li
Cancers 2025, 17(14), 2350; https://doi.org/10.3390/cancers17142350 - 15 Jul 2025
Viewed by 143
Abstract
Background: Adaptive radiation therapy (ART) can improve prognosis for nasopharyngeal carcinoma (NPC) patients. However, the inter-individual variability in anatomical changes, along with the resulting extension of treatment duration and increased workload for the radiologists, makes the selection of eligible patients a persistent challenge in clinical practice. The purpose of this study was to predict eligible ART candidates prior to radiation therapy (RT) for NPC patients using a classification neural network. By leveraging the fusion of medical imaging and clinical data, this method aimed to save time and resources in clinical workflows and improve treatment efficiency. Methods: We collected retrospective data from 305 NPC patients who received RT at Hong Kong Queen Elizabeth Hospital. Each patient sample included pre-treatment computed tomographic (CT) images, T1-weighted magnetic resonance imaging (MRI) data, and T2-weighted MRI images, along with clinical data. We developed and trained a novel multi-modal classification neural network that combines ResNet-50, cross-attention, multi-scale features, and clinical data for multi-modal fusion. The patients were categorized into two labels based on their re-plan status: patients who received ART during RT treatment, as determined by the radiation oncologist, and those who did not. Results: The experimental results demonstrated that the proposed multi-modal deep prediction model outperformed other commonly used deep learning networks, achieving an area under the curve (AUC) of 0.9070. These results indicated the ability of the model to accurately classify and predict ART eligibility for NPC patients. Conclusions: The proposed method showed good performance in predicting ART eligibility among NPC patients, highlighting its potential to enhance clinical decision-making, optimize treatment efficiency, and support more personalized cancer care. Full article
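
The fusion pattern described above can be sketched as imaging embeddings cross-attended by an embedded clinical vector before a binary eligibility logit. Here a single toy 3D encoder, shared across CT, T1, and T2, stands in for the ResNet-50 backbone, and every dimension is an assumption for illustration.

```python
# Hypothetical multi-modal ART-eligibility classifier sketch (not the paper's network).
import torch
import torch.nn as nn

class ARTEligibilityNet(nn.Module):
    def __init__(self, dim=64, n_clinical=12):
        super().__init__()
        self.img_enc = nn.Sequential(nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
                                     nn.AdaptiveAvgPool3d(1), nn.Flatten(), nn.Linear(8, dim))
        self.clin_enc = nn.Linear(n_clinical, dim)
        self.cross = nn.MultiheadAttention(dim, 4, batch_first=True)
        self.head = nn.Linear(dim, 1)

    def forward(self, ct, t1, t2, clinical):
        imgs = torch.stack([self.img_enc(x) for x in (ct, t1, t2)], dim=1)  # (B, 3, dim)
        clin = self.clin_enc(clinical).unsqueeze(1)                         # (B, 1, dim)
        fused, _ = self.cross(clin, imgs, imgs)   # clinical query attends over modalities
        return self.head(fused.squeeze(1))        # logit; sigmoid > 0.5 -> ART candidate

net = ARTEligibilityNet()
logit = net(torch.randn(2, 1, 16, 32, 32), torch.randn(2, 1, 16, 32, 32),
            torch.randn(2, 1, 16, 32, 32), torch.randn(2, 12))
print(logit.shape)
```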

20 pages, 5700 KiB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Viewed by 186
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has exhibited increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains a high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation. Full article
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)
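
A hedged sketch of per-modality Self-Attention followed by late fusion, as outlined above, is shown below. The upstream Wav2Vec2, skeleton, BERT, and Doc2Vec extractors are assumed to have produced the input sequences, and all dimensions are illustrative.

```python
# Toy late-fusion personality model: one attention block per modality, then concatenation.
import torch
import torch.nn as nn

class ModalityBlock(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, 4, batch_first=True)

    def forward(self, x):                 # x: (B, T, dim) one modality's sequence
        h, _ = self.attn(x, x, x)
        return h.mean(dim=1)              # temporal summary vector

class LateFusionAPR(nn.Module):
    def __init__(self, dim=64, n_traits=5):
        super().__init__()
        self.audio, self.video, self.text = (ModalityBlock(dim) for _ in range(3))
        self.head = nn.Linear(3 * dim, n_traits)   # one score per Big Five trait

    def forward(self, a, v, t):
        return self.head(torch.cat([self.audio(a), self.video(v), self.text(t)], dim=1))

model = LateFusionAPR()
out = model(torch.randn(2, 50, 64), torch.randn(2, 30, 64), torch.randn(2, 20, 64))
print(out.shape)  # (2, 5)
```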

21 pages, 4147 KiB  
Article
AgriFusionNet: A Lightweight Deep Learning Model for Multisource Plant Disease Diagnosis
by Saleh Albahli
Agriculture 2025, 15(14), 1523; https://doi.org/10.3390/agriculture15141523 - 15 Jul 2025
Viewed by 163
Abstract
Timely and accurate identification of plant diseases is critical to mitigating crop losses and enhancing yield in precision agriculture. This paper proposes AgriFusionNet, a lightweight and efficient deep learning model designed to diagnose plant diseases using multimodal data sources. The framework integrates RGB and multispectral drone imagery with IoT-based environmental sensor data (e.g., temperature, humidity, soil moisture), recorded over six months across multiple agricultural zones. Built on the EfficientNetV2-B4 backbone, AgriFusionNet incorporates Fused-MBConv blocks and Swish activation to improve gradient flow, capture fine-grained disease patterns, and reduce inference latency. The model was evaluated using a comprehensive dataset composed of real-world and benchmarked samples, showing superior performance with 94.3% classification accuracy, 28.5 ms inference time, and a 30% reduction in model parameters compared to state-of-the-art models such as Vision Transformers and InceptionV4. Extensive comparisons with both traditional machine learning and advanced deep learning methods underscore its robustness, generalization, and suitability for deployment on edge devices. Ablation studies and confusion matrix analyses further confirm its diagnostic precision, even in visually ambiguous cases. The proposed framework offers a scalable, practical solution for real-time crop health monitoring, contributing toward smart and sustainable agricultural ecosystems. Full article
(This article belongs to the Special Issue Computational, AI and IT Solutions Helping Agriculture)
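
As a rough illustration of the Fused-MBConv building block with Swish (SiLU) activation cited above, the following sketch implements a simplified block without squeeze-and-excitation; the expansion ratio and channel counts are assumptions, not the AgriFusionNet configuration. In the full framework, the IoT sensor readings would presumably be embedded separately and fused with pooled image features before classification.

```python
# Simplified Fused-MBConv block with Swish (SiLU); configuration values are assumed.
import torch
import torch.nn as nn

class FusedMBConv(nn.Module):
    def __init__(self, in_ch, out_ch, expand=4, stride=1):
        super().__init__()
        mid = in_ch * expand
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 3, stride=stride, padding=1, bias=False),  # fused 3x3 expansion
            nn.BatchNorm2d(mid), nn.SiLU(),                                  # Swish activation
            nn.Conv2d(mid, out_ch, 1, bias=False),                           # 1x1 projection
            nn.BatchNorm2d(out_ch))
        self.use_residual = stride == 1 and in_ch == out_ch

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_residual else y   # residual path aids gradient flow

blk = FusedMBConv(32, 32)
print(blk(torch.randn(1, 32, 56, 56)).shape)
```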

23 pages, 3492 KiB  
Article
A Multimodal Deep Learning Framework for Accurate Biomass and Carbon Sequestration Estimation from UAV Imagery
by Furkat Safarov, Ugiloy Khojamuratova, Misirov Komoliddin, Xusinov Ibragim Ismailovich and Young Im Cho
Drones 2025, 9(7), 496; https://doi.org/10.3390/drones9070496 - 14 Jul 2025
Viewed by 126
Abstract
Accurate quantification of above-ground biomass (AGB) and carbon sequestration is vital for monitoring terrestrial ecosystem dynamics, informing climate policy, and supporting carbon neutrality initiatives. However, conventional methods—ranging from manual field surveys to remote sensing techniques based solely on 2D vegetation indices—often fail to capture the intricate spectral and structural heterogeneity of forest canopies, particularly at fine spatial resolutions. To address these limitations, we introduce ForestIQNet, a novel end-to-end multimodal deep learning framework designed to estimate AGB and associated carbon stocks from UAV-acquired imagery with high spatial fidelity. ForestIQNet combines dual-stream encoders for processing multispectral UAV imagery and a voxelized Canopy Height Model (CHM), fused via a Cross-Attentional Feature Fusion (CAFF) module, enabling fine-grained interaction between spectral reflectance and 3D structure. A lightweight Transformer-based regression head then performs multitask prediction of AGB and CO2e, capturing long-range spatial dependencies and enhancing generalization. The proposed method achieves an R2 of 0.93 and an RMSE of 6.1 kg for AGB prediction, compared to an R2 of 0.78 and an RMSE of 11.7 kg for XGBoost and an R2 of 0.73 and an RMSE of 13.2 kg for Random Forest. Despite its architectural complexity, ForestIQNet maintains a low inference cost (27 ms per patch) and generalizes well across species, terrain, and canopy structures. These results establish a new benchmark for UAV-enabled biomass estimation and provide scalable, interpretable tools for climate monitoring and forest management. Full article
(This article belongs to the Special Issue UAVs for Nature Conservation Tasks in Complex Environments)
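
For readers who want to score their own predictions the same way the headline results above are reported, the snippet below shows the standard computation of R2 and RMSE; the arrays are placeholders, not the paper's data.

```python
# Illustrative R2/RMSE computation on hypothetical AGB values (kg).
import numpy as np
from sklearn.metrics import r2_score, mean_squared_error

y_true = np.array([120.0, 95.5, 143.2, 88.1, 110.4])   # hypothetical reference AGB
y_pred = np.array([118.2, 99.0, 139.8, 90.5, 108.9])   # hypothetical model output

r2 = r2_score(y_true, y_pred)
rmse = np.sqrt(mean_squared_error(y_true, y_pred))
print(f"R2 = {r2:.3f}, RMSE = {rmse:.2f} kg")
```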

23 pages, 3008 KiB  
Article
Quantitative Analysis of Sulfur Elements in Mars-like Rocks Based on Multimodal Data
by Yuhang Dong, Zhengfeng Shi, Junsheng Yao, Li Zhang, Yongkang Chen and Junyan Jia
Sensors 2025, 25(14), 4388; https://doi.org/10.3390/s25144388 - 14 Jul 2025
Viewed by 186
Abstract
The Zhurong rover of the Tianwen-1 mission has detected sulfates in its landing area. The analysis of these sulfates provides scientific evidence for exploring past hydration conditions and atmospheric evolution on Mars. As a non-contact technique with long-range detection capability, Laser-Induced Breakdown Spectroscopy (LIBS) is widely used for elemental identification on Mars. However, quantitative analysis of anionic elements using LIBS remains challenging due to the weak characteristic spectral lines of evaporite salt elements, such as sulfur, in LIBS spectra, which provide limited quantitative information. This study proposes a quantitative analysis method for sulfur in sulfate-containing Martian analogs by leveraging spectral line correlations, full-spectrum information, and prior knowledge, aiming to address the challenges of sulfur identification and quantification in Martian exploration. To enhance the accuracy of sulfur quantification, two analytical models for high and low sulfur concentrations were developed. Samples were classified using infrared spectroscopy based on sulfur content levels. Subsequently, multimodal deep learning models were developed for quantitative analysis by integrating LIBS and infrared spectra, based on varying concentrations. Compared to traditional unimodal models, the multimodal method simultaneously utilizes elemental chemical information from LIBS spectra and molecular structural and vibrational characteristics from infrared spectroscopy. Considering that sulfur exhibits distinct absorption bands in infrared spectra but demonstrates weak characteristic lines in LIBS spectra due to its low ionization energy, the combination of both spectral techniques enables the model to capture complementary sample features, thereby effectively improving prediction accuracy and robustness. To validate the advantages of the multimodal approach, comparative analyses were conducted against unimodal methods. Furthermore, to optimize model performance, different feature selection algorithms were evaluated. Ultimately, an XGBoost-based feature selection method incorporating prior knowledge was employed to identify optimal LIBS spectral features, and the selected feature subsets were utilized in multimodal modeling to enhance stability. Experimental results demonstrate that, compared to the BPNN, SVR, and Inception unimodal methods, the proposed multimodal approach achieves at least a 92.36% reduction in RMSE and a 46.3% improvement in R2. Full article
(This article belongs to the Section Sensing and Imaging)
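
A hedged sketch of XGBoost-driven feature selection over LIBS channels, followed by regression on the fused LIBS + infrared features, loosely mirrors the pipeline described above. Data shapes, the number of retained channels, and the final regressor are assumptions for illustration.

```python
# Toy XGBoost feature selection on LIBS channels + fused LIBS/IR regression (synthetic data).
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
libs = rng.normal(size=(200, 1000))     # 200 samples x 1000 LIBS channels (toy)
ir = rng.normal(size=(200, 150))        # 200 samples x 150 IR features (toy)
sulfur = rng.uniform(0, 10, size=200)   # toy sulfur concentrations

# Step 1: rank LIBS channels by XGBoost feature importance and keep the top 50
selector = XGBRegressor(n_estimators=100, max_depth=4).fit(libs, sulfur)
top = np.argsort(selector.feature_importances_)[::-1][:50]

# Step 2: fuse selected LIBS channels with IR features and fit the final model
fused = np.hstack([libs[:, top], ir])
model = XGBRegressor(n_estimators=200, max_depth=4).fit(fused, sulfur)
print("train R2:", model.score(fused, sulfur))
```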

19 pages, 709 KiB  
Article
Fusion of Multimodal Spatio-Temporal Features and 3D Deformable Convolution Based on Sign Language Recognition in Sensor Networks
by Qian Zhou, Hui Li, Weizhi Meng, Hua Dai, Tianyu Zhou and Guineng Zheng
Sensors 2025, 25(14), 4378; https://doi.org/10.3390/s25144378 - 13 Jul 2025
Viewed by 141
Abstract
Sign language is a complex and dynamic visual language that requires the coordinated movement of various body parts, such as the hands, arms, and limbs—making it an ideal application domain for sensor networks to capture and interpret human gestures accurately. To address the intricate task of precise and efficient sign language recognition (SLR) from raw videos, this study introduces a novel deep learning approach by devising a multimodal framework for SLR. Specifically, feature extraction models are built based on two modalities: skeleton and RGB images. We first propose a Multi-Stream Spatio-Temporal Graph Convolutional Network (MSGCN) that relies on three modules: a decoupling graph convolutional network, a self-emphasizing temporal convolutional network, and a spatio-temporal joint attention module. These modules are combined to capture the spatio-temporal information in multi-stream skeleton features. Second, we propose a 3D ResNet model based on deformable convolution (D-ResNet) to model complex spatial and temporal sequences in the original raw images. Finally, a gating mechanism-based Multi-Stream Fusion Module (MFM) is employed to merge the results of the two modalities. Extensive experiments are conducted on the public datasets AUTSL and WLASL, achieving competitive results compared to state-of-the-art systems. Full article
(This article belongs to the Special Issue Intelligent Sensing and Artificial Intelligence for Image Processing)
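
The gating mechanism-based fusion of the two modality streams can be pictured with the sketch below, where a learned sigmoid gate mixes skeleton and RGB features before classification. Feature sizes and the 226-class output are illustrative assumptions, not the paper's MFM implementation.

```python
# Toy gated fusion of skeleton and RGB stream features; sizes and class count assumed.
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    def __init__(self, dim=512, n_classes=226):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.Sigmoid())
        self.head = nn.Linear(dim, n_classes)

    def forward(self, skeleton_feat, rgb_feat):
        g = self.gate(torch.cat([skeleton_feat, rgb_feat], dim=1))  # per-feature gate in [0, 1]
        fused = g * skeleton_feat + (1 - g) * rgb_feat              # convex modality mix
        return self.head(fused)

fusion = GatedFusion()
logits = fusion(torch.randn(4, 512), torch.randn(4, 512))
print(logits.shape)  # (4, 226) sign-class scores
```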

16 pages, 2169 KiB  
Article
Leveraging Feature Fusion of Image Features and Laser Reflectance for Automated Fish Freshness Classification
by Caner Balım, Nevzat Olgun and Mücahit Çalışan
Sensors 2025, 25(14), 4374; https://doi.org/10.3390/s25144374 - 12 Jul 2025
Viewed by 223
Abstract
Fish is important for human health due to its high nutritional value. However, it is prone to spoilage owing to its structural characteristics. Traditional freshness assessment methods, such as visual inspection, are subjective and prone to inconsistency. This study proposes a novel, cost-effective hybrid methodology for automated three-level fish freshness classification (Day 1, Day 2, Day 3) by integrating single-wavelength laser reflectance data with deep learning-based image features. A comprehensive dataset was created by collecting visual and laser data from 130 mackerel specimens over three consecutive days under controlled conditions. Image features were extracted using four pre-trained CNN architectures and fused with laser features to form a unified representation. The combined features were classified using SVM, MLP, and RF algorithms. The experimental results demonstrated that the proposed multimodal approach significantly outperformed single-modality methods, achieving an average classification accuracy of 88.44%. This work presents an original contribution by demonstrating, for the first time, the effectiveness of combining low-cost laser sensing and deep visual features for freshness prediction, with potential for real-time mobile deployment. Full article
(This article belongs to the Section Sensing and Imaging)
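
A minimal sketch of the feature-level fusion and SVM classification described above follows; the arrays are synthetic stand-ins for the CNN image embeddings and laser-reflectance descriptors, not the mackerel dataset.

```python
# Toy early fusion of CNN image features + laser features, classified with an SVM.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
cnn_feats = rng.normal(size=(130, 1280))        # placeholder pooled CNN embeddings per fish
laser_feats = rng.normal(size=(130, 16))        # placeholder reflectance descriptors
labels = rng.integers(0, 3, size=130)           # Day 1 / Day 2 / Day 3

X = np.hstack([cnn_feats, laser_feats])         # feature-level fusion
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0))
clf.fit(X_tr, y_tr)
print("toy accuracy:", clf.score(X_te, y_te))
```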

11 pages, 2054 KiB  
Article
Polarization-Enhanced Multi-Target Underwater Salient Object Detection
by Jiayi Song, Peikai Zhao, Jiangtao Li, Liming Zhu, Khian-Hooi Chew and Rui-Pin Chen
Photonics 2025, 12(7), 707; https://doi.org/10.3390/photonics12070707 - 12 Jul 2025
Viewed by 118
Abstract
Salient object detection (SOD) plays a critical role in underwater exploration systems. Traditional SOD approaches encounter notable constraints in underwater image analysis, primarily stemming from light scattering and absorption effects induced by suspended particulate matter in complex underwater environments. In this work, we propose a deep learning-based multimodal method guided by multi-polarization parameters that integrates polarization de-scattering mechanisms with the powerful feature learning capability of neural networks to achieve adaptive multi-target SOD in an underwater turbid scattering environment. The proposed polarization-enhanced salient object detection network (PESODNet) employs a multi-polarization-parameter-guided, material-aware attention mechanism and a contrastive feature calibration unit, significantly enhancing its multi-material, multi-target detection capabilities in underwater scattering environments. The experimental results confirm that the proposed method achieves substantial performance improvements in multi-target underwater SOD tasks, outperforming state-of-the-art models of salient object detection in detection accuracy. Full article
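
As background for the multi-polarization-parameter guidance mentioned above, the snippet below computes the degree and angle of linear polarization from Stokes components derived from 0°/45°/90°/135° intensity images. The images are synthetic placeholders; the paper's network, which consumes such parameter maps, is not reproduced here.

```python
# Standard polarization parameters (DoLP, AoP) from four polarizer-angle images.
import numpy as np

rng = np.random.default_rng(2)
I0, I45, I90, I135 = (rng.uniform(0.1, 1.0, size=(64, 64)) for _ in range(4))

S0 = I0 + I90                    # total intensity
S1 = I0 - I90                    # horizontal vs. vertical preference
S2 = I45 - I135                  # diagonal preference

dolp = np.sqrt(S1 ** 2 + S2 ** 2) / (S0 + 1e-8)   # degree of linear polarization
aop = 0.5 * np.arctan2(S2, S1)                    # angle of polarization (radians)
print(dolp.mean(), aop.shape)
```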

30 pages, 8143 KiB  
Article
An Edge-Deployable Multi-Modal Nano-Sensor Array Coupled with Deep Learning for Real-Time, Multi-Pollutant Water Quality Monitoring
by Zhexu Xi, Robert Nicolas and Jiayi Wei
Water 2025, 17(14), 2065; https://doi.org/10.3390/w17142065 - 10 Jul 2025
Viewed by 233
Abstract
Real-time, high-resolution monitoring of chemically diverse water pollutants remains a critical challenge for smart water management. Here, we report a fully integrated, multi-modal nano-sensor array, combining graphene field-effect transistors, Ag/Au-nanostar surface-enhanced Raman spectroscopy substrates, and CdSe/ZnS quantum dot fluorescence, coupled to an edge-deployable CNN-LSTM architecture that fuses raw electrochemical, vibrational, and photoluminescent signals without manual feature engineering. The 45 mm × 20 mm microfluidic manifold enables continuous flow-through sampling, while 8-bit-quantised inference executes in 31 ms at <12 W. Laboratory calibration over 28,000 samples achieved limits of detection of 12 ppt (Pb2+), 17 pM (atrazine) and 87 ng L−1 (nanoplastics), with R2 ≥ 0.93 and a mean absolute percentage error <6%. A 24 h deployment in the Cherwell River reproduced natural concentration fluctuations with field R2 ≥ 0.92. SHAP and Grad-CAM analyses reveal that the network bases its predictions on Dirac-point shifts, characteristic Raman bands, and early-time fluorescence-quenching kinetics, providing mechanistic interpretability. The platform therefore offers a scalable route to smart water grids, point-of-use drinking water sentinels, and rapid environmental incident response. Future work will address sensor drift through antifouling coatings, enhance cross-site generalisation via federated learning, and create physics-informed digital twins for self-calibrating global monitoring networks. Full article
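
A hedged sketch of a CNN-LSTM that fuses three raw signal streams of the kind named above (electrochemical, Raman, and fluorescence traces) into multi-pollutant concentration estimates follows. Channel counts, sequence length, and the three-target head are assumptions, not the deployed architecture.

```python
# Toy CNN-LSTM over stacked raw sensor channels with a multi-target regression head.
import torch
import torch.nn as nn

class ToyCNNLSTM(nn.Module):
    def __init__(self, in_ch=3, hidden=64, n_targets=3):
        super().__init__()
        self.conv = nn.Sequential(nn.Conv1d(in_ch, 32, 5, padding=2), nn.ReLU())
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_targets)    # e.g., Pb2+, atrazine, nanoplastics (assumed)

    def forward(self, x):                 # x: (B, 3, T) stacked raw sensor channels
        h = self.conv(x).transpose(1, 2)  # -> (B, T, 32) for the LSTM
        out, _ = self.lstm(h)
        return self.head(out[:, -1])      # concentration estimates from the last step

model = ToyCNNLSTM()
print(model(torch.randn(2, 3, 256)).shape)  # (2, 3)
```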

30 pages, 5474 KiB  
Article
WHU-RS19 ABZSL: An Attribute-Based Dataset for Remote Sensing Image Understanding
by Mattia Balestra, Marina Paolanti and Roberto Pierdicca
Remote Sens. 2025, 17(14), 2384; https://doi.org/10.3390/rs17142384 - 10 Jul 2025
Viewed by 197
Abstract
The advancement of artificial intelligence (AI) in remote sensing (RS) increasingly depends on datasets that offer rich and structured supervision beyond traditional scene-level labels. Although existing benchmarks for aerial scene classification have facilitated progress in this area, their reliance on single-class annotations limits their usefulness for more flexible, interpretable and generalisable learning frameworks. In this study, we introduce WHU-RS19 ABZSL: an attribute-based extension of the widely adopted WHU-RS19 dataset. This new version comprises 1005 high-resolution aerial images across 19 scene categories, each annotated with a vector of 38 features. These cover objects (e.g., roads and trees), geometric patterns (e.g., lines and curves) and dominant colours (e.g., green and blue), and are defined through expert-guided annotation protocols. To demonstrate the value of the dataset, we conduct baseline experiments using deep learning models that had been adapted for multi-label classification—ResNet18, VGG16, InceptionV3, EfficientNet and ViT-B/16—designed to capture the semantic complexity characteristic of real-world aerial scenes. The results, which are measured in terms of macro F1-score, range from 0.7385 for ResNet18 to 0.7608 for EfficientNet-B0. In particular, EfficientNet-B0 and ViT-B/16 are the top performers in terms of the overall macro F1-score and consistency across attributes, while all models show a consistent decline in performance for infrequent or visually ambiguous categories. This confirms that it is feasible to accurately predict semantic attributes in complex scenes. By enriching a standard benchmark with detailed, image-level semantic supervision, WHU-RS19 ABZSL supports a variety of downstream applications, including multi-label classification, explainable AI, semantic retrieval, and attribute-based zero-shot learning (ZSL). It thus provides a reusable, compact resource for advancing the semantic understanding of remote sensing and multimodal AI. Full article
(This article belongs to the Special Issue Remote Sensing Datasets and 3D Visualization of Geospatial Big Data)
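
The multi-label setting this dataset targets can be sketched as a 38-way sigmoid head trained with binary cross-entropy and scored with a macro F1, as below. The backbone features, decision threshold, and random tensors are illustrative assumptions.

```python
# Toy multi-label attribute head with BCE loss and macro-F1 scoring (synthetic tensors).
import torch
import torch.nn as nn
from sklearn.metrics import f1_score

n_attributes = 38
backbone_dim = 512
head = nn.Linear(backbone_dim, n_attributes)       # stands in for pooled backbone features -> attributes
criterion = nn.BCEWithLogitsLoss()

features = torch.randn(16, backbone_dim)           # toy pooled image features
targets = torch.randint(0, 2, (16, n_attributes)).float()

logits = head(features)
loss = criterion(logits, targets)

preds = (torch.sigmoid(logits) > 0.5).int().numpy()
macro_f1 = f1_score(targets.int().numpy(), preds, average="macro", zero_division=0)
print(f"loss={loss.item():.3f}, macro F1={macro_f1:.3f}")
```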
