Search Results (1,722)

Search Parameters:
Keywords = modality fusion

28 pages, 32119 KB  
Article
NOAH: A Multi-Modal and Sensor Fusion Dataset for Generative Modeling in Remote Sensing
by Abdul Mutakabbir, Chung-Horng Lung, Marzia Zaman, Darshana Upadhyay, Kshirasagar Naik, Koreen Millard, Thambirajah Ravichandran and Richard Purcell
Remote Sens. 2026, 18(3), 466; https://doi.org/10.3390/rs18030466 (registering DOI) - 1 Feb 2026
Abstract
Earth Observation (EO) and Remote Sensing (RS) data are widely used in various fields, including weather, environment, and natural disaster modeling and prediction. EO and RS data acquired through geostationary satellite constellations are limited to a relatively small region, while sun-synchronous satellite constellations have discontinuous spatial and temporal coverage. This limits the usefulness of EO and RS data for near-real-time weather, environment, and natural disaster applications. To address these limitations, we introduce Now Observation Assemble Horizon (NOAH), a multi-modal sensor fusion dataset that combines Ground-Based Sensor (GBS) data from weather stations with topography, vegetation (land cover, biomass, and crown cover), and fuel-type data from RS sources. NOAH is collated from publicly available data from Environment and Climate Change Canada (ECCC), the Spatialized CAnadian National Forest Inventory (SCANFI), and the United States Geological Survey (USGS), which are well-maintained, documented, and reliable. Applications of the NOAH dataset include, but are not limited to, expanding RS data tiles, filling in missing data, and super-resolution of existing data sources. Additionally, Generative Artificial Intelligence (GenAI) or Generative Modeling (GM) can be applied to produce near-real-time synthetic estimates for disaster modeling in remote locations, complementing rather than replacing existing observations from field instruments. A UNet backbone with Feature-wise Linear Modulation (FiLM) injection of GBS data was used to demonstrate an initial proof of concept. The paper also lists ideal characteristics of GM or GenAI datasets for RS. The code and a subset of the NOAH dataset (NOAH mini) are released as open source.
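
The FiLM-style conditioning mentioned in the abstract can be illustrated with a small PyTorch sketch; it is not the authors' released code, and all module names, channel counts, and input shapes below are illustrative assumptions.

```python
# Minimal sketch of FiLM conditioning of image features on ground-based
# sensor (GBS) readings, assuming PyTorch; names are illustrative only.
import torch
import torch.nn as nn

class FiLM(nn.Module):
    def __init__(self, cond_dim: int, num_channels: int):
        super().__init__()
        # Predict per-channel scale (gamma) and shift (beta) from the GBS vector.
        self.proj = nn.Linear(cond_dim, 2 * num_channels)

    def forward(self, feats: torch.Tensor, cond: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.proj(cond).chunk(2, dim=-1)   # (B, C) each
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)        # (B, C, 1, 1)
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return gamma * feats + beta                      # feature-wise modulation

class FiLMedConvBlock(nn.Module):
    """One UNet-style conv block whose activations are modulated by GBS data."""
    def __init__(self, in_ch: int, out_ch: int, cond_dim: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1),
        )
        self.film = FiLM(cond_dim, out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x, cond):
        return self.act(self.film(self.conv(x), cond))

if __name__ == "__main__":
    imgs = torch.randn(2, 4, 64, 64)   # e.g. stacked RS rasters (hypothetical shape)
    gbs = torch.randn(2, 8)            # e.g. 8 weather-station variables
    block = FiLMedConvBlock(4, 32, cond_dim=8)
    print(block(imgs, gbs).shape)      # torch.Size([2, 32, 64, 64])
```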

26 pages, 11755 KB  
Article
SAMKD: A Hybrid Lightweight Algorithm Based on Selective Activation and Masked Knowledge Distillation for Multimodal Object Detection
by Ruitao Lu, Zhanhong Zhuo, Siyu Wang, Jiwei Fan, Tong Shen and Xiaogang Yang
Remote Sens. 2026, 18(3), 450; https://doi.org/10.3390/rs18030450 (registering DOI) - 1 Feb 2026
Abstract
Multimodal object detection is currently a research hotspot in computer vision. However, the fusion of visible and infrared modalities inevitably increases computational complexity, making most high-performance detection models difficult to deploy on resource-constrained UAV edge devices. Although pruning and knowledge distillation are widely used for model compression, applying them independently often leads to an unstable accuracy–efficiency trade-off. Therefore, this paper proposes a hybrid lightweight algorithm named SAMKD, which combines selective activation pruning with masked knowledge distillation in a staged manner to improve efficiency while maintaining detection performance. Specifically, the selective activation network pruning model (SAPM) first reduces redundant computation by dynamically adjusting network weights and the activation state of input data to generate a lightweight student network. Then, the mask binary classification knowledge distillation (MBKD) strategy is introduced to compensate for the accuracy degradation caused by pruning, guiding the student network to recover missing representation patterns under masked feature learning. Moreover, MBKD reformulates classification logits into multiple foreground–background binary mappings, effectively alleviating the severe foreground–background imbalance commonly observed in UAV aerial imagery. This paper constructs a multimodal UAV aerial imagery object detection dataset, M2UD-18K, which includes 9 target categories and over 18,000 image pairs. Extensive experiments show that SAMKD performs well on the self-constructed M2UD-18K dataset as well as the public DroneVehicle dataset, achieving a favorable trade-off between detection accuracy and detection speed.
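
A rough sketch of masked feature distillation is shown below, assuming PyTorch; the exact MBKD formulation (including the foreground–background binary mapping of the logits) is not specified here, so this illustrates only the generic masked-feature idea with illustrative names.

```python
# Generic masked feature-distillation sketch (not the exact MBKD of SAMKD):
# random spatial positions of the student feature map are masked, and the
# student is trained to recover the teacher's features at those positions.
import torch

def masked_distill_loss(student_feat, teacher_feat, mask_ratio=0.5):
    """student_feat, teacher_feat: (B, C, H, W); the teacher is detached."""
    B, C, H, W = student_feat.shape
    mask = (torch.rand(B, 1, H, W, device=student_feat.device) < mask_ratio).float()
    # Penalize the student only at masked locations so it must "fill in"
    # the representation patterns the teacher produces there.
    diff = (student_feat - teacher_feat.detach()) ** 2
    return (diff * mask).sum() / (mask.sum() * C + 1e-6)

if __name__ == "__main__":
    s = torch.randn(2, 64, 32, 32, requires_grad=True)
    t = torch.randn(2, 64, 32, 32)
    loss = masked_distill_loss(s, t)
    loss.backward()
    print(float(loss))
```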

30 pages, 14668 KB  
Article
RAPT-Net: Reliability-Aware Precision-Preserving Tolerance-Enhanced Network for Tiny Target Detection in Wide-Area Coverage Aerial Remote Sensing
by Peida Zhou, Xiaojun Guo, Xiaoyong Sun, Bei Sun, Shaojing Su, Wei Jiang, Runze Guo, Zhaoyang Dang and Siyang Huang
Remote Sens. 2026, 18(3), 449; https://doi.org/10.3390/rs18030449 (registering DOI) - 1 Feb 2026
Abstract
Multi-platform aerial remote sensing supports critical applications including wide-area surveillance, traffic monitoring, maritime security, and search and rescue. However, constrained by observation altitude and sensor resolution, targets inherently exhibit small-scale characteristics, making small object detection a fundamental bottleneck. Aerial remote sensing faces three unique challenges: (1) spatial heterogeneity of modality reliability due to scene diversity and illumination dynamics; (2) conflict between precise localization requirements and progressive spatial information degradation; (3) annotation ambiguity from imaging physics conflicting with IoU-based training. This paper proposes RAPT-Net with three core modules: MRAAF achieves scene-adaptive modality integration through two-stage progressive fusion; CMFE-SRP employs hierarchy-specific processing to balance spatial details and semantic enhancement; DS-STD increases positive sample coverage to 4× through spatial tolerance expansion. Experiments on VEDAI (satellite) and RGBT-Tiny (UAV) demonstrate mAP values of 62.22% and 18.52%, improving over the state of the art by 4.3% and 10.3%, with a 17.3% improvement on extremely tiny targets.
(This article belongs to the Special Issue Small Target Detection, Recognition, and Tracking in Remote Sensing)

34 pages, 5749 KB  
Systematic Review
Remote Sensing and Machine Learning Approaches for Hydrological Drought Detection: A PRISMA Review
by Odwa August, Malusi Sibiya, Masengo Ilunga and Mbuyu Sumbwanyambe
Water 2026, 18(3), 369; https://doi.org/10.3390/w18030369 (registering DOI) - 31 Jan 2026
Abstract
Hydrological drought poses a significant threat to water security and ecosystems globally. While remote sensing offers vast spatial data, advanced analytical methods are required to translate this data into actionable insights. This review addresses this need by systematically synthesizing the state of the art in using convolutional neural networks (CNNs) and satellite-derived vegetation indices for hydrological drought detection. Following PRISMA guidelines, a systematic search of studies published between 1 January 2018 and August 2025 was conducted, resulting in 137 studies for inclusion. A narrative synthesis approach was adopted. Among the 137 included studies, 58% focused on hybrid CNN-LSTM models, with a marked increase in publications observed after 2020. The analysis reveals that hybrid spatiotemporal models are the most effective, demonstrating superior forecasting skill and in some cases achieving 10–20% higher accuracy than standalone CNNs. The most robust models employ multi-modal data fusion, integrating vegetation indices (VIs) with complementary data such as Land Surface Temperature (LST). Future research should focus on enhancing model transferability and incorporating explainable AI (XAI) to strengthen the operational utility of drought early warning systems.
(This article belongs to the Section Hydrology)

21 pages, 2562 KB  
Article
Drug–Target Interaction Prediction via Dual-Interaction Fusion
by Xingyang Li, Zepeng Li, Bo Wei and Yuni Zeng
Molecules 2026, 31(3), 498; https://doi.org/10.3390/molecules31030498 (registering DOI) - 31 Jan 2026
Abstract
Accurate prediction of drug–target interaction (DTI) is crucial for modern drug discovery. However, experimental assays are costly, and many existing computational models still face challenges in capturing multi-scale features, fusing cross-modal information, and modeling fine-grained drug–protein interactions. To address these challenges, we propose Gated-Attention Dual-Fusion Drug–Target Interaction (GADFDTI), whose core contribution is a fusion module that constructs an explicit atom–residue similarity field, refines it with a lightweight 2D neighborhood operator, and performs gated bidirectional aggregation to obtain interaction-aware representations. To provide strong and width-aligned unimodal inputs to this fusion module, we integrate a compact multi-scale dense GCN for drug graphs and a masked multi-scale self-attention protein encoder augmented by a narrow 1D-CNN branch for local motif aggregation. Experiments on two benchmarks, Human and C. elegans, show that GADFDTI consistently outperforms several recently proposed DTI models, achieving AUC values of 0.986 and 0.996, respectively, with corresponding gains in precision and recall. A SARS-CoV-2 case study further demonstrates that GADFDTI can reliably prioritize clinically supported antiviral agents while suppressing inactive compounds, indicating its potential as an efficient in silico prescreening tool for lead–target discovery.
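
The fusion module summarized above (explicit atom–residue similarity field, lightweight 2D refinement, gated bidirectional aggregation) could look roughly like the following PyTorch sketch; the dimensions, pooling, and module names are assumptions, not the authors' implementation.

```python
# Rough sketch of an atom-residue interaction fusion step in the spirit of
# the description above (explicit similarity map -> local 2D refinement ->
# gated bidirectional aggregation). Dimensions and names are assumptions.
import torch
import torch.nn as nn

class DualInteractionFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.refine = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # lightweight 2D neighborhood operator
        self.gate_d = nn.Linear(2 * dim, dim)
        self.gate_p = nn.Linear(2 * dim, dim)

    def forward(self, atoms: torch.Tensor, residues: torch.Tensor):
        # atoms: (B, Na, D) drug-atom features; residues: (B, Nr, D) protein-residue features
        sim = torch.einsum("bad,brd->bar", atoms, residues)       # similarity field (B, Na, Nr)
        sim = self.refine(sim.unsqueeze(1)).squeeze(1)            # local refinement
        a2r = torch.softmax(sim, dim=-1) @ residues               # residue context per atom (B, Na, D)
        r2a = torch.softmax(sim.transpose(1, 2), dim=-1) @ atoms  # atom context per residue (B, Nr, D)
        gd = torch.sigmoid(self.gate_d(torch.cat([atoms, a2r], -1)))
        gp = torch.sigmoid(self.gate_p(torch.cat([residues, r2a], -1)))
        drug_repr = (gd * a2r + (1 - gd) * atoms).mean(dim=1)     # gated, interaction-aware pooling
        prot_repr = (gp * r2a + (1 - gp) * residues).mean(dim=1)
        return drug_repr, prot_repr

if __name__ == "__main__":
    fusion = DualInteractionFusion(dim=128)
    d, p = fusion(torch.randn(2, 40, 128), torch.randn(2, 300, 128))
    print(d.shape, p.shape)  # (2, 128) (2, 128)
```
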
18 pages, 10981 KB  
Article
Ensemble Entropy with Adaptive Deep Fusion for Short-Term Power Load Forecasting
by Yiling Wang, Yan Niu, Xuejun Li, Xianglong Dai, Xiaopeng Wang, Yong Jiang, Chenghu He and Li Zhou
Entropy 2026, 28(2), 158; https://doi.org/10.3390/e28020158 (registering DOI) - 31 Jan 2026
Abstract
Accurate power load forecasting is crucial for ensuring the safety and economic operation of power systems. However, the complex, non-stationary, and heterogeneous nature of power load data presents significant challenges for traditional prediction methods, particularly in capturing instantaneous dynamics and effectively fusing multi-feature information. This paper proposes a novel framework, Ensemble Entropy with Adaptive Deep Fusion (EEADF), for short-term multi-feature power load forecasting. The framework introduces an ensemble instantaneous entropy extraction module that computes and fuses multiple entropy types (approximate, sample, and permutation entropies) in real time within sliding windows, creating a sensitive representation of system states. A task-adaptive hierarchical fusion mechanism is employed to balance computational efficiency and model expressivity. For time-series forecasting tasks with relatively structured patterns, a feature-concatenation fusion directly combines LSTM sequence features with multimodal entropy features; for complex multimodal understanding tasks requiring nuanced cross-modal interactions, multi-head self-attention fusion dynamically weights feature importance based on contextual relevance. A dual-branch deep learning model processes both raw sequences (via LSTM) and extracted entropy features (via MLP) in parallel. Extensive experiments on a carefully designed simulated multimodal dataset demonstrate the framework's robustness in recognizing diverse dynamic patterns, achieving an MSE of 0.0125, an MAE of 0.0794, and an R² of 0.9932. Validation on the real-world ETDataset for power load forecasting confirms that the proposed method significantly outperforms baseline models (LSTM, TCN, Transformer, and Informer) and traditional entropy methods across standard evaluation metrics (MSE, MAE, RMSE, MAPE, and R²). Ablation studies further verify the critical roles of both the entropy features and the fusion mechanism.
(This article belongs to the Section Multidisciplinary Applications)
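
The sliding-window entropy extraction described above can be sketched as follows; only permutation entropy is implemented here, with approximate and sample entropy to be added analogously, and the window and step sizes are illustrative assumptions.

```python
# Sliding-window entropy feature extraction in the spirit of the description
# above; only permutation entropy is shown, and approximate/sample entropy
# would be computed the same way. Window sizes are illustrative.
import math
import numpy as np

def permutation_entropy(x: np.ndarray, m: int = 3) -> float:
    """Normalized permutation entropy of a 1-D signal (embedding delay = 1)."""
    patterns = {}
    for i in range(len(x) - m + 1):
        key = tuple(np.argsort(x[i:i + m]))   # ordinal pattern of m consecutive samples
        patterns[key] = patterns.get(key, 0) + 1
    p = np.array(list(patterns.values()), dtype=float)
    p /= p.sum()
    return float(-(p * np.log(p)).sum() / math.log(math.factorial(m)))

def sliding_entropy_features(signal: np.ndarray, win: int = 96, step: int = 16):
    """One entropy value per sliding window -> one feature channel."""
    return np.array([permutation_entropy(signal[s:s + win])
                     for s in range(0, len(signal) - win + 1, step)])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    load = np.sin(np.linspace(0, 20 * np.pi, 2000)) + 0.1 * rng.standard_normal(2000)
    feats = sliding_entropy_features(load)
    print(feats.shape, feats[:5])
```
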
21 pages, 1289 KB  
Article
A Multi-Branch CNN–Transformer Feature-Enhanced Method for 5G Network Fault Classification
by Jiahao Chen, Yi Man and Yao Cheng
Appl. Sci. 2026, 16(3), 1433; https://doi.org/10.3390/app16031433 - 30 Jan 2026
Viewed by 24
Abstract
The deployment of 5G (Fifth-Generation) networks in industrial Internet of Things (IoT), intelligent transportation, and emergency communications introduces heterogeneous and dynamic network states, leading to frequent and diverse faults. Traditional fault detection methods typically emphasize either local temporal anomalies or global distributional characteristics, but rarely achieve an effective balance between the two. In this paper, we propose a parallel multi-branch convolutional neural network (CNN)–Transformer framework (MBCT) to improve fault diagnosis accuracy in 5G networks. Specifically, MBCT takes time-series network key performance indicator (KPI) data as input for training and performs feature extraction through three parallel branches: a CNN branch for local patterns and short-term fluctuations, a Transformer encoder branch for cross-layer and long-term dependencies, and a statistical branch for global features describing quality-of-experience (QoE) metrics. A gating mechanism and feature-weighted fusion are applied outside the branches to adjust inter-branch weights and intra-branch feature sensitivity. The fused representation is then nonlinearly mapped and fed into a classifier to generate the fault category. This paper evaluates the proposed model on both the publicly available TelecomTS multi-modal 5G network observability dataset and a self-collected SDR5GFD dataset based on software-defined radio (SDR). Experimental results demonstrate superior fault classification performance, with 87.7% accuracy on the TelecomTS dataset and 86.3% on the SDR5GFD dataset, outperforming the baseline CNN, Transformer, and Random Forest models. Moreover, the model contains approximately 0.57M parameters and requires about 0.3 MFLOPs per sample for inference, making it suitable for large-scale online fault diagnosis.
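
A minimal sketch of the three-branch idea (CNN, Transformer encoder, and statistical branch combined by a gating mechanism) is shown below, assuming PyTorch; the layer sizes, the choice of statistics, and the gating form are assumptions, not the MBCT implementation.

```python
# Minimal sketch of a three-branch KPI encoder (CNN / Transformer / statistics)
# with a gating mechanism that weights the branches before classification.
# Layer sizes and the statistical features are illustrative assumptions.
import torch
import torch.nn as nn

class MultiBranchFaultClassifier(nn.Module):
    def __init__(self, n_kpis: int, d_model: int = 64, n_classes: int = 5):
        super().__init__()
        self.cnn = nn.Sequential(                     # local patterns / short-term fluctuations
            nn.Conv1d(n_kpis, d_model, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten())
        self.proj = nn.Linear(n_kpis, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)  # long-term dependencies
        self.stat_mlp = nn.Sequential(nn.Linear(2 * n_kpis, d_model), nn.ReLU())  # global statistics
        self.gate = nn.Sequential(nn.Linear(3 * d_model, 3), nn.Softmax(dim=-1))
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, x):                             # x: (B, T, n_kpis) KPI time series
        f_cnn = self.cnn(x.transpose(1, 2))                       # (B, d_model)
        f_trf = self.transformer(self.proj(x)).mean(dim=1)        # (B, d_model)
        stats = torch.cat([x.mean(dim=1), x.std(dim=1)], dim=-1)  # simple global statistics
        f_sta = self.stat_mlp(stats)
        w = self.gate(torch.cat([f_cnn, f_trf, f_sta], dim=-1))   # inter-branch weights
        fused = w[:, 0:1] * f_cnn + w[:, 1:2] * f_trf + w[:, 2:3] * f_sta
        return self.head(fused)

if __name__ == "__main__":
    model = MultiBranchFaultClassifier(n_kpis=12)
    print(model(torch.randn(8, 60, 12)).shape)        # torch.Size([8, 5])
```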

21 pages, 2231 KB  
Article
Token Injection Transformer for Enhanced Fine-Grained Recognition
by Bing Ma, Zhengbei Jin, Junyi Li, Jindong Li, Pengfei Zhang, Xiaohui Song and Beibei Jin
Processes 2026, 14(3), 492; https://doi.org/10.3390/pr14030492 - 30 Jan 2026
Viewed by 19
Abstract
Fine-Grained Visual Classification (FGVC) involves distinguishing highly similar subordinate categories within the same basic-level class, presenting significant challenges due to subtle inter-class variations and substantial intra-class diversity. While Vision Transformer (ViT)-based approaches have demonstrated potential in this domain, they remain limited by two key issues: (1) the progressive loss of gradient-based edge and texture signals during hierarchical token aggregation and (2) insufficient extraction of discriminative fine-grained features. To overcome these limitations, we propose a Gradient-Aware Token Injection Transformer, a novel framework that explicitly incorporates gradient magnitude and orientation into token embeddings. This multi-modal feature fusion mechanism enhances the model’s capacity to preserve and leverage critical fine-grained visual cues. Extensive experiments on four standard FGVC benchmarks demonstrate the superiority of our approach, achieving 92.9% top-1 accuracy on CUB-200-2011, 90.5% on iNaturalist 2018, 93.2% on NABirds, and 95.3% on Stanford Cars, thereby validating its effectiveness and robustness.
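
The gradient-aware token injection could be sketched as follows, assuming PyTorch: Sobel magnitude and orientation maps are patch-embedded and added to the RGB patch tokens. The patch size, embedding width, and injection-by-addition choice are assumptions rather than the paper's exact design.

```python
# Sketch of gradient-aware token injection: Sobel magnitude and orientation
# maps are patchified and projected alongside the RGB patches, then the two
# embeddings are summed per token. Patch size and dims are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GradientTokenEmbed(nn.Module):
    def __init__(self, patch: int = 16, dim: int = 384):
        super().__init__()
        self.rgb_proj = nn.Conv2d(3, dim, patch, stride=patch)   # standard ViT patch embedding
        self.grad_proj = nn.Conv2d(2, dim, patch, stride=patch)  # magnitude + orientation channels
        kx = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
        self.register_buffer("kx", kx.view(1, 1, 3, 3))
        self.register_buffer("ky", kx.t().contiguous().view(1, 1, 3, 3))

    def forward(self, img):                                      # img: (B, 3, H, W)
        gray = img.mean(dim=1, keepdim=True)
        gx = F.conv2d(gray, self.kx, padding=1)
        gy = F.conv2d(gray, self.ky, padding=1)
        mag = torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)
        ori = torch.atan2(gy, gx)
        grad = torch.cat([mag, ori], dim=1)                      # (B, 2, H, W)
        tokens = self.rgb_proj(img) + self.grad_proj(grad)       # inject gradient cues per token
        return tokens.flatten(2).transpose(1, 2)                 # (B, N_tokens, dim)

if __name__ == "__main__":
    emb = GradientTokenEmbed()
    print(emb(torch.randn(2, 3, 224, 224)).shape)                # torch.Size([2, 196, 384])
```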

21 pages, 12301 KB  
Article
Visual Localization Algorithm with Dynamic Point Removal Based on Multi-Modal Information Association
by Jing Ni, Boyang Gao, Hongyuan Zhu, Minkun Zhao and Xiaoxiong Liu
ISPRS Int. J. Geo-Inf. 2026, 15(2), 60; https://doi.org/10.3390/ijgi15020060 - 30 Jan 2026
Viewed by 33
Abstract
To enhance the autonomous navigation capability of intelligent agents in complex environments, this paper presents a visual localization algorithm for dynamic scenes that leverages multi-source information fusion. The proposed approach is built upon an odometry framework integrating LiDAR, camera, and IMU data, and incorporates the YOLOv8 model to extract semantic information from images, which is then fused with laser point cloud data. We design a dynamic point removal method based on multi-modal association, which links 2D image masks to 3D point cloud regions, applies Euclidean clustering to differentiate static and dynamic points, and subsequently employs PnP-RANSAC to eliminate any remaining undetected dynamic points. This process yields a robust localization algorithm for dynamic environments. Experimental results on datasets featuring dynamic objects and a custom-built hardware platform demonstrate that the proposed dynamic point removal method significantly improves both the robustness and accuracy of the visual localization system. These findings confirm the feasibility and effectiveness of our system, showcasing its capabilities in precise positioning and autonomous navigation in complex environments.
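
The dynamic-point handling described above can be illustrated with a small OpenCV sketch on synthetic data: projected LiDAR points that fall inside a detected dynamic-object mask are discarded, and PnP-RANSAC then rejects any remaining outliers through its inlier set. The calibration, mask, and point data below are placeholders, not the authors' pipeline.

```python
# Sketch of the dynamic-point handling idea: LiDAR points projected into the
# image are discarded when they fall inside a detected dynamic-object mask,
# and the camera pose is then estimated on the remaining (static) points with
# PnP-RANSAC so residual dynamic points are rejected as outliers.
import cv2
import numpy as np

rng = np.random.default_rng(0)
K = np.array([[500., 0., 320.], [0., 500., 240.], [0., 0., 1.]])   # placeholder intrinsics
dist = np.zeros(5)

# Synthetic static scene points and a ground-truth camera pose.
pts3d = rng.uniform([-2, -2, 4], [2, 2, 8], size=(200, 3)).astype(np.float32)
rvec_gt = np.array([[0.05], [-0.02], [0.01]])
tvec_gt = np.array([[0.1], [0.0], [0.2]])
pts2d, _ = cv2.projectPoints(pts3d, rvec_gt, tvec_gt, K, dist)
pts2d = pts2d.reshape(-1, 2).astype(np.float32)

# A (placeholder) binary mask of a dynamic object from the 2D detector.
mask = np.zeros((480, 640), dtype=np.uint8)
mask[100:200, 100:250] = 1

# 1) Drop correspondences whose projection lands inside the dynamic mask.
u = np.clip(pts2d[:, 0].astype(int), 0, 639)
v = np.clip(pts2d[:, 1].astype(int), 0, 479)
static = mask[v, u] == 0
pts3d_s, pts2d_s = pts3d[static], pts2d[static]

# 2) PnP-RANSAC on the remaining points; residual dynamic points would be
#    excluded from the pose estimate via the inlier set.
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d_s, pts2d_s, K, dist,
                                             reprojectionError=3.0)
print(ok, len(inliers), rvec.ravel(), tvec.ravel())
```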

21 pages, 2013 KB  
Article
Machine Learning Models for Reliable Gait Phase Detection Using Lower-Limb Wearable Sensor Data
by Muhammad Fiaz, Rosita Guido and Domenico Conforti
Appl. Sci. 2026, 16(3), 1397; https://doi.org/10.3390/app16031397 - 29 Jan 2026
Viewed by 82
Abstract
Accurate gait-phase detection is essential for rehabilitation monitoring, prosthetic control, and human–robot interaction. Artificial intelligence supports continuous, personalized mobility assessment by extracting clinically meaningful patterns from wearable sensors. A richer view of gait dynamics can be achieved by integrating additional signals, including inertial, plantar flex, footswitch, and EMG data, leading to more accurate and informative gait analysis. Motivated by these needs, this study investigates discrete gait-phase recognition for the right leg using a multi-subject IMU dataset collected from lower-limb sensors. IMU recordings were segmented into 128-sample windows across 23 channels, and each window was flattened into a 2944-dimensional feature vector. To ensure reliable ground-truth labels, we developed an automatic relabeling pipeline incorporating heel-strike and toe-off detection, adaptive threshold tuning, and fusion across sensor modalities. These windowed vectors were then used to train a comprehensive suite of machine learning models, including Random Forests, Extra Trees, k-Nearest Neighbors, XGBoost, and LightGBM. All models underwent systematic hyperparameter tuning, and their performance was assessed through k-fold cross-validation. The results demonstrate that tree-based ensemble models provide accurate and stable gait-phase classification with accuracy exceeding 97% across both test sets, underscoring their potential for future real-time gait analysis and lower-limb assistive technologies.
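
The windowing plus classical-ML pipeline described above can be sketched with scikit-learn as follows; the data and labels are synthetic placeholders, and only a Random Forest with k-fold cross-validation is shown.

```python
# Illustrative sketch of the windowing + classical-ML pipeline described
# above: 128-sample windows over 23 IMU channels are flattened to
# 128 * 23 = 2944-dimensional vectors and fed to a tree ensemble.
# The data and labels here are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_samples, n_channels, win = 20_000, 23, 128
signal = rng.standard_normal((n_samples, n_channels))
labels = rng.integers(0, 4, size=n_samples)          # 4 gait phases (placeholder labels)

X, y = [], []
for start in range(0, n_samples - win, win // 2):    # 50% overlapping windows
    X.append(signal[start:start + win].reshape(-1))  # flatten to a 2944-dim vector
    y.append(np.bincount(labels[start:start + win]).argmax())  # majority label per window
X, y = np.asarray(X), np.asarray(y)

clf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
scores = cross_val_score(clf, X, y, cv=5)            # k-fold cross-validation
print(X.shape, scores.mean())
```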

22 pages, 45752 KB  
Article
Chrominance-Aware Multi-Resolution Network for Aerial Remote Sensing Image Fusion
by Shuying Li, Jiaxin Cheng, San Zhang and Wuwei Wang
Remote Sens. 2026, 18(3), 431; https://doi.org/10.3390/rs18030431 - 29 Jan 2026
Viewed by 67
Abstract
Spectral data obtained from upstream remote sensing tasks contain abundant complementary information: infrared images are rich in radiative information, and visible images provide spatial details. Effective fusion of these two modalities improves the utilization of remote sensing data and provides a more comprehensive representation of target characteristics and texture details. The majority of current fusion methods focus primarily on intensity fusion between infrared and visible images, ignoring the chrominance information present in visible images and the color interference that infrared images introduce into the fusion results. Consequently, the fused images exhibit inadequate color representation. To address these challenges, an infrared and visible image fusion method named Chrominance-Aware Multi-Resolution Network (CMNet) is proposed. CMNet integrates the Mamba module, which offers linear complexity and global awareness, into a U-Net framework to form the Multi-scale Spatial State Attention (MSSA) framework. Furthermore, the Mamba module is enhanced through a Chrominance-Enhanced Fusion (CEF) module, leading to better color and detail representation in the fused image. Extensive experimental results show that CMNet delivers better performance than existing fusion methods across various evaluation metrics.
(This article belongs to the Section Remote Sensing Image Processing)
17 pages, 1874 KB  
Article
A Large-Kernel and Scale-Aware 2D CNN with Boundary Refinement for Multimodal Ischemic Stroke Lesion Segmentation
by Omar Ibrahim Alirr
Eng 2026, 7(2), 59; https://doi.org/10.3390/eng7020059 - 29 Jan 2026
Viewed by 104
Abstract
Accurate segmentation of ischemic stroke lesions from multimodal magnetic resonance imaging (MRI) is fundamental for quantitative assessment, treatment planning, and outcome prediction; yet, it remains challenging due to highly heterogeneous lesion morphology, low lesion–background contrast, and substantial variability across scanners and protocols. This work introduces Tri-UNetX-2D, a large-kernel and scale-aware 2D convolutional network with explicit boundary refinement for automated ischemic stroke lesion segmentation from DWI, ADC, and FLAIR MRI. The architecture is built on a compact U-shaped encoder–decoder backbone and integrates three key components: first, a Large-Kernel Inception (LKI) module that employs factorized depthwise separable convolutions and dilation to emulate very large receptive fields, enabling efficient long-range context modeling; second, a Scale-Aware Fusion (SAF) unit that learns adaptive weights to fuse encoder and decoder features, dynamically balancing coarse semantic context and fine structural detail; and third, a Boundary Refinement Head (BRH) that provides explicit contour supervision to sharpen lesion borders and reduce boundary error. Squeeze-and-Excitation (SE) attention is embedded within LKI and decoder stages to recalibrate channel responses and emphasize modality-relevant cues, such as DWI-dominant acute core and FLAIR-dominant subacute changes. On the ISLES 2022 multi-center benchmark, Tri-UNetX-2D improves Dice Similarity Coefficient from 0.78 to 0.86, reduces the 95th-percentile Hausdorff distance from 12.4 mm to 8.3 mm, and increases the lesion-wise F1-score from 0.71 to 0.81 compared with a plain 2D U-Net trained under identical conditions. These results demonstrate that the proposed framework achieves competitive performance with substantially lower complexity than typical 3D or ensemble-based models, highlighting its potential for scalable, clinically deployable stroke lesion segmentation.
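
The large-kernel idea behind the LKI module (factorized depthwise separable convolutions with dilation emulating a large receptive field) might look like the following PyTorch sketch; the specific kernel sizes and dilation rate are assumptions.

```python
# Sketch of a large-kernel block built from factorized depthwise separable
# convolutions with dilation, emulating a large receptive field at low cost
# (in the spirit of the LKI module; exact kernel sizes are assumptions).
import torch
import torch.nn as nn

class LargeKernelDWBlock(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        # 5x5 depthwise + 7x7 dilated depthwise (dilation 3) approximates a
        # very large kernel; the 1x1 pointwise conv mixes channels.
        self.dw_local = nn.Conv2d(ch, ch, 5, padding=2, groups=ch)
        self.dw_dilated = nn.Conv2d(ch, ch, 7, padding=9, dilation=3, groups=ch)
        self.pw = nn.Conv2d(ch, ch, 1)

    def forward(self, x):
        return x + self.pw(self.dw_dilated(self.dw_local(x)))   # residual connection

if __name__ == "__main__":
    blk = LargeKernelDWBlock(32)
    print(blk(torch.randn(1, 32, 64, 64)).shape)                # torch.Size([1, 32, 64, 64])
```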

27 pages, 20805 KB  
Article
A Lightweight Radar–Camera Fusion Deep Learning Model for Human Activity Recognition
by Minkyung Jeon and Sungmin Woo
Sensors 2026, 26(3), 894; https://doi.org/10.3390/s26030894 - 29 Jan 2026
Viewed by 127
Abstract
Human activity recognition in privacy-sensitive indoor environments requires sensing modalities that remain robust under illumination variation and background clutter while preserving user anonymity. To this end, this study proposes a lightweight radar–camera fusion deep learning model that integrates motion signatures from FMCW radar with coarse spatial cues from ultra-low-resolution camera frames. The radar stream is processed as a Range–Doppler–Time cube, where each frame is flattened and sequentially encoded using a Transformer-based temporal model to capture fine-grained micro-Doppler patterns. The visual stream employs a privacy-preserving 4×5-pixel camera input, from which a temporal sequence of difference frames is extracted and modeled with a dedicated camera Transformer encoder. The two modality-specific feature vectors, each representing the temporal dynamics of motion, are concatenated and passed through a lightweight fully connected classifier to predict human activity categories. A multimodal dataset of synchronized radar cubes and ultra-low-resolution camera sequences across 15 activity classes was constructed for evaluation. Experimental results show that the proposed fusion model achieves 98.74% classification accuracy, significantly outperforming single-modality baselines (single-radar and single-camera). Despite this performance, the entire model requires only 11 million floating-point operations (11 MFLOPs), making it highly efficient for deployment on embedded or edge devices.
(This article belongs to the Special Issue AI-Based Computer Vision Sensors & Systems—2nd Edition)
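
The late-fusion design described above (per-modality Transformer encoders whose temporal feature vectors are concatenated before a lightweight classifier) can be sketched in PyTorch as follows; the range–Doppler dimensions, sequence length, and layer sizes are assumptions.

```python
# Sketch of the late-fusion idea described above: each modality's frame
# sequence is encoded by its own Transformer, the two temporal feature
# vectors are concatenated, and a small classifier predicts the activity.
# Input shapes and layer sizes are assumptions.
import torch
import torch.nn as nn

class ModalityEncoder(nn.Module):
    def __init__(self, frame_dim: int, d_model: int = 64):
        super().__init__()
        self.embed = nn.Linear(frame_dim, d_model)               # flattened frame -> token
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):                                        # x: (B, T, frame_dim)
        return self.encoder(self.embed(x)).mean(dim=1)           # temporal average pooling

class RadarCameraFusion(nn.Module):
    def __init__(self, n_classes: int = 15):
        super().__init__()
        self.radar_enc = ModalityEncoder(frame_dim=32 * 16)      # e.g. 32 range x 16 Doppler bins
        self.cam_enc = ModalityEncoder(frame_dim=4 * 5)          # 4x5-pixel difference frames
        self.head = nn.Linear(2 * 64, n_classes)

    def forward(self, radar_cube, cam_frames):
        r = self.radar_enc(radar_cube.flatten(2))                # (B, T, 512) -> (B, 64)
        c = self.cam_enc(cam_frames.flatten(2))                  # (B, T, 20) -> (B, 64)
        return self.head(torch.cat([r, c], dim=-1))              # concatenated fusion

if __name__ == "__main__":
    model = RadarCameraFusion()
    radar = torch.randn(2, 30, 32, 16)                           # (B, T, range, Doppler)
    cam = torch.randn(2, 30, 4, 5)                               # (B, T, H, W)
    print(model(radar, cam).shape)                               # torch.Size([2, 15])
```
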
28 pages, 1521 KB  
Article
Image–Text Sentiment Analysis Based on Dual-Path Interaction Network with Multi-Level Consistency Learning
by Zhi Ji, Chunlei Wu, Qinfu Xu and Yixiang Wu
Electronics 2026, 15(3), 581; https://doi.org/10.3390/electronics15030581 - 29 Jan 2026
Viewed by 80
Abstract
With the continuous evolution of social media, users are increasingly inclined to express their personal emotions on digital platforms by integrating information presented in multiple modalities. Within this context, research on image–text sentiment analysis has garnered significant attention. Prior research efforts have made notable progress by leveraging shared emotional concepts across visual and textual modalities. However, existing cross-modal sentiment analysis methods face two key challenges: (1) previous approaches often focus excessively on fusion, resulting in learned features that may not achieve emotional alignment, and (2) traditional fusion strategies are not optimized for sentiment tasks, leading to insufficient robustness in final sentiment discrimination. To address these issues, this paper proposes a Dual-path Interaction Network with Multi-level Consistency Learning (DINMCL). It employs a multi-level feature representation module to decouple the global and local features of both text and image. These decoupled features are then fed into the Global Congruity Learning (GCL) and Local Crossing-Congruity Learning (LCL) modules, respectively. GCL models global semantic associations using a Crossing Prompter, while LCL captures local consistency in fine-grained emotional cues across modalities through cross-modal attention mechanisms and adaptive prompt injection. Finally, a CLIP-based adaptive fusion layer integrates the multi-modal representations in a sentiment-oriented manner. Experiments on the MVSA_Single, MVSA_Multiple, and TumEmo datasets against baseline models such as CTMWA and CLMLF demonstrate that DINMCL significantly outperforms mainstream models in sentiment classification accuracy and F1-score and exhibits strong robustness when handling samples containing highly noisy symbols.
(This article belongs to the Special Issue AI-Driven Image Processing: Theory, Methods, and Applications)

24 pages, 12770 KB  
Article
Multiscale RGB-Guided Fusion for Hyperspectral Image Super-Resolution
by Matteo Kolyszko, Marco Buzzelli, Simone Bianco and Raimondo Schettini
J. Imaging 2026, 12(2), 61; https://doi.org/10.3390/jimaging12020061 - 28 Jan 2026
Viewed by 167
Abstract
Hyperspectral imaging (HSI) enables fine spectral analysis but is often limited by low spatial resolution due to sensor constraints. To address this, we propose CGNet, a color-guided hyperspectral super-resolution network that leverages complementary information from low-resolution hyperspectral inputs and high-resolution RGB images. CGNet adopts a dual-encoder design: the RGB encoder extracts hierarchical spatial features, while the HSI encoder progressively upsamples spectral features. A multi-scale fusion decoder then combines both modalities in a coarse-to-fine manner to reconstruct the high-resolution HSI. Training is driven by a hybrid loss that balances L1 and Spectral Angle Mapper (SAM), which ablation studies confirm as the most effective formulation. Experiments on two benchmarks, ARAD1K and StereoMSI, at ×4 and ×6 upscaling factors demonstrate that CGNet consistently outperforms state-of-the-art baselines. CGNet achieves higher PSNR and SSIM, lower SAM, and reduced ΔE00, confirming its ability to recover sharp spatial structures while preserving spectral fidelity.
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
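
The hybrid L1 + Spectral Angle Mapper (SAM) training loss mentioned above can be written as a short PyTorch sketch; the weighting factor between the two terms is an assumption.

```python
# Sketch of a hybrid L1 + Spectral Angle Mapper (SAM) reconstruction loss
# such as the one described above; the weighting factor is an assumption.
import torch
import torch.nn.functional as F

def sam_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Mean spectral angle (radians) between predicted and reference spectra.
    pred, target: (B, C, H, W) hyperspectral cubes with C spectral bands."""
    dot = (pred * target).sum(dim=1)
    denom = pred.norm(dim=1) * target.norm(dim=1) + eps
    angle = torch.acos((dot / denom).clamp(-1 + 1e-7, 1 - 1e-7))
    return angle.mean()

def hybrid_loss(pred, target, lam: float = 0.1):
    return F.l1_loss(pred, target) + lam * sam_loss(pred, target)

if __name__ == "__main__":
    pred = torch.rand(2, 31, 64, 64, requires_grad=True)   # 31-band HSI cube (illustrative)
    target = torch.rand(2, 31, 64, 64)
    loss = hybrid_loss(pred, target)
    loss.backward()
    print(float(loss))
```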
