Search Results (1,201)

Search Parameters:
Keywords = spatial modalities

41 pages, 10740 KB  
Article
Dynamic Multi-Relation Learning with Multi-Scale Hypergraph Transformer for Multi-Modal Traffic Forecasting
by Juan Chen and Meiqing Shan
Future Transp. 2026, 6(1), 51; https://doi.org/10.3390/futuretransp6010051 (registering DOI) - 22 Feb 2026
Abstract
Accurate multi-modal traffic demand forecasting is key to optimizing intelligent transportation systems (ITSs). To overcome the shortcomings of existing methods in capturing dynamic high-order correlations between heterogeneous spatial units and in decoupling intra- and inter-mode dependencies at multiple time scales, this paper proposes a Dynamic Multi-Relation Learning with Multi-Scale Hypergraph Transformer method (MST-Hyper Trans). The model integrates three novel modules. Firstly, the Multi-Scale Temporal Hypergraph Convolutional Network (MSTHCN) achieves collaborative decoupling and captures periodic and cross-modal temporal interactions of transportation demand at multiple granularities, such as hour, day, and week, by constructing a multi-scale temporal hypergraph. Secondly, the Dynamic Multi-Relationship Spatial Hypergraph Network (DMRSHN) integrates geographic proximity, passenger flow similarity, and transportation connectivity to construct structural hyperedges, and combines KNN and K-means algorithms to generate dynamic hyperedges, thereby accurately modeling the dynamically evolving high-order spatial correlations between heterogeneous nodes. Finally, the Conditional Meta Attention Gated Fusion Network (CMAGFN), a lightweight meta network, introduces a gating mechanism based on multi-head cross-attention. It dynamically generates node features from the real-time traffic context and adaptively calibrates the fusion weights of multi-source information, enabling scene-aware prediction. Experiments on three real-world datasets (NYC-Taxi, -Bike, and -Subway) demonstrate that MST-Hyper Trans achieves an average reduction of 7.6% in RMSE and 9.2% in MAE across all modes compared to the strongest baseline, while maintaining interpretability of spatiotemporal interactions. The study thus offers both model interpretability and a reliable solution for collaborative multi-modal traffic management.
Full article
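The DMRSHN module above combines KNN and K-means to generate dynamic hyperedges. As a rough illustration of that idea — not the authors' implementation; the function names, toy node features, and the farthest-point initialization are assumptions — each node can contribute one KNN hyperedge and each cluster one K-means hyperedge:

```python
import numpy as np

def knn_hyperedges(X, k=2):
    """One hyperedge per node: the node itself plus its k nearest
    neighbours in feature space (Euclidean distance)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)               # a node is not its own neighbour
    nbrs = np.argsort(d, axis=1)[:, :k]
    return [sorted({i, *map(int, nbrs[i])}) for i in range(len(X))]

def kmeans_hyperedges(X, n_clusters=2, iters=20):
    """One hyperedge per cluster: all nodes assigned to the same centroid."""
    centroids = [X[0]]                        # farthest-point init keeps the sketch deterministic
    for _ in range(n_clusters - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[np.argmax(d)])
    centroids = np.array(centroids)
    for _ in range(iters):
        assign = np.argmin(np.linalg.norm(X[:, None] - centroids[None], axis=-1), axis=1)
        for c in range(n_clusters):
            if np.any(assign == c):
                centroids[c] = X[assign == c].mean(axis=0)
    return [sorted(map(int, np.flatnonzero(assign == c))) for c in range(n_clusters)]

# toy node features: two well-separated groups of spatial units
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
print(knn_hyperedges(X, k=2))      # each node grouped with its 2 nearest units
print(kmeans_hyperedges(X, 2))     # one hyperedge per cluster
```

Because a hyperedge can connect more than two nodes at once, this is what lets a hypergraph express the "high-order" spatial correlations the abstract refers to.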
22 pages, 2918 KB  
Article
MV-RiskNet: Multi-View Attention-Based Deep Learning Model for Regional Epidemic Risk Prediction and Mapping
by Beyzanur Okudan and Abdullah Ammar Karcioglu
Appl. Sci. 2026, 16(4), 2135; https://doi.org/10.3390/app16042135 (registering DOI) - 22 Feb 2026
Abstract
Regional epidemic risk prediction requires holistic modeling of heterogeneous data sources such as demographic structure, health capacity, geographical features, and human mobility. In this study, a unique multi-modal epidemiological dataset integrating demographic, health, geographic, and mobility indicators of Türkiye and its neighboring countries (Greece, Bulgaria, Georgia, Armenia, Iran, and Iraq) was collected. This dataset, created by combining raw data from these countries, provides a comprehensive regional representation that allows for both quantitative classification and spatial mapping of epidemiological risk. To address the class imbalance problem, a Conditional GAN (CGAN), a class-conditional synthetic sample generation approach that enhances the representation of the high-risk category, was used. We propose a multi-view deep learning model named MV-RiskNet, which models the multi-dimensional data structure by processing each view in an independent subnetwork and integrating the representations with an attention-based fusion mechanism for regional epidemic risk prediction. The model was compared experimentally against Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Autoencoder classifier, and Graph Convolutional Network (GCN) baselines. The proposed MV-RiskNet with CGAN achieved the best results, with 97.22% accuracy and a 97.40% F1-score. The generated risk maps reveal spatially consistent regional clustering patterns, and attention analyses show that demographic and geographic features are the dominant determinants, with mobility playing a complementary role, especially in high-risk regions. Full article
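The abstract does not specify MV-RiskNet's attention-based fusion mechanism; a minimal sketch of attention-weighted fusion over view embeddings conveys the general idea (the scoring vector, dimensions, and view names are hypothetical, not the paper's architecture):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def attention_fuse(views, w):
    """Score each view embedding against a shared (learned) vector w,
    softmax the scores into fusion weights, and return the weighted sum."""
    scores = np.array([v @ w for v in views])
    alpha = softmax(scores)                   # one weight per view, sums to 1
    fused = sum(a * v for a, v in zip(alpha, views))
    return fused, alpha

rng = np.random.default_rng(0)
# four hypothetical view embeddings: demographic, health, geographic, mobility
views = [rng.standard_normal(8) for _ in range(4)]
w = rng.standard_normal(8)                    # stands in for a trained attention vector
fused, alpha = attention_fuse(views, w)
print(np.round(alpha, 3))                     # fusion weights over the four views
```

Inspecting `alpha` per region is what makes this style of fusion interpretable — it is the mechanism behind the abstract's claim that demographic and geographic views dominate while mobility is complementary.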
16 pages, 2796 KB  
Article
MiMics-Net: A Multimodal Interaction Network for Blastocyst Component Segmentation
by Adnan Haider, Muhammad Arsalan and Kyungeun Cho
Diagnostics 2026, 16(4), 631; https://doi.org/10.3390/diagnostics16040631 (registering DOI) - 21 Feb 2026
Abstract
Objectives: Global infertility rates are rapidly increasing. Assisted reproductive technologies combined with artificial intelligence are the next hope for overcoming infertility. In vitro fertilization (IVF) is gaining popularity owing to its increasing success rates. The success rate of IVF essentially depends on the assessment and inspection of blastocysts. Blastocysts can be segmented into several important compartments, and advanced and precise assessment of these compartments is strongly associated with successful pregnancies. However, currently, embryologists must manually analyze blastocysts, which is a time-consuming, subjective, and error-prone process. Several AI-based techniques, including segmentation, have been recently proposed to fill this gap. However, most existing methods rely only on raw grayscale intensity and do not perform well under challenging blastocyst image conditions, such as low contrast, similarity in textures, shape variability, and class imbalance. Methods: To overcome this limitation, we developed a novel and lightweight architecture, the microscopic multimodal interaction segmentation network (MiMics-Net), to accurately segment blastocyst components. MiMics-Net employs a multimodal blastocyst stem to decompose and process each frame into three modalities (photometric intensity, local textures, and directional orientation), followed by feature fusion to enhance segmentation performance. Moreover, MiMic dual-path grouped blocks have been designed, in which parallel-grouped convolutional paths are fused through point-wise convolutional layers to increase diverse learning. A lightweight refinement decoder is employed to refine and restore the spatial features while maintaining computational efficiency. Finally, semantic skip pathways are induced to transfer low- and mid-level spatial features after passing through the grouped and point-wise convolutional layers. 
Results/Conclusions: MiMics-Net was evaluated using a publicly available human blastocyst dataset and achieved a Jaccard index score of 87.9% while requiring only 0.65 million trainable parameters. Full article
24 pages, 3947 KB  
Article
MDF-iTransformer: Multi Data Fusion-Based iTransformer for Load Prediction of Zero-Carbon Emission Integrated Energy System in Urban Park
by Yang Wei, Zhengwei Chang, Feng Yang, Han Zhang, Jie Zhang, Yumin Chen and Maomao Yan
Algorithms 2026, 19(2), 164; https://doi.org/10.3390/a19020164 (registering DOI) - 21 Feb 2026
Abstract
To predict the output power of integrated energy systems (IES) under zero-carbon conditions, this research presents a Multi Data Fusion-based iTransformer prediction network (MDF-iTransformer). The network uses Multivariate Singular Spectrum Analysis (MSSA) to identify nonlinear relationships among variables and extract dynamic features from multi-modal data. It integrates an embedding block and multivariate attention module into the iTransformer network to capture complex patterns and long-term temporal dependencies in multi-dimensional data, thereby extracting dynamic features across different time scales and spatial dimensions. Subsequently, to address the issue of imbalanced datasets, the improved K-means-SMOTE (KS) algorithm is adopted to augment the number of small-class samples, effectively reducing model bias. Experimental results indicate that the proposed MDF-iTransformer achieves a root-mean-square error (RMSE) of 7.2 kW, mean absolute error (MAE) of 5.6 kW, mean absolute percentage error (MAPE) of 2.7%, and an R-squared value (R2) of 0.92 for a 1 h prediction horizon. It still maintains an RMSE of 14.4 kW, MAE of 11.9 kW, MAPE of 3.68%, and R2 of 0.74 at the 10 h horizon, with cross-season load forecasting errors consistently below 4%. Compared with other algorithms, MDF-iTransformer demonstrates higher accuracy and stronger robustness, playing a crucial role in the optimal operation of integrated energy systems. Full article
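The error metrics quoted above (RMSE, MAE, MAPE, R²) follow their standard definitions; a small self-check with hypothetical hourly load values (illustrative numbers, not the paper's data):

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """RMSE, MAE, MAPE (%), and R^2, as reported for the load forecasts."""
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = 100.0 * np.mean(np.abs(err / y_true))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mae, mape, r2

# hypothetical hourly loads in kW
y_true = np.array([200.0, 210.0, 190.0, 205.0])
y_pred = np.array([198.0, 213.0, 192.0, 203.0])
rmse, mae, mape, r2 = forecast_metrics(y_true, y_pred)
print(f"RMSE={rmse:.2f} kW  MAE={mae:.2f} kW  MAPE={mape:.2f}%  R2={r2:.3f}")
```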
20 pages, 1514 KB  
Article
Decoupled Bidirectional Spatio-Temporal Fusion Network for Hybrid EEG-fNIRS Cognitive Task Classification
by Zirui Wang, Guanghao Huang, Zhuochao Chen, Xiaorui Liu, Yinhua Liu and Keum-Shik Hong
Brain Sci. 2026, 16(2), 241; https://doi.org/10.3390/brainsci16020241 (registering DOI) - 21 Feb 2026
Abstract
Background/Objectives: Multimodal neuroimaging, particularly the integration of electroencephalography (EEG) and functional near-infrared spectroscopy (fNIRS), has emerged as a key methodology for investigating brain function and classifying neural activity. However, the efficient fusion of these two signals remains a formidable challenge due to their significant spatio-temporal heterogeneity. This paper presents BiSTF-Net, which integrates decoupled and bi-directional spatio-temporal fusion mechanisms to enhance the performance of cognitive task recognition. Methods: In BiSTF-Net, the spatial features of EEG and fNIRS are mutually guided and enhanced through an efficient bi-directional cross-modal guidance (Bi-CMG) module. Then, the temporal latencies of fNIRS signals are aligned in a data-driven manner using adaptive temporal alignment (ATA). Subsequently, the aligned features are deeply fused into a modality-invariant, discriminative representation via a symmetric cross-attention fusion (SCAF) module. Results: Evaluated on the mental arithmetic (MA), motor imagery (MI), and word generation (WG) tasks, BiSTF-Net achieves average accuracies of 83.33%, 82.09%, and 84.99%, respectively. Conclusions: BiSTF-Net exhibits superior performance compared to existing methods, offers a robust and interpretable solution for multimodal EEG-fNIRS cognitive task classification, and provides a methodological foundation for future extensions to other multimodal data and broader real-world clinical applications. Full article
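The SCAF module's internals are not given in the abstract; a generic single-head symmetric cross-attention sketch in NumPy illustrates the underlying operation (token counts and feature dimension are toy assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def cross_attention(queries, context):
    """Single-head cross-attention: tokens of one modality (queries)
    attend over tokens of the other (context acts as keys and values)."""
    d = queries.shape[-1]
    scores = queries @ context.T / np.sqrt(d)   # scaled dot-product scores
    return softmax(scores) @ context

rng = np.random.default_rng(0)
eeg = rng.standard_normal((5, 4))        # 5 EEG tokens, feature dim 4 (toy shapes)
fnirs = rng.standard_normal((3, 4))      # 3 fNIRS tokens
eeg_enriched = cross_attention(eeg, fnirs)     # EEG guided by fNIRS context
fnirs_enriched = cross_attention(fnirs, eeg)   # and symmetrically the reverse
print(eeg_enriched.shape, fnirs_enriched.shape)
```

Running the operation in both directions, as above, is what makes such a fusion "symmetric": each modality is enriched with context from the other.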
25 pages, 101353 KB  
Article
A Metaheuristic Optimization Algorithm for Task Clustering in Collaborative Multi-Cluster Systems
by Meixuan Li, Yongping Hao, Hui Zhang and Jiulong Xu
Sensors 2026, 26(4), 1364; https://doi.org/10.3390/s26041364 - 20 Feb 2026
Abstract
To address the task-grouping problem for air–ground integrated Unmanned Aerial Vehicle (UAV) swarm missions in three-dimensional (3D) environments, this study proposes a data-preprocessing and hybrid initialization clustering method based on 3D spatial features. A dual-modal prototype meta-heuristic optimization model, Dual-Prototype Metaheuristic K-Means (DPM-Kmeans), is constructed accordingly. First, to overcome spatial information loss in high-dimensional task allocation, a 3D spatial task data preprocessing technique and a hybrid initialization strategy based on the golden spiral distribution are designed. This ensures the diversity and environmental adaptability of the initial solutions. Second, a dual-modal prototype optimization framework incorporating row prototypes (local refinement) and column prototypes (global combination) was constructed using meta-heuristics and clustering algorithms. The prototype-driven replacement update mechanism simultaneously performs global and local search, balancing the algorithm’s exploration and exploitation capabilities while expanding the solution space. This effectively addresses premature convergence issues in complex search spaces. Simultaneously, a collaborative multi-constraint, dynamically weighted optimization model was constructed, incorporating task requirements and flight distance constraints to ensure that the grouping scheme approximates the global optimum. Simulation results demonstrate that compared to traditional K-means and mainstream meta-heuristic optimization algorithms, DPM-Kmeans achieves an overall improvement of 2–10% in Sum of Squared Errors (SSE), Silhouette Coefficient (SC), and Davies–Bouldin Index (DB) metrics. It exhibits superior convergence speed and solution quality, proving the method’s excellent scalability and robustness in multi-constraint, large-scale 3D scenarios. Full article
(This article belongs to the Section Sensors and Robotics)
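The "golden spiral distribution" initializer above is not detailed in the abstract; one plausible reading is golden-angle (Fibonacci) sampling on a sphere, which spreads initial centroid candidates evenly and deterministically in 3D. The sketch below is that reading, stated as an assumption rather than the paper's exact scheme:

```python
import numpy as np

def golden_spiral_points(n, radius=1.0):
    """Spread n candidate initial centroids over a sphere using the
    golden-angle (Fibonacci) spiral: even, deterministic 3D coverage."""
    i = np.arange(n)
    phi = np.pi * (3.0 - np.sqrt(5.0)) * i        # golden-angle increments
    z = 1.0 - 2.0 * (i + 0.5) / n                 # uniform spacing in z
    r = np.sqrt(1.0 - z ** 2)                     # ring radius at height z
    return radius * np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)

pts = golden_spiral_points(16)
print(np.round(np.linalg.norm(pts, axis=1), 6))   # every point sits on the sphere
```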
19 pages, 358 KB  
Article
Edge-Level Forest Fire Prediction with Selective Communication in Hierarchical Wireless Sensor Networks
by Ahshanul Haque and Hamdy Soliman
Electronics 2026, 15(4), 881; https://doi.org/10.3390/electronics15040881 - 20 Feb 2026
Abstract
Wildfire events are increasing in frequency and severity, creating an urgent need for early, accurate, and energy-efficient forest fire prediction systems that can operate at a large scale. A fundamental challenge in edge-level forest fire prediction lies in jointly achieving high detection accuracy while minimizing wireless transmissions and communication-related energy consumption. This paper proposes a communication-aware hierarchical wireless sensor network (WSN) framework that performs fire versus normal environmental state classification directly at the network edge. Multi-modal physical and constrained virtual sensor readings are fused into short-term temporal supervectors and processed locally using lightweight random forest classifiers deployed on sensor nodes and cluster heads. A temporal 2-of-3 voting mechanism is applied at the edge to suppress transient noise and improve prediction reliability before triggering communication. The proposed design enables selective, event-driven transmission, where only temporally validated abnormal states are forwarded through the hierarchy, thereby decoupling detection accuracy from continuous data reporting. Extensive experiments using real multi-modal environmental sensor data and statistically rigorous 5-fold GroupKFold cross-validation—ensuring strict node-level separation between training and testing—demonstrate the effectiveness of the approach. The proposed framework achieves a node-level accuracy of 98.82 ± 1.75% and a scenario-level detection accuracy of 96.52 ± 0.89%. Compared to periodic reporting and the LEACH protocol, the system reduces wireless transmissions by over 66% and communication-related energy consumption by more than 66% across network sizes ranging from 100 to 1000 nodes. 
The main contributions of this work are summarized as follows: (1) a communication-aware hierarchical Edge-AI framework for early forest fire prediction that performs local inference and temporal validation directly at sensor nodes; (2) a constrained virtual sensing strategy integrated with temporal supervector modeling to enhance spatial coverage while preserving reliability; and (3) a statistically rigorous large-scale evaluation demonstrating joint optimization of prediction accuracy, transmission reduction, and communication energy efficiency across network sizes ranging from 100 to 1000 nodes. These results show that accurate early forest fire prediction can be achieved through edge-level inference and selective communication, substantially extending network lifetime while maintaining statistically reliable detection performance. Full article
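The temporal 2-of-3 voting mechanism described above is simple enough to sketch directly (the function name and class labels are illustrative; the paper's node-level implementation is not shown here):

```python
from collections import deque

def two_of_three_vote(predictions):
    """Raise an alert only when at least 2 of the last 3 per-window
    classifications are 'fire', suppressing single-frame transient noise."""
    window = deque(maxlen=3)          # sliding window over recent predictions
    alerts = []
    for p in predictions:
        window.append(p)
        fire_votes = sum(1 for x in window if x == "fire")
        alerts.append(len(window) == 3 and fire_votes >= 2)
    return alerts

# one spurious 'fire' frame is suppressed; a sustained event triggers alerts
preds = ["normal", "fire", "normal", "normal", "fire", "fire", "fire"]
print(two_of_three_vote(preds))   # → [False, False, False, False, False, True, True]
```

Only the `True` entries would trigger a transmission up the hierarchy, which is how detection accuracy is decoupled from continuous reporting.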
(This article belongs to the Special Issue AI and Machine Learning in Recommender Systems and Customer Behavior)
17 pages, 1690 KB  
Article
Plugged or Unplugged? A Comparative Study of Computational Thinking Development in Early Childhood
by Maria-Emilia Garcia-Marques, Adrián Pérez-Suay and Ismael García-Bayona
Educ. Sci. 2026, 16(2), 333; https://doi.org/10.3390/educsci16020333 - 18 Feb 2026
Abstract
Computational thinking (CT) has increasingly been recognized as a fundamental skill that should be fostered from early childhood. This study investigated the comparative effectiveness of plugged (robot-based) and unplugged (without technology) instructional activities on the development of CT skills in young children. Two natural classroom groups participated, each receiving the same instructional content and assessment, differing only in intervention modality: one utilized the Bee-bot floor robot, while the other engaged in unplugged activities simulating the robot’s movements. Pre- and post-intervention assessments measured CT and spatial reasoning skills to evaluate learning gains. Results demonstrated significant improvements in CT across both groups, with no statistically significant differences in overall gains, suggesting that unplugged activities, when thoughtfully designed, can be as effective as technology-supported ones. These findings have important implications for designing inclusive and resource-sensitive early childhood CT curricula, emphasizing the value of developmentally appropriate and engaging learning experiences beyond technological availability. Full article
(This article belongs to the Special Issue Computational Thinking and Programming in Early Childhood Education)
29 pages, 6009 KB  
Article
Mamba-Based Infrared and Visible Images Fusion Method
by Jinsong He, Jianghua Cheng, Tong Liu, Bang Cheng, Xiaoyi Pan and Yahui Cai
Remote Sens. 2026, 18(4), 636; https://doi.org/10.3390/rs18040636 - 18 Feb 2026
Abstract
Visible-infrared image fusion is crucial for applications like autonomous driving and nighttime surveillance, yet it remains challenging due to the inherent limitations of existing deep learning models. Convolutional Neural Networks (CNNs) are constrained by their local receptive fields, while Transformers suffer from quadratic computational complexity. To address these issues, this paper investigates the application of the Mamba model—a novel State Space Model (SSM) with linear-complexity global modeling and selective scanning capabilities—to the task of visible-infrared image fusion. Building upon Mamba, we propose a novel fusion framework featuring two key designs: (1) A Multi-Path Mamba (MPMamba) module that orchestrates parallel Mamba blocks with convolutional streams to extract multi-scale, modality-specific features; and (2) a Dual-path Mamba Attention Fusion (DMAF) module that explicitly decouples and processes shared and complementary features via dual Mamba paths, followed by dynamic calibration with a Convolutional Block Attention Module (CBAM). Extensive experiments on the MSRS benchmark demonstrate that our framework achieves state-of-the-art performance, outperforming strong baselines such as U2Fusion and SwinFusion across key metrics including Information Entropy (EN), Spatial Frequency (SF), Mutual Information (MI), and edge-based fusion quality (Qabf). Visual results confirm its ability to produce fused images that saliently preserve thermal targets while retaining rich texture details. Full article
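Two of the fusion metrics quoted above, information entropy (EN) and spatial frequency (SF), have standard closed-form definitions; a minimal NumPy version with toy test images (not from the MSRS benchmark):

```python
import numpy as np

def entropy(img, bins=256):
    """Information entropy (EN) of an 8-bit image, in bits."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]                           # ignore empty bins (0·log 0 := 0)
    return float(-np.sum(p * np.log2(p)))

def spatial_frequency(img):
    """Spatial frequency (SF): sqrt(RF^2 + CF^2), the RMS of
    horizontal and vertical first differences."""
    img = img.astype(np.float64)
    rf = np.mean(np.diff(img, axis=1) ** 2)   # row-direction gradient energy
    cf = np.mean(np.diff(img, axis=0) ** 2)   # column-direction gradient energy
    return float(np.sqrt(rf + cf))

flat = np.full((8, 8), 128)                       # constant image: no information
checker = np.indices((8, 8)).sum(0) % 2 * 255     # maximal-contrast checkerboard
print(entropy(flat), spatial_frequency(flat))
print(entropy(checker), spatial_frequency(checker))
```

Higher EN and SF on a fused image indicate, respectively, more retained information and richer edge/texture detail — which is why they appear among the benchmark metrics.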
26 pages, 3774 KB  
Article
A Multimodal Dual-Stream Cross-Attention Deep Learning Framework for Diabetic Foot Ulcer Classification
by Mehmet Umut Salur
Appl. Sci. 2026, 16(4), 1993; https://doi.org/10.3390/app16041993 - 17 Feb 2026
Abstract
Detecting diabetic foot ulcers (DFUs) early and accurately is essential for improving patients’ quality of life and lowering the risk of amputation. RGB images, commonly used in automated DFU detection, have limitations such as lighting variations, color inconsistencies, and an inability to directly reflect physiological information. Background/Objectives: Although thermal images can capture temperature anomalies associated with inflammation and circulatory disorders, they cannot provide consistent performance due to their low spatial resolution and limited availability in clinical datasets. Furthermore, the scarcity of paired RGB–thermal images makes it difficult to develop effective multimodal deep learning models. Methods: This study proposes a two-stage multimodal deep learning approach to overcome these limitations. In the first stage, an RGB2T-cGAN (RGB to Thermal cGAN) model based on pix2pix was designed to generate synthetic thermal representations from RGB images that resemble clinical patterns, thereby addressing the missing-modality problem. In the second stage, the Multimodal Dual-Stream Multi-Head Cross-Attention (MDS-MHCA) classifier model was developed, which processes DFU RGB images and the generated synthetic thermal images through separate streams, enabling the dynamic modeling of complementary information across modalities. Results: The proposed MDS-MHCA model achieved 99.06% accuracy, 99.09% recall, and a 99.06% F1-score on the test set, demonstrating a clear advantage over models based solely on RGB (91.51% accuracy) or thermal (96.23% accuracy) modalities. Furthermore, patient-based 10-fold GroupKFold cross-validation results demonstrate that the model generalizes well across different patient groups, with an average accuracy of 96.49 ± 1.04% and an AUC of 0.9927 ± 0.0067.
Conclusions: The findings reveal that the proposed approach, through the integration of synthetic thermal information and cross-attention-based multimodal fusion, overcomes the fundamental limitations of single-modality systems and offers a more robust and reliable DFU detection system with potential for integration into clinical decision support systems. Full article
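The patient-based GroupKFold protocol mentioned above can be sketched without any library; the invariant is that no patient contributes samples to both the training and test folds (the group names and round-robin fold assignment here are illustrative):

```python
def group_kfold(groups, n_splits=3):
    """Group-aware K-fold: all samples from one group (here, a patient)
    fall in exactly one test fold, so train/test never share a patient."""
    uniq = sorted(set(groups))
    folds = [uniq[i::n_splits] for i in range(n_splits)]   # round-robin over patients
    for held_out in folds:
        held = set(held_out)
        test = [i for i, g in enumerate(groups) if g in held]
        train = [i for i, g in enumerate(groups) if g not in held]
        yield train, test

# 8 samples from 4 hypothetical patients
groups = ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"]
splits = list(group_kfold(groups, n_splits=2))
for train, test in splits:
    # strict patient-level separation between train and test
    assert not {groups[i] for i in train} & {groups[i] for i in test}
print(splits[0])   # → ([2, 3, 6, 7], [0, 1, 4, 5])
```

This separation is what makes the reported 96.49 ± 1.04% a measure of cross-patient generalization rather than memorization of per-patient appearance.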
24 pages, 11174 KB  
Article
JMSC: Joint Spatial–Temporal Modeling with Semantic Completion for Audio–Visual Learning
by Xinfu Xu, Fan Yang and Zhibin Yu
Sensors 2026, 26(4), 1288; https://doi.org/10.3390/s26041288 - 16 Feb 2026
Abstract
Audio–visual learning seeks to achieve holistic scene understanding by integrating auditory and visual cues. Early research focused on fully fine-tuning pre-trained models, incurring high computational costs. Consequently, recent studies have adopted parameter-efficient tuning methods to adapt large-scale vision models to the audio–visual domain. Despite the competitive performance of existing methods, several challenges persist. Firstly, effectively leveraging the complementary semantics between the audio and visual modalities remains difficult, as these two modalities capture fundamentally different aspects of a video. Secondly, comprehending dynamic video context is challenging because both spatial attributes (such as scale) and temporal characteristics (such as motion) of objects co-evolve over time, making semantic comprehension more complex. To address these challenges, we propose a novel framework, named Joint Spatial–Temporal Modeling with Semantic Completion (JMSC). JMSC introduces cross-modal latent reconstruction, which moves beyond shallow correlation by encouraging the model to reconstruct one modality’s complete semantic summary from a masked version of its counterpart. Furthermore, JMSC learns a unified representation of video spatial attributes and temporal changes by jointly modeling them under audio guidance, enabling accurate localization and consistent tracking in dynamic video scenes. Experimental results demonstrate that JMSC achieves state-of-the-art performance across multiple downstream tasks while maintaining high computational efficiency. Full article
22 pages, 2874 KB  
Article
From Signal to Semantics: The Multimodal Haptic Informatics Index for Triangulating Haptic Intent at the Edge
by Song Xu, Chen Li, Jia-Rong Li and Teng-Wen Chang
Electronics 2026, 15(4), 832; https://doi.org/10.3390/electronics15040832 - 15 Feb 2026
Abstract
Modern interaction with smart devices is hindered by the “Midas Touch” problem, where sensors frequently misinterpret incidental physical movements as intentional commands due to a lack of human context. This research addresses this conflict by introducing the Multimodal Haptic Informatics (MHI) index within a novel Scene–Action–Trigger (SAT) framework. The goal is to contextualize mechanical movements as human intent by integrating physical, spatial, and cognitive data locally at the edge. The methodology employs an “Action-as-primary indexing” mechanism where the Action channel (IMU) serves as a temporal anchor t, triggering high-resolution Scene (computer vision) and Trigger (audio) processing only during critical haptic events. Validated through a complex origami crane task generating 29,408 data frames, the framework utilizes a three-stage informatics derivation process: single-modal scoring, score weighting, and hand state mapping. Results demonstrate that applying an adaptive “Speedometer” logic successfully reclassifies the “Transitional State”. While this state constitutes over half of the behavioral dataset (54.76% on average), it is effectively disambiguated into meaningful intent using a self-trained local Large Language Model (LLM) for semantic verification. Furthermore, the event-driven sampling of 93 keyframes reduces the processing overhead by 99.68% compared to linear annotation. This study contributes a low-latency, privacy-preserving “Protocol of Assent” that maintains user agency by providing intelligent system suggestions based on confirmed haptic intensity. Full article
(This article belongs to the Special Issue New Trends in Human-Computer Interactions for Smart Devices)
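The reported 99.68% processing-overhead reduction follows directly from the abstract's own numbers — 93 event-driven keyframes out of 29,408 recorded frames:

```python
total_frames = 29_408           # data frames generated by the origami crane task
keyframes = 93                  # event-driven keyframes actually processed
reduction = 100 * (1 - keyframes / total_frames)
print(f"{reduction:.2f}%")      # → 99.68%, matching the reported figure
```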
36 pages, 2539 KB  
Review
Sensor Technologies for Water Velocity, Flow, and Wave Motion Measurement in Marine Environments: A Comprehensive Review
by Tiago Matos
J. Mar. Sci. Eng. 2026, 14(4), 365; https://doi.org/10.3390/jmse14040365 - 14 Feb 2026
Abstract
Measuring water motion is essential for oceanography, coastal engineering, and marine environmental monitoring. A wide range of sensing technologies is used to quantify water velocity, wave motion, and flow dynamics, each suited to specific spatial and temporal scales. This paper presents a comprehensive review of modern sensor technologies for marine flow measurement, covering mechanical, electromagnetic, pressure-based, acoustic, optical, MEMS-based, inertial, Lagrangian, and remote-sensing approaches. The operating principles, strengths, and limitations of each technology are examined alongside their suitability for different environments and deployment platforms, including moorings, buoys, vessels, autonomous underwater vehicles, and drifters. Special attention is given to rapidly advancing fields such as MEMS flow sensors, multi-sensor fusion, and hybrid systems that combine inertial, acoustic, and optical data. Applications range from high-resolution turbulence measurements to large-scale current mapping and wave characterization. Remaining challenges include biofouling, performance degradation in energetic shallow waters, uncertainties in indirect velocity estimation, and long-term calibration stability. By synthesizing the state of the art across sensing modalities, this review provides a unified perspective on current technological capabilities and identifies key trends shaping the future of marine flow measurement. Full article
(This article belongs to the Section Ocean Engineering)
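The review's emphasis on hybrid systems that blend inertial and acoustic data can be illustrated with a first-order complementary filter, a common fusion scheme in this setting: the acoustic channel (e.g. an ADCP) anchors the drift-free low-frequency velocity, while the inertial channel contributes high-rate increments. This is a minimal sketch, not taken from the paper; the function name, crossover parameter `alpha`, and synthetic inputs are illustrative assumptions.

```python
import numpy as np

def complementary_fuse(v_acoustic, v_inertial, alpha=0.95):
    """First-order complementary filter for water-velocity estimates.

    v_acoustic : low-rate, drift-free velocity samples (e.g. acoustic Doppler)
    v_inertial : high-rate velocity from integrated accelerations (drifts)
    alpha      : crossover weight; closer to 1 trusts the filter's own
                 propagated state more between acoustic updates
    """
    v_acoustic = np.asarray(v_acoustic, dtype=float)
    v_inertial = np.asarray(v_inertial, dtype=float)
    fused = np.empty_like(v_acoustic)
    fused[0] = v_acoustic[0]
    for k in range(1, len(fused)):
        dv = v_inertial[k] - v_inertial[k - 1]          # high-rate increment
        fused[k] = alpha * (fused[k - 1] + dv) + (1 - alpha) * v_acoustic[k]
    return fused
```

Because the inertial signal enters only through its increments, any constant bias in the integrated inertial velocity cancels, which is the main reason this structure tolerates accelerometer drift.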

17 pages, 6831 KB  
Technical Note
Transformer-Based Multi-Modal Fusion for Martian Impact Crater Classification
by Chen Yang, Yinghong Wu, Haishi Zhao and Minghao Zhao
Remote Sens. 2026, 18(4), 599; https://doi.org/10.3390/rs18040599 - 14 Feb 2026
Abstract
Impact craters, as key geomorphic features on Mars, provide important insights into surface processes and geological evolution. However, automatic classification of crater morphologies remains challenging due to substantial variations in size, degradation degree, and data quality across different types of Martian craters. This study proposes a multi-modal framework for Martian crater classification by integrating infrared imagery, an optical map, and digital elevation model (DEM) data. Specifically, daytime infrared imagery from THEMIS, a color map from the Tianwen-1 MoRIC instrument, and topographic data derived from combined MOLA–HRSC observations are used to capture complementary thermal, morphological, and elevation-related characteristics. A transformer-based feature extraction and cross-modal fusion strategy is adopted, where infrared imagery guides the interaction among multi-source features. Experiments on a carefully constructed dataset covering four crater categories, i.e., standard craters, layered ejecta craters, degraded craters, and secondary craters, demonstrate that the proposed approach achieves an overall precision of 0.848 and a recall of 0.851, outperforming single-modality baselines. Layered ejecta craters exhibit the highest classification performance, benefiting from their distinctive ejecta morphologies, whereas secondary craters remain more difficult to classify due to their small spatial scales. The results highlight the value of multi-modal data for Martian crater morphology classification. Full article
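The fusion strategy described here, where infrared imagery guides the interaction among modalities, is characteristic of cross-attention: infrared tokens act as queries while another modality (optical or DEM features) supplies keys and values, so the fused output stays aligned with the infrared token grid. The sketch below is a single-head, numpy-only illustration of that mechanism under assumed token shapes, not the paper's implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(q_ir, kv_other):
    """Single-head cross-attention between two modalities.

    q_ir     : (n_ir, d) infrared tokens used as queries
    kv_other : (n_other, d) tokens from another modality (optical or DEM)
               used as both keys and values
    Returns the fused tokens (n_ir, d) and the attention weights.
    """
    d_k = q_ir.shape[-1]
    scores = q_ir @ kv_other.T / np.sqrt(d_k)     # (n_ir, n_other)
    weights = softmax(scores, axis=-1)            # rows sum to 1
    return weights @ kv_other, weights
```

In a full transformer block, learned query/key/value projections, multiple heads, and residual connections would wrap this core operation; the point here is only how one modality's features can steer the aggregation of another's.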

19 pages, 9943 KB  
Article
Identification of Natural Fractures in Shale Reservoirs Using a Multimodal Neural Network: A Case Study of the Chang 7 Shale Formation in the Ordos Basin
by Yawen He, Dalin Zhou, Yaxin Dun, Yulin Kou, Jing Ding, Wenzhao Sun, Shanshan Yang, Xin Zhang and Wei Dang
Processes 2026, 14(4), 657; https://doi.org/10.3390/pr14040657 - 14 Feb 2026
Abstract
Natural fractures are critical controls on shale oil storage and migration in the Upper Triassic Chang 7 Member of the Ordos Basin. However, conventional identification techniques—such as mud-invasion correction, R/S rescaled range analysis, and radioactive element analysis—are time-consuming, computationally intensive, and highly dependent on specialized logging data, limiting their large-scale application. To overcome these challenges, this study develops a multi-modal deep neural network that integrates conventional well logs with borehole imaging data. A coupled convolutional neural network (CNN) and deep neural network (DNN) architecture was constructed to predict fracture occurrence, dip angle, and aperture. The model achieves dip-angle prediction accuracies of 98.82% for both training and testing datasets, while aperture prediction accuracies reach 95.97% and 95.91%, respectively. Predicted dip angles are concentrated between 65° and 80°, deviating by less than 0.48° from measured values, whereas apertures fall mainly within 0.5–4.5 cm, with deviations below 0.21 cm except in extreme cases. The CNN branch effectively extracts spatial features from imaging logs, while the DNN branch captures nonlinear relationships in conventional logs. The integrated framework substantially improves fracture characterization accuracy and efficiency. This study provides a scalable and cost-effective approach for rapid fracture identification based on conventional logging data, reducing reliance on specialized imaging logs and supporting integrated geological and engineering evaluations in shale oil reservoirs. Full article
(This article belongs to the Section Energy Systems)
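The coupled two-branch design described above, where a CNN extracts spatial features from imaging logs while a DNN handles conventional-log curves before a shared head, can be sketched as follows. All sizes (an 8×8 imaging-log patch, five conventional-log curves, one convolution kernel, a four-unit hidden layer) are hypothetical stand-ins, and random weights replace the trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def cnn_branch(image, kernel):
    """Valid 2-D convolution + ReLU + global average pooling:
    a stand-in for the imaging-log feature extractor."""
    kh, kw = kernel.shape
    h, w = image.shape
    fmap = np.array([[np.sum(image[i:i + kh, j:j + kw] * kernel)
                      for j in range(w - kw + 1)]
                     for i in range(h - kh + 1)])
    return np.array([relu(fmap).mean()])          # one pooled feature

def dnn_branch(logs, W, b):
    """One hidden layer over conventional-log inputs (GR, RT, DEN, ...)."""
    return relu(logs @ W + b)

# hypothetical inputs and randomly initialized weights
image = rng.standard_normal((8, 8))               # imaging-log patch
logs = rng.standard_normal(5)                     # five log curves at depth
kernel = rng.standard_normal((3, 3))
W, b = rng.standard_normal((5, 4)), np.zeros(4)

# concatenate branch outputs, then apply a shared regression head
fused = np.concatenate([cnn_branch(image, kernel), dnn_branch(logs, W, b)])
dip_angle = fused @ rng.standard_normal(fused.size)
```

The concatenation step is where the two modalities meet: the CNN branch contributes texture-derived evidence of fractures, the DNN branch contributes the nonlinear log-response relationships, and the head regresses targets such as dip angle or aperture from both jointly.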
