133 Results Found

  • Article
  • Open Access
10 Citations
4,128 Views
24 Pages

26 June 2024

In this paper, we leverage multimodal data to classify minerals using a multi-stream neural network. In a previous study on the Tinto dataset, which consisted of a 3D hyperspectral point cloud from the open-pit mine Corta Atalaya in Spain, we success...

  • Article
  • Open Access
9 Citations
3,813 Views
13 Pages

3 March 2023

Because of societal changes, human activity recognition, part of home care systems, has become increasingly important. Camera-based recognition is mainstream but has privacy concerns and is less accurate under dim lighting. In contrast, radar sensors...

  • Article
  • Open Access
4 Citations
1,857 Views
28 Pages

31 July 2025

Recently, deep learning algorithms have been increasingly applied in construction for activity recognition, particularly for excavators, to automate processes and enhance safety and productivity through continuous monitoring of earthmoving activities...

  • Article
  • Open Access
940 Views
21 Pages

18 September 2025

In large-scale recommendation scenarios, achieving high-precision ranking requires simultaneously modeling user interest dynamics and content propagation potential. In this work, we propose a unified framework that integrates a temporal interest mode...

  • Article
  • Open Access
498 Views
23 Pages

MSF-Net: A Data-Driven Multimodal Transformer for Intelligent Behavior Recognition and Financial Risk Reasoning in Virtual Live-Streaming

  • Yang Song,
  • Liman Zhang,
  • Ruoyun Zhang,
  • Haoyuan Zhan,
  • Mingyuan Dai,
  • Xinyi Hu,
  • Ranran Chen and
  • Manzhou Li

4 December 2025

With the rapid advancement of virtual human technology and live-streaming e-commerce, virtual anchors have increasingly become key interactive entities in the digital economy. However, emerging issues such as fake reviews, abnormal tipping, and illeg...

  • Systematic Review
  • Open Access
124 Citations
15,746 Views
23 Pages

Detecting Emotions through Electrodermal Activity in Learning Contexts: A Systematic Review

  • Anne Horvers,
  • Natasha Tombeng,
  • Tibor Bosse,
  • Ard W. Lazonder and
  • Inge Molenaar

26 November 2021

There is a strong increase in the use of devices that measure physiological arousal through electrodermal activity (EDA). Although there is a long tradition of studying emotions during learning, researchers have only recently started to use EDA to me...

  • Article
  • Open Access
16 Citations
6,820 Views
23 Pages

19 October 2023

This paper proposes a novel multimodal generative adversarial network AVSR (multimodal AVSR GAN) architecture, to improve both the energy efficiency and the AVSR classification accuracy of artificial intelligence Internet of things (IoT) applications...

  • Article
  • Open Access
34 Citations
6,981 Views
12 Pages

20 May 2020

Existing public domain multi-modal datasets for human action recognition only include actions of interest that have already been segmented from action streams. These datasets cannot be used to study a more realistic action recognition scenario where...

  • Article
  • Open Access
399 Views
24 Pages

18 November 2025

Temporal variability in online streams arises in information systems where heterogeneous modalities exhibit varying latencies and delay distributions. Efficient synchronization strategies help to establish a reliable flow and ensure a correct deliver...

  • Article
  • Open Access
4 Citations
2,437 Views
24 Pages

25 August 2023

Detecting anomalies in data streams from smart communication environments is a challenging problem that can benefit from novel learning techniques. The Attention Mechanism is a very promising architecture for addressing this problem. It allows the mo...

  • Systematic Review
  • Open Access
Logistics 2026, 10(2), 29; https://doi.org/10.3390/logistics10020029
(registering DOI)

23 January 2026

Background: Artificial intelligence (AI) in urban and multimodal transport has demonstrated strong potential; however, real-world deployment remains constrained by limited governance-ready design, fragmented data ecosystems, and single-objective opti...

  • Article
  • Open Access
15 Citations
5,187 Views
18 Pages

EmmDocClassifier: Efficient Multimodal Document Image Classifier for Scarce Data

  • Shrinidhi Kanchi,
  • Alain Pagani,
  • Hamam Mokayed,
  • Marcus Liwicki,
  • Didier Stricker and
  • Muhammad Zeshan Afzal

29 January 2022

Document classification is one of the most critical steps in the document analysis pipeline. There are two types of approaches for document classification, known as image-based and multimodal approaches. Image-based document classification approaches...

  • Article
  • Open Access
50 Citations
7,941 Views
24 Pages

Context-Aware Fusion of RGB and Thermal Imagery for Traffic Monitoring

  • Thiemo Alldieck,
  • Chris H. Bahnsen and
  • Thomas B. Moeslund

18 November 2016

In order to enable a robust 24-h monitoring of traffic under changing environmental conditions, it is beneficial to observe the traffic scene using several sensors, preferably from different modalities. To fully benefit from multi-modal sensor output...

  • Article
  • Open Access
6 Citations
4,067 Views
17 Pages

4 July 2024

Optical and Synthetic Aperture Radar (SAR) imagery offers a wealth of complementary information on a given target, attributable to the distinct imaging modalities of each component image type. Thus, multimodal remote sensing data have been widely use...

  • Article
  • Open Access
2,022 Views
15 Pages

Interactive Learning of a Dual Convolution Neural Network for Multi-Modal Action Recognition

  • Qingxia Li,
  • Dali Gao,
  • Qieshi Zhang,
  • Wenhong Wei and
  • Ziliang Ren

22 October 2022

RGB and depth modalities contain more abundant and interactive information, and convolutional neural networks (ConvNets) based on multi-modal data have achieved successful progress in action recognition. Due to the limitation of a single stream, it i...

  • Article
  • Open Access
63 Citations
8,427 Views
19 Pages

SpeakingFaces: A Large-Scale Multimodal Dataset of Voice Commands with Visual and Thermal Video Streams

  • Madina Abdrakhmanova,
  • Askat Kuzdeuov,
  • Sheikh Jarju,
  • Yerbolat Khassanov,
  • Michael Lewis and
  • Huseyin Atakan Varol

16 May 2021

We present SpeakingFaces as a publicly-available large-scale multimodal dataset developed to support machine learning research in contexts that utilize a combination of thermal, visual, and audio data streams; examples include human–computer interact...

  • Article
  • Open Access
103 Citations
13,905 Views
21 Pages

20 August 2018

Autonomous robots that assist humans in day to day living tasks are becoming increasingly popular. Autonomous mobile robots operate by sensing and perceiving their surrounding environment to make accurate driving decisions. A combination of several d...

  • Article
  • Open Access
31 Citations
6,393 Views
17 Pages

On Robustness of Multi-Modal Fusion—Robotics Perspective

  • Michal Bednarek,
  • Piotr Kicki and
  • Krzysztof Walas

The efficient multi-modal fusion of data streams from different sensors is a crucial ability that a robotic perception system should exhibit to ensure robustness against disturbances. However, as the volume and dimensionality of sensory-feedback incr...

  • Article
  • Open Access
2 Citations
4,529 Views
23 Pages

22 July 2024

Multimodal sentiment analysis, a significant challenge in artificial intelligence, necessitates the integration of various data modalities for accurate human emotion interpretation. This study introduces the Advanced Multimodal Sentiment Analysis wit...

  • Article
  • Open Access
22 Citations
5,250 Views
25 Pages

3 September 2020

As is known, cerebral stroke has become one of the main diseases endangering people’s health; ischaemic strokes account for approximately 85% of cerebral strokes. According to research, early prediction and prevention can effectively reduce th...

  • Article
  • Open Access
48 Citations
5,589 Views
15 Pages

Multimodal Ground-Based Cloud Classification Using Joint Fusion Convolutional Neural Network

  • Shuang Liu,
  • Mei Li,
  • Zhong Zhang,
  • Baihua Xiao and
  • Xiaozhong Cao

25 May 2018

The accurate ground-based cloud classification is a challenging task and still under development. The most current methods are limited to only taking the cloud visual features into consideration, which is not robust to the environmental factors. In t...

  • Article
  • Open Access
1,402 Views
27 Pages

12 September 2025

In the era of big-data-driven multi-platform and multimodal health information dissemination, the rapid spread of false and misleading content poses a critical threat to public health awareness and decision making. To address this issue, a dual-strea...

  • Article
  • Open Access
9 Citations
5,090 Views
14 Pages

Alternative Vegetation States in Tropical Forests and Savannas: The Search for Consistent Signals in Diverse Remote Sensing Data

  • Sanath Sathyachandran Kumar,
  • Niall P. Hanan,
  • Lara Prihodko,
  • Julius Anchang,
  • C. Wade Ross,
  • Wenjie Ji and
  • Brianna M Lind

4 April 2019

Globally, the spatial distribution of vegetation is governed primarily by climatological factors (rainfall and temperature, seasonality, and inter-annual variability). The local distribution of vegetation, however, depends on local edaphic conditions...

  • Article
  • Open Access
17 Citations
6,169 Views
13 Pages

Recent research demonstrates that the fusion of multimodal images can improve the performance of pedestrian detectors under low-illumination environments. However, existing multimodal pedestrian detectors cannot adapt to the variability of environmen...

  • Article
  • Open Access
88 Citations
18,264 Views
12 Pages

10 February 2023

Recommendation systems, the best way to deal with information overload, are widely utilized to provide users with personalized content and services with high efficiency. Many recommendation algorithms have been researched and deployed extensively in...

  • Article
  • Open Access
63 Citations
5,091 Views
20 Pages

16 November 2020

Automated extraction of buildings from earth observation (EO) data has long been a fundamental but challenging research topic. Combining data from different modalities (e.g., high-resolution imagery (HRI) and light detection and ranging (LiDAR) data)...

  • Article
  • Open Access
29 Citations
4,959 Views
19 Pages

Semantic Segmentation of High-Resolution Airborne Images with Dual-Stream DeepLabV3+

  • Ozgun Akcay,
  • Ahmet Cumhur Kinaci,
  • Emin Ozgur Avsar and
  • Umut Aydar

In geospatial applications such as urban planning and land use management, automatic detection and classification of earth objects are essential and primary subjects. When the significant semantic segmentation algorithms are considered, DeepLabV3+ st...

  • Article
  • Open Access
815 Views
15 Pages

A Multimodal Power Sample Feature Migration Method Based on Dual Cross-Modal Information Decoupling

  • Zhenyu Chen,
  • Huaguang Yan,
  • Jianguang Du,
  • Yuhao Zhou,
  • Yi Chen,
  • Yunfeng Yan and
  • Shuai Zhao

10 September 2025

With the rapid development of energy transition and power system informatization, the efficient integration and feature migration of multimodal power data have become critical challenges for intelligent power systems. Existing methods often overlook...

  • Article
  • Open Access
240 Views
39 Pages

Trustworthy AI-IoT for Citizen-Centric Smart Cities: The IMTPS Framework for Intelligent Multimodal Crowd Sensing

  • Wei Li,
  • Ke Li,
  • Zixuan Xu,
  • Mengjie Wu,
  • Yang Wu,
  • Yang Xiong,
  • Shijie Huang,
  • Yijie Yin,
  • Yiping Ma and
  • Haitao Zhang

12 January 2026

The fusion of Artificial Intelligence and the Internet of Things (AI-IoT, also widely referred to as AIoT) offers transformative potential for smart cities, yet presents a critical challenge: how to process heterogeneous data streams from intelligent...

  • Article
  • Open Access
9 Citations
3,784 Views
18 Pages

12 January 2022

While the majority of social scientists still rely on traditional research instruments (e.g., surveys, self-reports, qualitative observations), multimodal sensing is becoming an emerging methodology for capturing human behaviors. Sensing technology h...

  • Article
  • Open Access
12 Citations
7,365 Views
17 Pages

20 October 2016

Biological and technical systems operate in a rich multimodal environment. Due to the diversity of incoming sensory streams a system perceives and the variety of motor capabilities a system exhibits there is no single representation and no singular u...

  • Article
  • Open Access
2 Citations
3,828 Views
24 Pages

Regulating Modality Utilization within Multimodal Fusion Networks

  • Saurav Singh,
  • Eli Saber,
  • Panos P. Markopoulos and
  • Jamison Heard

19 September 2024

Multimodal fusion networks play a pivotal role in leveraging diverse sources of information for enhanced machine learning applications in aerial imagery. However, current approaches often suffer from a bias towards certain modalities, diminishing the...

  • Article
  • Open Access
4 Citations
2,513 Views
24 Pages

26 February 2024

The paper aims to develop an information system for human emotion recognition in streaming data obtained from a PC or smartphone camera, using different methods of modality merging (image, sound and text). The objects of research are the facial expre...

  • Article
  • Open Access
7 Citations
3,813 Views
22 Pages

Multimodal Fall Detection Using Spatial–Temporal Attention and Bi-LSTM-Based Feature Fusion

  • Jungpil Shin,
  • Abu Saleh Musa Miah,
  • Rei Egawa,
  • Najmul Hassan,
  • Koki Hirooka and
  • Yoichi Tomioka

15 April 2025

Human fall detection is a significant healthcare concern, particularly among the elderly, due to its links to muscle weakness, cardiovascular issues, and locomotive syndrome. Accurate fall detection is crucial for timely intervention and injury preve...

  • Article
  • Open Access
17 Citations
5,111 Views
13 Pages

Gesture Recognition Based on 3D Human Pose Estimation and Body Part Segmentation for RGB Data Input

  • Ngoc-Hoang Nguyen,
  • Tran-Dac-Thinh Phan,
  • Guee-Sang Lee,
  • Soo-Hyung Kim and
  • Hyung-Jeong Yang

6 September 2020

This paper presents a novel approach for dynamic gesture recognition using multi-features extracted from RGB data input. Most of the challenges in gesture recognition revolve around the axis of the presence of multiple actors in the scene, occlusions...

  • Article
  • Open Access
1 Citation
1,325 Views
21 Pages

Robust 3D Target Detection Based on LiDAR and Camera Fusion

  • Miao Jin,
  • Bing Lu,
  • Gang Liu,
  • Yinglong Diao,
  • Xiwen Chen and
  • Gaoning Nie

27 October 2025

Autonomous driving relies on multimodal sensors to acquire environmental information for supporting decision making and control. While significant progress has been made in 3D object detection regarding point cloud processing and multi-sensor fusion,...

  • Article
  • Open Access
184 Views
25 Pages

MBS: A Modality-Balanced Strategy for Multimodal Sample Selection

  • Yuntao Xu,
  • Bing Chen,
  • Feng Hu,
  • Jiawei Liu,
  • Changjie Zhao and
  • Hongtao Wu

With the rapid development of applications such as edge computing, the Internet of Things (IoT), and embodied intelligence, massive multimodal data are continuously generated on end devices in a streaming manner. To maintain model adaptability and ro...

  • Article
  • Open Access
8 Citations
6,827 Views
17 Pages

The diffusion of Multimodal Large Language Models (MLLMs) has opened new research directions in the context of video content understanding and classification. Emotion recognition from videos aims to automatically detect human emotions such as anxiety...

  • Article
  • Open Access
4 Citations
4,015 Views
20 Pages

Movement-Oriented Objectified Organization and Retrieval Approach for Heterogeneous GeoVideo Data

  • Chen Wu,
  • Qing Zhu,
  • Yeting Zhang,
  • Xiao Xie,
  • Han Qin,
  • Yan Zhou,
  • Pengcheng Zhang and
  • Weijun Yang

With the wide deployment of the video sensor network and the rapid development of video spatialization technology, the large volume of complex GeoVideo data necessitates improvements in the application efficiency of the GeoVideo database and GeoVideo...

  • Review
  • Open Access
15 Citations
5,005 Views
23 Pages

2 June 2025

As systems in industry become increasingly interconnected and sophisticated, the task of fault detection and diagnosis becomes significantly more difficult. Predictive maintenance, in conjunction with sophisticated multimodal learning methods, has be...

  • Feature Paper
  • Article
  • Open Access
398 Views
24 Pages

24 December 2025

Effective situation awareness relies on the robust processing of high-dimensional data streams generated by onboard sensors. However, the application of deep generative models to extract features from complex UAV sensor data (e.g., GPS, IMU, and rada...

  • Article
  • Open Access
4 Citations
2,390 Views
14 Pages

17 February 2024

The role of air traffic controllers is to direct and manage highly dynamic flights. Their work requires both efficiency and accuracy. Previous studies have shown that fatigue in air traffic controllers can impair their work ability and even threaten...

  • Article
  • Open Access
23 Citations
7,459 Views
20 Pages

Activity-Aware Wearable System for Power-Efficient Prediction of Physiological Responses

  • Nathan Starliper,
  • Farrokh Mohammadzadeh,
  • Tanner Songkakul,
  • Michelle Hernandez,
  • Alper Bozkurt and
  • Edgar Lobaton

22 January 2019

Wearable health monitoring has emerged as a promising solution to the growing need for remote health assessment and growing demand for personalized preventative care and wellness management. Vital signs can be monitored and alerts can be made when an...

  • Article
  • Open Access
1 Citation
684 Views
29 Pages

19 September 2025

Real-time media streaming over publish–subscribe platforms is increasingly vital in scenarios that demand the scalability of event-driven architectures while ensuring timely media delivery. This is especially true in multi-modal and resource-co...

  • Article
  • Open Access
14 Citations
6,304 Views
12 Pages

7 February 2023

This paper investigates multimodal sensor architectures with deep learning for audio-visual speech recognition, focusing on in-the-wild scenarios. The term “in the wild” is used to describe AVSR for unconstrained natural-language audio st...

  • Article
  • Open Access
393 Views
29 Pages

21 November 2025

Underwater robotics produces diverse and complex streams of sensor, image, video, and navigational data under challenging environmental conditions, creating obstacles for seamless integration and interpretation. This paper introduces ROVON (Remotely...

  • Article
  • Open Access
12 Citations
4,303 Views
13 Pages

9 August 2020

Deep learning (DL) models have emerged in recent years as the state-of-the-art technique across numerous machine learning application domains. In particular, image processing-related tasks have seen a significant improvement in terms of performance d...

  • Article
  • Open Access
1 Citation
1,984 Views
22 Pages

Synergy Makes Direct Perception Inefficient

  • Miguel de Llanza Varona and
  • Manolo Martínez

21 August 2024

A typical claim in anti-representationalist approaches to cognition such as ecological psychology or radical embodied cognitive science is that ecological information is sufficient for guiding behavior. According to this view, affordances are immedia...

  • Article
  • Open Access
7 Citations
2,525 Views
19 Pages

MIFM: Multimodal Information Fusion Model for Educational Exercises

  • Jianfeng Song,
  • Hui Chen,
  • Chuan Li and
  • Kun Xie

16 September 2023

Educational exercises are crucial factors in the successful implementation of online education as they play a key role in assessing students’ learning and supporting teachers in instruction. These exercises encompass two primary types of data:...

  • Article
  • Open Access

Uncertainty-Aware Multimodal Fusion and Bayesian Decision-Making for DSS

  • Vesna Antoska Knights,
  • Marija Prchkovska,
  • Luka Krašnjak and
  • Jasenka Gajdoš Kljusurić

Uncertainty-aware decision-making increasingly relies on multimodal sensing pipelines that must fuse correlated measurements, propagate uncertainty, and trigger reliable control actions. This study develops a unified mathematical framework for multim...
