Multimodal Perception Modeling Based on Advanced Computational Technologies

A special issue of Sensors (ISSN 1424-8220). This special issue belongs to the section "Intelligent Sensors".

Deadline for manuscript submissions: 30 June 2025 | Viewed by 1886

Special Issue Editors


Guest Editor
Department of Electronics Engineering, Universidad Politécnica de Madrid, Madrid, Spain
Interests: artificial intelligence; machine learning; deep learning; neural networks; activity recognition; wearable computing; computer vision; biometrics; motion health applications

Guest Editor
Department of Electronics Engineering, Universidad Politécnica de Madrid, Madrid, Spain
Interests: human activity recognition; speech technology; signal processing; biosignals

Guest Editor
Department of Electronics Engineering, Universidad Politécnica de Madrid, Madrid, Spain
Interests: artificial intelligence; machine learning; multimedia processing and retrieval; speech technology; affective computing

Special Issue Information

Dear Colleagues,

This Special Issue aims to explore advanced computational technologies for modeling human perceptual responses to multimedia stimuli across multiple modalities. With growing interest in areas such as memorability, attention, aesthetics, and persuasion, multimodal approaches are essential for deepening our understanding of how humans interact with multimedia content. By integrating information from visual, textual, auditory, and other sources, as well as gestures and motion, this Issue seeks to advance the field of multimodal modeling, particularly in applications that benefit from a holistic perception of stimuli. Cutting-edge techniques such as large multimodal language models (MMLLMs), machine and deep learning, and multi-sensor fusion are expected to play a key role in addressing these challenges.

In addition, motion recognition remains an integral part of this research, as it contributes valuable contextual information to perceptual response modeling. Wearable sensors, smart devices, and computer vision-based methods offer opportunities to analyze motion and gestures. Contributions are invited that examine how motion and activity recognition can enhance the understanding of multimedia experiences, as well as new datasets, algorithms, and architectures that leverage motion data alongside other modalities. Researchers working on intelligent sensing systems, emotion recognition, multimodal fusion, and human–computer interaction are encouraged to submit their work, particularly in healthcare, entertainment, and interactive applications.

Dr. Manuel Gil-Martín
Dr. Rubén San-Segundo
Dr. Fernando Fernández-Martínez
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • multimodal perceptual modeling
  • motion recognition
  • large multimodal language models (MMLLMs)
  • sensor technology
  • wearable devices
  • computer vision
  • multi-sensor fusion
  • machine learning
  • deep learning
  • signal processing
  • activity recognition
  • multimedia memorability
  • attention and aesthetics
  • persuasion in multimedia
  • olfactory signals
  • emotion recognition
  • gesture data
  • human–computer interaction
  • healthcare applications
  • biometric systems

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad-scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information is available on MDPI's Special Issue policies page.

Published Papers (3 papers)


Research

20 pages, 2148 KiB  
Article
Analysis of Pleasure and Displeasure in Harmony Between Colored Light and Fragrance by the Left and Right OFC Response Differences
by Toshinori Oba, Midori Tanaka and Takahiko Horiuchi
Sensors 2025, 25(7), 2230; https://doi.org/10.3390/s25072230 - 2 Apr 2025
Viewed by 268
Abstract
Daily actions are influenced by sensory information. Several studies have investigated the multisensory integration of multiple sensory modalities, known as crossmodal perception. Recently, visual–olfactory crossmodal perception has been studied using objective physiological measures rather than subjective evaluations. This study focused on sensing in the orbitofrontal cortex (OFC), which responds to visual and olfactory stimuli, and may serve as a physiological indicator of perception. Using near-infrared spectroscopy (NIRS), we analyzed the emotions evoked by combinations of colored light and fragrance with a particular focus on the lateralization of brain function. We selected pleasant and unpleasant fragrances from some essential oils, paired with colored lights that were perceived as either harmonious or disharmonious with the fragrances. NIRS measurements were conducted under the four following conditions: fragrance-only, colored light-only, harmonious crossmodal, and disharmonious crossmodal presentations. The results showed that the left OFC was activated during the crossmodal presentation of a harmonious color with a pleasant fragrance, thereby evoking pleasant emotions. In contrast, during the crossmodal presentation of a disharmonious color with an unpleasant fragrance, the right OFC was activated, suggesting increased displeasure. Additionally, the lateralization of brain function between the left and right OFC may be influenced by ‘pleasure–displeasure’ and ‘crossmodal perception–multimodal perception’.

16 pages, 3396 KiB  
Article
Parameter-Efficient Adaptation of Large Vision—Language Models for Video Memorability Prediction
by Iván Martín-Fernández, Sergio Esteban-Romero, Fernando Fernández-Martínez and Manuel Gil-Martín
Sensors 2025, 25(6), 1661; https://doi.org/10.3390/s25061661 - 7 Mar 2025
Viewed by 506
Abstract
The accurate modelling of video memorability, or the intrinsic properties that render a piece of audiovisual content more likely to be remembered, will facilitate the development of automatic systems that are more efficient in retrieving, classifying and generating impactful media. Recent studies have indicated a strong correlation between the visual semantics of video and its memorability. This underscores the importance of developing advanced visual comprehension abilities to enhance model performance. It has been demonstrated that Large Vision–Language Models (LVLMs) demonstrate exceptional proficiency in generalist, high-level semantic comprehension of images and video, due to their extensive multimodal pre-training on a vast scale. This work makes use of the vast generalist knowledge of LVLMs and explores efficient adaptation techniques with a view to utilising them as memorability predictors. In particular, the Quantized Low-Rank Adaptation (QLoRA) technique is employed to fine-tune the Qwen-VL model with memorability-related data extracted from the Memento10k dataset. In light of existing research, we propose a particular methodology that transforms Qwen-VL from a language model to a memorability score regressor. Furthermore, we consider the influence of selecting appropriate LoRA hyperparameters, a design aspect that has been insufficiently studied. We validate the LoRA rank and alpha hyperparameters using 5-Fold Cross-Validation and evaluate our best configuration on the official testing portion of the Memento10k dataset, obtaining a state-of-the-art Spearman Rank Correlation Coefficient (SRCC) of 0.744. Consequently, this work represents a significant advancement in modelling video memorability through high-level semantic understanding.
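For readers unfamiliar with the metric reported in this abstract, the Spearman Rank Correlation Coefficient can be computed as the Pearson correlation of the two rank vectors. The sketch below is a minimal, dependency-free illustration under that standard definition; it is not the authors' evaluation code, and the function names and the example scores are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    std_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (std_x * std_y)


def spearman_rank_correlation(pred: Sequence[float], target: Sequence[float]) -> float:
    """SRCC: Pearson correlation computed on the ranks of both sequences."""
    return pearson(average_ranks(pred), average_ranks(target))


# Hypothetical predicted and ground-truth memorability scores (illustration only).
predicted = [0.91, 0.72, 0.85, 0.60]
annotated = [0.95, 0.70, 0.80, 0.75]
srcc = spearman_rank_correlation(predicted, annotated)
```

Because only ranks matter, the coefficient rewards a model for ordering videos correctly even when its raw scores are on a different scale from the annotations.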

17 pages, 2351 KiB  
Article
Extending Anxiety Detection from Multimodal Wearables in Controlled Conditions to Real-World Environments
by Abdulrahman Alkurdi, Maxine He, Jonathan Cerna, Jean Clore, Richard Sowers, Elizabeth T. Hsiao-Wecksler and Manuel E. Hernandez
Sensors 2025, 25(4), 1241; https://doi.org/10.3390/s25041241 - 18 Feb 2025
Cited by 1 | Viewed by 680
Abstract
This study quantitatively evaluated whether and how machine learning (ML) models built by data from controlled conditions can fit real-world conditions. This study focused on feature-based models using wearable technology from real-world data collected from young adults, so as to provide insights into the models’ robustness and the specific challenges posed by diverse environmental noise. Feature-based models, particularly XGBoost and Decision Trees, demonstrated considerable resilience, maintaining higher accuracy and reliability across different noise levels. This investigation included an in-depth analysis of transfer learning, highlighting its potential and limitations in adapting models developed from standard datasets, like WESAD, to complex real-world scenarios. Moreover, this study analyzed the distributed feature importance across various physiological signals, such as electrodermal activity (EDA) and electrocardiography (ECG), considering their vulnerability to environmental factors. It was found that integrating multiple physiological data types could significantly enhance model robustness. The results underscored the need for a nuanced understanding of signal contributions to model efficacy, suggesting that feature-based models showed much promise in practical applications.