Sensors


Multimodal Sensing Fusion-Based LLM Agent Methods, System, and Applications


Special Issue Editor

Prof. Dr. Chaoning Zhang

Guest Editor

School of Computer Science & Engineering, University of Electronic Science and Technology of China, Chengdu, China
Interests: multimodal large language models; lightweight large language models; robust and secure large language models; computer vision; AIGC and LLM

Special Issue Information

Dear Colleagues,

The integration of multimodal sensor data with large language model (LLM) agents is an important frontier in artificial intelligence because it combines heterogeneous streams of information to advance machine perception, reasoning, and interaction. LLM agents, trained on extensive and diverse datasets, have demonstrated strong capabilities in language understanding, contextual inference, and complex decision-making. When augmented with multimodal sensing (visual, auditory, speech, tactile, LiDAR, radar, physiological, and other data), these agents can achieve more comprehensive situational awareness and greater adaptability in dynamic and uncertain environments. Such integration improves real-time decision-making and human–machine interaction, and it creates new opportunities for innovation across domains including healthcare, robotics, autonomous systems, smart manufacturing, education, environmental monitoring, and security. At the same time, it raises fundamental research challenges related to trustworthiness, interpretability, and ethics in the deployment of intelligent agents.

This Special Issue aims to bring together original contributions that explore methodologies, theories, and applications at the intersection of multimodal sensing and LLM-based intelligence. We particularly welcome advances in multimodal fusion strategies, contextual reasoning, adaptive dialogue systems, and collaborative and immersive applications, as well as cross-disciplinary approaches that contribute to the development of next-generation multimodal artificial intelligence.

Prof. Dr. Chaoning Zhang
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 250 words) can be sent to the Editorial Office for assessment.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Sensors is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

multimodal sensor fusion
large language model (LLM) agents
multimodal perception and reasoning
contextual understanding
adaptive dialogue systems
human–machine interaction
trustworthy and explainable AI
robotics and autonomous systems
healthcare and smart manufacturing
cognitive intelligence and situational awareness

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.

Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.

Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.

External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.

Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (2 papers)


Research

24 pages, 8847 KB

Open Access Article

Implicit Neural Representation with Dead-Free Linear Unit for Remote Sensing Images

by Yi Lu, Chang Lu, Dongshen Han, Donggeon Kim, Mingming Zhang, Rizwan Qureshi and Caiyan Qin

Sensors 2026, 26(8), 2370; https://doi.org/10.3390/s26082370 - 12 Apr 2026

Viewed by 637

Abstract

Remote sensing images are a crucial component of multimodal sensing in modern AI agents and have attracted significant attention, and neural representation is a promising direction for modeling them. Implicit Neural Representations (INRs) based on Multi-Layer Perceptrons (MLPs) can model images by learning an implicit mapping from pixel coordinates to pixel intensities. This paper revisits the ReLU activation function, a widely adopted non-linearity known for its dead region on the negative axis, in the context of MLP-based INRs. We introduce the Dead-Free Linear Unit (DeLU), a novel activation function that leverages a linearly transformed absolute value to eliminate inactive regions. By combining dead-free non-linearity with adaptive linear scaling, DeLU enhances the expressiveness of INR architectures, particularly those employing periodic activations. Extensive experiments on multiple remote sensing datasets, including LandCover.ai, LoveDA, INRIA, UAVid, and ISPRS Potsdam, validate the efficacy of the proposed method.

(This article belongs to the Special Issue Multimodal Sensing Fusion-Based LLM Agent Methods, System, and Applications)
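The abstract does not give DeLU's exact formula, so the following Python sketch assumes one plausible form, a·|x| + b with scalar parameters a and b standing in for the "adaptive linear scaling". It contrasts ReLU's dead region (zero gradient for every negative input) with a dead-free absolute-value activation and places that activation in a toy coordinate MLP that maps normalised pixel coordinates to intensities. `delu_sketch`, `TinyINR`, the sine first layer, and `omega0 = 30` (a common SIREN default) are illustrative assumptions, not the paper's architecture.

```python
import math
import random
from typing import List, Tuple

Layer = Tuple[List[List[float]], List[float]]


def relu_grad(x: float) -> float:
    # ReLU's derivative is exactly zero for x < 0: the "dead" region.
    return 1.0 if x > 0 else 0.0


def delu_sketch(x: float, a: float = 1.0, b: float = 0.0) -> float:
    # ASSUMPTION: a * |x| + b is our reading of "linearly transformed absolute
    # value"; a and b stand in for the paper's adaptive (learnable) scaling.
    return a * abs(x) + b


def delu_sketch_grad(x: float, a: float = 1.0) -> float:
    # d/dx (a*|x| + b) = a * sign(x): nonzero for every x != 0, so no dead region.
    # At x == 0 we return 0.0, one valid subgradient choice.
    if x == 0:
        return 0.0
    return a * math.copysign(1.0, x)


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    # Uniform init scaled by fan-in; weights[j] holds the input weights of unit j.
    bound = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-bound, bound) for _ in range(n_out)]
    return weights, biases


def dense(layer: Layer, x: List[float]) -> List[float]:
    # Affine map W x + b.
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


class TinyINR:
    """Untrained toy INR mapping a normalised pixel coordinate (x, y) to one intensity."""

    def __init__(self, hidden: int = 16, omega0: float = 30.0, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.omega0 = omega0  # frequency factor of the periodic first layer (SIREN-style)
        self.l1 = init_layer(2, hidden, rng)
        self.l2 = init_layer(hidden, hidden, rng)
        self.out = init_layer(hidden, 1, rng)

    def __call__(self, x: float, y: float) -> float:
        h = [math.sin(self.omega0 * z) for z in dense(self.l1, [x, y])]  # periodic activation
        h = [delu_sketch(z) for z in dense(self.l2, h)]  # dead-free hidden activation (sketch)
        return dense(self.out, h)[0]  # linear output: predicted intensity


def pixel_coords(height: int, width: int) -> List[List[Tuple[float, float]]]:
    # Map pixel indices to [-1, 1], the usual INR input convention; coords[i][j] = (x, y).
    def norm(k: int, n: int) -> float:
        return 2.0 * k / max(n - 1, 1) - 1.0

    return [[(norm(j, width), norm(i, height)) for j in range(width)] for i in range(height)]
```

For example, with `model = TinyINR()`, the expression `[[model(x, y) for (x, y) in row] for row in pixel_coords(4, 4)]` renders a 4×4 intensity grid. Fitting an actual image would add a reconstruction loss and gradient-based training, which this sketch omits.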


32 pages, 4404 KB

Open Access Article

Revisiting Text-Based Person Retrieval: Mitigating Annotation-Induced Mismatches with Multimodal Large Language Models

by Zihang Han, Chao Zhu and Mengyin Liu

Sensors 2026, 26(5), 1599; https://doi.org/10.3390/s26051599 - 4 Mar 2026

Viewed by 656

Abstract

Text-based person retrieval (TBPR) aims to search for target person images in large-scale video clips or image databases based on textual descriptions. Benchmark quality is critical for accurately evaluating the cross-modal matching ability of TBPR models. However, we find that existing TBPR benchmarks share a common problem: multiple images of persons with different identities often have very similar or even identical textual descriptions, which creates ambiguity. As a consequence, even when a TBPR model retrieves an image that correctly matches a given description, the match may be erroneously counted as a mismatch. We argue that the main cause of this problem is that each person image is annotated individually, without reference to other similar images, which makes it difficult to provide a distinctive description for each image. To address this problem, we propose an effective and efficient annotation refinement framework that improves the annotation quality of TBPR benchmarks and thereby mitigates annotation-induced mismatches. First, sets of images prone to mismatches are automatically identified using TBPR models. Then, multimodal large language models (MLLMs) process multiple images simultaneously and generate a distinctive description for each image. Finally, the original descriptions are replaced with the generated ones to improve annotation quality. Extensive experiments on three popular TBPR benchmarks (CUHK-PEDES, RSTPReid, and ICFG-PEDES) validate the effectiveness of the proposed method in improving annotation quality and demonstrate that the resulting more discriminative captions benefit mainstream TBPR models. The improved annotations of these benchmarks will be released publicly.

(This article belongs to the Special Issue Multimodal Sensing Fusion-Based LLM Agent Methods, System, and Applications)
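The three-step refinement described in this abstract can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. We assume one caption per image, treat "the model's top-1 result for an image's caption has a different identity" as the signal that two images are mismatch-prone, and merge such pairs into sets with union-find. `describe_jointly` and `placeholder_mllm` are hypothetical names. A real pipeline would send all images of a set to an MLLM in a single request.

```python
from typing import Callable, Dict, List


def find_mismatch_sets(top1: Dict[str, str], identity: Dict[str, str]) -> List[List[str]]:
    """Step 1 (assumed rule): group images whose captions a TBPR model confuses.

    top1[img] is the image the model ranks first when queried with img's caption;
    identity[img] is the person ID of img. A top-1 hit with a different identity
    links the two images, and union-find merges the links into mismatch-prone sets.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query_img, retrieved in top1.items():
        if identity[retrieved] != identity[query_img]:
            union(query_img, retrieved)

    groups: Dict[str, List[str]] = {}
    for img in list(parent):
        groups.setdefault(find(img), []).append(img)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)


def refine_annotations(
    annotations: Dict[str, str],
    top1: Dict[str, str],
    identity: Dict[str, str],
    describe_jointly: Callable[[List[str]], Dict[str, str]],
) -> Dict[str, str]:
    """Steps 2 and 3: caption each mismatch-prone set jointly, then replace the old captions."""
    refined = dict(annotations)  # leave the original benchmark annotations untouched
    for group in find_mismatch_sets(top1, identity):
        # describe_jointly stands in for an MLLM that sees every image in the group
        # at once, so it can contrast them when writing each description.
        new_captions = describe_jointly(group)
        for img in group:
            refined[img] = new_captions[img]
    return refined


def placeholder_mllm(group: List[str]) -> Dict[str, str]:
    # HYPOTHETICAL stand-in for a real MLLM call; it only tags each caption with its image ID.
    return {img: f"distinctive description of {img}" for img in group}
```

For instance, if persons A and B (images `a1` and `b1`) share the caption "a man in a black jacket" and the model's top-1 result for `a1`'s caption is `b1`, then `find_mismatch_sets` returns `[["a1", "b1"]]`, and only those two captions are rewritten. The union-find step is our own design choice: it lets chains of confusions (A with B, B with C) form one set, so the MLLM sees all mutually confusable images together.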

