Article

Adaptive Token Boundaries: Towards Integrating Human Chunking Mechanisms into Multimodal LLMs

School of Education, Sanda University, Shanghai 314100, China
Information 2025, 16(12), 1106; https://doi.org/10.3390/info16121106
Submission received: 12 October 2025 / Revised: 26 November 2025 / Accepted: 10 December 2025 / Published: 15 December 2025

Abstract

Recent advancements in multimodal large language models (MLLMs) have demonstrated remarkable capabilities in processing diverse data types, yet significant disparities persist between human cognitive processes and computational approaches to multimodal information integration. This research presents a systematic investigation into the parallels between human cross-modal chunking mechanisms and token representation methodologies in MLLMs. Through empirical studies comparing human performance patterns with model behaviors across visual–linguistic tasks, we demonstrate that conventional static tokenization schemes fundamentally constrain current models’ capacity to simulate the dynamic, context-sensitive nature of human information processing. We propose a novel framework for dynamic cross-modal tokenization that incorporates adaptive boundaries, hierarchical representations, and alignment mechanisms grounded in cognitive science principles. Quantitative evaluations demonstrate that our approach yields statistically significant improvements over state-of-the-art models on benchmark tasks (+7.8% on Visual Question Answering and +5.3% on Complex Scene Description, p < 0.001), while exhibiting more human-aligned error patterns and attention distributions. These findings contribute to the theoretical understanding of the relationship between human cognition and artificial intelligence, while providing empirical evidence for developing more cognitively plausible AI systems.
Keywords: multimodal language models; tokenization; cognitive chunking; cross-modal integration; hierarchical representations; adaptive processing; visual–linguistic understanding; transfer learning; human-aligned AI; neural information processing
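
The full framework is described in the article itself; purely as an illustration of the general idea of adaptive token boundaries (not the authors' implementation), the sketch below greedily merges adjacent embedding vectors into variable-length chunks whenever their cosine similarity exceeds a threshold, mimicking context-sensitive chunking. The function name `adaptive_chunk`, the parameter `sim_threshold`, and the merging rule are all illustrative assumptions.

```python
# Illustrative sketch only: greedy similarity-based merging of adjacent
# token embeddings into variable-length chunks ("adaptive boundaries").
# This is NOT the paper's method; names and the merge rule are assumptions.
import numpy as np

def adaptive_chunk(embeddings: np.ndarray, sim_threshold: float = 0.8):
    """Greedily merge adjacent embeddings whose cosine similarity exceeds
    sim_threshold, returning one mean vector per resulting chunk."""
    chunks = []      # list of index lists, one per chunk
    current = [0]
    for i in range(1, len(embeddings)):
        a, b = embeddings[i - 1], embeddings[i]
        cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
        if cos >= sim_threshold:
            current.append(i)       # extend the current chunk
        else:
            chunks.append(current)  # close the chunk at a low-similarity boundary
            current = [i]
    chunks.append(current)
    # Represent each chunk by the mean of its member embeddings.
    return [embeddings[idx].mean(axis=0) for idx in chunks]

# Toy usage: six patch/word embeddings of dimension four.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(6, 4))
merged = adaptive_chunk(tokens, sim_threshold=0.5)
print(f"{len(tokens)} input tokens -> {len(merged)} adaptive chunks")
```

In such a scheme the number of resulting chunks depends on the input's local structure rather than on a fixed patch or subword grid, which is the property the abstract contrasts with static tokenization.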
Share and Cite

MDPI and ACS Style

Yu, D. Adaptive Token Boundaries: Towards Integrating Human Chunking Mechanisms into Multimodal LLMs. Information 2025, 16, 1106. https://doi.org/10.3390/info16121106

AMA Style

Yu D. Adaptive Token Boundaries: Towards Integrating Human Chunking Mechanisms into Multimodal LLMs. Information. 2025; 16(12):1106. https://doi.org/10.3390/info16121106

Chicago/Turabian Style

Yu, Dongxing. 2025. "Adaptive Token Boundaries: Towards Integrating Human Chunking Mechanisms into Multimodal LLMs" Information 16, no. 12: 1106. https://doi.org/10.3390/info16121106

APA Style

Yu, D. (2025). Adaptive Token Boundaries: Towards Integrating Human Chunking Mechanisms into Multimodal LLMs. Information, 16(12), 1106. https://doi.org/10.3390/info16121106

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
