Article

Domain-Adapted MLLMs for Interpretable Road Traffic Accident Analysis Using Remote Sensing Imagery

by Bing He, Wei He, Qing Chang, Wen Luo and Lingli Xiao
1 School of Computer Science, Chengdu University of Information Technology, Chengdu 610225, China
2 Sain Associates, Inc., 5021 Technology Drive Northwest, Suite B2, Huntsville, AL 35805, USA
* Author to whom correspondence should be addressed.
ISPRS Int. J. Geo-Inf. 2026, 15(1), 8; https://doi.org/10.3390/ijgi15010008
Submission received: 1 November 2025 / Revised: 7 December 2025 / Accepted: 15 December 2025 / Published: 21 December 2025
(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)

Abstract

Traditional road traffic accident analysis has long relied on structured data, making it difficult to integrate high-dimensional heterogeneous information such as remote sensing imagery and leading to an incomplete understanding of accident scene environments. This study proposes a road traffic accident analysis framework based on Multimodal Large Language Models (MLLMs). The approach integrates high-resolution remote sensing imagery with structured accident data through a three-stage progressive training pipeline. Specifically, we fine-tune three open-source vision–language models using Low-Rank Adaptation (LoRA) to sequentially optimize their capabilities in visual environmental description, multi-task accident classification, and Chain-of-Thought (CoT) driven causal reasoning. A multimodal dataset was constructed containing remote sensing image descriptions, accident classification labels, and interpretable reasoning chains. Experimental results show that the fine-tuned models achieved their largest gains on the CIDEr score for the image description task. In the joint classification task of accident severity and duration, the model achieved an accuracy of 71.61% and an F1-score of 0.8473. In the CoT reasoning task, both METEOR and CIDEr scores improved significantly. These results validate the effectiveness of structured reasoning mechanisms in multimodal fusion for transportation applications, providing a feasible path toward interpretable and intelligent analysis for real-world traffic management.
Keywords: cross-modal fusion; domain adaptation; traffic scene understanding; intelligent traffic management
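The abstract's central adaptation technique is LoRA, which freezes the pretrained weights and trains only a low-rank update. A minimal numpy sketch of the idea is below; all dimensions, rank, and scaling values are illustrative assumptions, not the authors' configuration, and a real run would use a PEFT library on the actual vision–language models.

```python
import numpy as np

class LoRALinear:
    """Toy LoRA-adapted linear layer: y = W x + (alpha/r) * B A x.

    W is the frozen pretrained weight; only the low-rank factors
    A (r x d_in) and B (d_out x r) are trained. B is zero-initialized,
    so at the start of fine-tuning the layer is an exact copy of the
    base model. Dimensions here are hypothetical.
    """

    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen
        self.A = rng.standard_normal((r, d_in)) * 0.01      # trainable down-projection
        self.B = np.zeros((d_out, r))                       # trainable up-projection
        self.scale = alpha / r

    def __call__(self, x):
        return self.W @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_params(self):
        return self.A.size + self.B.size


layer = LoRALinear(d_in=768, d_out=768, r=8)
x = np.ones(768)
# Zero-initialized B means the adapted layer initially matches the base layer.
assert np.allclose(layer(x), layer.W @ x)
print(layer.trainable_params(), layer.W.size)  # 12288 trainable vs 589824 frozen
```

The parameter count shows why LoRA suits the paper's three-stage pipeline: each stage updates only a small adapter (here ~2% of the layer's weights) rather than the full model.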

Share and Cite

MDPI and ACS Style

He, B.; He, W.; Chang, Q.; Luo, W.; Xiao, L. Domain-Adapted MLLMs for Interpretable Road Traffic Accident Analysis Using Remote Sensing Imagery. ISPRS Int. J. Geo-Inf. 2026, 15, 8. https://doi.org/10.3390/ijgi15010008

Chicago/Turabian Style

He, Bing, Wei He, Qing Chang, Wen Luo, and Lingli Xiao. 2026. "Domain-Adapted MLLMs for Interpretable Road Traffic Accident Analysis Using Remote Sensing Imagery" ISPRS International Journal of Geo-Information 15, no. 1: 8. https://doi.org/10.3390/ijgi15010008

