MDPI - Publisher of Open Access Journals

19 pages, 8953 KB

Open Access Article

Leveraging Multimodal Large Language Models (MLLMs) for Enhanced Object Detection and Scene Understanding in Thermal Images for Autonomous Driving Systems

by Huthaifa I. Ashqar, Taqwa I. Alhadidi, Mohammed Elhenawy and Nour O. Khanfar

Automation 2024, 5(4), 508-526; https://doi.org/10.3390/automation5040029 - 10 Oct 2024

Cited by 18 | Viewed by 4885

Abstract

The integration of thermal imaging data with multimodal large language models (MLLMs) offers promising advancements for enhancing the safety and functionality of autonomous driving systems (ADS) and intelligent transportation systems (ITS). This study investigates the potential of MLLMs, specifically GPT-4 Vision Preview and Gemini 1.0 Pro Vision, for interpreting thermal images for applications in ADS and ITS. Two primary research questions are addressed: the capacity of these models to detect and enumerate objects within thermal images, and to determine whether pairs of image sources represent the same scene. Furthermore, we propose a framework for object detection and classification by integrating infrared (IR) and RGB images of the same scene without requiring localization data. This framework is particularly valuable for enhancing the detection and classification accuracy in environments where both IR and RGB cameras are essential. By employing zero-shot in-context learning for object detection and the chain-of-thought technique for scene discernment, this study demonstrates that MLLMs can recognize objects such as vehicles and individuals with promising results, even in the challenging domain of thermal imaging. The results indicate a high true positive rate for larger objects and moderate success in scene discernment, with a recall of 0.91 and a precision of 0.79 for similar scenes. The integration of IR and RGB images further enhances detection capabilities, achieving an average precision of 0.93 and an average recall of 0.56. This approach leverages the complementary strengths of each modality to compensate for individual limitations. This study highlights the potential of combining advanced AI methodologies with thermal imaging to enhance the accuracy and reliability of ADS, while identifying areas for improvement in model performance.
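The abstract above reports precision and recall as separate figures. As a reading aid, the minimal sketch below combines each reported pair into an F1 score (the harmonic mean of precision and recall). The F1 values are derived here purely for illustration and are not reported in the abstract; the function name `f1_score` is a hypothetical helper, not something taken from the article.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall pairs quoted in the abstract (F1 is derived, not reported).
reported = {
    "scene discernment (similar scenes)": (0.79, 0.91),
    "IR + RGB detection (averages)": (0.93, 0.56),
}

for task, (p, r) in reported.items():
    print(f"{task}: precision={p:.2f}, recall={r:.2f}, F1={f1_score(p, r):.3f}")
```

Under these figures, the combined IR and RGB framework trades recall for precision: few of its detections are wrong, but it misses a sizeable share of objects, which pulls its derived F1 below that of the scene-discernment task.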

25 pages, 6794 KB

Open Access Article

An Autonomous Intelligent Liability Determination Method for Minor Accidents Based on Collision Detection and Large Language Models

by Junbo Chen, Shunlai Lu and Lei Zhong

Appl. Sci. 2024, 14(17), 7716; https://doi.org/10.3390/app14177716 - 1 Sep 2024

Cited by 3 | Viewed by 3309

Abstract

With the rapid increase in the number of vehicles on the road, minor traffic accidents have become more frequent, contributing significantly to traffic congestion and disruptions. Traditional methods for determining responsibility in such accidents often require human intervention, leading to delays and inefficiencies. This study proposed a fully intelligent method for liability determination in minor accidents, utilizing collision detection and large language models. The approach integrated advanced vehicle recognition using the YOLOv8 algorithm coupled with a minimum mean square error filter for real-time target tracking. Additionally, an improved global optical flow estimation algorithm and support vector machines were employed to accurately detect traffic accidents. Key frames from accident scenes were extracted and analyzed using the GPT4-Vision-Preview model to determine liability. Simulation experiments demonstrated that the proposed method accurately and efficiently detected vehicle collisions, rapidly determined liability, and generated detailed accident reports. The method achieved the fully automated AI processing of minor traffic accidents without manual intervention, ensuring both objectivity and fairness.

(This article belongs to the Collection Urban Transport Systems Efficiency, Network Planning and Safety: Volume II)

21 pages, 11156 KB

Open Access Editor’s Choice Article

Map Reading and Analysis with GPT-4V(ision)

by Jinwen Xu and Ran Tao

ISPRS Int. J. Geo-Inf. 2024, 13(4), 127; https://doi.org/10.3390/ijgi13040127 - 11 Apr 2024

Cited by 16 | Viewed by 8679

Abstract

In late 2023, the image-reading capability added to a Generative Pre-trained Transformer (GPT) framework provided the opportunity to potentially revolutionize the way we view and understand geographic maps, the core component of cartography, geography, and spatial data science. In this study, we explore reading and analyzing maps with the latest version of GPT-4-vision-preview (GPT-4V), to fully evaluate its advantages and disadvantages in comparison with human eye-based visual inspections. We found that GPT-4V is able to properly retrieve information from various types of maps in different scales and spatiotemporal resolutions. GPT-4V can also perform basic map analysis, such as identifying visual changes before and after a natural disaster. It has the potential to replace human efforts by examining batches of maps, accurately extracting information from maps, and linking observed patterns with its pre-trained large dataset. However, it is encumbered by limitations such as diminished accuracy in visual content extraction and a lack of validation. This paper sets an example of effectively using GPT-4V for map reading and analytical tasks, which is a promising application for large multimodal models, large language models, and artificial intelligence.

(This article belongs to the Special Issue Advances in AI-Driven Geospatial Analysis and Data Generation)

Search Results (3)
