Search Results (3)

Search Parameters:
Keywords = GPT4-vision-preview

19 pages, 8953 KB  
Article
Leveraging Multimodal Large Language Models (MLLMs) for Enhanced Object Detection and Scene Understanding in Thermal Images for Autonomous Driving Systems
by Huthaifa I. Ashqar, Taqwa I. Alhadidi, Mohammed Elhenawy and Nour O. Khanfar
Automation 2024, 5(4), 508-526; https://doi.org/10.3390/automation5040029 - 10 Oct 2024
Cited by 18 | Viewed by 4885
Abstract
The integration of thermal imaging data with multimodal large language models (MLLMs) offers promising advancements for enhancing the safety and functionality of autonomous driving systems (ADS) and intelligent transportation systems (ITS). This study investigates the potential of MLLMs, specifically GPT-4 Vision Preview and Gemini 1.0 Pro Vision, for interpreting thermal images in ADS and ITS applications. Two primary research questions are addressed: whether these models can detect and enumerate objects within thermal images, and whether they can determine if pairs of image sources represent the same scene. Furthermore, we propose a framework for object detection and classification that integrates infrared (IR) and RGB images of the same scene without requiring localization data. This framework is particularly valuable for improving detection and classification accuracy in environments where both IR and RGB cameras are essential. By employing zero-shot in-context learning for object detection and the chain-of-thought technique for scene discernment, this study demonstrates that MLLMs can recognize objects such as vehicles and individuals with promising results, even in the challenging domain of thermal imaging. The results indicate a high true positive rate for larger objects and moderate success in scene discernment, with a recall of 0.91 and a precision of 0.79 for similar scenes. Integrating IR and RGB images further enhances detection capabilities, achieving an average precision of 0.93 and an average recall of 0.56. This approach leverages the complementary strengths of each modality to compensate for their individual limitations. This study highlights the potential of combining advanced AI methodologies with thermal imaging to enhance the accuracy and reliability of ADS, while identifying areas for improvement in model performance.
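The precision and recall figures in this abstract are standard detection metrics computed from true positive (TP), false positive (FP) and false negative (FN) counts. Because the framework works without localization data, the sketch below assumes a simple count-based matching per object class (matched objects = min(predicted count, actual count)). This matching rule, the function names, and the example counts are hypothetical, not the authors' evaluation protocol. The sketch also computes the harmonic mean (F1) of the precision/recall pairs quoted above; those F1 values are derived here, not reported by the authors, and an F1 of averaged precision and recall need not equal an average of per-image F1 scores.

```python
from collections import Counter


def count_confusion(predicted: Counter, actual: Counter) -> tuple[int, int, int]:
    """Hypothetical count-based matching (no bounding boxes).

    Per class, min(predicted, actual) objects count as TP, surplus
    predictions count as FP, and missed objects count as FN.
    """
    tp = fp = fn = 0
    for cls in predicted.keys() | actual.keys():
        p, a = predicted[cls], actual[cls]  # Counter returns 0 for absent classes
        tp += min(p, a)
        fp += max(p - a, 0)
        fn += max(a - p, 0)
    return tp, fp, fn


def precision(tp: int, fp: int) -> float:
    """Share of reported objects that are real: TP / (TP + FP)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of real objects that were reported: TP / (TP + FN)."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


if __name__ == "__main__":
    # Hypothetical ground truth and model answer for one thermal image.
    actual = Counter(car=5, person=3)
    predicted = Counter(car=4, person=4, bicycle=1)
    tp, fp, fn = count_confusion(predicted, actual)
    print(tp, fp, fn)  # 7 2 1
    print(f"precision={precision(tp, fp):.3f} recall={recall(tp, fn):.3f}")
    # F1 derived from the abstract's figures (not reported by the authors).
    print(f"scene discernment F1 ~ {f1(0.79, 0.91):.3f}")
    print(f"IR+RGB F1 ~ {f1(0.93, 0.56):.3f}")
```

A count-based rule cannot tell whether a reported car is the right car; when bounding boxes are available, detections are normally matched to ground truth by overlap (IoU) instead.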

25 pages, 6794 KB  
Article
An Autonomous Intelligent Liability Determination Method for Minor Accidents Based on Collision Detection and Large Language Models
by Junbo Chen, Shunlai Lu and Lei Zhong
Appl. Sci. 2024, 14(17), 7716; https://doi.org/10.3390/app14177716 - 1 Sep 2024
Cited by 3 | Viewed by 3309
Abstract
With the rapid increase in the number of vehicles on the road, minor traffic accidents have become more frequent, contributing significantly to traffic congestion and disruptions. Traditional methods for determining responsibility in such accidents often require human intervention, leading to delays and inefficiencies. This study proposed a fully intelligent method for liability determination in minor accidents using collision detection and large language models. The approach integrated vehicle recognition based on the YOLOv8 algorithm with a minimum mean square error filter for real-time target tracking. Additionally, an improved global optical flow estimation algorithm and support vector machines were employed to detect traffic accidents accurately. Key frames from accident scenes were extracted and analyzed with the GPT4-Vision-Preview model to determine liability. Simulation experiments demonstrated that the proposed method accurately and efficiently detected vehicle collisions, rapidly determined liability, and generated detailed accident reports. The method achieved fully automated AI processing of minor traffic accidents without manual intervention, ensuring both objectivity and fairness.
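The abstract does not specify the form of the minimum mean square error filter used for tracking. One standard choice is the Kalman filter, which yields the minimum mean square error state estimate when the motion and noise models are linear and Gaussian. The sketch below assumes that choice: a constant-velocity model applied independently to the x and y coordinates of a detected bounding-box centre (for example, one produced by a YOLOv8 detector). The names `AxisKalman` and `track` and the noise parameters are hypothetical; this illustrates MMSE tracking in general, not the authors' filter.

```python
from dataclasses import dataclass, field


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one image axis (assumed model).

    State x = [position, velocity]; only position is measured.
    Noise settings are illustrative assumptions, not tuned values.
    """
    q: float = 1.0  # white-acceleration noise intensity (assumed)
    r: float = 4.0  # measurement-noise variance in px^2 (assumed)
    x: list = field(default_factory=lambda: [0.0, 0.0])
    P: list = field(default_factory=lambda: [[1e3, 0.0], [0.0, 1e3]])

    def predict(self, dt: float) -> None:
        pos, vel = self.x
        self.x = [pos + dt * vel, vel]  # x <- F x, with F = [[1, dt], [0, 1]]
        (a, b), (c, d) = self.P
        q = self.q
        # P <- F P F^T + Q, using the continuous white-acceleration Q
        self.P = [
            [a + dt * (b + c) + dt * dt * d + q * dt**3 / 3, b + dt * d + q * dt**2 / 2],
            [c + dt * d + q * dt**2 / 2, d + q * dt],
        ]

    def update(self, z: float) -> None:
        (a, b), (c, d) = self.P
        s = a + self.r         # innovation variance, with H = [1, 0]
        k0, k1 = a / s, c / s  # Kalman gain K = P H^T / s
        y = z - self.x[0]      # innovation: measured minus predicted position
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b], [c - k1 * a, d - k1 * b]]


def track(centres, dt):
    """Filter a sequence of (x, y) box centres; returns (x, y, vx, vy) per frame."""
    fx, fy = AxisKalman(), AxisKalman()
    out = []
    for i, (zx, zy) in enumerate(centres):
        if i > 0:  # no motion step before the first measurement
            fx.predict(dt)
            fy.predict(dt)
        fx.update(zx)
        fy.update(zy)
        out.append((fx.x[0], fy.x[0], fx.x[1], fy.x[1]))
    return out


if __name__ == "__main__":
    # Hypothetical detections: a box centre moving +2 px/frame in x, -1 px/frame in y.
    centres = [(2.0 * t, 100.0 - t) for t in range(50)]
    x, y, vx, vy = track(centres, dt=1.0)[-1]
    print(f"x={x:.1f} y={y:.1f} vx={vx:.2f} vy={vy:.2f}")
```

With several vehicles in view, each track would get its own pair of filters, and detections would first be assigned to tracks (for example by nearest predicted centre or by IoU) before calling `update`; that association step is omitted here.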

21 pages, 11156 KB  
Article
Map Reading and Analysis with GPT-4V(ision)
by Jinwen Xu and Ran Tao
ISPRS Int. J. Geo-Inf. 2024, 13(4), 127; https://doi.org/10.3390/ijgi13040127 - 11 Apr 2024
Cited by 16 | Viewed by 8679
Abstract
In late 2023, the image-reading capability added to the Generative Pre-trained Transformer (GPT) framework created an opportunity to transform how we view and understand geographic maps, the core component of cartography, geography, and spatial data science. In this study, we explore reading and analyzing maps with the latest version of GPT-4-vision-preview (GPT-4V) to fully evaluate its advantages and disadvantages in comparison with human visual inspection. We found that GPT-4V can properly retrieve information from various types of maps at different scales and spatiotemporal resolutions. GPT-4V can also perform basic map analysis, such as identifying visual changes before and after a natural disaster. It has the potential to replace human effort in examining batches of maps, accurately extracting information from them, and linking observed patterns with its large pre-training dataset. However, it is limited by diminished accuracy in extracting visual content and a lack of validation. This paper provides an example of effectively using GPT-4V for map reading and analysis, a promising application for large multimodal models, large language models, and artificial intelligence.
(This article belongs to the Special Issue Advances in AI-Driven Geospatial Analysis and Data Generation)