Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models

Jaradat, Shadi; Elhenawy, Mohammed; Nayak, Richi; Paz, Alexander; Ashqar, Huthaifa I.; Glaser, Sebastien

doi:10.3390/ai6040072

Open AccessArticle

Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models

by

Shadi Jaradat

^1,2,*

,

Mohammed Elhenawy

^1,2

,

Richi Nayak

²

,

Alexander Paz

³

,

Huthaifa I. Ashqar

^4,5

and

Sebastien Glaser

¹

CARRS-Q, Queensland University of Technology, Kelvin Grove, Brisbane, QLD 4059, Australia

²

Centre for Data Science, Queensland University of Technology, Garden Point, Brisbane, QLD 4000, Australia

³

School of Civil Engineering, Queensland University of Technology, Brisbane, QLD 4000, Australia

⁴

Civil Engineering Department, Arab American University, Jenin P.O. Box 240, Palestine

⁵

AI@Columbia, Columbia University, New York, NY 10027, USA

^*

Author to whom correspondence should be addressed.

AI 2025, 6(4), 72; https://doi.org/10.3390/ai6040072

Submission received: 20 February 2025 / Revised: 27 March 2025 / Accepted: 1 April 2025 / Published: 7 April 2025

(This article belongs to the Section AI Systems: Theory and Applications)

Download

Browse Figures

Versions Notes

Abstract

In traffic safety analysis, previous research has often focused on tabular data or textual crash narratives in isolation, neglecting the potential benefits of a hybrid multimodal approach. This study introduces the Multimodal Data Fusion (MDF) framework, which fuses tabular data with textual narratives by leveraging advanced Large Language Models (LLMs), such as GPT-2, GPT-3.5, and GPT-4.5, using zero-shot (ZS), few-shot (FS), and fine-tuning (FT) learning strategies. We employed few-shot learning with GPT-4.5 to generate new labels for traffic crash analysis, such as driver fault, driver actions, and crash factors, alongside the existing label for severity. Our methodology was tested on crash data from the Missouri State Highway Patrol, demonstrating significant improvements in model performance. GPT-2 (fine-tuned) was used as the baseline model, against which more advanced models were evaluated. GPT-4.5 few-shot learning achieved 98.9% accuracy for crash severity prediction and 98.1% accuracy for driver fault classification. In crash factor extraction, GPT-4.5 few-shot achieved the highest Jaccard score (82.9%), surpassing GPT-3.5 and fine-tuned GPT-2 models. Similarly, in driver actions extraction, GPT-4.5 few-shot attained a Jaccard score of 73.1%, while fine-tuned GPT-2 closely followed with 72.2%, demonstrating that task-specific fine-tuning can achieve performance close to state-of-the-art models when adapted to domain-specific data. These findings highlight the superior performance of GPT-4.5 few-shot learning, particularly in classification and information extraction tasks, while also underscoring the effectiveness of fine-tuning on domain-specific datasets to bridge performance gaps with more advanced models. The MDF framework’s success demonstrates its potential for broader applications beyond traffic crash analysis, particularly in domains where labeled data are scarce and predictive modeling is essential.

Keywords:

Generative Pre-Trained Transformer (GPT); Large Language Model (LLM); traffic crash analysis; few-shot learning; zero-shot learning; Multimodal Data Fusion

1. Introduction

The fusion of tabular data with textual crash narratives holds significant promise for enhancing crash data analysis. Tabular data provide a wealth of critical details essential for comprehensive study in various areas, including road safety. However, they lack contextual meaning [1,2]. On the other hand, textual crash narratives present detailed accounts of incidents filled with nuanced insights into the human factors and conditions leading to road crashes. Despite the wealth of information in these two data formats, traditional approaches to traffic safety analysis have typically focused on one type of data, missing out on the comprehensive insights that a combined analysis could provide. In the era of smart cities, integrating these data sources is crucial for developing intelligent transportation systems that optimize urban mobility, reduce congestion, and enhance road safety. By leveraging MDF, city planners and policymakers can make more informed decisions, leading to more efficient traffic management and proactive accident prevention.

Multimodal Data Fusion offers significant advantages in various fields by combining different types of data to enhance understanding and improve analysis outcomes. One key advantage of MDF is its ability to improve classification accuracy and reduce errors. For instance, a study on marine fish and zooplankton classification demonstrated that combining multi-view, multimodal acoustic, and optical sensor data led to more than a 50% reduction in classification errors compared to single-view, unimodal sources [3]. This improvement is attributed to the fusion algorithm’s ability to capture multiple views of a subject, making it more robust to unknown orientations or poses. Similarly, in smart city applications, the fusion of structured crash reports with unstructured narratives can significantly enhance real-time traffic incident detection, contributing to safer and more resilient urban environments. Multimodal fusion also enhances the ability to recognize complex behaviors and patterns. In healthcare, a novel methodology for pain behavior recognition integrates statistical correlation analysis with human-centered insights, demonstrating superior performance across various deep learning architectures [4]. This approach not only improves recognition accuracy but also contributes to interpretable and explainable AI in healthcare, supporting patient-centered interventions and clinical decision making. In conclusion, MDF has proven to be a powerful tool across various domains, from marine biology to healthcare and neuroimaging. By combining diverse data types, researchers can overcome the limitations of individual modalities, achieve higher accuracy, and gain deeper insights into complex phenomena. As demonstrated in neuroimaging applications, fusion techniques can enhance spatial and temporal resolution, improve contrast, and bridge physiological and cognitive information [5]. The continued development of fusion methodologies promises to further advance our understanding and analysis capabilities in numerous fields.

Integrating these two data types will provide a richer and more nuanced understanding of the factors contributing to traffic incidents. When the inherent data lack diversity or further enhancements are desired, incorporating external data sources becomes a strategy for dataset enrichment [6]. However, textual crash narratives are often complex to manage because of a lack of standardization, which poses challenges in data integration and analysis. In this study, we introduced an additional context by applying LLMs and facilitating the enhanced analysis of combined data sources.

The serialization of tabular data, which converts structured data into a format that can be processed alongside natural language text, introduces new possibilities for integrating quantitative and qualitative data into crash analysis [7]. This holistic approach combines numerical data, such as speed and weather conditions, with detailed narrative descriptions to comprehensively view each incident. While the direct application of LLM models and techniques in traffic safety research is still developing, their potential is clear and promising. By employing these innovative approaches, researchers can unlock new insights from crash narratives, thereby contributing to more effective interventions and informed policymaking. Recently, machine learning (ML) applications in crash analysis have predominantly focused on the structured nature of tabular data, which is conducive to quantitative analysis [8,9]. Studies have shown that ML models utilize tabular datasets to predict crash severity and identify risk factors [8,10]. In the last two decades, text analysis, facilitated by natural language processing (NLP) advances, has become crucial in deciphering the vast amounts of unstructured data generated across road safety domains [11]. The study of crash narratives has gained attention because of its potential to provide detailed insights into the causes and conditions of traffic crashes [12]. NLP methods and ML algorithms have been employed to extract actionable information from these narratives, thereby enhancing our understanding of crash causality and informing prevention strategies [10,13].

Moreover, the transformer architecture introduced by Vaswani et al. [14] constitutes the foundation of the Generative Pre-trained Transformer (GPT) series. The advent of LLMs, such as GPT-2, GPT-3.5, and GPT-4.5, with their profound understanding of language, has opened new avenues for exploring the qualitative aspects of crash narratives [15]. These models’ ability to process and interpret natural language bridges the gap between quantitative data and qualitative descriptions, introducing an innovative approach for more comprehensive data analysis [16]. By leveraging the strengths of both data types, LLMs stand to offer unparalleled insights into road safety, potentially revolutionizing crash analysis to develop effective prevention strategies. More advanced methods, including few-shot in-context learning, further enhance this capability by allowing the LLM models to adapt to new tasks with minimal examples, making it possible to tailor the analysis to specific types of crashes or factors of interest with limited upfront data preparation [17].

This study explores the effectiveness of LLMs in analyzing both tabular and textual data for traffic crash analysis, emphasizing the innovative use of multimodal data to enhance knowledge extraction processes. Historically, crash analysis has primarily focused on either tabular or textual data, but integrating these two data formats offers a more comprehensive understanding of traffic incidents. LLMs, with their nuanced understanding of language, are uniquely suited to dissecting complex crash narratives, as they can interpret the contextual richness of textual data that is often missing from tabular datasets. Their advanced language comprehension enables scalable classification of extensive crash report datasets, leveraging zero-shot learning for immediate inference without the need for prior task-specific training [6,18].

One of the most significant challenges in this domain is the scarcity of labeled datasets, which hampers the development of robust supervised machine learning models. This scarcity is a key barrier, as labeled data are crucial for training models that can accurately predict and classify traffic incidents. To address this, GPT-4’s few-shot in-context learning is employed to derive new labels from hybrid multimodal traffic crash data, effectively overcoming the challenges posed by limited labeled datasets [16,19]. The newly generated labels, such as “driver at fault”, “driver actions”, and “crash factors”, are then validated by domain experts to ensure their accuracy and relevance. These labels play a critical role in providing a more detailed and nuanced understanding of traffic incidents, which is essential for identifying faults, assessing driver behavior, and understanding contributing crash factors. Such insights are vital for developing targeted road safety interventions and improving overall traffic safety [20,21,22].

Following validation, this study explores several training methodologies—zero-shot learning, few-shot learning, and fine-tuning—using GPT-3.5 and GPT-4.5 to evaluate their effectiveness in applying expert-validated labels for comprehensive crash report analysis. Zero-shot learning allows for the immediate application of labels without the need for prior training [23,24,25], while few-shot learning adjusts models using minimal examples, and fine-tuning provides more extensive training to optimize label extraction and analysis [17,19].

By integrating these LLM-based approaches, this study proposes a robust framework for enhanced traffic crash analysis through MDF, which capitalizes on the strengths of both tabular and textual data. This multimodal approach deepens the insights that can be extracted from crash reports and introduces a scalable, efficient solution for advancing traffic safety research and policy development. To the best of our knowledge, no prior work has combined narrative and tabular data in the traffic domain using advanced language models. Specifically, this paper contributes to the field through the following contributions.

We propose an innovative method that transforms tabular data into textual narratives through a serialization process and integrates them into existing textual data. This process enriches the dataset, facilitating multimodal analysis for traffic safety.
Utilizing GPT-4’s few-shot learning capabilities, we introduce new labels, such as “driver at fault”, “driver actions”, and “crash factors”, expanding the scope of analysis beyond crash severity to capture critical road safety dimensions.
We classify consolidated text into distinct labels and compare various learning strategies—zero-shot, few-shot, and fine-tuning—to highlight their impact on model optimization and performance.
We evaluate the effectiveness of a multimodal dataset versus a unimodal one, demonstrating the superiority of the multimodal approach in enhancing classification accuracy.

2. Related Work

In recent years, significant advancements in MDF have emerged, enhancing the ability to analyze complex datasets across multiple domains. In traffic crash analysis, researchers have increasingly integrated structured tabular data and unstructured textual narratives to improve predictive modeling and gain deeper insights into crash dynamics. Traditional methods have relied on statistical models and machine learning approaches, such as logistic regression, support vector machines, and gradient-boosted trees, to classify crash severity and identify contributing factors. However, the rise of LLMs and advanced multimodal fusion frameworks has expanded analytical capabilities beyond conventional approaches.

Beyond traffic safety, MDF has demonstrated transformative potential in various domains. In healthcare, integrating diverse data modalities like medical images, electronic health records, and clinical notes has shown promising results in improving diagnostic accuracy and treatment planning [26,27]. For instance, a novel approach combining structured Electronic Health Record (EHR) data with unstructured clinical text achieved an accuracy of 75.00% in predicting the top injury and 93.54% for the top three injuries [27]. In the business sector, Multimodal Data Fusion has been applied to analyze customer behavior and enhance decision making processes. A study of live streaming commerce on TikTok demonstrated the importance of integrating text, image, and audio data to assess seller trustworthiness, highlighting the crucial role of interpersonal interactions in microenterprise success [28]. Similarly, in natural disaster response, fusing satellite imagery with textual emergency reports has been instrumental in improving real-time situational awareness and disaster mitigation strategies [29]. These advancements highlight the broad applicability of multimodal learning, reinforcing its relevance in traffic safety research.

This section reviews three key areas: (1) the role of tabular data in crash analysis, (2) the integration of textual data and NLP techniques for extracting insights from crash narratives, and (3) the emergence of multimodal fusion frameworks that combine structured and unstructured data to enhance traffic safety predictions.

2.1. Tabular Data in Crash Analysis

Tabular data play a critical role in crash analysis, providing structured information about various aspects of traffic accidents. These data typically include variables like the crash location, time, weather conditions, road characteristics, vehicle types, and driver attributes, which form the foundation for many statistical and machine learning approaches [30]. Researchers have employed methods like logistic regression and ordered probit models to examine the relationship between crash characteristics and injury severity [31]. Recently, machine learning techniques, such as random forests and support vector machines, have been applied to tabular crash data to improve prediction accuracy [32].

Tabular data are also widely used for identifying crash hotspots and risk factors. Geographic Information Systems are commonly employed to create spatial visualizations and conduct spatial analyses of crash patterns, helping authorities identify high-risk locations and implement targeted safety measures [33]. Logistic regression and other ML techniques have been effectively applied to classify crashes, including identifying speeding-related incidents with notable accuracy rates [20,34,35,36]. However, empirical evaluations suggest that gradient-boosted tree models, such as XGBoost, often outperform deep learning methods in tabular data applications, highlighting the challenges in leveraging advanced ML techniques for richer interpretations [36,37,38].

Despite their structured nature, tabular data often lack the necessary context for comprehensive crash analysis, leading researchers to rely on feature engineering and external data sources for enrichment [39]. To address this limitation, researchers have begun exploring multimodal approaches that combine tabular data with other data types, such as crash narratives or images [40].

2.2. Textual Data and NLP in Crash Analysis

While tabular data provide quantitative insights, textual data, such as crash narratives, add qualitative context crucial for understanding traffic incidents. These narratives often include descriptions of driver behavior, environmental conditions, and the events leading up to a crash, providing insights that are missing from purely tabular data. For instance, Dingus et al. found that driver-related factors, such as error, impairment, fatigue, and distraction, are present in nearly 90% of crashes [41]. Adanu et al. further identified rural areas, fatigue, and risky behaviors like speeding and Driving Under the Influence (DUI) as significant influences on crash severity [42]. Textual data allow researchers to capture these nuances.

NLP has made significant strides in analyzing crash narratives. Traditional NLP techniques focused on extracting keywords and performing frequency analysis, but more advanced models, such as GPT-3.5 and GPT-4, now enable more sophisticated text classification and information extraction through few-shot and zero-shot learning [23]. These models are capable of capturing the semantic meaning behind crash narratives, providing a more accurate analysis of the text. In fact, previous studies found that driver actions, such as speeding and alcohol use, are major contributors to injury severity, further demonstrating the value of labeling such actions in crash analysis [37,43,44].

However, traditional NLP and ML methods often struggle with the contextual and semantic nuances of text data, limiting their ability to extract useful information fully. Advanced NLP models like GPT-4 aim to address this limitation by automating the process of label generation for crash factors, driver fault, and driver actions. By combining the qualitative insights from textual data with structured tabular data, researchers are able to build more comprehensive crash prediction models, providing policymakers with data-driven insights for improving road safety. Recent studies have explored the use of ensemble learning with pre-trained transformers, such as BERT and RoBERTa, to enhance crash severity classification from narrative data, showing improved performance through hard voting mechanisms [45]. However, these approaches focus solely on unstructured data and rely primarily on supervised fine-tuning, without leveraging multimodal integration or evaluating zero-shot and few-shot learning for broader traffic safety tasks.

2.3. Multimodal Data Fusion: Bridging Structured and Unstructured Data

MDF in traffic safety represents an innovative approach that integrates diverse data sources to enhance the analysis and prediction of traffic incidents. Combining tabular data, such as traffic volume and weather conditions, with crash narratives and other unstructured data forms provides a more holistic view of crash dynamics [20,46]. For example, Das et al. demonstrated how the fusion of crash narratives with tabular data uncovered specific contributing factors, such as sun glare, which were not apparent in tabular data alone [47].

Research shows that this multimodal fusion approach leads to improved traffic safety outcomes. Studies demonstrated how combining different data sources, including crash narratives, provided a more accurate estimate of road traffic injury burdens [48,49,50]. This fusion approach also enhances the accuracy of predictions related to crash severity, driver fault, and crash factors, helping transportation authorities develop more effective intervention strategies [48,49,50].

LLMs, such as GPT-3.5 and GPT-4, are particularly well-suited to this approach of MDF. These models allow for the generation of new labels, such as driver actions and crash factors, through few-shot and zero-shot learning, which significantly reduces the amount of training data required [19,23]. The fusion of tabular and textual data, combined with advanced NLP models, provides richer insights into traffic incidents, allowing for more accurate analysis and prediction of crash outcomes [51,52]. This study builds upon these advancements, introducing a novel MDF framework that leverages the latest generative models to improve crash analysis through zero-shot, few-shot, and fine-tuning learning experiments [19,23].

Moreover, combining these data sources and generating labels for driver fault, crash factors, and driver actions is crucial for crash analysis and prediction. For instance, identifying the at-fault driver is essential for understanding crash responsibility and liability, aiding in targeted interventions for high-risk drivers. A study by Chandraratna et al. developed a crash prediction model that correctly classified at-fault drivers up to 74.56%, highlighting the importance of this label [53]. Understanding crash factors is also critical, as Dingus et al. found driver-related factors (error, impairment, fatigue, and distraction) in nearly 90% of crashes. Adanu et al. identified rural areas, fatigue, and risky behaviors like speeding and DUI as significant influences on crash severity [41]. Driver actions, such as speeding and alcohol use, are significant contributors to injury severity, as noted by Ma et al. in [43]. Labeling these actions helps researchers and policymakers understand crash precursors and develop interventions [43]. These labels are essential for creating comprehensive crash prediction models, identifying high-risk drivers, and formulating data-driven safety measures.

This method combines information from various modes of transportation and data types, including tabular data, crash narratives, sensor data, and video feeds, to create a comprehensive view of traffic safety scenarios [54]. The key objective of MDF is to leverage the strengths of different data forms to improve traffic incident analysis and prevention strategies. For instance, tabular data, such as traffic volume and weather conditions, provide structured information that is crucial for statistical analysis. In contrast, crash narratives offer unstructured, detailed accounts of incidents, which can reveal underlying factors contributing to crashes [55].

However, the analysis of textual crash narratives provides detailed insights crucial for effective policymaking and interventions. Recent advancements in NLP, particularly with the emergence of GPT models like GPT-3.5 and GPT-4, have revolutionized the analysis of unstructured text. These models facilitate sophisticated text classification and information extraction through methods like zero-shot and few-shot learning, offering new avenues for analyzing crash reports [16]. Still, traditional NLP and ML methods often struggle with the complex and semantic nuances of text, a limitation LLMs aim to overcome.

Despite the strengths of analyzing tabular and textual data separately, a significant gap exists in research exploring their fusion for traffic crash analysis. This study presents a unified data fusion methodology that integrates both data types while addressing gaps in previous research. The introduction of LLMs, such as GPT-3.5 and GPT-4, into traffic safety research represents a novel contribution, enabling the extraction of nuanced information from complex narratives and potentially transforming the field with deeper insights and targeted interventions [56]. The application of LLMs in traffic safety analysis addresses the challenges of dataset size and quality, showcasing their potential to enhance road safety research beyond the capabilities of conventional ML methods.

The evolution of NLP has been significantly influenced by the advent of transformer-based LMs, such as BERT [57], GPT [58], and XLnet [59], which have set new benchmarks across various NLP tasks [14]. While these models offer unparalleled capabilities in text classification [46], question answering [60], and next-word prediction, their deployment often requires extensive computational resources and large datasets [57,61]. A common solution involves fine-tuning pre-trained models with task-specific data, enhancing their performance across different domains [62,63,64].

However, the application of these models in traffic safety analysis presents unique challenges, notably the need for efficient data labeling and adaptation to the specific context of crash narratives. Traditional manual labeling methods are increasingly impractical due to the growing volume of data. In this context, LLMs, particularly through active learning, offer a promising solution by automating or assisting in the data labeling process, significantly reducing the time and cost associated with manual annotation [65]. Moreover, the field of traffic safety research benefits immensely from the advanced text analysis capabilities of LLMs. Their ability to extract critical insights from unstructured text data enables a deeper understanding of crash circumstances, common factors in reports, and the development of targeted safety interventions. Initial studies have explored LLMs for semantic parsing of queries over tabular datasets and entity-matching tasks, demonstrating their potential in enriching tabular data with unstructured text [66,67,68].

To bridge the gap between traditional fine-tuning methods and the specific needs of crash analysis, this research introduces a novel MDF framework that leverages zero-shot, few-shot, and fine-tuning learning experiments with LLMs. The approach aims to overcome the limitations associated with handling tabular and textual data in traffic safety research. Zero-shot learning enables the models to classify and analyze data without prior exposure to specific task examples, offering a flexible solution for rapidly understanding new crash types. On the other hand, few-shot learning allows for the adaptation of models to crash analysis tasks with minimal data, enhancing the precision of insights derived from crash narratives. This integrated framework not only streamlines the model training and application process but also opens new avenues for leveraging LLMs in the nuanced field of traffic safety analysis. By employing zero-shot and few-shot learning, our research aims to address the challenges of dataset size and quality, showcasing the versatility of LLMs in generating actionable insights for traffic safety interventions.

To further clarify the contributions of this study, Table 1 presents a comparative analysis of selected prior work in traffic crash analysis. The table highlights differences in data types used, modeling approaches, learning paradigms, key tasks, and the relative advantages of the proposed MDF framework. This comparison emphasizes the novelty of integrating Large Language Models (LLMs) with multimodal data for automated label generation and traffic crash classification.

3. Methodology

This study develops and applies an MDF framework to traffic crash analysis, integrating both structured tabular crash data and unstructured textual narratives. Below is a detailed breakdown of each phase in the methodology.

3.1. Dataset

The crash dataset used in this study was obtained from the Missouri State Highway Patrol and contains data on 6400 crashes that occurred between 2019 and 2020. The dataset is notably imbalanced, with 476 fatal crashes and 5924 non-fatal crashes, reflecting a significant disparity in the target variable. The dataset includes 58 variables, which capture various aspects of each crash, such as unique identifiers, year, date, and time of the crash. Additionally, the dataset contains critical features related to the crash itself, including environmental conditions, road characteristics, and vehicle specifics.

To address the imbalance in the target variable (crash severity), we categorized crashes into fatal and non-fatal groups. To ensure that the model does not overfit to the majority class (non-fatal crashes) and that it generalizes well, we applied downsampling techniques [72]. Specifically, we included all 476 instances from the minority class (fatal crashes) and randomly selected 476 instances from the majority class (non-fatal crashes), resulting in a balanced dataset of 952 samples. This approach was essential for mitigating class skew and improving model performance [73,74].

The balanced dataset aligns well with the study’s use of zero-shot and few-shot learning paradigms, which are particularly effective in situations with limited labeled data. These methods leverage the extensive knowledge embedded in pre-trained models, such as GPT variants, and can adapt this knowledge to new tasks with minimal additional data. As a result, the need for large datasets is diminished, making the balanced dataset suitable for fine-tuning GPT models [75]. This ensures that the dataset, despite its smaller size, remains effective for the application of advanced learning techniques. The class distribution of the balanced dataset is presented in Table 2.

3.2. Multimodal Data Fusion (MDF) Framework

This study develops and applies an MDF framework to enhance traffic crash analysis by integrating structured tabular crash data with unstructured textual crash narratives, as illustrated in Figure 1. The MDF framework addresses the limitations of traditional analysis methods that handle tabular and textual data separately, creating a more holistic approach that unifies both data types, enabling deeper insights.

The framework includes several key stages: preprocessing tabular data, converting structured data into a textual format, and combining it with unstructured narratives. GPT-4 was employed specifically for label generation, producing important labels, such as driver fault, driver actions, and crash factors. Following label generation, domain experts verified the output to ensure accuracy and relevance, reducing the risk of errors or information loss during the process.

For predictive modeling and information extraction, we established GPT-2 fine-tuned as the baseline model and compared its performance against more advanced techniques, including fine-tuning GPT-2, as well as zero-shot and few-shot learning with GPT-3.5 and GPT-4.5 models. These approaches were applied to predict labels and extract relevant crash-related information, significantly improving the analysis of crash data. Integrating these advanced machine learning models with the MDF framework allowed for generating accurate insights while leveraging both textual and tabular data sources.

Validation of the model’s predictions was crucial to ensure both contextual and factual accuracy. The performance of the models was evaluated using standard classification metrics, including the F1-Score, Precision, Recall, and Jaccard Index. Algorithm 1 encapsulates the algorithmic steps of the MDF framework, which are versatile enough to be applied beyond traffic crash analysis to various multimodal data applications.

Algorithm 1. MDF pseudocode.

Input:

-: Tubular Data: $T = {t_{i} | i \in [1, N]},$ where $t_{i}$ represents the i-th tabular data entry
-: Textual Narratives: $N = {n_{i} | i \in [1, N]}$ , where $n_{i}$ corresponds to the narrative of the i-th crash
-: Pre-trained GPT Models: $G = {G P T - 2, G P T - 3.5, G P T - 4.5}$
-: Learning Approaches: $L = {Z S, F S, F T}$

Output:

-: Analysis Results: $A = {a_{i} | i \in [1, N]}$ , where $a_{i}$ includes enhanced analysis for the i-th crash

Procedure:

1.

Cleanse Tabular Data:

T' = p r e p r o c e s s (T)

2.

Serialize Data:

C = s e r i a l i z e (T', N)

, where

C = \{c_{i}| c_{i} = t_{i}^{'} + n_{i}, i \in [1, N]}

3.

Label Generation (GPT-4, FS):

L_{C} = \{l_{c}| l_{c} = G P T - 4_{F S} (c_{i}), \forall c_{i} \in C}

4.

Validate Labels:

L_{V} = v a l i d a t e (L_{C})

5.

Data Splitting:

(C_{t r a i n}, C_{t e s t}) = s p l i t (C, r a t i o)

6.

Initialize Results:

R = []

7.

For each approach

λ \in L

:

a.: Model Conditioning: $M_{λ} = c o n d i t i o n (G, λ, C_{t r a i n})$
b.: Apply Model: $R_{λ} = \{r_{λ i}| r_{λ i} = M_{λ} (c_{i}), \forall c_{i} \in C_{t e s t}}$
c.: Store Results: $R . a p p e n d (R_{λ})$

8.

Evaluate:

E = {e v a l u a t e (R_{λ}, m e t r i c s) | λ \in L}

9.

Aggregate Insights:

I n s i g h t s = a n a l y z e (E)

End

3.3. Data Preparation and Serialization (Tabular-to-Test Conversion)

Serialization is a crucial step in integrating structured tabular data with unstructured crash narratives. This involves transforming the structured data into textual format, making them compatible for integration with the narrative descriptions of each crash.

Before applying the MDF framework, the data undergo several preprocessing steps. This includes cleaning the tabular data and removing non-essential columns (e.g., complaint_id, two_vehicles, date) due to their limited contribution to the analysis or redundancy [76]. The remaining structured data, which included critical features, such as crash severity, road conditions, and vehicle specifics, were serialized into natural language text using NLP techniques.

This serialization process involved tokenizing both numerical and categorical data. For instance, numerical data like “speed = 50 mph” were converted into natural language phrases, such as “the vehicle was traveling at 50 miles per hour”. Similarly, categorical data like “road condition = wet” were transformed into “the road was wet at the time of the crash”. The serialized text was then combined with the original crash narrative, creating a consolidated and enriched dataset that integrates both types of information. Figure 2 demonstrates an example of serialized tabular data in textual form. Algorithm 2 provides additional details on the step-by-step process of converting tabular data into text.

The serialized data are combined with unstructured textual crash narratives to create a unified dataset. This dataset contains enriched descriptions of the crash events, offering a more holistic view by merging structured crash characteristics with the free-text narratives. The combined dataset was then used in subsequent steps for further analysis.

Algorithm 2. Data serialization process.

Input: A dataset with rows and columns (col_name, col_desc)
Output: A list of text descriptions generated from the tabular data
Begin
Initialize text_list as an empty list
For each record in the dataset do
Initialize textDescription to an empty string
For each (col_name, col_desc) pair in the record do
Switch (col_name)
Case patterns:
Apply conversion logic based on col_name and col_desc
Append the resulting string to textDescription
// Cases cover different column names with specific logic for each
// Example cases include “city”, “left_scene”, “acc_time”, etc.
Default:
Optionally handle unexpected or generic cases
End Switch
End For
Append textDescription to text_list
End For
Return text_list
End

3.4. Label Generation via GPT Models

Utilizing OpenAI’s GPT-4 application programming interface (API), we implemented few-shot learning techniques to automatically generate new labels for the dataset. This process was guided by specifically designed prompts to generate labels, such as driver fault (binary: yes/no), driver actions (e.g., speeding, aggressive driving), and crash factors (e.g., collision with an object, environmental conditions). These new labels enhance the depth of the analysis, progressing beyond the binary classification of crash severity (fatal/non-fatal) to capture more nuanced insights into traffic incidents. Figure 3 illustrates the flow of the label generation process, and Table 3 presents the structure of the few-shot learning prompt template used to create these labels. For instance, prompts were structured to ensure that the system accurately categorized whether the driver was at fault and to identify the contributing actions and external crash factors.

Manual review by domain experts is a critical step in validating the accuracy and reliability of labels generated through few-shot learning approaches using advanced language models like GPT-4. This review process involves experts meticulously examining the generated labels to ensure their correctness, relevance, and applicability to the task [77]. The focus is on assessing both the completeness and correctness of the labels while identifying potential errors or omissions that could lead to inaccuracies in real-world applications [78]. In this study, two domain experts conducted a thorough review of the labels produced using GPT-4.5, comparing the serialized textual data with the original tabular data and performing consistency checks. This expert review helps mitigate the risk of information loss, misclassification, or hallucination during the serialization process, ensuring that the final dataset maintains its integrity and suitability for further analysis.

These labels, such as driver_fault, were essential for understanding traffic incidents at a granular level, enabling detailed causal analysis to support road safety interventions [42,79,80]. Validation of the labels was performed through cross-checks with domain experts to ensure accuracy. Table 4 provides the distribution of the driver_fault label.

Due to limitations imposed by the GPT-4 API, such as the length of input data that can be processed at once, the dataset was split into smaller segments. Each segment was processed to generate labels within the API’s constraints, and the generated labels were subsequently validated by a domain expert, ensuring the model’s output aligned with expert expectations. These labeled datasets were then prepared for the next stage of analysis and fine-tuning.

3.5. Dataset Creation

Following the generation of labels, the dataset was prepared in a JSON format specifically designed for efficient model training. This format was chosen because it allows the enriched narratives and their corresponding labels to be stored in a structured and accessible way, ideal for integration into machine learning workflows. JSON files offer flexibility and scalability, making them particularly suitable for complex data types like those used in our study.

Each narrative, along with the generated labels (severity, driver fault, driver actions, crash factors), was carefully organized to ensure clarity and consistency, enabling seamless use for fine-tuning the GPT-2 and GPT-3.5 models. An example of how the dataset was structured is presented in Table 5.

3.6. Modeling

This study employed several machine learning models to analyze traffic crash data, with GPT-2 fine-tuned serving as the baseline model for benchmarking against GPT-3.5 and GPT-4.5, which leveraged zero-shot and few-shot learning approaches. The goal was to evaluate the effectiveness of these methodologies within the traffic safety domain, particularly in analyzing crash narratives with minimal reliance on large labeled datasets.

3.6.1. Fine-Tuning (FT)

GPT models, particularly GPT-2, have shown remarkable capabilities for NLP tasks. Fine-tuning these models involves adapting pre-trained models to specific domains or tasks, which can significantly enhance their performance in targeted applications. Key techniques for fine-tuning GPT-2 models include transfer learning, where the pre-trained model is further trained on domain-specific data, and prompt engineering, which involves designing effective input prompts to guide the model’s output. For instance, in the medical domain, a GPT-2 model was fine-tuned on over 374,000 dental clinical notes, achieving a 76% accuracy in next-word prediction [81]. This demonstrates the model’s ability to adapt to specialized vocabularies and contexts. However, fine-tuning GPT models faces challenges. One significant issue is the creation of proper training data, which can be resource intensive and time consuming [82]. Another challenge is maintaining context awareness, as pre-trained GPT models may lack specific contextual understanding, leading to awkward dialogue in certain scenarios [83]. Choosing GPT-2 for fine-tuning is justified by several factors. It offers a good balance between model size and performance, making it more accessible for fine-tuning compared to larger models. GPT-2 has demonstrated impressive results in various applications, from medical text prediction [81] to document analysis [83]. Additionally, its architecture allows for effective adaptation to specific domains, as seen in the power energy sector with the PowerPulse model, for example [84].

3.6.2. Prompt Engineering for Few-Shot and Zero-Shot Learning

Prompt engineering is a critical technique in the field of NLP and artificial intelligence involving the design and optimization of prompts to enhance the performance of LLMs in specific tasks. It plays a pivotal role in advancing NLP technologies and AI applications, making them more efficient, accessible, and personalized [85]. The process of prompt engineering encompasses various methods, including zero-shot, few-shot prompting, and more advanced techniques, like chain-of-thought and tree-of-thoughts prompting [86]. These methods aim to guide LLMs to follow established human logical thinking, significantly improving their performance.

Few-shot and zero-shot learning techniques have gained significant attention in the field of machine learning, particularly with the advent of LLMs. These approaches aim to enable models to perform tasks with minimal or no task-specific training data, leveraging their pre-existing knowledge and generalization capabilities [86]. GPT-3.5 and GPT-4 have demonstrated remarkable performance in few-shot and zero-shot learning scenarios across various domains. In cross-lingual summarization, these models have shown significant improvements in performance with few-shot learning, particularly in low-resource settings [87]. Similarly, in medical image interpretation, GPT-4V’s performance improved from 40% accuracy in zero-shot learning to 90% with twenty-shot learning, matching the performance of specialized Convolutional Neural Network models [69]. However, challenges remain in applying these techniques effectively. For instance, GPT-3.5 struggles with certain types of grammatical errors in zero-shot settings, particularly for languages other than English [88]. Additionally, the performance of these models can be highly dependent on the quality and representativeness of the few-shot examples provided [89]. However, GPT-3.5 and GPT-4 represent powerful tools for few-shot and zero-shot learning across various tasks, from text classification to code review automation [90]. Their ability to quickly adapt to new tasks with minimal examples makes them particularly valuable in domains where large labeled datasets are scarce or expensive to obtain. However, careful consideration must be given to prompt engineering and example selection to maximize their performance [91].

3.7. Evaluation Metrics

Performance evaluation metrics are essential for measuring the effectiveness of machine learning models. Commonly used metrics include Accuracy, Precision, Recall, and the F1-Score. Accuracy reflects the overall correctness of predictions, while Precision focuses on the proportion of true positive predictions among all positive predictions. Recall indicates the model’s ability to identify all relevant positive instances, and the F1-Score provides a balanced measure by combining both Precision and Recall. In cases of imbalanced datasets, Accuracy can be misleading, so additional metrics are required. The Jaccard score measures the similarity between predicted and actual sets, making it especially useful for multi-label tasks, such as driver actions or crash factors, where multiple contributing factors are common.

Our dataset was split into 60% for training (571 crashes) and 40% for testing (381 crashes). For binary classification tasks like driver fault or crash fatality, we used Accuracy, Precision, Recall, and the F1-Score. For multi-label classification tasks, the Jaccard score was applied to measure intersection-over-union between predicted and actual labels, offering a more nuanced evaluation.

3.8. Experiment Setup

The primary objective of the experimental investigation was threefold:

To evaluate whether integrating crash narrative information with tabular data enhances the accuracy of traffic crash classification models.
To develop a novel framework for MDF using state-of-the-art LLMs.
To test these methodologies on real-world traffic crash datasets.

This study introduces an innovative approach to the traffic domain by combining narrative and tabular data using advanced language models. No prior work has integrated these data types for traffic crash analysis, making this study a pioneering effort. Consequently, there are no established baseline models for this specific multimodal approach. The experiments conducted are divided into two main parts: Experiment 1, focusing on the comparative performance of models using tabular, narrative, and fused data, and Experiment 2, focusing on the application of advanced model learning techniques.

3.9. Experiment 1: Comparative Analysis of Narrative, Tabular, and Fused Data Performance

3.9.1. Experimental Setup

The primary objective of Experiment 1 was to evaluate whether the integration of crash narrative data with tabular data enhances the predictive accuracy of crash classification tasks, specifically for crash severity and driver fault. This experiment serves as a foundational validation of the hypothesis that MDF outperforms individual data sources.

Three distinct models were tested:

Tabular-only model: Relied solely on quantitative tabular data, such as environmental conditions, traffic statistics, and time of incidents, to evaluate the effectiveness of structured data in isolation.
Narrative-only model: Focused exclusively on the unstructured crash narrative to assess the standalone value of unstructured data in the incident analysis.
Fused (tabular + narrative) model: Combined both tabular and narrative data into a cohesive format to evaluate if this integrated approach could surpass the informational value of either data type alone, providing a richer, more nuanced context for analysis.

A few-shot learning approach was employed using the GPT-4.5 model. The model was primed with five examples (k = 5) and tested on 100 samples from the dataset, which contained both tabular and narrative data for each crash incident. The tabular data included fields like weather conditions, vehicle type, and crash location, while the narrative data described the sequence of events leading to the crash.

3.9.2. Model Evaluation and Results

The performance of the three models (tabular-only, narrative-only, and fused) for driver fault prediction is compared based on key metrics: Accuracy, Precision, Recall, and F1-score. These metrics are critical for evaluating how well the models perform in classifying whether the driver is at fault or not.

Figure 4 presents the confusion matrix for the fused model, which significantly improves classification accuracy compared to the GPT-2 fine-tuned baseline model. The model correctly classified 78 cases where the driver was at fault and 9 cases where the driver was not at fault, with only 1 false positive and 12 false negatives. This performance reflects the power of integrating structured (tabular) data and unstructured (narrative) data to improve classification accuracy.

A bar chart comparing the performance metrics—Accuracy, Precision, Recall, and F1-score—for driver fault prediction across the three models is presented in Figure 5. The fused model outperformed both the narrative-only and tabular-only models across all metrics. The key findings are as follows:

Fused model: Achieved 90% accuracy and an F1-score of 94% for identifying drivers at fault. The model also demonstrated strong precision (100%) for detecting drivers at fault and recall (63%) for identifying drivers not at fault.
Narrative-only model: Showed slightly lower performance with 88% accuracy and an F1-score of 93%. Precision for detecting drivers at fault remained at 100%, but Recall for not-at-fault drivers dropped to 62%.
Tabular-only model: Performed the weakest, particularly in identifying drivers not at fault, with an accuracy of 78% and a precision of only 32%. This model had difficulty capturing contextual information, resulting in lower Recall and F1-scores.

Figure 5. Comparative performance metrics for driver fault prediction across fused, narrative, and tabular models.

These results illustrate that the fused model not only outperformed the other models in terms of overall accuracy but also provided a more balanced performance across both guilty and non-guilty predictions. The narrative-only model performed slightly worse due to its inability to leverage structured data, while the tabular-only model exhibited poor performance, particularly in cases involving complex behavioral factors that could only be captured by narrative descriptions.

To further investigate the qualitative insights provided by each data source, word cloud analyses were conducted to visually demonstrate the focus of each model. The following insights were gained from the word clouds.

The tabular text word cloud in Figure 6 reflects the structured nature of the data collected in official accident reports, focusing on key factual elements. Prominent terms like “accident”, “driver”, and “vehicle” highlight the central aspects of the incident—who was involved and what happened. Terms like “highway”, “intersection”, and “street” demonstrate the importance of spatial information in understanding where the incident occurred, helping identify accident-prone locations. Meanwhile, words like “conditions” and “light” point to the environmental factors that may have contributed to the crash, providing a static view of external influences like weather or time of day. The frequent appearance of administrative terms like “Troop”, “Patrol”, and “County” further emphasizes the structured, location-specific reporting common in tabular data. While this type of information is essential for understanding the context of the crash, it often lacks the depth needed to explore dynamic aspects of the event, such as how the accident unfolded over time or the actions of the driver beyond simple descriptors like “driving aggressively”. In isolation, tabular data offer a factual overview but may miss the causal relationships and behavioral details that a narrative approach can provide.

The narrative text word cloud in Figure 7 highlights the dynamic and descriptive nature of the data captured in accident reports. Unlike the more structured and factual tabular data, the narrative data focus on how the crash unfolded, as evidenced by terms like “CRASH”, “ROADWAY”, “LOST”, “CONTROL”, and “STRUCK”. These words reflect the sequence of events and provide crucial details about the behavior of drivers and the environment during the accident. For example, terms like “overturned”, “traveled”, and “right” point to specific movements and outcomes of the crash, giving a more in-depth understanding of what occurred in real time. The frequent appearance of words like “VEHICLE” and “DRIVER” suggests that the narrative text emphasizes actions related to the individuals involved in the crash. Additionally, terms like “PRONOUNCE” and “FATALITY” offer critical details about the severity of the crash, which may not be explicitly captured in the structured tabular data. Overall, narrative data add a rich layer of qualitative insights, filling in gaps that tabular data might miss, such as driver behavior and crash dynamics.

The fused text word cloud in Figure 8 integrates insights from both the structured tabular data and the unstructured narrative data, creating a more comprehensive picture of the crash events. Key terms like “accident”, “vehicle”, “driver”, and “scene” are prominent, providing a foundational understanding of the involved entities and locations. Additionally, the inclusion of terms like “conditions”, “driving aggressively”, and “overturned” highlights critical factors contributing to the crash, combining both environmental and behavioral elements.

The fused data also capture spatial and directional information, as seen from terms like “intersection”, “direction”, “street”, and “MILE”. This contextual detail offers insights into where accidents occurred and the conditions of those locations, complementing the specific narrative descriptions of how the events unfolded (e.g., “STRUCK”, “CRASH”, “LOST”). Furthermore, terms like “Patrol”, “Missouri State”, and “reported” connect the incident to official records, emphasizing the role of law enforcement and formal reporting.

The results of Experiment 1 demonstrate that integrating narrative and tabular data through MDF not only improves predictive accuracy but also enables a more nuanced understanding of traffic crash events. This comparison highlights the value of leveraging both structured and unstructured data for traffic crash analysis and supports the broader application of MDF in similar research domains.

3.10. Experiment 2: Multimodal Data Framework (MDF) Modeling and Evaluation

Building upon the findings from Experiment 1, Experiment 2 aimed to explore the efficacy of various learning approaches in multimodal data text classification. We implemented a range of techniques, including zero-shot learning, few-shot learning, and fine-tuning methods using GPT-2, GPT-3.5, and GPT-4.5 models.

The models were chosen for their state-of-the-art performance in natural language understanding and generation. The training and evaluation of these models were conducted using GPT-2 and the OpenAI GPT series via API. This approach enabled us to apply the models directly to the dataset labeled for severity, driver fault, driver actions, and crash factors. The labels were generated through textual data, covering a broad spectrum of scenarios relevant to automated event detection and analysis.

3.10.1. Fine-Tuning with GPT-2

We fine-tuned GPT-2 on a crash dataset to adjust its weights for the specific task. Fine-tuning offers robust performance but requires significant labeled data for training. Fine-tuning enables the model to adapt to the crash narrative data, improving its ability to generate accurate textual outputs and predict relevant labels. We optimized several hyperparameters to ensure efficient training and text generation. Table 6 outlines the hyperparameters used for the GPT-2 model. Key parameters, such as batch_size, num_train_epochs, and block_size, were tuned to optimize memory management and ensure efficient model training. For text generation, parameters like do_sample, max_length, top_k, and top_p were adjusted to control the diversity and length of generated outputs.

3.10.2. Prompt Engineering Using Few-Shot and Zero-Shot Learning (GPT-3.5 and GPT-4.5)

Task-specific prompts were carefully crafted for both few-shot and zero-shot learning to guide the models in generating accurate outputs. In few-shot learning, the model is provided with a few task examples to condition its response, as demonstrated in Table 7. On the other hand, zero-shot learning relies solely on the model’s general knowledge and understanding without any task-specific examples, as shown in Table 8.

These task-specific prompts were designed to help the models classify crash data effectively and predict labels like driver fault, driver actions, and crash factors.

For GPT-3.5 and GPT-4.5, both few-shot and zero-shot learning paradigms were tested. Hyperparameters for these models are listed in Table 9. These approaches required no task-specific fine-tuning, making them more flexible for tasks requiring minimal labeled data.

4. Analysis and Results

In this section, we evaluate the MDF methodology applied to traffic crash analysis. Our focus is on predicting crash severity, driver fault, driver actions, and crash factors using a fused dataset that integrates both tabular and narrative data.

The section begins with model evaluation and baseline comparison, where we assess the performance of the baseline GPT-2 fine-tuned model against advanced models like GPT-3.5 and GPT-4.5 utilizing few-shot and zero-shot learning. This comparison demonstrates the advantages of advanced models for handling multimodal datasets.

We then proceed to the analysis of fused data, applying these models to real-world traffic data, showcasing the extracted labels and insights into the factors contributing to traffic incidents. This analysis highlights the practical impact of our fusion approach on improving traffic safety predictions.

4.1. Model Performance Evaluation: Baseline vs. Advanced Models

In this section, we evaluate and compare the performance of baseline and advanced models in predicting crash severity, driver fault, driver actions, and crash factors. GPT-2 fine-tuned (FT) is used as the baseline model, while GPT-3.5 and GPT-4.5, utilizing both few-shot and zero-shot learning approaches, represent the advanced models.

4.1.1. Baseline Model: GPT-2 Fine-Tuned (FT)

We selected GPT-2 Fine-Tuned (FT) as the baseline model because it represents a conventional fine-tuning approach for crash data classification. Comparing its performance with GPT-3.5 and GPT-4.5 helps quantify the gains achieved through few-shot and zero-shot learning. While fine-tuned models like GPT-2 FT may underperform with sparse data, they provide a suitable reference for comparison with newer, more flexible models. Table 10 shows model performance for the classes’ severity, driver fault, and driver actions and crash factors.

4.1.2. Advanced Models Through Prompt Engineering: GPT-3.5 and GPT-4.5 (Few-Shot and Zero-Shot)

For advanced comparisons, we utilized GPT-3.5 and GPT-4.5 in both few-shot and zero-shot configurations. These models were chosen for their state-of-the-art capabilities in handling multimodal data with minimal labeled examples. Few-shot and zero-shot learning techniques are well-suited to scenarios with sparse datasets, enabling the models to generalize well without extensive fine-tuning. Table 11 shows model performance for the classes’ severity, driver fault, and driver actions and crash factors.

4.1.3. Summary of Baseline vs. Advanced Models

In this section, we compare the performance of the baseline model, GPT-2 fine-tuned (FT), against advanced models, GPT-3.5 few-shot and GPT-4.5 few-shot, for predicting crash severity, driver fault, driver actions, and crash factors. GPT-2 fine-tuned (FT) is established as the baseline model, providing a reference for performance improvements achieved by more advanced models. Our comparison focuses on the F1-score for binary classification and the Jaccard score for multi-class prediction.

To visually compare model performance, Figure 9 presents a bar chart of F1-scores for severity (fatal and non-fatal) and driver fault (at fault and not at fault) classification, comparing the baseline model (GPT-2 FT) with advanced models. The results show that GPT-2 fine-tuned (FT) achieves an F1-score of 99% for both fatal and non-fatal predictions, which is comparable to the advanced models. GPT-4.5 few-shot and GPT-3.5 few-shot also maintain high F1-scores above 94% for these categories, demonstrating that fine-tuning alone can yield strong performance in severity classification.

However, GPT-2 fine-tuned (FT) struggles significantly in driver fault classification, particularly in the “not at fault” category, where it achieves an F1-score of 0%. In contrast, GPT-4.5 few-shot outperforms all models with an F1-score of 92% for “Not at Fault” and 99% for “At Fault” classifications, highlighting its superior ability to differentiate between driver responsibility levels. GPT-3.5 few-shot also improves over GPT-2 FT, achieving 77% for “Not at Fault” and 98% for “At Fault” predictions.

For multi-class predictions like driver actions and crash factors, Figure 10 presents a bar chart comparing the Jaccard scores across models for multi-label classification tasks. Again, GPT-4.5 few-shot leads with a score of 73.1 for driver actions and 82.9 for crash factors, followed closely by GPT-3.5 few-shot, which scores 74.7 for crash factors and 54.1 for driver actions. The GPT-2 FT baseline, while performing adequately, shows lower Jaccard scores of 72.2 and 79.7, respectively, for driver actions and crash factors, underlining its limitation in handling more complex multi-label tasks.

From these results, it is evident that GPT-4.5 few-shot is the most effective model, consistently outperforming both the baseline (GPT-2 FT) and GPT-3.5 models across all evaluation metrics. Few-shot learning proves particularly advantageous for handling complex tasks, such as multi-label classification and nuanced driver fault determination. While GPT-2 fine-tuned (FT) demonstrates strong performance in severity classification, it struggles with identifying “Not at Fault” cases and has lower effectiveness in multi-label tasks.

These findings underscore the importance of leveraging few-shot learning with advanced models to enhance the robustness of crash data analysis, particularly in scenarios with limited labeled data or high classification complexity.

4.2. Enhancing Traffic Crash Prediction and Insights via MDF

Insights from Tabular and Narrative Data Fusion

We evaluate our methodology using the Missouri state dataset by generating new labels through the fusion of tabular and narrative data. The model predicts crash severity, driver fault, driver actions, and crash factors. The results for the first five rows of the dataset are presented, with the model predicting labels for each category. Table 12 summarizes the consolidated data (tabular + narrative) along with the corresponding predicted labels.

When driver fault is indicated as “yes”, the scenarios often involve aggressive driving behaviors, such as a driver losing control of the vehicle and subsequently overturning or going off-road. This can be seen in the cases reported by the Missouri State Highway Patrol, where daylight conditions did not prevent aggressive driving leading to collisions with fixed objects, other vehicles, or going off-road due to loss of vehicle control. Notably, none of these incidents with driver fault indicated as “yes” mention adverse weather conditions, suggesting that the driver’s actions were the primary factor in these crashes. On the other hand, scenarios where driver fault is marked as “no” include crashes where external factors play a significant role. For instance, one non-fatal crash occurred when a driver lost control due to a tire blowout. This incident was not attributed to the driver’s behavior but rather to a vehicle defect. In a fatal incident under dark, unlit conditions, the pedestrian’s impairment due to alcohol and drugs was a crucial factor, which overshadowed any potential driver fault. Figure 11 shows extracted factors when the driver fault is marked as yes and as no.

The word clouds in Figure 12 offer a distinct comparison between driver actions in traffic crashes based on fault attribution. In the “at-fault” scenario, terms like “aggressive”, “crossed”, “drinking”, and “centerline” dominate, highlighting aggressive or impaired driving as principal factors leading to crashes. This suggests a significant correlation between at-fault crashes and behaviors like speeding, driving under the influence, and disregarding road rules. Conversely, the “not-at-fault” cloud emphasizes words like “animal”, “slowed”, “avoid”, and “swerve”. This indicates that drivers not at fault are typically involved in situations requiring sudden, defensive reactions, often to unexpected external factors like animals on the road or erratic actions of other drivers. The prominence of these terms suggests that these incidents are largely out of the driver’s control, contrasting sharply with the self-inflicted nature of at-fault crashes. This comparison sheds light on the contrasting dynamics of traffic incidents, underlining the aggressive or negligent actions leading to at-fault crashes versus the reactionary measures in not-at-fault situations. Understanding these distinctions is vital for developing focused road safety campaigns and driver education programs. By targeting the specific behaviors associated with at-fault crashes, such as aggression or impairment, while also educating on defensive driving techniques, traffic safety initiatives can be more effectively tailored to address the diverse scenarios leading to road crashes.

Figure 13 shows distinct differences between how fatal and non-fatal crash factors emerge. Fatal crashes are characterized by aggressive driving behaviors, such as “speeding”, “drinking”, and “crossed centerline”, indicating high-risk actions that lead to severe outcomes. Key terms like “lost control” and “drugs” further highlight the role of impaired driving and loss of vehicle control in contributing to fatal incidents. Conversely, non-fatal crashes are marked by terms like “distracted”, “congested”, and “stop sign”, suggesting that these incidents often occur in busy, slower-moving environments where driver distractions are prevalent. Keywords, such as “avoid”, “swerve”, and “braked”, indicate attempts to prevent collisions, reflecting less aggressive, more reactive circumstances that, while risky, result in less severe consequences. Overall, fatal crashes are predominantly linked to high-speed, impaired decisions, while non-fatal crashes involve more urban, congested settings with distracted driving and evasive maneuvers, leading to lower-impact outcomes.

The word clouds in Figure 14 representing driver actions in fatal versus non-fatal crash scenarios offer insightful distinctions. In the fatal crashes cloud, predominant keywords include “aggressive”, “speeding”, “drinking”, and “drugs”, suggesting a high incidence of reckless behavior leading to severe outcomes. Terms like “crossed centerline” and “lost control” indicate dangerous maneuvers and loss of vehicle command, often resulting in fatal consequences. The prominence of “overcorrected” and “curve” implies that misjudgments, especially at high speeds or in challenging road conditions, significantly contribute to the lethality of crashes. In contrast, the non-fatal crashes cloud highlights “distracted”, “stop signed”, and “congested”, pointing towards scenarios commonly occurring in busy, urban settings where lower speeds might help reduce the severity of crashes. Words like “swerved”, “avoid”, and “braked” reflect reactive actions, indicating attempts to avert crashes or lessen their impact, typically leading to non-lethal outcomes. “Improper lane” and “another vehicle” suggest interaction with other road users, where miscommunications or errors occur but do not always result in fatal incidents. Overall, crash factors and driver actions vary significantly based on fault and severity. At-fault incidents often highlight aggressive behaviors and negligence, while no-fault crashes show reactive, defensive driving. Fatal crashes are marked by reckless actions like speeding and impaired driving, whereas non-fatal crashes involve more distracted and congested environments, showcasing the diverse dynamics influencing road safety outcomes.

4.3. Case Studies: Enhancing Crash Prediction and Factor Extraction with Multimodal Data

This section presents case studies demonstrating how integrating tabular and narrative data enhances driver fault prediction and crash factor extraction. While tabular data capture structured details, such as crash type, vehicle involvement, and environmental conditions, narrative descriptions provide additional context that can refine model predictions. The consolidated crash data in these cases represent a fusion of tabular information and textual crash narrative, offering a more comprehensive view.

We analyze three cases where predictions differ when using tabular-only, narrative-only, and combined data sources. In each case, the inclusion of narrative information leads to improved accuracy by providing contextual details that tabular data alone may miss. The findings highlight the importance of MDF for better crash analysis. Table 13 summarizes the outcomes of these cases, showing how integrating structured and unstructured data leads to more accurate and reliable driver fault classification.

5. Discussion and Conclusions

This study explored the effectiveness of advanced generative models for fusing tabular and textual data to classify and extract knowledge in a human-like language for traffic crash analysis. Unlike prior research focused on extracting word frequencies or keywords, which often resulted in superficial explanations, our approach utilized generative models like GPT to capture accurate semantic content without full-scale training. The introduction of the MDF framework represents a significant advancement in overcoming traditional limitations in integrating structured and unstructured data.

Our results demonstrated that advanced models, particularly GPT-4.5 few-shot learning, achieved exceptional performance in classifying crash severity and determining driver fault, with accuracy reaching 98.9% and precision as high as 100%. GPT-4.5 few-shot consistently outperformed other models, including GPT-2 fine-tuned, which served as a baseline. The baseline GPT-2 struggled particularly with the “not-at-fault” class in driver fault prediction. Meanwhile, GPT-3.5 few-shot learning showed significant improvement in identifying driver actions, with a Jaccard score of 80.2%, emphasizing its ability to handle nuanced textual data more effectively.

These results highlight the capability of few-shot, in-context learning to achieve high accuracy in extracting labels, such as crash severity, driver fault, driver actions, and crash factors, significantly outperforming the baseline GPT-2 fine-tuned model. The comparative analysis of baseline models (GPT-2) and advanced models (GPT-3.5 and GPT-4.5) illustrates the superiority of the latter in both binary classification and multi-label classification tasks, offering valuable insights for their application in automated text analysis. The integration of this AI-driven methodology within smart city infrastructures can significantly enhance the efficiency of transportation management, facilitating data-driven urban safety interventions.

Despite the promising findings of the MDF framework, several limitations must be considered. The framework was evaluated on Missouri crash data, and its generalizability to other regions remains uncertain. Further validation across diverse datasets, including international sources and different traffic environments (urban vs. rural), is necessary to assess its robustness. Additionally, the computational cost of large-scale LLMs, such as GPT-4.5, poses challenges for real-time deployment. Future research should explore efficient fine-tuning strategies and open-source alternatives to enhance accessibility and scalability.

Bias in AI-generated outputs is another concern, as models may inherit biases from training data, including discrepancies in crash reporting or enforcement practices. Implementing fairness-aware AI techniques and privacy-preserving methods, such as federated learning, can mitigate these risks.

Future research should systematically compare traditional machine learning models (e.g., logistic regression, random forests, and XGBoost) with LLM-based approaches in multimodal settings. While traditional ML models are effective for structured data, they require additional NLP pipelines for textual processing, increasing complexity. A comparative study between these methods will provide valuable insights into their strengths and trade-offs, particularly in terms of accuracy, computational efficiency, and interpretability. Additionally, hybrid approaches—where ML models handle structured data while LLMs process textual crash narratives—could offer a balanced approach to optimizing computational cost and model performance.

To enhance real-world applicability, the MDF framework should integrate real-time data sources, such as live traffic sensors, surveillance footage, and weather conditions, to improve predictive accuracy. Additionally, expanding its application to developing regions with limited labeled crash data could be supported through semi-supervised learning and synthetic data generation. Given the importance of broader validation, future research should explore the adaptability of the MDF framework to datasets from different states and international sources. Furthermore, to assess its robustness across diverse traffic conditions, future studies will analyze model performance in distinct traffic environments (e.g., urban vs. rural) to understand variations in crash patterns, contributing factors, and predictive accuracy. Additionally, ensuring applicability across different linguistic contexts, future research will investigate multilingual LLMs, transfer learning, or hybrid NLP techniques to process non-English crash data effectively.

Further investigations should assess the trade-offs between few-shot learning and fine-tuning, particularly in resource-constrained environments, where smaller fine-tuned models may offer better inference speed and cost-efficiency. Alternative data sampling techniques, including oversampling (e.g., SMOTE) and hybrid strategies, should also be explored to address class imbalances.

Integrating behavioral and psychological factors, such as risk-taking behaviors, driver distractions, and fatigue, could significantly improve predictive capabilities. Conducting ablation studies to optimize cost-performance trade-offs and developing Explainable AI models will enhance model interpretability, fostering trust and adoption by policymakers and transportation agencies.

Finally, a longitudinal study should evaluate the real-world impact of AI-driven interventions on traffic safety by monitoring crash trends and safety outcomes over time. Future research should also benchmark the MDF framework against state-of-the-art hybrid approaches combining rule-based NLP, deep learning architectures, and classical ML models to provide a holistic understanding of AI’s role in traffic crash analysis.

The MDF framework can be extended beyond traffic crash analysis to various domains requiring MDF, such as healthcare analytics, financial risk assessment, disaster response, and smart city management. These advancements will ensure the framework remains a versatile and impactful tool across industries, supporting data-driven decision-making in diverse real-world applications.

Author Contributions

S.J.: conceptualization, formal analysis, methodology, software, validation, visualization, writing—original draft, writing—review and editing. M.E.: supervision, methodology, conceptualization, and writing—review and editing. R.N.: supervision, methodology, and review. A.P.: supervision. H.I.A.: conceptualization, review, and proofreading. S.G.: reviewing and feedback. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Queensland University of Technology (QUT).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data will be made available upon request.

Acknowledgments

We extend our gratitude to Queensland University of Technology for supporting our research.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial intelligence
API	Application programming interface
DUI	Driving Under the Influence
EHR	Electronic Health Record
FS	Few-shot
FT	Fine-tuning
GPT	Generative Pre-Trained Transformer
LLM	Large Language Model
MDF	Multimodal Data Fusion
ML	Machine learning
NLP	Natural language processing
SMOTE	Synthetic Minority Over-sampling Technique
ZS	Zero-shot

References

Mannering, F.L.; Bhat, C.R. Analytic Methods in Accident Research: Methodological Frontier and Future Directions. Anal. Methods Accid. Res. 2014, 1, 1–22. [Google Scholar]
Saket, B.; Endert, A.; Demiralp, Ç. Task-Based Effectiveness of Basic Visualizations. IEEE Trans. Vis. Comput. Graph. 2018, 25, 2505–2512. [Google Scholar] [PubMed]
Roberts, P.L.D.; Jaffe, J.S.; Trivedi, M.M. Multiview, Multimodal Fusion of Acoustic and Optical Data for Classifying Marine Animals. J. Acoust. Soc. Am. 2011, 130, 2452. [Google Scholar]
Gu, X.; Wang, Z.; Jin, I.; Wu, Z. Advancing Multimodal Data Fusion in Pain Recognition: A Strategy Leveraging Statistical Correlation and Human-Centered Perspectives. arXiv 2024, arXiv:2404.00320. [Google Scholar]
Zhang, Y.-D.; Dong, Z.; Wang, S.-H.; Yu, X.; Yao, X.; Zhou, Q.; Hu, H.; Li, M.; Jiménez-Mesa, C.; Ramirez, J. Advances in Multimodal Data Fusion in Neuroimaging: Overview, Challenges, and Novel Orientation. Inf. Fusion. 2020, 64, 149–187. [Google Scholar]
Shen, W.; Wang, J.; Han, J. Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions. IEEE Trans. Knowl. Data Eng. 2014, 27, 443–460. [Google Scholar]
Hegselmann, S.; Buendia, A.; Lang, H.; Agrawal, M.; Jiang, X.; Sontag, D. Tabllm: Few-Shot Classification of Tabular Data with Large Language Models. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 25–27 April 2023; pp. 5549–5581. [Google Scholar]
Olugbade, S.; Ojo, S.; Imoize, A.L.; Isabona, J.; Alaba, M.O. A Review of Artificial Intelligence and Machine Learning for Incident Detectors in Road Transport Systems. Math. Comput. Appl. 2022, 27, 77. [Google Scholar] [CrossRef]
Haghshenas, S.S.; Guido, G.; Vitale, A.; Astarita, V. Assessment of the Level of Road Crash Severity: Comparison of Intelligence Studies. Expert. Syst. Appl. 2023, 234, 121118. [Google Scholar]
Yang, Y.; Wang, K.; Yuan, Z.; Liu, D. Predicting Freeway Traffic Crash Severity Using XGBoost-Bayesian Network Model with Consideration of Features Interaction. J. Adv. Transp. 2022, 2022, 4257865. [Google Scholar] [CrossRef]
Valcamonico, D.; Baraldi, P.; Amigoni, F.; Zio, E. A Framework Based on Natural Language Processing and Machine Learning for the Classification of the Severity of Road Accidents from Reports. Proc. Inst. Mech. Eng. O J. Risk Reliab. 2022, 238, 957–971. [Google Scholar] [CrossRef]
Arteaga, C.; Paz, A.; Park, J. Injury Severity on Traffic Crashes: A Text Mining with an Interpretable Machine-Learning Approach. Saf. Sci. 2020, 132, 104988. [Google Scholar] [CrossRef]
Xu, H.; Liu, Y.; Shu, C.-M.; Bai, M.; Motalifu, M.; He, Z.; Wu, S.; Zhou, P.; Li, B. Cause Analysis of Hot Work Accidents Based on Text Mining and Deep Learning. J. Loss Prev. Process Ind. 2022, 76, 104747. [Google Scholar]
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. In Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; pp. 5999–6009. [Google Scholar]
Tian, S.; Li, L.; Li, W.; Ran, H.; Ning, X.; Tiwari, P. A Survey on Few-Shot Class-Incremental Learning. Neural Netw. 2024, 169, 307–324. [Google Scholar]
Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models Are Few-Shot Learners. Adv. Neural Inf. Process Syst. 2020, 33, 1877–1901. [Google Scholar]
Song, Y.; Wang, T.; Mondal, S.K.; Sahoo, J.P. A Comprehensive Survey of Few-Shot Learning: Evolution, Applications, Challenges, and Opportunities. ACM Comput. Surv. 2023, 55, 271. [Google Scholar] [CrossRef]
Brown, D.E. Text Mining the Contributors to Rail Accidents. IEEE Trans. Intell. Transp. Syst. 2016, 17, 346–355. [Google Scholar] [CrossRef]
Wang, Y.; Yao, Q.; Kwok, J.T.; Ni, L.M. Generalizing from a Few Examples: A Survey on Few-Shot Learning. ACM Comput. Surv. 2021, 53, 63. [Google Scholar] [CrossRef]
Ashqar, H.I.; Alhadidi, T.I.; Elhenawy, M.; Jaradat, S. Factors Affecting Crash Severity in Roundabouts: A Comprehensive Analysis in the Jordanian Context. Transp. Eng. 2024, 17, 100261. [Google Scholar] [CrossRef]
Hussain, Q.; Alhajyaseen, W.K.M.; Brijs, K.; Pirdavani, A.; Brijs, T. Innovative Countermeasures for Red Light Running Prevention at Signalized Intersections: A Driving Simulator Study. Accid. Anal. Prev. 2020, 134, 105349. [Google Scholar] [PubMed]
Fisa, R.; Musukuma, M.; Sampa, M.; Musonda, P.; Young, T. Effects of Interventions for Preventing Road Traffic Crashes: An Overview of Systematic Reviews. BMC Public. Health 2022, 22, 513. [Google Scholar]
Pourpanah, F.; Abdar, M.; Luo, Y.; Zhou, X.; Wang, R.; Lim, C.P.; Wang, X.-Z.; Wu, Q.M.J. A Review of Generalized Zero-Shot Learning Methods. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 4051–4070. [Google Scholar]
Xu, W.; Xian, Y.; Wang, J.; Schiele, B.; Akata, Z. Attribute Prototype Network for Zero-Shot Learning. Adv. Neural Inf. Process Syst. 2020, 33, 21969–21980. [Google Scholar]
Cao, W.; Wu, Y.; Sun, Y.; Zhang, H.; Ren, J.; Gu, D.; Wang, X. A Review on Multimodal Zero-Shot Learning. WIREs Data Min. Knowl. Discov. 2023, 13, e1488. [Google Scholar] [CrossRef]
Teoh, J.R.; Dong, J.; Zuo, X.; Lai, K.W.; Hasikin, K.; Wu, X. Advancing Healthcare through Multimodal Data Fusion: A Comprehensive Review of Techniques and Applications. PeerJ Comput. Sci. 2024, 10, e2298. [Google Scholar] [CrossRef]
Ye, J.; Hai, J.; Song, J.; Wang, Z. Multimodal Data Hybrid Fusion and Natural Language Processing for Clinical Prediction Models. medRxiv 2023. [Google Scholar] [CrossRef]
Luo, X.; Jia, N.; Ouyang, E.; Fang, Z. Introducing Machine-learning-based Data Fusion Methods for Analyzing Multimodal Data: An Application of Measuring Trustworthiness of Microenterprises. Strateg. Manag. J. 2024, 45, 1597–1629. [Google Scholar] [CrossRef]
Zou, Z.; Gan, H.; Huang, Q.; Cai, T.; Cao, K. Disaster Image Classification by Fusing Multimodal Social Media Data. ISPRS Int. J. Geoinf. 2021, 10, 636. [Google Scholar] [CrossRef]
Lord, D.; Mannering, F. The Statistical Analysis of Crash-Frequency Data: A Review and Assessment of Methodological Alternatives. Transp. Res. Part. A Policy Pr. 2010, 44, 291–305. [Google Scholar]
Savolainen, P.T.; Mannering, F.L.; Lord, D.; Quddus, M.A. The Statistical Analysis of Highway Crash-Injury Severities: A Review and Assessment of Methodological Alternatives. Accid. Anal. Prev. 2011, 43, 1666–1676. [Google Scholar]
Delen, D.; Sharda, R.; Bessonov, M. Identifying Significant Predictors of Injury Severity in Traffic Accidents Using a Series of Artificial Neural Networks. Accid. Anal. Prev. 2006, 38, 434–444. [Google Scholar]
Erdogan, S.; Yilmaz, I.; Baybura, T.; Gullu, M. Geographical Information Systems Aided Traffic Accident Analysis System Case Study: City of Afyonkarahisar. Accid. Anal. Prev. 2008, 40, 174–181. [Google Scholar] [CrossRef] [PubMed]
Ashqar, H.I.; Shaheen, Q.H.Q.; Ashur, S.A.; Rakha, H.A. Impact of Risk Factors on Work Zone Crashes Using Logistic Models and Random Forest. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 1815–1820. [Google Scholar]
Theofilatos, A.; Chen, C.; Antoniou, C. Comparing Machine Learning and Deep Learning Methods for Real-Time Crash Prediction. Transp. Res. Rec. 2019, 2673, 169–178. [Google Scholar] [CrossRef]
Iranitalab, A.; Khattak, A. Comparison of Four Statistical and Machine Learning Methods for Crash Severity Prediction. Accid. Anal. Prev. 2017, 108, 27–36. [Google Scholar] [CrossRef] [PubMed]
Santos, K.; Dias, J.P.; Amado, C. A Literature Review of Machine Learning Algorithms for Crash Injury Severity Prediction. J. Saf. Res. 2022, 80, 254–269. [Google Scholar] [CrossRef]
Wahab, L.; Jiang, H. A Comparative Study on Machine Learning Based Algorithms for Prediction of Motorcycle Crash Severity. PLoS ONE 2019, 14, e0214966. [Google Scholar] [CrossRef]
Hwang, Y.; Song, J. Recent Deep Learning Methods for Tabular Data. Commun. Stat. Appl. Methods 2023, 30, 215–226. [Google Scholar] [CrossRef]
Chen, J.; Tao, W.; Jing, Z.; Wang, P.; Jin, Y. Traffic Accident Duration Prediction Using Multi-Mode Data and Ensemble Deep Learning. Heliyon 2024, 10, e25957. [Google Scholar]
Dingus, T.A.; Guo, F.; Lee, S.; Antin, J.F.; Perez, M.; Buchanan-King, M.; Hankey, J. Driver Crash Risk Factors and Prevalence Evaluation Using Naturalistic Driving Data. Proc. Natl. Acad. Sci. USA 2016, 113, 2636–2641. [Google Scholar] [CrossRef]
Adanu, E.K.; Li, X.; Liu, J.; Jones, S. An Analysis of the Effects of Crash Factors and Precrash Actions on Side Impact Crashes at Unsignalized Intersections. J. Adv. Transp. 2021, 2021, 6648523. [Google Scholar] [CrossRef]
Ma, Z.; Zhao, W.; Steven, I.; Chien, J.; Dong, C. Exploring Factors Contributing to Crash Injury Severity on Rural Two-Lane Highways. J. Saf. Res. 2015, 55, 171–176. [Google Scholar]
Osman, M.; Paleti, R.; Mishra, S.; Golias, M.M. Analysis of Injury Severity of Large Truck Crashes in Work Zones. Accid. Anal. Prev. 2016, 97, 261–273. [Google Scholar] [PubMed]
Jaradat, S.; Nayak, R.; Paz, A.; Elhenawy, M. Ensemble Learning with Pre-Trained Transformers for Crash Severity Classification: A Deep NLP Approach. Algorithms 2024, 17, 284. [Google Scholar] [CrossRef]
Jaradat, S.; Alhadidi, T.I.; Ashqar, H.I.; Hossain, A.; Elhenawy, M. Exploring Traffic Crash Narratives in Jordan Using Text Mining Analytics. arXiv 2024, arXiv:2406.09438. [Google Scholar]
Das, S.; Sun, X.; Dadashova, B.; Rahman, M.A.; Sun, M. Identifying Patterns of Key Factors in Sun Glare-Related Traffic Crashes. Transp. Res. Rec. 2021, 2676, 165–175. [Google Scholar] [CrossRef]
Khadka, A.; Parkin, J.; Pilkington, P.; Joshi, S.K.; Mytton, J. Completeness of Police Reporting of Traffic Crashes in Nepal: Evaluation Using a Community Crash Recording System. Traffic Inj. Prev. 2022, 23, 79–84. [Google Scholar] [CrossRef]
Muni, K.M.; Ningwa, A.; Osuret, J.; Zziwa, E.B.; Namatovu, S.; Biribawa, C.; Nakafeero, M.; Mutto, M.; Guwatudde, D.; Kyamanywa, P.; et al. Estimating the Burden of Road Traffic Crashes in Uganda Using Police and Health Sector Data Sources. Inj. Prev. 2021, 27, 208. [Google Scholar] [CrossRef]
Kim, J.; Trueblood, A.B.; Kum, H.-C.; Shipp, E.M. Crash Narrative Classification: Identifying Agricultural Crashes Using Machine Learning with Curated Keywords. Traffic Inj. Prev. 2021, 22, 74–78. [Google Scholar] [CrossRef]
Zhang, X.; Green, E.; Chen, M.; Souleyrette, R.R. Identifying Secondary Crashes Using Text Mining Techniques. J. Transp. Saf. Secur. 2020, 12, 1338–1358. [Google Scholar]
Boggs, A.M.; Wali, B.; Khattak, A.J. Exploratory Analysis of Automated Vehicle Crashes in California: A Text Analytics & Hierarchical Bayesian Heterogeneity-Based Approach. Accid. Anal. Prev. 2020, 135, 105354. [Google Scholar]
Chandraratna, S.; Stamatiadis, N.; Stromberg, A. Crash Involvement of Drivers with Multiple Crashes. Accid. Anal. Prev. 2006, 38, 532–541. [Google Scholar]
Adetiloye, T.; Awasthi, A. Multimodal Big Data Fusion for Traffic Congestion Prediction. In Multimodal Analytics for Next-Generation Big Data Technologies and Applications; Springer: Cham, Switzerland, 2019; pp. 319–335. [Google Scholar] [CrossRef]
Jaradat, S.; Nayak, R.; Paz, A.; Ashqar, H.I.; Elhenawy, M. Multitask Learning for Crash Analysis: A Fine-Tuned LLM Framework Using Twitter Data. Smart Cities 2024, 7, 2422–2465. [Google Scholar] [CrossRef]
Rahman, S.; Khan, S.; Porikli, F. A Unified Approach for Conventional Zero-Shot, Generalized Zero-Shot, and Few-Shot Learning. IEEE Trans. Image Process. 2018, 27, 5652–5667. [Google Scholar]
Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
Koubaa, A. GPT-4 vs. GPT-3.5: A Concise Showdown. Preprints 2023. [Google Scholar] [CrossRef]
Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.R.; Le, Q.V. Xlnet: Generalized Autoregressive Pretraining for Language Understanding. In Advances in Neural Information Processing Systems 32 (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019; Curran Associates, Inc.: Red Hook, NY, USA, 2019; pp. 5753–5763. [Google Scholar]
Sammoudi, M.; Habaybeh, A.; Ashqar, H.I.; Elhenawy, M. Question-Answering (QA) Model for a Personalized Learning Assistant for Arabic Language. arXiv 2024, arXiv:2406.08519. [Google Scholar]
Radford, A.; Narasimhan, K.; Salimans, T.; Sutskever, I. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (accessed on 6 April 2024).
Zhong, B.; Pan, X.; Love, P.E.D.; Ding, L.; Fang, W. Deep Learning and Network Analysis: Classifying and Visualizing Accident Narratives in Construction. Autom. Constr. 2020, 113, 103089. [Google Scholar] [CrossRef]
Alhadidi, T.I.; Jaber, A.; Jaradat, S.; Ashqar, H.I.; Elhenawy, M. Object Detection Using Oriented Window Learning Vi-Sion Transformer: Roadway Assets Recognition. arXiv 2024, arXiv:2406.10712. [Google Scholar]
Elhenawy, M.; Abdelhay, A.; Alhadidi, T.I.; Ashqar, H.I.; Jaradat, S.; Jaber, A.; Glaser, S.; Rakotonirainy, A. Eyeballing Combinatorial Problems: A Case Study of Using Multimodal Large Language Models to Solve Traveling Salesman Problems. In Proceedings of the Intelligent Systems, Blockchain, and Communication Technologies (ISBCom 2024), Sharm El-Sheikh, Egypt, 10–11 May 2025; Springer: Cham, Switzerland, 2025; pp. 341–355. [Google Scholar] [CrossRef]
Gilardi, F.; Alizadeh, M.; Kubli, M. Chatgpt Outperforms Crowd-Workers for Text-Annotation Tasks. arXiv 2023, arXiv:2303.15056. [Google Scholar]
Yin, P.; Neubig, G.; Yih, W.; Riedel, S. TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data. arXiv 2020, arXiv:2005.08314. [Google Scholar]
Li, Y.; Li, J.; Suhara, Y.; Doan, A.; Tan, W.-C. Deep Entity Matching with Pre-Trained Language Models. arXiv 2020, arXiv:2004.00584. [Google Scholar]
Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-Train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing. ACM Comput. Surv. 2023, 55, 1–35. [Google Scholar] [CrossRef]
Liu, S.; Wang, X.; Hou, Y.; Li, G.; Wang, H.; Xu, H.; Xiang, Y.; Tang, B. Multimodal Data Matters: Language Model Pre-Training over Structured and Unstructured Electronic Health Records. IEEE J. Biomed. Health Inf. 2023, 27, 504–514. [Google Scholar] [CrossRef]
Jaotombo, F.; Adorni, L.; Ghattas, B.; Boyer, L. Finding the Best Trade-off between Performance and Interpretability in Predicting Hospital Length of Stay Using Structured and Unstructured Data. PLoS ONE 2023, 18, e0289795. [Google Scholar] [CrossRef]
Oliaee, A.H.; Das, S.; Liu, J.; Rahman, M.A. Using Bidirectional Encoder Representations from Transformers (BERT) to Classify Traffic Crash Severity Types. Nat. Lang. Process. J. 2023, 3, 100007. [Google Scholar] [CrossRef]
Varotto, G.; Susi, G.; Tassi, L.; Gozzo, F.; Franceschetti, S.; Panzica, F. Comparison of Resampling Techniques for Imbalanced Datasets in Machine Learning: Application to Epileptogenic Zone Localization from Interictal Intracranial EEG Recordings in Patients with Focal Epilepsy. Front. Neuroinform. 2021, 15, 715421. [Google Scholar]
Ambati, N.S.R.; Singara, S.H.; Konjeti, S.S.; Selvi, C. Performance Enhancement of Machine Learning Algorithms on Heart Stroke Prediction Application Using Sampling and Feature Selection Techniques. In Proceedings of the 2022 International Conference on Augmented Intelligence and Sustainable Systems (ICAISS), Trichy, India, 24–26 November 2022; pp. 488–495. [Google Scholar]
Gong, L.; Jiang, S.; Bo, L.; Jiang, L.; Qian, J. A Novel Class-Imbalance Learning Approach for Both within-Project and Cross-Project Defect Prediction. IEEE Trans. Reliab. 2019, 69, 40–54. [Google Scholar]
Ekambaram, V.; Jati, A.; Nguyen, N.H.; Dayama, P.; Reddy, C.; Gifford, W.M.; Kalagnanam, J. TTMs: Fast Multi-Level Tiny Time Mixers for Improved Zero-Shot and Few-Shot Forecasting of Multivariate Time Series. arXiv 2024, arXiv:2401.03955. [Google Scholar]
Osborne, J.W. Best Practices in Data Cleaning: A Complete Guide to Everything You Need to Do Before and After Collecting Your Data; Sage Publications: New York, NY, USA, 2012; ISBN 1452289670. [Google Scholar]
Zhu, Q.; Chen, X.; Jin, Q.; Hou, B.; Mathai, T.S.; Mukherjee, P.; Gao, X.; Summers, R.M.; Lu, Z. Leveraging Professional Radiologists’ Expertise to Enhance LLMs’ Evaluation for Radiology Reports. arXiv 2024, arXiv:2401.16578. [Google Scholar]
Shi, D.; Chen, X.; Zhang, W.; Xu, P.; Zhao, Z.; Zheng, Y.; He, M. FFA-GPT: An Interactive Visual Question Answering System for Fundus Fluorescein Angiography. Res. Sq. 2023. [Google Scholar] [CrossRef]
Rezapour, M.M.M.; Khaled, K. Utilizing Crash and Violation Data to Assess Unsafe Driving Actions. J. Sustain. Dev. Transp. Logist. 2017, 2, 35–46. [Google Scholar]
Moomen, M.; Rezapour, M.; Ksaibati, K. An Analysis of Factors Influencing Driver Action on Downgrade Crashes Using the Mixed Logit Analysis. J. Transp. Saf. Secur. 2022, 14, 2111–2136. [Google Scholar] [CrossRef]
Sirrianni, J.; Sezgin, E.; Claman, D.; Linwood, S.L. Medical Text Prediction and Suggestion Using Generative Pretrained Transformer Models with Dental Medical Notes. Methods Inf. Med. 2022, 61, 195–200. [Google Scholar] [CrossRef] [PubMed]
Lajkó, M.; Horváth, D.; Csuvik, V.; Vidács, L. Fine-Tuning Gpt-2 to Patch Programs, Is It Worth It? In Proceedings of the International Conference on Computational Science and Its Applications, Malaga, Spain, 4–7 July 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 79–91. [Google Scholar]
Nguyen-Mau, T.; Le, A.-C.; Pham, D.-H.; Huynh, V.-N. An Information Fusion Based Approach to Context-Based Fine-Tuning of GPT Models. Inf. Fusion. 2024, 104, 102202. [Google Scholar] [CrossRef]
Yin, C.; Du, K.; Nong, Q.; Zhang, H.; Yang, L.; Yan, B.; Huang, X.; Wang, X.; Zhang, X. PowerPulse: Power Energy Chat Model with LLaMA Model Fine-tuned on Chinese and Power Sector Domain Knowledge. Expert. Syst. 2024, 41, e13513. [Google Scholar] [CrossRef]
He, K.; Pu, N.; Lao, M.; Lew, M.S. Few-Shot and Meta-Learning Methods for Image Understanding: A Survey. Int. J. Multimed. Inf. Retr. 2023, 12, 14. [Google Scholar] [CrossRef]
Park, G.; Hwang, S.; Lee, H. Low-Resource Cross-Lingual Summarization through Few-Shot Learning with Large Language Models. arXiv 2024, arXiv:2406.04630. [Google Scholar]
Ono, D.; Dickson, D.W.; Koga, S. Evaluating the Efficacy of Few-shot Learning for GPT-4Vision in Neurodegenerative Disease Histopathology: A Comparative Analysis with Convolutional Neural Network Model. Neuropathol. Appl. Neurobiol. 2024, 50, e12997. [Google Scholar] [CrossRef]
Loukas, L.; Stogiannidis, I.; Malakasiotis, P.; Vassos, S. Breaking the Bank with ChatGPT: Few-Shot Text Classification for Finance. arXiv 2023, arXiv:2308.14634. [Google Scholar]
Pornprasit, C.; Tantithamthavorn, C. Fine-Tuning and Prompt Engineering for Large Language Models-Based Code Review Automation. Inf. Softw. Technol. 2024, 175, 107523. [Google Scholar] [CrossRef]
Ashqar, H.I.; Alhadidi, T.I.; Elhenawy, M.; Khanfar, N.O. The Use of Multimodal Large Language Models to Detect Objects from Thermal Images: Transportation Applications. arXiv 2024, arXiv:2406.13898. [Google Scholar]
Rouzegar, H.; Makrehchi, M. Generative AI for Enhancing Active Learning in Education: A Comparative Study of GPT-3.5 and GPT-4 in Crafting Customized Test Questions. arXiv 2024, arXiv:2406.13903. [Google Scholar]

Figure 1. The proposed MDF framework for combining tabular and textual data for traffic crashes.

Figure 2. Tabular data serialized into textual data.

Figure 3. Label generation prompt framework.

Figure 4. Confusion matrix for driver fault prediction using fused data.

Figure 6. Tabular data word cloud.

Figure 7. Narrative data word cloud.

Figure 8. Fused data word cloud.

Figure 9. F1-score comparison for fatal, non-fatal, at-fault, and not-at-fault predictions.

Figure 10. Jaccard score comparison for driver actions and crash factors.

Figure 11. Crash factors when the driver fault is (a) yes and (b) no.

Figure 12. Driver actions when the driver fault is (a) yes and (b) no.

Figure 13. Crash factors when the severity is (a) fatal and (b) non-fatal.

Figure 14. Driver actions when the severity is (a) fatal and (b) non-fatal.

Table 1. Comparison of related studies and the proposed MDF framework for traffic safety analysis.

Study	Data Type	Model/Method	Learning Type	Labels/Tasks	Evaluation Metrics	Key Contributions	Limitation Compared to the Study
Arteaga et al. [12]	Narrative only	Gradient Boosting + SHAP	Supervised ML	Injury Severity	Accuracy, AUC, SHAP	Used interpretable ML to predict severity from crash text	No multimodal fusion; limited to severity prediction from text only
Adanu et al. [42]	Tabular	Random Forest + TF-IDF	Supervised ML	Precrash Actions, Injury Severity	Marginal Effects, Log-Likelihood, AIC, Pseudo R²	Introduced behavioral pathway analysis linking precrash actions to crash severity using path modeling	No narrative data used; no multimodal or LLM-based modeling; focused on indirect effects, not prediction
Dingus et al. [41]	Narrative only	Manual Analysis	Descriptive	Crash Causality	None	Provided insight into human error, impairment, distraction	No automation or computational modeling
Liu et al. [69]	Structured + unstructured (EHR)	MedM-PLM (Pre-Trained Multimodal Language Model with Cross-Modal Transformer Module)	Self-supervised pretraining + fine-tuned supervised learning	Medication Recommendation, 30-Day Readmission, ICD Coding	Accuracy, F1, AUC (per task)	Novel cross-modal fusion of clinical notes and codes; pretraining over both modalities captures richer patient representations	Focused on general clinical decision support; does not address LOS or crash-specific applications directly
Jaotombo et al. [70]	Tabular + clinical notes	Ensemble ML (AutoGluon), BioClinicalBERT, Fusion via LDA Topic Modeling)	Supervised ML	Prolonged Hospital Length of Stay (binary classification)	ROC AUC, PRC AUC, F1, Accuracy	Demonstrated that combining structured and unstructured data significantly boosts both performance (ROC AUC) and interpretability	Domain-specific (ICU/healthcare); does not use few-shot or generative LLMs; limited to discharge notes only
Ye et al. [27]	Tabular + narrative	Hybrid Fusion (BERT, RoBERTa, ClinicalBERT)	Supervised ML	Injury Diagnosis	Accuracy, F1-Score (Macro and Weighted), Top-k Accuracy	Proposed a hybrid fusion method integrating structured EHR and narrative clinical notes for enhanced injury prediction	Domain is healthcare; no use of few-/zero-shot LLMs or generative approaches
Oliaee et al. [71]	Unstructured crash narrative data (750,000+ reports from Louisiana, 2011–2016)	Fine-tuned BERT (transformer-based language model) for multi-label classification	Supervised learning	Multi-class classification of crash severity (5 levels: KABCO scale)	Accuracy, Weighted F1, AUROC, Macro F1	First large-scale application of a fine-tuned BERT model on a statewide crash narrative dataset for severity classification; demonstrated strong predictive capability and scalability	Uses only unstructured data (narratives); does not integrate structured crash data; lacks multimodal fusion or cross-modal interaction modeling present in this study
This study (2025)	Fused tabular + narrative	GPT-2, GPT-3.5, GPT-4.5	Few-shot, zero-shot, fine-tuning	Severity, driver fault, driver actions, crash factors	Accuracy, F1-Score, Jaccard Index	First to fuse structured and unstructured crash data for LLMs using an MDF framework; enables scalable multi-label classification via few-shot and zero-shot learning across diverse traffic safety tasks	Most previous studies in traffic safety rely on a single data source without data fusion and typically use fine-tuning without exploring zero-shot or few-shot learning. While data fusion has been considered in other domains, few have compared zero-shot, few-shot, and fine-tuning approaches across prediction and information retrieval tasks

Table 2. Dataset class distribution.

Class	Severity
	Fatal	Non-Fatal	Total
Training set	357	299	571
Testing set	204	177	381
Total	476	476	952

Table 3. Few-shot learning prompt template to generate new labels.

Role	Instruction
System Prompt	Your role is to categorize accident details into specific labels. For each accident description provided, you need to determine (1) if the driver was at fault (yes or no), (2) what actions the driver took that contributed to the accident (such as speeding, ran off road), and (3) what external factors were involved (such as collision, tree, animal, overcorrection, embankment). Please format your response in JSON, following this structure exactly:{\“driver_fault\”: \“yes/no\”,\“driver_action\”: [\“action1\”, \“action2\”,…], \“factors\”: [\“factor1\”, \“factor2\”, …]\n}NOTE: Ensure your response strictly follows this JSON template without including any additional text.
User Input	Your role is to categorize accident details into specific labels. For each accident description provided, you need to determine (1) if the driver was at fault (yes or no), (2) what actions the driver took that contributed to the accident (such as speeding, ran off the road), and (3) what external factors were involved (e.g., collision, tree, animal, overcorrection, embankment). Please format your response in JSON, following this structure exactly: {“driver_fault”: “yes/no”, “driver_action”: [“action1”, “action2”, …], “factors”: [“factor1”, “factor2”, …]}
Assistance Output	{“driver_fault”: “yes”, “driver_action”: [“aggressive driving”, “went off-road”], “factors”: [“collision with fixed object”, “daylight”, “sliding”, “struck bluff”, “struck tree”]}

Table 4. Distribution of driver at fault label.

Driver at Fault	Yes	No	Total
Training set	554	17	571
Testing set	337	44	381
Total	891	61	952

Table 5. Newly created datasets.

Consolidated Crash Data (Tabular + Narrative)	Severity	Driver Fault	Driver Actions	Crash Factors
Narrative 1: The accident was reported by Missouri State Highway Patrol-Troop A, county JACKSON in city LEES SUMMIT. The driver was at the scene after the accident. The accident occurred before the intersection on street IS 470, ERM EAST IS 470 MILE 100 in East direction. The vehicle collided with Fixed Object. The vehicle went off road. The light conditions were Daylight. The driver was driving aggressively. The accident happened on day Sun. The narrative by the reporters is as follows: CRASH OCCURRED AS VEH1 TRAVELED OFF THE ROADWAY, STRUCK A SIGN AND A FENCE. ASSISTED BY CPL C S KUTZNER /462/.	0	Yes	[‘aggressive driving’, ‘went off-road’]	[‘collision with fixed object’, ‘daylight’, ‘struck sign’, ‘struck fence’]
Narrative 2: The accident was reported by Missouri State Highway Patrol-Troop C, county LINCOLN in city MOSCOW MILLS. The driver was at the scene after the accident. The accident occurred before the intersection on street US 61, PVT ANDERSON ROAD in South direction. The vehicle collided with Pedestrian. The light conditions were Dark-Unlighted. The pedestrian had done drugs. The vehicle was a commercial motor vehicle. The accident happened on day Thu. The narrative by the reporters is as follows: FATALITY REPORT- VEHICLE 1 WAS SOUTHBOUND ON HIGHWAY 61. PEDESTRIAN 1 WAS IN LANE 1. VEHICLE 1 STRUCK THE PEDESTRIAN. PEDESTRIAN 1 WAS PRONOUNCED ON SCENE AT 2251 HOURS ON 8/29/2019 BY LINCOLN COUNTY CORONER KELLY WALTERS. ASSISTED BY MSGT. J. L. DECKER (240), CPL. A. H. MICHAJLICZENKO (769), TPR. W. H. ABEL (132), AND TPR. J. L. HUGHES (879).	1	No	[]	[‘collision with pedestrian’, ‘dark-unlighted’, ‘pedestrian under influence of drugs’]

Table 6. GPT-2 model final hyperparameters.

Hyperparameter	Value
Batch_size: Number of training examples processed in one iteration	16
Num_train_epochs: Number of times the entire training dataset is processed	40
Save_steps: Frequency of saving the model’s parameters during training	50,000
Block_size: Size of text blocks during training, managing memory	256
Do_sample: Binary flag for using sampling during text generation	True
Max_length: Maximum length of the generated output	600
Top_k: Considers the top-k most likely next words during text generation	1
Top_p: Influences text diversity by setting a probability threshold for word selection	1

Table 7. Few-shot prompt engineering.

Role	Instruction
System	Given the details of the crash answer the following questions. Return the response in the mentioned JSON format mentioned after the questions. Is it fatal or not fatal? (fatality) Ideal response: If it is fatal then return ‘1’ if it is not fatal then return ‘0’ Is it the driver’s fault? (driver_fault) Ideal response: If it is the driver’s fault then return ‘yes’ otherwise ‘no’ What were the driver’s actions? (driver_actions) Ideal response: What were the driver’s actions that caused the crash? e.g., overspeeding, drinking and driving, drugs, overturning etc. Return a list of the driver’s actions in a JSON list. What were the factors that caused the crash? (factors) Ideal response: What were the factors that caused the crash? e.g., struck a tree, struck a pedestrian, embankment, dark light conditions, etc. Return a list of the factors in a JSON list. JSON TEMPLATE {\n\“fatality\” : <integer>, “driver_fault\” : <string>, \“driver_actions\” : <list>, \“factors\” : <list>\n} NOTE: Strictly follow the template as mentioned above and restrain from adding any preambles or filler text.
User	CRASH DETAILS: The crash was reported by Missouri State Highway Patrol-Troop C, county JEFFERSON. The driver was at the scene after the crash. The crash occurred after the intersection on street MO 21, ERM NORTH MO 21 MILE 1780 in North direction. The vehicle collided with Fixed Object. The vehicle went off road. The light conditions were Dark-Unlighted The driver’s age was over 64. The driver was driving aggressively. The vehicle was a truck. The vehicle was a pickup vehicle. The crash happened on day Wed. The narrative by the reporters is as follows: VEHICLE 1 WAS TRAVELING NORTHBOUND ON MO-21. DRIVER 1 CHANGED LANES AND LOST CONTROL OF VEHICLE 1 ON THE WET PAVEMENT. DRIVER 1 ATTEMPTED TO RECOVER VEHICLE 1 AND VEHICLE 1 TRAVELED OFF THE LEFT SIDE OF THE ROADWAY AND STRUCK A ROCK BLUFF. VEHICLE 1 CONTINUED SPINNING, EJECTING DRIVER 1 INTO THE ROADWAY. DRIVER 1 PRONOUNCED AT 0644 HOURS ON 10302019 BY EMS PERSONNEL. ASSISTED BY MSGT C ARBUTHNOT (694).
Assistance	“ {\“fatality\” : 1, \“driver_fault\” : \“no\”, \“driver_actions\” : [], \“factors\” : [‘collision with pedestrian’, ‘dark-unlighted’, ‘pedestrian under influence of alcohol’]}

Table 8. Zero-shot prompt engineering.

Role

Instruction

System

Given the details of the crash answer the following questions. Return the response in the mentioned JSON format mentioned after the questions.
1. Is it fatal or not fatal? (fatality)
Ideal response: If it is fatal then return ‘1’ if it is not fatal then return ‘0’
2. Is it the driver’s fault? (driver_fault)
Ideal response: If it is the driver’s fault then return ‘yes’ otherwise ‘no’
3. What were the driver’s actions? (driver_actions)
Ideal response: What were the driver’s actions that caused the crash? e.g., overspeeding, drinking and driving, drugs, overturning etc. Return a list of the driver’s actions in a JSON list.
4. What were the factors that caused the crash? (factors)
Ideal response: What were the factors that caused the crash? e.g., struck a tree, struck a pedestrian, embankment, dark light conditions, etc.
Return a list of the factors in a JSON list.
JSON TEMPLATE
{\n\“fatality\”: <integer>,\n\“driver_fault\” : <string>,\n\“driver_actions\” : <list>,\n\“factors\” : <list>\n}
NOTE: Strictly follow the template as mentioned above and restrain from adding any preambles or filler text.

User

Narrative (Text)

Table 9. GPT-3.5 and GPT-4.5 zero-shot and few-shot hyperparameters.

Hyperparameter	Value
Temperature: Controls text randomness	0
Top_p: Influences text diversity by setting a probability threshold for word selection	1
Frequency_penalty: Adjusts the likelihood of selecting common words during generation	0
Presence_penalty: Influences the inclusion of specified tokens in the generated text	0

Table 10. Baseline model performance (GPT-2 fine-tuned) across labels for comparison with advanced models.

Label	Class	Accuracy	Precision	Recall	F1-Score	Jaccard Score
Severity	Non-Fatal	99%	98%	100%	99%	-
	Fatal		100%	98%	99%	-
Driver Fault	Not at Fault	88%	0%	0%	0%	-
	At Fault		88%	100%	94%	-
Driver Actions	-	-	-	-	-	72.2
Crash Factors	-	-	-	-	-	79.7

Table 11. Advanced models’ (GPT-3.5 and GPT-4.5) performance across labels.

Model	Label	Class	Accuracy	Precision	Recall	F1-Score	Jaccard Score
GPT-3.5 Zero-Shot	Severity	Non-Fatal	99%	98%	92%	95%	-
		Fatal		93%	99%	96%	-
GPT-3.5 Zero-Shot	Severity	Non-Fatal	96%	98%	93%	96%
		Fatal		94%	99%	96%
GPT-4.5 Zero-Shot	Severity	Non-Fatal	98%	98%	99%	98%
		Fatal		99%	98%	99%
GPT-4.5 Zero-Shot	Severity	Non-Fatal	99%	98%	100%	99%
		Fatal		100%	98%	99%
GPT-3.5 Zero-Shot	Driver Fault	Non-Fatal	78%	33%	89%	48%
		Fatal		98%	77%	86%
GPT-3.5 Zero-Shot	Driver Fault	Non-Fatal	96%	97%	64%	77%
		Fatal		95%	100%	98%
GPT-4.5 Zero-Shot	Driver Fault	Non-Fatal	96%	97%	64%	77%
		Fatal		95%	100%	98%
GPT-4.5 Zero-Shot	Driver Fault	Non-Fatal	98%	97%	86%	92%
		Fatal		98%	100%	99%
GPT-3.5 Zero-Shot	Driver Actions						51.5%
GPT-3.5 Few-Shot	Driver Actions						54.1%
GPT-4.5 Zero-Shot	Driver Actions						60.9%
GPT-4.5 Few-Shot	Driver Actions						73.1%
GPT-3.5 Zero-Shot	Crash Factors						56.4%
GPT-3.5 Few-Shot	Crash Factors						74.7%
GPT-4.5 Zero-Shot	Crash Factors						65.9%
GPT-4.5 Few-Shot	Crash Factors						82.9%

Table 12. Results of extracting new labels from the fusion of tabular and textual datasets.

Consolidated Crash Data (Tabular + Narrative)	Severity	Driver Fault	Driver Actions	Crash Factors
The crash was reported by Missouri State Highway Patrol-Troop E, county MADISON. The driver was at the scene after the crash. on street CRD 447, CRD 422 The crash happened due to Overturn. The light conditions were Daylight. The driver’s age was under 21. The driver’s age was under 25. The driver was driving aggressively. The vehicle was an all-terminal vehicle. The crash happened on day Thu. The narrative by the reporters is as follows: CRASH OCCURRED AS DRIVER 1 LOST CONTROL ON THE ROADWAY, OVERTURNED, AND TRAVELED OFF THE RIGHT SIDE OF THE ROADWAY.	0	yes	[‘aggressive driving’, ‘lost control’, ‘went off-road’]	[‘overturn’]
The crash was reported by Missouri State Highway Patrol-Troop A, county JACKSON in city LEES SUMMIT. The driver was at the scene after the crash. on street IS 470, ERM EAST IS 470 MILE 100 The vehicle collided with Fixed Object. The vehicle went off road. The light conditions were Daylight. The driver was driving aggressively. The crash happened on day Sun. The narrative by the reporters is as follows: CRASH OCCURRED AS VEH1 TRAVELED OFF THE ROADWAY, STRUCK A SIGN AND A FENCE. ASSISTED BY CPL C S KUTZNER /462/.	0	yes	[‘aggressive driving’, ‘went off-road’]	[‘collision with fixed object’, ‘struck sign’, ‘struck fence’]
The crash was reported by Missouri State Highway Patrol-Troop D, county BARRY. The driver was at the scene after the crash. on street RT C, CRD 1120 The vehicle struck another vehicle. The vehicle went off road. The light conditions were Daylight. The driver was driving aggressively. The vehicle was a truck. The vehicle was a pickup vehicle. The crash happened on day Mon. The narrative by the reporters is as follows: THIS IS TROOP D’S 110TH FATALITY CRASH FOR 2019. CRASH OCCURRED AS VEHICLE #1 LOST CONTROL OF VEHICLE ON THE ICE COVERED ROADWAY. VEHICLE #1 RAN OFF THE RIGHT SIDE OF ROADWAY, AND STRUCK A PEDESTRIAN THAT WAS STANDING NEAR VEHICLE #2, THAT HAD CRASH PRIOR TO THIS INCIDENT. THE PEDESTRIAN WAS PRONOUNCED BY BARRY COUNTY DEPUTY CORONER GARY SWEARINGEN, AT 15:10 HOURS ON SCENE.	1	yes	[‘aggressive driving’, ‘lost control’, ‘went off-road’]	[‘collision with another vehicle’, ‘icy road’, ‘struck pedestrian’]
The crash was reported by Missouri State Highway Patrol-Troop C, county ST. LOUIS. The driver was at the scene after the crash. on street IS 270, ERM SOUTH IS 270 MILE 156 The cause of crash was Other Non Collision. The light conditions were Daylight. The driver’s age was over 55. The driver’s age was over 64. The vehicle had a defect. The crash happened on day Fri. The narrative by the reporters is as follows: BOTH VEHICLES WERE TRAVELING SOUTHBOUND I-270, VEHICLE #1 TIRE BLEW OUT, DRIVER #1 LOST CONTROL OF VEHICLE #1 AND STRUCK VEHICLE #2.	0	no	[‘lost control due to tire blowout’]	[‘vehicle defect’, ‘collision with another vehicle’]
The crash was reported by Missouri State Highway Patrol-Troop C, county ST. LOUIS in city FENTON. The driver was at the scene after the crash. The crash occurred after the intersection on street MO 30, CO SUMMIT RD TO MO30W in the West direction. The vehicle collided with Pedestrian. The light conditions were Dark-Unlighted. The pedestrian was drinking and driving. The pedestrian had done drugs. The driver’s age was under 25. The crash happened on day Thu. The narrative by the reporters is as follows: PEDESTRIAN 1 WAS ATTEMPTING TO CROSS THE ROADWAY AND WAS STRUCK BY VEHICLE 1. PEDESTRIAN 1 WAS PRONOUNCED ON SCENE BY FENTON FIRE DISTRICT PERSONNEL AT 2331 HOURS. ASSISTED TPR. J. S. HUSKEY (1265).	1	no	[]	[‘collision with pedestrian’, ‘dark-unlighted’, ‘pedestrian under influence of alcohol and drugs’]

Table 13. Impact of data fusion on driver fault prediction.

Case	Consolidated Crash Data (Tabular + Narrative)	Ground Truth	Tabular Pred.	Narrative Pred.	Combined Pred.	Tabular Factors	Narrative Factors	Combined Factors
Example 1: Fault Change Due to Combined Data	The accident was reported by Missouri State Highway Patrol-Troop C, county JEFFERSON. The driver was at the scene after the accident. On street IS 55, ERM NORTH IS 55 MILE 1892 in the north direction. The vehicle collided with a fixed object. The vehicle went off-road. The light conditions were daylight. The driver’s age was over 55. The driver’s age was over 64. The accident happened on a Friday. The narrative by the reporters is as follows: VEHICLE 1 WAS TRAVELING NORTH ON INTERSTATE 55 IN LANE 3. Driver 1 swerved to the left to avoid a collision with an unknown vehicle that was changing lanes. Vehicle 1 traveled off the left side of the road and impacted the concrete median barrier with its front left side.	No	Yes	No	No	Collision with fixed object; daylight	Swerved to avoid collision; collision with concrete median barrier	Collision with fixed object; daylight; swerved to avoid collision; collision with concrete
Example 2: Fault Change Due to Combined Data	The accident was reported by Missouri State Highway Patrol-Troop A, county PETTIS. The driver was at the scene after the accident. On street RT H, BRIDGE W0535 in the north direction. The vehicle collided with a fixed object. The vehicle went off-road. The light conditions were daylight. The driver’s age was over 55. The accident happened on a Wednesday. The narrative by the reporters is as follows: CRASH OCCURRED AS DRIVER 1 SWERVED TO AVOID A DEER IN THE ROADWAY AND STRUCK A BRIDGE RAIL. Assisted by Trooper W. C. Grose /280/ and Pettis County Sheriff’s Department.	No	Yes	No	No	Collision with fixed object; daylight; animal in roadway	Swerved to avoid deer; struck bridge rail	Collision with fixed object; daylight; swerved to avoid deer; struck bridge rail
Example 3: Additional Factors in Combined Data	The accident was reported by Missouri State Highway Patrol-Troop H, county ANDREW. The driver was at the scene after the accident. On street IS 29, ERM SOUTH IS 29 MILE 512. The vehicle collided with a pedestrian. The light conditions were dark-unlighted. The pedestrian was drinking and driving. The pedestrian had done drugs. The driver’s age was over 55. The driver’s age was over 64. The vehicle was a commercial motor vehicle. The accident happened on a Thursday. The narrative by the reporters is as follows: FATALITY REPORT—NEXT OF KIN NOTIFIED—CRASH OCCURRED AS VEHICLE 1 WAS SOUTHBOUND IN THE DRIVING LANE OF I-29, AND PEDESTRIAN 1 WAS WALKING SOUTHBOUND IN THE DRIVING LANE OF I-29. Vehicle 1 struck pedestrian 1 with its front passenger fender. Vehicle 1 came to a controlled stop just south of the crash scene. Pedestrian 1 was pronounced deceased on 8/22/2019 at 2130 h by Andrew County Coroner Doug Johnson. Assisted by Chief CVO M.A. McCartney (W003), Sgt. H.A. Sears (1200), Cpl. R.P. Dudeck (516), Cpl. S.J. Cool (546), Tpr. R.A. Allee (465), Andrew County Sheriff’s Office, and Buchanan County Sheriff’s Office.	No	No	No	No	Collision with pedestrian; dark, unlit; pedestrian under influence of alcohol and drugs	Collision with pedestrian; pedestrian in driving lane	Collision with pedestrian; dark, unlit; pedestrian under influence of alcohol; collision with pedestrian; pedestrian in driving lane

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Jaradat, S.; Elhenawy, M.; Nayak, R.; Paz, A.; Ashqar, H.I.; Glaser, S. Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models. AI 2025, 6, 72. https://doi.org/10.3390/ai6040072

AMA Style

Jaradat S, Elhenawy M, Nayak R, Paz A, Ashqar HI, Glaser S. Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models. AI. 2025; 6(4):72. https://doi.org/10.3390/ai6040072

Chicago/Turabian Style

Jaradat, Shadi, Mohammed Elhenawy, Richi Nayak, Alexander Paz, Huthaifa I. Ashqar, and Sebastien Glaser. 2025. "Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models" AI 6, no. 4: 72. https://doi.org/10.3390/ai6040072

APA Style

Jaradat, S., Elhenawy, M., Nayak, R., Paz, A., Ashqar, H. I., & Glaser, S. (2025). Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models. AI, 6(4), 72. https://doi.org/10.3390/ai6040072

Article Menu

Multimodal Data Fusion for Tabular and Textual Data: Zero-Shot, Few-Shot, and Fine-Tuning of Generative Pre-Trained Transformer Models

Abstract

1. Introduction

2. Related Work

2.1. Tabular Data in Crash Analysis

2.2. Textual Data and NLP in Crash Analysis

2.3. Multimodal Data Fusion: Bridging Structured and Unstructured Data

3. Methodology

3.1. Dataset

3.2. Multimodal Data Fusion (MDF) Framework

3.3. Data Preparation and Serialization (Tabular-to-Test Conversion)

3.4. Label Generation via GPT Models

3.5. Dataset Creation

3.6. Modeling

3.6.1. Fine-Tuning (FT)

3.6.2. Prompt Engineering for Few-Shot and Zero-Shot Learning

3.7. Evaluation Metrics

3.8. Experiment Setup

3.9. Experiment 1: Comparative Analysis of Narrative, Tabular, and Fused Data Performance

3.9.1. Experimental Setup

3.9.2. Model Evaluation and Results

3.10. Experiment 2: Multimodal Data Framework (MDF) Modeling and Evaluation

3.10.1. Fine-Tuning with GPT-2

3.10.2. Prompt Engineering Using Few-Shot and Zero-Shot Learning (GPT-3.5 and GPT-4.5)

4. Analysis and Results

4.1. Model Performance Evaluation: Baseline vs. Advanced Models

4.1.1. Baseline Model: GPT-2 Fine-Tuned (FT)

4.1.2. Advanced Models Through Prompt Engineering: GPT-3.5 and GPT-4.5 (Few-Shot and Zero-Shot)

4.1.3. Summary of Baseline vs. Advanced Models

4.2. Enhancing Traffic Crash Prediction and Insights via MDF

Insights from Tabular and Narrative Data Fusion

4.3. Case Studies: Enhancing Crash Prediction and Factor Extraction with Multimodal Data

5. Discussion and Conclusions

Author Contributions

Funding

Institutional Review Board Statement

Informed Consent Statement

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI