A Hybrid Human-AI Model for Enhanced Automated Vulnerability Scoring in Modern Vehicle Sensor Systems
Abstract
1. Introduction
- Development of a sensor-centric vulnerability taxonomy tailored to autonomous vehicles, identifying and categorizing over 15 unique attack types. The taxonomy highlights LiDAR and radar systems as the most frequently targeted components, due to their critical role in autonomous perception and control.
- Application of generative AI models for context-aware vulnerability scoring: evaluation of state-of-the-art platforms (e.g., ChatGPT, Gemini, Copilot, and DeepSeek) to enhance automated scoring by incorporating contextual factors and dynamic threat indicators often missed by static methods.
- Development of a hybrid vulnerability scoring model that integrates expert-driven knowledge with generative AI outputs, adapting and extending CVSS v4.0 to the specific context of autonomous vehicle sensor systems.
2. Background and Related Work
2.1. Background
2.1.1. Cybersecurity Vulnerabilities in Autonomous and Connected Vehicular Systems
2.1.2. Common Vulnerability Scoring System
- The Exploitability Sub-Score (ESS): reflects the ease with which a vulnerability can be exploited. It is determined by five metrics: Attack Vector (AV), Attack Complexity (AC), Privileges Required (PR), User Interaction (UI), and Attack Requirements (AT). Each has specific values that contribute to the overall score. The formula for calculating the ESS is given in Equation (1) (all equations, as well as the values specified below, are extracted from [14]):
- Attack Vector (AV): This reflects the relative ease with which an attacker can access the vulnerable component, effectively indicating the remoteness of the attack. It comprises four standardized values, Network (N), Adjacent (A), Local (L), and Physical (P), each associated with a specific numerical weight that contributes to the Exploitability Sub-Score (ESS). A Network (N) vector, valued at 1.0, implies that the vulnerability can be exploited remotely over one or more network hops, such as in a denial-of-service attack triggered via a crafted TCP packet. An Adjacent (A) vector, with a value of 0.77, indicates the attacker must be in a logically adjacent topology, such as the same subnet, as in an ARP flooding attack. A Local (L) vector, valued at 0.62, requires the attacker to have local access or to rely on user interaction—such as social engineering—to trick a user into executing a malicious file. A Physical (P) vector, with a score of 0.29, represents scenarios where the attacker must physically interact with the device, as in a cold boot attack aimed at extracting encryption keys. This metric highlights that the broader the potential attack surface, the greater the number of potential attackers and the higher the severity score.
- Attack Complexity (AC): This metric evaluates the level of difficulty an attacker faces in successfully exploiting a vulnerability, considering any conditions beyond the attacker’s control. It has two possible values: Low (L) and High (H). A Low (L) complexity, assigned a score of 0.77, implies that the exploit is straightforward, repeatable, and does not depend on specific conditions or defenses in the target environment. Conversely, a High (H) complexity, valued at 0.44, indicates that the attacker must overcome additional barriers such as security mechanisms—e.g., Address Space Layout Randomization (ASLR) and Data Execution Prevention (DEP)—or must possess target-specific knowledge like cryptographic keys. Thus, a higher attack complexity reduces the likelihood of successful exploitation and leads to a lower severity score.
- Privileges Required (PR): This metric quantifies the level of access privileges an attacker must possess prior to initiating an exploit. It includes three predefined values: None (N), Low (L), and High (H). A None (N) value, scored at 0.85, signifies that the attack can be conducted without any prior access to the vulnerable system. A Low (L) value, with a score of 0.62, indicates that basic user-level access is required. A High (H) value, rated at 0.27, denotes that the attacker must already have elevated privileges such as administrative or root-level access. The more privileged access required, the less severe the vulnerability is considered to be.
- User Interaction (UI): This metric assesses whether the successful exploitation of a vulnerability depends on any user action. It has two possible values: None (N) and Required (R). A None (N) value, scored at 0.85, indicates that no user involvement is needed—an attacker can exploit the vulnerability independently. In contrast, a Required (R) value, assigned a score of 0.62, means that a user must perform an action (e.g., opening a malicious file or clicking a crafted link) to enable the attack. The required user interaction reduces exploitability and lowers the overall score.
- Attack Requirements (AT): This metric evaluates whether specific conditions or unusual configurations, beyond the attacker’s control, must be present on the vulnerable system for a successful exploit. It has two possible values: None (N) and Present (P). A None (N) value, scored at 1.0, indicates that the vulnerability is exploitable under typical, default, or common configurations, requiring no special environmental factors for a successful attack. In contrast, a Present (P) value, assigned a score of 0.90, means that successful exploitation depends on additional conditions, such as specific non-default configurations, unusual system states, or other prerequisites on the target. The presence of such requirements reduces the overall exploitability of the vulnerability and lowers the overall score.
- Impact Sub-Score (ISS): This quantifies the consequences of a successfully exploited vulnerability. It considers the effect on the confidentiality, integrity, and availability of the affected systems, as well as on safety and automated driving processes. The formula for calculating the ISS is given in Equation (2) (all equations, as well as the values specified below, are extracted from [14]):
- Confidentiality (C): This metric reflects the degree of information exposure resulting from a vulnerability. It has three possible values: High (H), Low (L), and None (N). A High (H) impact, valued at 0.56, indicates complete compromise of confidential information—e.g., all sensitive data is accessible. A Low (L) impact, scored at 0.22, represents the partial disclosure of non-critical or limited data. A None (N) value, with a score of 0.0, means there is no impact on confidentiality.
- Integrity (I): This metric evaluates the degree to which data can be modified or destroyed by an attacker. The values mirror those of confidentiality. A High (H) impact, valued at 0.56, indicates total compromise—critical data can be arbitrarily altered. A Low (L) impact, scored at 0.22, refers to the partial modification of non-critical data. A None (N) value, rated at 0.0, implies no integrity violation.
- Availability (A): This metric considers the extent to which the vulnerability impacts the availability of the affected component. A High (H) impact, with a score of 0.56, corresponds to a total service disruption or sustained denial of service. A Low (L) impact, scored at 0.22, implies intermittent or degraded availability. A None (N) value, scored at 0.0, means that the availability of the system remains unaffected.
- Base Score (BS): This represents the intrinsic severity of a vulnerability in CVSS v4.0. It is derived by summing the Exploitability Sub-Score (ESS) and the Impact Sub-Score (ISS), capping the result at a maximum of 10, and rounding up to one decimal place.
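For reference, the per-metric weights quoted above can be collected into a lookup table, and the base-score rule just described (sum, cap at 10, round up to one decimal) can be written directly. This is an illustrative sketch only; the ESS and ISS combination formulas themselves are Equations (1) and (2) in [14] and are deliberately not reproduced here.

```python
import math

# Per-metric weights as quoted in the text (values taken from [14]).
WEIGHTS = {
    "AV": {"N": 1.00, "A": 0.77, "L": 0.62, "P": 0.29},
    "AC": {"L": 0.77, "H": 0.44},
    "PR": {"N": 0.85, "L": 0.62, "H": 0.27},
    "UI": {"N": 0.85, "R": 0.62},
    "AT": {"N": 1.00, "P": 0.90},
    "C":  {"H": 0.56, "L": 0.22, "N": 0.0},
    "I":  {"H": 0.56, "L": 0.22, "N": 0.0},
    "A":  {"H": 0.56, "L": 0.22, "N": 0.0},
}

def base_score(ess, iss):
    """Combine the sub-scores as described: sum, cap at 10,
    then round up to one decimal place."""
    return math.ceil(min(ess + iss, 10.0) * 10) / 10
```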
2.1.3. Vehicle Safety System (In-Vehicle Network)
2.1.4. Sensor Technologies in Autonomous Vehicles
- LiDAR: Light Detection and Ranging (LiDAR) provides a 360-degree view using laser channels. It maps the surrounding environment in 3D using mechanical, semisolid-state, and solid-state systems [3].
- Radar: Millimeter-wave radar detects non-transparent materials and supports functions such as blind spot detection and parking assistance [3].
- GPS: The Global Positioning System enables geographical location tracking by communicating with satellites. Its openness makes it vulnerable to cyberattacks [4].
- Magnetic Encoders: These measure angular velocity using magnetoresistance or Hall ICs and are essential for the Anti-lock Braking System (ABS) and TPMS systems [4].
- TPMS: Tire Pressure Monitoring Systems include four sensors and an ECU. They transmit tire pressure data securely by filtering out unauthorized IDs [4].
- Camera: Cameras support object detection, traffic sign recognition, parking, collision avoidance, and night vision capabilities [4].
- Ultrasonic Sensors: These detect nearby obstacles and are primarily used for low-speed operations like parking [4].
2.1.5. Generative AI in Cybersecurity
- Threat Intelligence and Adaptive Threat Detection: Generative AI refines threat intelligence by efficiently filtering vast data to prioritize organization-specific risks, reducing noise, and learning from interactions to detect anomalies, enabling rapid adaptation and clear AI-generated risk summaries [30].
- Predictive and Vulnerability Analysis: GenAI predicts future cyber threats and identifies critical vulnerabilities by analyzing past attacks, enabling organizations to prioritize high-risk areas, proactively strengthen security, and reduce exposure to potential exploits [31].
- Malware Analysis and Biometric Security: Generative AI enables researchers to create synthetic data and realistic malware samples, safely studying threat behaviors and enhancing biometric security, ultimately strengthening cybersecurity measures through controlled experimentation and analysis [32].
- Development Assistance and Coding Security: Generative AI assists developers by providing real-time feedback, promoting secure coding practices, and flagging risks early. It learns from past examples to prevent errors and enhance software security throughout development [33].
- Alerts, Documentation, and Incident Response: Generative AI streamlines alert management by summarizing complex data, improving clarity and response time. It helps cybersecurity teams prioritize threats accurately and offers actionable recommendations for effective risk mitigation [34].
- Employee Training and Education: Generative AI creates interactive training modules to educate employees on cybersecurity and protocols, reducing human error and strengthening organizational defenses by promoting awareness and adherence to security best practices [35].
2.1.6. The Common Vulnerabilities and Exposures (CVE)
2.2. Related Work
3. Methodology
3.1. Phase 1: Systematic Literature Review
- Empirical Validation: The inclusion of experimental or simulation-based studies (e.g., LiDAR jamming tests [4]).
- Real-World Incidents: The documentation of actual cybersecurity breaches in autonomous vehicles (e.g., Tesla OBD malware injection [12]).
- Sensor-Specific Analysis: Focus on the cybersecurity of LiDAR, radar, and ultrasonic sensor systems.
3.2. Phase 2: CVSS v4.0 Scoring
- Exploitability of vulnerabilities within vehicular communication channels (e.g., CAN bus vulnerabilities).
- The potential impact on the automated driving system.
- Attack Vector (AV): What level of proximity is required to exploit the vulnerability?
  - Network (N): Exploitable remotely (e.g., via the internet);
  - Adjacent (A): Requires the same local network (e.g., Bluetooth or Wi-Fi);
  - Local (L): Requires local access (e.g., a USB port or internal interface);
  - Physical (P): Requires direct physical contact (e.g., hardware tampering).
- Attack Complexity (AC): How difficult is it to successfully execute the attack?
  - Low (L): Easy to exploit, with no significant preconditions;
  - High (H): Requires advanced techniques or overcoming security mechanisms.
- Attack Requirements (AT): Are any specific conditions or configurations needed for the attack?
  - None (N): Works with standard/default settings;
  - Present (P): Requires specific or non-default settings (e.g., debug mode enabled).
- Privileges Required (PR): What level of system privileges must an attacker have to perform the attack?
  - None (N): No prior access required;
  - Low (L): Basic user-level access;
  - High (H): Administrative or elevated privileges.
- User Interaction (UI): Does the attack require any user action to succeed?
  - None (N): Fully automated, with no user input needed;
  - Required (R): Needs user engagement (e.g., clicking a malicious link).
- Confidentiality (C): To what extent does the attack compromise data confidentiality?
  - None (N): No data exposure;
  - Low (L): Limited or non-sensitive data exposure;
  - High (H): Full or critical data disclosure.
- Integrity (I): To what extent can the attack modify or corrupt data?
  - None (N): No data tampering;
  - Low (L): Minor or non-critical data changes;
  - High (H): Complete or critical data manipulation.
- Availability (A): What is the impact on system availability?
  - None (N): No effect on operations;
  - Low (L): Performance degradation or intermittent disruption;
  - High (H): Total or persistent service failure.
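The metric assignments above can be written compactly as a CVSS-style vector string. The sketch below expands such a string into full metric names; it uses the simplified metric set of this scoring scheme, and the exact vector format shown is illustrative, modeled on CVSS v4.0 notation.

```python
# Abbreviated metric values mapped to their full names, following the
# metric definitions listed above.
METRICS = {
    "AV": {"N": "Network", "A": "Adjacent", "L": "Local", "P": "Physical"},
    "AC": {"L": "Low", "H": "High"},
    "AT": {"N": "None", "P": "Present"},
    "PR": {"N": "None", "L": "Low", "H": "High"},
    "UI": {"N": "None", "R": "Required"},
    "C":  {"N": "None", "L": "Low", "H": "High"},
    "I":  {"N": "None", "L": "Low", "H": "High"},
    "A":  {"N": "None", "L": "Low", "H": "High"},
}

def expand_vector(vector):
    """Expand e.g. 'AV:P/AC:L/AT:N/PR:N/UI:N/C:N/I:H/A:H'
    into a dict of full metric names."""
    out = {}
    for part in vector.split("/"):
        metric, value = part.split(":")
        out[metric] = METRICS[metric][value]
    return out
```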
Type | Attack | Description | LiDAR | Radar | GPS | Magnetic Encoder | TPMS | Camera | Ultrasonic Sensor | Reference |
---|---|---|---|---|---|---|---|---|---|---|
 | Blinding Attack | Emitting infrared light pulses matching the sensor’s wavelength, saturating its detectors and causing denial of service | Yes | No | No | No | No | Yes | Yes | [4,12,42] |
 | Jamming Attack | Emitting light at the same frequency as the LiDAR’s laser, directly interfering with the sensor’s laser signal | Yes | Yes | Yes | No | No | No | Yes | [4,12,24,40,43] |
DOS | Black-hole attacks | Drops data packets instead of forwarding them to their intended destination, preventing packets from traversing the network to other vehicles | Yes | Yes | Yes | No | No | No | No | [3,4,44,45,46] |
 | Timing attacks | Introduces delays in time-sensitive communications, disrupting vehicle coordination and safety-critical applications | Yes | Yes | No | No | No | No | No | [3,43,45,47] |
 | Disruptive Attack | Placing an electromagnetic actuator between the wheel speed sensors, exposed beneath the vehicle body, and the ABS tone wheel | No | No | No | Yes | No | No | No | [4,26,48] |
 | Replay attacks | Intercepting and retransmitting or delaying valid data transmissions | Yes | Yes | No | No | No | No | No | [3,4,11,26,41,48,49] |
MitM | Relay Attack | Intercepts signals and forwards them to a remote receiver | Yes | Yes | No | No | No | No | No | [4,26,41,43] |
 | Eavesdropping Attack | Passive monitoring of sensor transmissions, posing a significant threat to location privacy | No | No | No | No | Yes | No | No | [4,11,12,24,45,48] |
 | Sybil attacks | Creates multiple fake identities to manipulate a network | Yes | Yes | No | No | No | No | No | [3,40,48,49] |
 | Blind Spot Exploitation Attack | Exploits the sensor’s inability to detect objects of minimal thickness by placing a thin object in the vehicle’s blind spot | No | No | No | No | No | No | Yes | [12,24,26,43] |
 | Sensor Interference Attack | Positions ultrasonic sensors opposite a target vehicle’s sensors, causing signal interference | No | No | No | No | No | No | Yes | [12,24,26,43] |
Spoofing | Acoustic Cancellation Attack | Transmitting an inverted-phase signal that neutralizes legitimate sensor signals | No | No | No | No | No | No | Yes | [4] |
 | Impersonation attacks | Mimicking identities, credentials, or communication patterns to trick victims | Yes | Yes | No | No | No | No | No | [3,11,40,44,48,49] |
 | Falsified-information attack | Spreading misleading data to manipulate sensors | Yes | Yes | Yes | Yes | Yes | No | No | [4,11,15,40,44,49] |
 | Cloaking Attack | Modifies attack signatures or behaviors to avoid matching known threat patterns | No | No | No | No | No | No | Yes | [4,12,24] |
3.3. Phase 3: Assessing LLM Robustness in Extracting CVSS Attributes
4. Analysis and Severity Assessment of Cybersecurity Threats Targeting Sensor Systems in Modern Vehicles
4.1. Security Attacks
4.2. Attack Vector Analysis from Expert Surveys
4.3. Attack Vector Analysis from AI Engine Surveys
4.3.1. ChatGPT Survey Results
4.3.2. DeepSeek Survey Results
4.3.3. Copilot Survey Results
4.3.4. Gemini Survey Results
4.3.5. Comparative Analysis Between AI Agents
4.4. Comparison Between AI Agents and Expert Survey
5. Application of Generative AI Models for Context-Aware Vulnerability Scoring
5.1. Dataset Characteristics
5.2. Methodologies of CVSS Attribute Extraction
- ChatGPT: Rule-based NLP heuristics. ChatGPT applied predefined, keyword-driven heuristics to parse vulnerability descriptions and infer CVSS attributes, rather than accessing external CVE databases. The methodology aimed to mimic a human analyst’s logical deductions by recognizing phrases commonly associated with specific CVSS metrics. For instance, the keyword “remote” indicated a Network attack vector, while “physical” or “local” suggested the Physical or Local vectors. Phrases such as “easily exploitable” implied Low attack complexity, whereas “complex setup required” suggested High. Terms like “unauthenticated” indicated None for Privileges Required, while “admin” or “root” denoted High. User interaction was inferred from mentions like “user must open” or “social engineering”. References to broader system impact suggested a Changed scope, while their absence implied Unchanged. The impact metrics (confidentiality, integrity, and availability) were driven by mentions of “data leakage”, “tampering”, and “denial of service”, respectively.
- DeepSeek: Structured question-based approach. DeepSeek adopted a structured, question-driven methodology, explicitly querying vulnerability descriptions against predefined conditions. Each CVSS attribute was determined through specific queries designed to guide the model in analyzing relevant phrases. For the attack vector, it identified remote attacks as Network and local execution as Local. Attack complexity was differentiated between low-complexity attacks (“trivial to exploit”) and high-complexity scenarios (“race condition”). Privileges Required were categorized into None, Low, or High based on authentication levels. User interaction was evaluated based on whether explicit user action was required. Scope was marked as cross-system if the description stated a broader impact. The impact metrics (confidentiality, integrity, and availability) were assessed from explicit mentions of data exposure, manipulation, or service disruption.
- Gemini: Systematic text translation using the worst-case principle. Gemini employed a systematic attribute extraction approach that relied strictly on the vulnerability description itself, avoiding assumptions beyond explicit statements. Its methodology relied exclusively on the provided text to prevent external data usage and ensure self-contained analysis. A key principle was the worst-case impact principle: where ambiguity existed, values were assigned based on the maximum potential risk. Attribute-by-attribute derivation iteratively evaluated each CVSS metric following strict definitional guidance.
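The keyword-driven extraction style described for ChatGPT can be sketched as a small rule table. The keyword lists, attribute names, and return values below are illustrative assumptions drawn from the examples in the text, not the exact rules used in the study.

```python
def infer_cvss_attributes(description):
    """Toy rule-based heuristic: map phrases in a vulnerability
    description to CVSS attribute values (assumed keyword lists)."""
    text = description.lower()
    attrs = {}
    # Attack Vector: "remote" -> NETWORK; "physical"/"local" -> PHYSICAL/LOCAL
    if "remote" in text:
        attrs["AV"] = "NETWORK"
    elif "physical" in text:
        attrs["AV"] = "PHYSICAL"
    elif "local" in text:
        attrs["AV"] = "LOCAL"
    # Attack Complexity
    if "easily exploitable" in text:
        attrs["AC"] = "LOW"
    elif "complex setup required" in text:
        attrs["AC"] = "HIGH"
    # Privileges Required
    if "unauthenticated" in text:
        attrs["PR"] = "NONE"
    elif "admin" in text or "root" in text:
        attrs["PR"] = "HIGH"
    # User Interaction
    if "user must open" in text or "social engineering" in text:
        attrs["UI"] = "REQUIRED"
    # Impact metrics (C/I/A)
    if "data leakage" in text:
        attrs["C"] = "HIGH"
    if "tampering" in text:
        attrs["I"] = "HIGH"
    if "denial of service" in text:
        attrs["A"] = "HIGH"
    return attrs
```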
5.3. Performance Evaluation of LLMs
5.3.1. ChatGPT Results
5.3.2. DeepSeek Results
5.3.3. Gemini Results
5.4. Comparative Analysis of LLM Results
5.4.1. Similarities
- Class Imbalance Impact: All three models consistently demonstrated superior performance on majority classes (e.g., NETWORK Attack Vector, LOW Attack Complexity, UNCHANGED Scope, and NONE User Interaction). Conversely, they all struggled significantly with minority classes (e.g., ADJACENT Attack Vector for ChatGPT, HIGH complexity across all models, CHANGED Scope, and REQUIRED User Interaction). This pattern is attributed to the inherent class imbalance in the dataset, where models tended to prioritize the more frequent categories.
- Difficulty with “HIGH” Class Predictions: A universal challenge for all three models was the accurate prediction of the “HIGH” class across various attributes, particularly Attack Complexity, Privileges Required, Confidentiality Impact, Integrity Impact, and Availability Impact. For instance, ChatGPT completely failed to predict HIGH complexity (0% precision, recall, and F1), and DeepSeek also scored 0% for this class. Gemini, while not at zero, still reached only a 15.38% F1 score for HIGH complexity. This suggests a fundamental difficulty in recognizing nuanced indicators of severe impact or effort, likely due to the limited number of training examples for these critical but rarer scenarios.
- Multi-Class Attribute Challenges: For attributes with three classes (Privileges Required, Confidentiality Impact, Integrity Impact, and Availability Impact), all models showed declining performance for the less frequent classes (LOW and HIGH) compared to the NONE class. This indicates general confusion or less robust discrimination between these closely related impact levels.
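The class-imbalance effect described above is easy to demonstrate: a model that always predicts the majority class still looks reasonable on that class while scoring zero F1 on the rare class, dragging down the macro average. The 9:3 label split below is an illustrative toy example, not data from the study.

```python
def per_class_f1(y_true, y_pred):
    """Per-class F1 plus macro-F1, computed from scratch."""
    classes = sorted(set(y_true) | set(y_pred))
    f1 = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1[c] = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return f1, sum(f1.values()) / len(f1)

# A predictor that always outputs the majority class on a 9:3 split:
scores, macro = per_class_f1(["LOW"] * 9 + ["HIGH"] * 3, ["LOW"] * 12)
# scores["HIGH"] is 0.0, so the macro average is pulled down sharply
```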
5.4.2. Differences
- Attack Vector Handling: ChatGPT treated ADJACENT as a separate class, leading to poor performance (28.57% precision, 40.00% recall) due to its low representation. DeepSeek and Gemini merged ADJACENT into the NETWORK class, which significantly boosted their NETWORK class performance (DeepSeek: 98.15% precision, 100.00% recall for NETWORK; Gemini: 97.14% precision, 98.08% recall for NETWORK). This methodological difference directly impacted reported metrics for the attack vector attribute.
- Overall Performance on Multi-Class Attributes: DeepSeek generally demonstrated stronger macro-average F1 scores for the three-class attributes (Privilege Required: 80.47%; Confidentiality: 74.42%; Integrity: 72.19%; Availability: 63.85%) compared to ChatGPT (Privilege Required: 57.44%; Confidentiality: 56.57%; Integrity: 56.33%; Availability: 49.69%) and Gemini (Privilege Required: 74.04%; Confidentiality: 76.31%; Integrity: 69.37%; Availability: 63.30%). DeepSeek’s structured question-based approach appears to yield more balanced performance across these classes, especially for “HIGH” values, compared to ChatGPT’s rule-based heuristics, which struggled significantly with minority classes. Gemini showed competitive performance with DeepSeek on Confidentiality and Availability but slightly lower on Privilege Required and Integrity.
- Recall for “REQUIRED” User Interaction: DeepSeek exhibited a slightly better recall for the “REQUIRED” user interaction class (44.44%) compared to ChatGPT (40.00%) and Gemini (43.75%), although all models still showed a “critical gap” in detecting these interactions due to imbalance.
- Approach to Ambiguity: Gemini’s “worst-case impact principle” is a distinct feature, aiming to assign values based on maximum potential risk in ambiguous cases. While the direct impact of this principle on specific metrics is not explicitly quantified as a separate variable in the provided data, it represents a notable difference in its underlying decision-making logic compared to the other models.
5.4.3. Discussion
6. Hybrid Vulnerability Scoring Model
6.1. Model Architectures
- Traditional ML Pipeline (TF-IDF + Logistic Regression)
  - Text Representation: TF-IDF vectorization (max 3000 features)
  - Classification: Logistic Regression with class weighting
  - Class Imbalance Handling: Inverse frequency weighting
- Transformer-Based Model (BERT)
  - Base Architecture: bert-base-uncased
  - Fine-tuning: 3 epochs (batch size = 8)
  - Sequence Length: 512 tokens
  - Evaluation Metrics: Precision, Recall, F1 (per class)
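As a stdlib-only sketch of the two ingredients named above, the snippet below computes smoothed TF-IDF weights and inverse-frequency ("balanced") class weights. In practice the pipeline would be built with scikit-learn's TfidfVectorizer(max_features=3000) and LogisticRegression(class_weight='balanced'); the tiny corpus and the 9:3 label split here are illustrative only.

```python
import math
from collections import Counter

def tfidf_vectorize(docs):
    """Smoothed TF-IDF weights for a tiny corpus: idf = ln((1+n)/(1+df)) + 1,
    as used by sklearn-style vectorizers."""
    tokenized = [d.lower().split() for d in docs]
    n = len(docs)
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(toks).items()}
            for toks in tokenized]

def balanced_class_weights(labels):
    """Inverse-frequency weighting: w_c = n_samples / (n_classes * count_c),
    so the rare class contributes proportionally more to the loss."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

weights = balanced_class_weights(["LOW"] * 9 + ["HIGH"] * 3)
# the rare HIGH class receives a larger weight than the frequent LOW class
```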
6.2. Performance Comparison
- Universal Challenge: All evaluated approaches—including ChatGPT, DeepSeek, Gemini, and specialized classifiers (TF-IDF/Logistic Regression, BERT)—exhibited critical failures in detecting minority classes. The complete inability of BERT to identify HIGH-complexity vulnerabilities (0% recall) underscores that architectural sophistication alone cannot overcome inherent data distribution limitations.
- LLM Performance Hierarchy:
- Attack Complexity (HIGH) emerged as the most challenging attribute across LLMs (F1 ≤ 15.38%), justifying its selection for targeted enhancement.
- Increasing class granularity improved discrimination, with multi-class attributes (e.g., privilege required) yielding higher accuracy than binary classifications.
- Traditional ML Advantage: In low-data regimes, classical methods (TF-IDF + Logistic Regression) demonstrated greater robustness to imbalance than transformers, achieving non-zero detection of HIGH-complexity cases (40.0% precision, 33.3% recall) where fine-tuned BERT failed completely.
Model | Accuracy | Precision (LOW) | Precision (HIGH) | Recall (LOW) | Recall (HIGH) | Macro F1 | Processing Time |
---|---|---|---|---|---|---|---|
TF-IDF + Logistic Regression | 82.1% | 88.2% | 40.0% | 90.9% | 33.3% | 63.0% | 2 min |
BERT (Fine-tuned) | 84.6% | 84.6% | 0.0% | 100.0% | 0.0% | 45.8% | 30 min |
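The macro-F1 column in the table above can be reproduced from the per-class precision/recall figures, which is a quick consistency check on the reported numbers:

```python
def f1(p, r):
    """Harmonic mean of precision and recall; 0 when both are 0."""
    return 2 * p * r / (p + r) if p + r else 0.0

# TF-IDF + Logistic Regression: average the LOW and HIGH per-class F1 scores
macro_lr = (f1(0.882, 0.909) + f1(0.400, 0.333)) / 2    # ~0.63

# BERT (fine-tuned): HIGH precision and recall are both zero,
# so the HIGH F1 term vanishes and halves the macro average
macro_bert = (f1(0.846, 1.000) + f1(0.0, 0.0)) / 2      # ~0.458
```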
6.3. Key Observations
- Class imbalance sensitivity: Both models exhibited bias toward the majority class (LOW), mirroring the LLMs’ performance limitations. BERT completely failed to identify HIGH-complexity cases.
- Traditional vs. Transformer Tradeoffs:
- Logistic regression showed limited but non-zero capability for HIGH-complexity detection.
- BERT achieved superior accuracy but collapsed predictions to the majority class.
- Both underperformed compared to the best LLMs (e.g., Gemini achieved 15.38% F1 on HIGH).
- Data Requirements: The transformer model required substantially more data than available (193 samples), highlighting a key limitation for vulnerability classification tasks.
6.4. Proposed Model
6.5. Model Evaluation
- Attack Vector: The hybrid model correctly predicted 10 out of 15 values, yielding a 66.7% match with expert annotations. Errors were most common in distinguishing between Local, Adjacent, and Network, whereas Physical and Network vectors were more consistently identified. A summary of these results is provided in Table 13, which highlights the strengths of the model and the misclassifications patterns across different attack vectors.Table 13. Attack vector classification results: hybrid vs. expert labels.
Attack ChatGPT DeepSeek Copilot Gemini Hybrid Expert Match Blinding Attack Physical Physical Physical Network Physical Physical True Jamming Attack Network Network Adjacent Network Network Adjacent False Black Hole Attack Adjacent Network Network Network Network Network True Timing Attack Network Network Local Network Network Network True Disruptive Attack Network Network Physical Network Network Physical False Replay Attack Network Network Network Network Network Network True Relay Attack Physical Network Adjacent Network Network Adjacent False Eavesdropping Attack Network Network Adjacent Network Network Network True Sybil Attack Network Network Network Network Network Network True Blind Spot Exploitation Physical Physical Physical Local Physical Physical True Sensor Interference Attack Network Physical Physical Network Network Adjacent False Acoustic Attack Physical Physical Physical Local Physical Physical True Impersonation Attack Network Network Network Network Network Network True Falsified Information Attack Network Network Network Network Network Network True Cloaking Attack Network Network Network Network Network Local False - Attack Complexity: Predicted via machine learning, the model achieved 9 out of 15 correct predictions (60.0%). This result aligns closely with the model’s internal validation performance. The model exhibited better performance in recognizing High complexity than Low, particularly when contextual factors were less obvious.These findings are summarized in Table 14, which highlights the model’s accuracy and the influence of context on classification performance.Table 14. Attack complexity classification results: ML hybrid model vs. expert labels.
Attack Hybrid Prediction Expert Label Match Blinding Attack LOW Low True Jamming Attack LOW Low True Black Hole Attack HIGH Low False Timing Attack LOW High False Disruptive Attack HIGH Low False Replay Attack HIGH High True Relay Attack LOW High False Eavesdropping Attack LOW Low True Sybil Attack HIGH High True Blind Spot Exploitation HIGH Low False Sensor Interference Attack HIGH Low False Acoustic Attack HIGH High True Impersonation Attack HIGH High True Falsified Information Attack HIGH High True Cloaking Attack HIGH High True - Attack Requirements: Also reaching a 60.0% match (9/15), the hybrid model performed moderately in identifying whether specific conditions or requirements were necessary before an attack could be successfully executed. This metric appeared sensitive to implicit contextual cues, which may not be fully captured by LLMs. A detailed breakdown of these results is presented in Table 15, illustrating the model’s strengths and contextual limitations.Table 15. Attack requirements classification results: LLMs vs. expert labels.
Attack ChatGPT DeepSeek Copilot Gemini Hybrid Expert Match Blinding Attack None Present Present Present Present None False Jamming Attack None Present Present Present Present None False Black Hole Attack Present Present Present Present Present None False Timing Attack Present Present Present Present Present Present True Disruptive Attack None None None Present None None True Replay Attack None Present Present Present Present None False Relay Attack None Present Present Present Present Present True Eavesdropping Attack None Present Present None None None True Sybil Attack Present Present Present Present Present Present True Blind Spot Exploitation Present Present Present Present Present Present True Sensor Interference Attack Present Present Present Present Present None False Acoustic Attack Present Present Present Present Present None False Impersonation Attack None Present Present Present Present Present True Falsified Information Attack None Present Present Present Present Present True Cloaking Attack Present Present Present Present Present Present True - Privileges Required: This metric showed the weakest performance, with only 7 out of 15 correct predictions (46.7%). Confusion was prevalent between None, Low, and High privileges, indicating a need for deeper contextual comprehension or enriched training data. These results are summarized in Table 16, which details the model’s confusion across privilege levels.Table 16. Privileges Required classification results: LLMs vs. expert labels.
Attack ChatGPT DeepSeek Copilot Gemini Hybrid Expert Match Blinding Attack None None None None None None True Jamming Attack None None None None None Low False Black Hole Attack Low None None Low Low High False Timing Attack None Low High Low Low Low True Disruptive Attack None None None Low None None True Replay Attack None None None None None High False Relay Attack None None None None None High False Eavesdropping Attack None None None None None None True Sybil Attack None None Low Low None High False Blind Spot Exploitation None None None None None None True Sensor Interference Attack None None None None None None True Acoustic Attack None None None None None None True Impersonation Attack Low Low Low Low Low None False Falsified Information Attack None Low None Low Low High False Cloaking Attack High None None Low None High False - User Interaction: Achieving the highest accuracy (14/15, or 93.3%), the hybrid model was very effective at determining whether human intervention was necessary. However, it is worth noting that 14 of the 15 expert annotations were “None”, which may skew the perception of model generalizability. The corresponding results are provided in Table 17, highlighting the impact of class distribution on performance.Table 17. User interaction classification results: LLMs vs. expert labels.
Attack | ChatGPT | DeepSeek | Copilot | Gemini | Hybrid | Expert | Match
---|---|---|---|---|---|---|---
Blinding Attack | None | None | None | None | None | None | True
Jamming Attack | None | None | None | None | None | None | True
Black Hole Attack | None | None | None | None | None | None | True
Timing Attack | None | None | None | None | None | None | True
Disruptive Attack | None | None | None | None | None | None | True
Replay Attack | None | None | None | None | None | Required | False
Relay Attack | None | None | None | None | None | None | True
Eavesdropping Attack | None | None | None | None | None | None | True
Sybil Attack | None | None | None | None | None | None | True
Blind Spot Exploitation | None | None | None | None | None | None | True
Sensor Interference Attack | None | None | None | None | None | None | True
Acoustic Attack | None | None | None | None | None | None | True
Impersonation Attack | None | None | None | None | None | None | True
Falsified Information Attack | None | None | None | None | None | None | True
Cloaking Attack | None | None | None | None | None | None | True
- Confidentiality: The model accurately predicted confidentiality impacts in 12 out of 15 cases (80.0%). Misclassifications largely occurred between None and Low, while High confidentiality impacts were predicted reliably. These results are summarized in Table 18, which presents the model’s classification performance across confidentiality levels.

Table 18. Confidentiality classification results: LLMs vs. expert labels.
Attack | ChatGPT | DeepSeek | Copilot | Gemini | Hybrid | Expert | Match
---|---|---|---|---|---|---|---
Blinding Attack | None | None | High | None | None | None | True
Jamming Attack | None | Low | Low | None | None | Low | False
Black Hole Attack | None | Low | High | None | None | High | False
Timing Attack | High | Low | High | High | High | High | True
Disruptive Attack | None | None | Low | None | None | None | True
Replay Attack | None | High | High | None | High | High | True
Relay Attack | None | High | High | Low | High | High | True
Eavesdropping Attack | High | High | High | High | High | High | True
Sybil Attack | None | High | High | High | High | High | True
Blind Spot Exploitation | None | None | Low | None | None | None | True
Sensor Interference Attack | None | None | Low | None | None | None | True
Acoustic Attack | None | None | None | Low | None | None | True
Impersonation Attack | None | High | High | High | High | High | True
Falsified Information Attack | None | High | High | High | High | None | False
Cloaking Attack | None | High | High | High | High | High | True
- Integrity: With 10 matches out of 15 (66.7%), the hybrid model showed strong performance in recognizing High integrity impacts. However, predictions involving None and Low were more error-prone, similar to the confidentiality results. The detailed results are presented in Table 19, highlighting the model’s effectiveness in identifying high-impact cases.

Table 19. Integrity classification results: LLMs vs. expert labels.
Attack | ChatGPT | DeepSeek | Copilot | Gemini | Hybrid | Expert | Match
---|---|---|---|---|---|---|---
Blinding Attack | High | High | High | High | High | Low | False
Jamming Attack | None | Low | Low | High | Low | High | False
Black Hole Attack | Low | High | High | High | High | High | True
Timing Attack | None | High | High | High | High | High | True
Disruptive Attack | None | Low | Low | High | Low | High | False
Replay Attack | High | High | High | High | High | High | True
Relay Attack | High | High | High | High | High | High | True
Eavesdropping Attack | None | None | None | None | None | High | False
Sybil Attack | High | High | High | High | High | High | True
Blind Spot Exploitation | High | High | High | High | High | None | False
Sensor Interference Attack | High | High | High | High | High | None | False
Acoustic Attack | Low | High | None | High | High | High | True
Impersonation Attack | High | High | High | High | High | High | True
Falsified Information Attack | High | High | High | High | High | High | True
Cloaking Attack | High | High | High | High | High | High | True
- Availability: This metric showed strong performance with 13 out of 15 matches (86.7%), and errors were limited to cases where the expert labeled the impact as Low or None. High availability impact cases were robustly captured. These results are detailed in Table 20, illustrating the model’s accuracy in high-severity scenarios.

Table 20. Availability classification results: LLMs vs. expert labels.
Attack | ChatGPT | DeepSeek | Copilot | Gemini | Hybrid | Expert | Match
---|---|---|---|---|---|---|---
Blinding Attack | Low | High | High | High | High | High | True
Jamming Attack | High | High | High | High | High | High | True
Black Hole Attack | High | High | High | High | High | Low | False
Timing Attack | None | High | Low | High | High | Low | False
Disruptive Attack | High | High | High | High | High | High | True
Replay Attack | None | Low | High | High | High | High | True
Relay Attack | None | Low | High | High | High | None | False
Eavesdropping Attack | None | None | None | None | None | None | True
Sybil Attack | High | Low | High | High | High | High | True
Blind Spot Exploitation | Low | High | High | High | High | High | True
Sensor Interference Attack | High | High | High | High | High | High | True
Acoustic Attack | High | High | High | High | High | High | True
Impersonation Attack | None | Low | High | High | High | High | True
Falsified Information Attack | High | Low | High | High | High | High | True
Cloaking Attack | None | Low | High | High | High | High | True
- Discussion and Conclusion: As shown in Figure 8 and Table 21, the hybrid AI model demonstrates promising performance in automating CVSS scoring, particularly for impact-related metrics such as Availability, Confidentiality, and User Interaction. However, performance drops for more context-dependent attributes such as Privileges Required, suggesting limits to how well LLMs interpret implicit assumptions and access conditions. The analysis of metric-level prediction accuracy reveals key strengths and weaknesses of the current CVSS component prediction model. The model performs strongly on User Interaction (93.3%), Availability (86.7%), and Confidentiality (80.0%), suggesting a solid grasp of direct impacts on system behavior and user involvement. Noticeable gaps emerge, however, for Privileges Required (46.7%), Attack Complexity (60.0%), and Attack Requirements (60.0%), indicating difficulty in assessing the contextual, precondition-based factors that influence exploitability. These components likely require more nuanced data or refined logic in the prediction algorithm.
While the overall average accuracy stands at a reasonable 70%, the variation across metrics highlights the need for targeted improvements, particularly in privilege assessment and attack path analysis, to enhance the consistency and reliability of automated CVSS scoring tools. The consistent performance across several metrics supports the feasibility of hybrid AI models in vulnerability triage. Yet certain metrics, especially those requiring nuanced human interpretation, still necessitate expert validation. Future improvements could focus on prompt engineering, fine-tuned model training with contextualized examples, and the integration of more domain-specific rules.

Figure 8. Accuracy by CVSS metric.

Table 21. Summary of hybrid model prediction accuracy per CVSS base metric.
Metric | Correct Predictions | Total Cases | Accuracy (%)
---|---|---|---
Attack Vector | 10 | 15 | 66.7
Attack Complexity | 9 | 15 | 60.0
Attack Requirements | 9 | 15 | 60.0
Privileges Required | 7 | 15 | 46.7
User Interaction | 14 | 15 | 93.3
Confidentiality | 12 | 15 | 80.0
Integrity | 10 | 15 | 66.7
Availability | 13 | 15 | 86.7
Overall Average | 84 | 120 | 70.0
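The figures in Table 21 follow directly from the match counts in the per-metric comparison tables above; the short Python sketch below simply recomputes them from those counts (15 attacks per metric, correct-prediction counts copied from the table):

```python
# Recompute the per-metric and overall accuracies reported in Table 21.
# Correct-prediction counts are taken from the table; each metric was
# evaluated on the same 15 attack types.
CORRECT = {
    "Attack Vector": 10, "Attack Complexity": 9, "Attack Requirements": 9,
    "Privileges Required": 7, "User Interaction": 14, "Confidentiality": 12,
    "Integrity": 10, "Availability": 13,
}
TOTAL_PER_METRIC = 15

def accuracy_table(correct: dict, total: int) -> dict:
    """Per-metric accuracy in percent, rounded to one decimal place."""
    return {metric: round(100 * c / total, 1) for metric, c in correct.items()}

per_metric = accuracy_table(CORRECT, TOTAL_PER_METRIC)
# Overall average = total correct predictions / total cases (84 / 120).
overall = round(100 * sum(CORRECT.values()) / (TOTAL_PER_METRIC * len(CORRECT)), 1)
```

Running this reproduces, for example, 93.3% for User Interaction and the 70.0% overall average quoted in the discussion.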
6.6. Limitations
7. Conclusions and Future Work
- Domain-Specific Fine-Tuning of AI Models: Current generative AI models lack training on automotive cybersecurity datasets. Future work should involve fine-tuning LLMs using domain-specific corpora, including CVEs, threat intelligence feeds, and technical whitepapers related to vehicular sensor systems.
- Integration of Real-Time Threat Intelligence: Enhancing the model with dynamic threat feeds and anomaly detection tools could improve responsiveness to emerging attack vectors and zero-day vulnerabilities.
- Multi-Modal Vulnerability Analysis: Future iterations should consider incorporating additional data modalities (e.g., sensor logs and CAN bus traffic) for deeper behavioral analysis and richer context in vulnerability assessment.
- Improved Scoring Mechanisms for Physical Attacks: Since physical-layer threats are currently underweighted by CVSS and misclassified by AI models, research should aim to augment or revise scoring criteria to better capture their practical implications in vehicular environments.
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- de Oliveira, V.F.; Matiolli, G.; Júnior, C.J.B.; Gaspar, R.; Lins, R.G. Digital twin and cyber-physical system integration in commercial vehicles: Latest concepts, challenges and opportunities. IEEE Trans. Intell. Veh. 2024, 9, 4804–4819. [Google Scholar] [CrossRef]
- Kennedy, J.; Holt, T.; Cheng, B. Automotive cybersecurity: Assessing a new platform for cybercrime and malicious hacking. J. Crime Justice 2019, 42, 632–645. [Google Scholar] [CrossRef]
- Raja, K.; Theerthagiri, S.; Swaminathan, S.V.; Suresh, S.; Raja, G. Harnessing Generative Modeling and Autoencoders Against Adversarial Threats in Autonomous Vehicles. IEEE Trans. Consum. Electron. 2024, 70, 6216–6223. [Google Scholar] [CrossRef]
- El-Rewini, Z.; Sadatsharan, K.; Sugunaraj, N.; Selvaraj, D.F.; Plathottam, S.J.; Ranganathan, P. Cybersecurity attacks in vehicular sensors. IEEE Sens. J. 2020, 20, 13752–13767. [Google Scholar] [CrossRef]
- Lautenbach, A.; Almgren, M.; Olovsson, T. Proposing HEAVENS 2.0—An Automotive Risk Assessment Model. In Proceedings of the 5th ACM Computer Science in Cars Symposium (CSCS ’21), Ingolstadt, Germany, 30 November 2021; pp. 1–6. [Google Scholar]
- Ward, D.; Wooderson, P. Automotive Cybersecurity: An Introduction to ISO/SAE 21434; SAE: Warrendale, PA, USA, 2021. [Google Scholar]
- Intel Corporation. Threat Analysis and Risk Assessment (TARA) Methodology; Intel White Paper; SAE International: Warrendale, PA, USA, 2015; Volume 1, pp. 1–20. [Google Scholar]
- Wang, Y.; Wang, Y.; Qin, H.; Ji, H.; Zhang, Y.; Wang, J. A Systematic Risk Assessment Framework for Automotive Cybersecurity. Automot. Innov. 2021, 4, 374–386. [Google Scholar] [CrossRef]
- Macher, G.; Schmittner, C.; Veledar, O.; Brenner, E. ISO/SAE DIS 21434 Automotive Cybersecurity Standard—In a Nutshell. In Computer Safety, Reliability, and Security, Proceedings of the SAFECOMP 2020 Workshops, Lisbon, Portugal, 15 September 2020; Casimiro, A., Ortmeier, F., Schoitsch, E., Bitsch, F., Ferreira, P., Eds.; Springer: Cham, Switzerland, 2020; pp. 123–135. [Google Scholar]
- Abouelnaga, M.; Jakobs, C. Security Risk Analysis Methodologies for Automotive Systems. arXiv 2023, arXiv:2307.02261. [Google Scholar]
- Sun, X.; Yu, F.R.; Zhang, P. A survey on cyber-security of connected and autonomous vehicles (CAVs). IEEE Trans. Intell. Transp. Syst. 2021, 23, 6240–6259. [Google Scholar] [CrossRef]
- Chowdhury, A.; Karmakar, G.; Kamruzzaman, J.; Jolfaei, A.; Das, R. Attacks on self-driving cars and their countermeasures: A survey. IEEE Access 2020, 8, 207308–207342. [Google Scholar] [CrossRef]
- Yassin, A.M.; Aslan, H.K.; Abdel Halim, I.T. Smart automotive diagnostic and performance analysis using blockchain technology. J. Sens. Actuator Netw. 2023, 12, 32. [Google Scholar] [CrossRef]
- Mell, P.; Scarfone, K.; Romanosky, S. Common vulnerability scoring system. IEEE Secur. Priv. 2006, 4, 85–89. [Google Scholar] [CrossRef]
- Beyrouti, M.; Lounis, A.; Lussier, B.; Bouabdallah, A.; Samhat, A.E. Vulnerability-oriented risk identification framework for IoT risk assessment. Internet Things 2024, 27, 101333. [Google Scholar] [CrossRef]
- Ur-Rehman, A.; Gondal, I.; Kamruzzaman, J.; Jolfaei, A. Vulnerability modelling for hybrid industrial control system networks. J. Grid Comput. 2020, 18, 863–878. [Google Scholar] [CrossRef]
- Kim, H.; Kim, D. Methodological Advancements in Standardizing Blockchain Assessment. IEEE Access 2024, 12, 35552–35570. [Google Scholar] [CrossRef]
- Figueroa-Lorenzo, S.; Añorga, J.; Arrizabalaga, S. A survey of IIoT protocols: A measure of vulnerability risk analysis based on CVSS. ACM Comput. Surv. (CSUR) 2020, 53, 44. [Google Scholar] [CrossRef]
- Zhang, Z.; Kumar, V.; Pfahringer, B.; Bifet, A. Ai-enabled automated common vulnerability scoring from common vulnerabilities and exposures descriptions. Int. J. Inf. Secur. 2025, 24, 16. [Google Scholar] [CrossRef]
- Kuhn, P.; Relke, D.N.; Reuter, C. Common vulnerability scoring system prediction based on open source intelligence information sources. Comput. Secur. 2023, 131, 286–298. [Google Scholar] [CrossRef]
- Hilario, E.; Azam, S.; Sundaram, J.; Imran Mohammed, K.; Shanmugam, B. Generative AI for pentesting: The good, the bad, the ugly. Int. J. Inf. Secur. 2024, 23, 2075–2097. [Google Scholar] [CrossRef]
- Mirtaheri, S.L.; Pugliese, A. Leveraging Generative AI to Enhance Automated Vulnerability Scoring. In Proceedings of the 2024 IEEE Conference on Dependable, Autonomic and Secure Computing (DASC), Boracay Island, Philippines, 5–8 November 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 57–64. [Google Scholar]
- Ferrara, E. Fairness and Bias in Artificial Intelligence: A Brief Survey of Sources, Impacts, and Mitigation Strategies. arXiv 2023, arXiv:2304.07683. [Google Scholar]
- Islam, T.; Sheakh, M.A.; Jui, A.N.; Sharif, O.; Hasan, M.Z. A review of cyber attacks on sensors and perception systems in autonomous vehicle. J. Econ. Technol. 2023, 1, 242–258. [Google Scholar] [CrossRef]
- Kaur, U.; Mahajan, A.N.; Kumar, S.; Dutta, K. Security Vulnerabilities in VANETs and SDN-based VANETS: A Study of Attacks. Int. J. Comput. Networks Appl. (IJCNA) 2024, 11, 774–802. [Google Scholar] [CrossRef]
- Mudhivarthi, B.R.; Thakur, P.; Singh, G. Aspects of cyber security in autonomous and connected vehicles. Appl. Sci. 2023, 13, 3014. [Google Scholar] [CrossRef]
- Gupta, M.; Akiri, C.; Aryal, K.; Parker, E.; Praharaj, L. From chatgpt to threatgpt: Impact of generative ai in cybersecurity and privacy. IEEE Access 2023, 11, 80218–80245. [Google Scholar] [CrossRef]
- Worrell, J.L. A Survey of the Current and Emerging Ransomware Threat Landscape. EDP Audit. Control. Secur. Newsl. 2024, 69, 1–11. [Google Scholar] [CrossRef]
- Khadka, K. Forecasting Risks, Challenges, and Innovations: A Cybersecurity Perspective. SSRN Electron. J. 2024. [Google Scholar] [CrossRef]
- Saddi, V.R.; Gopal, S.K.; Mohammed, A.S.; Dhanasekaran, S.; Naruka, M.S. Examine the role of generative AI in enhancing threat intelligence and cyber security measures. In Proceedings of the 2024 2nd International Conference on Disruptive Technologies (ICDT), Greater Noida, India, 15–16 March 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 537–542. [Google Scholar]
- Alauthman, M.; Almomani, A.; Aoudi, S.; Al-Qerem, A.; Aldweesh, A. Automated Vulnerability Discovery Generative AI in Offensive Security. In Examining Cybersecurity Risks Produced by Generative AI; IGI Global Scientific Publishing: Hershey, PA, USA, 2025; pp. 309–328. [Google Scholar]
- Sai, S.; Yashvardhan, U.; Chamola, V.; Sikdar, B. Generative ai for cyber security: Analyzing the potential of chatgpt, dall-e and other models for enhancing the security space. IEEE Access 2024, 12, 53497–53516. [Google Scholar] [CrossRef]
- Vadisetty, R.; Polamarasetti, A.; Prajapati, S.; Butani, J.B. Leveraging Generative AI for Automated Code Generation and Security Compliance in Cloud-Based DevOps Pipelines. SSRN Electron. J. 2023, 31, 1–11. [Google Scholar] [CrossRef]
- Grigorev, A.; Saleh, A.S.M.K.; Ou, Y. IncidentResponseGPT: Generating traffic incident response plans with generative artificial intelligence. arXiv 2024, arXiv:2404.18550. [Google Scholar]
- Kallonas, C.; Piki, A.; Stavrou, E. Empowering professionals: A generative AI approach to personalized cybersecurity learning. In Proceedings of the 2024 IEEE global engineering education conference (EDUCON), Kos Island, Greece, 8–11 May 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–10. [Google Scholar]
- MITRE Corporation. CVE Program. Available online: https://www.cve.org/ (accessed on 6 July 2025).
- National Institute of Standards and Technology (NIST). National Vulnerability Database (NVD). Available online: https://nvd.nist.gov/ (accessed on 6 July 2025).
- Biju, A.; Ramesh, V.; Madisetti, V.K. Security Vulnerability Analyses of Large Language Models (LLMs) through Extension of the Common Vulnerability Scoring System (CVSS) Framework. J. Softw. Eng. Appl. 2024, 17, 340–358. [Google Scholar] [CrossRef]
- Çaylı, O. AI-Enhanced Cybersecurity Vulnerability-Based Prevention, Defense, and Mitigation using Generative AI. Orclever Proc. Res. Dev. 2024, 5, 655–667. [Google Scholar] [CrossRef]
- Sharma, P.; Gillanders, J. Cybersecurity and forensics in connected autonomous vehicles: A review of the state-of-the-art. IEEE Access 2022, 10, 108979–108996. [Google Scholar] [CrossRef]
- Dong, C.; Chen, Y.; Wang, H.; Wang, L.; Li, Y.; Ni, D.; Zhao, D.; Hua, X. Evaluating impact of remote-access cyber-attack on lane changes for connected automated vehicles. Digit. Commun. Netw. 2024, 10, 1480–1492. [Google Scholar] [CrossRef]
- Raveling, A.; Qu, Y. Quantifying the Effects of Operational Technology or Industrial Control System based Cybersecurity Controls via CVSS Scoring. Eur. J. Electr. Eng. Comput. Sci. 2023, 7, 1–6. [Google Scholar] [CrossRef]
- Jakobsen, S.B.; Knudsen, K.S.; Andersen, B. Analysis of sensor attacks against autonomous vehicles. In Proceedings of the 8th International Conference on Internet of Things, Big Data and Security (IoTBDS), Prague, Czech Republic, 21–23 April 2023; SciTePress—Science and Technology Publications: Setúbal, Portugal, 2023; pp. 131–139. [Google Scholar] [CrossRef]
- Ansariyar, A. Investigating the Attacks and Defensive Mechanisms on Connected Vehicles (CVs). SSRN Electron. J. 2023. [Google Scholar] [CrossRef]
- Gupta, S.; Maple, C.; Passerone, R. An investigation of cyber-attacks and security mechanisms for connected and autonomous vehicles. IEEE Access 2023, 11, 90641–90669. [Google Scholar] [CrossRef]
- Hossain, S.M.; Banik, S.; Banik, T.; Shibli, A.M. Survey on Security Attacks in Connected and Autonomous Vehicular Systems. In Proceedings of the 2023 IEEE International Conference on Computing (ICOCO), Langkawi, Malaysia, 9–12 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 295–300. [Google Scholar]
- Andah, M.E.O. A Survey on Cyber Security Issues of Autonomous Vehicles; Technical Report; Carleton University: Ottawa, ON, Canada, 2021. [Google Scholar]
- Jadoon, A.K.; Wang, L.; Li, T.; Zia, M.A. Lightweight cryptographic techniques for automotive cybersecurity. Wirel. Commun. Mob. Comput. 2018, 2018, 1640167. [Google Scholar] [CrossRef]
- Bagga, P.; Das, A.K.; Wazid, M.; Rodrigues, J.J.; Park, Y. Authentication protocols in internet of vehicles: Taxonomy, analysis, and challenges. IEEE Access 2020, 8, 54314–54344. [Google Scholar] [CrossRef]
Attack Type | AV | AC | AT | PR | UI | C | I | A | Avg. CVSS Score (Std. Dev) |
---|---|---|---|---|---|---|---|---|---|
Blinding Attack | P(50%) N(20%) A(20%) L(10%) | L(70%) H(30%) | N(80%) P(20%) | N(80%) L(10%) H(10%) | N(70%) R(30%) | N(50%) L(40%) H(10%) | N(20%) L(60%) H(20%) | N(10%) L(30%) H(60%) | 5.09 (1.5) |
Jamming Attack | P(30%) N(20%) A(50%) L(10%) | L(70%) H(30%) | N(70%) P(30%) | N(60%) L(30%) H(10%) | N(90%) R(10%) | N(90%) L(10%) H(0%) | N(30%) L(50%) H(20%) | N(0%) L(40%) H(60%) | 5.06 (1.2) |
Black Hole Attack | P(0%) N(90%) A(10%) L(0%) | L(80%) H(20%) | N(70%) P(30%) | N(20%) L(50%) H(30%) | N(60%) R(40%) | N(30%) L(50%) H(20%) | N(30%) L(20%) H(50%) | N(0%) L(40%) H(60%) | 6.0 (1.8) |
Timing Attack | P(10%) N(60%) A(10%) L(20%) | H(80%) L(20%) | P(70%) N(30%) | N(10%) L(60%) H(30%) | N(70%) R(30%) | N(10%) L(40%) H(50%) | N(10%) L(40%) H(50%) | N(10%) L(50%) H(40%) | 5.72 (1.6) |
Disruptive Attack | P(70%) N(10%) A(10%) L(10%) | L(60%) H(40%) | N(60%) P(40%) | N(40%) L(30%) H(30%) | N(60%) R(40%) | N(50%) L(20%) H(30%) | N(30%) L(10%) H(60%) | N(0%) L(10%) H(90%) | 4.66 (1.9) |
Replay Attack | P(0%) N(70%) A(30%) L(0%) | L(20%) H(80%) | N(80%) P(20%) | N(20%) L(10%) H(70%) | N(30%) R(70%) | N(10%) L(10%) H(80%) | N(0%) L(10%) H(90%) | N(30%) L(0%) H(70%) | 6.35 (1.7) |
Relay Attack | P(0%) N(30%) A(60%) L(10%) | L(30%) H(70%) | N(20%) P(80%) | N(20%) L(30%) H(50%) | N(60%) R(40%) | N(0%) L(40%) H(60%) | N(20%) L(20%) H(60%) | N(60%) L(20%) H(20%) | 5.43 (1.4) |
Eavesdropping Attack | P(0%) N(60%) A(10%) L(30%) | L(90%) H(10%) | N(60%) P(40%) | N(50%) L(30%) H(20%) | N(80%) R(20%) | N(0%) L(20%) H(80%) | N(40%) L(10%) H(50%) | N(60%) L(0%) H(40%) | 7.19 (1.3) |
Sybil Attack | P(0%) N(70%) A(30%) L(0%) | L(10%) H(90%) | N(10%) P(90%) | N(40%) L(0%) H(60%) | N(60%) R(40%) | N(20%) L(20%) H(60%) | N(0%) L(20%) H(80%) | N(10%) L(20%) H(70%) | 6.76 (1.6) |
Blind Spot Exploitation | P(50%) N(10%) A(40%) L(0%) | L(70%) H(30%) | N(60%) P(40%) | N(70%) L(10%) H(20%) | N(80%) R(20%) | N(80%) L(10%) H(10%) | N(50%) L(20%) H(30%) | N(10%) L(40%) H(50%) | 4.4 (1.1) |
Acoustic Attack | P(20%) N(10%) A(70%) L(0%) | L(80%) H(20%) | N(70%) P(30%) | N(90%) L(10%) H(0%) | N(90%) R(10%) | N(70%) L(30%) H(0%) | N(20%) L(60%) H(20%) | N(20%) L(30%) H(50%) | 5.03 (1.5) |
Sensor Interference Attack | P(70%) N(0%) A(20%) L(10%) | L(20%) H(80%) | N(60%) P(40%) | N(60%) L(20%) H(20%) | N(80%) R(20%) | N(50%) L(30%) H(20%) | N(10%) L(30%) H(60%) | N(30%) L(30%) H(40%) | 4.06 (1.7) |
Impersonation Attack | P(10%) N(40%) A(30%) L(20%) | L(20%) H(80%) | N(10%) P(90%) | N(60%) L(10%) H(30%) | N(60%) R(40%) | N(20%) L(30%) H(50%) | N(10%) L(30%) H(60%) | N(30%) L(30%) H(40%) | 5.41 (1.4) |
Falsified Information Attack | P(10%) N(50%) A(40%) L(0%) | L(20%) H(80%) | N(30%) P(70%) | N(40%) L(10%) H(50%) | N(60%) R(40%) | N(50%) L(10%) H(40%) | N(10%) L(10%) H(80%) | N(20%) L(30%) H(50%) | 6.22 (1.8) |
Cloaking Attack | P(10%) N(30%) A(20%) L(40%) | L(10%) H(90%) | N(10%) P(90%) | N(20%) L(30%) H(50%) | N(80%) R(20%) | N(30%) L(0%) H(70%) | N(0%) L(10%) H(90%) | N(20%) L(30%) H(50%) | 6.66 (1.5) |
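One way to condense each per-attack distribution above into a single CVSS vector is to take, for every metric, the value with the largest share of expert responses. The sketch below illustrates only that collapsing step; `modal_value` is a hypothetical helper name, and ties between equal shares would need an explicit policy of their own:

```python
# Collapse one surveyed cell of the distribution table to its modal value.
# Example cell: Attack Vector for the Blinding Attack,
# reported as P(50%) N(20%) A(20%) L(10%).
def modal_value(shares: dict) -> str:
    """Return the metric value receiving the largest share of responses."""
    return max(shares, key=shares.get)

blinding_av = {"P": 50, "N": 20, "A": 20, "L": 10}
mode = modal_value(blinding_av)  # "P" (Physical)
```

Applying the same reduction per column yields a single point-estimate vector per attack, at the cost of discarding the disagreement that the standard deviations in the table capture.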
Attack Type | AV | AC | AT | PR | UI | C | I | A | Avg. CVSS Score |
---|---|---|---|---|---|---|---|---|---|
Blinding Attack | Physical | Low | None | None | None | None | High | Low | 5.2 |
Jamming Attack | Network | Low | None | None | None | None | None | High | 8.7 |
Black Hole Attack | Adjacent | Low | Present | Low | None | None | Low | High | 5.9
Timing Attack | Network | High | Present | None | None | High | None | None | 8.2
Disruptive Attack | Network | Low | None | None | None | None | None | High | 8.7
Replay Attack | Network | Low | None | None | None | None | High | None | 8.7
Relay Attack | Physical | Low | None | None | None | None | High | None | 5.1
Eavesdropping Attack | Network | Low | None | None | None | High | None | None | 8.7
Sybil Attack | Network | Low | Present | None | None | None | High | High | 8.3
Blind Spot Exploitation | Physical | Low | Present | None | None | None | High | Low | 4.3
Sensor Interference Attack | Network | Low | Present | None | None | None | High | High | 8.3
Acoustic Attack | Physical | High | Present | None | None | None | Low | High | 4.3
Impersonation Attack | Network | Low | None | Low | None | None | High | None | 7.1
Falsified Information Attack | Network | Low | None | None | None | None | High | High | 8.8
Cloaking Attack | Network | High | Present | High | None | None | High | None | 5.9
Attack Type | AV | AC | AT | PR | UI | C | I | A | Avg. CVSS Score |
---|---|---|---|---|---|---|---|---|---|
Blinding Attack | Physical | Low | Present | None | None | None | High | High | 4.4 |
Jamming Attack | Network | Low | Present | None | None | Low | Low | High | 8.3 |
Black Hole Attack | Network | Low | Present | None | None | Low | High | High | 8.4 |
Timing Attack | Network | High | Present | Low | None | Low | High | High | 6.1 |
Disruptive Attack | Network | Low | None | None | None | None | Low | High | 8.8 |
Replay Attack | Network | Low | Present | None | None | High | High | Low | 9.2 |
Relay Attack | Network | Low | Present | None | None | High | High | Low | 9.2 |
Eavesdropping Attack | Network | Low | Present | None | None | High | None | None | 8.2 |
Sybil Attack | Network | High | Present | None | None | High | High | Low | 9.2 |
Blind Spot Exploitation | Physical | Low | Present | None | None | None | High | High | 4.4 |
Sensor Interference Attack | Physical | Low | Present | None | None | None | High | High | 4.4 |
Acoustic Attack | Physical | High | Present | None | None | None | High | High | 4.4 |
Impersonation Attack | Network | High | Present | Low | None | High | High | Low | 9.2 |
Falsified Information Attack | Network | High | Present | Low | None | High | High | Low | 7.6 |
Cloaking Attack | Network | High | Present | None | None | High | High | Low | 9.2 |
Attack Type | AV | AC | AT | PR | UI | C | I | A | Avg. CVSS Score |
---|---|---|---|---|---|---|---|---|---|
Blinding Attack | Physical | High | Present | None | None | High | High | High | 5.4 |
Jamming Attack | Adjacent | Low | Present | None | None | Low | Low | High | 6.1 |
Black Hole Attack | Network | Low | Present | None | None | High | High | High | 9.2 |
Timing Attack | Local | High | Present | High | None | High | High | Low | 7.1 |
Disruptive Attack | Physical | High | Present | None | None | Low | Low | High | 4.4 |
Replay Attack | Network | Low | Present | None | None | High | High | High | 9.2 |
Relay Attack | Adjacent | Low | Present | None | None | High | High | High | 7.7 |
Eavesdropping Attack | Adjacent | Low | None | None | None | High | None | None | 7.1 |
Sybil Attack | Network | High | Present | Low | None | High | High | High | 7.7 |
Blind Spot Exploitation | Physical | High | Present | None | None | Low | High | High | 4.6 |
Sensor Interference Attack | Physical | High | Present | None | None | Low | High | High | 4.6 |
Acoustic Attack | Physical | High | Present | None | None | None | None | High | 4.1 |
Impersonation Attack | Network | High | Present | Low | None | High | High | High | 7.7 |
Falsified Information Attack | Network | High | Present | None | None | High | High | High | 9.2 |
Cloaking Attack | Network | High | Present | None | None | High | High | High | 9.2 |
Attack Type | AV | AC | AT | PR | UI | C | I | A | Avg. CVSS Score |
---|---|---|---|---|---|---|---|---|---|
Blinding Attack | Network | High | Present | None | None | None | High | High | 8.3 |
Jamming Attack | Network | Low | Present | None | None | None | High | High | 8.3 |
Black Hole Attack | Network | High | Present | Low | None | None | High | High | 6.1 |
Timing Attack | Network | High | Present | Low | None | High | High | High | 7.7 |
Disruptive Attack | Network | High | Present | Low | None | None | High | High | 6.1 |
Replay Attack | Network | High | Present | None | None | None | High | High | 8.3 |
Relay Attack | Network | High | Present | None | None | Low | High | High | 8.4 |
Eavesdropping Attack | Network | Low | Present | None | None | High | None | None | 8.2 |
Sybil Attack | Network | High | Present | Low | None | High | High | High | 7.7 |
Blind Spot Exploitation | Local | Low | Present | None | None | None | High | High | 5.9 |
Sensor Interference Attack | Network | High | Present | None | None | None | High | High | 8.3 |
Acoustic Attack | Local | High | Present | None | None | Low | High | High | 6.0 |
Impersonation Attack | Network | High | Present | Low | None | High | High | High | 7.7 |
Falsified Information Attack | Network | High | Present | Low | None | High | High | High | 7.7 |
Cloaking Attack | Network | High | Present | Low | None | High | High | High | 7.7
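The Hybrid column in the earlier per-metric comparison tables aggregates the four per-model vectors shown above. A simple majority vote over the model outputs is one plausible aggregation rule and is sketched below for illustration; it is not claimed to be the paper's exact combination logic (which also folds in expert-driven knowledge), and the example votes are hypothetical:

```python
from collections import Counter

def majority_vote(predictions: list) -> str:
    """Most common label among the model outputs.

    Ties fall back to the label encountered first, since Counter preserves
    insertion order; a production rule would need an explicit tie policy.
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical per-model Attack Vector votes (ChatGPT, DeepSeek, Copilot, Gemini).
votes = ["Network", "Network", "Adjacent", "Network"]
consensus = majority_vote(votes)  # "Network"
```

Under this rule a metric value only enters the hybrid vector when at least two of the four models agree on it, which matches the behavior visible in most rows of the comparison tables.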
Metric | Class | Precision | Recall | F1 Score | Support | Accuracy |
---|---|---|---|---|---|---|
Attack Vector | NETWORK | 89.13% | 84.54% | 86.77% | 97 | 80.62% |
LOCAL | 66.67% | 74.07% | 70.18% | 27 | ||
ADJACENT | 28.57% | 40.00% | 33.33% | 5 | ||
Attack Complexity | LOW | 90.55% | 95.83% | 93.12% | 120 | 87.12% |
HIGH | 0% | 0% | 0% | 12 | ||
Scope | CHANGED | 60.00% | 30.00% | 40.00% | 20 | 85.25% |
UNCHANGED | 87.50% | 96.08% | 91.59% | 102 | ||
User Interaction | REQUIRED | 66.67% | 40.00% | 50.00% | 30 | 81.82% |
NONE | 84.21% | 94.12% | 88.89% | 102 | ||
Privilege Required | NONE | 78.13% | 83.33% | 80.65% | 60 | 69.75% |
LOW | 66.67% | 66.67% | 66.67% | 45 | ||
HIGH | 30.00% | 21.43% | 24.99% | 14 | ||
Confidentiality | NONE | 72.92% | 70.00% | 71.43% | 50 | 59.83% |
LOW | 53.19% | 62.50% | 57.47% | 40 | ||
HIGH | 45.45% | 37.04% | 40.82% | 27 | ||
Integrity | NONE | 69.77% | 66.67% | 68.18% | 45 | 59.28% |
LOW | 56.00% | 62.22% | 58.95% | 45 | ||
HIGH | 45.00% | 39.13% | 41.86% | 23 | ||
Availability | NONE | 62.86% | 62.86% | 62.86% | 35 | 51.61% |
LOW | 51.43% | 51.43% | 51.43% | 35 | ||
HIGH | 34.78% | 34.78% | 34.78% | 23 |
Metric | Class | Precision | Recall | F1 Score | Accuracy |
---|---|---|---|---|---|
Attack Vector | NETWORK | 98.15% | 100.00% | 99.07% | 98.29% |
LOCAL | 100.00% | 81.82% | 89.92% | ||
Attack Complexity | LOW | 93.91% | 98.18% | 96.00% | 92.31% |
HIGH | 0% | 0% | 0% | ||
Scope | CHANGED | 62.50% | 38.46% | 47.62% | 90.60% |
UNCHANGED | 92.66% | 97.12% | 94.81% | ||
User Interaction | REQUIRED | 75.00% | 44.44% | 55.81% | 83.76% |
NONE | 85.15% | 95.56% | 90.00% | ||
Privilege Required | NONE | 90.00% | 93.75% | 91.84% | 84.21% |
LOW | 80.65% | 75.76% | 78.13% | ||
HIGH | 71.43% | 71.43% | 71.43% | ||
Confidentiality | NONE | 83.33% | 85.71% | 84.51% | 75.58% |
LOW | 68.97% | 68.97% | 68.97% | ||
HIGH | 71.43% | 68.18% | 69.77% | ||
Integrity | NONE | 83.33% | 83.33% | 83.33% | 73.17% |
LOW | 66.67% | 73.33% | 69.84% | ||
HIGH | 68.42% | 59.09% | 63.41% | ||
Availability | NONE | 72.00% | 78.26% | 75.00% | 65.15% |
LOW | 65.22% | 60.00% | 62.50% | ||
HIGH | 52.63% | 55.56% | 54.05% |
Metric | Class | Precision | Recall | F1 Score | Accuracy |
---|---|---|---|---|---|
Attack Vector | NETWORK | 97.14% | 98.08% | 97.60% | 95.73% |
LOCAL | 83.33% | 76.92% | 80.00% | ||
Attack Complexity | LOW | 94.83% | 95.65% | 95.24% | 90.98% |
HIGH | 16.67% | 14.29% | 15.38% | ||
Scope | CHANGED | 61.54% | 40.00% | 48.48% | 86.07% |
UNCHANGED | 89.00% | 95.10% | 91.95% | ||
User Interaction | REQUIRED | 82.35% | 43.75% | 57.14% | 82.50% |
NONE | 82.52% | 96.59% | 89.01% | ||
Privilege Required | NONE | 87.72% | 90.91% | 89.29% | 80.37% |
LOW | 77.78% | 73.68% | 75.68% | ||
HIGH | 57.14% | 57.14% | 57.14% | ||
Confidentiality | NONE | 83.33% | 87.50% | 85.37% | 77.55% |
LOW | 69.44% | 75.76% | 72.46% | ||
HIGH | 80.00% | 64.00% | 71.11% | ||
Integrity | NONE | 80.00% | 80.00% | 80.00% | 71.43% |
LOW | 69.77% | 75.00% | 72.29% | ||
HIGH | 60.00% | 52.17% | 55.81% | ||
Availability | NONE | 68.97% | 74.07% | 71.43% | 63.75% |
LOW | 66.67% | 60.00% | 63.16% | ||
HIGH | 54.17% | 56.52% | 55.32% |
Model | Accuracy | Precision (L) | Precision (H) | Recall (L) | Recall (H) | F1 Score (L) | F1 Score (H) | Macro F1 |
---|---|---|---|---|---|---|---|---|
TF-IDF + Logistic Regression | 0.821 | 0.882 | 0.400 | 0.909 | 0.333 | 0.896 | 0.364 | 0.630 |
BERT | 0.846 | 0.865 | 0.500 | 0.970 | 0.167 | 0.914 | 0.250 | 0.582 |
USE | 0.795 | 0.879 | 0.333 | 0.879 | 0.333 | 0.879 | 0.333 | 0.606 |
SVM | 0.821 | 0.882 | 0.400 | 0.909 | 0.333 | 0.896 | 0.364 | 0.630 |
BERT_Enhanced | 0.821 | 0.882 | 0.400 | 0.909 | 0.333 | 0.896 | 0.364 | 0.630 |
Model | Accuracy | Precision (L) | Recall (L) | F1 Score (L) | Precision (H) | Recall (H) | F1 Score (H) | Time |
---|---|---|---|---|---|---|---|---|
TF-IDF + Logistic Regression | 0.87 | 0.95 | 0.78 | 0.86 | 0.81 | 0.96 | 0.88 | 1 s |
USE | 0.70 | 0.72 | 0.66 | 0.69 | 0.69 | 0.74 | 0.71 | 211 s |
SVM (TF-IDF) | 0.87 | 0.93 | 0.80 | 0.86 | 0.82 | 0.94 | 0.88 | 1 s |
Best Tuned USE (Threshold = 0.5) | 0.63 | 0.69 | 0.48 | 0.56 | 0.60 | 0.78 | 0.68 | 51 s |
Tuned USE (Custom Threshold) | 0.64 | 1.00 | 0.28 | 0.44 | 0.58 | 1.00 | 0.74 | 1 s |
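The strongest classical baseline in the comparison above couples TF-IDF features with logistic regression. The TF-IDF weighting itself is small enough to sketch in pure Python; this follows scikit-learn's smoothed-IDF convention (`smooth_idf=True`) but omits the length normalization and the classifier on top, and the toy documents are invented for illustration:

```python
import math
from collections import Counter

def tfidf(docs: list) -> list:
    """Raw term frequency weighted by smoothed inverse document frequency."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: log((1 + n) / (1 + df)) + 1, as in TfidfVectorizer.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: c * idf[t] for t, c in Counter(doc).items()} for doc in docs]

# Two toy vulnerability descriptions; terms unique to one document get
# higher weights than terms shared across both.
docs = [
    "remote attacker sends crafted packets".split(),
    "local attacker needs physical access".split(),
]
weights = tfidf(docs)
```

Here the shared term "attacker" receives the minimum weight of 1.0, while document-specific terms such as "remote" score higher, which is what lets the downstream classifier separate severity classes from description wording.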
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Farghaly, M.S.; Aslan, H.K.; Abdel Halim, I.T. A Hybrid Human-AI Model for Enhanced Automated Vulnerability Scoring in Modern Vehicle Sensor Systems. Future Internet 2025, 17, 339. https://doi.org/10.3390/fi17080339