Article

Dynamic Vulnerability Knowledge Graph Construction via Multi-Source Data Fusion and Large Language Model Reasoning

1 School of Cyber Engineering, Xidian University, Xi’an 710126, China
2 Key Laboratory of Cyberspace Security, Zhengzhou 450001, China
3 State Key Laboratory of Integrated Services Networks (ISN), Xi’an 710126, China
4 Yuanjiang Shengbang Safety Technology Group Co., Beijing 100085, China
5 State Grid Jiangxi Electric Power Research Institute, Nanchang 330052, China
* Author to whom correspondence should be addressed.
Electronics 2025, 14(12), 2334; https://doi.org/10.3390/electronics14122334
Submission received: 7 May 2025 / Revised: 29 May 2025 / Accepted: 5 June 2025 / Published: 7 June 2025
(This article belongs to the Special Issue Cryptography and Computer Security)

Abstract

With the increasing number of network security threats and the frequent occurrence of software vulnerability attacks, the effective management and large-scale retrieval of vulnerability data have become urgent needs. Existing vulnerability information is scattered across heterogeneous sources and is difficult to integrate, which in turn makes it hard for security analysts to quickly retrieve and analyze relevant security knowledge. To address this problem, this paper proposes a method to construct a vulnerability knowledge graph by integrating multi-source vulnerability data, combining graph embedding technology with large language model reasoning to aggregate, infer, and enrich vulnerability knowledge. Experiments demonstrated that our domain-tuned Bidirectional Long Short-Term Memory–Conditional Random Field (BiLSTM-CRF) named entity recognition (NER), enhanced with a cybersecurity dictionary, achieved a 90.1% F1-score for entity extraction. For link prediction, a hybrid Graph Attention Network fused with GPT-3 reasoning boosted Hits@1 by 0.137, Hits@3 by 0.116, and Hits@10 by 0.101 over the baseline. These results confirm that our approach markedly enhanced entity identification and relationship inference, yielding a more complete and dynamically updatable cybersecurity knowledge graph.

1. Introduction

Cybersecurity threats have escalated in both frequency and sophistication, rendering the effective management of vulnerability data increasingly critical. Software vulnerabilities—such as Cross-Site Request Forgery (CSRF), Server-Side Request Forgery (SSRF), and SQL injection—pose significant risks by undermining the confidentiality, integrity, and availability of systems. In this context, the efficient collection, integration, and analysis of multi-source vulnerability data have become essential to enable proactive defense and timely threat mitigation. Data analysis [1], causal reasoning [2], semantic understanding [3], and other technologies grounded in knowledge modeling have emerged as critical solutions in the era of big data. This paper focuses on constructing a comprehensive vulnerability knowledge graph that unifies heterogeneous data from multiple sources. Our objective is to overcome the challenges inherent in traditional vulnerability data management approaches and to leverage advanced AI techniques to enhance security analysis and threat prediction.
Current research on multi-source vulnerability data primarily relies on two main methodologies: mining established structured vulnerability databases and extracting knowledge from unstructured security texts. However, existing studies in vulnerability data management still face several limitations:
  • Data Fragmentation and Static Limitations: Traditional methods that depend heavily on established vulnerability databases (such as the National Vulnerability Database (NVD) or the China National Vulnerability Database (CNVD)) suffer from data fragmentation. Information is distributed across multiple databases and formats, making unified integration difficult [4]. Moreover, these structured databases often provide incomplete coverage and fail to capture implicit relationships (e.g., causal or prerequisite links between vulnerabilities). They also tend to remain static, lacking real-time updates or dynamic insights.
  • Challenges in Unstructured Data Extraction: Researchers have used NLP methods, such as Named Entity Recognition (NER) and relation extraction, to mine vulnerabilities from unstructured sources (e.g., bulletins, blogs, threat reports). However, cybersecurity texts contain specialized jargon, non-standard abbreviations, and ambiguous or noisy language, which impedes accurate extraction and normalization. As a result, merging these unstructured insights with structured databases is prone to errors and demands tailored solutions.
  • Heterogeneous Data Fusion and Dynamic Update Issues: Attempts to fuse structured and unstructured data into knowledge graphs—via TransH embeddings or Graph Neural Networks—are hampered by data heterogeneity. Existing security ontologies lack the flexibility to represent the full spectrum of vulnerability information, and even advanced embeddings fail to capture many implicit links. Consequently, most graphs remain static snapshots that do not auto-update with new vulnerabilities or attack patterns, reducing their real-time threat-prediction and analysis utility.
To address these challenges, we propose a multi-faceted approach that constructs a dynamic vulnerability knowledge graph by integrating multi-source data and leveraging state-of-the-art artificial intelligence techniques [5]. The construction of a cybersecurity knowledge graph offers a key advantage: by employing knowledge-graph building and refinement techniques—such as ontology construction [6], information extraction [7,8], and entity disambiguation [9]—the graph can effectively extract and integrate existing knowledge from multi-source vulnerability data. Our solution consists of the following key components:
  • Multi-Source Data Integration: We aggregate vulnerability data from both structured sources (e.g., NVD, CNVD) and unstructured web content, then map everything to a unified cybersecurity ontology built by extending the UCO [10] and IDS frameworks [11]. This ontology standardizes entities (Vulnerability, Weakness, Product, Attack) and relation types. Finally, we apply cleaning, deduplication, and entity normalization to ensure consistent, high-quality integration across all sources.
  • Domain-Specific Knowledge Extraction: For unstructured text, we use a BiLSTM-CRF NER model augmented with a dictionary of over 150 cybersecurity terms to accurately extract CVE identifiers, product and vendor names, and weakness categories. An attention-based CNN then identifies relations such as “affects”, “has weakness”, and “exploited by”, converting free text into structured triples ready for knowledge graph ingestion.
  • Link Prediction via Graph Embedding and LLM Reasoning: Even a high-quality initial knowledge graph can miss important connections, so we employ a two-pronged link prediction strategy. First, we combine TransH-based KG embeddings with BERT-derived textual node vectors to power a text-enhanced Graph Attention Network (GAT) that uncovers structural relationships. Second, we leverage large language models—GPT-3 [12], GPT-4 [13], or open-source alternatives like LLaMA [14]—to generate candidate links from vulnerability descriptions, which are then validated via BERT semantic similarity checks. Finally, a weighted fusion of GAT and LLM confidence scores integrates both structural and semantic evidence, yielding a more complete and reliable cybersecurity knowledge graph [15].
  • Dynamic Graph Updating: Our system is designed to operate as an autonomous agent. It continuously monitors new vulnerability data, updates the knowledge graph by re-running the extraction and prediction pipelines, and ensures that the graph remains current and relevant for ongoing threat analysis.
The main contributions of this paper can be summarized as follows:
  • Integration Framework: We present an ontology-driven integration method that seamlessly fuses structured vulnerability databases and unstructured textual sources into a unified knowledge graph.
  • Enhanced Extraction: We develop a domain-enhanced NER model and an attention-based relation extraction model, which together improve the accuracy and completeness of extracted vulnerability information compared to generic NLP approaches.
  • Hybrid Link Prediction: We introduce a novel combination of graph-based and LLM-based reasoning for link prediction. By fusing a text-enhanced GAT with GPT-3-generated insights, our method can infer missing relationships that neither approach could fully capture alone.
  • Dynamic Updating: We design a system capable of making continuous, autonomous updates to the knowledge graph as new data emerge.
For example, in a real-world enterprise scenario, when a new critical vulnerability is disclosed, our system can automatically gather its details from structured databases and unstructured sources. It then identifies the affected product and vendor, and immediately updates the knowledge graph with a new 〈Product, hasVulnerability, Vulnerability〉 link. This real-time integration alerts security analysts to the emerging threat and its context, demonstrating the applicability of our approach in practical threat monitoring situations.
The remainder of this paper is organized as follows: Section 2 reviews related work on vulnerability databases, security knowledge graphs, and relevant NLP and graph techniques. Section 3 details our methodology, including data integration, knowledge extraction, graph construction, and link inference components. Section 4 presents the experiments and evaluation results for each component of our approach. Section 5 provides a discussion of the results and the system’s capabilities and limitations. Section 6 concludes the paper and outlines future research directions.

2. Related Work

Recent years have witnessed significant advances in knowledge graph construction and information extraction for cybersecurity. Early efforts in this domain focused on developing semantic networks and ontologies to represent security knowledge. At present, common English knowledge bases include YAGO [10], DBpedia [16], Wikidata [17], etc., and Chinese knowledge graphs include OpenKG [18], CN-DBpedia [19], etc. These knowledge bases provide high-quality data for retrieval. The National Vulnerability Database (NVD) and related repositories such as CNVD provide standardized vulnerability records; however, these resources often lack rich contextual relationships. Several works have attempted to build cybersecurity knowledge graphs by integrating multiple data sources. For instance, Jia et al. [20] proposed frameworks that merge security data from vulnerability databases and threat reports using domain ontologies. Similarly, the Apache Metron project demonstrated real-time threat analysis by combining various streaming security data into a unified graph model (incorporating events from intrusion detection systems, logs, etc.) [21].
Security information extraction has evolved from brittle rule-based and CRF methods to deep learning: BiLSTM-CRF with domain-specific dictionaries now leads for NER accuracy, while relation extraction has moved from simple feature-based classifiers to attention-enhanced CNNs and BiLSTMs that capture long-range dependencies and complex technical relations.
Knowledge graph completion (link prediction) has also drawn considerable attention in recent years. Embedding-based methods such as TransE [22] and TransH [23] learn low-dimensional vector representations of entities and relations to predict missing links in a knowledge graph by modeling relationships among vulnerabilities to prioritize risks [24]. These translational models interpret relations as geometric transformations in the embedding space. Beyond conventional embedding-based models, integrating LLMs can enrich semantic context [25]. More recently, Graph Neural Networks (GNNs)—including Graph Attention Networks (GAT)—have been applied to combine structural information from the graph with node attributes (like textual descriptions) to yield higher accuracy in link prediction [26]. Additionally, large language models (LLMs) such as GPT-3 and BERT have been leveraged to generate or verify candidate relations using natural language understanding [27].
Overall, prior KG efforts have yet to integrate advanced NER/RE with hybrid GNN-LLM link prediction; our approach uniquely fuses structured databases, domain-specific text extraction, and multi-source inference to build a dynamically updatable vulnerability knowledge graph.

3. Methodology

In this section, we outline our multi-stage pipeline for constructing a dynamically updatable vulnerability knowledge graph (Figure 1). We fuse structured and unstructured data, apply advanced NLP for domain-specific extraction, and leverage graph embeddings alongside an LLM to infer missing links. Designed for continuous real-time updates, the methodology comprises five components: (1) Data Collection and Preprocessing; (2) Domain-Specific Information Extraction; (3) Graph Embedding and Link Prediction; (4) LLM-Based Relation Inference and Fusion; and (5) System Deployment and Update Strategy.

3.1. Data Collection and Preprocessing

Our first step was to aggregate vulnerability data from multiple sources and prepare it for integration into a unified knowledge graph.

3.1.1. Data Sources and Ontology Integration

We collected data from two types of sources:
  • Structured Sources: These include established databases such as the National Vulnerability Database (NVD). For structured data, a total of 4003 CVE entities, 891 CWE entities [28], and 522 CAPEC entities were obtained. These entities constitute the nodes in the knowledge graph. The obtained nodes are stored in CSV files, and the dataset is preprocessed to construct subsequent triples.
  • Unstructured Sources: We also harvested data via web crawling from security blogs, threat intelligence reports, and exploit databases [29].
All raw data are converted into a unified JSON format and mapped to a custom cybersecurity ontology that extends the Unified Cybersecurity Ontology (UCO) with elements from IDS ontologies and standards such as CVE, CWE, and CAPEC. As shown in Figure 2, the ontology defines standardized entity classes (e.g., Vulnerability, Software (Product), Vendor, Weakness, AttackPattern, Exploit) and relationship types (e.g., affects, hasWeakness, parentOf, relatedTo), ensuring semantic consistency across data sources. To further enhance entity recognition, we incorporate a cybersecurity dictionary into the BiLSTM-CRF model, providing prior knowledge of domain-specific terms and significantly improving the recall for uncommon entities such as product and vendor names. Additionally, an attention mechanism in the CNN-based relation extraction focuses on context words around entity mentions (e.g., version numbers or keywords like “affect”), disambiguating entities and improving classification accuracy. Together, this dictionary augmentation and attention-focused context modeling yield higher precision and recall in recognizing Vulnerability, Product, and Vendor entities from noisy text.
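To make the unified representation concrete, the sketch below shows what a single preprocessed record might look like once it has been mapped to the ontology; the field names, placeholder CWE identifier, and specific relations are illustrative assumptions rather than the exact schema used in our pipeline.

```python
# Illustrative unified record after preprocessing; field names, the CWE id, and the
# relation targets are placeholders chosen to mirror the ontology classes above.
record = {
    "entity_type": "Vulnerability",
    "id": "CVE-2023-1231",
    "description": "Buffer overflow vulnerability that mainly affects Windows systems.",
    "source": "NVD",
    "relations": [
        {"type": "affects", "target": {"entity_type": "Product", "name": "Windows"}},
        {"type": "hasWeakness", "target": {"entity_type": "Weakness", "id": "CWE-XXX"}},
    ],
}
```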

3.1.2. Data Cleaning and Normalization

Data preprocessing is critical for ensuring quality and consistency. Our process includes
  • Format Unification: Diverse records are transformed into a standardized JSON schema aligned with our ontology.
  • Text Cleaning: We remove HTML tags, boilerplate content, and non-informative text segments to isolate relevant information.
  • Deduplication and Entity Normalization: Duplicate entries (e.g., the same CVE reported in multiple sources) are merged, and variant expressions (e.g., “Windows 10” vs. “Microsoft Windows 10”) are normalized using a curated dictionary.

3.2. Domain-Specific Information Extraction

After preprocessing, we extract structured knowledge (entities and relations) from the collected data using two key components: Named Entity Recognition (NER) and Relation Extraction (RE).

Named Entity Recognition (NER)

The Named Entity Recognition (NER) model based on Bi-directional Long Short-Term Memory (BiLSTM) [30] uses distributed representations of words or characters as input to the BiLSTM to capture contextual features, and the prediction results output by the BiLSTM are used as features for CRF learning to obtain the optimal tagging results. An overview architecture of the BiLSTM-CRF model is shown in Figure 3. This model consists of five layers: a one-hot layer, an embedding layer, a BiLSTM layer, a CRF layer, and an output layer.
Using the BIO tagging scheme, an input sequence of $n$ characters is represented as $X = (x_1, x_2, \ldots, x_n)$, where each character $x_i$ in the sequence is tagged as B- (Begin), I- (Inside), or O- (Outside), and then the one-hot encoding for each character is calculated. The following provides a detailed explanation of each layer:
  • Embedding Layer
The embedding layer uses a simplified neural network to map discrete one-hot high-dimensional vectors into low-dimensional character embeddings. Each character in the input sequence X is embedded into a distributed space, with the specific expression as follows:
$$x' = x \times E \in \mathbb{R}^{d} \quad (1)$$
where $E = (e_1, e_2, \ldots, e_n)$ represents a randomly initialized or pre-trained embedding lookup table and $d$ is the dimension of the embedding. Using character-based distributed representations for text embedding is considered a better choice than word-based embeddings, especially for multi-class mixed character situations.
  • BiLSTM Layer
The BiLSTM layer is used to automatically capture sentence-level features and can effectively capture long-term dependencies to predict the label probability for each position. BiLSTM receives the embedding of each character and predicts the label probability for each corresponding character. LSTM is a special type of recurrent neural network that introduces a gating mechanism to control the input, output, and forgetting of information through gating units.
$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \quad (2)$$
$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \quad (3)$$
$$\tilde{c}_t = \tanh\left(W_c \cdot [h_{t-1}, x_t] + b_c\right) \quad (4)$$
$$c_t = f_t \cdot c_{t-1} + i_t \cdot \tilde{c}_t \quad (5)$$
$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \quad (6)$$
$$h_t = o_t \cdot \tanh(c_t) \quad (7)$$
The core idea of LSTM is to introduce three gating units: the input gate $i_t$, the forget gate $f_t$, and the output gate $o_t$. These gates allow the LSTM network to selectively retain or forget information, thereby better handling sequential data. The corresponding gate equations are given above, where $c_t$ is the current cell state. The forget gate receives $h_{t-1}$ and $x_t$ and, via Formula (2), outputs a value $f_t \in [0, 1]$; this value is applied to the previous cell state $c_{t-1}$, with 1 meaning full retention and 0 full forgetting. The input gate receives $h_{t-1}$ and $x_t$ and, via Formula (3), outputs $i_t \in [0, 1]$ to control how much of the candidate state $\tilde{c}_t$ is kept. The candidate state $\tilde{c}_t$ is created by a tanh layer in Formula (4), and the current cell state is then updated by Formula (5) based on $c_{t-1}$, $f_t$, $i_t$, and $\tilde{c}_t$. The output gate receives $h_{t-1}$ and $x_t$ and, via Formula (6), outputs $o_t \in [0, 1]$. Finally, Formula (7) determines how much of $c_t$ is exposed to the next hidden state $h_t$.
  • CRF Layer
By adding a CRF layer, we enforce sequence-labeling constraints via tag transition probabilities, capturing contextual token dependencies and ensuring valid tag sequences for improved accuracy. In a Named Entity Recognition task, the CRF layer defines constraints via feature functions—for example, enforcing that an entity start tag is “B–LABEL” rather than “I–LABEL”. By learning the tight dependencies between tags, the CRF assigns a score $s(X, y)$ to each candidate tag sequence $y = (y_1, \ldots, y_n)$ for an input $X$, and selects the highest-scoring sequence via
$$s(X, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}},$$
where $P_{i, y_i}$ is the BiLSTM-computed score for assigning tag $y_i$ at position $i$, and $A_{y_i, y_{i+1}}$ is the transition score from tag $y_i$ to tag $y_{i+1}$. The conditional probability of $y$ given $X$ is
$$P(y \mid X) = \frac{\exp s(X, y)}{\sum_{y' \in Y_X} \exp s(X, y')},$$
and during training, we maximize the log-likelihood
$$\log P(y \mid X) = s(X, y) - \log \sum_{y' \in Y_X} \exp s(X, y').$$
At decoding time, the Viterbi algorithm finds the best tag sequence
$$y^* = \arg\max_{y' \in Y_X} s(X, y').$$
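To make the embedding, BiLSTM, and CRF layers above concrete, the following minimal PyTorch sketch wires them together for BIO tagging; the layer sizes and the use of the third-party pytorch-crf package are illustrative assumptions, not the exact configuration of our model.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party pytorch-crf package (assumed available)


class BiLSTMCRF(nn.Module):
    """Character-level BiLSTM-CRF tagger for BIO-scheme NER (illustrative sizes)."""

    def __init__(self, vocab_size, num_tags, emb_dim=100, hidden=100):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)            # embedding layer
        self.bilstm = nn.LSTM(emb_dim, hidden // 2, batch_first=True,
                              bidirectional=True)                     # BiLSTM layer
        self.emissions = nn.Linear(hidden, num_tags)                  # per-token tag scores P
        self.crf = CRF(num_tags, batch_first=True)                    # transition scores A

    def forward(self, tokens, tags, mask):
        h, _ = self.bilstm(self.embedding(tokens))
        e = self.emissions(h)
        return -self.crf(e, tags, mask=mask)                          # negative log-likelihood

    def decode(self, tokens, mask):
        h, _ = self.bilstm(self.embedding(tokens))
        return self.crf.decode(self.emissions(h), mask=mask)          # Viterbi decoding
```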
  • Domain Dictionary Correction
To enhance NER performance, we propose a domain-dictionary–based NER correction method. We constructed the cybersecurity term dictionary by compiling over 250 domain-specific entries from reputable sources (including common vendor names, CVE prefixes, and weakness abbreviations). Each term was reviewed and validated by domain experts to ensure its correctness and relevance. In our NER model, tokens matching this dictionary are tagged with additional features, allowing the BiLSTM-CRF to more confidently recognize rare or abbreviated security terms. First, the TF–IDF scores of textual keywords are employed to build a cyber-security domain dictionary. By computing and ranking the TF–IDF values of contextual keywords, we ultimately obtain a domain dictionary containing 155 specialized terms.
In the NER task, a token sequence is denoted as $S_1 = (s_1, s_2, \ldots, s_n)$, and the dictionary text has length $L$. The variables forwardResult and backwardResult are initialized to store the results of the forward and backward matching, respectively.
Algorithm 1 describes the bi-directional maximum match (BMM) procedure.
Algorithm 1 Bi-Directional Maximum Match (BMM)
Require: token sequence $S = (s_1, \ldots, s_n)$ and the domain dictionary
Ensure: forwardResult, backwardResult
  $L \leftarrow n$
  forwardResult $\leftarrow [\,]$; backwardResult $\leftarrow [\,]$
  $i \leftarrow 1$; $j \leftarrow L$
  while $i \le L$ and $j \ge 1$ do
      forwardMatch $\leftarrow$ longest match from position $i$ forward
      backwardMatch $\leftarrow$ longest match from position $j$ backward
      if $|\text{forwardMatch}| > 0$ or $|\text{backwardMatch}| > 0$ then
          if $|\text{forwardMatch}| > |\text{backwardMatch}|$ then
              append forwardMatch to forwardResult
              $i \leftarrow i + |\text{forwardMatch}|$
          else
              append backwardMatch to backwardResult
              $j \leftarrow j - |\text{backwardMatch}|$
          end if
      else
          break
      end if
  end while
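A minimal Python sketch of the bi-directional maximum matching in Algorithm 1 is given below; the whitespace-joined dictionary lookup and the maximum term length are illustrative assumptions. In our pipeline, the matched terms are then fed back as correction features for the NER output.

```python
def bmm_match(tokens, dictionary, max_len=6):
    """Bi-directional maximum match of a token sequence against a domain dictionary.

    tokens: list of tokens/characters; dictionary: set of known domain terms.
    Returns the forward and backward match results described in Algorithm 1.
    """
    n = len(tokens)
    forward_result, backward_result = [], []
    i, j = 0, n  # forward cursor (0-based) and backward cursor

    def longest_forward(start):
        # longest dictionary entry beginning at `start`
        for length in range(min(max_len, n - start), 0, -1):
            cand = " ".join(tokens[start:start + length])
            if cand in dictionary:
                return cand
        return ""

    def longest_backward(end):
        # longest dictionary entry ending at `end` (exclusive)
        for length in range(min(max_len, end), 0, -1):
            cand = " ".join(tokens[end - length:end])
            if cand in dictionary:
                return cand
        return ""

    while i < n and j > 0:
        fwd, bwd = longest_forward(i), longest_backward(j)
        if fwd or bwd:
            if len(fwd) > len(bwd):
                forward_result.append(fwd)
                i += len(fwd.split())
            else:
                backward_result.append(bwd)
                j -= len(bwd.split())
        else:
            break
    return forward_result, backward_result
```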

3.3. Graph Embedding and Link Prediction

3.3.1. Relation Extraction (RE)

For relation extraction, we employ an attention-based CNN comprising four stages: embedding, convolution, attention, and a multi-layer perceptron (MLP). In the embedding stage, we compare two static word embeddings—CBOW [31] and GloVe [32]—alongside position and POS embeddings to capture both local context and global co-occurrence statistics. The convolutional layers then extract local n-gram features, with max-pooling distilling the most salient signals into a sentence convolution vector. An attention layer computes weights over tokens to produce a contextually focused vector, which we concatenate with the CNN output to form a unified representation [33]. Finally, a fully connected MLP processes this vector and selects the relation class with the highest probability.
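The following PyTorch sketch outlines the four stages (embedding, convolution, attention, and MLP) described above; the dimensions, kernel size, and the simple token-level attention are illustrative assumptions rather than the exact configuration used in our experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AttentionCNNRE(nn.Module):
    """Attention-based CNN for relation classification (illustrative configuration)."""

    def __init__(self, vocab_size, num_relations, emb_dim=300, filters=128, k=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, filters, kernel_size=k, padding=k // 2)
        self.attn = nn.Linear(emb_dim, 1)                   # token-level attention scores
        self.mlp = nn.Linear(filters + emb_dim, num_relations)

    def forward(self, tokens):
        e = self.embedding(tokens)                          # (batch, seq, emb)
        conv = F.relu(self.conv(e.transpose(1, 2)))         # local n-gram features
        sent_vec = conv.max(dim=2).values                   # max-pooled sentence vector
        weights = torch.softmax(self.attn(e).squeeze(-1), dim=1)
        attn_vec = (weights.unsqueeze(-1) * e).sum(dim=1)   # attention-weighted context
        combined = torch.cat([sent_vec, attn_vec], dim=1)   # unified representation
        return self.mlp(combined)                           # relation class logits
```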

3.3.2. TransH Embedding

We first employ the TransH model to learn low-dimensional vector representations of entities and relations. For any given triple $(h, r, t)$, entities $h$ and $t$ are represented as vectors in $\mathbb{R}^d$. In TransH, each relation $r$ is associated with a hyperplane defined by its normal vector $w_r$ (with $\|w_r\| = 1$) and a translation vector $d_r$. The projection of entity $h$ onto the hyperplane is computed as
$$h_\perp = h - (w_r^\top h) \, w_r,$$
and similarly for $t$:
$$t_\perp = t - (w_r^\top t) \, w_r.$$
The scoring function for the triple is then defined as
$$f_r(h, t) = \| h_\perp + d_r - t_\perp \|_2^2.$$
This function enforces that, for valid triples, $h_\perp + d_r$ is close to $t_\perp$.
To further enhance entity representations, we derive textual embeddings $h_d$ and $t_d$ from entity descriptions using a CNN encoder with word2vec features. The final combined scoring function is
$$f_r^{\text{final}}(h, t) = f_{ss}(h, t) + f_{dd}(h, t) + f_{sd}(h, t) + f_{ds}(h, t),$$
where $f_{ss}(h, t)$ is the structural score, $f_{dd}(h, t)$ is the descriptive score, and $f_{sd}$ and $f_{ds}$ capture cross-interactions between structural and descriptive embeddings.
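As a small illustration of the structural part of this score, the hyperplane projection and translation distance can be computed as follows; the batched tensor shapes are an assumption for this sketch.

```python
import torch
import torch.nn.functional as F


def transh_score(h, t, w_r, d_r):
    """TransH score ||h_perp + d_r - t_perp||_2^2 for a batch of triples.

    h, t: entity embeddings (batch, d); w_r: normal vectors of the relation
    hyperplanes (batch, d); d_r: relation translation vectors (batch, d).
    """
    w_r = F.normalize(w_r, dim=-1)                              # enforce ||w_r|| = 1
    h_perp = h - (w_r * h).sum(dim=-1, keepdim=True) * w_r      # project h onto hyperplane
    t_perp = t - (w_r * t).sum(dim=-1, keepdim=True) * w_r      # project t onto hyperplane
    return ((h_perp + d_r - t_perp) ** 2).sum(dim=-1)           # squared L2 distance
```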

3.3.3. Text-Enhanced Graph Attention Network (GAT)

To predict missing links, we extend our embedding approach with a Graph Attention Network (GAT). Each node’s feature is constructed by concatenating its TransH embedding with its textual embedding obtained from BERT. The GAT aggregates neighbor information via attention coefficients. For a node $i$ with neighbor $j$, the attention coefficient is computed as
$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^\top [W h_i \,\|\, W h_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(a^\top [W h_i \,\|\, W h_k]\right)\right)},$$
where $\|$ denotes concatenation, $W$ is a weight matrix, $a$ is a learnable attention vector, and $\mathcal{N}_i$ is the neighborhood of node $i$. Multiple attention heads enable the aggregation of multi-hop neighborhood information, capturing both explicit and implicit relationships. The GAT is trained on a supervised link prediction task using margin-based ranking loss, with performance measured by Mean Reciprocal Rank (MRR) and Hits@K.
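The attention coefficient above translates into a few lines of PyTorch; the single-head, loop-based softmax below is an illustrative sketch rather than our actual implementation.

```python
import torch
import torch.nn.functional as F


def gat_attention(h, W, a, edge_index):
    """Single-head GAT attention coefficients alpha_ij for the given edges.

    h: node features (N, d_in); W: weight matrix (d_in, d_out);
    a: attention vector (2 * d_out,); edge_index: (2, E) source/target node ids.
    """
    Wh = h @ W                                                   # project node features
    src, dst = edge_index
    e = F.leaky_relu(torch.cat([Wh[dst], Wh[src]], dim=1) @ a,
                     negative_slope=0.2)                         # unnormalized scores
    alpha = torch.zeros_like(e)
    for i in torch.unique(dst):                                  # softmax per neighborhood
        mask = dst == i
        alpha[mask] = torch.softmax(e[mask], dim=0)
    return alpha
```

In practice, a multi-head library layer (e.g., PyTorch Geometric's GATConv) would replace this explicit loop; the sketch only makes the per-neighborhood normalization visible.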

3.4. LLM-Based Relation Inference and Fusion

To further complement graph-based predictions, we incorporate LLM-based reasoning to infer missing links. The basic idea is as follows: First, the initially generated vulnerability knowledge graph (the nodes and relationship representations output by the GAT model) is used as input. By designing specific prompts, the vulnerability entities and their descriptions are passed to GPT-3, allowing it to generate potential vulnerability relationships. At the same time, BERT is used to semantically match and verify the generated candidate relationships, ensuring that the predicted relationships are highly consistent with the actual security context. Our hybrid relation prediction framework consists of four key components that work in tandem:
(i) Graph-based module: A text-enhanced Graph Attention Network (GAT) that leverages the structure of the existing knowledge graph (nodes and known links) and node textual descriptions to predict potential relations.
(ii) LLM-based module: A large language model (GPT-3) that generates candidate relations by reasoning over vulnerability descriptions in natural language (as described in Section 3.4.1).
(iii) Validation module: A BERT-based semantic matcher that evaluates each candidate relation’s plausibility by measuring contextual similarity, filtering out spurious suggestions (Section 3.4.2).
(iv) Fusion mechanism: A weighted integration strategy that combines the confidence scores from the GAT and the LLM (Section 3.4.3).
These components complement each other—the GAT excels at capturing implicit links from graph topology and known data, while the LLM injects external knowledge and interprets the context to propose links beyond the graph’s immediate scope. The BERT validation serves as a bridge, ensuring only high-confidence, semantically sound relations from the LLM are retained. Finally, by fusing both sources with appropriate weights, the framework achieves more robust and complete relation inference than either method alone, as evidenced by our results (the hybrid approach significantly outperformed using GAT or LLM individually). In essence, the GAT provides precision from structural patterns and the LLM provides recall of novel associations, and together they substantially enhance the knowledge graph’s relationship reasoning capability.
In this way, the large model not only automatically completes the missing vulnerability relationships in the knowledge graph, but also improves the semantic accuracy of the existing relationships, providing more comprehensive and dynamic support for the entire knowledge graph construction.

3.4.1. GPT-3 Candidate Generation

We use the OpenAI GPT-3 API (text-davinci-003) to generate relationship hypotheses. A carefully crafted prompt template provides the model with a vulnerability’s description and asks it to predict related entities and their relation types. The prompt explicitly instructs GPT-3 to output results in structured triple format (as shown in the example below). We set a moderate creativity level (temperature = 0.7) to encourage plausible links without veering off-topic, and limit the maximum tokens to control response length. The API returns text completions which we parse to extract candidate ⟨head, relation, tail⟩ triples. For example,
CVE-2023-1231 is a buffer overflow vulnerability that mainly affects Windows systems and may lead to memory corruption and data leakage. This vulnerability may interact with other vulnerabilities, leading to more widespread attacks. Based on the description above, predict which other vulnerabilities might be related to this one, and output the candidate relationships in the form of triples, ensuring that each triple is formatted as ⟨head entity, relation, tail entity⟩.
In this way, GPT-3 can generate candidate triples such as
<CVE-2023-1231, CanPrecede, CWE-79>
revealing potential connections between vulnerabilities in specific attack scenarios. This process fully utilizes GPT-3’s advantage in understanding complex semantics and generating coherent text, allowing the system to capture implicit relationships that traditional methods might miss.
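A minimal sketch of this candidate-generation step, using the legacy OpenAI Completions interface and the settings listed in Section 4.1, is shown below; the prompt wording and the regex-based parsing of the returned triples are illustrative assumptions.

```python
import re
import openai  # legacy openai<1.0 Completions interface assumed

PROMPT = (
    "{description}\n"
    "Based on the description above, predict which other vulnerabilities might be "
    "related to this one, and output the candidate relationships as triples in the "
    "form <head entity, relation, tail entity>."
)


def generate_candidate_triples(description):
    """Ask GPT-3 for candidate <head, relation, tail> triples for one vulnerability."""
    response = openai.Completion.create(
        engine="text-davinci-003",
        prompt=PROMPT.format(description=description),
        temperature=0.7,          # moderate creativity, as in Section 3.4.1
        max_tokens=150,           # response length limit from Section 4.1
    )
    text = response["choices"][0]["text"]
    # parse lines such as <CVE-2023-1231, CanPrecede, CWE-79>
    return re.findall(r"<\s*([^,<>]+?)\s*,\s*([^,<>]+?)\s*,\s*([^,<>]+?)\s*>", text)
```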

3.4.2. BERT-Based Validation

To validate the GPT-3–generated candidate relations, we pass each triple and its vulnerability description through a BERT model for semantic matching. Leveraging its bidirectional Transformer and self-attention layers, BERT encodes each word with full left–right context and aggregates global sentence-level semantics. We then compute the cosine similarity between head and tail entity embeddings, retaining only those triples whose score exceeds a preset threshold. This filters out spurious or low-confidence relations. By combining GPT-3’s generative power with BERT’s precise semantic verification, our joint pipeline both proposes novel links and ensures their accuracy. Consequently, the knowledge graph is dynamically enriched with validated relationships, improving its completeness and reliability for downstream security analysis and risk assessment.
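A sketch of the semantic check is shown below, using Hugging Face Transformers and the [CLS] embedding as the sentence representation; the model name, pooling choice, and the 0.8 similarity threshold are illustrative assumptions, not the exact values used in our system.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")


def bert_similarity(head_text, tail_text):
    """Cosine similarity between BERT [CLS] embeddings of two entity descriptions."""
    embeddings = []
    with torch.no_grad():
        for text in (head_text, tail_text):
            inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=256)
            embeddings.append(bert(**inputs).last_hidden_state[:, 0])  # [CLS] vector
    return torch.cosine_similarity(embeddings[0], embeddings[1]).item()


def validate(candidates, descriptions, threshold=0.8):
    """Keep only candidate triples whose head/tail descriptions are semantically close."""
    return [(h, r, t) for h, r, t in candidates
            if bert_similarity(descriptions.get(h, h), descriptions.get(t, t)) >= threshold]
```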

3.4.3. Knowledge Completion and Fusion Strategy

After generating and validating candidate vulnerability relations with the large-scale model module, we implement a fusion strategy to construct a complete and accurate vulnerability knowledge graph by weighted integration of two prediction sources. On one hand, the text-augmented GAT model yields preliminary relation scores based on graph structure and textual embeddings; on the other hand, the large model (e.g., GPT-3) proposes candidate relations, whose semantic consistency is verified by BERT to produce a confidence score. We then fuse these two sources to automatically complete missing links and refine predictions.
For each candidate relation, we compute
  • Structural score $S_{\text{GAT}}$: obtained by the GAT model from entity and relation vectors via cosine similarity or Euclidean distance, reflecting local graph connectivity.
  • Language-model score $S_{\text{LM}}$: returned by GPT-3 and adjusted by BERT’s semantic matching verification, indicating the plausibility and trustworthiness of the generated relation.
We combine these by weighted summation:
$$S_{\text{final}} = \alpha S_{\text{GAT}} + \beta S_{\text{LM}}, \qquad \alpha + \beta = 1,$$
where $\alpha$ and $\beta$ are the weights for the GAT and LM modules, respectively. A relation is accepted into the final graph only if $S_{\text{final}}$ exceeds a threshold $T$, which filters out low-confidence or noisy candidates while preserving high-quality predictions.
The weights $\alpha$ (for the GAT’s structural score) and $\beta$ (for the LLM’s semantic score) were not chosen arbitrarily; we tuned them on a validation set using Bayesian optimization to maximize link prediction F1, precision, and recall, which yielded an optimal setting of $\alpha = 0.7$ and $\beta = 0.3$ (with $\alpha + \beta = 1$). This indicates that the final predictions gave slightly more emphasis to the structural cue. We fixed these weights in the test phase and, based on the validation set, set the confidence threshold at $T = 0.62$ to decide which new relations to accept into the graph.
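The fusion rule itself reduces to a few lines of code; the sketch below plugs in the α = 0.7, β = 0.3, and T = 0.62 values reported above.

```python
def fuse_scores(s_gat, s_lm, alpha=0.7, beta=0.3, threshold=0.62):
    """Weighted fusion of the structural (GAT) and language-model scores.

    Returns (accepted, s_final): accepted is True if the candidate relation
    should be added to the knowledge graph.
    """
    assert abs(alpha + beta - 1.0) < 1e-9   # enforce alpha + beta = 1
    s_final = alpha * s_gat + beta * s_lm
    return s_final >= threshold, s_final
```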
This fusion strategy offers several advantages:
  • Complementarity: GAT captures local structural cues, while the large model leverages global semantic knowledge, yielding richer relation hypotheses.
  • Adaptivity: By dynamically adjusting α and β , the system can favor structural or semantic evidence according to the data density and ambiguity.
  • Validation loop: BERT semantic matching provides feedback to filter spurious outputs and reinforce contextually valid relations.
In summary, our weighted fusion of GAT-based and LM-based predictions produces a more complete, dynamically updated vulnerability knowledge graph, combining structural precision with semantic depth, and thus furnishing robust data support for downstream security analysis and risk assessment.

3.5. System Deployment and Update Strategy

Our system continuously monitors multiple sources for new vulnerability data. Specifically, we have scheduled data collection jobs that periodically fetch the latest entries from structured databases (e.g., NVD’s RSS feed, CNVD updates) and crawl unstructured sources like security advisories or threat blogs for any new vulnerability reports. Once new data are detected, they enter the update pipeline. Our framework is designed for dynamic operation in a continuously changing threat landscape. The system functions as an autonomous agent that periodically ingests new vulnerability reports. The update process involves
  • Data Ingestion: New structured and unstructured data are preprocessed.
  • Triple Extraction: The NER/RE pipeline extracts new entities and relations from the incoming data.
  • Embedding Update: Node embeddings are recalculated, and the GAT predicts new links between existing and new nodes.
  • LLM Inference: Ambiguous or novel cases are processed through the GPT-3 module, with BERT validation applied.
  • Graph Merging: Newly predicted relations are merged into the existing knowledge graph with provenance logs for auditing.
The above process ensures that the knowledge graph is refreshed in near real time whenever new information emerges. Because updates are processed incrementally (adding or updating only the affected nodes and relations), the system maintains timeliness without needing an expensive complete rebuild of the graph each time. This design is inherently scalable—as the volume of vulnerability data grows, we can parallelize the ingestion and extraction steps, and the graph storage back-end can incrementally accommodate new nodes/edges. In other words, the knowledge graph expands dynamically with incoming data while preserving performance, ensuring both up-to-dateness and the ability to handle increasing scale. This modular design enables continuous refinement and expansion of the vulnerability knowledge graph in real time, ensuring that it remains current and valuable for proactive cybersecurity defense [34].
While this architecture provides clear advantages in terms of adaptability, scalability, and real-time knowledge updating, it also presents several practical challenges that must be carefully addressed to maintain performance, integration, and data quality.
  • Performance and Scalability: The pipeline must handle high-volume data streams (e.g., surges of vulnerability reports) in near real-time, which requires efficient parallel processing and optimization of each module (our design allows the NER/RE, GAT, and LLM components to run in parallel or be distributed across servers to scale).
  • Integration with Existing Workflows: The knowledge graph should integrate with enterprise security dashboards and incident response systems, which may require developing APIs or adapters and ensuring data format compatibility.
  • Latency and Cost of LLM Calls: Relying on a large language model (GPT-3/4) can incur latency and usage costs; caching frequent queries and using local fine-tuned models for less critical tasks are strategies to mitigate this.
  • Data Quality and Maintenance: Ensuring the continuous quality of the graph is non-trivial, as newly ingested data might introduce noise or inconsistencies, so we implement validation steps (like the BERT-based check) and maintain provenance metadata for auditing.

3.6. Security Knowledge Graph Construction

Vulnerability entries in CWE, CVE, and CAPEC datasets can be viewed as software-security entities, each indexed by a unique identifier. These entities participate in a rich set of intra-type and cross-type relations. Based on these three datasets and their complex interconnections, we constructed a Security Knowledge Graph (Figure 4). It comprises 4003 CVE entities, 891 CWE entities, and 522 CAPEC entities, linked by nine relation types.
Formally, we represent the graph as
$$G = (E, R, S),$$
where $E$, $R$, and $S$ denote the sets of entities, relations, and triples, respectively. Each triple is an ordered tuple $\langle h, r, t \rangle$ of head entity $h$, relation $r$, and tail entity $t$. Figure 5 illustrates the triple $\langle \text{CWE-733}, \text{TargetOf}, \text{CAPEC-10} \rangle$ and its associated entity descriptions.

3.7. Data Modeling and Storage

For CVE, CWE, and CAPEC entries, we select the following attributes as common fields:
  • Vulnerability type
  • Identifier (ID)
  • Description
  • Related vulnerabilities
These fields are stored in a Neo4j graph database with a uniform schema. We extract each entity’s attributes into key–value pairs.
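As an illustration of this storage step, the sketch below writes one entity and one triple with the official Neo4j Python driver (v5 API); the connection settings, node labels, and property names are assumptions rather than our exact schema.

```python
from neo4j import GraphDatabase  # official Neo4j Python driver

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# Illustrative schema: labels and property names are assumptions, not the exact ones used.
MERGE_ENTITY = (
    "MERGE (v:Vulnerability {id: $id}) "
    "SET v.type = $type, v.description = $description"
)
# Relation types cannot be parameterized in Cypher; they come from the nine
# predefined relation types, so string formatting is safe here.
LINK = "MATCH (h {id: $head}), (t {id: $tail}) MERGE (h)-[:%s]->(t)"


def store_entity(tx, entity):
    tx.run(MERGE_ENTITY, id=entity["id"], type=entity["type"],
           description=entity["description"])


def store_triple(tx, head, relation, tail):
    tx.run(LINK % relation, head=head, tail=tail)


with driver.session() as session:
    session.execute_write(store_entity, {
        "id": "CVE-2023-1231", "type": "CVE",
        "description": "Buffer overflow vulnerability affecting Windows systems.",
    })
    session.execute_write(store_triple, "CWE-733", "TargetOf", "CAPEC-10")
```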

3.8. Dynamic Updates and Front-End Visualization

We embed a dynamic update mechanism and large-model inference module to automatically complete missing relations. All generated candidate relations are constrained to the nine predefined relation types to ensure consistency and domain specificity.
For visualization, we connect Neo4j to a GraphXR front-end. The connection parameters (host, port, database name, username, password) are entered into GraphXR’s data source settings. After successful connection, data are imported and fetched via Cypher queries for interactive exploration and analysis.

4. Results

Based on our methodology, we conducted experiments to verify the effectiveness of the domain-dictionary–based vulnerability named entity recognition model and the relation extraction model in the field of cybersecurity. In particular, this chapter focuses on demonstrating how to predict vulnerability relations and complete the knowledge graph by integrating large models (e.g., GPT-3, BERT) with the GAT model. We introduce the experimental environment setup, the datasets, and the experimental procedures, and compare our proposed models against existing ones to validate their efficiency.

4.1. Experimental Environment Setup

The purpose of the experiments in this chapter was to verify that the proposed relation extraction model and the text-enhanced GAT model integrating large language models outperformed traditional machine learning models, and to compare them against conventional approaches in order to highlight the advantages of our design. The hardware configuration used for the simulation experiments is shown in Table 1.
Implementation Details: For the NER model, we employed a 2-layer BiLSTM with a hidden size of 200, using pre-trained 300-dimensional GloVe embeddings to initialize the word vectors. We applied a dropout rate of 0.5 during training to prevent overfitting. The CRF layer decoded the best label sequence. Our cybersecurity dictionary contributed an additional binary feature for each token, indicating dictionary matches. The relation extraction CNN used 128 filters of size 3 and an attention mechanism over the token-level features; it was trained with cross-entropy loss to classify relation types. For the GAT in link prediction, we used 2 graph attention layers, each with 8 attention heads and 128-dimensional node feature vectors (TransH + BERT embeddings concatenated). Training hyperparameters (learning rate $1 \times 10^{-3}$ for NER/CNN, $1 \times 10^{-4}$ for GAT, batch size 32) were tuned on a validation set. We accessed GPT-3 via OpenAI’s API with max_tokens = 150 and moderated temperature as described, and limited calls to at most 5 per vulnerability to manage costs.

4.2. Domain-Dictionary Corrected BiLSTM-CRF Experiments

4.2.1. Dataset and Experimental Setup

During the ontology design phase, we constructed the domain ontology shown in Figure 2 based on the current vulnerability dataset. In this experiment, we defined four NER task categories: Vendor, Product, Vulnerability, and Weakness. Each category targeted the corresponding entity mentions in unstructured text. Additionally, we compared our approach against three baseline models—CRF [35], BiLSTM [36], and BiLSTM-CRF [37]—evaluating all methods on the same dataset for both training and testing. A total of 4002 named entities were annotated across 1092 threat reports. For reports containing vulnerability names, types, and targets, we treated vulnerability name recognition as the primary NER task.
We employed cross-validation by splitting the original dataset into training, validation, and test sets with a ratio of 7:2:1. The model was trained on the training set, and hyperparameters were tuned on the validation set. To handle the mixed Chinese–English entities common in cybersecurity texts, we pre-trained the character embeddings on a Chinese network security threat corpus using word2vec, and used these embeddings to initialize the NER model. Both the character embedding size and the BiLSTM hidden size were set to 100. We used the Adam optimizer with an initial learning rate of 0.001 and a batch size of 64. To prevent overfitting, we applied a dropout rate of 0.5 to both the embedding layer and the BiLSTM layer. Following the exact-match evaluation for NER, a prediction was considered correct only if both entity boundaries and entity categories matched the ground truth.

4.2.2. Evaluation Metrics

We used Precision (P), Recall (R), and F1–score as the evaluation metrics for the vulnerability entity recognition experiments. Precision measured the proportion of correctly identified entities among all predicted entities, while Recall measured the proportion of correctly identified entities among all true entities in the dataset. The definitions of TP, FP, and FN are given in Table 2. The F1–score is the harmonic mean of Precision and Recall and provides an overall measure of model accuracy. The metrics are computed as follows:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$F1 = \frac{2 \times P \times R}{P + R}$$

4.2.3. Experimental Results

The BiLSTM model outperformed the CRF model in [35] in both Recall and Precision metrics, achieving a 5.45% higher Precision (Figure 6). This improvement was attributed to BiLSTM’s bidirectional deep learning of contextual information, which captured both forward and backward dependencies, whereas CRF only learned local feature–label dependencies, without directly leveraging bidirectional context. The BiLSTM–CRF model in [37] achieved a significantly higher Recall (VUL) and Precision (VUL) than both the BiLSTM [36] and CRF [35] baselines, since it jointly considered contextual features and label constraints. Compared to BiLSTM–CRF, our domain-dictionary–corrected BiLSTM–CRF method showed further gains of 3.14% in Recall (VUL) and 3.56% in Precision (VUL), owing to the dictionary correction mechanism’s in-depth analysis of domain terminology and post-processing judgment, yielding the best overall performance.

4.3. Attention-Based CNN Experiments

4.3.1. Dataset and Experimental Setup

We used vulnerability information extracted from unstructured Chinese network security threat analysis reports to train the relation extraction model, aiming to identify entities and extract relations such as “ParentOf”, “ChildOf”, and “BelongOf” that captured the connections between vulnerabilities. The annotated dataset was split into training, validation, and test sets in a 7:2:1 ratio. Table 3 lists the hyperparameters used during training.
This method used two sets of comparative experiments to deeply investigate the impact of embedding methods and model choices on the performance of the relation extraction task. The first set of experiments focused on the effect of static word embeddings on the CNN model’s performance. We selected two different static embeddings—CBOW (Word2Vec) and GloVe—both of which generate word vectors by learning from large text corpora but via different mechanisms. These embeddings were applied to the CNN model and trained and tested on the same dataset to compare their influence on model performance.
The second set of experiments, under the same embedding method, compared the performance of different models for relation extraction. We chose several common relation extraction architectures—CNN, RNN [38], and LSTM [39]—and applied them to the same static embeddings and dataset, then compared their performance in the relation extraction task.

4.3.2. Experimental Results

The performance of the CNN model using two different static embeddings is shown in Figure 7, illustrating the differences between CBOW and GloVe embeddings. The CNN model with GloVe embedding achieved an average Precision of 87.5%, Recall of 93.4%, and F1 of 90.4%, while the CNN model with CBOW embedding attained an average Precision of 86.4%, Recall of 92.1%, and F1 of 89.2%. Error bars for each metric were also computed to assess the reliability of the results. The CNN with GloVe embeddings outperformed the CNN with CBOW embeddings.
These results indicate that different types of static word embeddings exhibited varying performance levels in domain-specific applications. The superior performance of our CNN-based model was due to its ability to capture local features around the given entity pair—critical for relation representation. CBOW predicts the target word based on its local context, focusing primarily on immediate neighboring words. In contrast, GloVe is a count-based model that leverages global statistical information, generating embeddings from word co-occurrence statistics across the entire corpus, thus capturing broader word relationships. While both CBOW and GloVe require large volumes of data during training, GloVe typically trains faster because its process involves sparse matrix factorization, which is often more efficient than the neural network training used by CBOW.
The precision–recall curves for our Attention-based CNN models are shown in Figure 8 and Figure 9, illustrating the trade-off between precision and recall at different probability thresholds and reporting the area under the curve (AUC). In Figure 8, using GloVe embeddings yielded an AUC of 0.975, while in Figure 9, using CBOW embeddings yielded an AUC of 0.973. These high and similar AUC values indicate that both types of static embeddings performed comparably in the cybersecurity domain, motivating future work on exploring dynamic embeddings.
Figure 10 and Figure 11 compare relation extraction results using different sequence models under the same embedding. In Figure 10, with GloVe embeddings, the RNN model achieved an average Precision of 84.4%, Recall of 90.1%, and F1 of 87.2%, while the LSTM model achieved 85.5%, 91.2%, and 88.1%, respectively. In Figure 11, with CBOW embeddings, the RNN model obtained 80.5% Precision, 85.1% Recall, and 82.7% F1, and the LSTM model obtained 81.6%, 86.1%, and 83.5%.
Across all static embedding methods, the CNN consistently outperformed the RNN and LSTM in the cybersecurity relation extraction task. This superiority was mainly due to the CNN’s ability to effectively capture local patterns and features—critical for relation extraction, where relationships often manifest as localized n-gram patterns between entity pairs. The parameter sharing mechanism in a CNN reduces the number of parameters, enhancing training efficiency and generalization, which is especially beneficial for high-dimensional, complex cybersecurity data. In contrast, LSTM excels at modeling long-range dependencies but is less effective at learning fine-grained local features. An RNN suffers from higher computational complexity and issues such as vanishing or exploding gradients due to its recurrent structure, which can hinder training. Therefore, for cybersecurity relation extraction tasks, the CNN’s strong local feature learning and efficient parameter sharing yield better overall performance.

4.4. GAT and LLM–Based Vulnerability Relation Prediction

4.4.1. Dataset and Experimental Setup

The dataset comprised existing vulnerability triples, covering nine relation types, as shown in Table 4 and Table 5. We implemented the text-enhanced GAT model in PyTorch 1.13.1. The GAT weight decay was set to $5 \times 10^{-6}$, convolutional layer weight decay to $1 \times 10^{-5}$, learning rate to $1 \times 10^{-3}$, and batch size to 128. A dropout rate of 0.5 was applied to the convolutional layers to mitigate overfitting, and Adam was used as the optimizer.
For the large models, GPT-3 and BERT, we first generated candidate vulnerability relation triples with GPT-3 using custom prompts to elicit potential relations. We then semantically validated the generated triples using a fine-tuned BERT model by computing semantic similarity scores to ensure relation plausibility. Finally, the combined GAT + LLM system performed link prediction to infer and complete vulnerability relations.

4.4.2. Evaluation Metrics

We evaluated the link prediction using Mean Reciprocal Rank (MRR), Mean Rank (MR), and Hits@n, common metrics in knowledge graph embedding tasks. Let $S$ be the set of test triples and $|S|$ its cardinality; for each triple $i$, let $\mathrm{rank}_i$ denote its predicted rank. MRR measures the average reciprocal rank of the correct entity, with higher values indicating better performance:
$$\mathrm{MRR} = \frac{1}{|S|} \sum_{i=1}^{|S|} \frac{1}{\mathrm{rank}_i}.$$
MR measures the average rank (lower is better):
$$\mathrm{MR} = \frac{1}{|S|} \sum_{i=1}^{|S|} \mathrm{rank}_i.$$
Hits@n computes the proportion of correct entities ranked within the top $n$; in our experiments, we reported Hits@1, Hits@3, and Hits@10:
$$\mathrm{Hits@}n = \frac{1}{|S|} \sum_{i=1}^{|S|} \mathbb{1}\left(\mathrm{rank}_i \le n\right).$$
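Given the predicted rank of each test triple, these metrics translate directly into code, as in the short sketch below.

```python
def link_prediction_metrics(ranks, ks=(1, 3, 10)):
    """Compute MRR, MR, and Hits@k from the ranks of the correct entities."""
    n = len(ranks)
    mrr = sum(1.0 / r for r in ranks) / n
    mr = sum(ranks) / n
    hits = {k: sum(1 for r in ranks if r <= k) / n for k in ks}
    return mrr, mr, hits


# Example: link_prediction_metrics([1, 4, 2, 15]) returns the MRR, MR, and Hits@k
# over those four ranks.
```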

4.4.3. Experimental Results

The TransH model was used as the baseline and compared against the Text-Enhanced GAT model. Figure 12 and Figure 13 illustrate the MR and MRR values for TransH and Text-Enhanced GAT across the different link prediction tasks. Table 6 compares the Hits@1, Hits@3, and Hits@10 values for both models. The results show that the Text-Enhanced GAT model located the correct entity more accurately than TransH. In relation prediction, the MR decreased from 71 to 33, and the MRR increased from 0.597 to 0.721, outperforming head entity prediction (MRR = 0.541) and tail entity prediction (MRR = 0.559).
When predicting missing relations, Hits@1, Hits@3, and Hits@10 increased by 0.137, 0.116, and 0.101, respectively, demonstrating that the Text-Enhanced GAT model outperformed the TransH baseline in the missing relation prediction task. This indicates that the Text-Enhanced GAT model not only encoded textual descriptions of entities, but also extracted more beneficial relations from the 2-hop neighbors. By combining GPT-3-generated candidate relations with BERT semantic validation, the model effectively enhanced prediction credibility. GPT-3 can infer potential relations that are difficult for traditional methods to uncover, while BERT further verifies their semantic plausibility, ensuring that the generated triples align with real cybersecurity contexts. Based on this strategy, the experimental results show that the GAT combined with large language models significantly improved the prediction precision and recall compared to the traditional GAT model.
Ablation Analysis: We conducted ablation studies to quantify the contribution of each component. For entity extraction, removing the cybersecurity dictionary from the NER model caused the F1-score to drop from 90.1% to 84.6%, confirming the dictionary’s vital role in recognizing domain-specific entities (especially vendor and product names). For link prediction, we evaluated three settings: using only the GAT module, using only GPT-3 (with BERT filtering), and our hybrid approach. The GAT-only model achieved a Hits@1 of 0.421, while the GPT-3-only approach (treating GPT suggestions above threshold as final) achieved a Hits@1 of 0.433; in contrast, the combined framework reached a Hits@1 of 0.558. Similar trends were observed for Hits@3 and Hits@10. This demonstrates that neither the GAT nor the LLM alone was as effective as their combination—the GAT excelled at capturing obvious structural links, whereas the LLM proposed additional insightful connections, and together they yielded the best results. These ablation results highlight that each component of our system (dictionary-augmented NER, GAT, and LLM reasoning) provided a measurable performance gain.

5. Discussion

Our results show that integrating structured data, domain-enhanced extraction, and hybrid link prediction produces a far more complete vulnerability knowledge graph than structured sources alone. The fusion of a text-enhanced GAT with LLM inference uncovers latent parent–child chains and cross-vulnerability links often missed by conventional methods. Moreover, our system supports dynamic updates: as new CVEs or threat reports appear, it automatically ingests, parses, and enriches the graph, “learning” new terminology, adapting its ontology, and keeping relationships current. We foresee its integration into SIEM or threat-intelligence platforms to provide analysts with a real-time, queryable knowledge base. A key limitation is the reliance on GPT-3, impacting reproducibility and cost; future work will explore fine-tuning smaller open-source LMs locally. This would also allow more direct integration with the GAT. For example, the MuKDC framework, which utilizes multi-level knowledge distillation to generate supplementary knowledge in a sample-less environment to mitigate the data sparsity problem, has achieved SOTA performance on both multimodal and unimodal FKGC datasets [40]. In addition, the LLM’s initial attempt to classify nodes with few samples on incomplete graphs showed potential, which can provide new ideas for node labeling of sparse data in security scenarios [41]. Another area for improvement is scalability. In recent years, the Temporal Knowledge Graph Completion (TKGC) method has achieved fine-grained modeling of entity evolution by fusing higher-order topology and attribute information, which provides a powerful reference for dynamic updating [42]. Our current knowledge graph, with a few thousand nodes, is easily handled, but if we scale to tens of thousands of vulnerabilities and multiple languages for reports, performance could become a bottleneck. We might need to employ more efficient graph algorithms or indexing for quick updates.
Generality of the Approach: Our proposed framework was designed with generalizability in mind. The use of a standardized cybersecurity ontology and flexible data ingestion pipeline means that the system can incorporate new data sources or formats with minimal changes—for instance, it can integrate additional national vulnerability databases or incident report feeds by mapping their fields to our ontology. Moreover, because our NER and relation extraction models are domain-tuned (and can be retrained on new corpora if needed), and as our link prediction combines language-agnostic graph patterns with semantic reasoning, the approach can adapt to different scenarios (such as enterprise-specific vulnerability data or industry-specific threat intel) and even to other languages. Early experiments indicated that, with modest retraining (e.g., using multilingual embeddings and an expanded dictionary), our pipeline successfully processed non-English vulnerability reports. However, we acknowledge that extreme domain shifts (e.g., entirely new types of cybersecurity data) may require additional tuning or ontology extension.
Scalability to Large-Scale and Multilingual Data: Our architecture supports scaling to larger knowledge graphs—the modular pipeline allows incremental updates, so new data can be ingested without retraining the entire system from scratch. In practice, as the graph grows, we could distribute processing (e.g., run NER and relation extraction on multiple nodes, partition the graph for parallel GAT computation) to maintain efficiency. We are also exploring using faster distilled language models for link inference to reduce dependency on the costly GPT-3 API when handling millions of nodes. Regarding multilingual capability, the current system primarily processes English (and some structured fields from CNVD). To extend to other languages (e.g., Chinese vulnerability descriptions), we could leverage multilingual NER models or machine translation. Our framework could incorporate a multilingual dictionary and use language-specific embeddings (or a multilingual BERT), so that non-English text is normalized to the ontology. While these adaptations are feasible (and we have observed promising results on a small Chinese dataset after integrating a Chinese NER module), we acknowledge potential limitations: language nuances or lack of training data in some languages might affect the extraction accuracy, and integrating vastly more data could introduce performance bottlenecks without further optimization. We consider addressing these issues as important future work to improve the system’s universality.

6. Conclusions

We have presented a novel approach for constructing a comprehensive vulnerability knowledge graph by integrating multi-source data with advanced AI techniques. Our system unifies structured databases and unstructured texts under a custom cybersecurity ontology and employs domain-specific NER and RE models to extract critical information. A text-enhanced Graph Attention Network, combined with large language model reasoning via GPT-3 and BERT-based validation, effectively predicts missing links and enriches the graph. Extensive experiments demonstrated marked improvements in graph completeness and link prediction performance over baseline methods, making the resulting system highly beneficial for proactive cybersecurity analysis and defense. As future work, we plan to further refine the inference modules, for example by exploring few-shot fine-tuning of the LLM for more accurate relation suggestions and by incorporating temporal dynamics (handling the time validity of vulnerabilities and patches) into the knowledge graph. We are also interested in integrating additional data modalities, such as network scan results or exploit proof-of-concept code, to broaden the scope of the knowledge graph. We believe this work lays a foundation for intelligent cybersecurity knowledge systems that learn and adapt continuously, helping to maintain an up-to-date and robust knowledge graph.

Author Contributions

Conceptualization, R.L.; Methodology, R.L.; Validation, R.L. and Y.X. (Yaxuan Xie); Formal analysis, Y.X. (Yaxuan Xie); Investigation, Y.X. (Yaxuan Xie); Resources, Z.D.; Data curation, J.H. and X.Q.; Writing—review & editing, Y.X. (Yongcai Xiao); Supervision, C.P. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (2023YFE0111100), the National Natural Science Foundation of China under Grant No. 62272370, the Young Elite Scientists Sponsorship Program by CAST (2022QNRC001), the China 111 Project (No. B16037), the Qinchuangyuan Scientist + Engineer Team Program of Shaanxi (No. 2024QCY-KXJ-149), Songshan Laboratory (No. 241110210200), the Open Foundation of the Key Laboratory of Cyberspace Security, Ministry of Education of China (No. KLCS20240405), and the Fundamental Research Funds for the Central Universities (QTZX23071).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

Author Yongcai Xiao was employed by the company State Grid Jiangxi Electric Power Research Institute; Author Xiaowen Quan was employed by the company Yuanjiang Shengbang Safety Technology Group Co. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

1. Xue, R.; Tang, P.; Fang, S. Prediction of computer network security situation based on association rules mining. Wirel. Commun. Mob. Comput. 2022, 2022, 1–9.
2. Zeng, Z.R.; Peng, W.; Zeng, D.; Chen, Y. Intrusion detection framework based on causal reasoning for DDoS. J. Inf. Secur. Appl. 2022, 65, 103124.
3. Yu, D.; Zhu, C.; Yang, Y.; Zeng, M. Jaket: Joint pre-training of knowledge graph and language understanding. In Proceedings of the 36th AAAI Conference on Artificial Intelligence, Virtual Event, 22 February–1 March 2022; pp. 11630–11638.
4. Zhao, X.; Jiang, R.; Han, Y.; Li, A.; Peng, Z. A survey on cybersecurity knowledge graph construction. Comput. Secur. 2024, 136, 103524.
5. Jiao, J.; Li, W.; Guo, D. The Vulnerability Relationship Prediction Research for Network Risk Assessment. Electronics 2024, 13, 3350.
6. Rastogi, N.; Dutta, S.; Zaki, M.J.; Gittens, A.; Aggarwal, C. Malont: An ontology for malware threat intelligence. In Proceedings of the International Workshop on Deployable ML for Security Defense, Virtual Event, 24 August 2020; pp. 28–44.
7. Zhao, J.; Yan, Q.; Li, J.; Shao, M.; He, Z.; Li, B. TIMiner: Automatically extracting and analyzing categorized cyber threat intelligence from social data. Comput. Secur. 2020, 95, 101867.
8. Guo, Y.; Liu, Z.; Huang, C.; Liu, J.; Jing, W.; Wang, Z.; Wang, Y. CyberRel: Joint entity and relation extraction for cybersecurity concepts. In Proceedings of the 23rd International Conference on Information and Communications Security (ICICS), Part I, Chongqing, China, 19–22 August 2021; pp. 447–463.
9. Bouarroudj, W.; Boufaida, Z.; Bellatreche, L. Named entity disambiguation in short texts over knowledge graphs. Knowl. Inf. Syst. 2022, 64, 325–351.
10. Mahdisoltani, F.; Biega, J.; Suchanek, F.M. YAGO3: A Knowledge Base from Multilingual Wikipedias. In Proceedings of the 9th Biennial Conference on Innovative Data Systems Research (CIDR), Asilomar, CA, USA, 6–9 January 2013.
11. Can, O.; Unalır, M.Ö.; Sezer, E.; Akar, G. An Ontology Based Approach for Host Intrusion Detection Systems. In Proceedings of the Metadata Semantic Research: 11th Int. Conf. MTSR 2017, Tallinn, Estonia, 28 November–1 December 2017; pp. 80–86.
12. Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901.
13. OpenAI. GPT-4 Technical Report. arXiv 2023, arXiv:2303.08774.
14. Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.-A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. LLaMA: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971.
15. Zhang, J.; Bu, H.; Wen, H.; Liu, Y.; Fei, H.; Xi, R.; Li, L.; Yang, Y.; Zhu, H.; Meng, D. When LLMs meet cybersecurity: A systematic literature review. Cybersecurity 2025, 8, 55.
16. Auer, S.; Bizer, C.; Kobilarov, G.; Lehmann, J.; Cyganiak, R.; Ives, Z. DBpedia: A nucleus for a web of open data. In Proceedings of the 6th International Semantic Web Conference, Busan, Korea, 11–15 November 2007; pp. 722–735.
17. Vrandečić, D. Wikidata: A new platform for collaborative data collection. In Proceedings of the 21st International Conference on World Wide Web, Lyon, France, 16–20 April 2012; pp. 1063–1064.
18. Färber, M.; Bartscherer, F.; Menne, C.; Rettinger, A. Linked Data Quality of DBpedia, Freebase, OpenCyc, Wikidata, and YAGO. Semant. Web 2018, 9, 77–129.
19. Xu, B.; Xu, Y.; Liang, J.; Xie, C.; Liang, B.; Cui, W.; Xiao, Y. CN-DBpedia: A never-ending Chinese knowledge extraction system. In Proceedings of the 30th International Conference on Industrial Engineering and Other Applications of Applied Intelligent Systems, Arras, France, 27–30 June 2017; pp. 428–438.
20. Jia, Y.; Qi, Y.; Shang, H.; Jiang, R. A Practical Approach to Constructing a Knowledge Graph for Cybersecurity. Engineering 2018, 4, 53–60.
21. Sun, Y.; Lin, D.; Song, H.; Yan, M.; Cao, L. A Method to Construct Vulnerability Knowledge Graph based on Heterogeneous Data. In Proceedings of the 16th International Conference on Mobility, Sensing and Networking (MSN), Tokyo, Japan, 17–19 December 2020.
22. Bordes, A.; Usunier, N.; Garcia-Durán, A.; Weston, J.; Yakhnenko, O. Translating Embeddings for Modeling Multi-Relational Data. In Proceedings of the 27th Annual Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Volume 26.
23. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge Graph Embedding by Translating on Hyperplanes. In Proceedings of the 28th AAAI Conference on Artificial Intelligence, Quebec City, QC, Canada, 27–31 July 2014; Volume 28, p. 1.
24. Yin, J.; Chen, G.; Hong, W.; Wang, H.; Cao, J.; Miao, Y. Empowering Vulnerability Prioritization: A Heterogeneous Graph-Driven Framework for Exploitability Prediction. In Proceedings of the 24th International Conference on Web Information Systems Engineering–WISE 2023, Melbourne, VIC, Australia, 25–27 October 2023; pp. 289–299.
25. Yue, P.; Tang, H.; Li, W.; Zhang, W.; Yan, B. MLKGC: Large Language Models for Knowledge Graph Completion Under Multimodal Augmentation. Mathematics 2025, 13, 1463.
26. Han, Z.; Li, X.; Liu, H.; Sun, F.; Zhang, N. Deepweak: Reasoning Common Software Weaknesses via Knowledge Graph Embedding. In Proceedings of the IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), Campobasso, Italy, 20–23 March 2018; pp. 456–466.
27. Wei, Y.; Huang, Q.; Zhang, Y.; Kwok, J.T. KICGPT: Large Language Model with Knowledge in Context for Knowledge Graph Completion. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2023, Singapore, 6–10 December 2023; pp. 8667–8683.
28. Cheng, X.; Sun, X.; Bo, L.; Wei, Y. KVS: A tool for knowledge-driven vulnerability searching. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), Singapore, 14–18 November 2022; pp. 1731–1735.
29. Guo, Y.; Liu, Z.; Huang, C.; Wang, N.; Min, H.; Guo, W.; Liu, J. A Framework for Threat Intelligence Extraction and Fusion. Comput. Secur. 2023, 132, 103371.
30. Host, A.M.; Lison, P.; Moonen, L. Constructing a Knowledge Graph from Textual Descriptions of Software Vulnerabilities in the National Vulnerability Database. arXiv 2023, arXiv:2305.00382.
31. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. arXiv 2013, arXiv:1301.3781.
32. Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
33. Sorokoletova, O.; Antonioni, E.; Colò, G. Towards a scalable AI-driven framework for data-independent Cyber Threat Intelligence Information Extraction. arXiv 2025, arXiv:2501.06239.
34. Wang, J.; Zhu, T.; Xiong, C.; Chen, Y. MultiKG: Multi-Source Threat Intelligence Aggregation for High-Quality Knowledge Graph Representation of Attack Techniques. arXiv 2024, arXiv:2411.08359.
35. Wåreus, E.; Hell, M. Automated CPE Labeling of CVE Summaries with Machine Learning. In Proceedings of the Detection of Intrusions and Malware, and Vulnerability Assessment: 17th Int. Conf. DIMVA 2020, Lisbon, Portugal, 24–26 June 2020; pp. 3–22.
36. Wu, H.; Li, X.; Gao, Y. An Effective Approach of Named Entity Recognition for Cyber Threat Intelligence. In Proceedings of the 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC), Chongqing, China, 12–14 June 2020; Volume 1, pp. 1370–1374.
37. Ma, P.; Jiang, B.; Lu, Z.; Xue, M.; Shi, X. Cybersecurity Named Entity Recognition Using Bidirectional Long Short-Term Memory with Conditional Random Fields. Tsinghua Sci. Technol. 2020, 26, 259–265.
38. Wang, X.; El-Gohary, N. Deep Learning-Based Relation Extraction and Knowledge Graph-Based Representation of Construction Safety Requirements. Autom. Constr. 2023, 147, 104696.
39. Geng, Z.; Chen, G.; Han, Y.; Liu, X.; Shi, C. Semantic Relation Extraction Using Sequential and Tree-Structured LSTM with Attention. Inf. Sci. 2020, 509, 183–192.
40. Li, Q.; Chen, Z.; Ji, C.; Jiang, S.; Li, J. LLM-based Multi-Level Knowledge Generation for Few-Shot Knowledge Graph Completion. In Proceedings of the 33rd International Joint Conference on Artificial Intelligence (IJCAI 2024), Yokohama, Japan, 3–9 August 2024; pp. 2135–2143.
41. Li, Y.; Yang, Y.; Zhu, J.; Chen, H.; Wang, H. LLM-Empowered Few-Shot Node Classification on Incomplete Graphs with Real Node Degrees. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (CIKM), Boise, ID, USA, 18–22 October 2024.
42. Wen, M.; Mei, H.; Wang, W.; Zhang, X. Enhanced Temporal Knowledge Graph Completion via Learning High-Order Connectivity and Attribute Information. Appl. Sci. 2023, 13, 12392.
Figure 1. Overall system workflow.
Figure 2. NER performance comparison across models for Vendor, Product, Vulnerability, and Weakness.
Figure 3. Simplified BiLSTM-CRF model diagram.
Figure 4. Overview of the security knowledge graph.
Figure 5. Example triple structure.
Figure 6. Comparison of model performance values: (a) recall, (b) precision, (c) F1.
Figure 7. Comparison of CNN performance with CBOW and GloVe embeddings.
Figure 8. Attention-based CNN with GloVe (AUC = 0.975).
Figure 9. Attention-based CNN with CBOW (AUC = 0.973).
Figure 10. Relation extraction with GloVe embeddings: RNN (P = 84.4%, R = 90.1%, F1 = 87.2%); LSTM (P = 85.5%, R = 91.2%, F1 = 88.1%).
Figure 11. Relation extraction with CBOW embeddings: RNN (P = 80.5%, R = 85.1%, F1 = 82.7%); LSTM (P = 81.6%, R = 86.1%, F1 = 83.5%).
Figure 12. Relation extraction performance comparison (Part 1): model MR values across different prediction tasks.
Figure 13. Relation extraction performance comparison (Part 2): model MRR values across different prediction tasks.
Table 1. Hardware configuration.
Hardware Component | Specification
Operating System | Ubuntu 22.04.1 LTS
Processor | Intel Xeon® Gold 6146 @ 3.20 GHz
Memory | 128 GB RAM
Storage | 256 GB SSD
GPU | NVIDIA GeForce RTX 3090 (NVIDIA Corporation, Santa Clara, CA, USA)
Table 2. Parameter definitions.
Abbreviation | Definition
TP (True Positives) | Number of positive samples correctly identified as positive
FP (False Positives) | Number of negative samples incorrectly identified as positive
FN (False Negatives) | Number of positive samples incorrectly identified as negative
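For reference, precision, recall, and the F1-score reported in our experiments are computed from these counts in the standard way:

\[
\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}
\]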
Table 3. Hyperparameters for attention-based CNN.
Hyperparameter | Value
Dropout rate | 0.5
Word embedding dimension | 300
Position embedding dimension | 50
POS embedding dimension | 15
Optimizer and epochs | SGD, as set per experiment
Table 4. Intra-type vulnerability relations (six types).
Relation | Description | Count
ChildOf | hierarchical child | 1436
ParentOf | hierarchical parent | 1436
CanPrecede | may precede another | 156
CanFollow | may follow another | 220
PeerOf | peer relation | 196
Semantic | semantic similarity | 654
Table 5. Cross-type vulnerability relations (three types).
Relation | Description | Count
BelongOf | CVE → CWE | 3975
AttackOf | CWE → CAPEC | 2486
TargetOf | CWE → CAPEC | 3215
Table 6. Hits@n comparison.
Prediction | Model | Hits@1 | Hits@3 | Hits@10
Head Entity | TransH | 0.458 | 0.567 | 0.632
Head Entity | Text-Enhanced GAT | 0.588 | 0.632 | 0.726
Relation | TransH | 0.432 | 0.543 | 0.691
Relation | Text-Enhanced GAT | 0.569 | 0.659 | 0.792
Tail Entity | TransH | 0.451 | 0.581 | 0.612
Tail Entity | Text-Enhanced GAT | 0.579 | 0.702 | 0.796
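For clarity, Hits@k is the fraction of test triples for which the correct entity (or relation) is ranked within the top k candidates; a minimal illustrative computation (with made-up ranks, not the evaluation data behind Table 6) is shown below.

```python
# Illustrative Hits@k computation; the ranks are made-up examples.
def hits_at_k(ranks, k):
    """Fraction of queries whose correct answer is ranked within the top k."""
    return sum(1 for r in ranks if r <= k) / len(ranks)

ranks = [1, 4, 2, 15, 3, 8]  # rank of the true entity/relation per test query
for k in (1, 3, 10):
    print(f"Hits@{k} = {hits_at_k(ranks, k):.3f}")
```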