Article

SQL Statement Generation Enhanced Through the Fusion of Large Language Models and Knowledge Graphs

1 Guizhou Key Laboratory of Advanced Computing, Guizhou Normal University, Guiyang 550001, China
2 School of Cyber Science and Technology, Guizhou Normal University, Guiyang 550001, China
* Author to whom correspondence should be addressed.
Electronics 2026, 15(2), 278; https://doi.org/10.3390/electronics15020278
Submission received: 16 November 2025 / Revised: 18 December 2025 / Accepted: 18 December 2025 / Published: 7 January 2026

Abstract

Current mainstream SQL generation approaches remain insufficient in capturing the semantic information of structured data and handling complex query tasks. To address the challenges of hallucination and accuracy degradation in large language model (LLM)-based SQL generation, this paper proposes an enhanced SQL generation framework that integrates knowledge graphs with large language models. The proposed method introduces an SQL-KG-Verifier module, which synergizes the structured information of knowledge graphs with the generative capabilities of LLMs. By incorporating the verifier, the framework collaboratively refines SQL statements to ensure higher structural consistency, improved accuracy, and enhanced interpretability. Specifically, the verifier employs the entities and relational information retrieved from the knowledge graph as proxies to validate and revise the model outputs, effectively reducing generation errors. Experimental results demonstrate that on the BIRD and Spider datasets, the proposed method achieves execution accuracies of 52.71% and 77.26%, respectively—representing improvements of 19.93 and 21.27 percentage points over baseline models. Moreover, the approach exhibits superior adaptability and generation performance in complex and domain-specific query scenarios.

1. Introduction

With the rapid evolution of information technology, the scale and complexity of data have continued to grow, making informatization a critical driving force behind societal operations and economic development. Against this backdrop, how to efficiently manage and exploit massive volumes of data—particularly unstructured data such as text, images, and videos—has become a central challenge for both academia and industry. Although traditional Database Management Systems (DBMSs) perform effectively in handling structured data and routine query tasks, they exhibit significant limitations in areas such as semantic understanding, complex logical reasoning, and cross-modal data integration [1]. For instance, unstructured data often pose difficulties in storage, semantic-based retrieval, and logical inference due to their lack of explicit schema and inherent heterogeneity. These challenges have motivated researchers to explore new intelligent methodologies aimed at enhancing the efficiency of data organization and utilization.
In recent years, the emergence of Large Language Models (LLMs [2]) has led to groundbreaking progress in natural language processing tasks. Representative transformer-based [3] models such as GPT, Qwen, and LLaMA have demonstrated exceptional capabilities in language comprehension and generation through large-scale pretraining, providing powerful tools for extracting and representing complex semantic information.
However, despite their remarkable performance, LLMs still exhibit notable shortcomings in terms of logical consistency, domain knowledge coverage, and structured query generation. Common issues include semantic hallucinations, syntactic or structural errors, and a lack of interpretability. For instance, in Text-to-SQL tasks, LLMs may generate SQL statements containing non-existent table names or attributes, or produce queries without any factual or schema-based grounding. These limitations have prompted researchers to explore how the semantic processing strengths of LLMs can be effectively combined with external knowledge resources.
Against this backdrop, the Knowledge Graph (KG [4]) has gradually emerged as a crucial mechanism for enhancing the reasoning capability and interpretability of LLMs. By representing entities and their relationships through structured nodes and edges, knowledge graphs reveal latent connections within heterogeneous data and support sophisticated semantic reasoning. In recent years, novel architectures have been proposed that organically integrate the semantic understanding of large models with the inferential power of knowledge graphs, forming a closed-loop workflow encompassing semantic parsing, graph-based retrieval, path reasoning, and result generation. Such approaches not only strengthen the logical coherence and domain adaptability of LLMs but also significantly improve the interpretability and reliability of their generated outputs. Despite the initial progress made in integrating LLMs with KGs, significant challenges remain in the context of structured query generation, particularly concerning model size, reasoning efficiency, and practical deployment environments. In scenarios that are resource-constrained or heavily reliant on domain-specific knowledge, the question of how to design LLM-KG integration systems that are both lightweight and highly accurate remains an open and pressing issue.
While significant strides have been made in synergizing Large Language Models with Knowledge Graphs (KGs) for semantic parsing and structured query generation, critical challenges remain regarding system scalability, inference latency, and data security. The prohibitive computational overhead of state-of-the-art LLMs often renders them impractical for real-time deployment in resource-constrained environments. Furthermore, in high-stakes, knowledge-intensive domains such as the banking and finance sector [5], the reliance on general-purpose models raises concerns regarding hallucination risks and data sovereignty. Financial institutions require systems that are not only verifiable and logically consistent but also compliant with strict privacy regulations, often necessitating on-premise deployment where hardware resources are limited. However, reducing model size typically entails a degradation in reasoning capabilities, specifically in complex multi-hop reasoning tasks. Consequently, a pivotal open research problem lies in reconciling the trade-off between model performance and computational efficiency. The challenge is to design a lightweight yet architecture-aware system capable of leveraging the reasoning prowess of LLMs and the factual precision of KGs, ensuring high accuracy and reliability without incurring prohibitive operational costs [6].
To address these issues, this study focuses on SQL statement generation through the integration of LLMs and KGs. We propose an integrated architecture that employs the Think-on-Graph [7] methodology for proxy search in the Text-to-SQL task. Before conducting proxy retrieval, a vector-database-based semantic matching and filtering mechanism is applied to ensure more accurate context alignment. Furthermore, an SQL-KG-Verifier module is introduced, which leverages the generative capacity of large models to validate and refine the produced SQL statements. This verifier effectively mitigates hallucinations, such as the fabrication of non-existent table names, column names, or attributes, and performs structural correction while enhancing the handling of unstructured data such as textual inputs.
The proposed architecture consists of several key components, including the verifier implementation, prompt engineering, data preprocessing, entity recognition, relation extraction, knowledge graph construction, and SQL generation. By adopting the Think-on-Graph strategy, the framework identifies relevant schema elements—such as table and field names—within SQL statements, and integrates the SQL-KG-Verifier to determine whether valid relationships exist between these elements. This approach organically combines the reasoning ability of LLMs with the structured representation of KGs, aiming to enhance the accuracy and interpretability of medium- and small-scale LLMs in SQL query generation tasks.

2. Methodology


2.1. Architecture Design

This study proposes a verification framework integrating Large Language Models with Knowledge Graph based retrieval to enhance the accuracy and interpretability of SQL generation. The framework aims to mitigate hallucination issues, such as the fabrication of non-existent tables or fields, commonly observed in LLM-generated SQL statements, while improving the overall reliability of the generation process. The system architecture, as illustrated in Figure 1, comprises four core components: an input module, a verifier module, a knowledge graph construction module, and a knowledge graph retrieval module, which together form a closed-loop mechanism for guided SQL generation.
To facilitate reproducibility, we detail the specific implementation parameters of our proposed framework. We utilized bge-large-zh as the embedding backbone, configured with an embedding dimension of 1024. The Knowledge Graph component was implemented using Neo4j Community Edition (v5.18.0). For efficient vector storage and retrieval, we deployed Milvus 2.4, employing an IVF_FLAT index with Cosine Similarity as the metric. The retrieval hyperparameter top-k was set to 10 for the initial retrieval stage. All experiments were conducted using the Qwen2.5 model series (evaluating the 7B, 32B, and 72B variants) as the foundational Large Language Models (LLMs). Inference was performed with a temperature setting of 0.1 to ensure deterministic outputs, and the maximum token length was constrained to 32,764. Computational tasks were executed on a hardware environment equipped with four NVIDIA RTX 4090 GPUs.
Figure 1 illustrates the overall architecture of the proposed knowledge graph-enhanced Text-to-SQL system. The framework comprises four main components: (1) an Input module that processes user queries and table structure prompts through information filtering and prompt splicing; (2) a Knowledge Graph Building module that extracts, cleans, and vectorizes relational database schemas for storage in a vector database; (3) a Validator module that performs information extraction, entity matching, rule verification, and reasoning using large language models; and (4) a Knowledge Graph Retrieval module that employs agent-based loop search to retrieve relevant schema elements including table names, field names, primary keys, foreign keys, and indices. The system incorporates an iterative error detection and repair mechanism with result feedback to ensure SQL generation accuracy.

2.2. Design of Input Module

The input module receives two types of inputs: (1) User Input, consisting of natural language queries expressing the user’s data retrieval intent, and (2) Database Schema Information, including table names, field names, data types, and relational structures extracted from the target database. The module leverages LLMs to parse these inputs and convert them into preliminary SQL statements along with associated query information. To maintain consistency throughout this paper, we define the following input terminology: (1) User Prompt: The natural language query submitted by the user. (2) Schema Context: The structured metadata (table names, field names, relationships) extracted from the relational database. (3) Combined Input: The concatenation of User Prompt and Schema Context, which serves as the actual input to the LLM for SQL generation. This process involves several key steps, including prompt engineering, extraction of table schema information, and structuring of the queries.
For the text-to-SQL application scenario, a set of general-purpose foundational prompts is designed. These prompts primarily include task specifications, SQL syntax requirements, and output format standards. Subsequently, table schema information from relational databases is extracted through an information filtering process. Metadata such as table names, column names, data types, and primary/foreign key relationships are embedded into the foundational prompts in a structured manner, providing the LLMs with the necessary knowledge to generate SQL statements.
For different database systems, the prompts can be adjusted during the information filtering stage to form a standardized input format, ensuring that the prompts remain synchronized with the target database environment. Finally, the dynamically adjusted structured prompts are concatenated with the foundational prompts to facilitate the generation of high-quality SQL queries. Table A1 presents examples of some of these prompts:
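The prompt-splicing step described above can be sketched in Python. This is an illustrative reconstruction, not the paper's exact prompts: the template wording, the `format_schema_context` helper, and the schema field names are all assumptions introduced for the example.

```python
# Hypothetical sketch of prompt splicing: foundational prompt + filtered
# schema metadata + user query, concatenated into the Combined Input.
FOUNDATION_PROMPT = (
    "You are a Text-to-SQL assistant. Generate one syntactically valid SQL "
    "statement. Output only the SQL, with no explanation."
)

def format_schema_context(tables: dict) -> str:
    """Serialize filtered schema metadata (tables, columns, keys) into text."""
    lines = []
    for table, meta in tables.items():
        cols = ", ".join(f"{c} {t}" for c, t in meta["columns"].items())
        lines.append(f"TABLE {table}({cols}); PRIMARY KEY: {meta['pk']}")
        for fk_col, ref in meta.get("fks", {}).items():
            lines.append(f"  FOREIGN KEY {table}.{fk_col} -> {ref}")
    return "\n".join(lines)

def build_combined_input(user_prompt: str, tables: dict) -> str:
    """Concatenate the foundation prompt, Schema Context, and User Prompt."""
    return "\n\n".join([
        FOUNDATION_PROMPT,
        "### Schema Context\n" + format_schema_context(tables),
        "### User Prompt\n" + user_prompt,
    ])

schema = {
    "orders": {
        "columns": {"id": "INT", "user_id": "INT", "total": "DECIMAL"},
        "pk": "id",
        "fks": {"user_id": "users.id"},
    }
}
combined = build_combined_input("Total order value per user?", schema)
```

Because the schema portion is regenerated per target database during information filtering, the foundational prompt stays fixed while only the Schema Context block changes.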
Query structuring refers to the process of analyzing user intent and generating syntactically correct, structured SQL statements with the LLMs based on the information provided in the prompts.

2.3. SQL-KG-Verifier

The proposed SQL-KG-Verifier is developed based on the concept of integrating LLMs with knowledge graph-enhanced retrieval, aiming to establish a systematic paradigm for SQL statement generation. This approach leverages the synergy between LLMs and KG to improve both the accuracy and interpretability of SQL generation.
This validator is a hybrid validator that combines two complementary validation methods: (1) hard matching based on the knowledge graph structure, used for entity and relationship validation; and (2) semantic reasoning based on large language models, used to retrieve matching paths in the knowledge graph through context awareness. Unlike traditional validation mechanisms that rely solely on graph-based knowledge matching, our hybrid approach also leverages the semantic understanding capabilities of LLMs to assess logical consistency and contextual applicability. This dual validation strategy ensures both the structural accuracy and semantic coherence of the generated SQL statements. Specifically, the validator verifies the consistency between the LLM output and the underlying knowledge graph while ensuring compliance with database execution requirements: it performs consistency validation of SQL statements, verifies their semantic accuracy, and implements error-correction mechanisms. This ensures that the generated text aligns with both the structural and factual representations in the knowledge graph, checks adherence to the semantic relationships encoded within the graph, and provides corrective strategies whenever discrepancies arise between the model outputs and the knowledge graph. The formal implementation is defined as follows:
Q′ = D(V(M(ε(Q)), KG), KG)
where:
Q denotes the SQL query statement generated by the large language model;
KG(E, R) is the knowledge graph, with E the set of entities and R the set of relations;
ε(Q) denotes the information extracted from the SQL statement (tables, fields, relations, etc.);
M: ε(Q) → (E, R) denotes the process of retrieving and matching this information against the knowledge graph;
V denotes the rule-validation and reasoning function;
D denotes the error-detection and correction function;
Q′ is the corrected SQL query statement produced as output.
The verification proceeds in three steps:
1. Information extraction and matching: M(ε(Q)) = {(e, r) | e ∈ E, r ∈ R, match successful}
2. Rule checking and reasoning: V(M(ε(Q)), KG) = verify match consistency
3. Iterative error correction and systemic security assurance: D(V(M(ε(Q)), KG), KG) = Q′
(1) Information Extraction and Matching. The first step involves extracting key elements from the input SQL query, including table names, field names, and relational structures, as well as those referenced in the user query. These extracted components are then retrieved and matched against entity nodes in the knowledge graph, primarily for the purpose of entity disambiguation. The verifier conducts searches within both the knowledge graph and the vector database to confirm whether the corresponding entities, such as tables and fields, exist in the knowledge graph. This process ensures that the generated output is consistent with the factual knowledge encoded in the graph. Finally, the validated elements, including table names, field names, relationships, and entities, are passed on to the subsequent rule-checking and reasoning module.
(2) Rule Checking and Reasoning. Based on the structural information encoded in the knowledge graph, the verifier implements a multi-level rule-checking and reasoning mechanism consisting of three components: (a) Schema Constraint Validation: verifies that referenced tables and fields exist in the database schema, checks data type compatibility between compared fields, and validates primary-foreign key relationships. (b) Semantic Consistency Rules: ensures that JOIN conditions connect semantically related entities, validates that WHERE clause predicates are logically coherent, and checks for redundant or contradictory conditions. (c) Inference Based on Graphs and Large Language Models: the approach utilizes the graph to explore the information of neighboring nodes, and leverages large language models to summarize this neighboring information in order to infer potential implicit relationships between entities. By combining the graph's connectivity with the semantic summarization capability of the large language model, this method identifies potentially missing JOIN conditions; when direct relationships are unavailable, it also suggests alternative query paths. Through these mechanisms, the verifier identifies potential conflicts between the input data and the encoded rules, thereby validating the consistency of the output with the factual knowledge stored in the graph. In addition, the reasoning capabilities of the knowledge graph are leveraged to locate relevant nodes and relational information that support the verification process. The detailed hard-matching procedure is illustrated in Table 1.
(3) Iterative Error Correction and Systemic Security Assurance. By explicitly strengthening connections through three mechanisms (mapping relational table structures to entity nodes, aligning foreign key constraints with graph edges, and enriching field semantics via attribute annotations), the framework establishes a rigid structural alignment that serves a dual purpose. First, it facilitates a closed-loop refinement process: when discrepancies arise, the verifier isolates mismatched information and feeds specific error details back to the LLM for iterative optimization. Second, extending beyond accuracy, this architecture functions as a critical security layer that cross-references generated SQL against KG-encoded constraints to identify anomalies, such as hallucinations or adversarial inputs, and preemptively mitigate risks like SQL injection or unauthorized access. Ultimately, this integration guarantees a comprehensive tri-fold integrity (syntactic executability, semantic fidelity, and schema adherence), thereby ensuring the trustworthiness required for deploying LLM systems in complex production environments.
The verifier implemented according to the above methodology is illustrated in Figure 2:
Figure 2 illustrates the overall architecture of the SQL-KG-Verifier, presenting a four-stage iterative pipeline designed for accuracy and self-correction. The process initiates at the Information Input layer, which ingests structured database schema data, including relationship constraints and field metadata. This input is subsequently processed by the Knowledge Graph Retrieval module, which validates the data via hard match proxy search and entity disambiguation mechanisms. The resulting statistical evidence is then evaluated by a Judging LLM. Upon successful validation, the system proceeds to outcome evaluation to generate the final SQL statement; conversely, a negative judgment triggers an Error Detection and Repair cycle, providing feedback to refine the input design and re-execute the pipeline.
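The composition Q′ = D(V(M(ε(Q)), KG), KG) described above can be sketched as toy Python functions. This is a deliberately simplified stand-in: the regex-based extraction handles only plain SELECT statements, the knowledge graph is an in-memory set of entity names and FK edges, and all identifiers are invented for illustration; the actual system parses SQL properly and queries Neo4j.

```python
# Toy sketch of the verifier composition Q' = D(V(M(eps(Q)), KG), KG).
import re

KG = {
    "entities": {"users", "orders", "users.id", "users.name", "orders.user_id"},
    "relations": {("orders", "user_id", "users")},  # FK edge
}

def extract(sql):
    """eps(Q): pull table and column tokens from a simple SELECT statement."""
    tables = set(re.findall(r"(?:FROM|JOIN)\s+(\w+)", sql, re.I))
    columns = set(re.findall(r"\b(\w+\.\w+)\b", sql))
    return {"tables": tables, "columns": columns}

def match(info, kg):
    """M: keep only elements that exist as KG entity nodes."""
    referenced = info["tables"] | info["columns"]
    hits = {e for e in referenced if e in kg["entities"]}
    return hits, referenced - hits

def verify(hits, misses, kg):
    """V: consistency holds when every referenced element matched a KG node."""
    return {"consistent": not misses, "hits": hits, "misses": misses}

def detect_and_correct(report, kg, sql):
    """D: pass the SQL through on success; on failure, surface error details
    for the LLM-driven iterative repair loop."""
    if report["consistent"]:
        return sql, []
    return None, sorted(report["misses"])

sql = "SELECT users.name FROM orders JOIN users ON orders.user_id = users.id"
corrected, errors = detect_and_correct(verify(*match(extract(sql), KG), KG), KG, sql)
```

A query referencing a fabricated table (e.g. `FROM ghost_table`) would fail the match step, and the returned error list is exactly the information fed back to the LLM for regeneration.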

2.4. Knowledge Graph Construction

The construction of the knowledge graph is designed from two complementary perspectives: the conceptual layer and the data layer. This dual-layered approach ensures both the rationality of knowledge representation and the efficiency of data structures. The overall design logic is illustrated in Figure 3.
Figure 3 illustrates the two-layer architecture for constructing a knowledge graph from relational database schemas. The Conceptual Level defines the ontology through entity, relationship, and property definitions, producing an abstract graph model where table names serve as nodes connected by primary key and foreign key relationships to field name nodes. The Data Layer implements the actual construction pipeline, including database standardization, automated cleaning algorithms, structural optimization, memory configuration, and batch import via the Neo4j-admin import tool. The resulting instantiated knowledge graph is validated through Cypher (Neo4j-5.18) statement counting and Python (3.12) script comparison against the source database to ensure data integrity.

2.4.1. Conceptual Layer

The conceptual layer represents the schema design of the knowledge graph and is constructed in a top-down manner. It defines the entity types, relation types, attributes, and fundamental schema constraints within the knowledge graph.
When mapping relational table structures to the knowledge graph, we adopt a subject–predicate–object modeling approach, i.e., entity–relation–entity. The detailed modeling logic is as follows:
Table 2 presents the entities and their attributes in the conceptual layer of the knowledge graph after mapping.
Table 3 illustrates the subject–predicate–object triples that constitute the conceptual layer.
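A minimal sketch of this subject-predicate-object mapping is given below, assuming tables become entity nodes, columns become field nodes, and primary/foreign key constraints become typed edges. The predicate names (`HAS_FIELD`, `PRIMARY_KEY`, `FOREIGN_KEY`) and the schema layout are illustrative stand-ins for the definitions in Tables 2 and 3, not the paper's exact ontology.

```python
# Hypothetical mapping of a relational schema into KG triples.
def schema_to_triples(tables: dict) -> list:
    triples = []
    for table, meta in tables.items():
        for col in meta["columns"]:
            # Each column becomes a field node attached to its table node.
            triples.append((table, "HAS_FIELD", f"{table}.{col}"))
        triples.append((table, "PRIMARY_KEY", f"{table}.{meta['pk']}"))
        for fk_col, (ref_table, ref_col) in meta.get("fks", {}).items():
            # FK constraints become explicit edges between field nodes.
            triples.append((f"{table}.{fk_col}", "FOREIGN_KEY",
                            f"{ref_table}.{ref_col}"))
    return triples

schema = {
    "orders": {"columns": ["id", "user_id"], "pk": "id",
               "fks": {"user_id": ("users", "id")}},
    "users": {"columns": ["id", "name"], "pk": "id"},
}
triples = schema_to_triples(schema)
```

In the actual pipeline these triples would then be serialized to CSV and bulk-loaded via the Neo4j-admin import tool described in Figure 3.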

2.4.2. Data Layer

The data layer consists of concrete data instances populated based on the schema defined in the conceptual layer, and it is constructed in a bottom-up manner.
Data cleaning and normalization are required to process, transform, and optimize the database table structures, ensuring consistency and integrity when mapping them into the knowledge graph. The specific transformation and normalization rules are detailed in Table 4 and Table 5.
The relationship transformation component converts relational structures—such as primary keys and foreign keys—from relational databases into explicit relationships within the knowledge graph. The results of this transformation are presented in Table 6.
Structural Optimization focuses on adapting relational database table structures—primarily defined by primary–foreign key relationships—into graph database representations, which are centered on nodes, relationships, and attributes. To achieve this, the table structure information is systematically optimized to align with graph-based modeling requirements. The detailed optimization strategies are presented in Table 7.

2.5. Knowledge Graph Retrieval

The retrieval mechanism of the knowledge graph in this study integrates an enhanced Think-on-Graph based approach with hard matching over the graph. Building upon entity recognition, vector-based retrieval, and relation pruning, the method incorporates a cyclic, iterative graph reasoning process, thereby enabling deep integration between large language models and the knowledge graph. The overall retrieval framework is illustrated in Figure 4.
As illustrated in Figure 4, the search logic diagram depicts the complete retrieval workflow, which consists of three main stages:
Stage 1, Input Processing: The system receives three types of input: (a) extracted table names and field names from the user query, (b) recognized entities from the natural language input, and (c) schema information from the relational database. These inputs are converted into JSON format to serve as structured input for the LLM.
Stage 2, Vector-based Retrieval: The JSON-formatted input is processed through an embedding model to generate semantic vectors. These vectors are then matched against the pre-indexed vectors in the high-performance vector database (using Milvus-2.5) to retrieve the top-k most relevant schema elements.
Stage 3, Knowledge Graph Reasoning: The retrieved candidates are used as seed nodes for graph traversal. The system performs iterative path expansion along relational edges, with the LLM evaluating semantic relevance at each step. The final output includes validated SQL components with their complete reasoning paths.
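Stage 2 can be illustrated with a self-contained cosine-similarity top-k search. In the deployed system this lookup is served by Milvus over 1024-dimensional bge-large-zh embeddings; the 3-dimensional vectors and schema-element names below are toy values chosen so the example runs standalone.

```python
# Minimal stand-in for vector-based schema retrieval (Stage 2).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, indexed: dict, k: int = 10):
    """Return the k schema elements most similar to the query embedding."""
    scored = sorted(indexed.items(),
                    key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [name for name, _ in scored[:k]]

# Toy pre-indexed embeddings for three schema elements.
index = {
    "orders.total":   [0.9, 0.1, 0.0],
    "users.name":     [0.1, 0.9, 0.1],
    "orders.user_id": [0.7, 0.3, 0.1],
}
best = top_k([1.0, 0.0, 0.0], index, k=2)
```

The returned names then seed the graph traversal of Stage 3 as anchor nodes.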

2.5.1. Agent-Based Retrieval

To achieve semantic-level understanding of structured databases and enhance the accuracy of SQL generation, this study proposes a two-stage enhancement approach that integrates vector-based retrieval with knowledge graph reasoning. In this framework, field descriptions and table schema information from the database are vectorized using an embedding model and stored in a high-performance vector database (Milvus), thereby constructing a semantic indexing layer that supports similarity-based semantic matching.
When a user submits a natural language query, the system first invokes a named entity recognition (NER) module to extract potential entities from the query text, such as table names, field names, and domain-specific terms. These entities are then semantically matched against table and field nodes stored in the vector database, and the top-k most relevant nodes are selected. Subsequently, using these highly relevant nodes as anchor nodes (initial query vertices for graph traversal), the system performs a local subgraph pruning on the knowledge graph, retaining only nodes and their direct relationships that are strongly related to the query topic, thereby effectively reducing the search space complexity. Based on this pruned subgraph, the system semantically integrates the node metadata—including field types, annotations, and example values—with the original user query to construct an initial contextual representation, Ctx0, which is then fed into the LLMs to guide the generation of candidate SQL statements.
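The local subgraph pruning step can be sketched as keeping only the edges incident to the anchor nodes (a 1-hop neighborhood). The adjacency list below is a toy in-memory substitute for the Neo4j traversal, and the edge labels are illustrative.

```python
# Illustrative 1-hop subgraph pruning around anchor nodes.
from collections import defaultdict

edges = [
    ("orders", "HAS_FIELD", "orders.user_id"),
    ("orders.user_id", "FOREIGN_KEY", "users.id"),
    ("users", "HAS_FIELD", "users.id"),
    ("products", "HAS_FIELD", "products.sku"),
]

def prune_subgraph(edges, anchors):
    """Retain only edges incident to an anchor node, shrinking the search
    space before LLM-guided path expansion."""
    adj = defaultdict(list)
    for s, p, o in edges:
        adj[s].append((s, p, o))
        adj[o].append((s, p, o))
    kept = {e for a in anchors for e in adj.get(a, [])}
    return sorted(kept)

sub = prune_subgraph(edges, anchors={"orders.user_id"})
```

Unrelated parts of the schema (here, the `products` table) are dropped, so the context Ctx0 handed to the LLM stays focused on the query topic.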
The system identifies deviations through a multi-criteria evaluation process: (1) Syntax Error Detection: the generated SQL is parsed using an SQL syntax analyzer to identify structural errors such as missing keywords, unbalanced parentheses, or invalid clause ordering. (2) Semantic Deviation Assessment: the system compares the generated SQL's semantic intent against the original user query using cosine similarity between their embedding representations; a similarity score below a predefined threshold (e.g., 0.75) indicates potential semantic deviation. (3) Schema Compliance Check: the SQL is validated against the database schema to ensure all referenced tables, columns, and relationships exist and are correctly used. (4) Execution Feasibility Test: a dry-run execution plan is generated to identify potential runtime issues such as ambiguous column references or type mismatches.
If any of these checks identifies a deviation (syntax errors detected, semantic similarity below the threshold, schema violations found, or execution-plan warnings), the system initiates an iterative path expansion mechanism. Based on the currently activated node set, the mechanism recursively expands along the relational edges in the knowledge graph, extracting adjacent nodes and their attribute information. Subsequently, the LLM evaluates the semantic relevance of the expanded candidate relationships and nodes, performs redundancy filtering, and selects the top-k most discriminative path segments, recording their complete retrieval paths to form a "path reasoning chain." This process can be executed over multiple iterations, with each iteration dynamically injecting newly acquired contextual information into the historical context. Through this iterative enrichment, the system gradually constructs a more comprehensive and logically consistent semantic representation, denoted as Ctxt, which ultimately guides the LLM to generate SQL statements that are accurate, complete, and compliant with the database schema.
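Simplified versions of these checks can be combined into a single trigger predicate. In this sketch the syntax check only verifies balanced parentheses and the semantic similarity is passed in as a precomputed number; both stand in for the SQL analyzer and the embedding model used in the actual system, and the 0.75 threshold is the value quoted above.

```python
# Hedged sketch of the multi-criteria deviation trigger.
SIM_THRESHOLD = 0.75  # threshold quoted in the text

def syntax_ok(sql: str) -> bool:
    """Placeholder syntax check: parentheses must be balanced."""
    depth = 0
    for ch in sql:
        depth += (ch == "(") - (ch == ")")
        if depth < 0:
            return False
    return depth == 0

def schema_ok(tables_used, schema_tables) -> bool:
    """Every referenced table must exist in the schema."""
    return set(tables_used) <= set(schema_tables)

def needs_expansion(sql, tables_used, schema_tables, semantic_sim) -> bool:
    """True when any check fails, triggering iterative path expansion."""
    return (not syntax_ok(sql)
            or semantic_sim < SIM_THRESHOLD
            or not schema_ok(tables_used, schema_tables))

trigger = needs_expansion("SELECT count(*) FROM orders",
                          ["orders"], {"orders", "users"}, semantic_sim=0.9)
```

When `needs_expansion` returns True, control passes to the path expansion loop; otherwise the candidate SQL proceeds to execution.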

2.5.2. Entity Validation

To ensure the accuracy and reliability of retrieval results, the system incorporates an entity consistency validation module. Specifically, all entities identified from the user query are re-embedded and subjected to a secondary semantic comparison within the vector database. The outcomes obtained from hard matching in the knowledge graph are then evaluated by the LLMs to verify whether the entities genuinely exist in the knowledge graph. If a mismatch is detected, the system triggers entity replacement suggestions or prompts the user to clarify ambiguities, thereby preventing downstream errors caused by incorrect entity recognition.

2.5.3. Relationship Validation

After entity validation is completed, the system further conducts semantic consistency verification on the relationship triples extracted from the knowledge graph. The valid entities confirmed in the entity validation stage are jointly encoded with the set of relationships output by the relation extraction module, forming an integrated semantic context, denoted as Ctxrel. This context is then evaluated by the LLMs to assess the plausibility of each relationship, including whether it conforms to business logic, contains semantic conflicts, or results in redundant inferences. The confidence scores produced by the LLMs are used to filter valid relationship paths, remove noisy connections, and ensure that the ultimately constructed subgraph is semantically coherent and structurally compact.

2.5.4. Computational Cost Analysis

To address efficiency considerations, we analyze the computational costs of each pipeline component. The vector search module comprises embedding generation with O(n) complexity (100–300 ms), FAISS/Milvus retrieval with O(log n) complexity using IVF indexing (50–150 ms), and re-ranking with O(k × m) complexity (80–200 ms).
The knowledge graph hard matching involves entity lookup with O(1) complexity via B-tree indexing (20–80 ms), relationship traversal with O(d) complexity for 2–3 hops (100–400 ms), and subgraph extraction bounded by O(|V| + |E|) (150–500 ms).
The LLM-based judgment constitutes the most computationally intensive component. Entity validation requires 800–2000 ms processing 100–200 tokens, relationship assessment demands 1200–2500 ms with 200–400 tokens, and SQL refinement consumes 1500–4000 ms processing 500–1000 tokens.
For a typical query, the total additional latency ranges from 4 to 10 s, distributed as: vector retrieval 6%, knowledge graph matching 9%, LLM verification 80%, and auxiliary overhead 5%.
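These figures can be tallied directly. The sketch below sums the per-stage latency ranges reported above and estimates each stage's share from the range midpoints; the midpoint-based shares are illustrative only.

```python
# Tally the reported per-stage latency ranges (milliseconds) and estimate
# each stage's share of the total; midpoints are used for illustration.

STAGES = {
    "embedding":          (100, 300),
    "vector_retrieval":   (50, 150),
    "re_ranking":         (80, 200),
    "entity_lookup":      (20, 80),
    "relation_traversal": (100, 400),
    "subgraph_extract":   (150, 500),
    "entity_validation":  (800, 2000),
    "relation_judgment":  (1200, 2500),
    "sql_refinement":     (1500, 4000),
}

def total_range_ms():
    """Total added latency as a (low, high) range in milliseconds."""
    lo = sum(r[0] for r in STAGES.values())
    hi = sum(r[1] for r in STAGES.values())
    return lo, hi

def stage_shares():
    """Approximate percentage share of each stage, from range midpoints."""
    mids = {k: (lo + hi) / 2 for k, (lo, hi) in STAGES.items()}
    total = sum(mids.values())
    return {k: round(100 * v / total, 1) for k, v in mids.items()}
```

Summing the ranges reproduces the 4–10 s total reported above, and the midpoint shares confirm that the three LLM stages dominate the budget.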
Despite these requirements, the system maintains its “lightweight” characterization through pre-constructed knowledge graphs, pre-computed embeddings, targeted schema-focused prompts, and early termination mechanisms. Compared to full-scale RAG systems requiring retrieval over extensive document corpora, our approach constrains retrieval to structured schema elements, substantially reducing search space and computational requirements for practical production deployment.
This study’s methodology establishes a closed-loop mechanism encompassing “semantic retrieval → local subgraph pruning → iterative expansion → dual validation,” which not only ensures retrieval efficiency but also substantially enhances the accuracy and interpretability of SQL generation. By integrating these stages, the proposed approach provides a robust solution for natural language-to-SQL conversion in complex query scenarios.
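The closed loop above can be expressed as a minimal orchestration, with each stage as a placeholder callable; the retry bound is an illustrative assumption.

```python
# Minimal orchestration of the "semantic retrieval -> local subgraph
# pruning -> iterative expansion -> dual validation" loop. All stage
# callables and the retry bound are placeholders, not the paper's API.

def closed_loop(query, retrieve, prune, expand, validate, max_rounds=3):
    """Run the closed loop until validation passes or rounds are exhausted."""
    subgraph = prune(retrieve(query))          # semantic retrieval + pruning
    for _ in range(max_rounds):
        ok, sql = validate(query, subgraph)    # dual validation
        if ok:
            return sql
        subgraph = expand(subgraph)            # iterative path expansion
    return sql  # best effort after exhausting expansion rounds
```

Each failed validation widens the subgraph and retries, which is exactly the feedback edge that closes the loop.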

3. Experiments and Analysis of Results

3.1. Experimental Parameters and Performance Metrics

This experiment aims to evaluate the performance of LLMs on the Text-to-SQL task and to verify the enhancement effect of integrating knowledge graphs with LLMs. The evaluation is conducted on the BIRD and Spider datasets, which represent tasks of varying complexity and scenarios for natural language-to-SQL generation. The datasets are described as follows:
BIRD Dataset [9]: The BIRD dataset is an open-source, cross-domain benchmark primarily designed to study the impact of large-scale database contents on text-to-SQL parsing. The dataset is released under the CC BY-SA 4.0 license, which permits sharing and adaptation with appropriate attribution. (https://bird-bench.github.io/ (accessed on 15 November 2025)).
Spider Dataset [8]: The Spider dataset is a large-scale, complex cross-domain semantic parsing and text-to-SQL dataset annotated by eleven students from Yale University. The dataset is released under the CC BY-SA 4.0 license. (https://yale-lily.github.io/spider (accessed on 15 November 2025)).
In addition to the primary Text-to-SQL benchmarks (BIRD and Spider), we also evaluate the retrieval component of our framework on three knowledge graph question answering datasets to assess the effectiveness of the Think-on-Graph retrieval mechanism:
WebQSP-EX: WebQSP [10] is a benchmark dataset for question answering over Freebase, containing 4737 natural language questions with corresponding SPARQL queries. The “-EX” variant extends the original dataset with additional evaluation metrics for multi-hop reasoning assessment.
QALD-10-EN-EX: QALD-10 [11] is part of a series of evaluation campaigns focusing on question answering over knowledge graphs. The English version contains questions of varying complexity, including simple factoid questions and complex multi-constraint queries. The dataset is designed to evaluate systems on DBpedia and Wikidata.
AdvHotpotQA-EX: Based on the original HotpotQA [12] multi-hop reasoning dataset, AdvHotpotQA-EX introduces adversarial examples that require more sophisticated reasoning chains. This dataset specifically tests the model’s ability to perform complex, multi-step reasoning while avoiding distractors.
These datasets complement the Text-to-SQL evaluation by providing insights into the retrieval module’s performance across different knowledge graph reasoning scenarios.
We tested both the baseline large model and its enhanced version using knowledge graph integration. The evaluation metrics used included:
Execution Accuracy (EX) [13]: This metric measures the percentage of generated SQL queries that produce the correct result when executed on the database. Here $\mathbb{1}(V_i = \hat{V}_i)$ is an indicator function that equals 1 if the execution result $V_i$ of the reference query matches the execution result $\hat{V}_i$ of the predicted query, and 0 otherwise, and $N$ is the total number of test samples:

$$\mathrm{EX} = \frac{\sum_{i=1}^{N} \mathbb{1}\left(V_i = \hat{V}_i\right)}{N} \times 100\%$$

A predicted SQL query is counted as correct only when executing it yields exactly the same result set as executing the reference SQL query.
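A minimal sketch of computing EX, where `run_query` stands in for executing a SQL string against the database and returning its result set:

```python
# Execution accuracy (EX): the fraction of predicted queries whose
# executed result set matches that of the reference query.
# run_query is a placeholder for actual database execution.

def execution_accuracy(pred_sqls, gold_sqls, run_query):
    matches = sum(
        1 for p, g in zip(pred_sqls, gold_sqls)
        if run_query(p) == run_query(g)   # indicator 1(V_i == V_hat_i)
    )
    return 100.0 * matches / len(gold_sqls)
```

Note that two syntactically different queries count as a match as long as their execution results agree, which is what distinguishes EX from exact-string matching.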

3.2. Experiment

The experimental evaluation is organized into two complementary components to comprehensively assess the proposed framework:
Comparative Experiment evaluates the overall performance of the proposed KG + ToG method against existing state-of-the-art Text-to-SQL approaches, demonstrating the effectiveness of integrating knowledge graphs with large language models.
Ablation Experiment systematically analyzes the contribution of individual components within the framework, including the graph retrieval module and the verifier component, to understand their respective impacts on performance.
This two-part structure enables both holistic performance comparison and fine-grained component analysis.

3.2.1. Comparative Experiment

To evaluate the effectiveness of different reasoning enhancement methods in the Text-to-SQL task, this study conducted comparative experiments under a unified model architecture (Qwen2.5-32B) and a consistent evaluation benchmark. The following mainstream approaches were examined: the standalone large model, Agentar-Scale-SQL [14], GenaSQL [15], DPG-SQL [16], DIN-SQL [17], DAIL-SQL [18], TA-SQL [19], SuperSQL [20], and the proposed SQL-KG-Verifier integrated with the Think-on-Graph mechanism. The experimental results are summarized in Table 8, and the detailed analysis is presented as follows:
As presented in Table 8, we conducted a systematic comparison of various Text-to-SQL strategies. To ensure a fair comparison and isolate the algorithmic contribution from the sheer power of the foundation model, we strictly unified the base model across all competing methods to Qwen2.5-32B. This setting allows us to rigorously evaluate performance under constraints of model scale and computational resources, prioritizing reproducibility and practical deployment over brute-force scaling.
Addressing the Performance Gap with SOTA: It is acknowledged that the BIRD leaderboard features success rates exceeding 70% (e.g., Agentar-Scale-SQL at 74.6% and GenaSQL at 70.53%). However, these results typically rely on proprietary, massive-scale models or extensive proprietary pipelines that differ fundamentally in resource consumption and accessibility. In contrast, our experimental setup focuses on the 32B parameter class. Within this resource-constrained but reproducible regime, our proposed KG + ToG method achieves an execution accuracy of 52.71% on BIRD and 77.26% on Spider. This represents the state-of-the-art performance specifically for open-source models of this scale, effectively bridging the gap between medium-scale models and larger proprietary systems.
Detailed Performance Breakdown: Experimental results demonstrate the significant advantage of our approach:
Superiority over Baselines: KG + ToG outperforms the standard Qwen2.5-32B base model (34.64%) by a substantial margin of 18.07 percentage points. Furthermore, it surpasses other advanced prompt engineering methods tailored for this scale, such as DAIL-SQL (49.59%), TA-SQL (48.59%), and SuperSQL (45.08%).
Robustness in Complexity: While baseline methods achieve respectable scores on the simpler Spider dataset (ranging from 70.42% to 75.68%), their performance degrades sharply on the more challenging BIRD benchmark. Our method maintains a consistent lead, proving that the integration of the SQL-KG-Verifier and Think-on-Graph reasoning mechanisms effectively handles the complex schema linking and ambiguous intent resolution required in real-world scenarios.
In summary, while larger proprietary models may achieve higher absolute scores, KG + ToG establishes a new benchmark for efficiency and accuracy within the 32B open-source landscape, validating the necessity of knowledge-driven reasoning in resource-limited environments.

3.2.2. Ablation Experiment

For the verifier component, we decompose it into four configurations for empirical analysis: (1) LLM-only, (2) LLM combined with hard knowledge graph matching (LLM + KG), (3) ToG combined with hard knowledge graph matching but without LLM-based verification (termed KG + ToG-LLM-only), and (4) the full KG + ToG pipeline integrating ToG, vector retrieval, and hard knowledge graph matching. Detailed results are presented in Table 9 and Table 10.
As shown in Table 9, ToG consistently outperforms pure LLM-based retrieval across all model scales and datasets. Notably, on the multi-hop reasoning-intensive AdvHotpotQA benchmark, the accuracy of Qwen2.5-72B improves from 25.17% to 32.87% (+7.7 percentage points), demonstrating ToG’s effectiveness in guiding complex, multi-step reasoning trajectories. Further incorporating hard knowledge graph matching (KG + ToG) yields continued performance gains: on WebQSP, the accuracy of Qwen2.5-72B rises from 75.23% to 81.32%; on AdvHotpotQA, it reaches 34.92%, significantly surpassing both standalone ToG and LLM-only approaches.
Importantly, this performance trend remains consistent across different model sizes (7B, 32B, and 72B), indicating that the knowledge graph not only provides structured semantic anchors for ToG but also effectively mitigates semantic drift during the reasoning process. The synergy between ToG and KG thus substantially enhances the model’s capacity for semantic comprehension and precise knowledge grounding in complex query scenarios.
Table 10 demonstrates that incorporating knowledge graph (KG) information—whether through LLM + KG or KG + ToG variants—consistently and significantly improves the accuracy of SQL generation. Notably, the full KG + ToG verifier achieves the best performance across all experimental settings. Taking Qwen2.5-72B as an example, on the Spider dataset, its execution accuracy (EX) increases from 66.79% (LLM-only) to 80.12% (+13.33 percentage points); on BIRD, it rises from 43.26% to 58.63%. More critically, even when compared to the KG + ToG-LLM-only configuration—which leverages ToG-guided reasoning paths but excludes LLM-based semantic understanding—the complete KG + ToG verifier retains a clear advantage (e.g., 76.48% → 80.12% on Spider), underscoring the essential role of deep integration between LLMs and KGs in ensuring both logical consistency and syntactic correctness during verification.
Furthermore, the verification mechanism yields particularly pronounced gains on medium-sized models. For instance, with Qwen2.5-32B on Spider, EX jumps from 57.33% to 77.26%, indicating that the proposed verifier not only pushes the performance ceiling but also effectively compensates for the limited semantic parsing capabilities of smaller-scale models in complex scenarios.
The proposed KG + ToG framework achieves a significant performance breakthrough in Text-to-SQL by deeply integrating KG-guided multi-hop reasoning with a structured verification mechanism. Comparative experiments confirm that this approach outperforms existing state-of-the-art methods on both the standard benchmark Spider and the more challenging, real-world-inspired BIRD dataset—exhibiting especially pronounced gains on BIRD, which validates its superior adaptability to complex database schemas and ambiguous user intents.
Ablation studies further reveal that the performance gains stem from the synergistic interplay of three core components:
  • Think-on-Graph (ToG) explicitly models multi-step reasoning paths, enabling effective acquisition of distributed structural information and ensuring schema consistency throughout SQL generation;
  • Vector-based retrieval filtering, applied prior to hard KG matching, introduces an initial semantic-level screening step by computing semantic similarity between the query and candidate facts, thereby aligning linguistic expressions and mitigating semantic drift in subsequent generation stages;
  • SQL-KG-Verifier (KG) fuses the LLM’s semantic reasoning with the KG’s logical constraints to perform dual verification—assessing both grammatical correctness and schema-level consistency—and provides iterative feedback for refinement, substantially enhancing result accuracy and robustness.
Together, these components form an inseparable “reasoning–knowledge–verification” closed loop. The framework demonstrates consistent performance gains across model scales (7B, 32B, 72B) and exhibits strong generalization across heterogeneous tasks, including open-domain question answering and structured query generation. These findings robustly affirm the necessity and effectiveness of coupling knowledge-driven grounding with reasoning-guided inference in complex semantic parsing, offering a scalable, verifiable, and practical paradigm for next-generation semantic parsing systems in real-world applications.

4. Conclusions and Future Work

This study presents an enhanced SQL generation framework for complex database queries, focusing on the deep integration of LLMs with structured knowledge graphs. The framework constructs a domain-specific knowledge graph based on database table structures and leverages LLMs-guided retrieval to facilitate multi-hop semantic reasoning, achieving precise mapping from natural language to structured SQL. Furthermore, a validation mechanism is introduced to enforce both semantic consistency and syntactic correctness of the generated outputs, significantly improving accuracy, robustness, and interpretability.
To systematically evaluate the effectiveness of the proposed framework, we conducted multi-dimensional comparative experiments using the Qwen2.5 series models on two authoritative Text-to-SQL benchmark datasets, BIRD and Spider. The results demonstrate that, across all model scales, the KG + ToG approach consistently outperforms the baseline, RAG, and KG enhancement methods. Notably, KG + ToG exhibits superior reasoning capabilities and structural alignment advantages when handling complex queries involving cross-table joins, implicit constraints, and multi-hop logic. Ablation studies further confirm the synergistic contributions of ToG in guiding reasoning paths, KG in providing structural support, and the validation module in enhancing output reliability, validating both the necessity and complementarity of the framework’s core components [21].
Despite achieving significant improvements in SQL generation accuracy, adaptability, and complex query handling, the experiments also reveal certain limitations in generalization [22]. Specifically, performance exhibits slight fluctuations when encountering unseen database schemas or extremely sparse queries. This phenomenon primarily stems from the static and domain-dependent nature of knowledge graph construction, as well as the inherent uncertainty of LLMs in open-domain reasoning.
Future work will focus on three main directions:
  • Dynamic Knowledge Graph Construction and Adaptive Integration: Explore lightweight, incremental knowledge graph construction mechanisms that leverage real-time updates from database metadata, enhancing the model’s adaptability to dynamic environments;
  • Enhanced Controllability of the Generation Process: Introduce reinforcement learning rewards and supervised fine-tuning strategies, combined with prompt engineering optimization, to constrain model outputs to adhere to business semantics and SQL syntax, thereby reducing redundancy and logical errors [10];
  • Cross-Domain Generalization and Zero-Shot Transfer: Improve the model’s ability to handle heterogeneous database structures through graph alignment, schema abstraction, and meta-learning strategies, mitigating the current strong dependency on training-domain knowledge [13].
This study not only provides a high-performance and highly interpretable paradigm for the Text-to-SQL task but also offers a reusable technical pathway for the deep integration of LLMs with structured knowledge systems in database automation, intelligent interaction, and enterprise-level data services. With further optimization of components and architectural evolution, the proposed framework is expected to achieve higher accuracy, greater robustness, and broader applicability in real-world production environments, advancing intelligent data querying from a “decision-support tool” toward a new stage of “autonomous decision-making”.

Author Contributions

Conceptualization, B.W. and X.Z.; Methodology, B.W.; Software, B.W.; Writing—Original Draft Preparation, B.W.; Writing—Review and Editing, X.Y. and X.Z.; Funding Acquisition, X.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Guizhou Science and Technology Program, the Guanghe Fund, and the National Natural Science Foundation of China under the project “Evaluation Model of Simulation Architecture and Attack Patterns for Security Testing of IoT Key Technology RFID Systems”. The project numbers are QKH-ZC[2023]Key001, ghfund202407047557, and 61461009. The article processing charge (APC) was funded by the Guizhou Key Laboratory of Advanced Computing.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank the Guizhou Key Laboratory of Advanced Computing for providing computational resources. The authors also acknowledge ChatGPT 5.1 for assistance in language polishing and English translation during manuscript preparation. ChatGPT was used as a supporting AI tool, and all generated content was thoroughly reviewed and edited by the authors, who take full responsibility for the final published version.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Prompt templates used in this study (P1–P3).
Example | Prompt Text
P1
### System Instruction
You are an expert in SQL logic analysis. Your task is to interpret the user’s natural language query and convert it into a professional Text2SQL task description.

### Requirements
1. **Analyze Intention**: Clearly state what data the user is looking for and the business logic behind it.
2. **Deconstruct Logic**: Break down the request into SQL-related concepts (Select, Filter, Join, Group, Sort) without writing the actual code.
3. **Handle Ambiguity**: Highlight any implicit assumptions made during the analysis.

### Output Style
Use a structured bullet-point format to present the analysis. Ensure the terminology is technically accurate (e.g., use terms like “Predicate,” “Aggregation,” “Foreign Key”).
P2
**Role:** You are a Senior Database Engineer and an expert in SQL generation. Your specific expertise lies in **[Insert Dialect, e.g., MySQL 8.0/PostgreSQL]**.

**Task:** Generate a precise and executable SQL query based on the provided “Table Schema” to answer the “User Request”.

**Constraints & Rules:**
1.  **Output Only:** Output **strictly** the SQL code block. Do not add any explanations, notes, or markdown text outside the code block.
2.  **Syntax:** Ensure the SQL is syntactically correct for the specified dialect.
3.  **Aliases:** Use explicit table aliases (e.g., `t1`, `t2` or `users`, `orders`) for clarity, especially when performing JOINs.
4.  **No Hallucination:** Only use columns and tables defined in the schema. If the request cannot be answered with the given schema, output a SQL comment: `-- Error: Insufficient schema context`.

**Data Context:**

**Table Schema:**
{SQL related table structure}

**User Request:**
{Input Data/Natural Language Question}

**Output format:**
```sql
-- Your SQL Query here
```
P3
# Role
You are an advanced SQL Parsing and Metadata Extraction Engine. You specialize in static code analysis to map data lineage.

# Objective
Analyze the provided SQL statement to extract two key types of information:
1.  **Field Inventory:** A comprehensive list of all fields used per table.
2.  **Relationship Mapping:** A structural representation of how tables are connected (Joins/Links).

# Analysis Guidelines

### 1. Entity Extraction (Fields)
* **Scope:** Extract columns from ALL clauses including `SELECT`, `FROM`, `WHERE`, `GROUP BY`, `ORDER BY`, `HAVING`, and `JOIN ON`.
* **Table Resolution:** Identify the full table name. If a schema is provided (e.g., `sales.orders`), preserve it.
* **Alias Handling (CRITICAL):**
        * You must **STRIP** all table aliases from column names.
        * *Example:* If the SQL is `SELECT t1.user_id`, extract `user_id`.
        * *Example:* If the SQL is `WHERE o.amount > 100`, extract `amount`.
* **Calculations:** For derived fields (e.g., `COUNT(distinct user_id)`), extract only the underlying column name (`user_id`).

### 2. Relationship Extraction (Topology)
* Identify connections defined in `JOIN … ON` clauses and implicit joins in `WHERE` clauses.
* Capture the **Source** (Left) and **Target** (Right) components of the relationship.
* Determine the join type (INNER, LEFT, RIGHT, FULL, CROSS).

# Output Format
Return the result strictly as a **JSON object**. Do not include markdown formatting (like ```json … ```) or explanatory text. Use the following schema:

{
    "entities": [
        {
            "table_name": "Name of the table",
            "fields": [
                "List",
                "of",
                "distinct",
                "columns",
                "used"
            ]
        }
    ],
    "relationships": [
        {
            "join_type": "INNER/LEFT/RIGHT/etc",
            "source_table": "Table name",
            "source_field": "Column used in join",
            "target_table": "Table name",
            "target_field": "Column used in join"
        }
    ]
}

# Input SQL
"""
[INSERT YOUR SQL HERE]
"""

References

  1. Mansuri, I.R.; Sarawagi, S. Integrating unstructured data into relational databases. In Proceedings of the 22nd International Conference on Data Engineering (ICDE’06), Atlanta, GA, USA, 3–7 April 2006; IEEE: New York, NY, USA, 2006; p. 29. [Google Scholar]
  2. Chang, Y.; Wang, X.; Wang, J.; Wu, Y.; Yang, L.; Zhu, K.; Chen, H.; Yi, X.; Wang, C.; Wang, Y.; et al. A survey on evaluation of large language models. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–45. [Google Scholar] [CrossRef]
  3. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 6000–6010. [Google Scholar]
  4. Hogan, A.; Blomqvist, E.; Cochez, M.; d’Amato, C.; de Melo, G.; Gutierrez, C.; Labra Gayo, J.E.; Kirrane, S.; Neumaier, S.; Polleres, A.; et al. Knowledge graphs. ACM Comput. Surv. (CSUR) 2021, 54, 1–37. [Google Scholar] [CrossRef]
  5. Seranmadevi, R.; Addula, S.R.; Kumar, D.; Tyagi, A. Security and Privacy in AI: IoT-Enabled Banking and Finance Services. In Monetary Dynamics and Socio-Economic Development in Emerging Economies; IGI Global: Hershey, PA, USA, 2026; pp. 163–194. [Google Scholar]
  6. Peng, B.; Li, C.; He, P.; Galley, M.; Gao, J. Instruction Tuning with GPT-4. arXiv 2023, arXiv:2304.03277. [Google Scholar] [CrossRef]
  7. Hong, Z.; Yuan, Z.; Chen, H.; Zhang, Q.; Huang, F.; Huang, X. Knowledge-to-SQL: Enhancing SQL Generation with Data Expert LLM. In Proceedings of the Findings of the 62nd Annual Meeting of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, 11–16 August 2024; Association for Computational Linguistics (ACL): Kerrville, TX, USA, 2024; pp. 10997–11008. [Google Scholar]
  8. Yu, T.; Zhang, R.; Yang, K.; Yasunaga, M.; Wang, D.; Li, Z.; Ma, J.; Li, I.; Yao, Q.; Roman, S.; et al. Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task. arXiv 2018, arXiv:1809.08887. [Google Scholar]
  9. Li, J.; Hui, B.; Qu, G.; Yang, J.; Li, B.; Li, B.; Wang, B.; Qin, B.; Geng, R.; Huo, N.; et al. Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls. Adv. Neural Inf. Process. Syst. 2023, 36, 42330–42357. [Google Scholar]
  10. Yih, W.T.; Richardson, M.; Meek, C.; Chang, M.W.; Suh, J. The value of semantic parse labeling for knowledge base question answering. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Berlin, Germany, 7–12 August 2016; pp. 201–206. [Google Scholar]
  11. Perevalov, A.; Diefenbach, D.; Usbeck, R.; Both, A. Qald-9-plus: A multilingual dataset for question answering over dbpedia and wikidata translated by native speakers. In Proceedings of the 2022 IEEE 16th International Conference on Semantic Computing (ICSC), Virtual, 26–28 January 2022; pp. 229–234. [Google Scholar]
  12. Yang, Z.; Qi, P.; Zhang, S.; Bengio, Y.; Cohen, W.; Salakhutdinov, R.; Manning, C.D. HotpotQA: A dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October–4 November 2018; pp. 2369–2380. [Google Scholar]
  13. Chaudhuri, S.; Narasayya, V.; Ramamurthy, R. Estimating progress of execution for SQL queries. In Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, Paris, France, 13–18 June 2004; pp. 803–814. [Google Scholar]
  14. Wang, P.; Sun, B.; Dong, X.; Dai, Y.; Yuan, H.; Chu, M.; Gao, Y.; Qi, X.; Zhang, P.; Yan, Y. Agentar-Scale-SQL: Advancing Text-to-SQL through Orchestrated Test-Time Scaling. arXiv 2025, arXiv:2509.24403. [Google Scholar]
  15. Dönder, Y.D.; Hommel, D.; Wen-Yi, A.W.; Mimno, D.; Jo, U.E.S. Cheaper, Better, Faster, Stronger: Robust Text-to-SQL without Chain-of-Thought or Fine-Tuning. arXiv 2025, arXiv:2505.14174. [Google Scholar]
  16. Zhang, T.; Chen, C.; Liao, C.; Wang, J.; Zhao, X.; Yu, H.; Wang, J.; Li, J.; Shi, W. Sqlfuse: Enhancing text-to-sql performance through comprehensive llm synergy. arXiv 2024, arXiv:2407.14568. [Google Scholar]
  17. Pourreza, M.; Rafiei, D. Din-sql: Decomposed in-context learning of text-to-sql with self-correction. Adv. Neural Inf. Process. Syst. 2023, 36, 36339–36348. [Google Scholar]
  18. Gao, D.; Wang, H.; Li, Y.; Sun, X.; Qian, Y.; Ding, B.; Zhou, J. Text-to-sql empowered by large language models: A benchmark evaluation. arXiv 2023, arXiv:2308.15363. [Google Scholar] [CrossRef]
  19. Qu, G.; Li, J.; Li, B.; Qin, B.; Huo, N.; Ma, C.; Cheng, R. Before generation, align it! A novel and effective strategy for mitigating hallucinations in text-to-sql generation. arXiv 2024, arXiv:2405.15307. [Google Scholar]
  20. Toyama, M. SuperSQL: An extended SQL for database publishing and presentation. In Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data, Seattle, WA, USA, 1–4 June 1998; pp. 584–586. [Google Scholar]
  21. Fan, W.; Ding, Y.; Ning, L.; Wang, S.; Li, H.; Yin, D.; Chua, T.-S.; Li, Q. A survey on rag meeting llms: Towards retrieval-augmented large language models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 6491–6501. [Google Scholar]
  22. Sun, J.; Xu, C.; Tang, L.; Wang, S.; Lin, C.; Gong, Y.; Ni, L.; Shum, H.-Y.; Guo, J. Think-on-Graph: Deep and Responsible Reasoning of Large Language Model on Knowledge Graph. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024. [Google Scholar]
Figure 1. Architecture flow chart (Source: Author’s contribution).
Figure 2. Validator Flowchart.
Figure 3. Knowledge Graph Design Logic Diagram.
Figure 4. Search Logic Diagram.
Table 1. Hard Matching Logic Code Table.
Relationship Type | Cypher Code
Node existence check:
MATCH (n) WHERE exists(n.name)
RETURN DISTINCT n.name AS node_name;
Single-hop relationship (primary key):
MATCH (t:table)-[r:relation]->(f:field)
WHERE r.type = 'PK' AND t.name = 'table' AND f.name = 'field' RETURN t, f
Single-hop relationship:
MATCH (t:table)-[r:relation]->(f:field)
WHERE t.name = 'table' AND f.name = 'field'
RETURN COUNT(*) > 0
Multi-hop paths across tables:
MATCH (t1:table)-[r*2..5]->(t2:table)
WITH t1, t2, [rel IN r | type(rel)] AS relTypes
WHERE ALL(rel IN ['Pri_Key', 'For_Key'] WHERE rel IN relTypes)
RETURN t1, t2
Multi-hop path of field relationships:
MATCH (f1:field1)-[r*2..6]->(f2:field2)
WHERE f1.name = 'field1' AND f2.name = 'field2'
RETURN f1, f2
Field type relationship:
MATCH (f1:field)<-[:relation]-(t1:table), (f2:field)<-[:relation]-(t2:table)
WHERE f1.name = f2.name AND f1.type = f2.type RETURN f1, f2
Table 2. Entities and Entity Attributes Table.
Entity | Entity Attributes
Table name | Primary key, foreign key
Field name | Type, Default value, Notes
Table 3. Three-Element (Triple) Table.
Serial Number | Subject | Verb | Object
1 | Table | Primary Key | Field
2 | Table | Foreign Key | Field
Table 4. Data Information Extraction Rules.
Information Category | Description
Table Name | The logical name of the table, used to identify the data entity.
Field Name | The physical name of each column in the table.
Field Type | The data type of the field, such as INT, VARCHAR.
Field Description | Comments or business meaning of the field.
Default Value | The default value of the field when no value is assigned.
Field Key Type | Whether the field is a primary key or a foreign key.
Table 5. Data Cleaning Rules Table.
Data Cleaning Rule Category | Description | Processing Method
Outlier Handling | Inconsistent or illogical entity, attribute, or relationship data. | Identify and remove outliers using a model.
Duplicate Data Handling | Multiple records with duplicate foreign keys. | Identify and remove duplicate records based on matching foreign keys.
Format Standardization | Inconsistent data formats. | Standardize the data format to a consistent entity-relationship-entity structure.
Table 6. Relationship Conversion Table.
Table Structure Relationships | Knowledge Graph Relationships
Primary Key | Primary Key Identifier
Foreign Key | Foreign Key Identifier
Column | Basic Relationship
Table 7. Structural Optimization Table.
Table Structure | Knowledge Graph
Table | Entity
Field | Entity
Foreign Key | Relationship, Attribute
Primary Key | Relationship, Attribute
Table Comment | Attribute
Field Type | Attribute
Field Default Value | Attribute
Field Comment | Attribute
Table 8. Comparative Experimental Results.
Model | Method | BIRD (%) | Spider (%)
- | Agentar-Scale-SQL | 74.6 | -
- | GenaSQL | 70.53 | -
GPT-4 | DPG-SQL | - | 85.6
GPT-4 | DIN-SQL | - | 85.3
Qwen2.5-32B | KG + ToG (ours) | 52.71 | 77.26
Qwen2.5-32B | DAIL-SQL | 49.59 | 75.68
Qwen2.5-32B | TA-SQL | 48.59 | 72.68
Qwen2.5-32B | SuperSQL | 45.08 | 70.42
Qwen2.5-32B | - | 34.64 | 54.89
Table 9. Ablation Experimental Results of the Spectral Retrieval Section.

Agent Model | WebQSP-EX (LLM / ToG / KG + ToG) | QALD-10-EN-EX (LLM / ToG / KG + ToG) | AdvHotpotQA-EX (LLM / ToG / KG + ToG)
Qwen2.5-72B | 68.83% / 75.23% / 81.32% | 49.82% / 50.07% / 51.92% | 25.17% / 32.87% / 34.92%
Qwen2.5-32B | 65.59% / 77.13% / 80.15% | 46.61% / 47.50% / 48.43% | 23.81% / 27.31% / 33.28%
Qwen2.5-7B  | 66.26% / 74.97% / 77.46% | 43.33% / 45.21% / 44.05% | 21.08% / 23.42% / 30.09%
Table 10. Ablation Test Results of the Calibrator.

Model       | BIRD-EX (%): LLMs / KG / KG + ToG (LLMs-Only) / KG + ToG | Spider-EX (%): LLMs / KG / KG + ToG (LLMs-Only) / KG + ToG
Qwen2.5-72B | 43.26 / 45.19 / 48.47 / 53.28 | 66.79 / 77.56 / 76.48 / 80.12
Qwen2.5-32B | 31.44 / 37.90 / 33.17 / 42.90 | 57.33 / 73.50 / 60.03 / 77.26
Qwen2.5-7B  | 21.44 / 30.18 / 29.68 / 32.18 | 47.33 / 66.69 / 65.84 / 69.81
Wang, B.; Yu, X.; Zheng, X. SQL Statement Generation Enhanced Through the Fusion of Large Language Models and Knowledge Graphs. Electronics 2026, 15, 278. https://doi.org/10.3390/electronics15020278
