Peer-Review Record

FGCSQL: A Three-Stage Pipeline for Large Language Model-Driven Chinese Text-to-SQL

by Guanyu Jiang 1, Weibin Li 2,*, Chenglong Yu 2, Zixuan Zhu 1 and Wei Li 1
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Electronics 2025, 14(6), 1214; https://doi.org/10.3390/electronics14061214
Submission received: 10 February 2025 / Revised: 9 March 2025 / Accepted: 18 March 2025 / Published: 19 March 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The paper proposes a three-stage pipeline for the text-to-SQL task: a Redundant Database Schema Items Filtering Encoder (RDsiF-Encoder for short), a generative pre-trained LLM as the SQL parser, and an SQL query correcting decoder. This is an interesting study; however, the paper requires improvement. Please see the suggestions below.
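For orientation, a minimal structural sketch of such a filter-generate-correct pipeline is given below. All three stage functions are hypothetical stand-ins written for illustration, not the authors' actual RDsiF-Encoder, LLM parser, or correcting decoder.

```python
# Minimal sketch of a filter-generate-correct text-to-SQL pipeline.
# The three stage functions are illustrative stand-ins, not the
# authors' components.

def filter_schema(question: str, schema: dict) -> dict:
    """Stage 1 stand-in: drop schema items judged irrelevant."""
    return {
        table: cols
        for table, cols in schema.items()
        if table in question or any(col in question for col in cols)
    }

def generate_sql(question: str, schema: dict) -> str:
    """Stage 2 stand-in: the fine-tuned LLM call would go here."""
    table = next(iter(schema), "some_table")
    return f"SELECT * FROM {table}"

def correct_sql(sql: str, schema: dict) -> str:
    """Stage 3 stand-in: repair residual syntactic/logical errors."""
    return sql.rstrip(";") + ";"

def text_to_sql(question: str, schema: dict) -> str:
    pruned = filter_schema(question, schema)
    candidate = generate_sql(question, pruned)
    return correct_sql(candidate, pruned)

print(text_to_sql("How many singers are there?", {"singer": ["id", "name"]}))
```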

 

Introduction:

Kindly rectify the gap between lines 80–82.

In the introduction section, kindly make the aim of the study clearer. This is essential for readers who are new to the field.

In lines 93 – 99, kindly use bullet points to showcase the three steps.

How does your study differ from and improve on existing studies (like the ones listed below)? This needs to be clearly stated.

  • Wang, L., Zhang, A., Wu, K., Sun, K., Li, Z., Wu, H., ... & Wang, H. (2020, November). DuSQL: A large-scale and pragmatic Chinese text-to-SQL dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 6923-6935).
  • Zhang, B., Ye, Y., Du, G., Hu, X., Li, Z., Yang, S., ... & Mao, H. (2024). Benchmarking the text-to-SQL capability of large language models: A comprehensive evaluation. arXiv preprint arXiv:2403.02951.

 

Related Work

The literature review provides the names of algorithms developed by existing studies (as an example). This is acceptable; however, the literature needs to be reviewed critically, either to demonstrate strengths and weaknesses that your study can adapt, or to identify a gap that your study intends to fill. This section requires major improvement.

Methodology

Kindly rename Section 3 as Methodology or Methods. You can then let readers know you are introducing FGCSQL.

In lines 172 and 175, "query" should be in lowercase, please.

Kindly demonstrate your approach using another benchmark dataset and report your results.

Mathematical equations need to be indexed.

In line 405, kindly define the SOTA abbreviation at its first use and use the abbreviation consistently.

 

Conclusion

Authors need to create a subsection to discuss the theoretical and practical implications of the results. 

What about the limitations of your study? The limitations of the approach need to be clearly discussed. For example, what is the limitation of defining the filtering task as a classification process?

 

 

Author Response

Comments 1: Kindly rectify the gap between 80 - 82

Response 1: Thank you for pointing this out. We agree with this comment. Accordingly, we have corrected the gap between lines 80–82; the revision can be found on page 2, line 91 of the revised manuscript.

Comments 2: In the introduction section, kindly make the aim of the study clearer. This is essential for readers who are new to the field.

Response 2: Thank you for pointing this out. We agree with this comment. Accordingly, we have updated the research objective in the introduction to make it clearer and more accessible, which can be found on page 1, lines 36–46 of the revised manuscript.

Comments 3: In lines 93 – 99, kindly use bullet points to showcase the three steps.

Response 3: Thank you for pointing this out. We agree with this comment. Accordingly, we have presented the three steps using bullet points, which can be found on page 3, lines 100–109 of the revised manuscript.

Comments 4: How does your study differ from and improve on existing studies (like the ones listed below)? This needs to be clearly stated.

  • Wang, L., Zhang, A., Wu, K., Sun, K., Li, Z., Wu, H., ... & Wang, H. (2020, November). DuSQL: A large-scale and pragmatic Chinese text-to-SQL dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP) (pp. 6923-6935).
  • Zhang, B., Ye, Y., Du, G., Hu, X., Li, Z., Yang, S., ... & Mao, H. (2024). Benchmarking the text-to-SQL capability of large language models: A comprehensive evaluation. arXiv preprint arXiv:2403.02951.

Response 4: Thank you for pointing this out. We agree with this comment. Accordingly, we have revised the contribution section in the introduction to clarify the differences and improvements of our study compared to existing work. Of the two papers provided by the reviewer, the first (DuSQL...) is a "dataset paper" focused on dataset creation and quality validation, while the second (Benchmarking...) is a "benchmarking paper" focused on establishing evaluation standards. Both differ significantly in focus from our article, and thus we did not cite them. This update can be found on page 3, lines 110–128 of the revised manuscript.

Comments 5: Literature reviewed provides the names of algorithms (as an example) developed by existing studies. This is okay, however, there is need to have an element of critically reviewing the literature to either demonstrate the strength or weakness which can be adapted by your study or recognised as a gap in which your study intends to fill. This section requires major improvement.

Response 5: Thank you for pointing this out. We agree with this comment. Accordingly, we have revised the related work section to include critical analysis, clarifying the strengths and weaknesses of prior studies and their implications for our research. This update can be found on page 4, lines 134–185 of the revised manuscript.

Comments 6: Kindly rename section 3 as methodology or methods. You can then let readers know you are introducing FGCSQL.

Response 6: Thank you for pointing this out. We agree with this comment. Accordingly, we have renamed Section 3 to "Methodology". This modification can be found on page 5, line 186 of the revised manuscript.

Comments 7: In lines 172, 175, queries should be in lowercase please.

Response 7: Thank you for pointing this out. We have changed "Query" to lowercase in lines 172 and 175. This modification can be found on page 5, lines 200 and 203 of the revised manuscript.

Comments 8: Kindly demonstrate your approach using another benchmark dataset and report your results.

Response 8: Thank you for pointing this out. We agree with this comment. Accordingly, we have conducted additional experiments on the BIRD benchmark development set. These supplementary experiments can be found on page 13, lines 450-467 of the revised manuscript.

Comments 9: Mathematical equations need to be indexed.

Response 9: Thank you for pointing this out. We agree with this comment. Accordingly, we have indexed the two previously unnumbered equations in Section 3.2. These indices can be found as Equations (5) and (6) on page 8 of the revised manuscript.

Comments 10: In line 405, kindly declare SOTA abbreviation early and use the abbreviation consistently.

Response 10: Thank you for pointing this out. We agree with this comment. Accordingly, we have declared the SOTA abbreviation earlier and maintained consistent usage throughout the paper. This declaration can be found on page 3, line 127 of the revised manuscript.

Comments 11: Authors need to create a subsection to discuss the theoretical and practical implications of the results.

Response 11: Thank you for pointing this out. We agree with this comment. Accordingly, we have added Section 6.1 to discuss the theoretical and practical implications of our research findings. This addition can be found on page 20, lines 644-657 of the revised manuscript.

Comments 12: What about the limitations of your study? Limitation of approach needs to be clearly discussed. For example, what is the limitation of defining the filtering task as a classification process?

Response 12: Thank you for pointing this out. We agree with this comment. Accordingly, we have added Section 6.3 to explicitly discuss the limitations of our study. This addition can be found on page 21, lines 686-699 of the revised manuscript.

Reviewer 2 Report

Comments and Suggestions for Authors

See my comments in the document.

Comments for author File: Comments.pdf

Author Response

Comments 1: In Section 3, I see the authors claimed a three-stage pipeline improves Chinese Text-to-SQL tasks, but where is the theoretical justification for why this specific division (filtering, generating, correcting) is optimal? I would assume a comparison with alternative architectures, such as direct fine-tuning of LLMs without filtering, might be a good one.

Response 1: Thank you for pointing this out. We agree with this comment. Therefore, we have cited previous research at the beginning of Section 3 and explained, together with our own reasoning, the rationale for choosing the three-stage "filter-generate-correct" design. This modification can be found on page 5, lines 187-197 of the revised manuscript.

Regarding the suggested "comparison with alternative architectures, such as direct fine-tuning of LLMs without filtering": the ablation experiments in our initial draft already presented specific experimental results and analyses demonstrating the effectiveness of each component of our model. Relevant validations of the redundant item filtering encoder and the SQL error correction decoder can be found in Section 5.3.2 on page 13 and Section 5.3.5 on page 17 of the revised manuscript.

Comments 2: Section 3 mentions that the multi-stage pipeline mitigates error propagation, but it does not provide a detailed error breakdown. I didn’t follow this part. Is there an ablation study analyzing how errors propagate across the three stages?

Response 2: Thank you for pointing this out. We agree with this comment. Therefore, we classified and statistically analyzed errors from sampled data, providing a detailed description of potential SQL parsing errors. We also analyzed correction results of the error correction decoder on sampled data to clarify specific error correction scenarios not fully addressed in ablation experiments. This content is presented as Section 5.5 and can be found on page 18, lines 604-642 of the revised manuscript.

Comments 3: I see the paper claims that the filtering encoder improves schema linking, but I don’t see a qualitative analysis of how well it selects relevant schema items.

Response 3: Thank you for pointing this out. We agree with this comment. Therefore, we selected examples with varying difficulty levels and expression styles, plotted output-probability heatmaps of the filtering encoder, and provided corresponding qualitative analyses to visually demonstrate its improvement of schema linking. This update can be found on page 14, lines 499-521 of the revised manuscript.
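A heatmap of this kind can be produced along the following lines; the schema items and probabilities below are invented for illustration and are not values from the paper.

```python
# Illustrative plot of filtering-encoder output probabilities as a
# heatmap; all schema items and probabilities are invented numbers,
# not results from the paper.
import matplotlib.pyplot as plt
import numpy as np

columns = ["singer.name", "singer.age", "concert.year", "concert.venue"]
questions = ["Q1 (easy)", "Q2 (hard)"]
probs = np.array([
    [0.95, 0.10, 0.88, 0.07],  # P(relevant) per schema item, Q1
    [0.40, 0.75, 0.62, 0.15],  # P(relevant) per schema item, Q2
])

fig, ax = plt.subplots(figsize=(6, 2.5))
im = ax.imshow(probs, cmap="viridis", vmin=0.0, vmax=1.0)
ax.set_xticks(range(len(columns)), labels=columns, rotation=45, ha="right")
ax.set_yticks(range(len(questions)), labels=questions)
fig.colorbar(im, ax=ax, label="relevance probability")
fig.tight_layout()
plt.show()
```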

Comments 4: Section 3.2 states the IECQN method restructures questions, but where is the analysis provided on whether this restructuring biases model outputs or limits its generalization to unseen formats?

Response 4: Thank you for pointing this out. We agree with this comment. Therefore, we constructed two variants of IECQN (one altering structural elements at key positions, another weakening the command intensity of IECQN) to experimentally analyze output bias effects and model generalization capabilities. This update can be found on page 17, lines 570-603 of the revised manuscript.

Comments 5: The filtering encoder removes irrelevant schema items, but no ablation study investigates cases where important schema items are mistakenly filtered. Additionally, while the method uses column data types for filtering, how does it handle ambiguous column names (e.g., "age" appearing in multiple tables)?

Response 5: We appreciate the reviewer's valuable feedback. The issue of important schema items being mistakenly filtered is indeed crucial. In our method, the redundant schema filtering module achieves a high AUC on the validation sets through its cross-encoder and Chinese semantic injection mechanisms (see the ablation studies in Section 5.3.1), significantly reducing the risk of mis-filtering. In the rare cases where a critical item is mis-filtered, the subsequent LLM parsing cannot generate SQL containing the missing item; this constitutes a limitation discussed in Section 6.3.

For ambiguous column names shared across tables, the filtering encoder processes their embeddings, which are enhanced with data-type information and Chinese semantic features. The updated probability heatmap in Figure (a) of Section 5.3.2 visually demonstrates the output probabilities for such columns; a minimal cross-encoder scoring sketch is shown below. Specific analyses can be found on page 15, lines 511-515 of the revised manuscript.
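The following is a minimal sketch of cross-encoder-style relevance scoring. The checkpoint name and the input serialization (table, data type, Chinese comment) are assumptions for illustration, not the authors' RDsiF-Encoder, and the classification head below is untrained.

```python
# Sketch of cross-encoder relevance scoring for one schema column;
# the checkpoint and input format are illustrative assumptions, and
# the freshly initialized classification head outputs random scores
# until fine-tuned.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2
)

question = "哪些歌手的年龄超过30岁？"  # "Which singers are older than 30?"
# Serialize the column with its table, data type, and Chinese comment,
# so that type information and Chinese semantics reach the encoder.
column = "singer.age | number | 歌手年龄"

inputs = tok(question, column, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
p_relevant = torch.softmax(logits, dim=-1)[0, 1].item()
print(f"P(relevant) = {p_relevant:.3f}")
```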

Comments 6: The generative LLM produces SQL, but how does it handle multi-table joins?

Response 6: Thank you for your question. In our framework, the LLM-based SQL parser effectively handles multi-table joins through two key mechanisms:

1. Training data-driven pattern learning: during LoRA fine-tuning, the model is exposed to diverse question-SQL pairs involving multi-table scenarios, including explicit JOIN clauses and implicit foreign key relationships. This enables it to learn the semantic mapping between Chinese question entities, the filtered schema information (tables/columns), and the corresponding join conditions.

2. Schema-aware context construction: after redundancy filtering, the model receives concise yet complete table-column embeddings with injected Chinese semantics. This helps it dynamically identify the required tables and infer join paths based on overlapping column names (e.g., table_a.id = table_b.a_id) or user-specified constraints, even when the join logic is not explicitly mentioned in the question; a toy sketch of such naming-based join inference follows.
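As a toy illustration of the naming convention mentioned above, the hypothetical helper below infers join conditions from foreign-key-style column names. It is not the authors' mechanism, which relies on learned schema embeddings rather than string matching.

```python
# Toy join-path inference from foreign-key-style column naming
# (e.g., singer.id = concert.singer_id). Purely illustrative; the
# paper's parser infers joins from learned schema representations.
def infer_joins(tables: dict) -> list:
    joins = []
    for t1, cols1 in tables.items():
        for t2, cols2 in tables.items():
            # Match a primary key "id" in t1 against "<t1>_id" in t2.
            if t1 != t2 and "id" in cols1 and f"{t1}_id" in cols2:
                joins.append(f"{t1}.id = {t2}.{t1}_id")
    return joins

schema = {
    "singer": ["id", "name", "age"],
    "concert": ["id", "singer_id", "year"],
}
print(infer_joins(schema))  # ['singer.id = concert.singer_id']
```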

Comments 7: Minor: Why was the beam search size set to 16?

Response 7: We conducted a trade-off analysis between hardware limitations and algorithmic efficiency. Experiments show that although a beam size above 16 can theoretically improve generation quality, in practice it is constrained by GPU memory capacity and yields sharply diminishing marginal benefits. We therefore set the beam size B to 16. This update can be found on page 12, lines 427-434 of the revised manuscript.
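For reference, beam-search decoding with B = 16 can be expressed with the Hugging Face generate API as below; the checkpoint and prompt are placeholders, not the authors' fine-tuned parser.

```python
# Beam-search decoding with beam size B = 16 via Hugging Face
# transformers; "gpt2" is a placeholder checkpoint, not the paper's
# fine-tuned SQL parser. Memory use grows with num_beams.
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "-- Translate the question into SQL:\n"
inputs = tok(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    num_beams=16,            # beam size B
    num_return_sequences=4,  # keep several candidates for re-ranking
    max_new_tokens=64,
    early_stopping=True,
    pad_token_id=tok.eos_token_id,  # gpt2 defines no pad token
)
for seq in outputs:
    print(tok.decode(seq, skip_special_tokens=True))
```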

Comments 8: Section 5 mainly compares FGCSQL with multi-stage models but omits direct comparison with end-to-end approaches (such as Transformer-based direct SQL generation). Please explain in the response letter why this direct comparison was excluded.

Response 8: Thank you for pointing this out. The core innovation of FGCSQL lies in addressing redundant information interference in complex database schemas and post-parsing SQL error correction for large models. Existing end-to-end methods (e.g., single Transformer models) typically generate SQL directly without explicitly handling redundant table/column information filtering or post-parsing logical error correction. Therefore, the solutions we compared against also adopt multi-stage optimization methods. Additionally, we conducted comparisons with ChatGPT's pure prompting approach. Through ablation experiments (Section 5.3.2), we demonstrate that directly inputting full schema information significantly degrades large model performance, indirectly validating potential shortcomings of end-to-end methods.

Comments 9: The use of NatSQL as an intermediate representation is mentioned, but why is NatSQL superior to other SQL simplification methods?

Response 9: Thank you for pointing this out. We agree with this observation. Therefore, we have explained our rationale in the methodology section. This update can be found on page 8, lines 289-301 of the revised manuscript. The updated text reads: "NatSQL simplifies queries while preserving core SQL functionalities, significantly reducing the generation complexity for LLMs. Its natural language-like expressions align better with Chinese semantic logic, alleviating the model's pressure to handle complex SQL syntax (such as deep nesting and HAVING clauses). Additionally, NatSQL and standard SQL support bidirectional lossless conversion, retaining complete semantic information for seamless plug-and-play usage. Therefore, we use the NatSQL corresponding to the original gold SQL query as the target response."
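To make the intermediate representation concrete, here is a simplified before/after pair in the spirit of NatSQL; the exact surface syntax is approximated from the NatSQL literature rather than taken from the paper.

```python
# Illustrative SQL -> NatSQL pair, approximating the NatSQL style:
# the FROM/JOIN scaffolding disappears, and the join path is
# recovered from schema foreign keys during the lossless conversion
# back to standard SQL.
sql = (
    "SELECT T1.name FROM singer AS T1 "
    "JOIN concert AS T2 ON T1.id = T2.singer_id "
    "WHERE T2.year = 2020"
)
natsql = "SELECT singer.name WHERE concert.year = 2020"
```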

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Well done on the improvements made. 

Reviewer 2 Report

Comments and Suggestions for Authors

I thank the authors for their revisions and response letter. I don't have any further comments.
