Article
Peer-Review Record

LR-SQL: A Supervised Fine-Tuning Method for Text2SQL Tasks Under Low-Resource Scenarios

Electronics 2025, 14(17), 3489; https://doi.org/10.3390/electronics14173489
by Wuzhenghong Wen 1, Yongpan Zhang 2, Su Pan 1,*, Yuwei Sun 1, Pengwei Lu 2 and Cheng Ding 2
Reviewer 1: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 4 July 2025 / Revised: 21 August 2025 / Accepted: 26 August 2025 / Published: 31 August 2025
(This article belongs to the Special Issue Advances in Data Security: Challenges, Technologies, and Applications)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper presents a novel framework, LR-SQL, for supervised fine-tuning on Text2SQL tasks. It introduces a schema linking model built on a slice-based related-table filtering task, together with a SQL generation model, and integrates Chain-of-Thought (CoT) reasoning to maintain inference quality. Experimental results show promising gains in memory efficiency with minimal loss in execution accuracy. Overall, this is a valuable contribution that could benefit the research community.

Comments for improvement:

1. The phrase “extensive experiments” appears multiple times in the manuscript (e.g., abstract, Line 72, Line 82), but its justification is unclear. Could the authors clarify how they justify “extensive”?

2. I suggest explicitly highlighting the key contributions at the end of the introduction, preferably in bullet or numbered format, rather than only summarizing methods. This would make the value of the work clearer.

3. Line 82 - Line 83: Experimental results are not methods. If the authors intend to only summarize methods, the experimental results should be moved to the evaluation section or omitted here.

4. Line 123 - Line 124: The statement “we cannot decompose the task” is unclear and possibly contradictory. Since LR-SQL itself is a decomposition-based framework, it would be helpful for the authors to clarify what they mean by this claim, and provide evidence or examples if possible.

5. Line 160 - Line 161: What if a table is too large to fit in a slice (e.g., exceeding the token limit)? In that case, how can the authors ensure that tables with foreign key relationships remain in the same slice?

6. In the abstract, the authors state that their method results in a “negligible 0.6% decrease in overall Text2SQL Execution Accuracy.” However, I could not find this specific result in the main body of the paper. Please clarify.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

1. Clarify Slice Construction Details
   - Issue: While the slice-based approach is introduced in the methodology section, the details of how slices are constructed (e.g., foreign key relationships, token balancing) are deferred to Appendix A. This can make the main text less self-contained.
   - Suggestion: Provide a more concise summary of the slice construction process in the main text, perhaps with a high-level overview or pseudocode. For example:
     - Mention that slices are formed based on foreign key relationships to preserve relational integrity.
     - Highlight the role of `slice_token` in controlling slice size and how it balances token capacity across slices.
     - Include a brief explanation of Algorithm A1 in the main text, summarizing its key steps (e.g., grouping tables with foreign key associations, ensuring token limits per slice).
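
For illustration only (not the authors' Algorithm A1), the kind of pseudocode summary suggested above could look like the following sketch: tables linked by foreign keys are grouped with union-find, and whole groups are packed greedily under a `slice_token` budget. The inputs `table_tokens` (table name to token count) and `foreign_key_pairs` are hypothetical names, not taken from the paper.

```python
from collections import defaultdict

def build_slices(table_tokens, foreign_key_pairs, slice_token):
    """Group foreign-key-connected tables, then pack the groups into
    token-bounded slices. Hypothetical sketch, not the paper's Algorithm A1."""
    # Union-find so that tables joined by a foreign key end up in one group.
    parent = {t: t for t in table_tokens}

    def find(t):
        while parent[t] != t:
            parent[t] = parent[parent[t]]  # path halving
            t = parent[t]
        return t

    for a, b in foreign_key_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for t in table_tokens:
        groups[find(t)].append(t)

    # Greedily pack whole groups into slices without exceeding slice_token;
    # a single group larger than the budget still becomes its own slice.
    slices, current, used = [], [], 0
    for group in groups.values():
        size = sum(table_tokens[t] for t in group)
        if current and used + size > slice_token:
            slices.append(current)
            current, used = [], 0
        current.extend(group)
        used += size
    if current:
        slices.append(current)
    return slices
```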

 2. Expand Discussion on Inference Latency
   - Issue: The paper acknowledges that slicing increases inference time due to multi-slice processing, but this is only briefly mentioned in the conclusion. The trade-off between GPU memory reduction and inference latency is not fully explored.
   - Suggestion: Dedicate a subsection in the "Discussion" or "Limitations" section to analyze the inference latency issue in more detail:
     - Quantify the increase in inference time relative to baseline methods (e.g., DTS-SQL).
     - Discuss potential optimizations, such as parallel inference or caching intermediate results, to mitigate the latency overhead.
     - Compare the overall efficiency (memory vs. time) of LR-SQL against other methods, possibly using metrics like throughput or wall-clock time.
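
As one way to obtain the requested numbers, a minimal timing sketch is shown below, assuming schema linking is run once per slice with a Hugging Face-style `generate` call; `model`, `tokenizer`, and `slice_prompts` are placeholders, not artifacts of the paper.

```python
import time

def measure_latency(model, tokenizer, slice_prompts, max_new_tokens=256):
    """Wall-clock time and token throughput for multi-slice schema linking.
    Placeholder sketch: model/tokenizer are assumed Hugging Face objects."""
    start = time.perf_counter()
    generated = 0
    for prompt in slice_prompts:
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
        generated += output.shape[-1] - inputs["input_ids"].shape[-1]
    elapsed = time.perf_counter() - start
    return {
        "num_slices": len(slice_prompts),
        "wall_clock_s": elapsed,
        "throughput_tok_per_s": generated / elapsed,
    }
```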

 3. Enhance Dataset Description
   - Issue: The Spider-Large and Spider-Medium datasets are described briefly, but their characteristics (e.g., domain diversity, query complexity) are not elaborated upon. This makes it harder for readers to assess the generalizability of the results.
   - Suggestion: Provide more details about the dataset construction process, including:
     - The distribution of table sizes, join types, and query complexities in Spider-Large and Spider-Medium.
     - Examples of queries and corresponding SQL statements to illustrate the dataset's diversity.
     - A comparison of these datasets with other benchmarks (e.g., WikiSQL, ATIS) to highlight their relevance and scale.

 4. Compare with QLoRA Integration
   - Issue: The paper mentions that LR-SQL can be combined with QLoRA to further reduce GPU memory usage, but no experimental results or details are provided.
   - Suggestion: Include preliminary experiments or at least a discussion on how LR-SQL integrates with QLoRA:
     - Describe how QLoRA modifies the parameter-efficient fine-tuning process in LR-SQL.
     - Provide qualitative or quantitative insights into the additional memory savings achieved when combining LR-SQL with QLoRA.
     - If possible, include a small experiment or ablation study demonstrating the synergy between LR-SQL and QLoRA.
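
For context, combining LoRA fine-tuning with 4-bit quantization (QLoRA) is usually a configuration change in the `transformers`/`peft`/`bitsandbytes` stack; the sketch below illustrates the general recipe, with the model name and LoRA target modules chosen as assumptions rather than the paper's actual settings.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "Qwen/Qwen2-7B-Instruct"  # placeholder; not necessarily the paper's checkpoint

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # 4-bit weight quantization (QLoRA)
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(model_name, quantization_config=bnb_config)
model = prepare_model_for_kbit_training(model)  # casts/hooks for stable k-bit training

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],    # assumed targets, not the paper's choice
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```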

 5. Address Generalization Across Datasets
   - Issue: The experiments are conducted on Spider-Large and Spider-Medium, which are derived from the same base dataset (Spider). While this provides consistency, it limits the assessment of LR-SQL’s generalizability to other domains or datasets.
   - Suggestion: Extend the evaluation to additional datasets (e.g., WikiSQL, ATIS, or custom datasets with different schema structures) to demonstrate the method’s robustness:
     - Include a brief analysis of how LR-SQL performs on datasets with varying levels of complexity or schema designs.
     - Discuss any challenges encountered during adaptation to new datasets and how they were addressed.

 6. Improve Visualization of Results
   - Issue: While Tables 1, 2, and 3 present the results clearly, some figures (e.g., Figure 1) could benefit from additional annotations or comparisons.
   - Suggestion: Enhance the visual representation of key findings:
     - Add a legend or annotations to Figure 1 to clarify the relationship between token volume and GPU memory usage for different models (GLM4, Qwen2, DeepSeek).
     - Include a comparative bar chart or line plot showing the reduction in GPU memory usage across different methods (LR-SQL, compromise, DTS-SQL).
     - Use color coding or distinct markers to differentiate between models and methods in all figures and tables.

 7. Discuss Limitations More Thoroughly
   - Issue: The limitations section is concise but could be expanded to provide deeper insights into potential drawbacks.
   - Suggestion: Elaborate on the following points:
     - Scalability: How does LR-SQL perform with extremely large databases (e.g., >1000 tables)? Are there practical limits to the number of slices?
     - Domain Bias: Does the method assume certain database schemas or query patterns? How sensitive is it to variations in schema design or query complexity?
     - Training Efficiency: While GPU memory is reduced, does the training process become slower due to the slice-based approach? If so, quantify the trade-offs.

 8. Provide Code and Reproducibility Details
   - Issue: While the code is available on GitHub, the paper does not explicitly mention the version control or specific commits used for reproducibility.
   - Suggestion: Ensure reproducibility by:
     - Including a link to the exact commit hash or release tag of the GitHub repository.
     - Providing a brief description of the setup requirements (e.g., hardware specifications, software dependencies).
     - Documenting the hyperparameters and configurations used for each experiment.

 9. Explore Alternative Slicing Strategies
   - Issue: The current slicing strategy focuses on foreign key relationships, but other approaches (e.g., clustering based on column similarity or query frequency) could also be effective.
   - Suggestion: Discuss alternative slicing strategies and their potential benefits:
     - Explore whether clustering tables based on column similarity or query frequency could lead to more efficient slices.
     - Conduct a brief ablation study comparing different slicing heuristics (e.g., foreign keys vs. column similarity) to validate the current approach.
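
To make such an ablation concrete, one simple alternative heuristic would be grouping tables by column-name overlap; the following hypothetical sketch uses Jaccard similarity, with `table_columns` (table name to list of column names) as a placeholder input.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def similarity_slices(table_columns, threshold=0.3):
    """Greedy alternative to foreign-key slicing: a table joins the first
    slice whose members share enough column names with it."""
    slices = []
    for table, columns in table_columns.items():
        for current in slices:
            if any(jaccard(columns, table_columns[other]) >= threshold
                   for other in current):
                current.append(table)
                break
        else:
            slices.append([table])
    return slices

# Hypothetical schema:
print(similarity_slices({
    "orders": ["id", "user_id", "total"],
    "users": ["id", "user_id", "name"],
    "logs": ["ts", "msg"],
}))
# -> [['orders', 'users'], ['logs']]
```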

 10. Conclusion and Future Work
   - Issue: The conclusion summarizes the key contributions but could benefit from a more forward-looking perspective.
   - Suggestion: Expand the "Future Work" section to outline potential extensions:
     - Investigating adaptive slice sizes based on query complexity or database statistics.
     - Exploring hybrid approaches that combine LR-SQL with other memory-efficient techniques (e.g., quantization, sparsity).
     - Applying LR-SQL to other downstream tasks (e.g., question answering, code generation) that involve large-scale input encoding.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

Comments for Authors

  1. Introduction and Related Work:
    The introduction clearly presents the motivation, challenges, and objectives of the study. The references are up-to-date and cover a wide range of related works, including PEFT methods, CoT reasoning, and recent LLMs applied to Text2SQL tasks. It provides a strong foundation.

  2. Methodology:
    The proposed LR-SQL framework, including slice-based schema linking and CoT reasoning injection, is well thought out and addresses a relevant bottleneck in supervised fine-tuning under memory constraints. However, the description of how slices are constructed and how CoT is implemented during training could benefit from more concrete examples or pseudocode earlier in the paper (not just in Appendix A).

  3. Results and Discussion:
    The experimental setup is detailed, and the evaluations (e.g., Tables 1–5) are thorough. The comparative analysis with DTS-SQL, DB-GPT, and zero-shot methods is well conducted. Still, the figures and tables could be improved in clarity, particularly:

    • Adding consistent formatting (e.g., highlighting best values).

    • Labeling plots more clearly, especially Figure 1, where axes and legends can be hard to read.

  4. Conclusion and Limitations:
    The paper appropriately summarizes contributions and acknowledges limitations such as increased inference time due to slicing. This strengthens its credibility.

  5. Figures and Tables:

    • Figure 2 does a good job of summarizing the pipeline, but the layout could be more reader-friendly with clearer labels and text alignment.

    • Tables 1–5 are informative but would benefit from visual cues (e.g., bolding best scores, separating methods with better spacing).

Comments on the Quality of English Language

 

While the overall English in the manuscript is readable and understandable, there are multiple instances of grammatical errors, awkward phrasing, and overly dense sentences that could benefit from editing for clarity and precision. For example:

  • Wordiness and repetition:

    “...the SQL generation model integrates the reasoning information from the aforementioned schema linking model to provide the final SQL statement.”

    This could be simplified to:

    “...the SQL generation model uses the reasoning from the schema linking model to generate the final SQL.”

  • Awkward or incorrect constructions:

    “This is the first problem we focus on in this paper, , i.e., how to achieve...”

    Here, there’s a double comma and the phrasing is informal. A clearer version:

    “The first problem addressed in this paper is how to develop...”

  • Inconsistent tenses and word forms are also present in sections like the methodology and experiments.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Comments and Suggestions for Authors

The work is on the interesting topic of supervised fine-tuning for Text2SQL tasks under low-resource scenarios. Here are a few comments to improve the quality of the work.
1) The introduction needs to be improved by highlighting (listing) the key challenges and explaining the contributions made to overcome them. Both the challenges and the contributions need further elaboration.
2) The literature review is brief and not clearly focused on research gaps.
3) The rationale behind using the two losses (Eqs. 1 and 2) is not well explained.
4) The hyperparameters that were searched, and the best values obtained, should be provided in tabular format.
5) Currently, accuracy is reported only in terms of precision and recall. It is suggested to add more metrics or figures, such as the area under the curve (AUC); a minimal example is sketched after this list.
6) The work has not reviewed sequential models such as RNN, LSTM, and GRU, which are commonly used in this domain, for example in "Smoothing and Matrix Decomposition-Based Stacked Bidirectional GRU Model for Machine Downtime Forecasting".
7) The conclusion needs to be improved by technically explaining the advantages of the proposed approach, along with key managerial implications and future research directions.
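
A minimal illustration of the extra metrics suggested in point 5, assuming the schema-linking step can be scored as binary table-relevance classification with scikit-learn; the labels and scores below are placeholders, not results from the paper.

```python
from sklearn.metrics import precision_score, recall_score, f1_score, roc_auc_score

# Placeholder labels/scores: 1 = table is relevant to the query, 0 = not relevant.
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3]   # per-table model confidence
y_pred  = [int(s >= 0.5) for s in y_score]

print("precision:", precision_score(y_true, y_pred))
print("recall:   ", recall_score(y_true, y_pred))
print("F1:       ", f1_score(y_true, y_pred))
print("ROC AUC:  ", roc_auc_score(y_true, y_score))
```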

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed all my previous comments. I suggest accepting this manuscript.

Reviewer 4 Report

Comments and Suggestions for Authors

The authors have addressed my comments.
