Nabil: A Text-to-SQL Model Based on Brain-Inspired Computing Techniques and Large Language Modeling
Abstract
1. Introduction
- A brain-inspired natural language semantic encoding algorithm was proposed. This algorithm improves the model’s generalization capabilities by sparsifying spike trains.
- A spatiotemporal feature fusion algorithm was proposed. By dynamically adjusting the weights of brain-inspired encoding features and those of a large language model, it achieves cross-modal fusion of features from brain-inspired spiking neural networks and large language models.
- A candidate SQL generation algorithm was proposed. By manipulating prompt templates, examples, and structures, it reduces the ineffective search space of the large language model and improves the efficiency of SQL generation.
- A champion model was proposed. It parses SQL queries into abstract syntax trees and automatically aligns dialects using regular expressions, so that the same test data can be run on multiple engines, further enhancing the model’s credibility.
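As an illustration of the first contribution, the following is a minimal sketch of spike encoding with leaky integrate-and-fire (LIF) neurons followed by temporal pooling, together with a simple sparsity measure for spike trains. It assumes Bernoulli rate coding and a single LIF layer. The function names (`rate_encode`, `lif_layer`, `temporal_pool`, `sparsity`) and the parameter values (`threshold`, `leak`, `weight`) are hypothetical and are not taken from the Nabil implementation.

```python
import random


def rate_encode(values, steps, rng):
    """Bernoulli rate coding: each value in [0, 1] is the per-step spike probability."""
    return [[1 if rng.random() < v else 0 for v in values] for _ in range(steps)]


def lif_layer(spike_train, threshold=1.0, leak=0.9, weight=0.5):
    """Run one LIF neuron per input channel over a spike train (list of time steps).

    At each step the membrane potential decays by `leak` and gains `weight`
    for an input spike. When it reaches `threshold`, the neuron emits an
    output spike and its potential is reset to zero.
    """
    potential = [0.0] * len(spike_train[0])
    output = []
    for step in spike_train:
        row = []
        for i, spike in enumerate(step):
            potential[i] = leak * potential[i] + weight * spike
            if potential[i] >= threshold:
                row.append(1)
                potential[i] = 0.0
            else:
                row.append(0)
        output.append(row)
    return output


def temporal_pool(spike_train):
    """Mean firing rate of each channel across all time steps."""
    steps = len(spike_train)
    return [sum(channel) / steps for channel in zip(*spike_train)]


def sparsity(spike_train):
    """Fraction of (time step, channel) slots that carry no spike."""
    slots = sum(len(row) for row in spike_train)
    spikes = sum(sum(row) for row in spike_train)
    return 1.0 - spikes / slots


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical embedding values, already scaled to [0, 1].
    embedding = [0.1, 0.5, 0.9]
    encoded = rate_encode(embedding, steps=50, rng=rng)
    fired = lif_layer(encoded)
    print("input sparsity :", sparsity(encoded))
    print("output sparsity:", sparsity(fired))
    print("pooled features:", temporal_pool(fired))
```

With these hypothetical settings a neuron needs several closely spaced input spikes before it fires, so the output train is never denser than the input train.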
2. Related Works
3. Methodology
3.1. Preliminaries
3.2. Nabil Model
3.3. Nabil Model Algorithm
Algorithm 1: Nabil Model Algorithm. Source: author’s contribution
Input: Q = Natural Language {}
Output: The best SQL
embeddings = Embedding(Q)
spike_sequence = PulseCodingLayer(embeddings)
snn_features = MultilayerSpikingRNN(spike_sequence)  # Multilayer LIF neurons
nl_semantic_features = TemporalPooling(snn_features)
llm_features = LLM_Encoder(Q)  # LLM outputs semantic features
snn_aligned = LinearProjection_SNN(nl_semantic_features)
llm_aligned = LinearProjection_LLM(llm_features)
joint_features = FusionModule(snn_aligned, llm_aligned)  # Concatenation + MLP fusion
prompts = PromptOptimization(joint_features)
candidate_sqls = []
for prompt in prompts:
    sql = LLM_GenerateSQL(prompt)
    candidate_sqls.append(sql)
topk_candidates = CandidateSQLRanking(candidate_sqls, joint_features, k=TopK)
champion_sql = None
test_data = LLM_GenerateTestData(reference_sql, topk_candidates)
results = []
for sql in topk_candidates:
    res_duck = DuckDB_Execute(sql, test_data)
    res_mysql = MySQL_Execute(sql, test_data)
    res_pg = Postgres_Execute(sql, test_data)
    results.append([res_duck, res_mysql, res_pg])
for i, sql in enumerate(topk_candidates):
    # Accept the first candidate on which all three engines agree and which
    # the LLM judges equivalent to the reference SQL
    if results[i][0] == results[i][1] == results[i][2] and LLM_JudgeEquivalence(reference_sql, sql):
        champion_sql = sql
        break
if champion_sql is None:
    # Fallback: let the LLM choose among the top-k candidates
    champion_sql = LLM_JudgeEquivalence(reference_sql, topk_candidates)
return champion_sql
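The champion selection loop in Algorithm 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: Python's built-in `sqlite3` stands in for DuckDB, MySQL, and PostgreSQL so that the example runs with the standard library alone, and the LLM equivalence judgment is replaced by a comparison against the reference query's result on the test data. The `orders` table, the helper names `run_on_engine` and `select_champion`, and the sample queries are all hypothetical.

```python
import sqlite3
from collections import Counter


def run_on_engine(sql, rows):
    """Run `sql` on a fresh in-memory database loaded with the test rows.

    Returns the result as a multiset of rows (order-insensitive), or None
    if the engine rejects the query.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return Counter(conn.execute(sql).fetchall())
    except sqlite3.Error:
        return None
    finally:
        conn.close()


def select_champion(candidates, reference_sql, engines, rows):
    """Return the first candidate on which every engine agrees and whose
    result matches the reference query; None if no candidate qualifies."""
    expected = engines[0](reference_sql, rows)
    for sql in candidates:
        results = [engine(sql, rows) for engine in engines]
        engines_agree = results[0] is not None and all(r == results[0] for r in results)
        if engines_agree and results[0] == expected:
            return sql
    return None


if __name__ == "__main__":
    test_rows = [(1, 10.0, "east"), (2, 25.0, "west"), (3, 40.0, "east")]
    reference = "SELECT region, SUM(amount) FROM orders GROUP BY region"
    candidates = [
        "SELECT region, SUM(amount) FROM orderz GROUP BY region",  # unknown table
        "SELECT region, COUNT(*) FROM orders GROUP BY region",     # wrong aggregate
        "SELECT region, SUM(amount) AS total FROM orders GROUP BY region ORDER BY region",
    ]
    # Three copies of the same stand-in engine; a real setup would pass one
    # callable per database engine.
    engines = [run_on_engine, run_on_engine, run_on_engine]
    print(select_champion(candidates, reference, engines, test_rows))
```

Comparing results as multisets ignores row order, which is a design choice of this sketch. Where Algorithm 1 falls back to an LLM choice when no candidate passes, the sketch simply returns `None`.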
3.4. Nabil Model Normalization Module Algorithm
Algorithm 2: Nabil Model Normalization Module Algorithm. Source: author’s contribution
Input: SQL
Output: SQL variants that can be executed on multiple engines
function normalize(sql_input, target):
    # 1. Pre-cleaning to remove obvious parsing obstacles
    sql = strip_semicolon(sql_input)
    sql = drop_dollar_prefix(sql)
    sql = rename_numeric_alias(sql)
    # 2. Abstract syntax tree
    for read_style in [target, generic]:
        try:
            sql = sqlglot_transpile(sql, read_style, target, keep_identifiers=True)
            break
        except TranspileError:
            continue
    # 3. SQL dialect gap repair
    if target == "mysql":
        sql = mysql_remove_nulls(sql)
        sql = mysql_remove_range(sql)
        sql = mysql_bool_to_int(sql)
        sql = mysql_fix_funcs(sql)
    elif target == "postgres":
        sql = pg_concat_to_pipe(sql)
        sql = pg_explicit_cast(sql)
        sql = pg_cast_in_agg(sql)
    elif target == "duckdb":
        sql = duckdb_minor_fix(sql)
    # 4. Output the final SQL
    return compress_whitespace(sql)
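To make the regular-expression part of Algorithm 2 concrete, here is a minimal sketch of a few of the helpers it names. The helper names come from the algorithm, but their bodies are assumptions about what such repairs might look like. The abstract syntax tree step is omitted so that the example depends only on the standard library, and only one repair per dialect is shown. The regular expressions are deliberately simple: they do not protect keywords inside string literals and do not handle nested function calls.

```python
import re


def strip_semicolon(sql):
    """Remove surrounding whitespace and trailing semicolons."""
    return sql.strip().rstrip(";").rstrip()


def compress_whitespace(sql):
    """Collapse every run of whitespace into a single space."""
    return re.sub(r"\s+", " ", sql).strip()


def mysql_bool_to_int(sql):
    """Rewrite boolean keywords as 1 / 0 (assumed repair for MySQL)."""
    sql = re.sub(r"\bTRUE\b", "1", sql, flags=re.IGNORECASE)
    return re.sub(r"\bFALSE\b", "0", sql, flags=re.IGNORECASE)


def split_args(text):
    """Split a comma-separated argument list, ignoring commas inside '...'."""
    args, buf, in_quote = [], [], False
    for ch in text:
        if ch == "'":
            in_quote = not in_quote
        if ch == "," and not in_quote:
            args.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    args.append("".join(buf).strip())
    return args


def pg_concat_to_pipe(sql):
    """Rewrite non-nested CONCAT(a, b, ...) as (a || b || ...)."""
    def repl(match):
        return "(" + " || ".join(split_args(match.group(1))) + ")"
    return re.sub(r"\bCONCAT\s*\(([^()]*)\)", repl, sql, flags=re.IGNORECASE)


def normalize(sql_input, target):
    """Reduced version of Algorithm 2: pre-clean, dialect repair, tidy up."""
    sql = strip_semicolon(sql_input)
    if target == "mysql":
        sql = mysql_bool_to_int(sql)
    elif target == "postgres":
        sql = pg_concat_to_pipe(sql)
    return compress_whitespace(sql)


if __name__ == "__main__":
    query = "SELECT  CONCAT(first_name, ' ', last_name)\nFROM users ;"
    print(normalize(query, "postgres"))
    print(normalize("SELECT * FROM t WHERE active = TRUE;", "mysql"))
```

Because these rewrites operate on raw text, they are best applied after a parser has already produced canonical SQL, which matches the ordering in Algorithm 2, where dialect repair follows the syntax tree step.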
3.5. Nabil Model Syntax Tree Abstraction Process
4. Experimental Results and Analysis
4.1. Datasets
4.2. Evaluation Metrics
4.3. Performance Comparison Experiment
4.4. Ablation Comparison Experiment
4.5. Large Language Model Comparison Experiment
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Wang, J.; Liu, S.; Chen, Z.; Shen, T.; Wang, Y.; Yin, R.; Liu, H.; Liu, C.; Shen, C. Ultrasensitive electrospinning fibrous strain sensor with synergistic conductive network for human motion monitoring and human-computer interaction. J. Mater. Sci. Technol. 2025, 213, 213–222.
2. Meng, Q.; Yan, Z.; Abbas, J.; Shankar, A.; Subramanian, M. Human–computer interaction and digital literacy promote educational learning in pre-school children: Mediating role of psychological resilience for kids’ mental well-being and school readiness. Int. J. Hum.-Comput. Interact. 2025, 41, 16–30.
3. Mehonic, A.; Kenyon, A.J. Brain-inspired computing needs a master plan. Nature 2022, 604, 255–260.
4. Guo, J.; Zhan, Z.; Gao, Y.; Xiao, Y.; Lou, J.G.; Liu, T.; Zhang, D. Towards complex text-to-sql in cross-domain database with intermediate representation. arXiv 2019, arXiv:1905.08205.
5. Sen, J.; Lei, C.; Quamar, A.; Özcan, F.; Efthymiou, V.; Dalmia, A.; Stager, G.; Mittal, A.; Saha, D.; Sankaranarayanan, K. Athena++ natural language querying for complex nested sql queries. Proc. VLDB Endow. 2020, 13, 2747–2759.
6. Liu, J.; Cui, Q.; Cao, H.; Shi, T.; Zhou, M. Auto-conversion from Natural Language to Structured Query Language using Neural Networks Embedded with Pre-training and Fine-tuning Mechanism. In Proceedings of the 2020 Chinese Automation Congress (CAC), Shanghai, China, 6–8 November 2020; IEEE: New York City, NY, USA; pp. 6651–6654.
7. Marcus, R.; Negi, P.; Mao, H.; Tatbul, N.; Alizadeh, M.; Kraska, T. Bao: Making learned query optimization practical. In Proceedings of the 2021 International Conference on Management of Data, Xi’an, China, 20–25 June 2021; pp. 1275–1288.
8. Cao, R.; Chen, L.; Chen, Z.; Zhao, Y.; Zhu, S.; Yu, K. LGESQL: Line graph enhanced text-to-SQL model with mixed local and non-local relations. arXiv 2021, arXiv:2106.01093.
9. Sioulas, P.; Ailamaki, A. Scalable multi-query execution using reinforcement learning. In Proceedings of the 2021 International Conference on Management of Data, Xi’an, China, 20–25 June 2021; pp. 1651–1663.
10. Ahkouk, K.; Machkour, M.; Ennaji, M. Data agnostic RoBERTa-based natural language to SQL query generation. In Proceedings of the IEEE 6th International Conference for Convergence in Technology (I2CT), Pune, India, 2–4 April 2021.
11. Zhao, C.; Su, Y.; Pauls, A.; Platanios, E.A. Bridging the generalization gap in text-to-SQL parsing with schema expansion. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 5568–5578.
12. Hui, B.; Geng, R.; Wang, L.; Qin, B.; Li, B.; Sun, J.; Li, Y. S2SQL: Injecting Syntax to Question-Schema Interaction Graph Encoder for Text-to-SQL Parsers. arXiv 2022, arXiv:2203.06958.
13. Yu, X.; Chai, C.; Li, G.; Liu, J. Cost-based or learning-based? A hybrid query optimizer for query plan selection. Proc. VLDB Endow. 2022, 15, 3924–3936.
14. Gan, Y.; Chen, X.; Huang, Q.; Purver, M. Measuring and improving compositional generalization in text-to-sql via component alignment. arXiv 2022, arXiv:2205.02054.
15. Fu, H.; Liu, C.; Wu, B.; Li, F.; Tan, J.; Sun, J. Catsql: Towards real world natural language to sql applications. Proc. VLDB Endow. 2023, 16, 1534–1547.
16. Chen, Z.; Chen, S.; White, M.; Mooney, R.; Payani, A.; Srinivasa, J.; Su, Y.; Sun, H. Text-to-SQL error correction with language models of code. arXiv 2023, arXiv:2305.13073.
17. Gu, Z.; Fan, J.; Tang, N.; Cao, L.; Jia, B.; Madden, S.; Du, X. Few-shot text-to-sql translation using structure and content prompt learning. Proc. ACM Manag. Data 2023, 1, 1–28.
18. Giaquinto, R.; Zhang, D.; Kleiner, B.; Li, Y.; Tan, M.; Bhatia, P.; Nallapati, R.; Ma, X. Multitask pretraining with structured knowledge for text-to-SQL generation. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Toronto, ON, Canada, 9–14 July 2023; pp. 11067–11083.
19. Lee, K.; Dutt, A.; Narasayya, V.; Chaudhuri, S. Analyzing the impact of cardinality estimation on execution plans in microsoft SQL server. Proc. VLDB Endow. 2023, 16, 2871–2883.
20. Ba, J.; Rigger, M. Testing database engines via query plan guidance. In Proceedings of the 2023 IEEE/ACM 45th International Conference on Software Engineering (ICSE), Melbourne, Australia, 14–20 May 2023; IEEE: New York City, NY, USA; pp. 2060–2071.
21. Chen, T.; Gao, J.; Chen, H.; Tu, Y. Loger: A learned optimizer towards generating efficient and robust query execution plans. Proc. VLDB Endow. 2023, 16, 1777–1789.
22. Li, R.; Zhao, K.; Yu, J.X.; Wang, G. CardOOD: Robust Query-driven Cardinality Estimation under Out-of-Distribution. arXiv 2024, arXiv:2412.05864.
23. Fan, Y.; Ren, T.; Huang, C.; He, Z.; Wang, X.S. Grounding Natural Language to SQL Translation with Data-Based Self-Explanations. arXiv 2024, arXiv:2411.02948.
24. Fan, J.; Gu, Z.; Zhang, S.; Zhang, Y.; Chen, Z.; Cao, L.; Li, G.; Madden, S.; Du, X.; Tang, N. Combining small language models and large language models for zero-shot NL2SQL. Proc. VLDB Endow. 2024, 17, 2750–2763.
25. Liu, C.; Liao, W.; Xu, Z. Research on natural language query to SQL method with fused table structure. In Proceedings of the 2024 5th International Conference on Computer Engineering and Application (ICCEA), Hangzhou, China, 12–14 April 2024; IEEE: New York City, NY, USA; pp. 564–567.
26. Kim, H.; Jeon, T.; Choi, S.; Choi, S.; Cho, H. FLEX: Expert-level False-Less EXecution Metric for Reliable Text-to-SQL Benchmark. arXiv 2024, arXiv:2409.19014.
27. Mao, W.; Wang, R.; Guo, J.; Zeng, J.; Gao, C.; Han, P.; Liu, C. Enhancing Text-to-SQL Parsing through Question Rewriting and Execution-Guided Refinement. In Proceedings of the Findings of the Association for Computational Linguistics ACL 2024, Bangkok, Thailand, 11–16 August 2024; pp. 2009–2024.
28. Xie, X.; Xu, G.; Zhao, L.; Guo, R. OpenSearch-SQL: Enhancing Text-to-SQL with Dynamic Few-shot and Consistency Alignment. arXiv 2025, arXiv:2502.14913.
29. Chen, K.; Chen, Y.; Koudas, N.; Yu, X. Reliable Text-to-SQL with Adaptive Abstention. Proc. ACM Manag. Data 2025, 3, 1–30.
30. Castelein, J.; Aniche, M.; Soltani, M.; Panichella, A.; van Deursen, A. Search-based test data generation for SQL queries. In Proceedings of the 40th International Conference on Software Engineering, Gothenburg, Sweden, 27 May–3 June 2018; pp. 1220–1230.
31. Chu, S.; Murphy, B.; Roesch, J.; Cheung, A.; Suciu, D. Axiomatic foundations and algorithms for deciding semantic equivalences of SQL queries. arXiv 2018, arXiv:1802.02229.
32. Zhou, Q.; Arulraj, J.; Navathe, S.; Harris, W.; Xu, D. Automated verification of query equivalence using satisfiability modulo theories. Proc. VLDB Endow. 2019, 12, 1276–1288.
33. Zhou, Q.; Arulraj, J.; Navathe, S.B.; Harris, W.; Wu, J. SPES: A symbolic approach to proving query equivalence under bag semantics. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; IEEE: New York City, NY, USA; pp. 2735–2748.
34. Wang, S.; Pan, S.; Cheung, A. QED: A Powerful Query Equivalence Decider for SQL. Proc. VLDB Endow. 2024, 17, 3602–3614.
35. He, Y.; Zhao, P.; Wang, X.; Wang, Y. VeriEQL: Bounded Equivalence Verification for Complex SQL Queries with Integrity Constraints. Proc. ACM Program. Lang. 2024, 8, 1071–1099.
36. Zhang, Y.; Qu, P.; Ji, Y.; Zhang, W.; Gao, G.; Wang, G.; Song, S.; Li, G.; Chen, W.; Zheng, W.; et al. A system hierarchy for brain-inspired computing. Nature 2020, 586, 378–384.
37. Du, X.; Hu, S.; Zhou, F.; Wang, C.; Nguyen, B.M. FI-NL2PY2SQL: Financial Industry NL2SQL Innovation Model Based on Python and Large Language Model. Future Internet 2025, 17, 12.
38. Du, X.; Guo, X.; Zhou, F.; Gu, M.; Lu, Z.; Wang, C. FinDS2: A Novel Data Synthesis System for Fintech Product Risks. In Proceedings of the 2024 IEEE 11th International Conference on Cyber Security and Cloud Computing (CSCloud), Shanghai, China, 28–30 June 2024; IEEE: New York City, NY, USA; pp. 73–78.
39. Li, J.; Hui, B.; Qu, G.; Yang, J.; Li, B.; Li, B.; Wang, B.; Qin, B.; Geng, R.; Huo, N.; et al. Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls. Adv. Neural Inf. Process. Syst. 2023, 36, 42330–42357.
Researcher | Research Content | Advantages | Disadvantages |
---|---|---|---|
Guo, J. [4] | Text-to-SQL | Modular and interpretable | Limited expressive capabilities |
Chen, T. [21] | Query optimization | Strong robustness | Only supports SPJ queries |
Mao, W. [27] | Text-to-SQL | Reduces natural language ambiguity | Risk of excessive rewriting |
Xie, X. [28] | Text-to-SQL | Few-shot automatic expansion | Limited SQL-like capabilities |
Chen, K. [29] | Text-to-SQL | Improves accuracy and reliability | Excessive abstention |
Wang, S. [34] | SQL equivalence verification | Fast reasoning speed | Does not support complex queries |
He, Y. [35] | SQL equivalence verification | Considers integrity constraints | Does not support complex queries |
Field | Description |
---|---|
db_id | Database Name |
question | Question curated by crowdsourced annotators based on the database descriptions and content. |
evidence | External knowledge evidence annotated by experts to assist models or SQL annotators. |
SQL | SQL annotated by crowdsourced annotators, with reference to the database descriptions and content, that accurately answers the question. |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhou, F.; Hu, S.; Du, X.; Li, N.; Zhou, T.; Zhao, Y.; Shang, S.; Ling, X.; Zhu, H. Nabil: A Text-to-SQL Model Based on Brain-Inspired Computing Techniques and Large Language Modeling. Electronics 2025, 14, 3910. https://doi.org/10.3390/electronics14193910