Brief Report
Peer-Review Record

Is ChatGPT a Good Geospatial Data Analyst? Exploring the Integration of Natural Language into Structured Query Language within a Spatial Database

ISPRS Int. J. Geo-Inf. 2024, 13(1), 26; https://doi.org/10.3390/ijgi13010026
by Yongyao Jiang and Chaowei Yang *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4: Anonymous
Submission received: 12 September 2023 / Revised: 12 December 2023 / Accepted: 28 December 2023 / Published: 10 January 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Given that LLMs, represented by ChatGPT, are attracting lots of attention in academic fields, this study was conducted just in time to set an example of using ChatGPT for spatial queries. As a pioneer work, the paper has the potential to benefit the field of GIScience. However, some issues need to be addressed before it can be published.   

First, the title is too broad for the content. Geospatial analytics is much broader than the focus of this study, which is spatial database query. A more accurate title can better reflect the contributions.

Second, the questions to ChatGPT were in a natural ascending order based on complexity. However, I have the impression that the questions are loosely organized and possibly selected in a subjective way. I suggest that the authors provide reasonable criteria and logic for question selection.

Third, an important aspect is to analyze across multiple spatial datasets. The authors could explore this aspect by asking advanced questions that require an understanding of all four test datasets.

Fourth, I do not understand the ‘temperature’ in section 4. Explanation is needed.

Fifth, real challenges in natural language processing such as the fuzziness of descriptions deserve to be evaluated. So some questions can be designed to target this aspect.

Lastly, the writing style can be modified to be more like scientific writing. But overall the paper is easy to read.

Comments on the Quality of English Language

The writing style can be modified to be more like scientific writing. But overall the paper is easy to read.

Author Response

Please see the attached file for our response to you as the first reviewer, since it includes a figure.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This paper analyzes the use of large language models (LLMs), such as ChatGPT, to generate SQL queries for geospatial datasets based on natural language requests. It proposes a framework consisting of three phases: training, prompting, and parsing.

The research evaluates ChatGPT's performance on real-world data from New York City and concludes that while ChatGPT can be accurate and demonstrate quite impressive abilities in several cases, it still requires improvement for complex queries.

The paper is interesting, especially when evaluated as a concise contribution of only 8 pages. Essentially, the paper serves as a discussion of an experience-based experimentation.

 

I suggest two improvements to make the content more self-contained:

1. Provide and describe all the tables, schemas, and content of the database submitted to ChatGPT. For example, this could include details about the 'nyc_neighborhoods' table used in this experimentation.

2. Share the dataset, including the natural language questions, the generated SQL queries, and the results of running these queries, as a public repository.

Author Response

1. Provide and describe all the tables, schemas, and content of the database submitted to ChatGPT. For example, this could include details about the 'nyc_neighborhoods' table used in this experimentation.
Thanks for the suggestion! I have provided the schema for the 3 major tables in Figure 2 including “nyc_neighborhoods”. Listing the schema for all tables could be a bit overwhelming. Since we are also sharing the dataset through a public repository, I’m skipping the other 2 tables.
2. Share the dataset, including the natural language questions, the generated SQL queries, and the results of running these queries, as a public repository.
Good idea! I have published the data on GitHub and included the link at the end of section 3.1.

Reviewer 3 Report

Comments and Suggestions for Authors

This paper presents a basic discussion on text-to-SQL generation with respect to GIS. Most of the article consists of descriptions of elementary assignments over several popular geospatial databases. At the end, the authors present a minimalist benchmark, which, however, is not public.

 

The main weaknesses of the article are:

1. The state-of-the-art description is completely insufficient. There is a large body of ongoing research in text-to-SQL that cannot be ignored just because the authors focus on a specific subset of SQL.

2. The SQL queries presented in the paper are elementary, and the entire description in Chapter 3 is rather reminiscent of some popular article on medium.com.

3. A more detailed description of the benchmarking experiment is missing in Chapter 4, and the number of SQL queries tested is desperately small. If the article could have any interesting output, it would be those SQL queries. However, the SQL queries are unfortunately not published.

 

Positive aspects of the article:

1. The authors focus on a specific area of SQL querying. This could be interesting if the topic were elaborated on a larger scale and the benchmark were published.

Below are some links to interesting work in the text-to-SQL area that the authors could read and incorporate their results into their research. Many of them describe non-trivial text-to-SQL benchmarks that contain tens of thousands of SQL queries.

[1] Rajkumar, Nitarshan, Raymond Li, and Dzmitry Bahdanau. "Evaluating the text-to-sql capabilities of large language models." arXiv preprint arXiv:2204.00498 (2022).

[2] Qin, Bowen, et al. "A survey on text-to-sql parsing: Concepts, methods, and future directions." arXiv preprint arXiv:2208.13629 (2022).

[3] Wang, Bailin, et al. "Rat-sql: Relation-aware schema encoding and linking for text-to-sql parsers." arXiv preprint arXiv:1911.04942 (2019).

[4] Kim, Hyeonji, et al. "Natural language to SQL: Where are we today?." Proceedings of the VLDB Endowment 13.10 (2020): 1737-1750.

[5] Yu, Tao, et al. "Spider: A large-scale human-labeled dataset for complex and cross-domain semantic parsing and text-to-sql task." arXiv preprint arXiv:1809.08887 (2018).

[6] Li, Jinyang, et al. "Can llm already serve as a database interface? a big bench for large-scale database grounded text-to-sqls." arXiv preprint arXiv:2305.03111 (2023).

 

Later reconsideration:

Even though this is a brief report, it is impossible to ignore the ongoing research in the field. The publication of a benchmark is also possible with the report. I might be more moderate in judging the scope of the results and experiments in the paper. So in principle I wouldn't change anything substantive (in my review report).

Author Response

This paper presents a basic discussion on text-to-SQL generation with respect to GIS. Most of the article consists of descriptions of elementary assignments over several popular geospatial databases. At the end, the authors present a minimalist benchmark, which, however, is not public.

The main weaknesses of the article are:
1. The state-of-the-art description is completely insufficient. There is a large body of ongoing research in text-to-SQL that cannot be ignored just because the authors focus on a specific subset of SQL.
Thanks for the suggestion! We’ve expanded the state-of-the-art description with more related work, including the major trends in automatic code generation and some of the suggested papers. Although a few studies have evaluated the accuracy of LLM in text-to-SQL in terms of general data science questions, not much research has touched the spatial query aspect, which is where our work comes into play. Please see the last two paragraphs in section 1 for more details.
2. The SQL queries presented in the paper are elementary, and the entire description in Chapter 3 is rather reminiscent of some popular article on medium.com. A more detailed description of the benchmarking experiment is missing in Chapter 4, and the number of SQL queries tested is desperately small.
Thanks for your input! We understand that a few articles have evaluated LLMs’ performance in text-to-SQL in the general information/computer science domain. However, as far as we know (based on a Google search), this is probably the first paper to address the spatial query aspect, and benchmarking datasets for spatial SQL queries are lacking. We do NOT think we can or should use the general benchmarking datasets because they rarely cover the spatial aspect. As a starting point, we composed a relatively small set of questions for initial benchmarking. This explanation has been added to Section 3.3 and Section 4.
We do recognize that the sample size in the benchmarking experiment is NOT large enough to produce statistically significant, convincing results. A good benchmark should have at least thousands of entries and represent all types of geospatial tasks. How to compose such a comprehensive benchmark probably deserves a separate discussion or paper. Further benchmarking has been added as one of the future research directions in Section 5.
3. If the article could have any interesting output, it would be those SQL queries. However, the SQL queries are unfortunately not published.
Good point! We have published the benchmarking questions and queries through a GitHub repository.

Reviewer 4 Report

Comments and Suggestions for Authors

The article describes an interesting case of using LLMs in the context of the generation and execution of spatial SQL queries in DBMS. In my opinion, however, it does not constitute a research publication, as the research question and the contributions of the researchers are absent.

Secondary comments:

The methodology part, especially the prompt template and the parser, is poorly described.

The temperature parameter should be explained in more detail. Some references are needed.

Figure 10 has the same caption as figure 9.

 

Later reconsideration:

The article describes an interesting case of using LLMs in the context of the generation and execution of spatial SQL queries in DBMS. Although it is not a real research work, it could be published as a brief report.

Comments:

- Describe the contributions of the work better.
- The methodology part, especially the prompt template and the parser, is poorly described.
- The temperature parameter should be explained in more detail. Some references are needed.
- Figure 10 has the same caption as Figure 9.
- Minor editing of English language required.

Decision: Accept after minor revision (corrections to minor methodological errors and text editing).

 

Comments on the Quality of English Language

Minor English corrections needed (plural instead of singular in some cases)

Author Response

1. Describe better the contributions of the work.
We’ve expanded the state-of-the-art description with more related work, including the major trends in automatic code generation. Although a few studies have evaluated the accuracy of LLMs in text-to-SQL in terms of general data science questions, not much research has touched the spatial query aspect, which is where our work comes into play. As a pioneer study, we explore the possibility of using natural language to interact with geospatial datasets with the help of LLMs in this paper. To achieve that, we also propose a framework to (1) train an LLM to understand your data, (2) generate geospatial SQL queries based on a natural language question, (3) send the SQL query to the backend database, and (4) parse the database response back to human language. We hope that the framework can serve as a proxy to improve the efficiency of geospatial database analysis and lower the barrier of geo-analytics. Please see the last two paragraphs in section 1 for more details.
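The four steps listed in this response can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes that "training" means placing the table schema in the prompt, it replaces the real LLM call with a hypothetical `fake_llm` stub that returns a canned query, and it uses an in-memory SQLite table with made-up rows instead of the spatial database used in the paper. All function and table names are invented for the example.

```python
import sqlite3

# Hypothetical toy schema; the paper's real tables live in a spatial database.
SCHEMA = "CREATE TABLE neighborhoods (name TEXT, boro TEXT, population INTEGER)"


def build_prompt(schema: str, question: str) -> str:
    """Steps 1-2: show the model the schema, then ask for one SQL query."""
    return (
        "You are given this database schema:\n"
        f"{schema}\n"
        "Write a single SQL query that answers the question.\n"
        f"Question: {question}\n"
        "SQL:"
    )


def fake_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; it always returns the same canned query."""
    return "SELECT name FROM neighborhoods ORDER BY population DESC LIMIT 1"


def run_query(conn: sqlite3.Connection, sql: str) -> list:
    """Step 3: send the generated SQL to the backend database."""
    return conn.execute(sql).fetchall()


def format_answer(question: str, rows: list) -> str:
    """Step 4: turn the raw rows back into a sentence for the user."""
    if not rows:
        return f"No result found for: {question}"
    values = ", ".join(str(row[0]) for row in rows)
    return f"{question} Answer: {values}"


def ask(conn: sqlite3.Connection, question: str, llm=fake_llm):
    """Run the whole pipeline and return the SQL, the rows, and the answer."""
    sql = llm(build_prompt(SCHEMA, question))
    rows = run_query(conn, sql)
    return sql, rows, format_answer(question, rows)


# Made-up demo data, used only to exercise the pipeline.
conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.executemany(
    "INSERT INTO neighborhoods VALUES (?, ?, ?)",
    [("Alpha", "North", 100), ("Beta", "South", 250)],
)
sql, rows, answer = ask(conn, "Which neighborhood has the largest population?")
print(answer)
```

In a real deployment the generated SQL would need to be checked before execution (for example by running it with read-only permissions), since the sketch executes whatever string the model returns.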
2. The methodology part, especially the prompt template and the parser, are poorly described.
More explanation has been added in the methodology part. Please see section 2 for details.
3. The temperature parameter should be explained in more detail.
More detailed explanation has been added in section 4. “Temperature is a hyperparameter of LLM that regulates the randomness and creativity of the output of an LLM. The higher the value, the more flexible and creative the model would be. Increasing the temperature value typically makes the output more diverse but might also increase its likelihood of straying from the context. For example, a temperature of 0 is deterministic, meaning that the highest probability response is always selected.”
4. Some references are needed. Figure 10 has the same caption as Figure 9.
Good catch! Updated the caption of figure 10.
5. Minor editing of English language required
Done. We have gone through the paper and modified the language to be more like scientific writing.
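The temperature explanation quoted in the third response can be illustrated with a small numerical sketch. It assumes the common formulation in which the model's raw scores (logits) are divided by the temperature before a softmax, and it treats a temperature of 0 as always picking the highest-scoring option. The logits below are invented for the example and do not come from any real model.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive for sampling")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def pick_token(logits, temperature, rng=random):
    """Return the index of the chosen option."""
    if temperature == 0:
        # Deterministic case: always take the highest-scoring option.
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


logits = [2.0, 1.0, 0.1]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)   # sharply peaked distribution
high = softmax_with_temperature(logits, 2.0)  # flatter distribution
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

At the lower temperature almost all probability mass sits on the top-scoring option, while at the higher temperature the less likely options are sampled more often, which matches the "more diverse but more likely to stray" behaviour described in the response.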

Round 2

Reviewer 4 Report

Comments and Suggestions for Authors

The authors have adequately addressed all comments.
