Relation-Aware Graph Transformer for SQL-to-Text Generation

Abstract: Generating natural language descriptions for structured representations (e.g., a graph) is an important yet challenging task. In this work, we focus on SQL-to-text, a task that maps a SQL query into the corresponding natural language question. Previous work represents SQL as a sparse graph and utilizes a graph-to-sequence model to generate questions, where each node can only communicate with k-hop nodes. Such a model degenerates when adapted to more complex SQL queries because it cannot capture long-term dependencies and lacks SQL-specific relations. To tackle this problem, we propose a relation-aware graph transformer (RGT) that considers both the SQL structure and various relations simultaneously. Specifically, an abstract SQL syntax tree is constructed for each SQL query to provide the underlying relations. We also customize self-attention and cross-attention strategies to encode the relations in the SQL tree. Experiments on the benchmarks WikiSQL and Spider demonstrate that our approach yields improvements over strong baselines.


Introduction
SQL (Structured Query Language) is a vital tool to access databases. However, SQL is not easy to understand for the average person. SQL-to-text aims to convert a structured SQL program into a natural language description. It can help automatic SQL comment generation as well as build an interactive question answering system [1,2] for natural language interface to a relational database [3][4][5]. Besides, SQL-to-text is useful for searching SQL programs available on the Internet. Guo et al. [6] and Wu et al. [7] also demonstrated that SQL-to-text can assist the text-to-SQL task [8][9][10][11] by using SQL-to-text as data augmentation. In the real world, it can help people understand complex SQLs quickly by reading corresponding texts.
A naive idea is to cast SQL-to-text as a Seq2Seq problem [12,13]. Taking the SQL sequence as input, a Seq2Seq model translates it into natural language. The main limitation is that, as the SQL sequence becomes longer, the Seq2Seq model may fail to capture the dependencies between complex conditions and operations. SQL is structural and can be converted into an abstract syntax tree, as illustrated in Figure 1. Since a tree is a special graph, SQL-to-text can be modeled as a Graph-to-Sequence [14] task. Xu et al. [15] consider the intrinsic graph structure of a SQL query. They construct the SQL graph by representing each token in the SQL as a node in the graph and connecting different units (e.g., column names, operators, values) through SQL keyword nodes (e.g., SELECT, AND). By aggregating information from the K-hop neighbors through a graph neural network (GNN, Scarselli et al. [16], 2008), each node obtains its contextualized embedding, which is accessed in the natural language decoding phase. Though simple and effective, this approach suffers from two main drawbacks: (1) poor generalization capability due to the sparsity of the constructed SQL graph, and (2) ignorance of relations between different node pairs, especially the relevance among column nodes. In particular, Xu et al. [15] only deal with the simple SQL sketch SELECT $AGG $COLUMN WHERE $COLUMN $OP $VALUE (AND $COLUMN $OP $VALUE)*. Only one column unit and one single table are mentioned in the sketch, and all constraints are organized via intersections of conditions in the WHERE clause. The model updates the contextualized embedding of each node by a K-step iteration. Each node only communicates with its 1-hop neighbors in one iteration, so each node can only "see" nodes within a distance of K at the end of the iterations. The performance easily deteriorates when we transfer to more complicated SQL sketches composed of multiple tables, GroupBy/HAVING/OrderBy/LIMIT clauses and nested SQLs.
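The K-hop limitation can be made concrete with a small sketch. The chain graph below is a hypothetical toy example (it is not the SQL graph of Xu et al. [15]); it only shows that after K rounds of 1-hop message passing a node can have received information from nodes at distance at most K.

```python
from collections import deque

def k_hop_neighbors(adjacency, start, k):
    """Return the set of nodes within graph distance k of `start` (plain BFS)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == k:
            continue  # do not expand beyond k hops
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return set(distance)

# Hypothetical chain of 8 token nodes: 0 - 1 - 2 - ... - 7
chain = {i: [j for j in (i - 1, i + 1) if 0 <= j < 8] for i in range(8)}

# With K = 3, node 0 never receives information from nodes 4..7.
visible = k_hop_neighbors(chain, start=0, k=3)
```

In a fully connected attention model, by contrast, every node is one step away from every other node, which is the property the transformer backbone relies on.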
As shown in Figure 1, a Graph2Seq model with K = 6 may work well on the simple SQL (shown on the left) while generalizing poorly on the complex SQL with a longer dependency distance (shown on the right). We find that two nodes may be highly correlated even though they are far apart in both the serialized SQL query and the parsed abstract syntax tree. For instance, the columns mentioned in the same clause (intra-clause) are tightly related. In the example in Figure 1b, users often require not only the last name but also the first name of specific candidates. Similarly, there is a high probability that a column serving as a condition in the WHERE clause will also be requested in the SELECT clause (inter-clause). Previous work pays more attention to the syntactic structure of SQL, but neglects these potential relations at the semantic level.
To this end, we propose a Relation-aware Graph Transformer (RGT) to take into account both the abstract syntax tree of the query and the correlations between different node pairs. The entire node set is split into two parts: intermediate nodes and leaf nodes. Leaf nodes are usually raw table names or column words, plus some unary modifiers such as DISTINCT and MAX. Typically, these leaf nodes convey significant semantic information in the query. Intermediate nodes such as SELECT and AND inherently capture the tree structure of the underlying SQL query and connect the scattered leaf nodes. An example of constructed SQL tree is shown in Figure 2.
We introduce four types of relations into the SQL tree and propose two variants of cross-attention to capture the structural information. All relations are encoded by our proposed RGT model. As a SQL query may involve multiple tables, we first consider the relations among the abstract concepts TABLE and COLUMN, called database schema (DBS). Given two nodes representing TABLE or COLUMN, they might be two columns in the same table or two tables connected by a foreign key. We define 11 different types of DBS to describe such relations. Besides, the depth of a node reflects the kind of information it carries: deeper nodes contain more semantic information while shallower nodes have more syntactic information. We introduce directional relative depth (DRD) to capture the relative depth between intermediate nodes. As for leaf nodes, the most important relation is affiliation. For example, in Figure 2, the leaf nodes month and salary are connected to the COLUMN node, and the COLUMN node and another leaf node val0 belong to the intermediate node >. These three leaf nodes are highly relevant. We use the lowest common ancestor (LCA) to measure the closeness of two leaf nodes. For instance, the LCA of the nodes month and val0 in Figure 2 is the node >. Furthermore, to leverage the tree structure of SQL, we use two cross-attention strategies, namely attention over ancestors (AOA) and attention over descendants (AOD). Attention over ancestors only allows leaf nodes to attend to their ancestors, and attention over descendants forces intermediate nodes to attend only to their descendants.
We conduct extensive experiments on the benchmarks WikiSQL [17] and Spider [18] with various baseline models. For simple SQL sketches on WikiSQL, our RGT model outperforms the previous best Graph2Seq model [15] and achieves 31.2 BLEU. To the best of our knowledge, we are the first to perform the SQL-to-text task on SQL sketches that involve multiple tables and complex conditions. Results on Spider (28.84 BLEU) demonstrate that our model generalizes well compared with other alternatives.
Our main contributions are summarized as follows:
• We propose a relation-aware graph transformer to consider various relations between node pairs in the SQL graph.
• We are the first to perform the SQL-to-text task with much more complicated SQL sketches on the dataset Spider.
• Extensive experiments show that our model is superior to various Seq2Seq and Graph2Seq models. Data and code for our models and baselines will be made public.
This paper is organized as follows: In Section 1, we introduce the task of SQL-to-text, analyze the problems of previous work and present an overview of our work. Then, we summarize related work in Section 2. After that, we describe our method in detail in Section 3, including how to build the SQL tree and the architecture of our model. In Section 4, we conduct experiments on two public datasets and report all the results. Finally, we conclude and discuss future work in Section 5.

Related Work
Data-to-text Data-to-text intends to transform non-linguistic input data into the meaningful and coherent natural language text [19]. There are several types of the non-linguistic input data, such as a set of triples (the WebNLG challenge [20]) and some kinds of meaning representations (e.g., several slot-value pairs of the E2E dataset [21], the Abstract Meaning Representation (AMR) graph [22]). The key problem of this task is how to obtain a good representation of the input data. At first, researchers [23,24] cast the structured input data to sequence and adopt the sequence-to-sequence model, e.g., LSTM. However, this method neglects the intrinsic structure of the input data. To this end, a lot of graph-to-sequence models are proposed. In particular, refs. [25,26] encoded the input data based on a graph convolutional network (GCN [27]) encoder. Ref. [28] extended transformer to the graph input and proposed the graph transformer encoder. In this work, our model is based on the graph transformer encoder.
SQL-to-text This technique can leverage automatically generated SQL programs [17] to create additional (question, SQL) pairs, alleviating the annotation scarcity problem of the complicated text-to-SQL [29] task with data augmentation [6]. Earlier rule-based methods [30,31] heavily rely on researchers to design generic templates, which will inevitably produce rigid and canonical questions. Seq2Seq [13], Tree2Seq [32] and Graph2Seq [15] models have demonstrated their superiority over the traditional rule-based system. In this work, we propose a relation-aware graph transformer to take into account both the graph structure and various relations embedded in different node pairs.
Tree-to-sequence Tree-to-sequence model [32] aims to map a tree structure into a sequence. Each node gathers information from its children nodes when encoding. They apply this technique to neural machine translation. Specifically, they reorganize the input sequence in the source language as a tree according to its constituency structure. In our work, we construct a SQL tree and utilize the Tree LSTM [33] as a baseline.
Graph-to-sequence Graph convolution network (GCN, Kipf and Welling [27], 2016) and graph attention network (GAT, Veličković et al. [34], 2017) have been successfully applied in various tasks to obtain node embeddings. Every node updates its node embedding by aggregating information from its neighbors. There may be labeled relations or features on edges of the graph. Relations or edge features can be incorporated when aggregating information from neighbors [2,35,36] or calculating relevance weights between node pairs [37][38][39][40]. We adopt both strategies with our tailored relations for different node pairs.

SQL Tree Construction
The entire node set V of the constructed SQL tree is split into two categories: intermediate nodes $V^{I}$ and leaf nodes $V^{L}$. Intermediate nodes include three abstract concepts (SQL, TABLE and COLUMN), seven SQL-clause keywords (SELECT, WHERE, etc.) and binary operators (>, <, =, etc.), while leaf nodes contain unary operators, raw table names and column words, and placeholders for entity values (entity mentions such as "new york" are replaced with one special token val0 during preprocessing, called delexicalization). With this partition, the node embeddings of these two types can be updated using different relational information.
Starting from the root node SQL, we firstly append the clause-level nodes as its children (see Figure 2). Then concept abstraction nodes, TABLE and COLUMN, and relevant operator nodes are accordingly attached to their parents. Next, for node COLUMN and TABLE, we append all the raw words, aggregators, and distinct flags as leaf nodes. Our SQL Tree consists of three levels (see Figure 3): clause level, schema level, and token level. Table 1 shows all types of nodes.
• First, SQL is divided into clauses such as the SELECT clause, the WHERE clause, the nested SQL clause and so on (see Figure 3a).
• Then, each clause is composed of several tables, columns, and some other binary operators. Considering that some table and column names have multiple tokens, we design two abstract nodes (TABLE and COLUMN) to address this problem (see Figure 3c). With these two abstract nodes, the clause nodes can be represented as shown in Figure 3b. Noticing that binary operators can be regarded as a relation between several nodes, we set them as intermediate nodes (parents of some children nodes).
• For other unary operators and tokens (table and column), we put them on leaves.
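The three-level construction can be sketched for a single toy query. The `Node` class, the `column` helper and the query itself are hypothetical illustrations (they are not the released implementation); the sketch only shows how clause nodes, abstract COLUMN nodes and a binary operator end up as intermediate nodes while raw words and the value placeholder end up as leaves.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        self.children.append(child)
        return child

    @property
    def is_leaf(self) -> bool:
        return not self.children

def column(*words: str) -> Node:
    """Abstract COLUMN node whose children are the raw column words."""
    col = Node("COLUMN")
    for word in words:
        col.add(Node(word))
    return col

# Toy query: SELECT first name WHERE month salary > val0
root = Node("SQL")                      # root
select = root.add(Node("SELECT"))       # clause level
select.add(column("first", "name"))     # schema level + token level
where = root.add(Node("WHERE"))         # clause level
greater = where.add(Node(">"))          # binary operator as intermediate node
greater.add(column("month", "salary"))
greater.add(Node("val0"))               # delexicalized value placeholder

def collect(node, leaves, inner):
    """Pre-order traversal that separates leaf and intermediate labels."""
    if node.is_leaf:
        leaves.append(node.label)
    else:
        inner.append(node.label)
    for child in node.children:
        collect(child, leaves, inner)
    return leaves, inner

leaves, inner = collect(root, [], [])
```

The pre-order leaf sequence produced here also matches the original token order of the query, which is what a position-based relation among leaf nodes would rely on.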

Encoder Overview
The input features include trainable embeddings for all nodes and relations. We use $X^{L} \in \mathbb{R}^{|V^{L}| \times d_x}$ and $R^{L} = [r^{L}_{ij}]_{|V^{L}| \times |V^{L}|}$ to denote the leaf node embeddings and the relation matrix among leaf nodes. Symmetrically, $X^{I} \in \mathbb{R}^{|V^{I}| \times d_x}$ and $R^{I} = [r^{I}_{ij}]_{|V^{I}| \times |V^{I}|}$ are defined for intermediate nodes.
The encoder is composed of K stacked blocks, as illustrated in Figure 4. The main component is the relation-aware graph transformer (RGT), which takes as input the node embedding matrix X, the relation matrix R and a relation function E that extracts relation embeddings from R, and outputs the updated node matrix. Each block contains four modules: one RGT for intermediate nodes, one RGT for leaf nodes, and two cross-attention modules. In each block, the node embeddings $X^{I}$ and $X^{L}$ are updated sequentially via self-attention and cross-attention. According to the dataflow in Figure 4, intermediate nodes are first updated by
$$X^{I}_{mid} = \mathrm{RGT}(X^{I}_{in}, R^{I}, E^{I}_{rel}).$$
Then, leaf nodes attend to intermediate nodes and are updated with RGT,
$$X^{L}_{mid} = \mathrm{CrossAttention}_{L \leftarrow I}(X^{L}_{in}, X^{I}_{mid}), \qquad X^{L}_{out} = \mathrm{RGT}(X^{L}_{mid}, R^{L}, E^{L}_{rel}).$$
Finally, intermediate nodes also attend to leaf nodes,
$$X^{I}_{out} = \mathrm{CrossAttention}_{I \leftarrow L}(X^{I}_{mid}, X^{L}_{out}).$$
Subscripts in, mid, out are used to differentiate the inputs and outputs. Definitions of the relation embedding functions $E^{I}_{rel}$ and $E^{L}_{rel}$, the relation matrices $R^{I}$ and $R^{L}$, and the modules $\mathrm{CrossAttention}_{I \leftarrow L}(\cdot, \cdot)$ and $\mathrm{CrossAttention}_{L \leftarrow I}(\cdot, \cdot)$ will be elaborated later.

Relation-Aware Graph Transformer
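We utilize Transformer [41] as the backbone of our model, which can be viewed as an instance of graph attention network (GAT, Veličković et al. [34], 2017) where the receptive field for each node is the entire node set. We view the SQL tree as a special graph. Assume the input graph is $G = (V, R)$, where V is the vertex set and R is the relation matrix. Each node $v_i \in V$ has a randomly initialized embedding $x_i \in \mathbb{R}^{d_x}$. Shaw et al. [37] propose to incorporate the relative position between nodes $v_i$ and $v_j$ into the relevance score calculation and context aggregation steps. Similarly, we adapt this technique to our framework by introducing additional relational vectors. Mathematically, given the relation matrix R, we construct a relation embedding function $E_{rel}$ to retrieve the feature vector $e_{ij} = E_{rel}(r_{ij}) \in \mathbb{R}^{d_x/H}$ for relation $r_{ij}$. Then, the output embedding $y_i$ of node $v_i$ after one iteration layer is calculated via
$$\alpha^{h}_{ij} = \mathrm{softmax}_{j}\left(\frac{(x_i W^{h}_{Q})(x_j W^{h}_{K} + e_{ij})^{\top}}{\sqrt{d_x/H}}\right),$$
$$z_i = \Big\Vert_{h=1}^{H} \sum_{j} \alpha^{h}_{ij}\,(x_j W^{h}_{V} + e_{ij}),$$
$$\tilde{y}_i = \mathrm{LayerNorm}(x_i + z_i W_{O}),$$
$$y_i = \mathrm{LayerNorm}(\tilde{y}_i + \mathrm{FC}(\mathrm{ReLU}(\mathrm{FC}(\tilde{y}_i)))),$$
where $W^{h}_{Q}, W^{h}_{K}, W^{h}_{V} \in \mathbb{R}^{d_x \times d_x/H}$ and $W_{O} \in \mathbb{R}^{d_x \times d_x}$ are trainable projection matrices, $\Vert$ denotes concatenation over heads, FC(·) denotes a fully-connected layer, LayerNorm(·) is layer normalization [42], and 1 ≤ h ≤ H is the multi-head index. The relation embedding function $E_{rel}$ is shared across different heads and multiple layers unless otherwise specified. For convenience of discussion, we simplify the notation of our RGT encoding module into
$$X_{out} = \mathrm{RGT}(X_{in}, R, E_{rel}),$$
where $X_{in} = [x_1; \cdots; x_{|V|}]$ represents the matrix of input embeddings for all nodes.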
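As a rough illustration of how relational vectors enter the attention computation, the following sketch implements a single attention head in plain Python in the style of Shaw et al. [37]: the relation vector for the pair (i, j) is added to the key and to the value of node j. This is a simplified sketch under stated assumptions and not the model's implementation: the learned query/key/value projections, multiple heads, the feed-forward layer and layer normalization are all omitted, and the function name `relation_aware_attention` is hypothetical.

```python
import math

def softmax(scores):
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def vec_add(a, b):
    return [x + y for x, y in zip(a, b)]

def relation_aware_attention(queries, keys, values, rel):
    """Single-head attention; rel[i][j] is added to key j and value j for query i."""
    dim = len(queries[0])
    outputs = []
    for i, query in enumerate(queries):
        # Relevance score: query against (key + relation vector), scaled.
        scores = [dot(query, vec_add(key, rel[i][j])) / math.sqrt(dim)
                  for j, key in enumerate(keys)]
        weights = softmax(scores)
        # Context aggregation: weighted sum of (value + relation vector).
        out = [0.0] * dim
        for j, weight in enumerate(weights):
            shifted = vec_add(values[j], rel[i][j])
            out = [o + weight * s for o, s in zip(out, shifted)]
        outputs.append(out)
    return outputs

queries = [[1.0, 0.0], [0.0, 1.0]]
keys = [[1.0, 1.0], [1.0, 1.0]]
values = [[2.0, 0.0], [0.0, 2.0]]

# Without relational information, identical keys give uniform attention.
no_rel = [[[0.0, 0.0] for _ in range(2)] for _ in range(2)]
uniform = relation_aware_attention(queries, keys, values, no_rel)

# A relation vector on the pair (0, 0) pulls node 0 toward itself.
biased_rel = [[[10.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
biased = relation_aware_attention(queries, keys, values, biased_rel)
```

With identical keys the plain scores cannot distinguish the two nodes, so any preference in the second call comes purely from the relation vector, which is the effect the relational embeddings are meant to provide.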

Relations among Intermediate Nodes
As for intermediate nodes, we consider two types of relations: database schema (DBS) and directional relative depth (DRD). DBS considers the relations among the abstract concepts TABLE and COLUMN. In total, we define 11 relations, which are a subset of the relations proposed in Wang et al. [39]. For example, if nodes $v^{I}_{i}$ and $v^{I}_{j}$ are of type COLUMN and they belong to the same table according to the database schema, the relation $r^{DBS}_{ij}$ is SAME-TABLE. Table 2 shows the complete list of DBS relations. Mathematically,
$$e^{DBS}_{ij} = E^{DBS}_{rel}(r^{DBS}_{ij}),$$
where the relation embedding function $E^{DBS}_{rel}$ maps the relation category $r^{DBS}_{ij}$ into a trainable vector $e^{DBS}_{ij}$.
With the assistance of the underlying directed SQL tree, we can build another relation matrix to indicate the accessibility and relative depth difference between two intermediate nodes $v^{I}_{i}$ and $v^{I}_{j}$. Let $d(v^{I}_{i})$ denote the depth of node $v^{I}_{i}$, e.g., the depth of the root SQL node is 1 (see Figure 4). Given the maximum depth difference D,
$$r^{DRD}_{ij} = \begin{cases} \mathrm{clip}\big(d(v^{I}_{i}) - d(v^{I}_{j}), -D, D\big), & \text{if one of } v^{I}_{i}, v^{I}_{j} \text{ is an ancestor of the other,} \\ \mathrm{inf}, & \text{otherwise,} \end{cases} \qquad e^{DRD}_{ij} = E^{DRD}_{rel}(r^{DRD}_{ij}),$$
where $E^{DRD}_{rel}$ is the relation embedding module with 2D + 2 entries. One special entry represents the inaccessibility inf.
The complete relation embedding function for intermediate nodes combines both relations,
$$r^{I}_{ij} = \mathrm{tuple}(r^{DBS}_{ij}, r^{DRD}_{ij}), \qquad E^{I}_{rel}(r^{I}_{ij}) = \mathrm{FC}([e^{DBS}_{ij}; e^{DRD}_{ij}]),$$
where the affine transformation FC(·) is used to fuse relation features from the two perspectives and tuple(·, ·) means the combination of relations.
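The directional relative depth can be sketched as follows. Two points are assumptions made for illustration rather than details confirmed by the text: "accessible" is read as "one node is an ancestor of the other (or the two nodes coincide)", and the depth difference is taken as depth(i) minus depth(j). The helper names and the toy parent map are hypothetical.

```python
def depth(parent, node):
    """Depth of `node`, with the root at depth 1 as in the text."""
    d = 1
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d

def is_ancestor_or_self(parent, candidate, node):
    while node is not None:
        if node == candidate:
            return True
        node = parent[node]
    return False

def drd(parent, i, j, max_diff):
    """Clipped depth difference if i and j lie on one root path, else 'inf'."""
    if not (is_ancestor_or_self(parent, i, j) or is_ancestor_or_self(parent, j, i)):
        return "inf"  # special entry for inaccessible pairs
    diff = depth(parent, i) - depth(parent, j)
    return max(-max_diff, min(max_diff, diff))

# Toy tree of intermediate nodes (child -> parent).
parent = {"SQL": None, "SELECT": "SQL", "WHERE": "SQL", ">": "WHERE", "COLUMN": ">"}

relation_values = {drd(parent, a, b, max_diff=2) for a in parent for b in parent}
```

With a maximum depth difference of D, the relation takes at most 2D + 1 integer values plus the special inaccessible entry, which agrees with the 2D + 2 entries of the embedding module.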

Relations among Leaf Nodes
Leaf nodes mainly consist of raw words, plus a few unary operators as modifiers. Gathering all these nodes into a sequence $s^{L}$ following their original order in the SQL query, we can obtain the relative position relation (RPR) among these leaf nodes. Assuming that the position of node $v^{L}_{i}$ in $s^{L}$ is indexed by $s^{L}(v^{L}_{i})$ and that D is the pre-defined maximum distance, the relation feature $e^{RPR}_{ij}$ for nodes $v^{L}_{i}$ and $v^{L}_{j}$ is defined as
$$r^{RPR}_{ij} = \mathrm{clip}\big(s^{L}(v^{L}_{i}) - s^{L}(v^{L}_{j}), -D, D\big), \qquad e^{RPR}_{ij} = E^{RPR}_{rel}(r^{RPR}_{ij}).$$
In practice, $E^{RPR}_{rel}$ stores a parameter matrix of shape $(2D + 1) \times (d_x/H)$ for retrieval. Tokens in the same clause cluster together in the sequence $s^{L}$. Intuitively, an $r^{RPR}_{ij}$ with a smaller absolute value captures the previously mentioned intra-clause relations.
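A minimal sketch of the relative position relation among leaf nodes is given below. It assumes the relation is the signed position offset clipped to the maximum distance D (the sign convention i minus j is an assumption), and that each clipped offset is shifted by D to index a row of a (2D + 1)-row embedding table; the function names are hypothetical.

```python
def relative_positions(seq_len, max_dist):
    """Matrix of signed position offsets, clipped to [-max_dist, max_dist]."""
    return [[max(-max_dist, min(max_dist, i - j)) for j in range(seq_len)]
            for i in range(seq_len)]

def embedding_row(relation, max_dist):
    """Shift a clipped offset into a row index of a (2 * max_dist + 1)-row table."""
    return relation + max_dist

# Five leaf tokens in their original order, e.g. first name month salary val0.
rpr = relative_positions(seq_len=5, max_dist=2)
rows = {embedding_row(r, 2) for line in rpr for r in line}
```

Pairs that are further apart than D share the same clipped relation, so the model distinguishes nearby tokens (often in the same clause) more finely than distant ones.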
Furthermore, we take into account the structure of the SQL tree. Let $\mathrm{LCA}(v^{L}_{i}, v^{L}_{j})$ denote the lowest common ancestor of leaf nodes $v^{L}_{i}$ and $v^{L}_{j}$ in the SQL tree. The relation feature $e^{LCA}_{ij}$ is computed via
$$e^{LCA}_{ij} = E^{LCA}_{rel}\big(\mathrm{LCA}(v^{L}_{i}, v^{L}_{j})\big).$$
The relation embedding function $E^{LCA}_{rel}$ simply extracts the current node embedding of the intermediate node $\mathrm{LCA}(v^{L}_{i}, v^{L}_{j})$ from $X^{I}_{mid}$ and transforms it into dimension $d_x/H$ through a trainable linear layer. The relation between remote leaf nodes is thus reflected by their common ancestor node.
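The lowest common ancestor of two leaf nodes can be found by walking up the tree. The sketch below uses a hypothetical child-to-parent map for a toy tree similar to the one described for Figure 2 (node names such as `COLUMN_1` are invented identifiers to keep the two COLUMN nodes distinct).

```python
def ancestors(parent, node):
    """Proper ancestors of `node`, nearest first."""
    path = []
    node = parent[node]
    while node is not None:
        path.append(node)
        node = parent[node]
    return path

def lowest_common_ancestor(parent, a, b):
    """First ancestor of `b` that is also an ancestor of `a`."""
    ancestors_of_a = set(ancestors(parent, a))
    for candidate in ancestors(parent, b):
        if candidate in ancestors_of_a:
            return candidate
    return None

parent = {
    "SQL": None,
    "SELECT": "SQL", "COLUMN_1": "SELECT", "first": "COLUMN_1", "name": "COLUMN_1",
    "WHERE": "SQL", ">": "WHERE", "COLUMN_2": ">",
    "month": "COLUMN_2", "salary": "COLUMN_2", "val0": ">",
}

lca_month_val0 = lowest_common_ancestor(parent, "month", "val0")
```

Words of the same column share a COLUMN node as LCA, a column and its compared value share the operator node, and tokens from different clauses only meet at the root, so the LCA gives a graded notion of closeness.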
The complete relation embedding function $E^{L}_{rel}$ for leaf nodes is constructed by combining both the flattened (RPR) and the tree-structured (LCA) relation features, $e^{RPR}_{ij}$ and $e^{LCA}_{ij}$.

Cross-Attention between Leaf and Intermediate Nodes
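Module $\mathrm{CrossAttention}_{I \leftarrow L}(\cdot, \cdot)$ collects features from the leaf nodes $V^{L}$ to the intermediate nodes $V^{I}$. Rather than attending to all leaf nodes, an intermediate node $v^{I}_{i}$ attends only to its descendants in the SQL tree, a strategy we call attention over descendants (AOD); the attention is parameterized by a trainable matrix $W_{e} \in \mathbb{R}^{d_x \times d_x}$. Similarly, module $\mathrm{CrossAttention}_{L \leftarrow I}(\cdot, \cdot)$ collects features from the intermediate nodes $V^{I}$ to the leaf nodes $V^{L}$, such that the semantic information can be organized with reference to the structural information. Rather than attending to all the intermediate nodes, a leaf node $v^{L}_{i}$ only attends to its ancestors in the SQL tree. We call this strategy attention over ancestors (AOA), which is analogous to AOD.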
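The two attention strategies amount to masking the cross-attention scores with the ancestor relation of the tree. The sketch below builds both masks and applies one of them in a masked softmax; the helper names, the toy tree and the raw scores are hypothetical, and how the scores themselves are computed is deliberately left out.

```python
import math

def ancestors(parent, node):
    """Set of proper ancestors of `node`."""
    found = set()
    node = parent[node]
    while node is not None:
        found.add(node)
        node = parent[node]
    return found

def build_masks(parent, inner_nodes, leaf_nodes):
    """aoa[l][i]: leaf l may attend to inner node i; aod is its transpose."""
    aoa = [[inner in ancestors(parent, leaf) for inner in inner_nodes]
           for leaf in leaf_nodes]
    aod = [[aoa[l][i] for l in range(len(leaf_nodes))]
           for i in range(len(inner_nodes))]
    return aoa, aod

def masked_softmax(scores, mask):
    """Softmax restricted to positions where mask is True (others get 0)."""
    allowed = [s for s, keep in zip(scores, mask) if keep]
    if not allowed:
        return [0.0] * len(scores)
    top = max(allowed)
    exps = [math.exp(s - top) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]

parent = {"SQL": None, "WHERE": "SQL", ">": "WHERE", "COLUMN": ">",
          "month": "COLUMN", "salary": "COLUMN", "val0": ">"}
inner_nodes = ["SQL", "WHERE", ">", "COLUMN"]
leaf_nodes = ["month", "salary", "val0"]

aoa, aod = build_masks(parent, inner_nodes, leaf_nodes)
# COLUMN attends only to month and salary, whatever the raw score of val0 is.
column_weights = masked_softmax([1.0, 1.0, 5.0], aod[3])
```

Because the two masks are transposes of each other, a single ancestor table per SQL tree is enough to drive both cross-attention directions.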

Decoder
After obtaining the final node embeddings $X^{I}_{out}$ and $X^{L}_{out}$ of intermediate and leaf nodes, we apply an LSTM-based [43] sequential decoder with a copy mechanism [44] to generate the natural language sentence. Representations of the raw table and column words in leaf nodes are extracted for a direct copy before decoding. Placeholders for entities such as val0 are replaced with the corresponding nouns (called lexicalization) during postprocessing. Specifically, we distinguish intermediate nodes from leaf nodes to capture the semantic and structural information differently. Given the final node embeddings $X^{I}_{out}$ and $X^{L}_{out}$, the initial hidden state $h_0$ is computed from $\mathrm{MaxPooling}(X^{I}_{out})$ and $\mathrm{MaxPooling}(X^{L}_{out})$, where MaxPooling(·) is a function that transforms $X \in \mathbb{R}^{n \times d}$ into $x \in \mathbb{R}^{d \times 1}$. At each time step t, we get the context vectors $c^{I}_{t}$ and $c^{L}_{t}$, respectively,
$$c^{I}_{t} = \mathrm{Attention}(h_{t-1}, X^{I}_{out}), \qquad c^{L}_{t} = \mathrm{Attention}(h_{t-1}, X^{L}_{out}),$$
where Attention(·) is the same as the cross attention mentioned in Section 3.6.
Afterward, the concatenation of the context vectors and the previous hidden state $h_{t-1}$ is fed into the next step:
$$h_{t} = \mathrm{LSTM}([h_{t-1}; c^{I}_{t}; c^{L}_{t}]).$$
Considering that there are many low-frequency words, we incorporate a copy mechanism into the decoder. We use $P_{vocab}(y_t)$ and $P_{copy}(y_t)$ to denote the generation probability and copy probability of $y_t$, respectively. Let $p^{gen}_{t}$ denote the probability of generating a word at time t, and let $P_{out}(y_t)$ be the final output probability of $y_t$. Then,
$$P_{out}(y_t) = p^{gen}_{t}\, P_{vocab}(y_t) + (1 - p^{gen}_{t})\, P_{copy}(y_t),$$
where $P_{vocab}$, $P_{copy}$, and $p^{gen}_{t}$ are computed with the trainable parameters $W_{out}$, $W_{copy}$, and $W_{gen}$, respectively.
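The way a generation probability blends the vocabulary distribution with the copy distribution can be sketched in a few lines. This follows the usual pointer-generator mixture and is only an illustration: the function name, the toy vocabulary, the copy weights and the value of the generation probability are all hypothetical, and how each quantity is produced from the decoder state is not modeled here.

```python
def mix_output_distribution(p_vocab, copy_weights, source_tokens, p_gen):
    """Blend a vocabulary distribution with a copy distribution over source tokens."""
    p_out = {word: p_gen * prob for word, prob in p_vocab.items()}
    for token, weight in zip(source_tokens, copy_weights):
        # Copy mass is added to the token's entry, creating it if it is out of vocabulary.
        p_out[token] = p_out.get(token, 0.0) + (1.0 - p_gen) * weight
    return p_out

p_vocab = {"what": 0.5, "is": 0.3, "the": 0.2}   # generation distribution
source_tokens = ["salary", "month"]               # copyable schema tokens
copy_weights = [0.75, 0.25]                       # copy distribution over them

p_out = mix_output_distribution(p_vocab, copy_weights, source_tokens, p_gen=0.8)
```

Low-frequency schema words such as column names receive probability mass through the copy term even when they are absent from the decoder vocabulary.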

Dataset
WikiSQL We conduct experiments on the latest version of WikiSQL (the latest version contains 7019 fewer examples than the version used by [15]). SQLs in WikiSQL are short and only contain SELECT and WHERE clauses. We utilize the official train/dev/test splits, ensuring that each table appears in only a single split. This setup requires the model to generalize to unseen tables during inference.
Spider We also use Spider, a much more complex dataset. SQLs in Spider are much longer and the data size is much smaller compared to WikiSQL. Furthermore, more complex grammar such as JOIN, HAVING and nested SQLs is also involved in Spider. Thus, the task on Spider is much more difficult. Considering that the test split is not public, we only use the train and dev splits.
The statistics of the two datasets are illustrated in Table 3.

Experiment Setup
Hyperparameters All our code is implemented in PyTorch [45]. We utilize the Adam [46] optimizer to train our models with a learning rate of 0.0001. The batch size is 32 for WikiSQL and 16 for Spider. Other hyperparameters can be found in Table 4. Note that the numbers of RGT layers for leaf nodes and for intermediate nodes may not be identical. The number of update cycles (K) is equal to the smaller of the two layer numbers. For example, if the RGT of intermediate nodes has 6 layers and the RGT of leaf nodes has 3 layers, K is 3: in each cycle, two RGT layers encode the intermediate nodes and a single RGT layer encodes the leaf nodes. The motivation is that the structure of SQLs in Spider is complex and vital, so the intermediate nodes (the structural part of SQL) require more layers to encode. To ensure fairness, all our embeddings (nodes and relations) are initialized randomly (the same as for all baselines). Token embeddings (leaf nodes) could also be initialized with pre-trained vectors (e.g., GloVe [47] and BERT [48]) to further boost the performance.
Metric We use BLEU-4 [49] and NIST [50] as automatic metrics. Each SQL has a single reference in WikiSQL. In Spider, most SQLs have two references because many SQLs correspond to two different natural language expressions. However, there are two threats to the validity of these metrics: (1) the results may fluctuate considerably, and (2) BLEU-4 cannot fully evaluate the quality of the generated text. To alleviate the fluctuation of results, we run all our experiments 5 times with different random seeds. All results are obtained with the mteval-v14.pl script (https://github.com/moses-smt/mosesdecoder/blob/master/scripts/generic/mteval-v14.pl, accessed on 9 November 2021). Furthermore, we conduct a human evaluation on Spider to compare our model with the strongest baseline.
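The layer allocation described in the hyperparameter paragraph above is simple arithmetic and can be sketched as follows. The function name is hypothetical, and the sketch assumes that both layer counts are divisible by the number of cycles, which holds for the 6-layer and 3-layer example in the text but is not stated as a general requirement.

```python
def layers_per_cycle(num_inner_layers, num_leaf_layers):
    """Return (cycles, inner layers per cycle, leaf layers per cycle)."""
    cycles = min(num_inner_layers, num_leaf_layers)
    if num_inner_layers % cycles or num_leaf_layers % cycles:
        # Assumption of this sketch: layers split evenly across cycles.
        raise ValueError("layer counts must be divisible by the number of cycles")
    return cycles, num_inner_layers // cycles, num_leaf_layers // cycles

# Example from the text: 6 intermediate-node layers and 3 leaf-node layers.
cycles, inner_per_cycle, leaf_per_cycle = layers_per_cycle(6, 3)
```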
Data preprocessing For WikiSQL, we omit the FROM clause since all SQLs are only related to a single table. For Spider, we replace the table alias with its original name and remove the AS grammar. Additionally, the questions are delexicalized as mentioned before.

Baselines
For all baselines, the same attention-based [51] LSTM decoder with a copy mechanism is utilized, where only the schema-dependent items (table and column tokens) will be copied.
BiLSTM The encoder is a BiLSTM encoder with SQL sequences as input. We report both results with and without a copy mechanism for this baseline.
TreeLSTM The encoder is a Child-Sum TreeLSTM encoder [33] with our SQL Tree as input.
Transformer We investigate the effect of position embedding on the transformer. Specifically, we consider transformer encoder without position embedding, with absolute position embedding [41] and relative position embedding [37].
GCN/GAT Regarding the SQL tree as a graph, we can employ graph neural networks (GNN), such as the graph convolutional network (GCN) and the graph attention network (GAT). Additionally, we rerun the code of Xu et al. [15] (https://github.com/IBM/SQL-to-Text, accessed on 10 November 2020).
Table 5 shows the main results, including Seq2Seq baselines, Graph2Seq baselines and our model. Our relation-aware graph transformer (RGT) outperforms all baseline models on both WikiSQL and Spider in both BLEU and NIST. Specifically, RGT outperforms the strongest baseline, the transformer with relative position, by 1.17 BLEU and 0.23 NIST on Spider and by 0.42 BLEU and 0.1 NIST on WikiSQL, indicating the effectiveness of our model. We observe that GCN does not perform well compared to the other baselines: it only considers the structure of the graph without any special relations between nodes. We also notice that the transformer with relative position works well even though it only considers the relative position relation. This finding encourages us to consider more relations beyond the structure.

Ablation Study
To investigate the influence of relations and cross attention, we conduct two ablation studies, respectively. All our ablation studies are conducted on Spider.
Relation ablation In Table 6, pruning any relation leads to lower performance, indicating that all relations introduced to RGT are useful. Specifically, relations among leaf nodes seem more important, supporting the motivation of strengthening relations among semantic SQL tokens (column, table, and so on). We explain the effects of the four relations as follows:
• Structural relations: Both DBS (DataBase Schema) and DRD (Directional Relative Depth) strengthen the structural representation, but they work differently. DBS captures relations in the database schema, such as relations between two tables, between a table and a column, and so on. DRD captures the hierarchical structure of the SQL. For example (see Figure 2), both the DESC node and the COLUMN node (the two rightmost abstract nodes) are descendants of the OrderBy node. To express the hierarchy, we incorporate direction into DRD.
• Semantic relations: Both LCA (Lowest Common Ancestor) and RPR (Relative Position Relation) enhance the semantic representation. For instance (see Figure 2), with RPR the model can recognize that month and salary are close and may belong to the same column or table. With LCA, the model can further confirm that they belong to the same column.

Cross attention ablation
To investigate how the cross attention mechanism affects the performance, we apply different combinations of attention strategies in cross attention, namely attention over descendants (AOD), attention over ancestors (AOA), attention over full nodes (AOF) and no attention (None). Table 7 shows the experimental results. AOD + AOA works best, consistent with our expectations. We view cross attention as a balancing problem. AOF can capture all kinds of relations but may introduce more noise (information from less related nodes), while None loses some vital information. For example, AOD + None performs better than AOD + AOF, which means that in this case AOF introduces more noise. Besides, AOD + None outperforms None + None, indicating that ignoring all relations leads to poorer performance. In this task, we choose AOD + AOA as our attention strategy, which can capture relations among different types of nodes without introducing too much noise.

Human Evaluation
We randomly select 100 samples (∼20%) from the dev set of Spider to conduct the human evaluation. For the SQL-to-text task, we should evaluate both the correctness and the fluency of the generation. To assess correctness, we recruited two CS students familiar with SQL to score the generations. They were first asked to select the more correct one of two generations. Furthermore, we asked them to count the number of correct generations for aggregators (MIN, MAX and so on), columns (columns in the SQL) and operators (+, -, DESC, IN and so on). Then, we calculated precision, recall, and F1 for each category. Additionally, we asked three native English speakers to evaluate the fluency and grammatical correctness. Our model is evaluated against the strongest baseline (transformer with relative position). The results are shown in Table 8. The lower part of Table 8 shows the percentage of cases in which a generation was chosen as more correct (line correctness) or more fluent (line fluency), and the percentage of cases in which a generation was chosen as both correct and fluent (line both). From the evaluation results, we conclude that our model generates more correct sentences with comparable fluency.
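The per-category metrics can be computed from three counts. The sketch below shows the standard definitions; the function name and the counts in the example are hypothetical and are not taken from Table 8.

```python
def precision_recall_f1(num_correct, num_generated, num_reference):
    """Standard precision, recall and F1 from raw counts."""
    precision = num_correct / num_generated if num_generated else 0.0
    recall = num_correct / num_reference if num_reference else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for one category (e.g., columns):
# 8 correct mentions among 10 generated, with 16 present in the SQL.
precision, recall, f1 = precision_recall_f1(8, 10, 16)
```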

Case Study
We show two examples generated by our model RGT and by the transformer with relative position (Figure 5). For the first example, both models realize the type correctly, but the baseline fails to generate the pet. Our model strengthens relations among tokens in one column, so the pet in the SQL is a strong signal. For the second example, the baseline generates a more fluent sentence. There is a grammatical error in the generation of our model (teacher is not grammatically correct), but teacher matches the token teacher in the SQL. This phenomenon indicates that our model is more concerned with the relations among nodes. These two cases are consistent with the conclusion of our human evaluation.

Conclusions
In this paper, we propose a relation-aware graph transformer (RGT) for complex SQL-to-Text generation. When learning the representation of each token in a SQL, multiple relations are considered in our model. Extensive experiments on two datasets WikiSQL and Spider show that our proposed model outperforms strong baselines including Seq2Seq models and Graph2Seq models.
There are two lines of work we can pursue in the future. First, we can apply our SQL-to-text model to augment text and SQL pairs in order to boost the performance of text-to-SQL models. In detail, we can design SQL templates with handcrafted rules. Based on these templates, a large number of SQL queries can be generated automatically, and our SQL-to-text model then transforms them into texts. These augmented text and SQL pairs can help train the text-to-SQL model. Second, we can extend our method to a more general task, e.g., code-to-text. Our model is suitable for encoding the abstract syntax tree of a programming language.