CoSQL: A Conversational Text-to-SQL Challenge Towards Cross-Domain Natural Language Interfaces to Databases.
We present CoSQL, a corpus for building cross-domain, general-purpose database (DB) querying dialogue systems.
It consists of 30k+ turns plus 10k+ annotated SQL queries, obtained from a Wizard-of-Oz (WOZ) collection of 3k dialogues querying 200 complex DBs spanning 138 domains.
Each dialogue simulates a real-world DB query scenario, with a crowd worker acting as a user who explores the DB and a SQL expert who retrieves answers with SQL, clarifies ambiguous questions, or informs the user when a question cannot be answered.
When user questions are answerable by SQL, the expert describes the SQL and execution results to the user, hence maintaining a natural interaction flow.
CoSQL introduces new challenges compared to existing task-oriented dialogue datasets: (1) the dialogue states are grounded in SQL, a domain-independent executable representation, instead of domain-specific slot-value pairs, and (2) because testing is done on unseen databases, success requires generalizing to new domains.
CoSQL includes three tasks: SQL-grounded dialogue state tracking, response generation from query results, and user dialogue act prediction.
We evaluate a set of strong baselines for each task and show that CoSQL presents significant challenges for future research.
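To make "dialogue states grounded in SQL" concrete, the sketch below represents the state after each user turn as an executable SQL query and runs it against a database. The `singer` table, its rows, and the utterances are hypothetical toy data, not taken from CoSQL; the point is only that the state is a domain-independent query rather than a set of domain-specific slots.

```python
import sqlite3

# Hypothetical toy database; CoSQL itself uses 200 DBs with their own schemas.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name TEXT, country TEXT, age INTEGER);
INSERT INTO singer VALUES (1, 'Joe', 'France', 52), (2, 'Ana', 'Spain', 29), (3, 'Lee', 'France', 41);
""")

# Each turn's dialogue state is an executable SQL query, refined as the
# conversation goes on, instead of a dictionary of slot-value pairs.
dialogue_states = [
    ("How many singers are there?", "SELECT COUNT(*) FROM singer"),
    ("Which of them are from France?", "SELECT name FROM singer WHERE country = 'France'"),
    ("Only those older than 45?", "SELECT name FROM singer WHERE country = 'France' AND age > 45"),
]

results = []
for utterance, state_sql in dialogue_states:
    rows = conn.execute(state_sql).fetchall()
    results.append(rows)
    print(f"{utterance!r} -> {state_sql} -> {rows}")
```

Because the state is executable, the same tracker output can be checked against any database schema, which is what makes unseen test databases a meaningful evaluation.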
Editing-based SQL Query Generation for Cross-Domain Context-Dependent Questions.
We focus on the cross-domain context-dependent text-to-SQL generation task.
Based on the observation that adjacent natural language questions are often linguistically dependent and their corresponding SQL queries tend to overlap, we utilize the interaction history by editing the previously predicted query to improve generation quality.
Our editing mechanism views SQL as sequences and reuses generation results at the token level in a simple manner.
It can flexibly change individual tokens and is robust to error propagation.
Furthermore, to deal with complex table structures in different domains, we employ an utterance-table encoder and a table-aware decoder to incorporate the context of the user utterance and the table schema.
We evaluate our approach on the SParC dataset and demonstrate the benefit of editing compared with state-of-the-art baselines that generate SQL from scratch.
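One way to picture token-level reuse of the previous query is a decoder step that mixes two distributions: generating a token from the vocabulary, and copying a token from the previously predicted query. The sketch below assumes a single scalar gate `p_copy` and precomputed scores; the names `next_token_distribution`, `vocab_scores`, and `copy_scores` are hypothetical, and the neural encoders and gating described above are not reproduced here.

```python
import math


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def next_token_distribution(vocab, vocab_scores, prev_query, copy_scores, p_copy):
    """Mix a generate-from-vocabulary distribution with a copy distribution
    over the tokens of the previous predicted query (assumed scalar gate)."""
    gen = softmax(vocab_scores)
    copy = softmax(copy_scores)  # attention over previous-query token positions
    dist = {tok: (1 - p_copy) * p for tok, p in zip(vocab, gen)}
    for tok, p in zip(prev_query, copy):
        # Copy mass is added on top of any generation mass for the same token.
        dist[tok] = dist.get(tok, 0.0) + p_copy * p
    return dist


# Usage: tokens of the previous query get extra probability through copying.
prev = "SELECT name FROM singer WHERE age > 45".split()
vocab = ["SELECT", "COUNT(*)", "FROM", "singer", "WHERE", "country", "="]
dist = next_token_distribution(vocab, [0.1] * len(vocab), prev, [0.0] * len(prev), p_copy=0.7)
best = max(dist, key=dist.get)
```

Treating SQL as a flat token sequence keeps the copy step simple: an edit that changes one condition can reuse every other token of the earlier query unchanged.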
SParC: Cross-Domain Semantic Parsing in Context.
We present SParC, a dataset for cross-domain Semantic Parsing in Context.
It consists of 4,298 coherent question sequences (12k+ individual questions annotated with SQL queries), obtained from controlled user interactions with 200 complex databases over 138 domains.
We provide an in-depth analysis of SParC and show that it introduces new challenges compared to existing datasets.
SParC (1) demonstrates complex contextual dependencies,
(2) has greater semantic diversity, and
(3) requires generalization to new domains due to its cross-domain nature and the unseen databases at test time.
We experiment with two state-of-the-art text-to-SQL models adapted to the context-dependent, cross-domain setup.
The best model obtains an exact match accuracy of 20.2% over all questions and less than 10% over all interaction sequences, indicating that the cross-domain setting and the contextual phenomena of the dataset present significant challenges for future research.
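The two accuracy figures above differ in granularity: question-level accuracy counts each question independently, while interaction-level accuracy credits a sequence only if every question in it is correct, which explains why the second number is much lower. The sketch below computes both. The `normalize` function is a deliberately simplified stand-in (lowercasing and whitespace collapsing) and is an assumption; it is not the component-wise exact-match comparison used for the reported results.

```python
def normalize(sql):
    # Simplified stand-in for exact-set-match: lowercase, collapse whitespace.
    return " ".join(sql.lower().split())


def match_accuracies(interactions):
    """interactions: list of question sequences, each a list of
    (predicted_sql, gold_sql) pairs. Returns (question_acc, interaction_acc)."""
    n_questions = n_correct_questions = n_correct_interactions = 0
    for turns in interactions:
        correct = [normalize(p) == normalize(g) for p, g in turns]
        n_questions += len(correct)
        n_correct_questions += sum(correct)
        # An interaction counts only if all of its questions are correct.
        n_correct_interactions += all(correct)
    return n_correct_questions / n_questions, n_correct_interactions / len(interactions)
```

For example, one fully correct two-question sequence plus a three-question sequence with a single correct answer gives 3/5 question accuracy but only 1/2 interaction accuracy.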
Multi-Hop Knowledge Graph Reasoning with Reward Shaping.
Multi-hop reasoning is an effective approach for query answering (QA) over incomplete knowledge graphs (KGs). The problem can be formulated in a reinforcement learning (RL) setup, where a policy-based agent sequentially extends its inference path until it reaches a target. However, in an incomplete KG environment, the agent receives low-quality rewards corrupted by false negatives in the training data, which harms generalization at test time. Furthermore, since no gold action sequence is used for training, the agent can be misled by spurious search trajectories that incidentally lead to the correct answer. We propose two modeling advances to address both issues: (1) we reduce the impact of false negative supervision by adopting a pretrained one-hop embedding model to estimate the reward of unobserved facts; (2) we counter the sensitivity to spurious paths of on-policy RL by forcing the agent to explore a diverse set of paths using randomly generated edge masks. Our approach significantly improves over existing path-based KGQA models on several benchmark datasets and is comparable to or better than embedding-based models.
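The two advances can be sketched in a few lines, under assumptions. For reward shaping, we assume the shaped reward has the form R = R_b + (1 - R_b) * f(e_s, r_q, e_t), where R_b is the binary reward from observed facts and f is a pretrained one-hop score squashed into [0, 1]; the DistMult-style scorer below is a hypothetical choice. For the edge masks, `action_dropout` is a simplified variant that gives masked edges a tiny probability `eps` before renormalizing. Neither function is the authors' code.

```python
import math
import random


def distmult_score(e_s, r_q, e_t):
    # Hypothetical one-hop embedding scorer (DistMult-style trilinear product).
    return sum(a * b * c for a, b, c in zip(e_s, r_q, e_t))


def shaped_reward(reached_answer_observed, e_s, r_q, e_t):
    """Keep reward 1 for observed answers; otherwise fall back to a soft
    estimate from the embedding model instead of a hard 0."""
    r_b = 1.0 if reached_answer_observed else 0.0
    f = 1.0 / (1.0 + math.exp(-distmult_score(e_s, r_q, e_t)))  # sigmoid
    return r_b + (1.0 - r_b) * f


def action_dropout(probs, rate, rng, eps=1e-8):
    """Randomly mask outgoing edges before sampling the next action so the
    agent explores other paths; masked edges keep probability eps (assumed)."""
    masked = [p if rng.random() >= rate else eps for p in probs]
    total = sum(masked)
    return [p / total for p in masked]
```

The masked distribution would be used only to sample actions during training; the dropout rate trades exploration against following the current policy.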
NL2Bash: A Corpus and Semantic Parser for Natural Language Interface to the Linux Operating System.
We present new data and semantic parsing methods for the problem of mapping English sentences to Bash commands (NL2Bash). Our long-term goal is to enable any user to easily solve otherwise repetitive tasks (such as file manipulation, search, and application-specific scripting) by simply stating their intents in English. We take a first step in this domain by providing a large new dataset of challenging but commonly used commands paired with their English descriptions, along with baseline methods to establish performance levels on this task.
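To illustrate the shape of the task, the sketch below pairs English descriptions with Bash commands and translates a new description by retrieving the command whose description has the highest token overlap (Jaccard similarity). The three training pairs are hypothetical examples, and this nearest-neighbor retrieval is only a simple illustrative baseline, not one of the methods evaluated in the paper.

```python
def tokens(text):
    # Lowercase and split on whitespace, treating '.' as a separator.
    return set(text.lower().replace(".", " ").split())


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


# Hypothetical training pairs: (English description, Bash command).
TRAIN = [
    ("find all txt files in the current directory", "find . -name '*.txt'"),
    ("count the lines in file.log", "wc -l file.log"),
    ("list files sorted by size", "ls -S"),
]


def nearest_neighbor_translate(description):
    """Return the command of the training pair whose description overlaps most."""
    query = tokens(description)
    _, best_cmd = max(TRAIN, key=lambda pair: jaccard(query, tokens(pair[0])))
    return best_cmd
```

Retrieval can only reproduce commands seen in training, which is why arguments such as file names and patterns make the task harder than lookup.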
Compositional Learning of Embeddings for Relation Paths in Knowledge Bases and Text.
Modeling relation paths has offered significant gains in embedding models for knowledge base (KB) completion. However, enumerating paths between two entities is very expensive, and existing approaches typically resort to approximation with a sampled subset. This problem is particularly acute when text is jointly modeled with KB relations and used to provide direct evidence for facts mentioned in it. In this paper, we propose the first exact dynamic programming algorithm which enables efficient incorporation of all relation paths of bounded length, while modeling both relation types and intermediate nodes in the compositional path representations. We conduct a theoretical analysis of the efficiency gain from the approach. Experiments on two datasets show that it addresses representational limitations in prior approaches and improves accuracy in KB completion.
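The key idea behind exact path aggregation is that, if path representations compose multiplicatively, the sum over all paths can be factored and accumulated hop by hop instead of enumerating paths. The sketch below assumes relation vectors compose by elementwise product (a diagonal composition) and sums over all walks of length 1 to `max_len`, which may revisit nodes. It omits the modeling of intermediate nodes described above, and the function name and edge format are hypothetical.

```python
def sum_path_representations(edges, source, target, max_len, dim):
    """Sum, over every relation path of length 1..max_len from source to
    target, the elementwise product of its relation vectors.

    edges: list of (head, relation_vector, tail) triples.
    """
    # prefix[v]: sum of composed representations of all length-k paths source -> v
    prefix = {source: [1.0] * dim}
    total = [0.0] * dim
    for _ in range(max_len):
        nxt = {}
        for head, rel, tail in edges:
            if head in prefix:
                acc = nxt.setdefault(tail, [0.0] * dim)
                for i in range(dim):
                    # Extending every prefix ending at head by this edge at once.
                    acc[i] += prefix[head][i] * rel[i]
        prefix = nxt
        if target in prefix:
            total = [t + p for t, p in zip(total, prefix[target])]
    return total
```

For edges A -r1-> B, B -r2-> C, A -r3-> C with r1 = [2, 1], r2 = [3, 1], r3 = [1, 5], the two paths from A to C contribute [1, 5] and [6, 1], so the total for `max_len=2` is [7, 6]. Each hop costs time proportional to the number of edges, rather than to the number of paths, which can grow exponentially with length.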