Victoria Lin (林曦)

Ph.D. Student
Paul G. Allen School of Computer Science & Engineering, Office 486
University of Washington, Seattle



About Me

I am a PhD student in Computer Science at the University of Washington, advised by Luke S. Zettlemoyer. My research area is semantic parsing and natural language understanding. I work on problems regarding the representation and extraction of structured knowledge from natural language.

Recently I have been collaborating with Michael D. Ernst from UW PLSE on program synthesis from natural language, using data-driven, deep-learning approaches. Our goal is to add a natural-language abstraction layer over the ocean of programming languages and APIs, giving programmers more convenient access to computer systems than traditional coding paradigms allow.

I was a PhD student of the late Ben Taskar.

Research Highlights

The Tellina project enables natural-language-based assistance for programming in scripting languages. Our initial prototype translates an imperative natural language sentence into an executable one-liner shell command, using neural encoder-decoder models augmented with slot filling and other enhancements. We are developing hybrid translation models and expert annotation pipelines that scale, in order to address irregularities in the target language and the scarcity of parallel training data.



Program Synthesis from Natural Language Using Recurrent Neural Networks.
Xi Victoria Lin, Chenglong Wang, Deric Pang, Kevin Vu, Luke Zettlemoyer, Michael D. Ernst.
University of Washington Department of Computer Science and Engineering technical report UW-CSE-17-03-01.
Pdf Abstract Bibtex Tellina Tool
Even if a competent programmer knows what she wants to do and can describe it in English, it can still be difficult to write code to achieve the goal. Existing resources, such as question-and-answer websites, tabulate specific operations that someone has wanted to perform in the past, but they are not effective in generalizing to new tasks, to compound tasks that require combining previous questions, or sometimes even to variations of listed tasks.

Our goal is to make programming easier and more productive by letting programmers use their own words and concepts to express the intended operation, rather than forcing them to accommodate the machine by memorizing its grammar. We have built a system that lets a programmer describe a desired operation in natural language, then automatically translates it to a programming language for review and approval by the programmer. Our system, Tellina, does the translation using recurrent neural networks (RNNs), a state-of-the-art natural language processing technique that we augmented with slot (argument) filling and other enhancements.
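The slot-filling idea can be illustrated with a toy pre/post-processing step. This is an illustrative sketch, not Tellina's actual code: the regular expression, function names, and the hard-coded command template (which in Tellina would be produced by the RNN encoder-decoder) are all invented for the example. Concrete arguments in the English sentence are abstracted into placeholder tokens before translation, and the original values are substituted back into the predicted template afterwards.

```python
import re

def abstract_slots(nl_sentence):
    """Replace concrete arguments (quoted strings, glob patterns, numbers)
    with placeholder tokens so the translation model only has to learn
    command structure; the returned mapping restores the values later."""
    slots = {}
    def repl(match):
        token = f"_ARG{len(slots)}"
        slots[token] = match.group(0)
        return token
    # Quoted strings, tokens containing '*', and bare numbers count as arguments.
    pattern = r'"[^"]*"|\S*\*\S*|\b\d+\b'
    return re.sub(pattern, repl, nl_sentence), slots

def fill_slots(command_template, slots):
    """Substitute the original constants back into the predicted template."""
    for token, value in slots.items():
        command_template = command_template.replace(token, value)
    return command_template

sent, slots = abstract_slots('find all "*.txt" files larger than 10 MB')
# sent == 'find all _ARG0 files larger than _ARG1 MB'
# A trained model would map the abstracted sentence to a command template;
# here the template is hard-coded for illustration.
template = 'find . -name _ARG0 -size +_ARG1M'
restored = fill_slots(template, slots)
# restored == 'find . -name "*.txt" -size +10M'
```

Keeping arguments out of the vocabulary this way also sidesteps the open-ended space of file names and numbers that no fixed-vocabulary decoder could cover.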

We evaluated Tellina in the context of shell scripting. We trained Tellina's RNNs on textual descriptions of file system operations and bash one-liners scraped from the web. Although recovering completely correct commands is challenging, Tellina achieves top-3 accuracy of 80% for producing the correct command structure. In a controlled study, programmers with access to Tellina outperformed those without it to a statistically significant degree, even when Tellina's predictions were not completely correct.
@techreport{LinWPVZE2017:TR,
  author      = {Xi Victoria Lin and Chenglong Wang and Deric Pang and Kevin Vu and Luke Zettlemoyer and Michael D. Ernst},
  title       = {Program synthesis from natural language using recurrent neural networks},
  institution = {University of Washington Department of Computer Science and Engineering},
  number      = {UW-CSE-17-03-01},
  address     = {Seattle, WA, USA},
  month       = mar,
  year        = {2017}
}

Conference Proceedings

Compositional Learning of Embeddings for Relation Paths in Knowledge Bases and Text.
Kristina Toutanova, Xi Victoria Lin, Scott Wen-tau Yih, Hoifung Poon and Chris Quirk.
In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL), August 7-12, 2016, Berlin, Germany.
Pdf Abstract Bibtex
Modeling relation paths has offered significant gains in embedding models for knowledge base (KB) completion. However, enumerating paths between two entities is very expensive, and existing approaches typically resort to approximation with a sampled subset. This problem is particularly acute when text is jointly modeled with KB relations and used to provide direct evidence for facts mentioned in it. In this paper, we propose the first exact dynamic programming algorithm which enables efficient incorporation of all relation paths of bounded length, while modeling both relation types and intermediate nodes in the compositional path representations. We conduct a theoretical analysis of the efficiency gain from the approach. Experiments on two datasets show that it addresses representational limitations in prior approaches and improves accuracy in KB completion.
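The efficiency gain can be seen in a scalar special case (a hedged sketch with made-up edge weights; the paper's full model composes vector representations of relation types and intermediate nodes rather than multiplying scalars): the sum over all paths of bounded length of the products of edge weights obeys the recurrence P_l = P_{l-1}A, so it can be accumulated with a few matrix products instead of enumerating the exponentially many paths.

```python
import numpy as np

def path_sum_dp(A, max_len):
    """Sum of edge-weight products over all directed paths of length
    1..max_len between every node pair, via the recurrence
    P_l = P_{l-1} @ A.  Cost is O(max_len * n^3) regardless of how many
    paths exist, whereas explicit enumeration is exponential in max_len."""
    total = np.zeros_like(A)
    P = np.eye(A.shape[0])
    for _ in range(max_len):
        P = P @ A          # P now sums products over paths of the current length
        total += P
    return total

# Toy weighted adjacency matrix over 3 entities.
A = np.array([[0.0, 0.5, 0.0],
              [0.0, 0.0, 0.4],
              [0.1, 0.0, 0.0]])
S = path_sum_dp(A, max_len=3)
# S[0, 2] aggregates the single length-2 path 0 -> 1 -> 2: 0.5 * 0.4 = 0.2
```

The same recurrence structure is what lets the exact algorithm incorporate all bounded-length paths without sampling a subset.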
@inproceedings{ToutanovaLYPQ2016:ACL,
  author    = {Kristina Toutanova and Xi Victoria Lin and Scott Wen-tau Yih and Hoifung Poon and Chris Quirk},
  title     = {Compositional Learning of Embeddings for Relation Paths in Knowledge Bases and Text},
  booktitle = {Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL)},
  year      = {2016},
  month     = aug,
  address   = {Berlin, Germany}
}

Peer-reviewed Workshop Papers

Multi-label Learning with Posterior Regularization.
Xi Victoria Lin, Sameer Singh, Luheng He, Ben Taskar, and Luke Zettlemoyer.
NIPS Workshop on Modern Machine Learning and NLP, December 13, 2014, Montreal, Canada.
Pdf Abstract Bibtex
In many multi-label learning problems, especially as the number of labels grows, it is challenging to gather completely annotated data. This work presents a new approach for multi-label learning from incomplete annotations. The main assumption is that, because of label correlation, the true label matrix as well as the soft predictions of classifiers should be approximately low-rank. We introduce a posterior regularization technique which enforces soft constraints on the classifiers, regularizing them to prefer sparse and low-rank predictions. Avoiding strict low-rank constraints results in classifiers which better fit the real data. The model can be trained efficiently using EM and stochastic gradient descent. Experiments in both the image and text domains demonstrate the contributions of each modeling assumption and show that the proposed approach achieves state-of-the-art performance on a number of challenging datasets.
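The preference for low-rank predictions can be sketched as a proximal step that soft-thresholds the singular values of the soft-prediction matrix. This is an illustrative stand-in for the paper's posterior regularizer, not its actual EM training code; the function name and threshold are invented for the example.

```python
import numpy as np

def lowrank_shrink(P, tau):
    """Soft-threshold the singular values of a soft-prediction matrix P,
    nudging it toward low rank without imposing a hard rank constraint."""
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

# Toy soft-prediction matrix: rows = instances, columns = labels.
rng = np.random.default_rng(0)
P = rng.random((6, 4))
Q = lowrank_shrink(P, tau=0.5)  # Q has a smaller nuclear norm than P
```

Because the shrinkage is soft rather than a hard rank cut-off, small but genuine label correlations survive the step, which is the motivation for avoiding strict low-rank constraints.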
@inproceedings{LinSHTZ2014:NIPSWS,
  author    = {Xi Victoria Lin and Sameer Singh and Luheng He and Ben Taskar and Luke Zettlemoyer},
  title     = {Multi-label Learning with Posterior Regularization},
  booktitle = {NIPS Workshop on Modern Machine Learning and Natural Language Processing},
  year      = {2014},
  month     = dec,
  address   = {Montreal, Quebec, Canada}
}


Miscellaneous

  • The Taskar Center for Accessible Technology (TCAT) was launched by Anat Caspi in November 2014. I am excited about its mission. Anat's unique perspective leads to innovative technologies and products that improve the quality of life for many.
  • I'm fascinated by puzzles of all kinds. At some point I tried designing a few myself: Sea Virus and Chocolate Crush.
  • I occasionally took Latin courses.