
Speech and Language:
Unsupervised Latent-Variable Models

Workshop at NIPS 2008

Overview

Natural language processing (NLP) models must deal with the complex structure and ambiguity present in human languages. Because labeled data is unavailable for many domains, languages, and tasks, supervised learning approaches only partially address these challenges. In contrast, unlabeled data is cheap and plentiful, making unsupervised approaches appealing. Moreover, in recent years, we have seen exciting progress in unsupervised learning for many NLP tasks, including unsupervised word segmentation, part-of-speech and grammar induction, discourse analysis, coreference resolution, document summarization, and topic induction.

The goal of this one-day workshop is to bring together researchers from the unsupervised machine learning community and the natural language processing community to facilitate cross-fertilization of techniques, models, and applications. The workshop focuses on the unsupervised learning of latent representations for natural language. In particular, we are interested in structured prediction models that can discover linguistically sophisticated patterns from raw data. To provide common ground for comparison and discussion, we provide a cleaned and preprocessed data set for those who would like to participate. This data set contains part-of-speech tags and parse trees in addition to raw sentences. An exciting direction in unsupervised NLP is the use of parallel text in multiple languages to provide additional structure for unsupervised learning. To that end, we will also provide a bilingual corpus with word alignments, and we encourage participants to push the state of the art in unsupervised NLP.

The workshop will be structured to emphasize discussion and the exchange of ideas, avoiding the mini-conference format. A centerpiece of the workshop will be invited talks by Andrew McCallum (UMass), Regina Barzilay (MIT), and Sharon Goldwater (Edinburgh). Andrew McCallum will talk about recent work on unsupervised learning with side information, Regina Barzilay will present models for learning from multilingual corpora, and Sharon Goldwater will talk about Bayesian word segmentation. The invited talks are intended to provide an overview of recent success stories in unsupervised learning for natural language processing and to spark discussion. To make the workshop more interactive, all contributed papers will be presented during two poster sessions between the invited talks.

Organizers

Slav Petrov
Aria Haghighi
Percy Liang
Dan Klein

Site designed by John DeNero