Word alignment
Word alignment, a facet of machine translation, is the task of identifying which words correspond to one another in the sentence pairs of a sentence-aligned parallel corpus. We have developed state-of-the-art supervised techniques, as well as unsupervised techniques that are competitive even with supervised methods. Recently, we have worked to correct pathological errors in alignments for syntactic translation.
We recently released the BerkeleyAligner, an unsupervised word aligner, as an open source project.
Unsupervised Alignment
The core innovation we have explored is a training method we call cross-EM, which jointly trains two conditional alignment models (the classic HMM alignment models, one for each translation direction) and propagates information between them. Compared with the standard practice of intersecting the predictions of independently trained models, our joint training technique reduced alignment error rate (AER) by 32%, handily outperforming GIZA++. Software to train and decode these models appears below.
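As a rough illustration of joint training, the sketch below couples two directional models by multiplying their link posteriors in the E-step and using the product as the expected count for both. It is a simplification under stated assumptions, not the BerkeleyAligner code: it swaps the HMM models for IBM Model 1 so the posteriors are easy to compute, it keeps each model's own posterior for null links, and every name in it (`train_by_agreement`, `align`, the `NULL` token, the decoding threshold) is hypothetical.

```python
from collections import defaultdict

NULL = "<null>"  # hypothetical null token, not taken from the aligner


def posteriors(src, tgt, t):
    """Model 1 posteriors: post[j][i] = P(tgt[j] aligns to src_n[i]), src_n[0] = NULL."""
    src_n = [NULL] + src
    post = []
    for w in tgt:
        scores = [t[(s, w)] for s in src_n]
        z = sum(scores)
        post.append([x / z for x in scores])
    return post


def normalize(counts):
    """Turn expected counts into conditional probabilities t(w | s)."""
    totals = defaultdict(float)
    for (s, _), c in counts.items():
        totals[s] += c
    return {(s, w): c / totals[s] for (s, w), c in counts.items()}


def train_by_agreement(bitext, iterations=5):
    """Jointly train an e->f and an f->e model on (e_words, f_words) pairs."""
    t_ef = defaultdict(lambda: 1.0)  # uniform start for t(f | e)
    t_fe = defaultdict(lambda: 1.0)  # uniform start for t(e | f)
    for _ in range(iterations):
        c_ef, c_fe = defaultdict(float), defaultdict(float)
        for e, f in bitext:
            p1 = posteriors(e, f, t_ef)  # p1[j][i + 1]: f[j] generated by e[i]
            p2 = posteriors(f, e, t_fe)  # p2[i][j + 1]: e[i] generated by f[j]
            for i, ew in enumerate(e):
                for j, fw in enumerate(f):
                    # Agreement step: both models share the product of posteriors.
                    q = p1[j][i + 1] * p2[i][j + 1]
                    c_ef[(ew, fw)] += q
                    c_fe[(fw, ew)] += q
            # Assumption: null links keep each model's own posterior.
            for j, fw in enumerate(f):
                c_ef[(NULL, fw)] += p1[j][0]
            for i, ew in enumerate(e):
                c_fe[(NULL, ew)] += p2[i][0]
        t_ef, t_fe = normalize(c_ef), normalize(c_fe)
    return t_ef, t_fe


def align(e, f, t_ef, t_fe, threshold=0.25):
    """Keep link (i, j) when the product of the two posteriors exceeds a threshold.

    Only word pairs seen together in training are supported in this sketch.
    """
    p1 = posteriors(e, f, t_ef)
    p2 = posteriors(f, e, t_fe)
    return {(i, j) for i in range(len(e)) for j in range(len(f))
            if p1[j][i + 1] * p2[i][j + 1] > threshold}


toy_bitext = [
    (["the", "house"], ["das", "haus"]),
    (["the", "book"], ["das", "buch"]),
    (["a", "book"], ["ein", "buch"]),
]
t_ef, t_fe = train_by_agreement(toy_bitext)
print(sorted(align(["the", "book"], ["das", "buch"], t_ef, t_fe)))
```

Because a link only accumulates mass when both directions assign it probability, links that one model proposes and the other rejects are suppressed during training rather than filtered out afterwards by intersection.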
Syntax Sensitive Alignments
In syntax-based translation systems, the training corpus is typically parsed and aligned independently. The example below highlights the negative consequence: alignment errors can violate the constituent structure of the parse tree.

We have developed an unsupervised model and a decoding heuristic that together eliminate errors like these. Software based on this work also appears below.
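One way to make the notion of a violated constituent concrete is the check sketched below. It assumes a particular formalization that is not necessarily the one used in our model: a constituent is treated as violated when the foreign span it projects onto also contains a word aligned to an English word outside the constituent. The function name, the half-open span encoding, and the toy indices are all hypothetical.

```python
def violated_constituents(spans, links):
    """Return the constituents whose projected foreign span is not self-contained.

    spans: list of (start, end) half-open English word ranges, one per constituent.
    links: set of (english_index, foreign_index) alignment links.
    """
    violated = []
    for start, end in spans:
        inside = [f for e, f in links if start <= e < end]
        if not inside:
            continue  # an unaligned constituent cannot be violated
        lo, hi = min(inside), max(inside)
        # A link that lands inside the projected span but starts outside
        # the constituent breaks the constituent boundary.
        if any(lo <= f <= hi and not (start <= e < end) for e, f in links):
            violated.append((start, end))
    return violated


# Toy parse over four English words: two sibling constituents and their parent.
spans = [(0, 2), (2, 4), (0, 4)]
monotone = {(0, 0), (1, 1), (2, 2), (3, 3)}
crossing = {(0, 0), (1, 2), (2, 1), (3, 3)}  # one erroneous crossing pair

print(violated_constituents(spans, monotone))
print(violated_constituents(spans, crossing))
```

A check like this can flag the sentence pairs where independent parsing and alignment disagree, which are the cases that prevent clean extraction of syntactic translation rules.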
Supervised Alignment
We developed a large-margin approach to word alignment based on formulating a maximum weighted matching problem. The model allows us to condition on arbitrary features of the input sentence pair, including orthographic content and scores from other models. Our model also captures first-order effects across alignments.
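The matching formulation can be sketched as follows: each candidate link receives a score that is linear in its features, and the predicted alignment is the one-to-one matching with the highest total score. This is an illustration under assumptions rather than our implementation. The features and weights are made up, the weights are fixed by hand instead of learned with a large-margin objective, the search is exhaustive (practical only for short sentences, whereas polynomial-time matching algorithms exist), and the first-order effects across links are not modeled.

```python
from functools import lru_cache


def link_features(e, f, i, j, n, m):
    """Hypothetical features of linking English word i to foreign word j."""
    return {
        "bias": 1.0,
        "exact_match": 1.0 if e.lower() == f.lower() else 0.0,
        "shared_prefix": 1.0 if e[:3].lower() == f[:3].lower() else 0.0,
        "position_gap": abs(i / n - j / m),
    }


def score_matrix(e_words, f_words, weights):
    """scores[i][j] = dot product of the weights with the link features."""
    n, m = len(e_words), len(f_words)
    return [[sum(weights.get(k, 0.0) * v
                 for k, v in link_features(e, f, i, j, n, m).items())
             for j, f in enumerate(f_words)]
            for i, e in enumerate(e_words)]


def best_matching(scores):
    """Maximum weighted matching by exhaustive search with memoization.

    Each word takes part in at most one link; links with non-positive
    scores are left out, so words may stay unaligned.
    """
    n = len(scores)
    m = len(scores[0]) if scores else 0

    @lru_cache(maxsize=None)
    def solve(i, used):
        if i == n:
            return 0.0, ()
        best_val, best_links = solve(i + 1, used)  # leave word i unaligned
        for j in range(m):
            if not used & (1 << j) and scores[i][j] > 0:
                val, links = solve(i + 1, used | (1 << j))
                val += scores[i][j]
                if val > best_val:
                    best_val, best_links = val, ((i, j),) + links
        return best_val, best_links

    return solve(0, 0)


# Hand-set weights, for illustration only.
weights = {"bias": -0.5, "exact_match": 2.0, "shared_prefix": 1.0, "position_gap": -1.0}
scores = score_matrix(["Parliament", "voted"], ["le", "parlement", "vota"], weights)
print(best_matching(scores))
```

Because the score is a plain weighted sum, adding a new signal (for example, a posterior from an unsupervised aligner) only requires adding one more entry to the feature dictionary and one more weight.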
Software
- The BerkeleyAligner includes software for cross-EM training of unsupervised models, syntax-sensitive distortion models, and a suite of decoding heuristics.
- Cross-EM Aligner: an unsupervised word aligner written in Java, based on the Alignment by Agreement paper.
Publications
- Tailoring Word Alignments to Syntactic Machine Translation, John DeNero and Dan Klein. In Proceedings of ACL 2007. [pdf] [slides]
- Alignment by Agreement, Percy Liang, Ben Taskar, and Dan Klein. In Proceedings of NAACL 2006. [pdf] [slides] [bib]
- Word Alignment Via Quadratic Assignment, Simon Lacoste-Julien, Ben Taskar, Dan Klein, and Michael Jordan. In Proceedings of NAACL 2006. [pdf] [bib]
- A Discriminative Matching Approach to Word Alignment, Ben Taskar, Simon Lacoste-Julien, and Dan Klein. In Proceedings of EMNLP 2005. [pdf] [bib]

