The Berkeley Word Aligner

Initial Release -- July 10, 2007

Join the announcement mailing list (low traffic, we promise)

The BerkeleyAligner is a software package that combines the innovations of recent work in unsupervised word alignment at Berkeley. This package is meant both as an alternative to the ubiquitous GIZA++ and as a test bed for new alignment ideas.

We have released the source code and tools under the GPL in an online code repository so that the community can contribute to the project, submit bug reports and feature requests, and ask questions in a structured forum.

Software Package Highlights

- Joint training of IBM models, which reduces AER by 32% relative to GIZA++
- Syntactic distortion model
- Posterior decoding heuristics
- Evaluation code, including a search for posterior thresholds
- Tracks AER as the model is trained
- Faster and less memory-intensive than ever
- Easy integration with the Berkeley Parser (soon to come)
- No preprocessing; flexible input formats
- Documentation and helpful training scripts (more to come)
- Pure Java 1.5; runs on any platform
- Open source and ready for extension
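The highlights above mention tracking AER, posterior decoding, and a search for posterior thresholds. The following Java sketch illustrates those ideas under stated assumptions; it is not the package's own code. The class `AlignmentEval` and its methods are hypothetical names, not part of the BerkeleyAligner API. AER follows the standard definition with sure (S) and possible (P) gold links, assuming S is a subset of P. Decoding keeps every link whose posterior exceeds a threshold, and the threshold search is a simple grid over a single sentence pair (a corpus-level version would sum the counts across sentences before dividing).

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper (not part of the BerkeleyAligner API) illustrating
 * alignment error rate (AER) and posterior-threshold decoding.
 * Written without diamond operators so it also compiles on Java 1.5.
 */
public class AlignmentEval {

    /** Packs a link between source position i and target position j into one long key. */
    public static long link(int i, int j) {
        return ((long) i << 32) | (j & 0xffffffffL);
    }

    /**
     * AER = 1 - (|A intersect S| + |A intersect P|) / (|A| + |S|),
     * where A is the proposed alignment, S the sure gold links and P the
     * possible gold links (S is assumed to be a subset of P).
     */
    public static double aer(Set<Long> proposed, Set<Long> sure, Set<Long> possible) {
        int matchSure = 0;
        int matchPossible = 0;
        for (long a : proposed) {
            if (sure.contains(a)) matchSure++;
            if (possible.contains(a)) matchPossible++;
        }
        int denom = proposed.size() + sure.size();
        // Convention for an empty proposal against an empty gold set: no error.
        return denom == 0 ? 0.0 : 1.0 - (double) (matchSure + matchPossible) / denom;
    }

    /** Posterior decoding: keep every link whose posterior strictly exceeds the threshold. */
    public static Set<Long> decode(double[][] posteriors, double threshold) {
        Set<Long> links = new HashSet<Long>();
        for (int i = 0; i < posteriors.length; i++) {
            for (int j = 0; j < posteriors[i].length; j++) {
                if (posteriors[i][j] > threshold) links.add(link(i, j));
            }
        }
        return links;
    }

    /**
     * Grid search over thresholds k/steps for k = 1..steps-1; returns the
     * first threshold that reaches the lowest AER (0.5 if steps < 2).
     */
    public static double bestThreshold(double[][] posteriors, Set<Long> sure,
                                       Set<Long> possible, int steps) {
        double best = 0.5;
        double bestAer = Double.POSITIVE_INFINITY;
        for (int k = 1; k < steps; k++) {
            double t = (double) k / steps;
            double e = aer(decode(posteriors, t), sure, possible);
            if (e < bestAer) {
                bestAer = e;
                best = t;
            }
        }
        return best;
    }
}
```

For instance, with posteriors {{0.9, 0.2}, {0.4, 0.7}}, a sure link (0, 0) and an additional possible link (1, 1), decoding at 0.5 yields an AER of 0, decoding at 0.3 adds the spurious link (1, 0) and raises AER to 0.25, and `bestThreshold(..., 10)` returns 0.4. Raising the threshold trades recall for precision, which is why a tuned threshold tends to matter more than the default 0.5.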

Publications

- Tailoring Word Alignments to Syntactic Machine Translation. John DeNero and Dan Klein. In Proceedings of ACL 2007.
- Alignment by Agreement. Percy Liang, Ben Taskar, and Dan Klein. In Proceedings of NAACL 2006.

Resources and References

- The Berkeley Word Alignment Project page collects all word-alignment research at Berkeley.
- The Berkeley Parser integrates easily with the Berkeley Word Aligner's syntactic alignment model.
- GIZA++ is a widely used alternative word aligner, published in 2001 and updated in 2003.

Site designed by John DeNero