Berkeley Coreference Resolution System
The Berkeley Coreference Resolution System is a state-of-the-art English coreference system described in the following papers:
It takes as input text with annotations in the CoNLL format, then detects and resolves mentions in that text. The system is bundled with a preprocessor that can take raw text input, split it into sentences and tokens, and produce the necessary CoNLL annotation layers: POS tags, syntactic parses, and named entity chunks.
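As a rough illustration of how these annotation layers appear to a consumer, the sketch below pulls the word, POS tag, parse fragment, named entity chunk, and coreference field out of a single token line. It assumes the whitespace-separated column layout used by the CoNLL 2011/2012 shared task files (word in column 4, POS in column 5, parse bit in column 6, named entities in column 11, coreference in the last column). The class `ConllTokenSketch` and its methods are hypothetical names chosen for this example and are not part of the system's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not a class from the Berkeley system.
final class ConllTokenSketch {
    final String word;
    final String pos;
    final String parseBit;
    final String ner;
    final String coref;

    ConllTokenSketch(String word, String pos, String parseBit, String ner, String coref) {
        this.word = word;
        this.pos = pos;
        this.parseBit = parseBit;
        this.ner = ner;
        this.coref = coref;
    }

    // Parses one token line. Column positions are an assumption based on the
    // CoNLL 2011/2012 shared task layout, which has at least 12 columns.
    static ConllTokenSketch parseLine(String line) {
        String[] cols = line.trim().split("\\s+");
        if (cols.length < 12) {
            throw new IllegalArgumentException(
                "expected at least 12 columns, got " + cols.length);
        }
        // The coreference field is always the last column, because the number
        // of predicate-argument columns varies from sentence to sentence.
        return new ConllTokenSketch(
            cols[3], cols[4], cols[5], cols[10], cols[cols.length - 1]);
    }

    // Parses all token lines, skipping blank sentence separators and
    // "#begin document" / "#end document" comment lines.
    static List<ConllTokenSketch> parseLines(List<String> lines) {
        List<ConllTokenSketch> tokens = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            tokens.add(parseLine(trimmed));
        }
        return tokens;
    }
}
```

Reading the coreference field from the end of the line rather than from a fixed index keeps the sketch usable on sentences with any number of predicate-argument columns.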
Note that the Berkeley Entity Resolution System provides (mostly) a superset of this system's functionality and achieves better coreference results.
Download the system here (16MB tgz). The source code is mostly Scala, but the download includes a pre-built .jar file that runs on a standard JRE.
Download the models here (~300MB tgz) (compatible with 1.0 and 1.1). This package includes pre-trained models for both preprocessing (sentence splitting, parsing, and NER) and coreference (SURFACE and FINAL models from the paper), with different coreference models for the CoNLL data and for running on raw text.
Version 1.0: code
CoNLL 2012 Results
Results reported in the EMNLP 2013 paper are on the CoNLL 2011 test set using version 5 of the official CoNLL scorer. The table below lists results on the CoNLL 2012 development and test sets using v8.01 of the scoring script. These scores reflect version 1.1 of the system, which shuffles the input data, leading to improved performance and removing the nondeterminism that stemmed from the order in which files are returned by File.listFiles.
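The page does not say how version 1.1 performs the shuffle. One common way to get a shuffled yet reproducible order, shown here purely as a sketch, is to sort the files into a canonical order first and then shuffle with a fixed random seed. The class name `DeterministicFileOrder`, its methods, and the seed value are assumptions made for this example; they are not taken from the system's source.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration; not the system's actual implementation.
final class DeterministicFileOrder {
    // Assumed seed value; the source does not state which seed, if any, is used.
    static final long SEED = 0L;

    // Returns a shuffled copy whose order depends only on the set of items
    // and the seed, not on the order in which the items were passed in.
    static <T extends Comparable<T>> List<T> stableShuffle(List<T> items, long seed) {
        List<T> copy = new ArrayList<>(items);
        Collections.sort(copy);                        // canonical order first
        Collections.shuffle(copy, new Random(seed));   // then a seeded shuffle
        return copy;
    }

    // File.listFiles makes no guarantee about ordering and returns null when
    // the path is not a directory or an I/O error occurs.
    static List<File> listShuffled(File dir, long seed) {
        File[] files = dir.listFiles();
        if (files == null) {
            return new ArrayList<>();
        }
        return stableShuffle(Arrays.asList(files), seed);
    }
}
```

Sorting before shuffling is what removes the dependence on the file system: two runs that see the same files in different orders still end up with the same final ordering.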
The system should reproduce these to within small amounts of noise. If you have any questions or are curious about results not listed here, please email Greg Durrett.