The Berkeley Natural Language Processing Group


Berkeley Coreference Resolution System


The Berkeley Coreference Resolution System is a state-of-the-art English coreference system described in the following papers:

Easy Victories and Uphill Battles in Coreference Resolution [PDF], [BibTeX]
Greg Durrett and Dan Klein.
EMNLP 2013.

Decentralized Entity-Level Modeling for Coreference Resolution [PDF], [BibTeX]
Greg Durrett, David Hall, and Dan Klein.
ACL 2013.

It takes as input text with annotations in the CoNLL format, then detects and resolves mentions in that text. The system is bundled with a preprocessor that can take raw text input, split it by sentences and tokens, and produce the necessary CoNLL annotation layers: POS tags, syntactic parses, and named entity chunks.
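To make the input format concrete, here is a minimal sketch of reading CoNLL-style annotations in Python. It is not the system's own reader. It assumes the whitespace-separated column layout of the CoNLL 2011/2012 shared task files (word in the fourth column, POS tag in the fifth, coreference bracket notation in the last), which this page does not spell out. The sample document and the names `read_sentences` and `coref_mentions` are made up for illustration.

```python
from collections import defaultdict

# Invented two-sentence sample in an assumed CoNLL-2012-like layout:
# doc id, part, word index, word, POS, parse bit, lemma, frameset,
# word sense, speaker, named entity, coreference.
SAMPLE = """\
#begin document (demo/doc); part 000
demo/doc 0 0 Barack NNP (TOP(S(NP* - - - - (PERSON* (0
demo/doc 0 1 Obama NNP *) - - - - *) 0)
demo/doc 0 2 spoke VBD (VP*) - - - - * -
demo/doc 0 3 . . *)) - - - - * -

demo/doc 0 0 He PRP (TOP(S(NP*) - - - - * (0)
demo/doc 0 1 smiled VBD (VP*) - - - - * -
demo/doc 0 2 . . *)) - - - - * -
#end document
"""


def read_sentences(text):
    """Split CoNLL-style text into sentences, each a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):      # "#begin document" / "#end document"
            continue
        if not line:                  # a blank line ends the sentence
            if current:
                sentences.append(current)
                current = []
            continue
        cols = line.split()
        current.append({"word": cols[3], "pos": cols[4], "coref": cols[-1]})
    if current:
        sentences.append(current)
    return sentences


def coref_mentions(sentence):
    """Return (cluster_id, start, end) token spans from the coref column."""
    open_starts = defaultdict(list)   # cluster id -> stack of open start indices
    mentions = []
    for i, tok in enumerate(sentence):
        if tok["coref"] == "-":
            continue
        for part in tok["coref"].split("|"):
            cluster = int(part.strip("()"))
            if part.startswith("("):
                open_starts[cluster].append(i)
            if part.endswith(")"):
                mentions.append((cluster, open_starts[cluster].pop(), i))
    return mentions


if __name__ == "__main__":
    for sent in read_sentences(SAMPLE):
        print([t["word"] for t in sent], coref_mentions(sent))
```

In this sample, "Barack Obama" and "He" share cluster id 0, which is how the format marks two mentions as coreferent.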


The README contains more information. The system is licensed under the GPLv3.

Download the system here (16MB tgz). The source code is mostly Scala, but the download includes a pre-built runnable .jar file that can be run with a standard JRE.

Download the models here (~300MB tgz). This package includes pre-trained models for both preprocessing (sentence splitting, parsing, and NER) and coreference (SURFACE and FINAL models from the paper), with different coreference models for the CoNLL data and for running on raw text.

Old Versions

Version 0.9: code, models

CoNLL 2012 Results

Results reported in the EMNLP 2013 paper are on the CoNLL 2011 test set, scored with version 5 of the official CoNLL scorer. The table below lists results on the CoNLL 2012 development and test sets, scored with the latest reference implementation of the scoring script.

               Prec.   Rec.    F1      Prec.   Rec.    F1      Prec.   Rec.    F1      Avg. F1
Dev Surface    71.31   65.69   68.39   61.63   51.43   56.07   52.71   53.55   53.12   59.19
Dev Final      72.84   65.99   69.25   64.81   52.89   58.25   55.04   55.73   55.38   60.96
Test Surface   71.49   65.84   68.55   60.58   50.01   54.79   52.10   51.83   51.96   58.43
Test Final     72.85   65.87   69.18   63.55   52.47   57.48   54.31   54.36   54.34   60.33

The system should reproduce these to within small amounts of noise. If you have any questions or are curious about results not listed here, please email Greg Durrett.
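As a quick sanity check on the table above, the last number in each row is consistent with the unweighted mean of the three F1 scores in that row. The sketch below recomputes it from the reported values. The helper names and the tolerance used for comparing a new run are assumptions for illustration; the page only says results should match "to within small amounts of noise" and gives no numeric threshold.

```python
# Reported F1 scores (three metrics) and reported final column, copied from the table.
REPORTED = {
    "Dev Surface":  ((68.39, 56.07, 53.12), 59.19),
    "Dev Final":    ((69.25, 58.25, 55.38), 60.96),
    "Test Surface": ((68.55, 54.79, 51.96), 58.43),
    "Test Final":   ((69.18, 57.48, 54.34), 60.33),
}


def average_f1(f1_scores):
    """Unweighted mean of the per-metric F1 scores."""
    return sum(f1_scores) / len(f1_scores)


def close_to_reported(setting, my_average, tolerance=0.2):
    """Compare a new run's average F1 with the reported one.

    The tolerance is a hypothetical choice, not a value given by the authors.
    """
    _, reported_average = REPORTED[setting]
    return abs(my_average - reported_average) <= tolerance


if __name__ == "__main__":
    for setting, (f1s, reported) in REPORTED.items():
        print(f"{setting}: recomputed {average_f1(f1s):.2f}, reported {reported:.2f}")
```

For example, Dev Surface gives (68.39 + 56.07 + 53.12) / 3 = 59.19, matching the table.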
