Berkeley Coreference Resolution System

Overview

The Berkeley Coreference Resolution System is a state-of-the-art English coreference system described in the following papers:

Easy Victories and Uphill Battles in Coreference Resolution
Greg Durrett and Dan Klein.
EMNLP 2013.

Decentralized Entity-Level Modeling for Coreference Resolution
Greg Durrett, David Hall, and Dan Klein.
ACL 2013.

It takes as input text annotated in the CoNLL format, then detects the mentions in that text and resolves which of them corefer. The system is bundled with a preprocessor that can take raw text, split it into sentences and tokens, and produce the necessary CoNLL annotation layers: POS tags, syntactic parses, and named entity chunks.
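
To make the input format concrete: in CoNLL-2012 files, the last column of each token line encodes coreference. `(12` opens a mention of entity 12, `12)` closes it, `(12)` marks a one-token mention, `|` separates several markers on one token, and `-` means no marker. The Python sketch below decodes that column into entity clusters. It is illustrative only; the function name `parse_coref_column` is hypothetical and not part of the Berkeley system, which is written in Scala.

```python
from collections import defaultdict


def parse_coref_column(tags):
    """Group token spans into clusters from a CoNLL-2012 coreference column.

    tags[i] is the last column for token i of one document, e.g. "(12",
    "12)", "(12)", "(3|(12)", or "-" when the token has no marker.
    Returns {cluster_id: [(start, end), ...]} with inclusive token indices.
    """
    clusters = defaultdict(list)
    open_starts = defaultdict(list)  # cluster_id -> stack of open start indices
    for i, tag in enumerate(tags):
        if tag == "-":
            continue
        for part in tag.split("|"):
            cid = int(part.strip("()"))
            if part.startswith("(") and part.endswith(")"):
                clusters[cid].append((i, i))  # one-token mention
            elif part.startswith("("):
                open_starts[cid].append(i)  # mention of cid opens here
            elif part.endswith(")"):
                start = open_starts[cid].pop()  # close the most recent open mention
                clusters[cid].append((start, i))
    return dict(clusters)


# "The president said he": tokens 0-1 and token 3 refer to entity 1.
print(parse_coref_column(["(1", "1)", "-", "(1)"]))
```

Using a stack per cluster ID lets nested mentions of the same entity close in the right order.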

Note that the Berkeley Entity Resolution System provides a (mostly) superset of this system's functionality and achieves better coreference results.

Downloads

The README contains more information. The system is licensed under the GPLv3.

Download the system here (16MB tgz). The source code is mostly Scala, but the download includes a pre-built .jar file that runs on a standard JRE.

Download the models here (~300MB tgz) (compatible with versions 1.0 and 1.1). This package includes pre-trained models for both preprocessing (sentence splitting, parsing, and NER) and coreference (the SURFACE and FINAL models from the EMNLP 2013 paper), with separate coreference models for the CoNLL data and for running on raw text.

Old Versions

Version 1.0: code

Version 0.9: code, models

CoNLL 2012 Results

Results reported in the EMNLP 2013 paper are on the CoNLL 2011 test set, scored with version 5 of the official CoNLL scorer. The table below lists results on the CoNLL 2012 development and test sets using v8.01 of the scoring script. These scores reflect version 1.1 of the system, which shuffles the input data; this improves performance and removes the nondeterminism that stemmed from the order in which File.listFiles returns files.
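
Java's File.listFiles makes no guarantee about the order of the files it returns, so any result that depends on that order can vary across machines. This page does not say how version 1.1 implements its shuffle. The Python sketch below shows one common way to shuffle without that dependence, assuming a sort followed by a fixed-seed shuffle; the function name `deterministic_order` and the seed value are hypothetical.

```python
import random


def deterministic_order(filenames, seed=0):
    """Return filenames in a reproducible pseudo-random order.

    Sorting first removes any dependence on the order the filesystem
    listing happened to produce; shuffling with a seeded RNG then yields
    the same permutation on every run.
    """
    ordered = sorted(filenames)
    random.Random(seed).shuffle(ordered)  # fixed seed -> same permutation each run
    return ordered


# The listing order no longer matters: both calls return the same permutation.
print(deterministic_order(["c.conll", "a.conll", "b.conll"], seed=7))
print(deterministic_order(["b.conll", "c.conll", "a.conll"], seed=7))
```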

               MUC                    B^3                    CEAF-E                 CoNLL
               Prec.  Rec.   F1       Prec.  Rec.   F1       Prec.  Rec.   F1       F1
Dev Surface    72.07  67.08  69.49    60.86  54.04  57.24    55.48  52.42  53.91    60.21
Dev Final      73.44  67.68  70.44    63.14  55.55  59.10    57.00  54.21  55.57    61.71
Test Surface   72.02  66.93  69.38    59.92  51.78  55.56    54.48  50.86  52.61    59.18
Test Final     74.06  67.48  70.62    63.47  53.74  58.20    56.46  53.23  54.80    61.21
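
When checking your own runs against these numbers, it can help to recall how the columns relate. Each F1 is the harmonic mean of its precision and recall, and the CoNLL F1 column is the unweighted average of the MUC, B^3, and CEAF-E F1 scores. The Python sketch below reproduces that arithmetic. The helper names are hypothetical, and this is not a replacement for the official scorer, which computes the underlying metrics from the mention clusters themselves.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (inputs and output in percent)."""
    return 2 * precision * recall / (precision + recall)


def conll_f1(muc_f1, b3_f1, ceafe_f1):
    """CoNLL score: unweighted average of the MUC, B^3, and CEAF-E F1 scores."""
    return (muc_f1 + b3_f1 + ceafe_f1) / 3


# Dev Final row from the table: MUC F1 and the CoNLL average.
print(round(f1(73.44, 67.68), 2))
print(round(conll_f1(70.44, 59.10, 55.57), 2))
```

Averaging the rounded F1 values for Dev Final gives 61.70 rather than the listed 61.71; the gap presumably comes from rounding the component scores before averaging.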

The system should reproduce these scores up to small amounts of noise. If you have any questions or are curious about results not listed here, please email Greg Durrett.