Ocular Historical Document Recognition System

Overview

Ocular is a state-of-the-art historical OCR system described in the following papers:

Improved Typesetting Models for Historical OCR
Taylor Berg-Kirkpatrick and Dan Klein.
ACL 2014.

Unsupervised Transcription of Historical Documents
Taylor Berg-Kirkpatrick, Greg Durrett, and Dan Klein.
ACL 2013.

Ocular can recognize collections of documents that use historical fonts. The system is unsupervised: you don't need document images labeled with human transcriptions in order to learn a particular historical font. Instead, Ocular learns the font directly from the set of input document images you want transcribed.

Downloads

The system and its source code can be downloaded from GitHub. If OpenCL or CUDA is installed on your system, Ocular can make better use of the available hardware.