BYU

Abstract by Brian Davis

Personal Information


Presenter's Name

Brian Davis

Degree Level

Doctorate

Co-Authors

Bryan Morse
Brian Price
Scott Cohen
Chris Tensmeyer

Abstract Information


Department

Computer Science

Faculty Advisor

Bryan Morse

Title

Deep Visual Template-Free Historical Form Parsing

Abstract

Automatic, template-free extraction of information from form images is challenging because form layouts vary widely. Historical forms are harder still because of noise.
A crucial part of the extraction process is associating input text with pre-printed labels.
We present a learned, template-free solution to detect pre-printed text and input text and determine which are related.
While previous approaches to this problem have focused on clean, clear layouts, we show that our approach is effective on noisy and varied form images. We introduce a new dataset of historical form images of this type to validate our approach.
Our method uses a fully convolutional network to detect pre-printed text and input text lines.
We pool features from the detection network to classify possible relationships in a language-agnostic way.
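
As a rough illustration of what pooling detector features for a candidate pair could look like, the sketch below average-pools a feature map over the union of a label box and an input box. The feature-map layout, the box format, the choice of average pooling, and all function names are assumptions made for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
from typing import List, Tuple

# Hypothetical conventions: boxes are (x0, y0, x1, y1) in feature-map cells,
# with x1 and y1 exclusive; the feature map is indexed [row][col][channel].
Box = Tuple[int, int, int, int]
FeatureMap = List[List[List[float]]]


def union_box(a: Box, b: Box) -> Box:
    """Smallest box that covers both input boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def average_pool(features: FeatureMap, box: Box) -> List[float]:
    """Average each channel over the cells inside the box."""
    x0, y0, x1, y1 = box
    channels = len(features[0][0])
    sums = [0.0] * channels
    count = 0
    for row in features[y0:y1]:
        for cell in row[x0:x1]:
            for c, value in enumerate(cell):
                sums[c] += value
            count += 1
    if count == 0:
        raise ValueError("box does not overlap the feature map")
    return [s / count for s in sums]


def pair_features(features: FeatureMap, label_box: Box, input_box: Box) -> List[float]:
    """Pooled descriptor for one candidate (label, input) relationship."""
    return average_pool(features, union_box(label_box, input_box))
```

A descriptor like this would then be passed to a classifier that scores whether the two detections are related. Because it is built from visual features rather than recognized text, it does not depend on the language of the form.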

We compare our pairing method to both heuristic rules and learned methods that use only shape and spatial features (not visual features), and demonstrate superior results.
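
For context, a heuristic pairing baseline of the general kind mentioned above might look like the following sketch. The specific rule (pair each input with the nearest label that is to its left or above it), the box format, and the function names are assumptions for illustration only; the abstract does not state which heuristic rules were used.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical box format: (x0, y0, x1, y1) in pixels, y increasing downward.
Box = Tuple[float, float, float, float]


def center(box: Box) -> Tuple[float, float]:
    """Center point of a box."""
    return ((box[0] + box[2]) / 2, (box[1] + box[3]) / 2)


def pair_inputs_to_labels(labels: List[Box], inputs: List[Box]) -> List[Optional[int]]:
    """For each input box, return the index of its paired label, or None."""
    pairs: List[Optional[int]] = []
    for inp in inputs:
        ix, iy = center(inp)
        best, best_dist = None, math.inf
        for idx, lab in enumerate(labels):
            lx, ly = center(lab)
            # Skip labels that lie both to the right of and below the input.
            if lx > ix and ly > iy:
                continue
            dist = math.hypot(ix - lx, iy - ly)
            if dist < best_dist:
                best, best_dist = idx, dist
        pairs.append(best)
    return pairs
```

A rule like this uses only box geometry, so it has no way to exploit visual cues such as ruling lines or print style.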