BYU

Abstract by Seth Stewart

Personal Information


Presenter's Name

Seth Stewart

Co-Presenters

None

Degree Level

Masters

Co-Authors

Bill Barrett

Abstract Information


Department

Computer Science

Faculty Advisor

Bill Barrett

Title

Pixel-Labeling Document Images using Neural Networks for Content Masking and Text Recognition

Abstract

Much of world and family history is trapped on paper forms such as birth records, census records, and marriage certificates. We use artificial neural networks to automatically separate a document image into semantically distinct layers, isolating content such as machine print and handwriting. Our technique accurately labels new document images even when trained on a single document image with a completely different layout. Our method also allows recovery of overlapped content and shows potential for semantically associating the fields it discovers. For example, we can lexically refine handwriting recognized as part of a "Name", "Date", or "Birthplace" field, improving recognition accuracy.
Using our method, we achieve winning results in a pixel classification challenge on a public document image dataset. We also show that the same framework we use for content classification can perform character recognition directly, without a complex and error-prone pipeline.
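
As an illustration of the lexical refinement idea mentioned above, here is a minimal Python sketch. It is not the method used in this work: it assumes a recognized string has already been associated with a field label, and it snaps that string to the closest entry in a small field-specific word list using the standard library's `difflib`. The `FIELD_LEXICONS` table, the `refine` function, and the `cutoff` value are all hypothetical names and values chosen for the example.

```python
from difflib import get_close_matches

# Hypothetical field-specific lexicons; a real system would use much
# larger lists (e.g. of surnames or place names).
FIELD_LEXICONS = {
    "Name": ["Stewart", "Barrett", "Johnson", "Anderson"],
    "Birthplace": ["Utah", "Idaho", "Nevada", "Arizona"],
}


def refine(field, raw_text, cutoff=0.6):
    """Return the lexicon entry closest to raw_text for the given field.

    If the field has no lexicon, or nothing is similar enough
    (similarity below `cutoff`), the raw text is returned unchanged.
    """
    lexicon = FIELD_LEXICONS.get(field, [])
    matches = get_close_matches(raw_text, lexicon, n=1, cutoff=cutoff)
    return matches[0] if matches else raw_text


if __name__ == "__main__":
    # A noisy recognition result is corrected using the field's lexicon.
    print(refine("Name", "Stewaet"))
    # A field without a lexicon passes through untouched.
    print(refine("Date", "1902"))
```

The design choice here is that knowing which field a piece of handwriting belongs to narrows the set of plausible readings, so even a simple string-similarity lookup can correct some recognition errors.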