Abstract by Iain Lee
Census Record Image Segmentation
United States census records contain manually filled forms of valuable demographic data. Images of these census documents are publicly available, but only as images accompanied by some metadata regarding location and date. Advances in machine learning provide increasingly cost-effective and time-efficient solutions to data extraction. However, current handwriting recognition techniques only work on lines of text. Therefore, in order to make use of the data found in U.S. census records, we must first extract those lines. We propose a method for extracting segments from the census in order to gather training data for handwriting recognition models. Our end goal is to automatically index the census records, and this method provides a crucial step in that process.
Keywords: Image segmentation, Machine learning, Handwriting recognition, Census data