Abstract by Lance Haderlie

Personal Information

Presenter's Name

Lance Haderlie

Degree Level


Abstract Information


Computer Science

Faculty Advisor

Scott Woodfield


Big Data Complexities in Family History Work


                Over the course of the last year, I have been working to determine the best way to index, search, and analyze an enormous dataset: the records of 900 million persons in the FamilySearch database, which we need to iterate over and extract statistics from. I tried MongoDB, SQLite, and MySQL, and eventually settled on PostgreSQL as my database of choice, since none of the others was efficient enough to function as needed. MongoDB has excellent single-PID lookup times, but a full iteration would have taken three days (not counting any other indexing, comparisons, or calculations); SQLite was not built to handle such large datasets; and MySQL did not offer simple interface options. PostgreSQL is fast and efficient, and even with 600 GB of data it can retrieve 500,000 random samples for analysis within seconds. We went from transferring 500 records in 45 minutes to 30 million in 3 hours. We will use this database to create probabilities for merge analysis going forward.
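The fast random-sampling behavior described above can be achieved with PostgreSQL's TABLESAMPLE clause, which reads a random subset of table pages rather than scanning all rows. A minimal sketch follows; the `persons` table name and the specific percentages are illustrative assumptions, not the project's actual schema:

```python
# Build a PostgreSQL query that draws an approximate random sample
# using TABLESAMPLE SYSTEM, which selects random disk pages and is
# far cheaper than ORDER BY random() on very large tables.

def sample_query(table: str, percent: float, limit: int) -> str:
    """Return SQL selecting ~percent% of `table`'s pages, capped at `limit` rows."""
    return (
        f"SELECT * FROM {table} "
        f"TABLESAMPLE SYSTEM ({percent}) "
        f"LIMIT {limit};"
    )

# For roughly 500,000 rows out of 900 million, a small page
# percentage plus a LIMIT keeps the scan fast:
query = sample_query("persons", 0.1, 500_000)
print(query)
```

SYSTEM sampling trades a little statistical uniformity (it samples whole pages, so physically adjacent rows appear together) for speed; TABLESAMPLE BERNOULLI gives a per-row sample at higher cost.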