Abstract by Samuel Litster
Deep Data Cleaning: TreeSweeper
Genealogy records on FamilySearch are notorious for errors and inconsistencies. TreeSweeper is an app we've developed over the last two years. It downloads a user's family history information and runs it through a Datalog-based logic engine that identifies several kinds of errors. It uses probabilities to separate possible errors (such as someone having children at a very young age) from definite errors (such as someone having children before they were born). It also identifies opportunities for further research (such as possibly missing children). We mine the probability data from family history records.
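The abstract does not show TreeSweeper's actual rules or how it mines its probabilities, so the following is only a hypothetical sketch of the idea. It assumes the tree has been reduced to `parent(P, C)` and `born(X, Year)` facts. Each check is written as a Datalog-style clause in a comment and evaluated with Python set comprehensions. Because these particular rules are not recursive, a single pass produces what a bottom-up Datalog engine would reach at its fixpoint. The function names, the empirical-CDF way of turning mined parent ages into probabilities, the `threshold` and `max_gap` values, and the toy data are all illustrative assumptions.

```python
from collections import Counter


def mine_age_cdf(observed_ages):
    """Hypothetical: estimate P(parent age at a child's birth <= a) from ages seen in records."""
    counts = Counter(observed_ages)
    total = len(observed_ages)

    def cdf(a):
        return sum(n for age, n in counts.items() if age <= a) / total

    return cdf


def check_tree(parent, born, cdf, threshold=0.02, max_gap=6):
    """Apply Datalog-style rules to facts parent(P, C) and born(X, Year).

    threshold and max_gap are illustrative assumptions, not TreeSweeper settings.
    """
    year = dict(born)

    # parent_age(P, C, A) :- parent(P, C), born(P, YP), born(C, YC), A = YC - YP.
    parent_age = {(p, c, year[c] - year[p])
                  for p, c in parent if p in year and c in year}

    # definite_error(P, C) :- parent_age(P, C, A), A < 0.   (child born before parent)
    definite = {(p, c) for p, c, a in parent_age if a < 0}

    # possible_error(P, C, Pr) :- parent_age(P, C, A), A >= 0,
    #                             Pr = cdf(A), Pr < threshold.   (unusually young parent)
    possible = {(p, c, cdf(a)) for p, c, a in parent_age
                if a >= 0 and cdf(a) < threshold}

    # research_lead(P, C1, C2): C1 and C2 are consecutive dated children of P
    # born more than max_gap years apart, so a child may be missing between them.
    children = {}
    for p, c in parent:
        if c in year:
            children.setdefault(p, []).append((year[c], c))
    leads = set()
    for p, kids in children.items():
        kids.sort()  # order each parent's children by birth year
        for (y1, c1), (y2, c2) in zip(kids, kids[1:]):
            if y2 - y1 > max_gap:
                leads.add((p, c1, c2))

    return definite, possible, leads


# Hypothetical toy data, not real FamilySearch records.
cdf = mine_age_cdf(list(range(18, 48)))  # pretend ages 18..47 were observed once each
parent = {("anna", "ben"), ("anna", "cara"), ("dan", "ed")}
born = {("anna", 1850), ("ben", 1875), ("cara", 1864),
        ("dan", 1900), ("ed", 1890)}
definite, possible, leads = check_tree(parent, born, cdf)
print(definite)  # {('dan', 'ed')}
print(possible)  # {('anna', 'cara', 0.0)}
print(leads)     # {('anna', 'cara', 'ben')}
```

Keeping the probability lookup (`cdf`) outside the rules means the same clauses can be reused unchanged as the mined distribution is refined with more records.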