Computer Science

Faculty Advisor

Scott Woodfield


A Probabilistic Approach to the Merging Problem


In our talk, we will discuss our team’s attempt to answer the question, 'Given two people marked as potential duplicates on FamilySearch, should they be merged?' We will describe the methods we examined (Dempster-Shafer theory, cluster analysis, and Bayesian statistics) and deemed inadequate or irrelevant for this problem. Most of the talk, however, will focus on the approach we found most promising: a probabilistic approach that draws on a sample of potential-duplicate pairs manually labeled as matching or not, and categorizes potential matches by the information available and the proportion of that information that matches. We will present our preliminary findings and discuss our next steps.
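
As a rough illustration of the kind of estimate described above, the sketch below groups labeled pairs by how many fields both records share and what share of those fields agree, then reports the observed fraction of true matches in each group. It is only a minimal sketch under stated assumptions, not the team's actual method: the field names, the `bucket` and `estimate_match_rates` helpers, and the rounding of the agreement proportion to one decimal place are all hypothetical choices made for the example.

```python
from collections import defaultdict


def bucket(rec_a, rec_b):
    """Return (number of fields present in both records,
    proportion of those fields that agree, rounded to 0.1).
    Hypothetical categorization, chosen for illustration only."""
    shared = [
        f for f in rec_a
        if f in rec_b and rec_a[f] is not None and rec_b[f] is not None
    ]
    if not shared:
        return (0, 0.0)
    agree = sum(rec_a[f] == rec_b[f] for f in shared)
    return (len(shared), round(agree / len(shared), 1))


def estimate_match_rates(labeled_pairs):
    """Given (rec_a, rec_b, is_match) triples, return the observed
    fraction of true matches within each bucket."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [matches, total]
    for rec_a, rec_b, is_match in labeled_pairs:
        key = bucket(rec_a, rec_b)
        counts[key][0] += int(is_match)
        counts[key][1] += 1
    return {key: m / t for key, (m, t) in counts.items()}


# Made-up records, for illustration only.
labeled = [
    ({"name": "Ann Lee", "birth_year": 1850},
     {"name": "Ann Lee", "birth_year": 1850}, True),
    ({"name": "Ann Lee", "birth_year": 1850},
     {"name": "Ann Lee", "birth_year": 1850}, False),
    ({"name": "Ann Lee", "birth_year": 1850},
     {"name": "Ann Lee", "birth_year": 1862}, False),
]
rates = estimate_match_rates(labeled)
```

A new pair of potential duplicates could then be assigned the match rate of its bucket. With a small labeled sample, many buckets would hold too few pairs for the rate to be reliable.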