Abstract by Brandon Carter
Clustering via Pairwise Distance Information
Cluster analysis is commonly performed for exploratory data analysis. Hierarchical clustering is a popular heuristic technique for cluster analysis; it is based on pairwise distance information among the items being clustered. As an alternative, we propose a new cluster analysis method based on formal probability distributions. Our method uses the same pairwise distance information and leverages the Ewens-Pitman Attraction (EPA) distribution (Dahl et al., 2017) to form cluster estimates and quantify their uncertainty. Specifically, we propose to use the EPA distribution to simulate samples from the clustering distribution. We examine and compare a variety of partition estimation methods that use only a pairwise probability matrix. We compare this new clustering methodology and these estimation methods to existing procedures, and we characterize the similarities and differences between our distribution-based clustering procedure and traditional hierarchical clustering.
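As a rough illustration of partition estimation from a pairwise probability matrix, the sketch below builds such a matrix from a set of sampled partitions and then selects the sampled partition closest to it in squared error. This is one common estimator of this kind and is not necessarily among those examined in this work. The function names and the sample labels are hypothetical, and the samples are made up for illustration rather than drawn from the EPA distribution.

```python
def pairwise_probability(samples):
    """Estimate, for each pair of items, the fraction of sampled
    partitions in which the two items share a cluster."""
    n_items = len(samples[0])
    n_samples = len(samples)
    return [
        [sum(s[i] == s[j] for s in samples) / n_samples for j in range(n_items)]
        for i in range(n_items)
    ]


def least_squares_partition(samples, psm):
    """Return the sampled partition whose co-clustering indicator matrix
    has the smallest squared distance to the pairwise probability matrix.
    (An assumed estimator, chosen for illustration only.)"""
    n_items = len(psm)

    def loss(s):
        return sum(
            ((s[i] == s[j]) - psm[i][j]) ** 2
            for i in range(n_items)
            for j in range(n_items)
        )

    return min(samples, key=loss)


# Hypothetical sampled partitions of four items (cluster labels per item).
samples = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 1, 2, 2],
]
psm = pairwise_probability(samples)
estimate = least_squares_partition(samples, psm)
print(psm[0][1])   # items 0 and 1 co-cluster in 3 of 4 samples
print(estimate)
```

The off-diagonal entries of the matrix also give a simple measure of uncertainty: values near 0 or 1 indicate pairs whose co-clustering is stable across samples, while values near 0.5 indicate pairs whose assignment is unsettled.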