Abstract by Kaylee Dudley
Clustering Healthcare Costs by Disease
All insurance companies, regardless of the kind of insurance they offer, try to predict future behavior by comparing current clients with historical client data. Any statistically significant correlation, whether or not it is expected and whether or not hidden factors drive it, can help in building actuarial models of that behavior.
Using deidentified data from more than 6 million health insurance policies covering one year, we looked for significant groupings of medical issues. Medical issues are defined by the commercial "Episode Treatment Groups" (ETG) classification, and our claims contain 347 different ETGs.
We applied several kinds of analysis, including Bayesian posterior cluster analysis, k-means cluster analysis, and association rule learning. We then compared our findings with existing medical knowledge and the expectations it reasonably supports.
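As an illustration of what association rule learning measures in this setting, the following is a minimal sketch, not the analysis used in this work. It assumes each policy is reduced to the set of ETGs appearing in its claims, and it scores ETG pairs by support, confidence, and lift. The function name `pair_rules`, the `min_support` threshold, and the labels `ETG_A` through `ETG_D` are hypothetical placeholders, not real ETG codes or real data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(policies, min_support=0.2):
    """Score ETG pairs as rules (antecedent, consequent, support, confidence, lift)."""
    n = len(policies)
    item_counts = Counter()
    pair_counts = Counter()
    for etgs in policies:
        unique = sorted(set(etgs))
        item_counts.update(unique)                   # policies containing each ETG
        pair_counts.update(combinations(unique, 2))  # policies containing both ETGs

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of policies with both ETGs
        if support < min_support:
            continue
        for x, y in ((a, b), (b, a)):
            confidence = count / item_counts[x]       # estimate of P(y | x)
            lift = confidence / (item_counts[y] / n)  # ratio of P(y | x) to P(y)
            rules.append((x, y, support, confidence, lift))
    # Highest lift first: these pairs co-occur more often than chance predicts.
    return sorted(rules, key=lambda r: r[4], reverse=True)


# Hypothetical toy data: each set holds the ETG labels seen on one policy.
policies = [
    {"ETG_A", "ETG_B"},
    {"ETG_A", "ETG_B", "ETG_C"},
    {"ETG_C"},
    {"ETG_A", "ETG_B"},
    {"ETG_C", "ETG_D"},
]
rules = pair_rules(policies, min_support=0.4)
for antecedent, consequent, support, confidence, lift in rules:
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

A lift above 1 means the two ETGs appear together more often than independence would predict. Whether such a pairing is medically meaningful still has to be checked against existing knowledge.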