Abstract by David Teuscher
MLB Offensive Performance with Bayesian Nonparametric Models
The purpose of this research was to test the predictive ability of product partition Bayesian nonparametric models with respect to Major League Baseball offensive performance as measured by weighted runs created plus (wRC+). The data were taken from the Lahman baseball database, which provided batting information for each player in every season played, and from FanGraphs, which provided the constants used to calculate wRC+. After cleaning, the data set contained one observation per season for every player who debuted after 1900, consisting of the season number and the calculated wRC+. An orthonormal quartic polynomial was fit for each individual player to produce a performance curve. The coefficients of the polynomial fit and the season number were used in a product partition model, with and without the covariates, to predict performance in the next year. The sum of squared errors was calculated for each formulation to determine which model best predicted offensive performance.
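
The per-player curve fit and the error criterion described above can be illustrated with a minimal Python sketch. The abstract does not say which software or basis construction was used, so everything here is an assumption: the function names `orthonormal_quartic_fit` and `sum_squared_error` are hypothetical, the orthonormal basis is built by a QR decomposition of a Vandermonde matrix in the rescaled season number, and the wRC+ values are invented for illustration. The product partition model itself is not shown.

```python
import numpy as np


def orthonormal_quartic_fit(seasons, wrc_plus, degree=4):
    """Fit one player's wRC+ by season on an orthonormal polynomial basis.

    Assumption: the basis comes from a QR decomposition of the Vandermonde
    matrix of the rescaled season number. The original work may have
    constructed its basis differently.
    """
    t = np.asarray(seasons, dtype=float)
    y = np.asarray(wrc_plus, dtype=float)
    if t.size <= degree:
        raise ValueError("need more than `degree` distinct seasons")

    # Center and rescale the season number for numerical stability.
    t_scaled = (t - t.mean()) / (t.max() - t.min())

    # Columns are 1, t, t^2, ..., t^degree.
    vandermonde = np.vander(t_scaled, N=degree + 1, increasing=True)

    # Reduced QR: q has orthonormal columns spanning the same polynomial space.
    q, _ = np.linalg.qr(vandermonde)

    # Because q is orthonormal, the least-squares coefficients are q^T y.
    coefficients = q.T @ y
    fitted = q @ coefficients
    return coefficients, fitted


def sum_squared_error(observed, predicted):
    """Sum of squared differences between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sum((observed - predicted) ** 2))


# Hypothetical career: season numbers 1..10 with made-up wRC+ values.
example_seasons = np.arange(1, 11)
example_wrc_plus = np.array([85, 98, 110, 121, 127, 125, 118, 109, 97, 88])
example_coefs, example_fitted = orthonormal_quartic_fit(
    example_seasons, example_wrc_plus
)
example_sse = sum_squared_error(example_wrc_plus, example_fitted)
```

In this sketch the five entries of `example_coefs` play the role of the polynomial coefficients that would be passed, along with the season number, to the product partition model. `sum_squared_error` would be applied to next-season predictions from each formulation rather than to the in-sample fit shown here.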