Abstract by Ellie Van De Graaff
Generating Meta Features to Best Select Learning Models
Data scientists constantly work to find the best model for a given dataset. To do so, a scientist inspects the data, selects a model, and examines the results, always looking to improve. Through this cycle, they learn how to select models well. Our lab is working to create automated data scientists that can select the best models by drawing on experience from their own work and on others' past performance. These automated scientists select models more effectively when the data is summarized concisely, which is done using metafeatures. My work focuses on generating metafeatures. These summarize a dataset through a combination of simple operations (such as row count and labels), statistical analysis (including mean, correlation, and skewness), and landmarking values (obtained with scikit-learn). The summarized datasets are standardized so that they can be used in the pipeline. The pipeline is the model the data is run on, and without a uniform summary format, multiple datasets cannot be compared to one another. By comparing performance, we are able to determine how best to select a model.
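
As a rough illustration of the three metafeature families described above, the following Python sketch summarizes a small numeric dataset as a dictionary with a fixed set of keys, so that every dataset is described in the same form. It is not the lab's actual code. The function names, the choice of aggregates (averaging per-column statistics), and the toy data are assumptions made for illustration. The landmarker here is a hand-written leave-one-out 1-nearest-neighbor classifier that stands in for a scikit-learn learner so that the example needs only the standard library.

```python
import math
from statistics import fmean, pstdev


def skewness(values):
    """Population skewness (third standardized moment); 0.0 for a constant column."""
    mu, sigma = fmean(values), pstdev(values)
    if sigma == 0:
        return 0.0
    return fmean([((v - mu) / sigma) ** 3 for v in values])


def pearson(xs, ys):
    """Pearson correlation of two columns; 0.0 if either column is constant."""
    mx, my = fmean(xs), fmean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = fmean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def one_nn_landmark(rows, labels):
    """Landmarking value: leave-one-out accuracy of a 1-nearest-neighbor classifier.

    Hypothetical stand-in for a scikit-learn landmarker.
    """
    hits = 0
    for i, row in enumerate(rows):
        others = [j for j in range(len(rows)) if j != i]
        nearest = min(others, key=lambda j: math.dist(row, rows[j]))
        hits += labels[nearest] == labels[i]
    return hits / len(rows)


def metafeatures(rows, labels):
    """Summarize a dataset as a dict whose keys are the same for every dataset."""
    columns = [list(col) for col in zip(*rows)]
    n_cols = len(columns)
    pair_corrs = [
        abs(pearson(columns[a], columns[b]))
        for a in range(n_cols)
        for b in range(a + 1, n_cols)
    ]
    return {
        # Simple operations
        "n_rows": len(rows),
        "n_features": n_cols,
        "n_classes": len(set(labels)),
        # Statistical analysis, averaged over columns (an assumed aggregate)
        "mean_of_means": fmean([fmean(c) for c in columns]),
        "mean_skewness": fmean([skewness(c) for c in columns]),
        "mean_abs_correlation": fmean(pair_corrs) if pair_corrs else 0.0,
        # Landmarking
        "landmark_1nn": one_nn_landmark(rows, labels),
    }


if __name__ == "__main__":
    # Toy dataset, invented for illustration only.
    toy_rows = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0),
                (10.0, 1.0), (11.0, 0.5), (12.0, 0.0)]
    toy_labels = ["a", "a", "a", "b", "b", "b"]
    for name, value in metafeatures(toy_rows, toy_labels).items():
        print(name, value)
```

Because every dataset maps to the same keys, two summaries can be compared entry by entry regardless of how many rows or columns the original datasets had.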