BYU

Abstract by Ellie Van De Graaff

Personal Information


Presenter's Name

Ellie Van De Graaff

Degree Level

Undergraduate

Abstract Information


Department

Computer Science

Faculty Advisor

Kevin Seppi

Title

Generating Meta Features to Best Select Learning Models

Abstract

Data scientists constantly work to pair each dataset with the model that suits it best. To do so, a scientist inspects the data, selects a model, and examines the results, always looking to improve. Through this cycle, they learn how to select models well. Our lab is working to create automated data scientists that can select the best models by drawing on their own experience and on the past performance of others. These automated scientists select models more effectively when the data is summarized concisely, which is done using metafeatures. My work focuses on generating metafeatures. These summarize a dataset through a combination of simple operations (such as counting rows and labels), statistical analysis (including mean, correlation, and skewness), and landmarking values (obtained from scikit-learn). The summarized datasets are standardized so that they can be used in the pipeline. The pipeline is the model the data is run on, and without uniformity in the summarized data, multiple datasets cannot be compared to one another. By comparing performance, we are able to determine how best to select a model.
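
As an illustration of the three kinds of summary described above, the following is a minimal sketch in Python, not the lab's actual implementation. The function name `extract_metafeatures`, the dictionary keys, and the choice of aggregates are all hypothetical. The landmarking value here is a simple majority-class baseline computed by hand, used as a stand-in for landmarkers obtained from scikit-learn, so that the sketch needs only the standard library.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0  # a constant column has no skew
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric columns."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant column
    return cov / (sx * sy)


def extract_metafeatures(rows, labels):
    """Summarize a numeric dataset as a fixed set of named values.

    rows   -- list of equal-length lists of numbers (one list per example)
    labels -- list of class labels, one per row
    """
    columns = [list(col) for col in zip(*rows)]
    n_cols = len(columns)

    # Absolute correlation for every pair of distinct columns.
    pair_corrs = [
        abs(pearson(columns[i], columns[j]))
        for i in range(n_cols)
        for j in range(i + 1, n_cols)
    ]

    # Stand-in landmarker: accuracy of always predicting the majority class.
    majority_count = max(labels.count(c) for c in set(labels))

    return {
        # Simple operations
        "n_rows": len(rows),
        "n_features": n_cols,
        "n_classes": len(set(labels)),
        # Statistical analysis, averaged over columns
        "mean_of_means": statistics.fmean([statistics.fmean(c) for c in columns]),
        "mean_skewness": statistics.fmean([skewness(c) for c in columns]),
        "mean_abs_correlation": statistics.fmean(pair_corrs) if pair_corrs else 0.0,
        # Landmarking (hypothetical baseline, see lead-in)
        "majority_class_accuracy": majority_count / len(labels),
    }
```

Because every dataset is reduced to the same keys regardless of its original shape, two datasets of different sizes yield summaries that can be compared side by side, which is the uniformity the abstract refers to.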