Searching a Metafeature-Defined State Space
Roland Laboulaye, Christophe Giraud-Carrier, Kevin Seppi
Abstract
Although dataset metafeatures have historically been used most successfully for automatic classifier selection, we target the considerably harder task of automatic pipeline creation, where a pipeline is defined as a sequence of preprocessing operations followed by a learning algorithm. We frame this problem as a search through a state space and propose a framework in which the metafeatures of a dataset represent a mutable state and preprocessing operations represent actions that alter that state. We measure the extent to which changes in the metafeature state capture the effects of distinct preprocessing operations. We then verify that the available preprocessing operations allow us to navigate the metafeature space. Finally, we propose ways in which this framework can be used to define a heuristic for searching the metafeature space to create data science pipelines.
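The abstract does not say which metafeatures, preprocessing operations, or heuristic the framework uses, so the following is only a minimal illustrative sketch of the state/action idea, not the authors' implementation. Everything in it is a hypothetical assumption: the two-component state (fraction of missing cells, mean per-column standard deviation), the three actions `drop_incomplete_rows`, `impute_mean`, and `standardize`, the target state `(0.0, 1.0)`, and the use of Euclidean distance to that target as a greedy hill-climbing heuristic.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Row = List[Optional[float]]  # None marks a missing value
Dataset = List[Row]
State = Tuple[float, ...]  # hypothetical metafeature vector


def _column(data: Dataset, j: int) -> List[float]:
    """Non-missing values of column j."""
    return [row[j] for row in data if row[j] is not None]


def _std(values: List[float]) -> float:
    """Population standard deviation of a non-empty list."""
    mu = sum(values) / len(values)
    return math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))


def metafeatures(data: Dataset) -> State:
    """Hypothetical state: (fraction of missing cells, mean per-column std)."""
    n_cols = len(data[0]) if data else 0
    cells = len(data) * n_cols
    missing = sum(v is None for row in data for v in row)
    stds = [_std(col) for col in (_column(data, j) for j in range(n_cols)) if col]
    mean_std = sum(stds) / len(stds) if stds else 0.0
    return (missing / cells if cells else 0.0, mean_std)


# Hypothetical actions: each maps a dataset to a new dataset, and hence to a new state.
def drop_incomplete_rows(data: Dataset) -> Dataset:
    return [list(row) for row in data if all(v is not None for v in row)]


def impute_mean(data: Dataset) -> Dataset:
    n_cols = len(data[0]) if data else 0
    means = []
    for j in range(n_cols):
        col = _column(data, j)
        means.append(sum(col) / len(col) if col else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]


def standardize(data: Dataset) -> Dataset:
    n_cols = len(data[0]) if data else 0
    stats = []
    for j in range(n_cols):
        col = _column(data, j)
        stats.append((sum(col) / len(col), _std(col)) if col else (0.0, 0.0))

    def z(v: Optional[float], j: int) -> Optional[float]:
        if v is None:
            return None  # leave missing values missing
        mu, sd = stats[j]
        return (v - mu) / sd if sd > 0 else 0.0

    return [[z(v, j) for j, v in enumerate(row)] for row in data]


ACTIONS: Dict[str, Callable[[Dataset], Dataset]] = {
    "drop_incomplete_rows": drop_incomplete_rows,
    "impute_mean": impute_mean,
    "standardize": standardize,
}


def distance(a: State, b: State) -> float:
    """Euclidean distance in metafeature space, used here as the heuristic."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def greedy_pipeline(data, actions, target, max_steps=5, tol=1e-9):
    """Greedy hill-climbing: apply the action whose resulting state is closest to target."""
    pipeline: List[str] = []
    current = distance(metafeatures(data), target)
    for _ in range(max_steps):
        best = None
        for name, act in actions.items():
            new_data = act(data)
            d = distance(metafeatures(new_data), target)
            if best is None or d < best[0]:
                best = (d, name, new_data)
        if best is None or best[0] >= current - tol:
            break  # no action improves the heuristic, so stop
        current, name, data = best
        pipeline.append(name)
    return pipeline, data


data = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [None, 40.0]]
pipeline, result = greedy_pipeline(data, ACTIONS, target=(0.0, 1.0))
print(pipeline)  # ['standardize', 'drop_incomplete_rows', 'standardize']
```

Greedy hill-climbing is the simplest possible stand-in for the search; a best-first or beam search over the same states would explore alternative orderings (for example, imputing instead of dropping rows) rather than committing to the locally best action at each step.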