Abstract by Joseph Clark

Personal Information

Presenter's Name

Joseph Clark


Jeremy Rees

Degree Level



Jeremy Rees
Brandon Schoenfeld

Abstract Information


Computer Science

Faculty Advisor

Kevin Seppi


Automatic Semantic Type Detection Through Natural Language Processing


Correctly identifying the semantic types of data is essential in automatic machine learning (AutoML) for building robust machine learning models. Manual profiling is undesirable in the scope of AutoML because it can be expensive and inaccurate. Most existing profiling tools rely on regular expression matching and lookup tables to profile data, while the most recent state-of-the-art profiling techniques are beginning to use deep learning. We explore natural language processing methods, including word and sentence embeddings, to perform semantic profiling. To evaluate our methodology, we use a collection of hundreds of datasets, comprising tens of thousands of columns, with types annotated by MIT Lincoln Labs. Our results show that each technique has advantages and disadvantages, suggesting directions for future research.
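
For readers unfamiliar with the baseline approach, the following is a minimal sketch of what profiling by regular expression matching and lookup tables can look like. It is not the authors' implementation or any particular tool's: the type names, patterns, lookup table, `profile_column` function, and 0.9 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical patterns, for illustration only; real profilers use many more.
PATTERNS = {
    "integer": re.compile(r"[+-]?\d+"),
    "float": re.compile(r"[+-]?(\d+\.\d*|\.\d+)"),
    "date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
}

# Hypothetical lookup table mapping a type to its set of known values.
LOOKUPS = {
    "boolean": {"true", "false", "yes", "no"},
}


def profile_column(values, threshold=0.9):
    """Guess a column's type from its string values.

    Returns the first type whose rule matches at least `threshold`
    of the non-empty values, "text" if no rule does, and "unknown"
    if the column has no non-empty values.
    """
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return "unknown"

    # Lookup tables first: exact, case-insensitive membership.
    for name, allowed in LOOKUPS.items():
        hits = sum(v.lower() in allowed for v in cleaned)
        if hits / len(cleaned) >= threshold:
            return name

    # Then regular expressions, which must match the whole value.
    for name, pattern in PATTERNS.items():
        hits = sum(pattern.fullmatch(v) is not None for v in cleaned)
        if hits / len(cleaned) >= threshold:
            return name

    return "text"
```

A rule-based profiler like this looks only at the surface form of each value, so it cannot separate types that share a format (for example, two different kinds of free-text column). That limitation is one motivation for embedding-based methods.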