BYU

Abstract by Joseph Clark

Personal Information


Presenter's Name

Joseph Clark

Co-Presenters

Jeremy Rees

Degree Level

Undergraduate

Co-Authors

Jeremy Rees
Brandon Schoenfeld

Abstract Information


Department

Computer Science

Faculty Advisor

Kevin Seppi

Title

Automatic Semantic Type Detection Through Natural Language Processing

Abstract

Correctly identifying the semantic types of data is essential for building robust models in automatic machine learning (AutoML). Manual profiling is undesirable in an AutoML setting and can be expensive and inaccurate. Most existing profiling tools rely on regular expression matching and lookup tables, while the most recent state-of-the-art techniques are beginning to use deep learning. We explore natural language processing methods, including word and sentence embeddings, to perform semantic profiling. We evaluate our methodology on a collection of hundreds of datasets, comprising tens of thousands of columns, with types annotated by MIT Lincoln Labs. Our results show that each technique has advantages and disadvantages, suggesting directions for future research.