Abstract by Brooks Butler
Physics and Astronomy
Kent Gee, Mark Transtrum
Clustering analysis on geospatial features for predicting acoustic soundscapes
Collecting outdoor ambient acoustic data is expensive, which makes it difficult to gather the large amounts of data needed to train most supervised machine learning models. Using unsupervised machine learning methods, such as k-means clustering, we can statistically compare our current training data set with our input feature population. In this case, the input features are the geospatial features used to describe any given location in the North Carolina region. Initial results show that most geospatial clusters are defined by a small number of prominent geospatial features. Additionally, our current training data set only partially represents the input feature space. The results of this analysis allow us to optimize the training data acquisition process by informing our choice of new site locations that maximize the statistical diversity of our training data relative to that of our input data.
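
The comparison described above can be illustrated with a minimal sketch. It is not the analysis code behind these results: the function names, the deterministic farthest-point initialization, and the "training share below population share" criterion are all assumptions chosen for illustration. The sketch clusters the full population of geospatial feature vectors with k-means, assigns the existing training sites to those clusters, and reports the clusters where training sites are underrepresented.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (first index wins ties)."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, iterations=50):
    """Plain k-means (Lloyd's algorithm) on a list of feature vectors.

    Assumption: centroids are seeded with a deterministic farthest-point
    rule so the sketch is reproducible; a real analysis would more likely
    use random or k-means++ seeding with several restarts.
    """
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(
            points,
            key=lambda p: min(squared_distance(p, c) for c in centroids),
        )
        centroids.append(list(farthest))

    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                # Mean of each feature over the cluster members.
                new_centroids.append(
                    [sum(col) / len(members) for col in zip(*members)]
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids


def cluster_shares(points, centroids):
    """Fraction of `points` that falls into each cluster."""
    counts = [0] * len(centroids)
    for p in points:
        counts[nearest(p, centroids)] += 1
    return [c / len(points) for c in counts]


def underrepresented_clusters(population, training, k):
    """Clusters whose share of training sites is below their population share.

    Hypothetical criterion: candidate sites for new data collection would
    be drawn from the clusters returned here.
    """
    centroids = kmeans(population, k)
    pop_share = cluster_shares(population, centroids)
    train_share = cluster_shares(training, centroids)
    return [i for i in range(k) if train_share[i] < pop_share[i]]
```

In practice the feature vectors would be standardized before clustering, since k-means is sensitive to the scale of each feature, and the number of clusters would be chosen with a diagnostic such as an elbow or silhouette analysis.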