Abstract by Nathaniel Robinson
Improved Word Representations Via Summed Weight Slices
Neural embedding models are often described as having an `embedding layer', a set of network activations that can be extracted from the model to obtain word or sentence representations. In this paper, we show, via a modification of the well-known word2vec algorithm, that relevant semantic information is distributed throughout the entire network, not just in the commonly extracted hidden layer. This additional information can be recovered by summing indexed slices from both the input and output weight matrices of a skip-gram model. Word embeddings generated via this method exhibit strong semantic structure, and outperform state-of-the-art models such as GloVe, FastText, and BERT on the challenging task of SAT analogies.
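To make the core idea concrete, the sketch below trains a minimal skip-gram model with negative sampling in NumPy and then forms the proposed representation by summing each word's row from the input weight matrix with the corresponding row from the output weight matrix. The toy corpus, dimensions, and hyperparameters are illustrative assumptions, not the paper's actual training setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy corpus (an assumption for illustration; the paper uses a full
# word2vec training corpus).
corpus = "the cat sat on the mat the dog sat on the rug".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8  # vocabulary size, embedding dimension

# Input (W_in) and output (W_out) weight matrices of the skip-gram model.
W_in = rng.normal(scale=0.1, size=(V, D))
W_out = rng.normal(scale=0.1, size=(V, D))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Skip-gram with negative sampling, plain SGD.
lr, window, k = 0.05, 2, 3
for epoch in range(100):
    for pos, word in enumerate(corpus):
        c = idx[word]
        lo, hi = max(0, pos - window), min(len(corpus), pos + window + 1)
        for ctx_pos in range(lo, hi):
            if ctx_pos == pos:
                continue
            o = idx[corpus[ctx_pos]]
            # Gradient for the true context word (positive example).
            g = sigmoid(W_in[c] @ W_out[o]) - 1.0
            grad_c = g * W_out[o]
            W_out[o] -= lr * g * W_in[c]
            # Gradients for k randomly sampled negative words.
            for n in rng.integers(0, V, size=k):
                gn = sigmoid(W_in[c] @ W_out[n])
                grad_c += gn * W_out[n]
                W_out[n] -= lr * gn * W_in[c]
            W_in[c] -= lr * grad_c

# The proposed representation: for each word, sum its slice (row) of the
# input matrix with its slice of the output matrix.
summed = W_in + W_out
print(summed.shape)  # one D-dimensional vector per vocabulary word
```

The key point is that `summed` uses both weight matrices, whereas standard word2vec pipelines discard `W_out` and keep only `W_in` as the embedding table.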