Hooked on Phonics with Deep Learning: Learning to Phonemicize Text with Attentive Sequence-to-Sequence Models
Evan Peterson
Abstract
Phonemicization is the task of converting the words of a language into their phonetic representation, i.e., symbols representing the way those words sound. It is a critical component of automated text-to-speech systems and has applications in linguistics and language learning. The set of phonetic symbols a given sequence of characters can map to varies with its position in the word, its neighboring letters, and other factors, making phonemicization a non-trivial task. Hand-coded rule-based finite state transducers are capable of high-accuracy phonemicization, but they require explicit knowledge of the formal rules of a language's phonetics. An alternative is to train a sequence-to-sequence deep learning model to learn the task from a phonetics corpus used as supervised training data. Sequence-to-sequence models can handle variable-length inputs and learn temporal dependencies between sequence segments, making them a good fit for this task. In this work we implement such a model.
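As a minimal illustration of the ambiguity described above (a hypothetical sketch, not the paper's model), the example below shows why a context-free, one-to-one letter-to-phoneme table cannot phonemicize English correctly: the same letter maps to different phonemes depending on its neighbors. The `NAIVE_TABLE` mapping and the helper `naive_phonemicize` are invented for this illustration; the phoneme symbols are ARPAbet, with stress marks omitted.

```python
# Hypothetical sketch: a naive one-to-one letter-to-phoneme table,
# to show why context matters in phonemicization.

# The letter "c" alone is ambiguous between /K/ (as in "cat")
# and /S/ (as in "city"); a single table entry must pick one.
NAIVE_TABLE = {"c": "K", "a": "AE", "t": "T", "i": "IH", "y": "IY"}

def naive_phonemicize(word):
    """Map each letter independently, ignoring all context."""
    return [NAIVE_TABLE[ch] for ch in word]

# Reference pronunciations (ARPAbet, stress omitted):
GOLD = {
    "cat": ["K", "AE", "T"],
    "city": ["S", "IH", "T", "IY"],  # "c" before "i" is /S/, not /K/
}

for word, gold in GOLD.items():
    pred = naive_phonemicize(word)
    print(word, pred, "correct" if pred == gold else "WRONG")
```

The table gets "cat" right but predicts `K IH T IY` for "city", since it cannot condition on the following vowel. A sequence-to-sequence model with attention learns exactly this kind of context-dependent mapping from data rather than from hand-written rules.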