Abstract by Jared Slone
Combining Reinforcement Learning Ideas to Solve Complex Problems
Developing reinforcement learning algorithms that can generalize over large discrete action spaces is difficult. Value-based architectures struggle to act in large action spaces because selecting the greedy action requires evaluating the value function for every action, a cost that scales linearly with the size of the action space. Actor-critic models avoid evaluating the value function for every element of the action space, but their continuous policy outputs do not map directly onto discrete action spaces. Wolpertinger is a deep reinforcement learning architecture capable of generalizing over extremely large discrete action spaces. Designed by Google DeepMind in 2016, it borrows ideas from DDPG and other successful, but problem-specific, reinforcement learning algorithms and combines them to achieve new capabilities. Wolpertinger is worth studying both for its practicality and impressive capabilities, and for the unique way it draws together ideas from across the field of reinforcement learning.
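To make the mechanism concrete, the sketch below illustrates Wolpertinger-style action selection under simplifying assumptions: a DDPG-style actor emits a continuous "proto-action", the k nearest discrete actions (by embedding distance) are retrieved, and a critic picks the best of those candidates. The actor, critic, and action embeddings here are random stand-ins, not the learned networks of the actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 10,000 discrete actions, each embedded as a 4-D vector.
# In the real architecture the embeddings, actor, and critic are all learned;
# these stand-ins only illustrate the selection procedure.
action_embeddings = rng.normal(size=(10_000, 4))

def actor(state):
    """Stand-in actor: maps a state to a continuous proto-action."""
    return np.tanh(state[:4])

def critic(state, action_vec):
    """Stand-in critic: scores a (state, action-embedding) pair."""
    return -np.linalg.norm(state[:4] - action_vec)

def wolpertinger_action(state, k=10):
    """Proto-action -> k nearest discrete actions -> highest-scoring one."""
    proto = actor(state)
    # Brute-force k-nearest-neighbor lookup for clarity; at scale an
    # approximate nearest-neighbor index would be used instead.
    dists = np.linalg.norm(action_embeddings - proto, axis=1)
    neighbors = np.argpartition(dists, k)[:k]
    # Refine with the critic: keep the neighbor with the highest value.
    scores = [critic(state, action_embeddings[i]) for i in neighbors]
    return int(neighbors[int(np.argmax(scores))])

state = rng.normal(size=8)
chosen = wolpertinger_action(state, k=10)
```

This two-stage lookup is what lets the approach avoid evaluating the value function over the full action space: the critic only scores k candidates rather than all 10,000 actions.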