
Abstract by Joshua Greaves

Personal Information


Presenter's Name

Joshua Greaves

Degree Level

Undergraduate

Co-Authors

Kolby Nottingham

Abstract Information


Department

Computer Science

Faculty Advisor

David Wingate

Title

Sample-Efficient Distributional Policy Gradients

Abstract

Policy gradient algorithms are desirable because they are conceptually simple and often easy to implement. However, a major drawback is that they suffer from sample inefficiency. One solution is to use off-policy policy gradients, but these are prone to instability outside of a trust region. We present a distributional actor-critic policy gradient algorithm that retains this simplicity while using off-policy data to increase sample efficiency. We use a distributional critic over multiple policies to capture more information about the structure of the task, and we backpropagate through this critic to train the actor. We show that this makes the actor updates more sample-efficient, since the actor is trained with off-policy data that incorporates information from multiple ways of interacting with the environment.
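
For illustration only, here is a minimal PyTorch-style sketch of the core update described above: a categorical distributional critic whose return distribution is differentiated through to train the actor. This is not the authors' implementation; all names and hyperparameters (NUM_ATOMS, V_MIN, V_MAX, layer sizes, batch shapes) are assumptions made for the example.

    # A minimal sketch, NOT the authors' implementation: module names and
    # hyperparameters (NUM_ATOMS, V_MIN, V_MAX, layer sizes) are
    # illustrative assumptions.
    import torch
    import torch.nn as nn

    NUM_ATOMS, V_MIN, V_MAX = 51, -10.0, 10.0
    support = torch.linspace(V_MIN, V_MAX, NUM_ATOMS)  # fixed return atoms

    class Actor(nn.Module):
        """Deterministic policy: maps observations to actions."""
        def __init__(self, obs_dim, act_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, act_dim), nn.Tanh(),
            )
        def forward(self, obs):
            return self.net(obs)

    class Critic(nn.Module):
        """Distributional critic: logits over return atoms for (s, a)."""
        def __init__(self, obs_dim, act_dim, hidden=256):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(obs_dim + act_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, NUM_ATOMS),
            )
        def forward(self, obs, act):
            return self.net(torch.cat([obs, act], dim=-1))  # (B, NUM_ATOMS)

    def actor_loss(actor, critic, obs):
        """Train the actor by backpropagating through the critic:
        maximize the expected return of the critic's distribution."""
        act = actor(obs)
        probs = torch.softmax(critic(obs, act), dim=-1)
        expected_return = (probs * support).sum(dim=-1)
        return -expected_return.mean()  # gradients flow through `act`

    # Example update on a batch of off-policy observations (e.g., drawn
    # from a replay buffer filled by multiple behavior policies):
    actor, critic = Actor(obs_dim=8, act_dim=2), Critic(obs_dim=8, act_dim=2)
    obs_batch = torch.randn(32, 8)
    actor_loss(actor, critic, obs_batch).backward()

The critic's own training step (a distributional cross-entropy update toward a projected target, as in C51/D4PG-style methods) is omitted here; the sketch only shows how gradients from the return distribution reach the actor, which is the mechanism the abstract describes.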