Stephen Tu on "Least Squares Policy Iteration"

About the show

Meetup: http://bit.ly/2y3Qc1j
Paper: https://users.cs.duke.edu/~parr/jmlr03.pdf
Slides: http://bit.ly/2wkPmsN
Video: https://youtu.be/WpHPMqzufJY

-----------------------------------------------------------------------------------
Sponsored and hosted by Two Sigma (@twosigma)
-----------------------------------------------------------------------------------

Description
------------------
Policy iteration is a classic dynamic programming algorithm for solving a Markov Decision Process (MDP). In policy iteration, the algorithm alternates between two steps: 1) a policy evaluation step, and 2) a policy improvement step. When the number of states and actions of the MDP is finite and small, policy iteration performs well and comes with nice theoretical guarantees. However, when the state and action spaces are large (possibly continuous), policy iteration becomes intractable, and approximate methods for solving MDPs must be used...
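
For readers new to the idea, the two alternating steps can be sketched for the small, finite case. This is a minimal illustration of exact tabular policy iteration, not the least-squares method covered in the talk or paper. The two-state MDP, the transition table P, the discount factor gamma, and all function names are assumptions made up for this example.

```python
# Hypothetical toy MDP: P[s][a] is a list of (probability, next_state, reward).
P = [
    [[(1.0, 0, 0.0)], [(1.0, 1, 1.0)]],  # state 0: stay (reward 0) or move to state 1 (reward 1)
    [[(1.0, 1, 2.0)], [(1.0, 0, 0.0)]],  # state 1: stay (reward 2) or move to state 0 (reward 0)
]


def q_value(P, V, s, a, gamma):
    """Expected one-step return of taking action a in state s, then following V."""
    return sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])


def evaluate_policy(P, policy, gamma, tol=1e-10):
    """Step 1: iterate the Bellman equation for a fixed policy until values settle."""
    V = [0.0] * len(P)
    while True:
        delta = 0.0
        for s in range(len(P)):
            v_new = q_value(P, V, s, policy[s], gamma)
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new
        if delta < tol:
            return V


def improve_policy(P, V, gamma):
    """Step 2: act greedily with respect to the current value estimates."""
    return [
        max(range(len(P[s])), key=lambda a: q_value(P, V, s, a, gamma))
        for s in range(len(P))
    ]


def policy_iteration(P, gamma=0.9):
    """Alternate evaluation and improvement until the policy stops changing."""
    policy = [0] * len(P)
    while True:
        V = evaluate_policy(P, policy, gamma)
        new_policy = improve_policy(P, V, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy


policy, V = policy_iteration(P)
print(policy, V)
```

The evaluation step loops over every state and the improvement step loops over every state-action pair, which is why this exact form stops being practical once the state and action spaces grow large.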
