In this module we are gonna define and "taste" what reinforcement learning is about. We'll also learn one simple algorithm that can solve reinforcement learning problems with embarrassing efficiency.
At the heart of RL: Dynamic Programming
This week we'll consider the reinforcement learning formalisms in a more rigorous, mathematical way. You'll learn how to effectively compute the return your agent gets for a particular action - and how to pick best actions based on that return.
Model-free methods
This week we'll find out how to apply last week's ideas to the real world problems: ones where you don't have a perfect model of your environment.
Approximate Value Based Methods
This week we'll learn to scale things even farther up by training agents based on neural networks.
Policy-based methods
We spent 3 previous modules working on the value-based methods: learning state values, action values and whatnot. Now's the time to see an alternative approach that doesn't require you to predict all future rewards to learn something.
Exploration
In this final week you'll learn how to build better exploration strategies with a focus on contextual bandit setup. In honor track, you'll also learn how to apply reinforcement learning to train structured deep learning models.