1. The Rosetta Stone
Reinforcement Learning (RL) and Dynamic Programming (DP) often describe the same concepts using different words. Here is the translation guide:
| Reinforcement Learning (RL) | Dynamic Programming (DP) / Control |
|---|---|
| Environment | System |
| Agent | Decision Maker / Controller |
| Action | Decision / Control |
| Reward | Negative of Cost |
| Value Function | Negative of Cost Function |
| Action Value (Q-Value) | Q-Factor of State-Control Pair |
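The sign flips in the Reward and Value Function rows can be checked numerically. The sketch below assumes a hypothetical two-state, two-control deterministic system with a discount factor of 0.9. The names `NEXT`, `COST`, `REWARD`, and `solve` are illustrative and not taken from any library. It runs value iteration once as cost minimization (DP view) and once as reward maximization (RL view).

```python
GAMMA = 0.9  # discount factor (assumed for this example)

# Hypothetical deterministic system: 2 states, 2 controls.
NEXT = [[0, 1], [0, 1]]          # NEXT[x][u]: successor state
COST = [[2.0, 1.0], [0.5, 3.0]]  # COST[x][u]: cost per stage (DP view)
REWARD = [[-c for c in row] for row in COST]  # reward = negative cost (RL view)


def solve(stage, pick, sweeps=500):
    """Value iteration. `pick` is min for costs and max for rewards."""
    values = [0.0, 0.0]
    for _ in range(sweeps):
        values = [
            pick(stage[x][u] + GAMMA * values[NEXT[x][u]] for u in range(2))
            for x in range(2)
        ]
    # Q-factor (DP) or action value (RL) of every state-control pair.
    q = [
        [stage[x][u] + GAMMA * values[NEXT[x][u]] for u in range(2)]
        for x in range(2)
    ]
    return values, q


J, q_cost = solve(COST, min)      # DP: optimal cost function and Q-factors
V, q_reward = solve(REWARD, max)  # RL: optimal value function and action values

for x in range(2):
    print(f"state {x}: J = {J[x]:.4f}, V = {V[x]:.4f}")
```

In this example `V[x]` equals `-J[x]` in every state, and each action value is the negative of the corresponding Q-factor, so the greedy action and the optimal control coincide.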
3. Advanced Concepts
Planning vs. Learning
Planning: Solving a DP problem with a known model.
Learning: Solving a DP problem without an explicit model (using simulation/data).
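The contrast can be sketched on a toy problem. The example below assumes a hypothetical two-state, two-control deterministic system. The functions `plan`, `simulator`, and `learn` are illustrative names. `plan` reads the model tables directly, while `learn` runs Q-learning and only ever sees transitions returned by `simulator`. The step size, number of steps, and purely random exploration are arbitrary choices for this sketch.

```python
import random

GAMMA = 0.9  # discount factor (assumed)

# Hypothetical model, used directly by the planner and hidden
# behind `simulator` for the learner.
NEXT = [[0, 1], [0, 1]]          # NEXT[x][u]: successor state
COST = [[2.0, 1.0], [0.5, 3.0]]  # COST[x][u]: cost per stage


def plan(sweeps=500):
    """Planning: value iteration on Q-factors using the known model."""
    q = [[0.0, 0.0], [0.0, 0.0]]
    for _ in range(sweeps):
        q = [
            [COST[x][u] + GAMMA * min(q[NEXT[x][u]]) for u in range(2)]
            for x in range(2)
        ]
    return q


def simulator(x, u):
    """Black box seen by the learner: returns (stage cost, next state)."""
    return COST[x][u], NEXT[x][u]


def learn(steps=20000, step_size=0.5, seed=0):
    """Learning: Q-learning from simulated transitions only."""
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]
    x = 0
    for _ in range(steps):
        u = rng.randrange(2)  # pure random exploration
        cost, x_next = simulator(x, u)
        target = cost + GAMMA * min(q[x_next])
        q[x][u] += step_size * (target - q[x][u])
        x = x_next
    return q


q_plan = plan()
q_learn = learn()
print(q_plan)
print(q_learn)
```

Both routes solve the same DP problem, so on this small deterministic example the two Q-factor tables agree closely. With a stochastic simulator, the learner would typically need a decreasing step size and many more samples.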
Deep RL
Approximate DP in which deep neural networks approximate the value function, the policy, or both.
Self-Learning / Self-Play
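Solving a DP problem using some form of Policy Iteration, often optimistic, meaning each policy is evaluated only approximately (for example, with a few iterations or samples) before it is improved.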
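A minimal sketch of optimistic policy iteration follows, assuming a hypothetical two-state, two-control deterministic system and a known model. The names `q_factor` and `optimistic_policy_iteration` are illustrative. The "optimistic" part is that each policy is evaluated with only `eval_sweeps` updates, warm-started from the previous cost estimate, rather than evaluated to convergence. Self-play systems replace the exact sweeps with simulated games and function approximation, which this sketch does not attempt.

```python
GAMMA = 0.9  # discount factor (assumed)

# Hypothetical deterministic system: 2 states, 2 controls.
NEXT = [[0, 1], [0, 1]]          # NEXT[x][u]: successor state
COST = [[2.0, 1.0], [0.5, 3.0]]  # COST[x][u]: cost per stage
STATES, CONTROLS = range(2), range(2)


def q_factor(J, x, u):
    """Cost of applying u at x, then paying the estimated cost-to-go J."""
    return COST[x][u] + GAMMA * J[NEXT[x][u]]


def optimistic_policy_iteration(policy, eval_sweeps=3, iterations=100):
    J = [0.0 for _ in STATES]
    policy = list(policy)
    for _ in range(iterations):
        # Truncated ("optimistic") evaluation: a few sweeps only,
        # continuing from the current estimate J.
        for _ in range(eval_sweeps):
            J = [q_factor(J, x, policy[x]) for x in STATES]
        # Policy improvement: act greedily with respect to J.
        policy = [min(CONTROLS, key=lambda u: q_factor(J, x, u)) for x in STATES]
    return policy, J


# Start from a deliberately poor policy (stay in the current state).
policy, J = optimistic_policy_iteration(policy=[0, 1])
print(policy, J)
```

Setting `eval_sweeps=1` makes the scheme behave like value iteration, while a very large `eval_sweeps` approaches standard policy iteration with exact evaluation.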
Prediction
Equivalent to Policy Evaluation (computing the value function of a fixed policy).
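As an illustration, the sketch below evaluates one fixed policy in two ways on a hypothetical two-state deterministic system: by iterating the Bellman equation of the policy with the model (DP-style policy evaluation), and by TD(0) updates along a simulated trajectory (RL-style prediction). The names `POLICY`, `evaluate_with_model`, and `evaluate_with_td0`, along with the step size and trajectory length, are assumptions made for this example.

```python
GAMMA = 0.9  # discount factor (assumed)

# Hypothetical deterministic system: 2 states, 2 controls.
NEXT = [[0, 1], [0, 1]]          # NEXT[x][u]: successor state
COST = [[2.0, 1.0], [0.5, 3.0]]  # COST[x][u]: cost per stage
POLICY = [1, 0]                  # fixed policy to evaluate: control per state


def evaluate_with_model(sweeps=500):
    """Policy evaluation: iterate J = g_mu + GAMMA * J(next state)."""
    J = [0.0, 0.0]
    for _ in range(sweeps):
        J = [
            COST[x][POLICY[x]] + GAMMA * J[NEXT[x][POLICY[x]]]
            for x in range(2)
        ]
    return J


def evaluate_with_td0(steps=5000, step_size=0.5):
    """Prediction: TD(0) updates along one trajectory under POLICY."""
    J = [0.0, 0.0]
    x = 0
    for _ in range(steps):
        u = POLICY[x]
        cost, x_next = COST[x][u], NEXT[x][u]  # one simulated transition
        J[x] += step_size * (cost + GAMMA * J[x_next] - J[x])
        x = x_next
    return J


J_model = evaluate_with_model()
J_td = evaluate_with_td0()
print(J_model, J_td)
```

Both functions estimate the same quantity, the cost function of `POLICY`. Neither one searches for a better policy.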