RL vs DP Terminology | Sequential Decision Making

🗿

1. The Rosetta Stone

Reinforcement Learning (RL) and Dynamic Programming (DP) often describe the same concepts using different words. Here is the translation guide:

| Reinforcement Learning (RL) | Dynamic Programming (DP) / Control |
|-----------------------------|------------------------------------|
| Environment                 | System                             |
| Agent                       | Decision Maker / Controller        |
| Action                      | Decision / Control                 |
| Reward                      | (Negative) Cost                    |
| Value Function              | (Negative) Cost Function           |
| Action Value (Q-Value)      | Q-Factor of State-Control Pair     |
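
The sign convention in the last three rows can be checked numerically. The sketch below is a minimal illustration and assumes a made-up two-state, two-action problem (the names `step`, `reward`, `cost`, and `solve` and all numbers are hypothetical). It solves the same problem once by maximizing rewards and once by minimizing costs, where each cost is the negative of the corresponding reward.

```python
# Hypothetical two-state, two-action problem; all numbers are made up.
GAMMA = 0.9          # discount factor
STATES = [0, 1]
ACTIONS = [0, 1]     # 0 = stay, 1 = move to the other state


def step(state, action):
    """Deterministic system equation: next state for a state-control pair."""
    return state if action == 0 else 1 - state


def reward(state, action):
    """RL view: reward collected for taking an action in a state."""
    table = {(0, 0): 0.0, (0, 1): -1.0, (1, 0): 2.0, (1, 1): 0.0}
    return table[(state, action)]


def cost(state, action):
    """DP view: cost of applying a control, defined as the negative reward."""
    return -reward(state, action)


def solve(stage_fn, pick, sweeps=500):
    """Value iteration; `pick` is max for rewards and min for costs."""
    values = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        values = {
            s: pick(stage_fn(s, a) + GAMMA * values[step(s, a)] for a in ACTIONS)
            for s in STATES
        }
    return values


V = solve(reward, max)   # value function (RL vocabulary)
J = solve(cost, min)     # cost function (DP vocabulary)

# Q-value of an action (RL) vs. Q-factor of a state-control pair (DP).
Q_rl = {(s, a): reward(s, a) + GAMMA * V[step(s, a)] for s in STATES for a in ACTIONS}
Q_dp = {(s, a): cost(s, a) + GAMMA * J[step(s, a)] for s in STATES for a in ACTIONS}

for s in STATES:
    print(f"state {s}: V = {V[s]:.3f}, J = {J[s]:.3f}")
```

Under these assumptions the two solutions differ only in sign: `J[s]` equals `-V[s]` for every state, and each Q-factor is the negative of the matching Q-value.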

🧠

Planning: Solving a DP problem with a known model.
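
As a rough illustration of planning, the sketch below runs value iteration on a small problem whose transition probabilities and rewards are fully known. The `MODEL` dictionary, its numbers, and the helper names are assumptions made for this example only.

```python
GAMMA = 0.9  # discount factor

# Hypothetical known model: MODEL[state][action] is a list of
# (probability, next_state, reward) triples. Numbers are illustrative.
MODEL = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)],
        1: [(1.0, 0, 0.0)]},
}


def q_from_model(values, state, action):
    """Expected one-step reward plus discounted value, computed from the model."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in MODEL[state][action])


def value_iteration(tol=1e-10):
    """Repeat the Bellman optimality update until the values stop changing."""
    values = {s: 0.0 for s in MODEL}
    while True:
        new_values = {
            s: max(q_from_model(values, s, a) for a in MODEL[s]) for s in MODEL
        }
        if max(abs(new_values[s] - values[s]) for s in MODEL) < tol:
            return new_values
        values = new_values


def greedy_policy(values):
    """Pick, in each state, the action with the largest model-based Q-value."""
    return {s: max(MODEL[s], key=lambda a: q_from_model(values, s, a)) for s in MODEL}


V = value_iteration()
policy = greedy_policy(V)
print(V, policy)
```

Every expectation here is computed directly from `MODEL`; no samples are drawn, which is what separates planning from learning in this vocabulary.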

Learning: Solving a DP problem without an explicit model (using simulation/data).
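
For contrast, here is a minimal sketch of learning from simulated data, using tabular Q-learning as one possible model-free method. The `simulate` function stands in for a black-box simulator and is hypothetical, as are the step size, the number of steps, and the uniformly random exploration; the learner only sees sampled transitions, never the transition probabilities.

```python
import random

GAMMA = 0.9
STATES = [0, 1]
ACTIONS = [0, 1]  # 0 = stay, 1 = try to move to the other state


def simulate(state, action, rng):
    """Hypothetical black-box simulator: returns (next_state, reward).

    The learner never inspects the probabilities hidden inside this function.
    """
    if state == 0:
        if action == 0:
            return 0, 0.0
        return (1, 1.0) if rng.random() < 0.8 else (0, 0.0)
    if action == 0:
        return 1, 2.0
    return 0, 0.0


def q_learning(num_steps=50_000, step_size=0.05, seed=0):
    """Tabular Q-learning driven only by sampled transitions."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    state = 0
    for _ in range(num_steps):
        action = rng.choice(ACTIONS)  # uniform exploration (off-policy)
        next_state, reward = simulate(state, action, rng)
        target = reward + GAMMA * max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += step_size * (target - q[(state, action)])
        state = next_state
    return q


Q = q_learning()
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in STATES}
print(policy)
```

With a constant step size the Q-values keep fluctuating slightly around their limits, so the learned numbers are approximate even though the greedy policy is usually stable.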

Deep Reinforcement Learning: Approximate DP using value and/or policy approximation with deep neural networks.

Self-Learning (or Self-Play): Solving a DP problem using some form of Policy Iteration (often optimistic).
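
A minimal sketch of optimistic policy iteration follows, assuming a small known model (the `MODEL` dictionary and all numbers are hypothetical). "Optimistic" here means that each policy is evaluated with only a few sweeps rather than to convergence before the policy is improved.

```python
GAMMA = 0.9

# Hypothetical model: MODEL[state][action] is a list of
# (probability, next_state, reward) triples. Numbers are illustrative.
MODEL = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)],
        1: [(1.0, 0, 0.0)]},
}


def q_value(values, state, action):
    """Expected one-step reward plus discounted value of the next state."""
    return sum(p * (r + GAMMA * values[s2]) for p, s2, r in MODEL[state][action])


def optimistic_policy_iteration(eval_sweeps=3, iterations=200):
    values = {s: 0.0 for s in MODEL}
    policy = {s: 0 for s in MODEL}  # arbitrary initial policy
    for _ in range(iterations):
        # Truncated (optimistic) evaluation: a few sweeps under the current policy.
        for _sweep in range(eval_sweeps):
            values = {s: q_value(values, s, policy[s]) for s in MODEL}
        # Improvement: act greedily with respect to the current value estimates.
        policy = {
            s: max(MODEL[s], key=lambda a: q_value(values, s, a)) for s in MODEL
        }
    return policy, values


policy, values = optimistic_policy_iteration()
print(policy, values)
```

Setting `eval_sweeps` very high approaches standard policy iteration, while `eval_sweeps=1` behaves much like value iteration.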

Prediction: Equivalent to Policy Evaluation (finding the value of a fixed policy).
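
Policy evaluation can be sketched as a fixed-point iteration in which the policy never changes. The model, the `always_stay` policy, and the tolerance below are assumptions chosen for illustration.

```python
GAMMA = 0.9

# Hypothetical model: MODEL[state][action] is a list of
# (probability, next_state, reward) triples. Numbers are illustrative.
MODEL = {
    0: {0: [(1.0, 0, 0.0)],
        1: [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {0: [(1.0, 1, 2.0)],
        1: [(1.0, 0, 0.0)]},
}


def evaluate_policy(policy, tol=1e-10):
    """Iterate the Bellman equation for a fixed policy until it converges."""
    values = {s: 0.0 for s in MODEL}
    while True:
        new_values = {
            s: sum(p * (r + GAMMA * values[s2]) for p, s2, r in MODEL[s][policy[s]])
            for s in MODEL
        }
        if max(abs(new_values[s] - values[s]) for s in MODEL) < tol:
            return new_values
        values = new_values


always_stay = {0: 0, 1: 0}  # fixed policy: choose action 0 in every state
V_stay = evaluate_policy(always_stay)
print(V_stay)
```

No maximization over actions appears anywhere in `evaluate_policy`; the output describes how good the given policy is, not what the best policy would be.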

📝