1. Deterministic Problems
We want to minimize \( J_{\tilde{\mu}(r)}(i_0) \) over \( r \) using the gradient method:
\[ r^{k+1} = r^k - \gamma^k \nabla J_{\tilde{\mu}(r^k)}(i_0) \]
The Challenge
The gradient \( \nabla J \) is often not explicitly available.
Solution: Approximate it by finite differences of cost function values.
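As a concrete illustration, here is a minimal Python sketch of the gradient iteration with a central finite-difference gradient. The cost callable `J`, the stepsize `gamma`, and the differencing step `h` are illustrative placeholders, not part of the original text:

```python
import numpy as np

def fd_gradient(J, r, h=1e-5):
    """Central finite-difference approximation of grad J at r.

    J is a callable returning the scalar cost (assumed exact here),
    r is the parameter vector, h is the differencing step.
    """
    g = np.zeros_like(r)
    for j in range(len(r)):
        e = np.zeros_like(r)
        e[j] = h
        g[j] = (J(r + e) - J(r - e)) / (2 * h)  # two evaluations per coordinate
    return g

def gradient_method(J, r0, gamma=0.1, num_iters=100):
    """The iteration r^{k+1} = r^k - gamma * grad J(r^k), with FD gradients."""
    r = np.array(r0, dtype=float)
    for _ in range(num_iters):
        r = r - gamma * fd_gradient(J, r)  # constant stepsize for simplicity
    return r

# Example: minimize a simple quadratic stand-in for the cost over r.
print(gradient_method(lambda r: np.sum((r - 1.0) ** 2), r0=[0.0, 0.0]))
```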
Deterministic Case
Finite differences work well because the cost function evaluation is exact (no noise).
Stochastic Case
Cost values are noisy, since they can only be evaluated approximately by Monte Carlo simulation. Differencing noisy values leads to very poor gradient estimates: the difference of two noisy costs is divided by a small step, which amplifies the noise.
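A quick numerical illustration of this effect (a sketch; the cost \( F \), noise level \( \sigma \), and steps \( h \) are assumed for illustration, not from the original): with a central difference, the standard deviation of the estimate scales like \( \sigma/(\sqrt{2}\,h) \), which blows up as \( h \to 0 \).

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 0.1  # Monte Carlo noise level (assumed for illustration)

def noisy_F(x):
    """Noisy evaluation of F(x) = x^2; the true derivative at x = 1 is 2."""
    return x**2 + sigma * rng.standard_normal()

for h in (1e-1, 1e-3, 1e-5):
    est = [(noisy_F(1 + h) - noisy_F(1 - h)) / (2 * h) for _ in range(1000)]
    print(f"h = {h:.0e}: mean = {np.mean(est):+8.2f}, std = {np.std(est):.1f}")
# std grows like sigma / (sqrt(2) * h): roughly 0.7, 71, 7071 -- useless for small h.
```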
2. Stochastic Problems
To handle stochastic problems, we take an unusual step: convert the minimization of \( F(z) \) into a stochastic optimization problem over probability distributions.
The Transformation
Instead of \( \min_{z \in Z} F(z) \), we solve:
\[ \min_{p \in \mathcal{P}_Z} E_p \{ F(z) \} \]
- \( z \): Random variable (e.g., a state-control trajectory).
- \( \mathcal{P}_Z \): Set of probability distributions over \( Z \).
- \( p \): A generic distribution (the policy).
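This enlargement does not change the optimal value: \( E_p\{F(z)\} \) is linear in \( p \), so (assuming the minimum of \( F \) is attained) it is minimized by a point mass at a minimizer of \( F \):
\[ \min_{p \in \mathcal{P}_Z} E_p\{ F(z) \} = \min_{z \in Z} F(z), \qquad \text{attained at } p = \delta_{z^*}, \ z^* \in \arg\min_{z \in Z} F(z). \]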
Relation to DP
For this to apply to DP, we must enlarge the set of policies to include randomized policies.
Note: In standard DP, optimization over randomized policies yields the same optimal cost as deterministic policies. However, this "smoothing" allows us to compute gradients more easily (as we will see).
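As a preview of why the smoothing helps (a sketch under assumptions not in the original): if \( p_\theta \) is a smooth parametric family of distributions, the likelihood-ratio identity
\[ \nabla_\theta E_{p_\theta}\{ F(z) \} = E_{p_\theta}\{ F(z)\, \nabla_\theta \log p_\theta(z) \} \]
yields unbiased gradient estimates from samples of \( z \) alone, with no differencing of noisy cost values. A minimal Python sketch over a finite \( Z \) with a softmax-parameterized \( p_\theta \) (all names and values illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
F = np.array([3.0, 1.0, 2.0])   # costs F(z) over a finite Z = {0, 1, 2} (assumed)
theta = np.zeros(3)             # parameters of the softmax distribution p_theta

def sample_gradient(theta, n=1000):
    """Unbiased estimate of grad_theta E_{p_theta}{F(z)} from samples of z."""
    p = np.exp(theta) / np.exp(theta).sum()
    z = rng.choice(len(F), size=n, p=p)       # draw z ~ p_theta
    scores = np.eye(len(F))[z] - p            # grad log p_theta(z) for a softmax
    return (F[z][:, None] * scores).mean(axis=0)

for _ in range(200):            # plain gradient descent on E_{p_theta}{F(z)}
    theta -= 0.5 * sample_gradient(theta)

p = np.exp(theta) / np.exp(theta).sum()
print(p)  # mass concentrates on argmin F, i.e., z = 1
```

Consistent with the point-mass observation above, the optimized distribution concentrates on a minimizer of \( F \), but the optimization itself uses only sampled costs and smooth gradients of \( \log p_\theta \).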