1. The Optimization Problem
We want to find the optimal policy parameters \( r \) directly.
Problem Statement
Each \( r \) defines a stationary policy \( \tilde{\mu}(r) \). We determine \( r \) by minimizing the cost:
\[ \min_{r} J_{\tilde{\mu}(r)}(i_0) \]
Or, more generally, by minimizing the expected cost over a distribution of initial states:
\[ \min_{r} E \{ J_{\tilde{\mu}(r)}(i_0) \} \]
Optimization Loop
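The loop repeatedly simulates the policy \( \tilde{\mu}(r) \), estimates its cost, and updates \( r \). A minimal sketch of the cost-estimation step, assuming a hypothetical routine simulate_cost(r, i0) that returns one sampled rollout cost of \( \tilde{\mu}(r) \) starting from state i0 (this interface and all parameter names are illustrative, not from the original):

```python
import numpy as np

def estimate_expected_cost(r, simulate_cost, initial_states, probs,
                           num_rollouts=100):
    """Monte Carlo estimate of E{ J_{mu~(r)}(i0) } over initial states.

    simulate_cost(r, i0) is a hypothetical routine returning one sampled
    rollout cost of the stationary policy mu~(r) starting from state i0.
    """
    total = 0.0
    for i0, p in zip(initial_states, probs):
        # Average several simulated rollouts from this initial state.
        costs = [simulate_cost(r, i0) for _ in range(num_rollouts)]
        total += p * np.mean(costs)
    return total
```

Any derivative-free optimizer, such as the random search and cross entropy methods below, can then be applied to this estimated objective as a function of \( r \).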
2. Random Search Methods
These methods apply to the general minimization \( \min_{r \in R} F(r) \).
How They Work
Given a current parameter \( r^k \):
- Choose points randomly in a neighborhood of \( r^k \).
- Select a new point \( r^{k+1} \) from these samples, aiming for lower cost, as sketched below.
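A minimal random search sketch following these two steps, for a generic objective \( F(r) \) (the Gaussian sampling, the radius sigma, and all parameter names are illustrative assumptions):

```python
import numpy as np

def random_search(F, r0, sigma=0.1, num_samples=20, num_iters=100, rng=None):
    """Basic random search: sample around r^k, keep the best improving point."""
    rng = np.random.default_rng() if rng is None else rng
    r = np.asarray(r0, dtype=float)
    best_cost = F(r)
    for _ in range(num_iters):
        # Choose points randomly in a neighborhood of the current iterate r^k.
        candidates = r + sigma * rng.standard_normal((num_samples, r.size))
        costs = np.array([F(c) for c in candidates])
        # Select a new point r^{k+1} only if it achieves lower cost.
        k = int(np.argmin(costs))
        if costs[k] < best_cost:
            r, best_cost = candidates[k], costs[k]
    return r, best_cost
```

For example, random_search(lambda r: float(np.sum(r**2)), np.ones(3)) drives the iterate toward the origin without using any gradient information.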
Pros
- Robust: Less affected by local minima than gradient methods.
- Flexible: No gradients required; works for discrete optimization.
Cons
- Speed: Can be slow in practice.
Examples: the cross entropy method, described next, is a prominent example of this approach.
3. Cross Entropy Method
An iterative method that resembles a "gradient method" but uses sampling.
Algorithm Steps
- Ellipsoid: At iterate \( r^k \), construct an ellipsoid \( E_k \) centered at \( r^k \).
- Sample: Generate random samples within \( E_k \).
- Select: "Accept" a subset of samples that have low cost.
- Update Mean: Let \( r^{k+1} \) be the sample mean of the accepted samples.
- Update Covariance: Construct a new sample covariance matrix from the accepted samples to define \( E_{k+1} \) (see the sketch below).
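A minimal sketch of these steps, using a Gaussian sampling distribution whose mean and covariance play the role of the ellipsoid \( E_k \); the objective F, the elite fraction, and all parameter names are illustrative assumptions:

```python
import numpy as np

def cross_entropy_method(F, r0, cov0, num_samples=100, elite_frac=0.2,
                         num_iters=50, rng=None):
    """Cross entropy method: sample, accept the low-cost 'elite' samples,
    and refit the mean and covariance to them."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(r0, dtype=float)
    cov = np.asarray(cov0, dtype=float)
    num_elite = max(2, int(elite_frac * num_samples))
    for _ in range(num_iters):
        # Sample: generate random points from the current distribution E_k.
        samples = rng.multivariate_normal(mean, cov, size=num_samples)
        costs = np.array([F(s) for s in samples])
        # Select: "accept" the subset of samples with the lowest cost.
        elite = samples[np.argsort(costs)[:num_elite]]
        # Update mean: r^{k+1} is the sample mean of the accepted samples.
        mean = elite.mean(axis=0)
        # Update covariance: refit the sample covariance to define E_{k+1}
        # (a small diagonal term keeps it positive definite).
        cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(mean.size)
    return mean
```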
Visualizing Cross Entropy
Ideally, the distribution narrows down towards the optimal parameter region.