🎯

Optimization Framework

Random Search Methods and the Cross Entropy Method for Policy Optimization.

⚙️

1. The Optimization Problem

We want to find the optimal policy parameters \( r \) directly.

Problem Statement

Each \( r \) defines a stationary policy \( \tilde{\mu}(r) \). We determine \( r \) by minimizing the cost:

\[ \min_{r} J_{\tilde{\mu}(r)}(i_0) \]

Or more generally, minimizing the expected cost over a distribution of initial states:

\[ \min_{r} E \{J_{\tilde{\mu}(r)}(i_0)\} \]
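The expected-cost objective can be estimated by Monte Carlo sampling of initial states. The sketch below is illustrative only: `cost` is a hypothetical stand-in for \( J_{\tilde{\mu}(r)}(i_0) \), and the uniform initial-state distribution is an assumption for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(r, i0):
    """Hypothetical stand-in for J_{mu(r)}(i0): the cost of running
    the stationary policy mu(r) from initial state i0."""
    return (r - 2.0 * i0) ** 2

def expected_cost(r, n_samples=1000):
    """Monte Carlo estimate of E{ J_{mu(r)}(i0) }, here with
    i0 drawn uniformly from {0, 1, 2} (an assumed distribution)."""
    i0 = rng.integers(0, 3, size=n_samples)
    return cost(r, i0).mean()

# Compare two candidate parameter values under the expected-cost objective.
better = min([1.0, 2.0], key=expected_cost)
```

Any optimizer over \( r \) can then use `expected_cost` as the function \( F(r) \) to minimize.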

Optimization Loop

```mermaid
graph LR
    Params("Parameters r") --> Policy("Policy μ(r)")
    Policy --> Cost("Cost J(i0)")
    Cost --> Update("Update r")
    Update --> Params
    style Update fill:#ffe4e6,stroke:#f43f5e
```
🎲

2. Random Search Methods

These methods apply to the general minimization \( \min_{r \in R} F(r) \).

How they work

Given a current parameter \( r^k \):

  1. Choose points randomly in a neighborhood of \( r^k \).
  2. Select a new point \( r^{k+1} \) from these samples (aiming for lower cost).
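The two steps above can be sketched as a simple local random search. The objective `F` here is a hypothetical quadratic stand-in for the policy cost; the neighborhood radius, sample count, and iteration budget are assumed values for the example.

```python
import numpy as np

rng = np.random.default_rng(1)

def F(r):
    """Hypothetical objective to minimize; stands in for the policy cost.
    Minimum at r = (1, -2)."""
    return float(np.sum((np.asarray(r) - np.array([1.0, -2.0])) ** 2))

def random_search(r0, radius=0.5, n_points=10, n_iters=200):
    """Step 1: sample points uniformly in a box around r^k.
    Step 2: move to the best sample, but only if it lowers the cost."""
    r = np.asarray(r0, dtype=float)
    fr = F(r)
    for _ in range(n_iters):
        samples = r + radius * rng.uniform(-1, 1, size=(n_points, r.size))
        costs = [F(s) for s in samples]
        j = int(np.argmin(costs))
        if costs[j] < fr:  # accept r^{k+1} only on improvement
            r, fr = samples[j], costs[j]
    return r, fr

r_star, f_star = random_search([0.0, 0.0])
```

Note that no gradient of `F` is ever computed, which is why the same loop also applies to discrete or nondifferentiable problems.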

Pros

  • Robust: Less affected by local minima than gradient methods.
  • Flexible: No gradients required; works for discrete optimization.

Cons

  • Speed: Can be slow in practice.

Examples

  • Evolutionary Programming
  • Tabu Search
  • Simulated Annealing
  • Cross Entropy Method
🧬

3. Cross Entropy Method

An iterative method that resembles a "gradient method" but uses sampling.

Algorithm Steps

  1. Ellipsoid: At iterate \( r^k \), construct an ellipsoid \( E_k \) centered at \( r^k \).
  2. Sample: Generate random samples within \( E_k \).
  3. Select: "Accept" a subset of samples that have low cost.
  4. Update Mean: Let \( r^{k+1} \) be the sample mean of the accepted samples.
  5. Update Covariance: Construct a new sample covariance matrix from the accepted samples to define \( E_{k+1} \).
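The five steps above can be sketched as follows. This is a minimal illustration, not a production implementation: sampling uses a Gaussian fitted to the accepted samples (whose level sets are the ellipsoids \( E_k \)), the objective `F` is a hypothetical quadratic, and the sample size, elite fraction, and covariance jitter are assumed values.

```python
import numpy as np

rng = np.random.default_rng(2)

def F(r):
    """Hypothetical cost to minimize (optimum at r = (1, -2))."""
    return np.sum((r - np.array([1.0, -2.0])) ** 2, axis=-1)

def cross_entropy(r0, n_samples=100, elite_frac=0.2, n_iters=30):
    mean = np.asarray(r0, dtype=float)
    cov = np.eye(mean.size)              # initial ellipsoid E_0
    n_elite = int(elite_frac * n_samples)
    for _ in range(n_iters):
        # Step 2: sample from the current distribution
        samples = rng.multivariate_normal(mean, cov, size=n_samples)
        # Step 3: accept the lowest-cost subset of samples
        elite = samples[np.argsort(F(samples))[:n_elite]]
        # Step 4: r^{k+1} is the sample mean of the accepted samples
        mean = elite.mean(axis=0)
        # Step 5: refit the covariance to define E_{k+1}
        # (small jitter added to avoid premature collapse)
        cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(mean.size)
    return mean

r_star = cross_entropy([0.0, 0.0])
```

Because both the mean and the covariance are refit each iteration, the sampling ellipsoid shrinks around the low-cost region as the method converges.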

Visualizing Cross Entropy

```mermaid
graph TD
    Start((r_k)) --> Sample("Sample Points")
    Sample --> Evaluate("Evaluate Costs")
    Evaluate --> Filter("Keep Best %")
    Filter --> Mean("Compute New Mean")
    Filter --> Cov("Compute New Cov")
    Mean --> Next((r_k+1))
    Cov --> Next
    style Filter fill:#dbeafe,stroke:#3b82f6
```

Ideally, the sampling distribution narrows toward the optimal parameter region as the iterations proceed.

📝

4. Test Your Knowledge

1. What is the objective function in the optimization framework?

2. What is a key advantage of Random Search methods over gradient-based methods?

3. Do Random Search methods require gradients?

4. In the Cross Entropy method, how is the new parameter \( r^{k+1} \) chosen?

5. What defines the sampling region in the Cross Entropy method?
