1. The Optimization Problem
We want to find the optimal policy parameters \( r \) directly.
Problem Statement
Each \( r \) defines a stationary policy \( \tilde{\mu}(r) \). We determine \( r \) by minimizing the cost:
\[ \min_{r} J_{\tilde{\mu}(r)}(i_0) \]
Or, more generally, by minimizing the expected cost over a distribution of initial states:
\[ \min_{r} E \{ J_{\tilde{\mu}(r)}(i_0) \} \]
Optimization Loop
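The loop repeatedly simulates the policy \( \tilde{\mu}(r) \), estimates its cost, and updates \( r \). A minimal sketch of the cost-estimation step, assuming a hypothetical routine simulate_cost(r, i0) that returns one sampled rollout cost of \( \tilde{\mu}(r) \) starting from state i0 (this interface and all parameter names are illustrative, not from the original):

```python
import numpy as np

def estimate_expected_cost(r, simulate_cost, initial_states, probs,
                           num_rollouts=100):
    """Monte Carlo estimate of E{ J_{mu~(r)}(i0) } over initial states.

    simulate_cost(r, i0) is a hypothetical routine returning one sampled
    rollout cost of the stationary policy mu~(r) starting from state i0.
    """
    total = 0.0
    for i0, p in zip(initial_states, probs):
        # Average several simulated rollouts from this initial state.
        costs = [simulate_cost(r, i0) for _ in range(num_rollouts)]
        total += p * np.mean(costs)
    return total
```

Any derivative-free optimizer, such as the random search and cross entropy methods below, can then be applied to this estimated objective as a function of \( r \).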
2. Random Search Methods
These methods apply to the general minimization \( \min_{r \in R} F(r) \).
How They Work
Given a current parameter \( r^k \):
- Choose points randomly in a neighborhood of \( r^k \).
- Select a new point \( r^{k+1} \) from these samples, aiming for lower cost, as sketched below.
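A minimal random search sketch following these two steps, for a generic objective \( F(r) \) (the Gaussian sampling, the radius sigma, and all parameter names are illustrative assumptions):

```python
import numpy as np

def random_search(F, r0, sigma=0.1, num_samples=20, num_iters=100, rng=None):
    """Basic random search: sample around r^k, keep the best improving point."""
    rng = np.random.default_rng() if rng is None else rng
    r = np.asarray(r0, dtype=float)
    best_cost = F(r)
    for _ in range(num_iters):
        # Choose points randomly in a neighborhood of the current iterate r^k.
        candidates = r + sigma * rng.standard_normal((num_samples, r.size))
        costs = np.array([F(c) for c in candidates])
        # Select a new point r^{k+1} only if it achieves lower cost.
        k = int(np.argmin(costs))
        if costs[k] < best_cost:
            r, best_cost = candidates[k], costs[k]
    return r, best_cost
```

For example, random_search(lambda r: float(np.sum(r**2)), np.ones(3)) drives the iterate toward the origin without using any gradient information.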
Pros
- Robust: Less affected by local minima than gradient methods.
- Flexible: No gradients required; works for discrete optimization.
Cons
- Speed: Can be slow in practice.
Examples: the cross entropy method, described next, is a prominent example of this approach.
3. Cross Entropy Method
An iterative method that resembles a "gradient method" but uses sampling.
Algorithm Steps
- Ellipsoid: At iterate \( r^k \), construct an ellipsoid \( E_k \) centered at \( r^k \).
- Sample: Generate random samples within \( E_k \).
- Select: "Accept" a subset of samples that have low cost.
- Update Mean: Let \( r^{k+1} \) be the sample mean of the accepted samples.
- Update Covariance: Construct a new sample covariance matrix from the accepted samples to define \( E_{k+1} \) (see the sketch below).
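A minimal sketch of these steps, using a Gaussian sampling distribution whose mean and covariance play the role of the ellipsoid \( E_k \); the objective F, the elite fraction, and all parameter names are illustrative assumptions:

```python
import numpy as np

def cross_entropy_method(F, r0, cov0, num_samples=100, elite_frac=0.2,
                         num_iters=50, rng=None):
    """Cross entropy method: sample, accept the low-cost 'elite' samples,
    and refit the mean and covariance to them."""
    rng = np.random.default_rng() if rng is None else rng
    mean = np.asarray(r0, dtype=float)
    cov = np.asarray(cov0, dtype=float)
    num_elite = max(2, int(elite_frac * num_samples))
    for _ in range(num_iters):
        # Sample: generate random points from the current distribution E_k.
        samples = rng.multivariate_normal(mean, cov, size=num_samples)
        costs = np.array([F(s) for s in samples])
        # Select: "accept" the subset of samples with the lowest cost.
        elite = samples[np.argsort(costs)[:num_elite]]
        # Update mean: r^{k+1} is the sample mean of the accepted samples.
        mean = elite.mean(axis=0)
        # Update covariance: refit the sample covariance to define E_{k+1}
        # (a small diagonal term keeps it positive definite).
        cov = np.cov(elite, rowvar=False) + 1e-6 * np.eye(mean.size)
    return mean
```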
Visualizing Cross Entropy
Ideally, the distribution narrows down towards the optimal parameter region.