1. The Problem
Goal: Decide when to sell a stock to maximize the expected sale price.
- Horizon: \(N\) time periods.
- Initial Price: \(x_0\).
- Dynamics: Price \(x_k\) at period \(k\) evolves as a random walk on \(\{0, 1, \dots, \bar{x}\}\).
- Constraint: Must sell by period \(N\).
Price Evolution
If \(0 < x_k < \bar{x}\):
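\[ x_{k+1} = \begin{cases} x_k + 1 & \text{prob } p^+ \\ x_k & \text{prob } 1 - p^+ - p^- \\ x_k - 1 & \text{prob } p^- \end{cases} \]
At the boundaries \(x_k = 0\) and \(x_k = \bar{x}\) the transitions differ (reflecting or absorbing); they are specified separately and are not covered by this formula.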
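The transition law can be written as a small Python sketch. The boundary behavior is an assumption made only so the code runs (price 0 is absorbing; at \(\bar{x}\) an up-move is blocked and its probability goes to "stay"), and the names `transition_probs` and `sample_next` are hypothetical.

```python
import random
from typing import Dict


def transition_probs(x: int, x_bar: int, p_up: float, p_down: float) -> Dict[int, float]:
    """Return {next_price: probability} for the integer price x in [0, x_bar].

    Interior prices follow the three-way move in the text. The boundary rules are
    ASSUMPTIONS for illustration, not taken from the source:
      * x == 0 is absorbing (the price stays at 0);
      * at x == x_bar an up-move is blocked, so its probability goes to "stay".
    """
    if not 0 <= x <= x_bar:
        raise ValueError("price must lie in [0, x_bar]")
    if p_up < 0 or p_down < 0 or p_up + p_down > 1:
        raise ValueError("need p_up, p_down >= 0 and p_up + p_down <= 1")
    if x == 0:
        return {0: 1.0}
    if x == x_bar:
        return {x_bar: 1.0 - p_down, x_bar - 1: p_down}
    return {x + 1: p_up, x: 1.0 - p_up - p_down, x - 1: p_down}


def sample_next(x: int, x_bar: int, p_up: float, p_down: float, rng: random.Random) -> int:
    """Draw x_{k+1} given x_k = x from the distribution above."""
    probs = transition_probs(x, x_bar, p_up, p_down)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


if __name__ == "__main__":
    # Simulate one 14-period price path from x_0 = 3 (example parameters).
    rng = random.Random(0)
    path = [3]
    for _ in range(14):
        path.append(sample_next(path[-1], 7, 0.25, 0.25, rng))
    print(path)
```

Returning the full distribution, not just a sample, lets the same function serve both exact DP recursions and Monte Carlo simulation.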
2. DP Formulation
Bellman Equation
For \(0 < x_k < \bar{x}\), the optimal reward-to-go \(J^*_k(x_k)\) is:
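\[ J^*_k(x_k) = \max \left\{ x_k, \quad p^+ J^*_{k+1}(x_k+1) + (1 - p^+ - p^-) J^*_{k+1}(x_k) + p^- J^*_{k+1}(x_k-1) \right\} \]
with terminal condition \(J^*_N(x_N) = x_N\), since the stock must be sold at period \(N\).
Interpretation: Take the larger of Selling Now (reward \(x_k\)) and Waiting (expected optimal reward-to-go from the next period).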
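A minimal backward-recursion sketch of this Bellman equation, assuming integer prices in \(0, \dots, \bar{x}\) and the same hypothetical boundary rules as before (0 absorbing, up-move blocked at \(\bar{x}\)); at the boundaries it applies the same sell-or-wait max with those assumed transitions. Because the boundary rules are assumed, its numbers are not meant to reproduce the performance table. `solve_dp` is a hypothetical name.

```python
from typing import Dict, List, Tuple


def transition_probs(x: int, x_bar: int, p_up: float, p_down: float) -> Dict[int, float]:
    # Interior: the three-way move from the text.
    # Boundaries (ASSUMED, not from the source): 0 is absorbing; an up-move at x_bar is blocked.
    if x == 0:
        return {0: 1.0}
    if x == x_bar:
        return {x_bar: 1.0 - p_down, x_bar - 1: p_down}
    return {x + 1: p_up, x: 1.0 - p_up - p_down, x - 1: p_down}


def solve_dp(N: int, x_bar: int, p_up: float, p_down: float) -> Tuple[List[List[float]], List[List[bool]]]:
    """Backward recursion for J*_k(x), k = 0..N, x = 0..x_bar.

    Returns (J, sell): J[k][x] is the optimal reward-to-go and sell[k][x] is True
    when selling at (k, x) is optimal (ties broken in favor of selling).
    """
    J = [[0.0] * (x_bar + 1) for _ in range(N + 1)]
    sell = [[True] * (x_bar + 1) for _ in range(N + 1)]
    # Terminal condition: the stock must be sold at period N, so J*_N(x) = x.
    for x in range(x_bar + 1):
        J[N][x] = float(x)
    for k in range(N - 1, -1, -1):
        for x in range(x_bar + 1):
            # Expected value of waiting one period and acting optimally afterwards.
            wait = sum(p * J[k + 1][y] for y, p in transition_probs(x, x_bar, p_up, p_down).items())
            J[k][x] = max(float(x), wait)  # sell now vs. wait
            sell[k][x] = x >= wait
    return J, sell


if __name__ == "__main__":
    J, sell = solve_dp(14, 7, 0.25, 0.25)
    print("J*_0(3) =", J[0][3])
```

`J[0][x0]` is the optimal expected reward from \(x_0\), and the `sell` table shows where stopping is optimal at each period.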
3. Heuristic & Rollout
Base Heuristic
Sell as soon as the price satisfies \(x_k \geq \beta x_0\) (for a fixed \(\beta > 1\)); otherwise hold until period \(N\), when the sale is forced.
Simple rule, but in general suboptimal.
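The base heuristic's expected reward can be computed exactly by the same backward recursion, with the max replaced by the heuristic's fixed sell/wait decision (policy evaluation). This is a sketch under the same assumed boundary rules as above; `heuristic_sells` and `evaluate_heuristic` are hypothetical names, and \(\beta\) is a free parameter since its value is not given here.

```python
from typing import Dict, List


def transition_probs(x: int, x_bar: int, p_up: float, p_down: float) -> Dict[int, float]:
    # Boundaries are ASSUMED (0 absorbing, up-move blocked at x_bar); interior follows the text.
    if x == 0:
        return {0: 1.0}
    if x == x_bar:
        return {x_bar: 1.0 - p_down, x_bar - 1: p_down}
    return {x + 1: p_up, x: 1.0 - p_up - p_down, x - 1: p_down}


def heuristic_sells(k: int, x: int, N: int, x0: int, beta: float) -> bool:
    """Base heuristic: sell once x_k >= beta * x0, and always at the deadline k == N."""
    return k == N or x >= beta * x0


def evaluate_heuristic(N: int, x_bar: int, p_up: float, p_down: float,
                       x0: int, beta: float) -> List[List[float]]:
    """Exact expected reward V[k][x] of following the base heuristic from (k, x)."""
    V = [[0.0] * (x_bar + 1) for _ in range(N + 1)]
    for k in range(N, -1, -1):
        for x in range(x_bar + 1):
            if heuristic_sells(k, x, N, x0, beta):
                V[k][x] = float(x)
            else:  # here k < N, so row V[k + 1] is already filled in
                V[k][x] = sum(p * V[k + 1][y]
                              for y, p in transition_probs(x, x_bar, p_up, p_down).items())
    return V


if __name__ == "__main__":
    V = evaluate_heuristic(14, 7, 0.25, 0.25, x0=3, beta=1.5)  # beta = 1.5 is hypothetical
    print("heuristic value from x_0 = 3:", V[0][3])
```

The table `V` is exactly the reward-to-go estimate that rollout plugs in after its one-step lookahead.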
Rollout Policy
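At each period \(k\), compare selling now (reward \(x_k\)) with waiting one period and then following the base heuristic; the expected reward of waiting is estimated by simulating the base heuristic from the possible next prices. Sell if \(x_k\) is at least this estimate.
With exact evaluation of the heuristic, rollout is guaranteed to do at least as well as the base heuristic (policy improvement); with sampled estimates this holds only approximately, as the table shows.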
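A Monte Carlo rollout sketch follows, again assuming the hypothetical boundary rules above. `num_samples` plays the role of the sample counts in the table; all function names are illustrative, not from the source.

```python
import random
from typing import Dict


def transition_probs(x: int, x_bar: int, p_up: float, p_down: float) -> Dict[int, float]:
    # Boundaries are ASSUMED (0 absorbing, up-move blocked at x_bar); interior follows the text.
    if x == 0:
        return {0: 1.0}
    if x == x_bar:
        return {x_bar: 1.0 - p_down, x_bar - 1: p_down}
    return {x + 1: p_up, x: 1.0 - p_up - p_down, x - 1: p_down}


def sample_next(x: int, x_bar: int, p_up: float, p_down: float, rng: random.Random) -> int:
    probs = transition_probs(x, x_bar, p_up, p_down)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


def simulate_heuristic(k: int, x: int, N: int, x_bar: int, p_up: float, p_down: float,
                       x0: int, beta: float, rng: random.Random) -> float:
    """One sampled sale price when following the base heuristic from (k, x)."""
    while k < N and x < beta * x0:  # hold until the threshold or the deadline
        x = sample_next(x, x_bar, p_up, p_down, rng)
        k += 1
    return float(x)


def rollout_sells(k: int, x: int, N: int, x_bar: int, p_up: float, p_down: float,
                  x0: int, beta: float, num_samples: int, rng: random.Random) -> bool:
    """Rollout decision at (k, x): sell now vs. wait one period, then follow the heuristic."""
    if k == N:
        return True  # forced sale at the deadline
    total = 0.0
    for _ in range(num_samples):
        y = sample_next(x, x_bar, p_up, p_down, rng)  # one-step lookahead
        total += simulate_heuristic(k + 1, y, N, x_bar, p_up, p_down, x0, beta, rng)
    return x >= total / num_samples  # compare with the Monte Carlo estimate of waiting


def run_rollout(N: int, x_bar: int, p_up: float, p_down: float, x0: int, beta: float,
                num_samples: int, rng: random.Random) -> float:
    """Simulate one trajectory under the rollout policy and return the sale price."""
    x = x0
    for k in range(N + 1):
        if rollout_sells(k, x, N, x_bar, p_up, p_down, x0, beta, num_samples, rng):
            return float(x)
        x = sample_next(x, x_bar, p_up, p_down, rng)
    raise RuntimeError("unreachable: the sale is forced at k == N")
```

Averaging `run_rollout` over many independent trajectories estimates the rollout policy's expected reward. With a small `num_samples` the estimate of the waiting value is noisy, so individual sell/wait decisions can be wrong even where exact rollout would improve on the heuristic.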
Performance Comparison
| Method | Expected Reward (\(x_0=3\)) | Notes |
|---|---|---|
| Base Heuristic | 2.268 | Simple threshold rule. |
| Rollout (20 samples) | 2.264 | Monte Carlo noise outweighs the improvement. |
| Rollout (200 samples) | 2.273 | Slightly beats the heuristic. |
| Optimal Policy | 2.400 | Exact DP solution. |
Values are for the example parameters \(N=14\), \(\bar{x}=7\), \(p^+=p^-=0.25\).