Phase II — Unconstrained Optimization | Week 3 | 2.5 hours Benchmark everything you've built. Know when to use each method.
Every Ceres Solver::Options choice maps to this week's content: minimizer_type (LINE_SEARCH vs TRUST_REGION), line_search_direction_type (STEEPEST_DESCENT, LBFGS, BFGS), trust_region_strategy_type (LEVENBERG_MARQUARDT, DOGLEG). Understanding which method wins on which problem is the difference between a 10-second solve and a failed solve.
| Method | Convergence | Per-step | Storage | Hessian? | Best for |
|---|---|---|---|---|---|
| GD | Linear: $O(\kappa)$ | $O(n)$ | $O(n)$ | No | Simple problems, huge $n$ |
| GD + Wolfe | Linear | $O(n)$ | $O(n)$ | No | When gradients cheap |
| Newton | Quadratic | $O(n^3)$ | $O(n^2)$ | Yes | Small dense problems |
| Newton + LS | Quadratic (local) | $O(n^3)$ | $O(n^2)$ | Yes | Medium problems |
| Trust Region (Cauchy) | Linear | $O(n)$ | $O(n)$ | Yes | Robust, handles indefinite $H$ |
| Trust Region (Dogleg) | Superlinear | $O(n^3)$ | $O(n^2)$ | Yes | Ceres default |
| BFGS | Superlinear | $O(n^2)$ | $O(n^2)$ | No | Medium $n$, no Hessian |
| L-BFGS | Superlinear | $O(mn)$ | $O(mn)$ | No | Large $n$, Ceres LINE_SEARCH |
| CG (PR+) | $n$-step (quad) | $O(n)$ | $O(n)$ | No | Very large $n$, sparse |
Is n small (< 1000)?
├── Yes: Can you compute H?
│ ├── Yes: Trust Region Dogleg or Newton + LS
│ └── No: BFGS
└── No:
├── Is H sparse? → Sparse Newton / Trust Region (Ceres default)
├── Can afford O(mn)? → L-BFGS
└── Minimal memory? → CG (PR+)
| Method | Update Rule |
|---|---|
| GD | $x_{k+1} = x_k - \alpha_k \nabla f_k$ |
| Newton | $x_{k+1} = x_k - H_k^{-1}\nabla f_k$ |
| Trust Region | $\min_\delta m(\delta)$ s.t. $\|\delta\| \leq \Delta$, update $\Delta$ via $\rho$ |
| BFGS | $H_{k+1} = (I-\rho sy^T)H_k(I-\rho ys^T) + \rho ss^T$ |
| L-BFGS | Two-loop recursion with $m$ stored $(s,y)$ pairs |
| CG | $d_{k+1} = -g_{k+1} + \beta_k d_k$, $\beta^{PR+} = \max(0, g_{k+1}^T(g_{k+1}-g_k)/\|g_k\|^2)$ |
# See: code/week03/optimizer_shootout.py
Benchmark all 6 methods on 5 test functions across 5 metrics:
Test Functions: 1. Rosenbrock (2D) — narrow curved valley 2. Styblinski-Tang (5D) — many local minima 3. Beale (2D) — flat regions 4. Extended Rosenbrock (20D) — high dimensional 5. Quadratic (50D, $\kappa = 1000$) — controlled conditioning
Metrics: 1. Iterations to convergence 2. Function evaluations 3. Wall-clock time 4. Final error $\|x - x^*\|$ 5. Final gradient norm $\|\nabla f\|$
Match the method: For each scenario, choose the best optimizer: - 3D problem with cheap Hessian → ? - 100,000-variable SLAM → ? - Noisy gradient, no Hessian → ? - Dense 500-variable problem → ?
Failure modes: For each method, describe its worst case: - GD on ill-conditioned quadratic → ? - Newton at a saddle point → ? - BFGS without Wolfe conditions → ? - CG (FR) on non-quadratic → ?
Rate ordering: Sort by convergence speed near the solution: GD, BFGS, Newton, CG.
Memory ordering: Sort by memory usage for $n = 10,000$: Newton, L-BFGS ($m=10$), CG, GD, BFGS.
Ceres mapping: Match each Ceres setting to this week's topic:
- STEEPEST_DESCENT → ?
- LBFGS → ?
- LEVENBERG_MARQUARDT → ?
- DOGLEG → ?
Phase II gate: Solve by hand: $\min f(x,y) = (x-1)^2 + 10(y-x^2)^2$ — apply one step of GD, Newton, and BFGS from $(0,0)$.