Day 18: Trust Region Methods

Phase II — Unconstrained Optimization | Week 3 | 2.5 hours Restrict the step to where your model is trustworthy.

Previous: Day 17: Newton's Method
Next: Day 19: Quasi-Newton BFGS
Week: Week 3 Overview
Curriculum: CURRICULUM.md

OKS Relevance

Ceres' default strategy is LEVENBERG_MARQUARDT — which is a trust region method applied to least squares. When you see trust_region_strategy_type in a Ceres solver config, that's this. Trust regions handle indefinite Hessians, saddle points, and far-from-solution starts — all common in SLAM. The DOGLEG option in Ceres is exactly what you'll implement today.

Theory (45 min)

18.1 The Trust Region Idea

Instead of choosing a direction then a step size (line search), trust region methods: 1. Build a quadratic model $m_k(\delta)$ 2. Restrict $\|\delta\| \leq \Delta_k$ (trust region radius) 3. Solve the constrained subproblem 4. Update $\Delta_k$ based on model quality

18.2 The Subproblem

$$\min_\delta \; m_k(\delta) = f_k + g_k^T \delta + \frac{1}{2}\delta^T H_k \delta \quad \text{s.t.} \quad \|\delta\| \leq \Delta_k$$

18.3 Cauchy Point

The simplest solution: steepest descent restricted to the trust region.

$$\delta_C = -\tau \frac{\Delta}{\|g\|} g$$

where $\tau = \min\left(1, \frac{\|g\|^3}{\Delta \, g^T H g}\right)$ if $g^THg > 0$, else $\tau = 1$.

Guarantee: Using the Cauchy point gives global convergence (but slow — linear rate like GD).

18.4 Dogleg Method

Interpolate between the Cauchy point and the full Newton step:

Newton step $\delta_N = -H^{-1}g$ (if $H \succ 0$)
Cauchy step $\delta_C$ (steepest descent clipped to $\Delta$)
If $\|\delta_N\| \leq \Delta$: take Newton step
If $\|\delta_C\| \geq \Delta$: take scaled gradient step on boundary
Otherwise: interpolate along the dogleg path (Cauchy → Newton) to hit the boundary

18.5 Radius Update

Compute the actual-to-predicted reduction ratio:

$$\rho_k = \frac{f(x_k) - f(x_k + \delta_k)}{m_k(0) - m_k(\delta_k)}$$

$\rho_k$	Action
$< 0.25$	Shrink: $\Delta_{k+1} = 0.25 \Delta_k$
$\in [0.25, 0.75)$	Keep: $\Delta_{k+1} = \Delta_k$
$\geq 0.75$ and $\\|\delta\\| = \Delta$	Expand: $\Delta_{k+1} = 2\Delta_k$

If $\rho_k < 0$ (model was worse): reject step, shrink radius, try again.

18.6 Trust Region vs Line Search

Aspect	Line Search	Trust Region
Strategy	Pick direction, then step size	Pick step within ball
Handles indefinite $H$?	Needs modification	Naturally
Robustness	Can fail to find step	Always produces a step
Default in Ceres	`LINE_SEARCH`	`TRUST_REGION` (default)

Implementation (60 min)

Exercise 1: Cauchy Point Trust Region

# See: code/week03/trust_region.py

Implement the simplest trust region using only the Cauchy point.

Exercise 2: Dogleg Trust Region

Implement the full dogleg method with Newton/Cauchy interpolation.

Exercise 3: Comparison

Compare trust region vs line search Newton on Rosenbrock from various starting points.

Exercise 4: Saddle Point

Create a function with a saddle point (e.g., $f = x^2 - y^2$). Show trust region navigates away while pure Newton may converge to the saddle.

Exercise 5: Visualization

On a contour plot: show the trust region circle, Cauchy point, Newton point, dogleg path, and how the radius adapts over iterations.

Practice (45 min)

Trust region vs line search: When to prefer which? Create a comparison table.
Connection to Levenberg-Marquardt: Show that $(H + \lambda I)\delta = -g$ with $\lambda$ chosen to satisfy $\|\delta\| = \Delta$ is equivalent to the trust region subproblem.
What happens if the model is perfect ($\rho = 1$ always)? The trust region expands to infinity → becomes Newton.
What's the minimum number of function evaluations per trust region iteration?
For a diagonal Hessian, derive the trust region solution analytically.
Why does the Cauchy point guarantee global convergence? (Sufficient decrease ∝ $\|g\|^2$.)
Derive the dogleg path parametrically: $p(\tau) = (1-\tau)\delta_C + \tau\delta_N$ for $\tau \in [0,1]$... but actually it's piecewise! Explain.
In Ceres, DOGLEG vs LEVENBERG_MARQUARDT: when to switch? (Hint: DOGLEG needs $H \succ 0$.)

Expert Challenges

🎯 Challenge 1: Steihaug-Toint CG for the trust region subproblem

**Problem:** For large-scale problems where forming $H$ is prohibitive, solve the trust region subproblem approximately using CG, stopping when: (a) CG converges, (b) the iterate leaves the trust region, or (c) negative curvature is detected. Implement this "truncated CG" approach. **Solution:** Run CG on $H\delta = -g$. At each CG step $k$: (1) check $\|\delta_k\| \leq \Delta$ — if boundary hit, interpolate to boundary; (2) check $d_k^T H d_k > 0$ — if negative curvature, step to boundary along $d_k$. This gives a point at least as good as Cauchy (so global convergence holds) but approaches Newton quality when CG succeeds. Cost: $O(n)$ per CG step × few CG steps. Ceres uses this for `ITERATIVE_SCHUR`.

🎯 Challenge 2: Exact trust region solution via secular equation

**Problem:** The exact solution of the trust region subproblem is $(H + \lambda^*I)\delta = -g$ with $\lambda^* \geq 0$ satisfying complementarity: $\lambda^*(\|\delta\| - \Delta) = 0$. Find $\lambda^*$ by solving the scalar secular equation $\|\delta(\lambda)\| = \Delta$. Implement using Newton's method on the secular equation. **Solution:** Define $\phi(\lambda) = \frac{1}{\|\delta(\lambda)\|} - \frac{1}{\Delta}$ where $\delta(\lambda) = -(H+\lambda I)^{-1}g$. Use eigendecomposition $H = Q\Lambda Q^T$: then $\delta(\lambda) = -\sum_i \frac{q_i^T g}{\lambda_i + \lambda} q_i$ and $\|\delta\|^2 = \sum_i \frac{(q_i^Tg)^2}{(\lambda_i+\lambda)^2}$. Newton on $\phi(\lambda) = 0$ converges quickly. Handle the hard case: $\lambda^* = -\lambda_{\min}$ and $q_{\min}^T g = 0$.

🎯 Challenge 3: Two-dimensional subspace trust region

**Problem:** Instead of the full trust region subproblem, restrict the search to the 2D subspace spanned by $g$ and $H^{-1}g$. This is cheaper than the full solve and captures both steepest descent and Newton directions. Implement and compare quality vs dogleg. **Solution:** Let $S = [g, H^{-1}g]$ (orthogonalized). Project: $\hat{H} = S^THS$ (2×2), $\hat{g} = S^Tg$ (2×1). Solve the 2D trust region subproblem $(|\hat{\delta}\| \leq \Delta)$ exactly (secular equation in 2D is cheap). Map back: $\delta = S\hat{\delta}$. This captures more curvature than dogleg (which only interpolates linearly) at cost $O(n)$ for forming $S^THS$. This is what Ceres uses in the `SUBSPACE_DOGLEG` variant.

Self-Assessment Checklist

[ ] I can implement trust region with Cauchy point and dogleg methods
[ ] I understand the ratio $\rho$ and how it drives radius adaptation
[ ] I can explain why trust regions handle indefinite Hessians naturally
[ ] I know the connection between trust region and Levenberg-Marquardt

Key Takeaways

Trust region = constrained step within $\|\delta\| \leq \Delta$ — safer than line search.
Cauchy point = GD in a ball — simple, globally convergent, slow.
Dogleg = interpolate Cauchy↔Newton — fast near solution, robust far away.
Radius update via $\rho = \text{actual} / \text{predicted}$ — expand when model is good, shrink when bad.