Day 28: Week 4 Review & Capstone — Build miniopt
Phase II — Unconstrained Optimization | Week 4 | 2.5 hours
Consolidate everything into a working optimization library with a test suite.
Navigation
OKS Relevance
This capstone cements your ability to build, test, and select optimization algorithms — a skill you'll use every time you prototype a new calibration, estimation, or control algorithm for the OKS stack. The miniopt library mirrors how real solver libraries are structured (Ceres, OSQP, scipy) with a unified interface and method selection.
Week 4 Review (60 min)
Key Concepts
| Day |
Topic |
Core Idea |
| 22 |
Convergence Theory |
Rates: linear ($r < 1$), superlinear, quadratic. Lipschitz conditions. |
| 23 |
Momentum Methods |
Heavy ball, Nesterov acceleration → $O(1/k^2)$ for convex |
| 24 |
Stochastic Methods |
SGD, mini-batch, Adam — variance reduction, adaptive learning rates |
| 25 |
Regularization |
L2 shrinks, L1 sparsifies, proximal operators, ISTA/FISTA |
| 26 |
Global Optimization |
Random restart, SA, basin-hopping — when local isn't enough |
| 27 |
scipy.optimize |
L-BFGS-B default, supply gradients, check .success |
Method Selection Guide
Problem → Method
│
├─ Smooth, small n (<100)
│ ├─ Gradients available → BFGS
│ └─ No gradients → Nelder-Mead
│
├─ Smooth, large n (>100)
│ ├─ Gradients available → L-BFGS-B
│ └─ Stochastic/online → Adam
│
├─ Noisy objective → SGD with momentum
│
├─ Sparse solution desired → ISTA/FISTA (L1)
│
├─ Many local minima → Random restart or basin-hopping
│
└─ Least-squares structure → Gauss-Newton (Week 5!)
Phase II Checklist
- [ ] Implemented gradient descent, Newton, BFGS, L-BFGS from scratch
- [ ] Understand line search (Wolfe conditions) and trust regions
- [ ] Can analyze convergence rate (linear/superlinear/quadratic)
- [ ] Know momentum (Nesterov) and stochastic methods (Adam)
- [ ] Understand regularization (L1/L2) and proximal operators
- [ ] Know when to use global optimization vs local
- [ ] Can choose the right scipy method for any given problem
- [ ] Built
miniopt with unified interface and test suite
Capstone: Build miniopt (90 min)
Requirements
Build a Python library miniopt with:
- Unified interface:
miniopt.minimize(f, x0, method='bfgs', **kwargs)
- Methods: At least 6:
-
'gd' — Gradient descent with backtracking
- 'momentum' — Heavy ball method
- 'nesterov' — Nesterov accelerated gradient
- 'bfgs' — BFGS quasi-Newton
- 'lbfgs' — L-BFGS (memory-limited)
- 'adam' — Adam optimizer
- Result dataclass:
OptimizeResult with fields: x, fun, nit, nfev, ngrad, success, message, history
- Standard test functions: Rosenbrock, Beale, Styblinski-Tang, Booth, sphere
- pytest test suite: At least 15 tests covering correctness, convergence, edge cases
Implementation
# See: code/week04/miniopt.py
# See: code/week04/test_miniopt.py
Test Expectations
pytest test_miniopt.py -v
Tests should cover:
- Each method converges on a quadratic
- BFGS converges on Rosenbrock
- L-BFGS produces similar results to BFGS
- Adam handles noisy objectives
- OptimizeResult has correct fields
- Method names are validated
- Edge cases: zero-dimensional, already at minimum
Practice (Remaining time)
- Draw the inheritance diagram for
miniopt: what's shared across methods? What differs?
- For each method in
miniopt, state its convergence rate class.
- Which
miniopt methods would you use on the OKS wheel calibration problem? Why?
- If you were adding one more method to
miniopt, what would it be? (Hint: Week 5)
- Compare your
miniopt.minimize(method='bfgs') with scipy.optimize.minimize(method='BFGS') on Rosenbrock 5D.
Expert Challenges
🎯 Challenge 1: Method auto-selection
**Problem:** Add a `method='auto'` option that selects the best method based on problem characteristics:
- If `jac` is provided and `n < 100` → BFGS
- If `jac` is provided and `n >= 100` → L-BFGS
- If no `jac` and `n < 20` → Nelder-Mead (numerical gradient)
- If `stochastic=True` → Adam
Test that `'auto'` selects correctly for various problem sizes.
**Solution:** Inspect function signature and keyword arguments to determine properties. Write tests that verify: (a) auto picks BFGS for 10D with gradients, (b) auto picks L-BFGS for 200D, (c) auto picks Adam for stochastic.
🎯 Challenge 2: Convergence visualization
**Problem:** Add a `miniopt.plot_convergence(results)` function that takes a list of `OptimizeResult` objects (from different methods on the same problem) and plots:
1. $f(x_k)$ vs iteration (log scale)
2. $\|x_k - x^*\|$ vs iteration (log scale)
3. $\|\nabla f(x_k)\|$ vs iteration (log scale)
4. Wall time vs accuracy achieved
**Solution:** Each method's `history` list stores `(f, x, grad_norm)` tuples. Matplotlib subplots with shared x-axis. Color-code by method. This visualization makes it immediately clear which method converges fastest and to what accuracy.
🎯 Challenge 3: Benchmark against scipy
**Problem:** Run `miniopt` and `scipy.optimize.minimize` on the same 5 test functions in dimensions $n = 2, 5, 10, 50$. Compare: iterations to convergence ($10^{-6}$ tolerance), total function evaluations, wall time, and final accuracy. Identify where your implementation beats scipy and where it doesn't.
**Solution:** scipy will generally win on wall time (C implementation) but iteration counts should be similar for BFGS. Your Adam may beat scipy's quasi-Newton on noisy problems. The comparison teaches that algorithmic choice matters more than implementation speed for most robotics problems.
Self-Assessment Checklist
- [ ]
miniopt.minimize() works with all 6 methods
- [ ] All pytest tests pass
- [ ] I can articulate when to use each method
- [ ] I'm ready for Phase III (nonlinear least squares)
Key Takeaways
- Unified interfaces make algorithm comparison easy and method swapping painless.
- Testing optimization code requires known-solution test functions, not just "it runs."
- Phase II complete — you now have a working optimizer library built from scratch.
- Next: Phase III applies these methods to the specific structure of least-squares problems.