This curriculum was reviewed by a panel of 10 specialists. Key improvements incorporated:
| Priority | Change | Location |
|---|---|---|
| CRITICAL | Lie groups & manifold optimization — dedicated day | Day 41 (was Ceres Advanced) |
| HIGH | Fisher Information, covariance estimation, observability | Day 35 (expanded) |
| HIGH | Schur complement, variable ordering, marginalization | Day 9 (expanded) |
| MEDIUM | MPC recursive feasibility & stability guarantees | Day 53 (expanded) |
| MEDIUM | Temporal calibration | Day 64 (expanded) |
| MEDIUM | Phase-gate self-assessments | Start of each Phase |
| MEDIUM | Proximal operators (ISTA/FISTA) | Day 25 (expanded) |
| LOW | SOCP/SDP exercises | Day 50 (added) |
| LOW | Unit testing numerical code (pytest) | Day 28 (added) |
| LOW | SGD/Adam marked as optional enrichment | Day 24 (reframed) |
| LOW | Distributed optimization awareness | Day 68 (added) |
| Phase | Weeks | Topic | Hours | Outcome |
|---|---|---|---|---|
| I | 1–2 | Mathematical Foundations | 35 | Fluent in gradients, Hessians, convexity, linear algebra for optimization |
| II | 3–4 | Unconstrained Optimization | 35 | Can implement GD, Newton, BFGS, trust region from scratch |
| III | 5–6 | Nonlinear Least Squares + Ceres | 35 | Can formulate and solve NLS problems in Python and Ceres C++ |
| IV | 7–8 | Constrained + Convex Optimization | 35 | Can formulate QP/LP, understand KKT, use CVXPY and OSQP |
| V | 9–10 | Robotics Applications (SLAM, MPC, Calibration) | 35 | Can build pose-graph SLAM, MPC controller, calibration pipeline |
| VI | 11–13 | Lie Groups & Manifold Optimization | 52.5 | SO(3)/SE(3) exp/log, manifold GN, pose-graph SLAM, IMU preintegration, mini VIO |
| VII | 14–16 | Game Theory for Optimization & Robotics | 52.5 | Nash equilibria, zero-sum LP, cooperative games, mechanism design, no-regret learning, congestion games |
python3 -m venv optim-env
source optim-env/bin/activate
pip install numpy scipy matplotlib sympy autograd jax jaxlib
pip install cvxpy osqp quadprog
pip install jupyter notebook
# Ceres Solver
sudo apt install libceres-dev
# g2o (for SLAM)
sudo apt install libg2o-dev
# Eigen (already a Ceres dep)
sudo apt install libeigen3-dev
# Build tools
sudo apt install cmake build-essential
| Book | Use For | Link |
|---|---|---|
| Boyd & Vandenberghe — Convex Optimization | Weeks 7-8, reference throughout | web.stanford.edu/~boyd/cvxbook |
| Nocedal & Wright — Numerical Optimization | Weeks 3-6, algorithm internals | Springer (library access) |
| Grisetti et al. — A Tutorial on Graph-Based SLAM | Week 9 | IEEE paper (free preprint) |
| Ceres Solver Tutorial | Week 5-6 | ceres-solver.org/tutorial.html |
Goal: Fluent in the math language that every optimization paper and solver uses. If you skip this, everything after Week 3 will feel like magic.
🏎️ FAST-TRACK OPTION: If you have a strong linear algebra and calculus background, take this 10-question prerequisites quiz. Score ≥ 8/10 → skip directly to Day 7 (Week 1 Review).
- Compute eigenvalues of $\begin{bmatrix} 3 & 1 \\ 1 & 3 \end{bmatrix}$ and interpret geometrically
- When is a symmetric matrix positive definite? Give 3 equivalent conditions.
- Compute $\nabla f$ and $\nabla^2 f$ for $f(x,y) = x^2 e^y + \sin(xy)$
- State the chain rule for $f(g(x))$ where $f: \mathbb{R}^m \to \mathbb{R}$, $g: \mathbb{R}^n \to \mathbb{R}^m$
- What does the SVD $A = U \Sigma V^T$ tell you about the rank of $A$?
- What is the condition number of a matrix and why does it matter for solving $Ax = b$?
- Compute the Taylor expansion of $e^x$ around $x=0$ to 3rd order
- What is the difference between a local minimum and a global minimum? Give a condition for each.
- What is a convex function? Give 2 equivalent definitions.
- For a quadratic $f(x) = \frac{1}{2}x^T A x - b^T x$, what is the minimizer?
Theory (45 min): - Vector spaces, basis, span, linear independence - Matrix as linear transformation - Column space, null space, rank - Orthogonality and projection
Implementation (60 min):
# Exercise: Implement from scratch (no numpy.linalg)
# 1. Gram-Schmidt orthogonalization
# 2. QR decomposition via Gram-Schmidt
# 3. Projection of vector onto subspace
Practice (45 min):
- 10 hand-computation problems: rank, null space, column space
- Verify each answer with numpy.linalg
Reading: Gilbert Strang — Introduction to Linear Algebra Ch 1-3 (skim if familiar)
Theory (45 min): - Eigenvalue decomposition: $A = Q \Lambda Q^T$ for symmetric matrices - Positive definite / semidefinite / indefinite classification - SVD: $A = U \Sigma V^T$ — geometric interpretation - Condition number: $\kappa(A) = \sigma_{\max} / \sigma_{\min}$
Implementation (60 min):
# Exercise:
# 1. Power iteration for dominant eigenvalue
# 2. Check positive definiteness via eigenvalues AND Cholesky attempt
# 3. Compute SVD, reconstruct matrix, verify low-rank approximation
# 4. Compute condition number, show how it predicts numerical error
Practice (45 min): - Classify 10 matrices as PD/PSD/indefinite - For each: verify via eigenvalues, determinant test, and Cholesky
Key insight: The Hessian of a function at a minimum is positive definite. This is the bridge from linear algebra to optimization.
Theory (45 min): - Partial derivatives, gradient vector $\nabla f$ - Jacobian matrix of vector-valued functions - Chain rule for composed functions: $\frac{\partial}{\partial x} f(g(x)) = J_f \cdot J_g$ - Directional derivative: $D_v f = \nabla f \cdot v$
Implementation (60 min):
# Exercise:
# 1. Compute gradient of f(x,y) = x²y + sin(xy) by hand, verify with sympy
# 2. Compute Jacobian of f(x,y,z) = [x²+y, yz, xz²] by hand, verify with sympy
# 3. Implement numerical gradient via finite differences:
# grad_f ≈ [f(x+h*e_i) - f(x-h*e_i)] / 2h for each dimension
# 4. Compare numerical vs analytic gradient — measure error vs step size h
Practice (45 min): - 8 gradient computations by hand (polynomial, trig, exponential, composed) - Gradient of Rosenbrock function: $f(x,y) = (1-x)^2 + 100(y-x^2)^2$
Theory (45 min): - Hessian matrix: $H_{ij} = \frac{\partial^2 f}{\partial x_i \partial x_j}$ - Second-order Taylor expansion: $f(x+\delta) \approx f(x) + \nabla f^T \delta + \frac{1}{2} \delta^T H \delta$ - Quadratic model interpretation - Connection: minimizing quadratic model → Newton step: $\delta^* = -H^{-1} \nabla f$
Implementation (60 min):
# Exercise:
# 1. Compute Hessian of Rosenbrock function by hand
# 2. Implement numerical Hessian via finite differences of gradients
# 3. Verify: at the minimum (1,1), Hessian should be PD
# 4. Plot: actual function vs quadratic Taylor approximation at several points
# 5. Show: better approximation near expansion point, worse far away
Practice (45 min): - 6 Hessian computations by hand - For each: classify as PD/PSD/indefinite → is the point a min/max/saddle?
Theory (45 min): - Convex set: $\lambda x + (1-\lambda)y \in S$ for $\lambda \in [0,1]$ - Convex function: $f(\lambda x + (1-\lambda)y) \leq \lambda f(x) + (1-\lambda) f(y)$ - First-order condition: $f(y) \geq f(x) + \nabla f(x)^T (y-x)$ - Second-order condition: $H \succeq 0$ everywhere - Key theorem: For convex functions, every local minimum is a global minimum
Implementation (60 min):
# Exercise:
# 1. Classify functions as convex/concave/neither:
# x², log(x), x³, ||x||, max(x₁,x₂), x₁x₂
# 2. For each: verify numerically via second-order condition (eigenvalues of Hessian)
# 3. Visualize: convex function with tangent planes (first-order condition)
# 4. Show: GD on convex function always converges to global min
# GD on non-convex function gets stuck at local min
Practice (45 min): - 12 convexity classification problems - Operations that preserve convexity: sum, max, composition with affine
Theory (45 min): - Necessary condition (first-order): $\nabla f(x^*) = 0$ - Sufficient condition (second-order): $\nabla f(x^*) = 0$ AND $H(x^*) \succ 0$ - Saddle points: $\nabla f = 0$ but $H$ indefinite - Why convexity simplifies everything: first-order sufficient
Implementation (60 min):
# Exercise:
# 1. Find all critical points of f(x,y) = x⁴ + y⁴ - 2x² - 2y²
# Classify each as min/max/saddle using Hessian eigenvalues
# 2. Implement critical point finder:
# - Start from random points, run Newton's method on ∇f = 0
# - Classify each found point
# - Visualize on contour plot
# 3. Monkey saddle: f(x,y) = x³ - 3xy² — show Hessian test is inconclusive at origin
Practice (45 min): - 8 problems: find and classify critical points
Review (45 min): - Flashcard drill: definitions, theorems, key formulas - Fill-in-the-gaps quiz covering all 6 days
Comprehensive Exercise (105 min):
# EXERCISE SET 1: Foundations Comprehensive (10 problems)
#
# 1. Given A (4×3), find rank, null space, column space
# 2. Prove: if A is PSD, then x^T A x ≥ 0 for all x
# 3. Compute gradient and Hessian of f(x) = x^T A x + b^T x + c
# 4. Show: Hessian of least-squares ||Ax-b||² is 2A^TA (always PSD)
# 5. Taylor-expand f(x,y) = e^(x+2y) around (0,0) to second order
# 6. Classify convexity: f(x) = log(1 + e^x) — "softplus"
# 7. Find all critical points of Himmelblau's function
# 8. Condition number: solve Ax=b with well-conditioned and ill-conditioned A
# Show how relative error amplifies with κ(A)
# 9. Implement: gradient checker that compares analytic vs numerical gradient
# and reports max relative error
# 10. Rosenbrock function: plot contours, gradient field, and Hessian eigenvectors
Theory (45 min): - Cholesky: $A = LL^T$ for PD matrices — fastest way to solve $Ax = b$ - LDL^T: for PSD matrices - QR decomposition for least squares - When to use what: Cholesky (PD systems), QR (rectangular), SVD (rank-deficient)
Implementation (60 min):
# Exercise:
# 1. Implement Cholesky decomposition (without numpy.linalg.cholesky)
# 2. Solve Ax=b using your Cholesky: forward sub on Ly=b, back sub on L^T x=y
# 3. Benchmark: Cholesky solve vs numpy.linalg.solve vs numpy.linalg.inv @ b
# 4. Show: Cholesky fails on non-PD matrix (as expected)
Practice (45 min): - Time complexity analysis: which decomposition for which problem shape?
Theory (45 min):
- Sparse vs dense: memory and computation
- Storage formats: COO, CSR, CSC
- Sparse Cholesky: fill-in, ordering strategies (AMD, METIS)
- Why SLAM matrices are sparse (only neighboring poses interact)
- Schur complement elimination: partition variables → eliminate subset → dense reduced system
- In BA: eliminate landmarks → camera-only system (why Ceres has SPARSE_SCHUR)
- In SLAM: marginalize old poses → keep recent window
- Variable ordering: bad ordering → fill-in explosion; AMD/METIS find good orderings
- Marginalization (preview): remove variables while preserving their information
- Sparsity-preserving vs dense marginalization (the "fill-in" problem)
Implementation (60 min):
# Exercise:
# 1. Create a tridiagonal system (like 1D pose graph) — 1000×1000
# 2. Solve dense vs scipy.sparse — compare time and memory
# 3. Visualize sparsity pattern with plt.spy()
# 4. Show: banded structure → O(n) solve vs O(n³) dense
Practice (45 min): - Create sparsity patterns for: tridiagonal, block-diagonal, pose-graph, banded
Theory (45 min): - Forward difference: $\frac{f(x+h) - f(x)}{h}$ — $O(h)$ error - Central difference: $\frac{f(x+h) - f(x-h)}{2h}$ — $O(h^2)$ error - Complex-step derivative: $\frac{\text{Im}(f(x+ih))}{h}$ — no cancellation error! - Truncation vs rounding error tradeoff — optimal $h \approx \epsilon^{1/2}$ (forward), $\epsilon^{1/3}$ (central)
Implementation (60 min):
# Exercise:
# 1. Plot error vs h for forward, central, and complex-step derivatives of sin(x)
# Show the V-shaped error curve (truncation vs rounding)
# 2. Find optimal h for each method experimentally
# 3. Implement complex-step Jacobian for vector functions
# 4. Compare all three methods on a function with very small gradient
# (where rounding error dominates)
Practice (45 min): - When would you use numerical derivatives vs analytic? vs automatic differentiation? - Real-world scenario: checking a hand-derived Jacobian
Theory (45 min): - Forward mode AD: dual numbers $a + b\epsilon$, $\epsilon^2 = 0$ - Reverse mode AD: computation graph, backpropagation - Forward vs reverse: when each is efficient (Jacobian shape) - AD in optimization: Ceres AutoDiff, JAX, PyTorch autograd
Implementation (60 min):
# Exercise:
# 1. Implement dual number class with +, *, sin, cos, exp
# 2. Use it to compute gradient of f(x,y) = sin(x*y) + x²
# 3. Compare: your dual numbers vs JAX grad vs numerical finite diff
# 4. Time comparison on a function with 100 parameters:
# numerical (O(100) evaluations) vs AD (O(1) pass)
import jax
import jax.numpy as jnp
# JAX automatic differentiation:
def f(x):
return jnp.sin(x[0] * x[1]) + x[0]**2
grad_f = jax.grad(f)
hess_f = jax.hessian(f)
Practice (45 min): - Derive forward-mode AD for a 3-node computation graph by hand - Map: which optimization tool uses which AD type?
Theory (45 min): - Condition number of optimization: $\kappa(H)$ = ratio of eigenvalues - Ill-conditioned = elongated contours = slow convergence - Preconditioning: transform variables to improve conditioning - Scaling: why normalizing variables matters in practice
Implementation (60 min):
# Exercise:
# 1. Create two quadratics: well-conditioned (κ=2) and ill-conditioned (κ=1000)
# 2. Run gradient descent on both — count iterations to convergence
# 3. Visualize: overlay GD trajectory on contour plots
# 4. Apply diagonal preconditioning — show dramatic speedup on ill-conditioned
# 5. Real example: least-squares where columns of A have very different scales
# Show: without scaling, solver is 100× slower
Practice (45 min): - Given a Hessian, predict GD convergence rate - Design a preconditioner for a given problem
Theory (45 min): - Why iterative? Dense direct solvers are $O(n^3)$ — too slow for large systems - Conjugate Gradient (CG) method for PD systems - Convergence: CG converges in at most $n$ steps, much faster if $\kappa$ is small - Preconditioned CG: $M^{-1}Ax = M^{-1}b$
Implementation (60 min):
# Exercise:
# 1. Implement CG from scratch (the classic algorithm, ~15 lines)
# 2. Solve a 1000×1000 PD system: CG vs dense Cholesky — compare time
# 3. Plot: residual vs iteration for well-conditioned vs ill-conditioned
# 4. Implement Jacobi preconditioner, show CG convergence improvement
# 5. Real scenario: sparse 10000×10000 system from pose-graph structure
# Dense solve: impossible. CG: seconds.
Practice (45 min): - When to use CG vs Cholesky vs LU? - Connection: CG is actually minimizing a quadratic — it IS optimization!
Review (45 min): - Concept map: connect all Week 1-2 topics - Identify: which foundation supports which optimization method
Comprehensive Exercise (105 min):
# EXERCISE SET 2: Advanced Foundations (8 problems)
#
# 1. Implement a gradient checker:
# Input: f(x), analytic_grad(x), x₀
# Output: max relative error between analytic and numerical gradient
# Use central differences with optimal h
#
# 2. Sparse tridiagonal system (n=5000):
# Solve with: (a) dense LU, (b) sparse Cholesky, (c) CG
# Report time and memory for each
#
# 3. Implement forward-mode AD for f: R^n → R using dual numbers
# Test on: f(x) = ||Ax - b||² where A is 50×20
#
# 4. Condition number study:
# Generate random PD matrices with κ = 1, 10, 100, 1000, 10000
# For each: solve Ax=b, perturb b by 1e-8, measure solution change
# Plot: relative error amplification vs κ
#
# 5. Implement preconditioned CG with incomplete Cholesky preconditioner
#
# 6. Rosenbrock (extended to n=50 dimensions):
# Compute condition number of Hessian at (1,...,1)
# Explain why GD is slow on this function
#
# 7. Complex-step derivative for a function involving matrix operations:
# f(θ) = ||R(θ)p - q||² where R is a 2D rotation matrix
#
# 8. Build a numerical differentiation test suite that you'll reuse
# throughout the course to verify every analytic derivative you write
Goal: Implement every major unconstrained optimizer from scratch, understand convergence guarantees, and know when to use which method.
🛣️ PHASE II GATE — Self-Assessment (15 min before Day 15) Answer these 10 questions. If you score < 7/10, review the relevant Phase I day. 1. Write the gradient of $f(x,y) = x^2 y + \sin(xy)$ 2. For $f(x) = x^T A x$, what is the Hessian? When is $f$ convex? 3. What does $\kappa(A) = 10^6$ mean practically for solving $Ax = b$? 4. Name 3 differences between forward-mode and reverse-mode AD 5. Given a 1000×1000 sparse matrix with 5000 nonzeros, which format is best for matrix-vector multiply? 6. What is the normal equation for least squares? When does it fail? 7. Compute $\nabla f$ for $f = \|Ax - b\|^2$ in matrix notation 8. What is the Schur complement of a 2×2 block matrix? 9. How does
jax.gradhandle a function $f: \mathbb{R}^n \to \mathbb{R}$? 10. What is the condition for a critical point to be a local minimum?
Theory (45 min): - Algorithm: $x_{k+1} = x_k - \alpha_k \nabla f(x_k)$ - Fixed step size: convergence condition $\alpha < 2/L$ where $L$ = Lipschitz constant of $\nabla f$ - Convergence rate: $O(1/k)$ for convex, $O(\rho^k)$ for strongly convex ($\rho = \frac{\kappa-1}{\kappa+1}$) - Why GD is slow: zig-zagging on ill-conditioned problems
Implementation (60 min):
# Exercise:
# 1. Implement GD with fixed step size
# 2. Implement GD with backtracking line search (Armijo condition)
# 3. Test on: (a) quadratic f(x) = x^T A x, (b) Rosenbrock, (c) log-sum-exp
# 4. Plot convergence: f(x_k) - f* vs iteration for each
# 5. Vary condition number: show GD iterations grow linearly with κ
Practice (45 min): - Derive convergence rate for quadratic case by hand - Predict: how many GD iterations for κ=100 to reach ε=1e-6?
Theory (45 min): - Exact line search: $\alpha^* = \arg\min_\alpha f(x - \alpha \nabla f)$ - Armijo (sufficient decrease) condition: $f(x + \alpha d) \leq f(x) + c_1 \alpha \nabla f^T d$ - Wolfe conditions: Armijo + curvature condition - Backtracking: start large, shrink until Armijo holds - Strong Wolfe conditions: guarantees for quasi-Newton methods
Implementation (60 min):
# Exercise:
# 1. Implement backtracking line search
# 2. Implement Wolfe line search (zoom algorithm from Nocedal & Wright)
# 3. Compare: fixed step vs Armijo vs Wolfe on Rosenbrock
# 4. Pathological case: show where Armijo alone is insufficient
# 5. Visualize: 1D slice along search direction with Armijo/Wolfe zones marked
Practice (45 min): - Prove: Armijo + backtracking always terminates (given descent direction) - Parameter sensitivity: effect of c₁ (Armijo) and c₂ (Wolfe)
Theory (45 min): - Newton step: $x_{k+1} = x_k - H^{-1} \nabla f$ - Derivation: minimize second-order Taylor model - Quadratic convergence near solution: $\|x_{k+1} - x^*\| \leq C \|x_k - x^*\|^2$ - Problem: Hessian must be PD (otherwise step may ascend!) - Problem: Hessian is $O(n^2)$ to store, $O(n^3)$ to invert
Implementation (60 min):
# Exercise:
# 1. Implement Newton's method with analytic Hessian
# 2. Test on Rosenbrock: count iterations (should be ~10-20 vs GD's ~10000)
# 3. Show: quadratic convergence by plotting log(error) vs iteration
# 4. Failure case: start far from minimum where Hessian is indefinite
# Show: pure Newton diverges, but damped Newton (with line search) works
# 5. Implement modified Newton: replace H with H + μI when H not PD
Practice (45 min): - Derive: Newton's method on f(x) = ax² + bx + c converges in 1 step - When does Newton fail? List 3 scenarios
Theory (45 min): - Trust region idea: restrict step to $\|\delta\| \leq \Delta$ (radius) - Subproblem: $\min_\delta m(\delta) = f + \nabla f^T \delta + \frac{1}{2}\delta^T H \delta$ s.t. $\|\delta\| \leq \Delta$ - Cauchy point: steepest descent restricted to trust region - Dogleg method: interpolate between Cauchy and Newton steps - Radius update: expand if good agreement, shrink if bad - Ratio: $\rho = \frac{f(x_k) - f(x_k + \delta)}{m(0) - m(\delta)}$
Implementation (60 min):
# Exercise:
# 1. Implement trust region with Cauchy point (simplest version)
# 2. Implement dogleg trust region method
# 3. Compare: trust region vs line search Newton on Rosenbrock
# 4. Pathological case: near saddle point — show trust region handles it better
# 5. Visualize: trust region circle on contour plot, show how radius adapts
Practice (45 min): - Trust region vs line search: when to prefer which? (table of tradeoffs) - Connection to Levenberg-Marquardt (preview for Week 5)
Theory (45 min): - Problem: computing Hessian is expensive - Idea: approximate Hessian from gradient differences - Secant condition: $B_{k+1} s_k = y_k$ where $s_k = x_{k+1} - x_k$, $y_k = \nabla f_{k+1} - \nabla f_k$ - BFGS update formula (rank-2 update to $B$) - Superlinear convergence: faster than GD, close to Newton, no Hessian needed - L-BFGS: store only last $m$ vectors instead of full matrix
Implementation (60 min):
# Exercise:
# 1. Implement BFGS with Wolfe line search
# 2. Implement L-BFGS (m=10) using two-loop recursion
# 3. Compare on Rosenbrock: GD vs Newton vs BFGS vs L-BFGS
# Plot: function value vs (a) iterations, (b) gradient evaluations, (c) wall time
# 4. High-dimensional test (n=1000): show L-BFGS scales, Newton doesn't
# 5. Show: BFGS with Armijo (not Wolfe) can fail — demonstrate
Practice (45 min): - Memory comparison: Newton O(n²), BFGS O(n²), L-BFGS O(mn) - Derive: BFGS update maintains symmetry and positive definiteness
Theory (45 min): - Nonlinear CG: Fletcher-Reeves, Polak-Ribière - Relation to linear CG (from Day 13) - CG as "momentum" for gradient descent - Advantages: O(n) per step, no matrix storage - Restart strategies
Implementation (60 min):
# Exercise:
# 1. Implement nonlinear CG (Polak-Ribière variant)
# 2. Compare: GD vs CG vs L-BFGS on extended Rosenbrock (n=100)
# 3. Show: CG converges in n steps for n-dimensional quadratic
# 4. Implement automatic restart (when conjugacy is lost)
# 5. Real benchmark: compare all methods on a 500-dimensional problem
Practice (45 min): - Method selection flowchart: given problem size and gradient cost, which method?
Review (45 min): - Fill in comparison table:
| Method | Per-iter cost | Memory | Convergence | Needs Hessian? |
|---|---|---|---|---|
| GD | O(n) | O(n) | Linear | No |
| Newton | O(n³) | O(n²) | Quadratic | Yes |
| BFGS | O(n²) | O(n²) | Superlinear | No |
| L-BFGS | O(mn) | O(mn) | Superlinear | No |
| CG | O(n) | O(n) | Variable | No |
| Trust Region | O(n³) | O(n²) | Quadratic | Yes |
Comprehensive Exercise (105 min):
# OPTIMIZER SHOOTOUT
# Implement a benchmarking framework:
#
# Functions: Rosenbrock, Styblinski-Tang, Rastrigin, Ackley, Beale
# Methods: GD, Newton, BFGS, L-BFGS, CG, Trust-Region-Newton
# Metrics: iterations, gradient evals, function evals, wall time, final error
#
# Generate a report table and identify:
# 1. Which method wins on convex problems?
# 2. Which method wins on ill-conditioned problems?
# 3. Which method wins on high-dimensional problems (n=1000)?
# 4. Which method is most robust to bad starting points?
Theory (60 min): - Formal convergence definitions: linear, superlinear, quadratic - $\|x_{k+1} - x^*\| \leq C \|x_k - x^*\|^p$ - Convergence rate of GD: depends on $\kappa$ - Convergence rate of Newton: quadratic (local) - Global convergence: guarantees that iterates approach a critical point (not necessarily the minimum) - Zoutendijk's theorem: sufficient decrease + GD-related → $\sum \|\nabla f_k\|^2 < \infty$
Implementation (60 min):
# Exercise:
# 1. Empirically measure convergence rate:
# Plot log(||x_k - x*||) vs k for each method
# Fit the convergence order p
# 2. Show: Newton has p=2 (quadratic), BFGS has p≈1.6 (superlinear)
# 3. Verify Zoutendijk: plot cumulative sum of ||∇f||² for GD with Armijo
Practice (30 min): - Prove: GD with exact line search on quadratic converges in n steps for n eigenvalues
Theory (45 min): - Heavy ball method: $x_{k+1} = x_k - \alpha \nabla f_k + \beta (x_k - x_{k-1})$ - Nesterov acceleration: $O(1/k^2)$ for convex (vs GD's $O(1/k)$) - Optimal first-order methods: matching lower bounds - Connection to control theory: momentum as PD controller
Implementation (60 min):
# Exercise:
# 1. Implement heavy ball method
# 2. Implement Nesterov accelerated gradient
# 3. Compare: GD vs heavy ball vs Nesterov on ill-conditioned quadratic
# 4. Show: oscillation problem with heavy ball (too much momentum)
# 5. Nesterov on Rosenbrock: compare convergence curves
Practice (45 min): - Derive optimal momentum parameter for quadratic case - Robotics connection: estimation smoothing as optimization with momentum
Note: SGD/Adam are primarily ML tools. Robotics optimization rarely uses stochastic methods. Treat this as enrichment — skim if time-constrained, deep-dive if working on learning-based robotics.
Theory (45 min): - SGD: replace ∇f with noisy estimate from subset of data - Convergence: $O(1/\sqrt{k})$ — slower but each step is cheap - Mini-batch: variance reduction - Adam, AdaGrad, RMSProp (brief survey — ML focus, but robotics uses them for learning-based methods) - When robotics does use stochastic methods: learned cost functions, neural network training for perception
Implementation (60 min):
# Exercise:
# 1. Implement SGD for sum-of-functions: f(x) = Σ fᵢ(x)
# 2. Implement mini-batch GD with batch size parameter
# 3. Compare: GD vs SGD vs mini-batch on linear regression (n=10000 samples)
# 4. Learning rate schedules: constant, 1/k, cosine annealing
# 5. Implement Adam optimizer from scratch
Practice (45 min): - When is SGD better than full-batch GD? (compute budget argument) - Robotics application: online parameter estimation
Theory (45 min): - Regularization as multi-objective: $\min f(x) + \lambda \|x\|^2$ (Tikhonov/Ridge) - L1 regularization: sparsity, LASSO - Pareto optimality (brief) - Regularization path: how solution changes with λ
Implementation (60 min):
# Exercise:
# 1. Ridge regression: solve closed-form and iteratively, compare
# 2. L1 proximal gradient (ISTA): implement for LASSO
# 3. Regularization path: plot solution components vs λ
# 4. Cross-validation to select λ
# 5. Robotics example: regularized sensor calibration (prevent overfitting to noise)
Practice (45 min): - Connection: Tikhonov = Bayesian prior (Gaussian), L1 = Laplace prior - When to regularize in robotics: calibration, map fitting, trajectory smoothing - Proximal operators (foundation for modern solvers): - Proximal operator: $\text{prox}_{\lambda f}(v) = \arg\min_x [f(x) + \frac{1}{2\lambda}\|x - v\|^2]$ - ISTA (Iterative Shrinkage-Thresholding): proximal gradient for L1 - FISTA: accelerated proximal gradient (Nesterov acceleration) - Why this matters: OSQP and SCS use ADMM (splitting + proximal ops) internally
Theory (45 min): - Problem: local methods find local optima, not global - Random restart: simple but effective - Simulated annealing: probabilistic acceptance of worse solutions - Genetic/evolutionary algorithms (brief) - Basin-hopping - When global optimization is needed vs when good initialization suffices
Implementation (60 min):
# Exercise:
# 1. Implement random restart + L-BFGS
# 2. Implement simulated annealing
# 3. Test on Rastrigin (many local minima): local method vs global
# 4. scipy.optimize: compare minimize() vs differential_evolution() vs basinhopping()
# 5. Robotics scenario: finding initial alignment for scan matching
Practice (45 min): - In robotics, is global optimization common? (Usually no — good initialization from sensors) - When does it appear? (Loop closure detection, place recognition)
Theory (30 min):
- scipy.optimize.minimize: interface, method selection
- Methods: Nelder-Mead, Powell, CG, BFGS, L-BFGS-B, TNC, Newton-CG, trust-constr
- scipy.optimize.least_squares: for NLS problems (preview)
Implementation (90 min):
# Exercise:
# 1. Solve same problem with every scipy method — compare results and timing
# 2. Callback function: log iteration history, plot convergence
# 3. Bounds: L-BFGS-B for box-constrained problems
# 4. Jacobian/Hessian supply: show speedup of providing analytic derivatives
# 5. Wrap your custom optimizer in scipy.optimize.minimize interface
# 6. Robotics problem: calibrate wheel radius using scipy.optimize.minimize
# Given: measured distances vs wheel encoder counts
# Solve: min Σ(d_measured - π * r * ticks)²
Practice (30 min): - Decision tree: which scipy method for which problem class?
Comprehensive Exercise (150 min):
# PHASE II CAPSTONE: Build Your Own Optimization Library
#
# Create a Python package `miniopt` with:
#
# 1. Unified interface: miniopt.minimize(f, x0, method='bfgs', **kwargs)
# 2. Methods: gd, newton, bfgs, lbfgs, cg, trust_region
# 3. Line search: backtracking (Armijo), Wolfe
# 4. Options: max_iter, tol, callback, verbose
# 5. Return: OptimizeResult(x, fun, grad, nit, nfev, success)
#
# Test suite: all methods pass on 5 standard test functions
# 6. **pytest tests**: write proper unit tests (test_miniopt.py)
# - Test convergence on Rosenbrock, quadratic, Styblinski-Tang
# - Test that OptimizeResult fields are populated correctly
# - Test that invalid inputs raise clear errors
# - Tolerance-based assertions: assert abs(result.fun - known_min) < 1e-6
# - Run: pytest test_miniopt.py -v
#
# This library will be your reference implementation for the rest of the course.
Goal: Master the optimization formulation most used in robotics. Sensor calibration, SLAM, bundle adjustment — all are nonlinear least squares.
🛣️ PHASE III GATE — Self-Assessment (15 min before Day 29) 1. Implement gradient descent with backtracking line search (pseudocode) 2. What is the convergence rate of Newton's method? When does it fail? 3. Why does BFGS approximate the Hessian instead of computing it? 4. What is the Armijo condition? Write it mathematically. 5. Trust region vs line search: give one advantage of each 6. What does L-BFGS store instead of the full Hessian? 7. Name the 3 components of a conjugate gradient step 8. When is Nesterov acceleration better than heavy ball? 9. What is regularization and why does it help ill-conditioned problems? 10. Your
minioptlibrary: which method would you recommend for a 10000-variable problem?
Theory (45 min): - Problem: $\min_x \|Ax - b\|^2$ - Normal equations: $A^T A x = A^T b$ - Geometric interpretation: projection onto column space - Solution via QR, SVD, Cholesky - Weighted least squares: $\min_x (Ax-b)^T W (Ax-b)$ - Rank-deficient case: minimum-norm solution via SVD
Implementation (60 min):
# Exercise:
# 1. Solve linear LS four ways: normal equations, QR, SVD, numpy.lstsq
# 2. Weighted LS: fit a line to noisy data with known per-point uncertainties
# 3. Rank-deficient: A is 100×50 with rank 30 — compare solutions
# 4. Show: normal equations are numerically unstable when κ(A) is large
# QR and SVD are stable
# 5. Robotics: linear odometry calibration — estimate wheel radii from straight-line data
Theory (45 min): - Problem: $\min_x \sum_i \|r_i(x)\|^2$ where $r_i$ are nonlinear residuals - Jacobian: $J_{ij} = \frac{\partial r_i}{\partial x_j}$ - Gradient: $\nabla f = J^T r$ - Approximate Hessian: $H \approx J^T J$ (Gauss-Newton approximation) - Why this works: $J^T J$ ignores second-order residual terms (small near solution) - When it fails: large residuals → need full Hessian
Implementation (60 min):
# Exercise:
# 1. Formulate circle fitting as NLS:
# Given points (xᵢ, yᵢ), find center (cx, cy) and radius r
# Residuals: rᵢ = sqrt((xᵢ-cx)² + (yᵢ-cy)²) - r
# 2. Compute Jacobian analytically
# 3. Verify with numerical Jacobian
# 4. Visualize: residuals at initial guess vs at solution
Practice (45 min): - List 5 robotics problems that are NLS: calibration, scan matching, ICP, BA, pose estimation - For each: what are the residuals? what are the parameters?
Theory (45 min): - Gauss-Newton step: $\delta = -(J^T J)^{-1} J^T r$ - Equivalently: solve $J^T J \delta = -J^T r$ (normal equations of linearized problem) - Convergence: superlinear when residuals are small at solution - Failure: diverges when residuals are large or $J^T J$ is singular
Implementation (60 min):
# Exercise:
# 1. Implement Gauss-Newton from scratch
# 2. Test on circle fitting — should converge in 5-10 iterations
# 3. Test on exponential curve fitting: y = a * exp(b * x) + c
# 4. Failure case: fit a model that's wrong (large residuals at solution)
# Show: GN oscillates or diverges
# 5. Regularized GN: $(J^T J + \mu I) \delta = -J^T r$ — show it stabilizes
Practice (45 min): - Connection to Newton: GN is Newton with approximate Hessian - When is the approximation good? (small residuals, overdetermined system)
Theory (45 min): - LM = Gauss-Newton + trust region: $(J^T J + \lambda I) \delta = -J^T r$ - When $\lambda \to 0$: Gauss-Newton step (aggressive) - When $\lambda \to \infty$: gradient descent step (conservative) - Damping strategy: adjust $\lambda$ based on actual vs predicted reduction - Connection to trust region: $\lambda$ is the Lagrange multiplier for $\|\delta\| \leq \Delta$ - This is what Ceres uses by default
Implementation (60 min):
# Exercise:
# 1. Implement LM from scratch with adaptive damping
# 2. Compare: GN vs LM on the failure case from Day 31 — LM should work
# 3. Plot: damping parameter λ vs iteration (should decrease near solution)
# 4. Implement gain ratio: ρ = (f(xk) - f(xk+δ)) / (predicted decrease)
# 5. Sensitivity: what happens with bad initial λ?
Practice (45 min): - Why is LM the default in most robotics software? - Derive: at the solution, LM and GN produce the same step
Theory (45 min): - Problem: outliers dominate $\sum r_i^2$ (quadratic penalty) - Loss functions: $\rho(r)$ instead of $r^2$ - Huber: quadratic for small $r$, linear for large $r$ - Cauchy: $\log(1 + r^2/c^2)$ - Tukey bisquare: completely rejects outliers beyond threshold - Iteratively Reweighted Least Squares (IRLS): solve weighted LS repeatedly - M-estimators: general framework
Implementation (60 min):
# Exercise:
# 1. Implement Huber, Cauchy, Tukey loss functions and their weights
# 2. Circle fitting with outliers:
# Standard LS: pulled toward outliers
# Huber LS: partially resists
# Cauchy LS: strongly resists
# 3. Implement IRLS loop
# 4. Line fitting with 30% outliers: RANSAC vs Huber LS comparison
# 5. Robotics: ICP scan matching with outlier rejection via Huber loss
Practice (45 min): - How to choose the loss function threshold? - Connection: RANSAC, M-estimators, and their tradeoffs
Theory (30 min):
- API: least_squares(fun, x0, jac, bounds, loss, method)
- Methods: 'trf' (trust region reflective), 'dogbox', 'lm' (Levenberg-Marquardt)
- Bounds: box constraints on parameters
- Sparse Jacobians: jac_sparsity for large problems
Implementation (90 min):
# Exercise:
# 1. Solve circle fitting with scipy.optimize.least_squares
# 2. Compare methods: trf vs lm — when does each win?
# 3. Bounds: constrain radius to be positive, center to be in a region
# 4. Sparse Jacobian: 2D grid deformation with 10000 points
# 5. Custom loss: provide Huber loss function
# 6. Full robotics example: camera calibration
# - Generate synthetic checkerboard images
# - Residuals: reprojection error
# - Parameters: focal length, principal point, distortion
# - Solve with least_squares, compare with/without distortion model
Practice (30 min): - Jacobian sparsity: draw the sparsity pattern for BA, pose graph, odometry calibration
Theory — Uncertainty in NLS (45 min): - Fisher Information Matrix: $\mathcal{I} = J^T \Omega J$ — how much information the data carries about parameters - Cramér-Rao Lower Bound: best achievable covariance $\Sigma \succeq \mathcal{I}^{-1}$ - Covariance from NLS: at solution, $\Sigma \approx (J^T \Omega J)^{-1}$ - Observability analysis: rank of $J^T J$ — full rank = all parameters observable - Degenerate configurations: pure-translation motion can’t calibrate rotation - Rank-deficient $J^T J$ = some parameters are unobservable - Marginalization vs conditioning: uncertainty on a subset of parameters - Confidence ellipses: visualize 2D parameter uncertainty as ellipses from covariance
# UNCERTAINTY EXERCISES (before capstone):
# 1. After solving circle fitting NLS, compute covariance from (J^T J)^{-1}
# Plot 95% confidence ellipse for (cx, cy)
# 2. Observability test: fit a line y = mx + b from 2 points vs 100 points
# Compare Fisher information — more points = tighter bounds
# 3. Degenerate case: all points on one side of circle → show rank deficiency
# Visualize: huge uncertainty ellipse in the poorly-observed direction
# 4. Data collection strategy: which additional measurement reduces uncertainty most?
# Hint: maximize det(J^T J) — D-optimal design
# CAPSTONE: Robot Wheel Calibration Pipeline
#
# Scenario: Differential drive robot, both wheels slightly different radius.
# Data: commanded velocities + measured pose changes (with noise + outliers)
#
# 1. Formulate as NLS: residuals = measured_pose - predicted_pose(r_left, r_right, baseline)
# 2. Implement Jacobian analytically
# 3. Solve with your LM implementation
# 4. Solve with scipy.optimize.least_squares
# 5. Add outlier rejection with Huber loss
# 6. Uncertainty estimation: covariance from J^T J at solution
# 7. Plot: parameter estimates converge to true values
# 8. Monte Carlo: run 100 trials with different noise → histogram of estimates
Theory (45 min): - Ceres Solver architecture: Problem → CostFunction → LossFunction → Solver - AutoDiff vs AnalyticDiff vs NumericDiff - Jet types for automatic differentiation - Problem construction: AddResidualBlock, AddParameterBlock
Implementation (60 min):
// Exercise 1: Hello Ceres — minimize (10 - x)²
#include <ceres/ceres.h>
struct CostFunctor {
template <typename T>
bool operator()(const T* const x, T* residual) const {
residual[0] = 10.0 - x[0];
return true;
}
};
// Exercise 2: Circle fitting in Ceres (translate Python version to C++)
// Exercise 3: Curve fitting: y = exp(mx + c) with AutoDiff
Practice (45 min): - Build and run, read solver summary output, interpret convergence info - Experiment with solver options: max_iterations, function_tolerance, gradient_tolerance
Theory (45 min):
- Jet
Implementation (60 min):
// Exercise:
// 1. Implement cost function with AutoDiff for 3D point triangulation
// Given: 2 camera poses + 2 pixel observations
// Parameter: 3D point [X, Y, Z]
// Residual: reprojection error in each camera
//
// 2. Same problem with AnalyticDiff — derive and implement Jacobian
// Compare: runtime and accuracy vs AutoDiff
//
// 3. NumericDiff wrapper for a black-box function (e.g., calling external library)
Practice (45 min): - When to use which: AutoDiff (default) > Analytic (performance-critical) > Numeric (last resort) - Read Ceres docs: local parameterizations for rotations (Quaternion, AngleAxis)
Theory (30 min): - Built-in losses: HuberLoss, CauchyLoss, ArctanLoss, TukeyLoss - Scaling parameter interpretation - Custom loss functions
Implementation (90 min):
// Exercise:
// 1. Circle fitting with outliers — compare NULL loss vs HuberLoss vs CauchyLoss
// 2. Experiment with loss function scale parameter
// 3. Implement custom loss function (Geman-McClure)
// 4. 2D scan matching with Ceres:
// - Source points + target points with correspondence
// - Parameter: 2D transform (tx, ty, θ)
// - Residual: transformed source - target for each pair
// - Add HuberLoss for outlier robustness
// 5. Compare: with/without loss on data with 20% outlier correspondences
Implementation (150 min):
// Exercise 1: IMU-Odometry Calibration
// Given: wheel odometry poses + IMU measurements
// Estimate: wheel radii, baseline, IMU-to-base transform
// Cost: pose difference between wheel prediction and IMU prediction
//
// Exercise 2: 2D Lidar Extrinsic Calibration
// Given: robot poses from odometry + lidar scans
// Estimate: lidar-to-base transform (x, y, θ)
// Cost: scan matching residual at each pose
//
// Exercise 3: Pose Graph Optimization (preview of Week 9)
// Given: odometry edges + loop closure edges
// Estimate: all robot poses
// Cost: relative pose error for each edge
Theory (45 min): - Solver types: DENSE_QR, DENSE_NORMAL_CHOLESKY, SPARSE_NORMAL_CHOLESKY, CGNR, DENSE_SCHUR, SPARSE_SCHUR, ITERATIVE_SCHUR - When to use which: problem structure matters! - Trust region: LEVENBERG_MARQUARDT vs DOGLEG - Convergence criteria: function_tolerance, gradient_tolerance, parameter_tolerance - Debugging: solver summary, iteration callbacks, ordering - Constant parameters: SetParameterBlockConstant (gauge freedom) - Bounds: SetParameterLowerBound/Upper - Covariance estimation: Ceres::Covariance class
Implementation (60 min):
// Exercise:
// 1. Same problem, all solver types — benchmark and compare
// 2. Large problem (5000 residuals): show SPARSE_NORMAL_CHOLESKY >> DENSE
// 3. Bundle adjustment structure: show SPARSE_SCHUR wins
// 4. Iteration callback: log and plot convergence in real time
// 5. Debugging: intentionally create a bad problem (rank-deficient Jacobian)
// Read the warning messages, understand what they mean
// 6. Covariance: after solving circle fit, compute Ceres::Covariance
// Visualize as confidence ellipses (connects to Day 35 Fisher Information)
// 7. Fix parameters: hold one camera fixed in BA (gauge freedom)
Practice (45 min): - Ceres solver selection flowchart - Common pitfalls: wrong initial values, unscaled parameters, singular Jacobian
⚠️ CRITICAL TOPIC — Without this, SLAM (Week 9) and calibration (Week 10) will feel like magic. This is the mathematical foundation for optimizing over rotations and poses.
Theory (60 min):
- The problem: rotations live on a manifold, not in Euclidean space
- You can’t just “add a delta” to a rotation matrix — it leaves SO(3)
- Need: perturbation in tangent space, then map back to manifold
- SO(2): rotation group in 2D, angle wrapping, exp/log maps
- SO(3): 3D rotations — Lie group structure
- Representations: rotation matrix, quaternion, angle-axis, Euler angles
- Exponential map: $\text{so}(3) \to SO(3)$ — Rodrigues’ formula
- Logarithmic map: $SO(3) \to \text{so}(3)$ — extract angle-axis
- Tangent space $\text{so}(3)$: skew-symmetric matrices / 3-vectors
- SE(2), SE(3): rigid body transformations (rotation + translation)
- Composition: $T_{AC} = T_{AB} \cdot T_{BC}$
- Inverse: $T^{-1}$
- Exp/Log maps for SE(3)
- Optimization on manifolds:
- Perturbation model: $x \oplus \delta = \text{Exp}(\delta) \cdot x$ (left perturbation)
- Jacobians on manifolds: $\frac{\partial r}{\partial \delta}$ where $\delta \in \mathbb{R}^{\text{dim}}$
- Retraction: how Ceres, g2o, GTSAM handle this internally
- Over-parameterization: quaternions have 4 params for 3 DoF
- Connection to Ceres:
- EigenQuaternionManifold, QuaternionManifold, SphereManifold
- Why SetManifold is needed: tells Ceres the tangent space dimension
Implementation (60 min):
# Exercise (Python first, then connect to Ceres):
#
# 1. Implement SO(2): exp(theta) -> R, log(R) -> theta
# Verify: exp(log(R)) = R, log(exp(theta)) = theta
#
# 2. Implement SO(3): Rodrigues formula
# exp(omega) -> R (3x3), log(R) -> omega (3-vec)
# Test: random rotations, verify exp(log(R)) = R
#
# 3. Implement SE(2) class with compose, inverse, exp, log
# Compute the Jacobian of composition: d(T1 * T2) / d(delta_T1)
#
# 4. Optimization on SO(3):
# Given: noisy rotation measurements R1...Rn
# Find: R_mean that minimizes sum ||Log(Ri^{-1} * R)||^2
# Solve with GN on manifold: perturbation in tangent space
#
# 5. Connect to Ceres (C++):
# Redo exercise 4 in Ceres with QuaternionManifold
# Compare: your manifold GN vs Ceres
// Exercise 6 (C++): Bundle adjustment mini-problem with manifolds
// - 5 cameras, 50 3D points, ~200 observations
// - Camera: angle-axis rotation (3) + translation (3) + focal length (1)
// - Point: 3 DoF
// - Use EigenQuaternionManifold or AngleAxisManifold
// - Use Schur complement solver (cameras are few, points are many)
// - Fix camera 0 (SetParameterBlockConstant) for gauge freedom
Practice (30 min): - Lie group cheat sheet: fill in exp/log/compose/inverse for SO(2), SO(3), SE(2), SE(3) - Why quaternions need local parameterization: 4 params, 3 DoF, unit constraint - Preview: how GTSAM uses manifold traits for automatic Jacobian computation
// CAPSTONE: Multi-Sensor Calibration Pipeline
//
// Scenario: Robot with wheel encoders + IMU + 2D lidar
// Data: robot drove a path, collected all sensor data
//
// Estimate simultaneously:
// - Left/right wheel radii, baseline
// - IMU-to-base transform (3 DoF: x, y, θ)
// - Lidar-to-base transform (3 DoF: x, y, θ)
// - IMU biases (gyro bias, accel bias)
//
// Residual blocks:
// - Odometry residuals (wheel model vs measured pose change)
// - IMU preintegration residuals
// - Lidar scan matching residuals
//
// Requirements:
// - AutoDiff for all cost functions
// - Huber loss on lidar residuals (outlier scans)
// - SPARSE_NORMAL_CHOLESKY solver
// - Covariance estimation on final parameters
// - Convergence plot
Goal: Handle the constraints that real robotics problems always have: velocity limits, obstacle avoidance, joint limits, force bounds.
🛣️ PHASE IV GATE — Self-Assessment (15 min before Day 43) 1. Write the Gauss-Newton update step. How does it differ from Newton? 2. When does LM behave like gradient descent? When like Gauss-Newton? 3. What is a loss function in robust estimation? Name two. 4. Explain IRLS in one sentence. 5. In
scipy.optimize.least_squares, what doesjac_sparsitydo? 6. Write a CeresCostFunctorfor $r(x) = 10 - x$ (template syntax) 7. What is the tangent space of SO(3)? How many dimensions? 8. Why does bundle adjustment useSPARSE_SCHUR? 9. What is covariance estimation from NLS? Write the formula. 10. When should you useSetParameterBlockConstantin Ceres?
Theory (45 min): - Problem: $\min f(x)$ s.t. $h(x) = 0$ - Lagrangian: $L(x, \lambda) = f(x) + \lambda^T h(x)$ - Necessary conditions: $\nabla_x L = 0$, $h(x) = 0$ - Geometric interpretation: gradients of $f$ and $h$ are parallel at optimum - Constraint qualification: when do Lagrange conditions hold?
Implementation (60 min):
# Exercise:
# 1. Minimize x² + y² subject to x + y = 1
# Solve: by hand, then verify with scipy.optimize.minimize(constraints=...)
# 2. Minimize distance to origin subject to being on an ellipse
# 3. Implement: find Lagrange multiplier system, solve with Newton
# 4. Robotics: minimum energy trajectory subject to reaching goal position
# 5. Visualize: contours of f, constraint curve, optimal point, gradient alignment
Theory (60 min): - Problem: $\min f(x)$ s.t. $g(x) \leq 0$, $h(x) = 0$ - KKT conditions (necessary for local optimum): 1. Stationarity: $\nabla f + \mu^T \nabla g + \lambda^T \nabla h = 0$ 2. Primal feasibility: $g(x) \leq 0$, $h(x) = 0$ 3. Dual feasibility: $\mu \geq 0$ 4. Complementary slackness: $\mu_i g_i(x) = 0$ - Active set: constraints where $g_i(x) = 0$ - When KKT are sufficient: convex problems
Implementation (60 min):
# Exercise:
# 1. Min x² + y² s.t. x + y ≥ 1 — solve KKT by hand, verify with solver
# 2. Min f(x,y) with 3 inequality constraints — identify active set
# 3. Implement: KKT checker (given a candidate solution, verify all conditions)
# 4. Robotics: minimize actuator effort s.t. velocity limits and position bounds
Practice (30 min): - Complementary slackness intuition: at optimum, either constraint is active OR multiplier is zero - Why this matters: active constraints determine the solution structure
Theory (45 min): - Quadratic penalty: $\min f(x) + \frac{\mu}{2} \|h(x)\|^2 + \frac{\mu}{2} \|\max(0, g(x))\|^2$ - Problem: need $\mu \to \infty$, but ill-conditioning - Augmented Lagrangian: adds explicit multiplier estimates, better conditioning - Log barrier: $\min f(x) - \frac{1}{t} \sum \log(-g_i(x))$ for inequality constraints - Interior point = barrier method with increasing $t$ (central path)
Implementation (60 min):
# Exercise:
# 1. Implement quadratic penalty method
# 2. Show: solution approaches feasible as μ increases, but conditioning worsens
# 3. Implement augmented Lagrangian (ADMM-style)
# 4. Implement log barrier method
# 5. Compare all three on a simple constrained QP
# 6. Plot: penalty function landscape for different μ values
Practice (45 min): - Interior point methods: path through strictly feasible interior → solution - Connection to SVM: hinge loss as penalty for constraint violation
Theory (45 min): - Standard form: $\min c^T x$ s.t. $Ax \leq b$, $x \geq 0$ - Geometry: feasible region is polytope, optimum at vertex - Simplex method: walk along vertices - Duality: primal-dual relationship, strong duality - Interior point for LP: modern alternative to simplex
Implementation (60 min):
# Exercise:
# 1. LP by hand: 2 variables, plot feasible region, find optimum graphically
# 2. Implement simplex method (basic version) from scratch
# 3. Solve with scipy.optimize.linprog — compare simplex vs interior point
# 4. Robotics: task assignment as LP — assign n robots to n tasks, minimize total cost
# 5. Transportation problem: move goods from warehouses to stores, minimize shipping
Practice (45 min): - LP duality: interpret dual variables as prices/shadow prices - Can LP model robot path planning? (not really — but LP relaxation is useful)
Theory (45 min): - QP: $\min \frac{1}{2} x^T P x + q^T x$ s.t. $Gx \leq h$, $Ax = b$ - Convex QP: $P \succeq 0$ → globally optimal solution - KKT system for QP: linear system! (unlike general NLP) - Active set methods: identify binding constraints - Interior point methods for QP
Implementation (60 min):
# Exercise:
# 1. Implement QP via KKT system (for equality-constrained case)
# 2. Solve with OSQP, qpOASES, scipy
# 3. Benchmark: problem size vs solve time for each solver
# 4. Portfolio optimization: min variance s.t. expected return ≥ target
# 5. Robotics: minimum effort joint torque s.t. task-space velocity constraint
# τ = J^T f where J is Jacobian — QP in f
Practice (45 min): - Why QP is the sweet spot: general enough for many constraints, solvable in milliseconds - MPC preview: MPC is just a sequence of QPs
Theory (45 min): - NLP: $\min f(x)$ s.t. $g(x) \leq 0$, $h(x) = 0$ (all possibly nonlinear) - Sequential Quadratic Programming (SQP): solve a QP at each iteration - SQP = Newton's method applied to KKT conditions - Interior point methods for NLP (IPOPT) - Local vs global solutions: NLP is NP-hard in general
Implementation (60 min):
# Exercise:
# 1. scipy.optimize.minimize with constraints:
# method='SLSQP', method='trust-constr'
# 2. Compare: SQP vs interior point on nonlinear constrained problem
# 3. Robotics: trajectory optimization with obstacle avoidance
# min Σ||u_k||² s.t. ||p_k - obstacle|| ≥ d for all k
# 4. Multiple local optima: show two different initializations → different solutions
Practice (45 min): - NLP solver landscape: IPOPT, SNOPT, Knitro, WORHP - When to formulate as NLP vs break into simpler subproblems
# CAPSTONE: Constrained Robot Arm Motion
#
# 2-link planar robot arm:
# - Joint angles: θ₁, θ₂
# - End effector: (x, y) via forward kinematics
# - Actuator torque limits: |τᵢ| ≤ τ_max
# - Joint limits: θ_min ≤ θᵢ ≤ θ_max
# - Workspace obstacle: ||end_effector - obstacle|| ≥ d
#
# Tasks:
# 1. Formulate inverse kinematics as constrained NLP
# 2. Formulate minimum-torque trajectory (N waypoints) as QP
# 3. Add obstacle avoidance → becomes NLP
# 4. Solve with scipy and compare SLSQP vs trust-constr
# 5. Visualize: arm motion, joint trajectories, torque profiles, constraint satisfaction
Theory (60 min): - Convex optimization = convex objective + convex constraints - Hierarchy: LP ⊂ QP ⊂ QCQP ⊂ SOCP ⊂ SDP ⊂ General Convex - Each level: more expressive but harder to solve - Duality: strong duality for convex problems - SOCP (Second-Order Cone Programming): extends QP with norm constraints - Example: robust least squares $\min \|Ax - b\|_2 + \lambda \|x\|_2$ s.t. $\|Cx - d\| \leq 1$ - SDP (Semidefinite Programming): variable is a positive semidefinite matrix - Example: rotation averaging $\min \sum \|R_i - R_j R_{ij}\|_F^2$ s.t. $R \in SO(3)$ (SDP relaxation) - Slater's condition: when strong duality holds
Implementation (60 min):
# Exercise:
# 1. Classify problems: which level of convex hierarchy?
# 2. For each: formulate in standard form, solve, verify
# 3. Show: a problem that looks non-convex but can be reformulated as convex
# Example: minimize ||Ax - b||₁ → LP reformulation
Practice (30 min): - Decision tree: is my problem convex? How to check?
Theory (30 min): - CVXPY: express optimization in math-like syntax - Disciplined Convex Programming (DCP): rules for valid convex programs - Solvers: SCS, OSQP, ECOS, MOSEK
Implementation (120 min):
import cvxpy as cp
# Exercise 1: Least-squares with CVXPY
x = cp.Variable(n)
objective = cp.Minimize(cp.sum_squares(A @ x - b))
prob = cp.Problem(objective)
prob.solve()
# Exercise 2: LASSO (L1 regularized least-squares)
objective = cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x))
# Exercise 3: Robust regression (Huber loss)
objective = cp.Minimize(cp.sum(cp.huber(A @ x - b, M=1.0)))
# Exercise 4: Portfolio optimization
# min risk s.t. return ≥ target, sum(weights) = 1, weights ≥ 0
# Exercise 5: Minimum-time trajectory
# min T s.t. dynamics, velocity limits, start/end conditions
# Exercise 6: Robotics: minimum-snap trajectory for drone
# Polynomial trajectory through waypoints, minimize snap (4th derivative)
# Smooth motion = small snap
# Exercise 7 (SOCP): Robust least-squares
# min ||Ax - b||₂ s.t. ||x||₂ ≤ budget (second-order cone constraint)
# Compare: unconstrained solution vs cone-constrained
# Exercise 8 (SDP): Rotation averaging relaxation
# Given noisy relative rotations R_ij, find absolute rotations R_i
# SDP relaxation: relax R ∈ SO(3) to R^T R ⪰ I
# Solve with CVXPY (SDP), project back to SO(3)
Theory (45 min): - OSQP: Operator Splitting QP solver - ADMM algorithm: split problem into easy subproblems - Warm starting: use previous solution as initial guess (critical for MPC!) - Parameters: rho, sigma, max_iter, polish - Code generation: compile QP to C code for embedded systems
Implementation (60 min):
import osqp
import scipy.sparse as sp
# Exercise:
# 1. Formulate QP in OSQP format: min 0.5 x^T P x + q^T x s.t. l ≤ Ax ≤ u
# 2. Solve, warm start, resolve with slightly changed parameters — compare time
# 3. MPC-style: solve 100 QPs in sequence, each warm-started from previous
# Show: warm starting is 5-10× faster
# 4. Large QP (n=5000): benchmark OSQP vs CVXPY vs scipy
# 5. Code generation: osqp.codegen() → C code for embedded
Practice (45 min): - OSQP vs qpOASES vs ECOS: when to use which? - Real-time requirements: MPC needs QP solved in <1ms for 100Hz control
Theory (60 min): - MPC idea: optimize trajectory over horizon, apply first control, repeat - Formulation for linear system $x_{k+1} = Ax_k + Bu_k$: $\min \sum_{k=0}^{N-1} [x_k^T Q x_k + u_k^T R u_k] + x_N^T Q_f x_N$ s.t. $x_{k+1} = Ax_k + Bu_k$, $u_{\min} \leq u_k \leq u_{\max}$, $x_{\min} \leq x_k \leq x_{\max}$ - This is a QP! Stack all variables: $z = [x_0, u_0, x_1, u_1, ...]$ - Horizon length tradeoff: longer = better planning, slower solve - Stability: terminal cost $Q_f$ and terminal constraint - Recursive feasibility: if feasible at step $k$, must be feasible at step $k+1$ - Terminal constraint set $\mathcal{X}_f$: control-invariant set - Terminal cost $Q_f$: solve DARE (discrete algebraic Riccati equation) - $Q_f$ as Lyapunov function: guarantees closed-loop stability - Infeasibility handling: what to do when QP solver says "infeasible" - Soft constraints: slack variables with large penalty - Priority: hard constraints (dynamics) > safety > performance - Backup controller: switch to safe fallback if MPC fails - Robust MPC (awareness): tube MPC, scenario MPC — handles model uncertainty
Implementation (60 min):
# Exercise:
# 1. Formulate MPC for 1D double integrator (position + velocity)
# 2. Stack into QP: identify P, q, A, l, u matrices
# 3. Solve with OSQP
# 4. Simulate: MPC loop — solve QP, apply first control, propagate, repeat
# 5. Compare: no constraints vs with velocity limit vs with acceleration limit
# 6. Plot: position, velocity, control input over time
Practice (30 min): - Why receding horizon? Why not solve once for entire trajectory? - Computational budget: how does QP size scale with horizon and state dimension?
Implementation (150 min):
# Full MPC controller for differential-drive robot:
#
# State: [x, y, θ, v, ω]
# Control: [a, α] (linear/angular acceleration)
# Dynamics: nonlinear (bicycle model) → linearize around current state
#
# Constraints:
# - |v| ≤ v_max, |ω| ≤ ω_max
# - |a| ≤ a_max, |α| ≤ α_max
# - Obstacle avoidance: ||[x,y] - obs|| ≥ d (nonlinear → linearize)
#
# Implementation:
# 1. Linearize dynamics around current trajectory
# 2. Formulate as QP
# 3. Solve with OSQP (warm-started from previous solution)
# 4. Apply first control
# 5. Re-linearize and repeat
#
# Visualization:
# - Animated robot following reference trajectory
# - Obstacle avoidance in real-time
# - Control inputs vs limits
# - Solve time per iteration
Theory (45 min): - Linear MPC: fast (QP) but needs linearization - Nonlinear MPC: solve NLP directly — more accurate but slower - CasADi: symbolic framework for NLP with automatic differentiation - Multiple shooting vs single shooting vs collocation
Implementation (60 min):
# Exercise:
# 1. pip install casadi
# 2. Formulate the same mobile robot MPC as NLP in CasADi
# 3. Solve with IPOPT (interior point NLP solver)
# 4. Compare: linearized QP-MPC vs nonlinear CasADi-MPC
# - Trajectory accuracy
# - Solve time
# 5. When does the difference matter? (tight turns, near constraints)
Practice (45 min): - MPC in OKS robots: linear MPC for velocity tracking, nonlinear for tight maneuvers - Solver selection: OSQP for linear MPC, IPOPT/ACADOS for nonlinear MPC
# CAPSTONE: Warehouse Robot MPC Controller
#
# Scenario: OKS-like AMR navigating a warehouse grid
#
# 1. Define robot model (differential drive, velocity/acceleration limits)
# 2. Define environment: grid with static obstacles, narrow corridors
# 3. Given a global path (waypoints), implement MPC to track it
# 4. Constraints:
# - Velocity and acceleration limits
# - Obstacle clearance (linearized distance constraints)
# - Corridor width (stay in lane)
# 5. Implementation:
# - QP-based MPC with OSQP
# - Warm starting between timesteps
# - Horizon of N=20 steps at 10Hz control rate
# 6. Test scenarios:
# - Straight corridor with speed
# - Sharp 90° turn (velocity must slow)
# - Narrow passage (tight constraints)
# - Moving obstacle (re-plan)
# 7. Metrics: tracking error, constraint violations, solve time, smoothness
Goal: Apply everything to the three big robotics optimization problems: SLAM (graph optimization), calibration (NLS), and motion planning (MPC/NLP).
🛣️ PHASE V GATE — Self-Assessment (15 min before Day 57) 1. Write the KKT conditions for $\min f(x)$ s.t. $g(x) \leq 0$ 2. What is complementary slackness? 3. Penalty method: why can't $\mu$ be infinite from the start? 4. Formulate MPC for a double integrator as a QP. 5. What does OSQP warm-starting do? Why is it critical for MPC? 6. What is DCP (Disciplined Convex Programming) in CVXPY? 7. LP ⊂ QP ⊂ SOCP ⊂ SDP — give one example of each. 8. What is recursive feasibility in MPC? 9. CasADi: how does it differ from Ceres for trajectory optimization? 10. In NMPC, why is "direct collocation" preferred over "single shooting" for long horizons?
Theory (60 min): - Factor graph: bipartite graph with variable nodes and factor nodes - Joint probability: $p(x) \propto \prod_i f_i(x_{S_i})$ - MAP estimation: $\max_x \prod f_i = \min_x \sum -\log f_i$ - Gaussian factors → NLS problem! - Information matrix: $\Lambda = J^T J$ — sparse due to graph structure - Bayes tree: efficient incremental factorization
Implementation (60 min):
# Exercise:
# 1. Draw factor graphs for:
# (a) Odometry chain: x0 —odom— x1 —odom— x2 —odom— x3
# (b) GPS at every 5th pose: add unary factors
# (c) Loop closure: add factor between x0 and x3
# 2. Write out the NLS problem for each graph
# 3. Compute and visualize the information matrix sparsity pattern
# 4. Show: loop closure creates fill-in in the information matrix
Practice (30 min): - Why factor graphs? They make the structure of optimization problems explicit - Connection: factor graph = Bayesian network = NLS structure
Implementation (150 min):
# Build 1D SLAM optimizer from absolute zero:
#
# Setup:
# - Robot drives along a line: x0 → x1 → x2 → ... → x9
# - Odometry: noisy relative position measurements between consecutive poses
# - Loop closure: noisy measurement between x0 and x9 (robot returned to start)
# - Prior: x0 = 0 (anchor)
#
# Step 1: Define variables (10 scalar poses)
# Step 2: Define residuals for each factor:
# - Prior: r_prior = x0 - 0
# - Odometry: r_odom(i) = (x_{i+1} - x_i) - z_odom(i)
# - Loop closure: r_loop = (x_9 - x_0) - z_loop
# Step 3: Build Jacobian (sparse! tridiagonal + one off-diagonal entry)
# Step 4: Solve with Gauss-Newton: (J^T Ω J) Δx = -J^T Ω r
# Step 5: Iterate until convergence
#
# Visualize:
# - Before optimization: poses from odometry only (drift!)
# - After optimization: loop closure pulls poses into consistency
# - Information matrix sparsity pattern
Implementation (150 min):
# Extend to 2D: poses are (x, y, θ)
#
# Key difference: rotation makes everything nonlinear!
# Relative pose: T_ij = T_i^{-1} * T_j (SE(2) composition)
# Residual: r_ij = Log(Z_ij^{-1} * T_i^{-1} * T_j) where Log maps SE(2) → R³
#
# Implementation:
# 1. SE(2) class: compose, inverse, log, exp
# 2. Residual function for relative pose factor
# 3. Jacobian computation (analytical — derive on paper first!)
# 4. Build sparse Jacobian and information matrix
# 5. Gauss-Newton solver with sparse Cholesky (scipy.sparse)
# 6. Test: 50 poses in a loop, with 2-3 loop closures
#
# Visualize:
# - Trajectory before/after optimization
# - Covariance ellipses at each pose
# - Convergence: total error vs iteration
Theory (30 min): - g2o: General Graph Optimization — C++ framework - Vertex types: SE2, SE3 - Edge types: relative pose constraints - Solver backends: CSparse, Cholmod, PCG - Python bindings available
Implementation (120 min):
# Exercise (using g2o Python bindings or C++):
#
# 1. Install and set up g2o
# 2. Recreate the 2D SLAM problem from Day 59 using g2o
# 3. Compare: your implementation vs g2o (results should match!)
# 4. Large-scale: 1000 poses, 50 loop closures
# Your code: slow. g2o: fast (optimized sparse solver).
# 5. 3D pose graph: SE3 vertices (bonus if using C++)
#
# Or with GTSAM:
# pip install gtsam
import gtsam
# Use gtsam.NonlinearFactorGraph, gtsam.Values, gtsam.LevenbergMarquardtOptimizer
Theory (45 min): - GTSAM: factor graph library with focus on SLAM and SFM - iSAM2: incremental smoothing and mapping - Don't re-solve everything when a new factor arrives - Bayes tree: only update affected variables - O(affected variables) instead of O(all variables) - Why incremental matters: online SLAM must be fast
Implementation (60 min):
import gtsam
# Exercise:
# 1. Build 2D SLAM with GTSAM's factor graph
# 2. Add noise models (diagonal, full covariance)
# 3. Batch solve with LevenbergMarquardtOptimizer
# 4. Incremental solve with ISAM2:
# - Add factors one at a time (as if receiving data online)
# - After each addition, get updated estimate
# 5. Compare: batch vs incremental — results should be similar, incremental faster
# 6. Landmark SLAM: add point landmarks observed from multiple poses
Practice (45 min): - g2o vs GTSAM: when to use which? (g2o: simpler, GTSAM: more features + incremental) - Factor graph for OKS: what factors would a real warehouse SLAM system use?
Theory (45 min): - Problem: wrong loop closures (perceptual aliasing) → catastrophic failure - Switchable constraints: add binary switch variable to each loop closure - Max-mixture models: mixture of inlier and outlier distributions - DCS (Dynamic Covariance Scaling) - Graduated Non-Convexity (GNC): start convex, gradually tighten
Implementation (60 min):
# Exercise:
# 1. Create SLAM scenario with 3 correct and 2 wrong loop closures
# 2. Standard solver: completely wrong map
# 3. With Huber loss on loop closures: partially helps
# 4. Switchable constraints: implement in your custom solver
# 5. GTSAM robust factors: use gtsam.noiseModel.mEstimator
# 6. Compare all approaches: which correctly rejects wrong closures?
Practice (45 min): - Why is robust SLAM hard? (wrong closures look just like correct ones locally) - State of the art: Kimera, Hydra — what robustness strategies do they use?
# CAPSTONE: Full 2D SLAM Pipeline
#
# 1. Generate synthetic warehouse environment:
# - Grid of corridors (like OKS warehouse)
# - Robot does 3 loops around the warehouse
# - Odometry: differential drive model with noise
# - Lidar: simulated 2D scans (point clouds)
#
# 2. Frontend (given — focus on backend):
# - ICP scan matching between consecutive poses → odometry factors
# - Place recognition → loop closure candidates (some wrong!)
#
# 3. Your backend:
# - Factor graph with odometry + loop closure factors
# - Robust estimation (Huber or switchable constraints)
# - Solve with Gauss-Newton (your code) and verify with GTSAM
#
# 4. Evaluation:
# - Absolute Trajectory Error (ATE) vs ground truth
# - With/without loop closures
# - With/without robust estimation
# - Runtime comparison: your solver vs GTSAM
#
# 5. Visualization:
# - Trajectory: odometry-only vs optimized vs ground truth
# - Map: accumulated scans before and after optimization
# - Rejected loop closures highlighted
Theory (45 min): - Extrinsic calibration: find transform between sensors - Intrinsic calibration: find sensor parameters (focal length, wheel radius, ...) - Hand-eye calibration: $AX = XB$ problem - Observability: do we have enough data to calibrate? (Fisher information) - Temporal calibration: estimate time offset $\Delta t$ between sensors - Why: cameras, lidars, IMUs have different timestamps — even 5ms offset corrupts calibration - Method: add $\Delta t$ as optimization variable, interpolate measurements - Joint spatio-temporal calibration: solve for extrinsic + time offset together
Implementation (60 min):
# Exercise:
# 1. Camera intrinsic calibration:
# Generate synthetic checkerboard observations from multiple viewpoints
# Estimate: fx, fy, cx, cy, k1, k2 (distortion)
# Solve with scipy.optimize.least_squares
#
# 2. Lidar-to-camera extrinsic:
# Given: 3D lidar points + 2D image points of known targets
# Estimate: rotation + translation (6 DoF)
# Solve with Ceres (C++) or scipy (Python)
#
# 3. IMU bias calibration:
# Given: static IMU data (robot not moving)
# Estimate: gyro bias, accel bias, gravity direction
Practice (45 min): - What makes calibration hard? (initialization, local minima, degenerate configurations) - Data collection strategy: how to move to maximize Fisher information?
Theory (45 min): - Direct collocation: discretize trajectory, optimize all points simultaneously - Direct shooting: integrate dynamics forward, optimize controls - Polynomial trajectories: minimum-jerk, minimum-snap - Bezier curves and B-splines as trajectory representations
Implementation (60 min):
# Exercise:
# 1. Minimum-jerk trajectory between two points:
# 5th order polynomial, minimize ∫ jerk² dt
# Closed-form solution — derive and implement
#
# 2. Minimum-snap trajectory through waypoints (drone planning):
# Piecewise polynomial, continuity constraints
# Solve as QP with CVXPY
#
# 3. Direct collocation for double integrator with obstacles:
# NLP with CasADi, solved by IPOPT
#
# 4. Compare: polynomial vs collocation for a zigzag path
Practice (45 min): - Trajectory optimization in OKS: smoothing the global path for the local planner - Connection to MPC: trajectory optimization = open-loop, MPC = closed-loop
Theory (45 min): - Common failures: divergence, slow convergence, infeasible, unbounded - Diagnosis checklist: 1. Is the problem well-posed? (check Jacobian rank) 2. Is the initial guess reasonable? (residual at x0) 3. Is the Jacobian correct? (gradient checker) 4. Is the problem scaled? (column norms of Jacobian) 5. Is there numerical cancellation? (condition number) - Solver output interpretation: Ceres summary, OSQP info, IPOPT logs
Implementation (60 min):
# Exercise: Debug These Broken Optimizations
#
# 1. Problem: solver diverges → diagnosis: wrong Jacobian sign
# 2. Problem: solver converges to wrong answer → diagnosis: local minimum, need better init
# 3. Problem: solver is very slow → diagnosis: ill-conditioned, need scaling
# 4. Problem: solver says "infeasible" → diagnosis: conflicting constraints
# 5. Problem: solver oscillates → diagnosis: discontinuous cost function
#
# For each: identify the issue, fix it, verify solution
Practice (45 min): - Build your debugging checklist (you'll use this in production) - Real OKS scenario: estimator calibration diverged — walk through diagnosis
Theory (45 min): - Real-time requirements: solve within control loop period (1-10ms) - Warm starting: previous solution as initial guess - Early termination: good-enough solution in limited iterations - Code generation: compile problem to fast C code - Explicit MPC: precompute solution as piecewise affine function of state
Implementation (60 min):
# Exercise:
# 1. MPC loop at 100Hz: measure solve time per iteration
# Must be <10ms consistently!
# 2. Warm starting benchmark: cold start vs warm start solve time
# 3. Iteration-limited solve: 5, 10, 20 max iterations — quality vs time tradeoff
# 4. OSQP code generation: generate C solver, call from Python via ctypes
# 5. Compile-time: benchmark generated C solver vs Python OSQP
Practice (45 min): - Real-time constraints in OKS: MPC runs at 10-50Hz, must be deterministic - WCET (Worst Case Execution Time): why average isn't enough
Theory (60 min): Survey of production optimization tools:
| Tool | Type | Language | Strength |
|---|---|---|---|
| Ceres | NLS | C++ | Robotics calibration, SLAM |
| g2o | Graph NLS | C++ | SLAM backends |
| GTSAM | Factor graph | C++ | Incremental SLAM, SFM |
| OSQP | QP | C/Python | Real-time MPC |
| CVXPY | Convex | Python | Prototyping, modeling |
| CasADi | NLP/AD | Python/C++ | Nonlinear MPC, optimal control |
| IPOPT | NLP | C++ | General large-scale NLP |
| ACADOS | NMPC | C/Python | Embedded nonlinear MPC |
| Eigen | Dense LA | C++ | Foundation for all above |
| SuiteSparse | Sparse LA | C | Backend for Ceres, g2o |
| OR-Tools | LP/MIP | C++/Python | Task assignment, scheduling |
Implementation (90 min):
# Exercise: Solve the SAME problem in multiple tools
#
# Problem: 2D robot trajectory optimization with obstacles
# - 50 timesteps, differential drive
# - 3 circular obstacles
# - Start and goal positions
#
# Solve with:
# 1. scipy.optimize.minimize (SLSQP)
# 2. CVXPY (convex relaxation)
# 3. CasADi + IPOPT
# 4. OSQP (linearized version)
#
# Compare: solution quality, solve time, ease of formulation
# FINAL CAPSTONE: Integrated Robotics Optimization Pipeline
#
# Build a complete optimization stack for a simulated warehouse robot:
#
# === Module 1: Calibration (Ceres C++ or scipy Python) ===
# - Input: raw sensor data (wheel encoders, IMU, lidar)
# - Output: calibrated parameters (wheel radii, baseline, sensor transforms)
# - Method: NLS with Huber loss, Levenberg-Marquardt
#
# === Module 2: SLAM Backend (custom or GTSAM) ===
# - Input: odometry factors + loop closure factors
# - Output: optimized pose graph (trajectory + map)
# - Method: Gauss-Newton on factor graph, robust to wrong loop closures
#
# === Module 3: MPC Controller (OSQP or CasADi) ===
# - Input: reference path from global planner, current state
# - Output: velocity commands
# - Method: QP-MPC with velocity/acceleration limits and obstacle avoidance
# - Runs at 10Hz with warm starting
#
# === Integration ===
# 1. Calibrate sensors → use calibrated parameters in SLAM and MPC
# 2. Run SLAM → get corrected map and trajectory
# 3. Plan path on SLAM map → reference for MPC
# 4. MPC tracks path while avoiding obstacles
#
# === Deliverables ===
# - Working code for all three modules
# - Convergence plots for each optimization
# - Runtime benchmarks
# - Written report: which optimization method for which module, and why
## Day [N]: [Topic]
Date: ____
### Theory (45 min)
- [ ] Read assigned material
- [ ] Took notes on key concepts
- [ ] Can explain main idea in one sentence
### Implementation (60 min)
- [ ] Completed all exercises
- [ ] Code runs and produces correct output
- [ ] Compared with reference solution / library
### Practice (45 min)
- [ ] Completed problem set
- [ ] Reviewed mistakes
- [ ] Updated flashcards
### Reflection
- Hardest concept today:
- Connection to previous material:
- Still unclear:
Before starting, you should be comfortable with:
If not, spend 3-5 days on a crash course before Day 1.
| Priority | Resource | When |
|---|---|---|
| Essential | Boyd & Vandenberghe (free PDF) | Weeks 7-8, reference throughout |
| Essential | Ceres Solver tutorial | Week 6 |
| Essential | Grisetti SLAM tutorial paper | Week 9 |
| High | Nocedal & Wright Ch 1-7 | Weeks 3-5, algorithm detail |
| Medium | GTSAM tutorial | Week 9 |
| Medium | Rawlings MPC textbook | Week 8 |
| Reference | OSQP paper | Week 8 |
| Reference | g2o paper | Week 9 |
| Week | Phase | Status | Hours | Notes |
|---|---|---|---|---|
| 1 | Math Foundations I | ☐ | 17.5 | |
| 2 | Math Foundations II | ☐ | 17.5 | |
| 3 | Unconstrained I | ☐ | 17.5 | |
| 4 | Unconstrained II | ☐ | 17.5 | |
| 5 | Least Squares (Python) | ☐ | 17.5 | |
| 6 | Ceres Solver (C++) | ☐ | 17.5 | |
| 7 | Constrained Optimization | ☐ | 17.5 | |
| 8 | Convex + MPC | ☐ | 17.5 | |
| 9 | SLAM | ☐ | 17.5 | |
| 10 | Applications + Capstone | ☐ | 17.5 | |
| 11 | Lie Groups: Group Theory & Rotations | ☐ | 17.5 | Phase VI |
| 12 | Lie Groups: Rigid Motions & Lie Algebra | ☐ | 17.5 | |
| 13 | Lie Groups: Manifold Optimization + Capstone | ☐ | 17.5 | |
| 14 | Game Theory: Normal Form & Zero-Sum | ☐ | 17.5 | Phase VII |
| 15 | Game Theory: Cooperative & Dynamic | ☐ | 17.5 | |
| 16 | Game Theory: Algorithmic GT + Capstone | ☐ | 17.5 | |
| Total | 280 |