Phase I — Mathematical Foundations | Week 2 | 2.5 hours Integrate everything: decompositions, sparsity, differentiation, conditioning, and iterative solvers.
This review pulls together the computational pipeline that Ceres and GTSAM use internally. When you call ceres::Solve(), this is what happens: the Jacobian $J$ is assembled (sparse), the normal equations $J^TJ \Delta x = -J^Tg$ are formed (exploiting sparsity via Schur complement), the system is solved (sparse Cholesky or preconditioned CG), and variable scaling ensures good conditioning. Every topic from this week is part of that pipeline.
Matrix Decompositions (Day 8)
├── LU, Cholesky, QR
├── Pivoting for stability
└── Cholesky ← used by Ceres for normal equations
│
Sparse Matrices & Schur (Day 9)
├── CSR/CSC storage
├── Fill-reducing ordering (AMD, METIS)
└── Schur complement for BA
│
Numerical Differentiation (Day 10)
├── Forward, central, complex-step
├── Truncation vs round-off trade-off
└── Gradient/Jacobian/Hessian checking
│
Automatic Differentiation (Day 11)
├── Forward mode (dual numbers)
├── Reverse mode (backpropagation)
└── Ceres Jet type, JAX
│
Conditioning (Day 12)
├── κ(A) = λ_max/λ_min
├── Sources: scale mismatch, gauge freedom
└── Preconditioning, variable scaling
│
Iterative Methods — CG (Day 13)
├── Krylov subspace, A-conjugacy
├── O(√κ) convergence
└── Preconditioned CG → Ceres ITERATIVE_SCHUR
Build a function check_gradient(f, grad_f, x) that:
1. Computes the analytic gradient grad_f(x)
2. Computes the numerical gradient using central differences
3. Reports the relative error for each component
4. Returns PASS if all relative errors < $10^{-5}$
Test it on $f(x) = \|Ax - b\|^2$ and intentionally introduce a sign error to verify it catches bugs.
Create a tridiagonal system $Ax = b$ with $n = 50000$: $$A_{ij} = \begin{cases} 4 & i = j \\ -1 & |i-j| = 1 \\ 0 & \text{otherwise} \end{cases}$$
Solve using: (a) scipy.sparse.linalg.spsolve, (b) your CG implementation, (c) PCG with Jacobi. Report times and verify solutions match.
Using your dual number implementation from Day 11, compute the full $m \times n$ Jacobian of: $$f(x_1, x_2, x_3) = \begin{pmatrix} x_1^2 + \sin(x_2 x_3) \\ e^{x_1} \cos(x_2) \\ x_3 / (1 + x_1^2) \end{pmatrix}$$
Verify against both symbolic and numerical Jacobians.
For $n = 100$, create SPD matrices with $\kappa \in \{10, 100, 10^3, 10^4, 10^5, 10^6\}$: 1. Solve $Ax = b$ and measure digits of precision 2. Run gradient descent and count iterations to $10^{-6}$ relative error 3. Run CG and count iterations 4. Plot all three metrics vs $\log_{10}(\kappa)$
Verify: GD iterations ∝ $\kappa$, CG iterations ∝ $\sqrt{\kappa}$.
For the matrix from Problem 4 with $\kappa = 10^4$: 1. CG without preconditioning: record iteration count 2. Jacobi preconditioning: record iteration count 3. "Perfect" preconditioning ($M = A$): verify 1 iteration 4. Compute $\kappa(M^{-1}A)$ for each case
Create a block system: $$\begin{pmatrix} B & E \\ E^T & C \end{pmatrix} \begin{pmatrix} x_c \\ x_l \end{pmatrix} = \begin{pmatrix} g_c \\ g_l \end{pmatrix}$$
Where $B$ is $10 \times 10$ (cameras), $C$ is $90 \times 90$ block-diagonal (landmarks), $E$ is $10 \times 90$.
Solve using: (a) full dense solve, (b) Schur complement eliminate landmarks first, then back-substitute. Verify answers match.
For $f(x) = \text{softmax}(x)^T \text{softmax}(x)$ at $x = [1, 2, 3, 4, 5]$: 1. Forward-mode AD gradient (JAX or your dual numbers) 2. Central difference gradient with $h = 10^{-7}$ 3. Complex-step gradient with $h = 10^{-30}$ 4. Compare accuracy and time
Simulate a mini-SLAM problem: - 5 robot poses (each $[x, y, \theta]$), 20 landmarks (each $[l_x, l_y]$) - Random odometry and observation measurements with noise - Assemble the sparse information matrix $H = J^T \Sigma^{-1} J$ - Compute $\kappa(H)$ (should be large due to gauge freedom) - Add a prior on pose 0 to fix the gauge: $H \leftarrow H + \text{diag}(10, 10, 10, 0, \ldots, 0)$ - Solve with CG and report the condition number improvement
With Week 2 complete, you have the mathematical toolkit for optimization: - Linear algebra: vectors, matrices, eigenvalues, SVD, decompositions - Calculus: gradients, Jacobians, Hessians, Taylor expansions - Differentiation: numerical, automatic (forward + reverse) - Computation: sparse solvers, conditioning, CG/PCG
Phase II (Weeks 3-4) builds on all of this: gradient descent uses gradients and step-size selection; Newton's method uses Hessians and Cholesky; trust region methods use CG (Steihaug); BFGS approximates the Hessian without forming it.