Day 2: Eigenvalues, Positive Definiteness, SVD
Phase I — Mathematical Foundations | Week 1 | 2.5 hours
PD matrices are the foundation of optimization: every well-behaved minimum has a PD Hessian.
Navigation
OKS Relevance
The information matrix in SLAM (inverse covariance) must be PD. SVD reveals which directions of a state have strong measurements vs weak. Condition number predicts whether your optimizer will converge reliably.
Theory (45 min)
2.1 Eigenvalue Decomposition
For symmetric $A \in \mathbb{R}^{n \times n}$:
$$A = Q \Lambda Q^T$$
where $Q$ is orthogonal ($Q^T Q = I$) and $\Lambda = \text{diag}(\lambda_1, \ldots, \lambda_n)$.
Geometric interpretation: Eigenvectors define the principal axes. Eigenvalues define the stretching along each axis.
2.2 Positive Definiteness Classification
| Classification |
Eigenvalues |
Meaning |
| Positive definite (PD) |
All $\lambda_i > 0$ |
Bowl shape — unique minimum |
| Positive semidefinite (PSD) |
All $\lambda_i \geq 0$ |
Flat valleys — degenerate minimum |
| Indefinite |
Mixed signs |
Saddle point |
| Negative definite |
All $\lambda_i < 0$ |
Hill — maximum |
Tests for PD:
1. All eigenvalues positive
2. All leading principal minors positive (Sylvester's criterion)
3. Cholesky decomposition succeeds
4. $x^T A x > 0$ for all $x \neq 0$
2.3 Singular Value Decomposition
For any $A \in \mathbb{R}^{m \times n}$:
$$A = U \Sigma V^T$$
- $U \in \mathbb{R}^{m \times m}$: left singular vectors (column space basis)
- $\Sigma \in \mathbb{R}^{m \times n}$: singular values $\sigma_1 \geq \sigma_2 \geq \cdots \geq 0$
- $V \in \mathbb{R}^{n \times n}$: right singular vectors (row space basis)
SVD reveals everything: rank = # nonzero $\sigma_i$, null space = columns of $V$ for zero $\sigma_i$.
2.4 Condition Number
$$\kappa(A) = \frac{\sigma_{\max}}{\sigma_{\min}} = \frac{\lambda_{\max}}{\lambda_{\min}} \quad (\text{for PD } A)$$
Meaning: A small perturbation $\delta b$ to $Ax = b$ causes solution change bounded by $\kappa(A) \cdot \|\delta b\| / \|b\|$.
- $\kappa \approx 1$: well-conditioned → stable
- $\kappa \gg 1$: ill-conditioned → small input changes cause large output changes
- $\kappa = \infty$: singular
Optimization connection: Condition number of the Hessian determines gradient descent convergence rate: $O\left(\left(\frac{\kappa-1}{\kappa+1}\right)^{2k}\right)$ steps.
Implementation (60 min)
Exercise 1: Power Iteration
# See: code/week01/eigenvalues_svd.py
Implement power iteration to find the dominant eigenvalue/eigenvector.
Exercise 2: PD Classification
Check positive definiteness via: (a) eigenvalues, (b) Cholesky attempt, (c) quadratic form test.
Exercise 3: SVD and Low-Rank Approximation
Compute SVD, reconstruct with $k < r$ singular values, measure approximation error.
Exercise 4: Condition Number → Numerical Error
Solve $Ax = b$, perturb $b$, show solution error amplifies by $\kappa(A)$.
Practice (45 min)
Classify the following matrices as PD/PSD/indefinite. For each, verify via eigenvalues, determinant test, and Cholesky:
- $A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$
- $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$
- $A = \begin{pmatrix} 4 & 2 \\ 2 & 1 \end{pmatrix}$
- $A = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$
- $I_{3 \times 3}$
6–10. Additional 5 random symmetric matrices (generate and classify).
Expert Challenges
🎯 Challenge 1: Why is the Hessian at a minimum PD?
**Problem:** Prove that if $x^*$ is a strict local minimum of $f$, then $H(x^*) \succeq 0$ (PSD). Under what additional condition is it PD?
**Solution:** By Taylor expansion: $f(x^* + \delta) \approx f(x^*) + \nabla f(x^*)^T \delta + \frac{1}{2}\delta^T H \delta$. Since $\nabla f(x^*) = 0$, we get $f(x^* + \delta) - f(x^*) \approx \frac{1}{2}\delta^T H \delta$. For this to be $\geq 0$ for all $\delta$, we need $H \succeq 0$. If $f$ is *strictly* convex near $x^*$, then $H \succ 0$.
🎯 Challenge 2: Condition number and GD convergence
**Problem:** For $f(x) = \frac{1}{2}x^T A x - b^T x$ with PD $A$ having condition number $\kappa$, how many gradient descent iterations are needed to reduce the error by a factor of $10^{-6}$?
**Solution:** GD converges as $\|x_k - x^*\| \leq \left(\frac{\kappa - 1}{\kappa + 1}\right)^k \|x_0 - x^*\|$. For $\kappa = 100$: convergence rate $\approx 0.98$, so $0.98^k = 10^{-6}$ gives $k \approx 683$ iterations. For $\kappa = 2$: rate $\approx 0.33$, so $k \approx 12$. This motivates preconditioning and second-order methods.
🎯 Challenge 3: SVD of a measurement Jacobian
**Problem:** A robot has a range sensor giving measurements $z_i = \|p_{\text{robot}} - p_{\text{landmark}_i}\| + \epsilon_i$. The Jacobian $J$ has singular values $[5.2, 3.1, 0.01]$. What does this tell you about observability?
**Solution:** The third direction has singular value near zero — it's nearly unobservable. This means you can't reliably estimate the state in that direction. The condition number $\kappa = 5.2/0.01 = 520$ is high, meaning the estimate will be highly sensitive to noise in that direction. Geometrically, the landmarks may be nearly collinear, giving poor depth resolution in one direction.
Self-Assessment Checklist
- [ ] I can classify any symmetric matrix as PD/PSD/indefinite using 3 different tests
- [ ] I understand SVD geometrically and can compute low-rank approximations
- [ ] I can predict numerical trouble from the condition number
- [ ] I see the bridge: PD Hessian ↔ local minimum ↔ optimization
Key Takeaways
- PD = bowl = minimum. The entire field of optimization revolves around this.
- SVD is the swiss army knife — rank, null space, condition number, low-rank approximation.
- Condition number predicts convergence — if $\kappa(H)$ is bad, simple methods will struggle.
- Every SLAM information matrix must be PD — otherwise the MAP estimate is ill-defined.