Self-assessment guide: These exercises test your ability to diagnose and fix numerical issues — the skill that separates textbook knowledge from production debugging.
A1. The normal equations $J^T J \Delta x = -J^T r$ square the condition number: $\kappa(J^T J) = \kappa(J)^2$. For a problem where $\kappa(J) = 10^6$, how many digits of accuracy can you expect in the solution when using: 1. Normal equations (Cholesky on $J^T J$) 2. QR factorization of $J$ 3. SVD of $J$
A2. You're using finite differences to compute a Jacobian. The step size $h$ affects accuracy: - Too large: truncation error dominates - Too small: roundoff error dominates
For forward differences, the optimal $h$ is about $\sqrt{\epsilon_{\text{mach}}} \approx 1.5 \times 10^{-8}$. For central differences, it's $\epsilon_{\text{mach}}^{1/3} \approx 6 \times 10^{-6}$. Explain why central differences allow a larger (better) step size.
A3. What is "fill-in" in sparse matrix factorization, and why does variable ordering matter?
H = [X X X X X] L = [X . . . .] (good ordering)
[X X . . .] [X X . . .]
[X . X . .] [X . X . .]
[X . . X .] [X . . X .]
[X . . . X] [X X X X X] ← fill-in in last row
vs.
H = [X . . . X] L = [X . . . .] (bad ordering — hub first)
[. X . . X] [. X . . .]
[. . X . X] [. . X . .]
[. . . X X] [. . . X .]
[X X X X X] [X X X X X] ← same, but rows above have fill too
Actually, putting the "hub" node last produces minimal fill-in (the dense row only fills the last row of $L$). Putting it first causes fill-in to cascade through all subsequent rows.
**AMD (Approximate Minimum Degree):** At each step, eliminate the variable with the fewest connections (smallest degree in the elimination graph). This heuristic minimizes fill-in for most practical problems.
**For SLAM:** Poses at the end of long corridors (few connections) should be eliminated first. Poses at intersections (many connections) should be eliminated last.
B1. Write a function that checks the gradient of a cost function using finite differences, similar to Ceres' CheckGradients functionality.
import numpy as np
def check_gradient(f, grad_f, x, h=1e-7, tol=1e-5):
"""
Compare analytic gradient with central-difference approximation.
Returns: max relative error, per-component errors.
"""
# TODO: implement
# For each component, compute central difference
# Compare with analytic gradient
# Report max relative error
pass
# Test with a known function
def f(x):
return x[0]**2 * np.sin(x[1]) + np.exp(x[0] * x[1])
def grad_f(x):
return np.array([
2 * x[0] * np.sin(x[1]) + x[1] * np.exp(x[0] * x[1]),
x[0]**2 * np.cos(x[1]) + x[0] * np.exp(x[0] * x[1])
])
x_test = np.array([1.5, 0.7])
max_err = check_gradient(f, grad_f, x_test)
print(f"Max relative error: {max_err:.2e}") # Should be < 1e-8
import numpy as np
def check_gradient(f, grad_f, x, h=1e-7, tol=1e-5):
n = len(x)
analytic = grad_f(x)
numeric = np.zeros(n)
for i in range(n):
x_plus = x.copy()
x_minus = x.copy()
x_plus[i] += h
x_minus[i] -= h
numeric[i] = (f(x_plus) - f(x_minus)) / (2 * h)
errors = np.abs(analytic - numeric)
# Relative error: use max(|analytic|, |numeric|, 1) to avoid division by zero
scale = np.maximum(np.abs(analytic), np.maximum(np.abs(numeric), 1.0))
rel_errors = errors / scale
max_rel_err = np.max(rel_errors)
for i in range(n):
status = "OK" if rel_errors[i] < tol else "FAIL"
print(f" x[{i}]: analytic={analytic[i]:.8f}, "
f"numeric={numeric[i]:.8f}, "
f"rel_err={rel_errors[i]:.2e} [{status}]")
return max_rel_err
# Test
def f(x):
return x[0]**2 * np.sin(x[1]) + np.exp(x[0] * x[1])
def grad_f(x):
return np.array([
2 * x[0] * np.sin(x[1]) + x[1] * np.exp(x[0] * x[1]),
x[0]**2 * np.cos(x[1]) + x[0] * np.exp(x[0] * x[1])
])
x_test = np.array([1.5, 0.7])
max_err = check_gradient(f, grad_f, x_test)
print(f"\nMax relative error: {max_err:.2e}")
# Now introduce a deliberate bug:
def grad_f_buggy(x):
return np.array([
2 * x[0] * np.sin(x[1]) + x[1] * np.exp(x[0] * x[1]),
x[0]**2 * np.sin(x[1]) + x[0] * np.exp(x[0] * x[1]) # BUG: sin not cos
])
print("\nWith buggy gradient:")
max_err = check_gradient(f, grad_f_buggy, x_test)
print(f"Max relative error: {max_err:.2e}") # Will show FAIL for x[1]
B2. Demonstrate the effect of condition number on solver accuracy.
import numpy as np
def solve_and_check(kappa_target):
"""
Create a system Ax = b with known condition number,
solve it, and measure the actual error.
"""
n = 10
# Create matrix with specific condition number
U, _ = np.linalg.qr(np.random.randn(n, n))
# Singular values linearly spaced from 1 to kappa
S = np.linspace(1, kappa_target, n)
A = U @ np.diag(S) @ U.T
# Known solution
x_true = np.ones(n)
b = A @ x_true
# Solve
x_computed = np.linalg.solve(A, b)
# Measure error
rel_error = np.linalg.norm(x_computed - x_true) / np.linalg.norm(x_true)
actual_kappa = np.linalg.cond(A)
return rel_error, actual_kappa
# TODO: Run for kappa = 1, 1e3, 1e6, 1e9, 1e12, 1e15
# Plot relative error vs condition number
# At what kappa does the solution become unreliable?
import numpy as np
def solve_and_check(kappa_target, n=10):
np.random.seed(42)
U, _ = np.linalg.qr(np.random.randn(n, n))
S = np.logspace(0, np.log10(kappa_target), n)
A = U @ np.diag(S) @ U.T
x_true = np.ones(n)
b = A @ x_true
x_computed = np.linalg.solve(A, b)
rel_error = np.linalg.norm(x_computed - x_true) / np.linalg.norm(x_true)
actual_kappa = np.linalg.cond(A)
return rel_error, actual_kappa
print(f"{'κ target':>12s} {'κ actual':>12s} {'Relative Error':>15s} {'Digits Lost':>12s}")
print("-" * 55)
for k in [1, 1e3, 1e6, 1e9, 1e12, 1e15]:
err, kappa = solve_and_check(k)
digits_lost = max(0, np.log10(kappa))
print(f"{k:12.0e} {kappa:12.2e} {err:15.2e} {digits_lost:12.1f}")
# Expected pattern:
# κ=1: error ~1e-16 (machine precision)
# κ=1e3: error ~1e-13 (lost 3 digits)
# κ=1e6: error ~1e-10 (lost 6 digits)
# κ=1e12: error ~1e-4 (lost 12 digits — barely usable)
# κ=1e15: error ~1e-1 (lost 15 digits — garbage)
Rule of thumb: you lose $\log_{10}(\kappa)$ digits of accuracy. With 16 digits (double precision), $\kappa > 10^{12}$ means fewer than 4 reliable digits.
C1. You're debugging the OKS navigation estimator and find that the covariance matrix $P$ has grown to have a condition number of $10^{10}$. This causes the Kalman gain computation $K = PH^T(HPH^T + R)^{-1}$ to produce inaccurate results.
C2. A Ceres optimization for calibrating a sensor model converges to a solution, but the residuals have a clear pattern (not random). Your colleague says "the solver found the minimum, so we're done." What's wrong?