Self-assessment guide: These exercises cover the full landscape of matrix theory needed for optimization and robotics — from hand computation to Python verification to real-world scenarios. Complete them in order; later exercises build on earlier ones.
Difficulty key: ⭐ Foundational | ⭐⭐ Intermediate | ⭐⭐⭐ Advanced | ⭐⭐⭐⭐ Challenge
A1. ⭐ Classify each matrix as symmetric / skew-symmetric / orthogonal / none. State which spectral classes (PD, PSD, indefinite, etc.) are even possible for that structural type.
$$A = \begin{pmatrix} 3 & 1 \\ 1 & 4 \end{pmatrix}, \quad B = \begin{pmatrix} 0 & -2 \\ 2 & 0 \end{pmatrix}, \quad C = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$$
A2. ⭐ Given the block matrix:
$$M = \begin{pmatrix} 2I_3 & J \\ J^T & 5I_2 \end{pmatrix}$$
where $J \in \mathbb{R}^{3 \times 2}$. Is $M$ symmetric? What conditions on $J$ make $M$ positive definite?
A3. ⭐⭐ A matrix $A \in \mathbb{R}^{5 \times 5}$ has eigenvalues $\{8, 4, 2, 0.5, 0.001\}$.
B1. ⭐ Apply Sylvester's criterion to determine if each matrix is positive definite:
$$P = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}, \qquad Q = \begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}$$
B2. ⭐⭐ Prove: if $A \succ 0$ and $B \succ 0$, then $A + B \succ 0$.
B3. ⭐⭐ A noisy sensor gives you this estimated covariance matrix:
$$\hat{\Sigma} = \begin{pmatrix} 4.0 & 1.0 & 3.1 \\ 1.0 & 2.0 & -0.5 \\ 3.1 & -0.5 & 1.0 \end{pmatrix}$$
import numpy as np
Sigma_hat = np.array([[4.0, 1.0, 3.1], [1.0, 2.0, -0.5], [3.1, -0.5, 1.0]])
eigvals, Q = np.linalg.eigh(Sigma_hat)
eigvals_clipped = np.maximum(eigvals, 0.01)
Sigma_pd = Q @ np.diag(eigvals_clipped) @ Q.T
print(np.linalg.cholesky(Sigma_pd)) # Now succeeds
**3. Why PD matters:** A covariance matrix that isn't PD implies some direction has zero or negative variance, which is physically meaningless. The Mahalanobis distance $d^2 = (x-\mu)^T \Sigma^{-1}(x-\mu)$ requires $\Sigma \succ 0$. Non-PD covariance will cause Kalman filter updates, optimization weights, and uncertainty ellipsoid calculations to break.
B4. ⭐⭐⭐ The Schur complement test says $M = \begin{pmatrix} A & B \\ B^T & C \end{pmatrix} \succ 0$ iff $A \succ 0$ and $C - B^T A^{-1} B \succ 0$.
In bundle adjustment, the Hessian is: $$H = \begin{pmatrix} H_{cc} & H_{cl} \\ H_{lc} & H_{ll} \end{pmatrix}$$
where $H_{cc}$ is the camera-camera block ($6c \times 6c$) and $H_{ll}$ is the landmark-landmark block ($3l \times 3l$), with $l \gg c$.
B5. ⭐ Check diagonal dominance for each matrix. Is it strictly diagonally dominant? Is it PD? Comment on the relationship.
$$D_1 = \begin{pmatrix} 5 & 1 & 1 \\ 1 & 4 & 1 \\ 1 & 1 & 6 \end{pmatrix}, \qquad D_2 = \begin{pmatrix} 2 & -1 & 0 \\ -1 & 2 & -1 \\ 0 & -1 & 2 \end{pmatrix}, \qquad D_3 = \begin{pmatrix} 1 & 2 \\ 2 & 1 \end{pmatrix}$$
C1. ⭐ Compute the full SVD of $A = \begin{pmatrix} 3 & 0 \\ 0 & -2 \end{pmatrix}$ by hand.
C2. ⭐⭐ Compute the SVD of $A = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}$ by hand. What is the rank? Identify the four fundamental subspaces (column space, null space, row space, left null space) from the SVD.
import numpy as np
A = np.array([[1, 1], [0, 0]])
U, s, Vt = np.linalg.svd(A)
print(f"σ = {s}") # [1.414..., 0.]
print(f"U =\n{U}") # [[1, 0], [0, 1]] (or sign-flipped)
print(f"V^T =\n{Vt}") # [[0.707, 0.707], [0.707, -0.707]]
print(f"Rank = {np.sum(s > 1e-10)}") # 1
# Verify: null space is V[:, 1:]
print(f"Null space basis: {Vt[1, :]}") # [0.707, -0.707] → span{(1,-1)}
C3. ⭐⭐ A $100 \times 50$ matrix $A$ has singular values that decay as $\sigma_i = 10 / i^2$.
C4. ⭐⭐⭐ Using the pseudoinverse, find the minimum-norm least-squares solution to:
$$\begin{pmatrix} 1 & 2 \\ 1 & 2 \\ 0 & 0 \end{pmatrix} x = \begin{pmatrix} 3 \\ 3 \\ 1 \end{pmatrix}$$
Explain why there's no exact solution and why the pseudoinverse gives the "best" approximate answer.
D1. ⭐ Compute $\|A\|_1$, $\|A\|_\infty$, $\|A\|_F$, and $\|A\|_2$ for:
$$A = \begin{pmatrix} 3 & -1 \\ 2 & 4 \end{pmatrix}$$
D2. ⭐⭐ Prove that $\|A\|_2 \leq \|A\|_F \leq \sqrt{r} \|A\|_2$ where $r = \text{rank}(A)$.
E1. ⭐⭐ You're solving a SLAM problem and the Jacobian $J$ has singular values:
$$\sigma = [100, 50, 20, 5, 0.01, 0.0001]$$
E2. ⭐⭐⭐ Catastrophic cancellation in action. Consider:
$$A = \begin{pmatrix} 1 & 1 \\ 1 & 1 + \epsilon \end{pmatrix}, \quad b = \begin{pmatrix} 2 \\ 2 + \epsilon \end{pmatrix}$$
import numpy as np
eps = 1e-12
A = np.array([[1.0, 1.0], [1.0, 1.0 + eps]])
b = np.array([2.0, 2.0 + eps])
# Direct solve
x = np.linalg.solve(A, b)
print(f"Solution: {x}")
print(f"True solution: [1.0, 1.0]")
print(f"Error: {np.linalg.norm(x - np.array([1.0, 1.0]))}")
print(f"Condition number: {np.linalg.cond(A):.2e}")
You'll see errors on the order of $10^{-4}$ to $10^{-2}$ — the solution is wildly inaccurate despite the true answer being the simple vector $(1, 1)^T$.
E3. ⭐⭐⭐ The L-curve in practice. For the system $Ax = b$ where:
$$A = \begin{pmatrix} 1 & 0 \\ 0 & 10^{-8} \end{pmatrix}, \quad b = \begin{pmatrix} 1 + \delta_1 \\ 10^{-8} + \delta_2 \end{pmatrix}$$
with noise $\delta_1, \delta_2 \sim \mathcal{N}(0, 10^{-4})$. True solution: $x = (1, 1)^T$.
Write Python code to: 1. Solve with Tikhonov regularization for $\lambda \in [10^{-16}, 10^{0}]$ 2. Plot the L-curve ($\log\|x_\lambda\|$ vs $\log\|Ax_\lambda - b\|$) 3. Identify the optimal $\lambda$ at the L-curve corner
import numpy as np
import matplotlib.pyplot as plt
A = np.diag([1.0, 1e-8])
x_true = np.array([1.0, 1.0])
np.random.seed(42)
noise = np.random.normal(0, 1e-4, 2)
b = A @ x_true + noise
lambdas = np.logspace(-16, 0, 200)
residuals = []
sol_norms = []
for lam in lambdas:
x_lam = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)
residuals.append(np.linalg.norm(A @ x_lam - b))
sol_norms.append(np.linalg.norm(x_lam))
plt.figure(figsize=(8, 6))
plt.loglog(residuals, sol_norms, 'b-', linewidth=1.5)
plt.xlabel(r'$\|Ax_\lambda - b\|_2$')
plt.ylabel(r'$\|x_\lambda\|_2$')
plt.title('L-Curve for Tikhonov Regularization')
# Mark the corner (maximum curvature)
log_r = np.log(residuals)
log_n = np.log(sol_norms)
# Approximate curvature from discrete differences
dr = np.gradient(log_r)
dn = np.gradient(log_n)
ddr = np.gradient(dr)
ddn = np.gradient(dn)
curvature = np.abs(dr * ddn - dn * ddr) / (dr**2 + dn**2)**1.5
corner_idx = np.argmax(curvature[10:-10]) + 10 # avoid edges
plt.plot(residuals[corner_idx], sol_norms[corner_idx], 'ro', markersize=10)
plt.annotate(f'λ = {lambdas[corner_idx]:.2e}',
xy=(residuals[corner_idx], sol_norms[corner_idx]),
xytext=(10, 10), textcoords='offset points')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('l_curve.png', dpi=150)
plt.show()
# Compare solutions
for lam in [0, 1e-12, lambdas[corner_idx], 1e-2]:
if lam == 0:
x = np.linalg.solve(A.T @ A, A.T @ b)
else:
x = np.linalg.solve(A.T @ A + lam * np.eye(2), A.T @ b)
print(f"λ={lam:.1e}: x={x}, error={np.linalg.norm(x - x_true):.4e}")
The L-curve should show:
- **Top-left:** Large $\lambda$ → small solution norm but large residual (over-regularized)
- **Bottom-right:** Small $\lambda$ → small residual but large solution norm (noise amplification)
- **Corner:** The balance point — optimal regularization
F1. ⭐ Verify that $R(\pi/4) = \frac{1}{\sqrt{2}}\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$ satisfies all rotation matrix properties: $R^TR = I$, $\det(R) = +1$, $\kappa(R) = 1$.
F2. ⭐⭐ Using Rodrigues' rotation formula, compute the 3D rotation matrix for axis $\hat{\omega} = (0, 0, 1)^T$ (z-axis) and angle $\theta = \pi/3$ (60°). Then verify: 1. $R^TR = I_3$ 2. $\det(R) = +1$ 3. The rotation acts correctly on the vector $p = (1, 0, 0)^T$
import numpy as np
theta = np.pi / 3
omega_hat = np.array([0, 0, 1.0])
K = np.array([[0, -1, 0], [1, 0, 0], [0, 0, 0.0]]) # skew-symmetric
R = np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * K @ K
print(f"R^T R = I? {np.allclose(R.T @ R, np.eye(3))}") # True
print(f"det(R) = {np.linalg.det(R):.6f}") # 1.000000
print(f"R @ [1,0,0] = {R @ [1,0,0]}") # [0.5, 0.866, 0]
F3. ⭐⭐ A 2D pose-graph SLAM problem has 4 poses in a loop. The information matrix (graph Laplacian) is:
$$\Lambda = \begin{pmatrix} 2 & -1 & 0 & -1 \\ -1 & 2 & -1 & 0 \\ 0 & -1 & 2 & -1 \\ -1 & 0 & -1 & 2 \end{pmatrix}$$
F4. ⭐⭐⭐ The Jacobian of a 2D robot observing a landmark at relative position $(r, \theta)$ is:
$$J = \begin{pmatrix} \frac{\partial r}{\partial x} & \frac{\partial r}{\partial y} & \frac{\partial r}{\partial \phi} \\ \frac{\partial \theta}{\partial x} & \frac{\partial \theta}{\partial y} & \frac{\partial \theta}{\partial \phi} \end{pmatrix} = \begin{pmatrix} \cos\alpha & \sin\alpha & 0 \\ -\sin\alpha / d & \cos\alpha / d & -1 \end{pmatrix}$$
where $d$ is the distance and $\alpha$ is the bearing.
G1. ⭐⭐ Write a function classify_matrix(A, tol=1e-10) that returns a dictionary with keys: "symmetric", "pd", "psd", "indefinite", "orthogonal", "singular", "condition_number", "rank".
import numpy as np
def classify_matrix(A, tol=1e-10):
"""Classify a matrix's properties."""
n = A.shape[0]
result = {}
# Symmetry
result["symmetric"] = np.allclose(A, A.T, atol=tol)
# Orthogonality
if A.shape[0] == A.shape[1]:
result["orthogonal"] = np.allclose(A.T @ A, np.eye(n), atol=tol)
else:
result["orthogonal"] = False
# SVD for rank and condition number
U, s, Vt = np.linalg.svd(A)
result["rank"] = int(np.sum(s > tol * s[0] * max(A.shape)))
result["singular"] = (result["rank"] < min(A.shape))
result["condition_number"] = s[0] / s[-1] if s[-1] > tol else np.inf
# Definiteness (only for symmetric)
if result["symmetric"]:
eigvals = np.linalg.eigvalsh(A)
if np.all(eigvals > tol):
result["pd"] = True
result["psd"] = True
result["indefinite"] = False
elif np.all(eigvals > -tol):
result["pd"] = False
result["psd"] = True
result["indefinite"] = False
else:
result["pd"] = False
result["psd"] = False
result["indefinite"] = True
else:
result["pd"] = None
result["psd"] = None
result["indefinite"] = None
return result
# Test
A = np.array([[4, 2, 1], [2, 5, 3], [1, 3, 6]])
print(classify_matrix(A))
# {'symmetric': True, 'orthogonal': False, 'rank': 3, 'singular': False,
# 'condition_number': ~8.8, 'pd': True, 'psd': True, 'indefinite': False}
G2. ⭐⭐ Implement nearest_pd(A, epsilon=1e-6) that takes any symmetric matrix and returns the nearest PD matrix using Higham's algorithm. Verify with Cholesky.
import numpy as np
def nearest_pd(A, epsilon=1e-6):
"""Find nearest positive definite matrix to A (Higham 1988)."""
# Ensure symmetry
A_sym = (A + A.T) / 2
# Eigendecomposition
eigvals, Q = np.linalg.eigh(A_sym)
# Clip eigenvalues
eigvals_clipped = np.maximum(eigvals, epsilon)
# Reconstruct
A_pd = Q @ np.diag(eigvals_clipped) @ Q.T
# Ensure perfect symmetry (numerical safety)
A_pd = (A_pd + A_pd.T) / 2
return A_pd
# Test with the noisy covariance from B3
Sigma_hat = np.array([[4.0, 1.0, 3.1],
[1.0, 2.0, -0.5],
[3.1, -0.5, 1.0]])
# Original: Cholesky fails
try:
np.linalg.cholesky(Sigma_hat)
print("Original is PD")
except np.linalg.LinAlgError:
print("Original is NOT PD")
# Fixed: Cholesky succeeds
Sigma_fixed = nearest_pd(Sigma_hat)
L = np.linalg.cholesky(Sigma_fixed)
print(f"Fixed matrix:\n{Sigma_fixed}")
print(f"Eigenvalues: {np.linalg.eigvalsh(Sigma_fixed)}")
print(f"Cholesky succeeded, L diag: {np.diag(L)}")
G3. ⭐⭐⭐ Implement tikhonov_solve(A, b, lam) and truncated_svd_solve(A, b, k). Compare their behavior on an ill-conditioned system.
import numpy as np
import matplotlib.pyplot as plt
def tikhonov_solve(A, b, lam):
"""Tikhonov-regularized least-squares solution."""
n = A.shape[1]
return np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)
def truncated_svd_solve(A, b, k):
"""Truncated SVD solution keeping top k singular values."""
U, s, Vt = np.linalg.svd(A, full_matrices=False)
# Keep only top k
x = np.zeros(A.shape[1])
for i in range(k):
x += (U[:, i] @ b / s[i]) * Vt[i, :]
return x
# Create ill-conditioned problem
n = 20
np.random.seed(42)
# Singular values with sharp decay
s_true = np.logspace(0, -10, n)
U, _ = np.linalg.qr(np.random.randn(n, n))
V, _ = np.linalg.qr(np.random.randn(n, n))
A = U @ np.diag(s_true) @ V.T
x_true = np.ones(n)
b_clean = A @ x_true
noise = 1e-6 * np.random.randn(n)
b = b_clean + noise
print(f"Condition number: {np.linalg.cond(A):.2e}")
# Compare methods
x_direct = np.linalg.solve(A, b)
x_tikh = tikhonov_solve(A, b, lam=1e-8)
x_tsvd = truncated_svd_solve(A, b, k=10)
print(f"Direct solve error: {np.linalg.norm(x_direct - x_true):.4e}")
print(f"Tikhonov (λ=1e-8) error: {np.linalg.norm(x_tikh - x_true):.4e}")
print(f"Truncated SVD (k=10) error: {np.linalg.norm(x_tsvd - x_true):.4e}")
# Sweep lambda for Tikhonov
lambdas = np.logspace(-14, 2, 100)
errors = [np.linalg.norm(tikhonov_solve(A, b, lam) - x_true) for lam in lambdas]
plt.figure(figsize=(8, 5))
plt.semilogx(lambdas, errors, 'b-')
plt.xlabel('λ (regularization)')
plt.ylabel('‖x_λ - x_true‖')
plt.title('Tikhonov Regularization: Error vs λ')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.savefig('tikhonov_error.png', dpi=150)
plt.show()
**Expected output:** Direct solve has large error due to noise amplification in small singular value directions. Both Tikhonov and truncated SVD dramatically reduce error by damping or removing those directions.
G4. ⭐⭐⭐ Write a function svd_analysis(A) that prints a comprehensive report: rank, condition number, energy distribution, 4 fundamental subspaces, and a singular value bar chart.
import numpy as np
import matplotlib.pyplot as plt
def svd_analysis(A, tol=None):
"""Comprehensive SVD analysis of a matrix."""
m, n = A.shape
U, s, Vt = np.linalg.svd(A, full_matrices=True)
if tol is None:
tol = max(m, n) * s[0] * np.finfo(float).eps
r = np.sum(s > tol)
energy = np.cumsum(s**2) / np.sum(s**2)
print("=" * 60)
print(f"SVD Analysis: {m}×{n} matrix")
print("=" * 60)
print(f"\nRank: {r} (out of min(m,n) = {min(m,n)})")
print(f"Condition number: {s[0]/s[min(m,n)-1]:.4e}" if s[min(m,n)-1] > 0
else f"Condition number: ∞ (singular)")
print(f"\nSingular values:")
for i, si in enumerate(s[:min(10, len(s))]):
bar = "█" * max(1, int(40 * si / s[0]))
print(f" σ_{i+1:2d} = {si:12.6e} {bar} ({energy[i]*100:.1f}% energy)")
if len(s) > 10:
print(f" ... ({len(s) - 10} more)")
print(f"\nFour Fundamental Subspaces:")
print(f" Column space C(A): dim = {r}, basis = first {r} columns of U")
print(f" Row space C(Aᵀ): dim = {r}, basis = first {r} rows of Vᵀ")
print(f" Null space N(A): dim = {n - r}", end="")
if n - r > 0:
print(f", basis = last {n-r} rows of Vᵀ")
else:
print(" (trivial)")
print(f" Left null N(Aᵀ): dim = {m - r}", end="")
if m - r > 0:
print(f", basis = last {m-r} columns of U")
else:
print(" (trivial)")
# Plot singular values
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
ax1.bar(range(1, len(s)+1), s, color='steelblue')
ax1.set_xlabel('Index')
ax1.set_ylabel('Singular Value')
ax1.set_title('Singular Value Spectrum')
ax1.set_yscale('log')
ax2.plot(range(1, len(s)+1), energy * 100, 'ro-')
ax2.axhline(y=99, color='gray', linestyle='--', alpha=0.5, label='99% threshold')
ax2.set_xlabel('Number of Components')
ax2.set_ylabel('Cumulative Energy (%)')
ax2.set_title('Energy Distribution')
ax2.legend()
plt.tight_layout()
plt.savefig('svd_analysis.png', dpi=150)
plt.show()
return {"U": U, "s": s, "Vt": Vt, "rank": r, "condition": s[0]/max(s[-1], 1e-300)}
# Example: a SLAM-like Jacobian
np.random.seed(42)
# 10 measurements, 6 state variables, rank 5 (one unobservable direction)
A = np.random.randn(10, 5) @ np.random.randn(5, 6) # rank ≤ 5
result = svd_analysis(A)
G5. ⭐⭐ Derive the gradient of the least-squares cost $f(x) = \|Ax - b\|^2$ from first principles.
Hint: Expand the squared norm as $(Ax-b)^T(Ax-b)$, distribute, then differentiate each term with respect to $x$.
import numpy as np
A = np.random.randn(5, 3); b = np.random.randn(5)
x = np.random.randn(3)
# Analytical gradient
grad_analytical = 2 * A.T @ (A @ x - b)
# Numerical gradient (finite differences)
eps = 1e-7
grad_numerical = np.zeros(3)
for i in range(3):
x_plus = x.copy(); x_plus[i] += eps
x_minus = x.copy(); x_minus[i] -= eps
grad_numerical[i] = (np.linalg.norm(A @ x_plus - b)**2
- np.linalg.norm(A @ x_minus - b)**2) / (2*eps)
print(f"Match: {np.allclose(grad_analytical, grad_numerical)}") # True
G6. ⭐⭐ In Gauss-Newton optimization, the cost is $f(x) = \frac{1}{2}\|r(x)\|^2$ where $r(x)$ is a vector of residuals. Using the chain rule:
H1. ⭐⭐⭐ Full pipeline exercise. A SLAM system estimates 3 poses ($x, y, \theta$ each) from 4 relative measurements. The Jacobian is:
$$J = \begin{pmatrix} 1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & -1 \\ -1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & -1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$
(4 measurements: pose1→pose2 (3D), pose2→pose3 (3D), and a partial loop closure pose3→pose1 (only x, y).)
import numpy as np
# Construct J
J = np.zeros((8, 9))
# Measurement 1→2 (rows 0-2)
J[0, 0], J[0, 3] = 1, -1
J[1, 1], J[1, 4] = 1, -1
J[2, 2], J[2, 5] = 1, -1
# Measurement 2→3 (rows 3-5)
J[3, 3], J[3, 6] = 1, -1
J[4, 4], J[4, 7] = 1, -1
J[5, 5], J[5, 8] = 1, -1
# Loop closure 3→1 (rows 6-7, only x,y)
J[6, 6], J[6, 0] = 1, -1
J[7, 7], J[7, 1] = 1, -1
# 1. J^T J
H = J.T @ J
print("J^T J (information matrix):")
print(H.astype(int))
# Verify: each row sums to zero? Not exactly — it's a weighted Laplacian
# because pose 3's theta has no loop constraint
print(f"Row sums: {H.sum(axis=1)}") # theta_3 row sum != 0
# 2. Null space
U, s, Vt = np.linalg.svd(J)
print(f"\nSingular values: {np.round(s, 4)}")
null_dim = np.sum(s < 1e-10)
print(f"Null space dimension: {null_dim}")
if null_dim > 0:
null_vectors = Vt[-null_dim:]
print(f"Null space basis:\n{np.round(null_vectors, 3)}")
# Physical interpretation: the null space vectors correspond to
# global transformations that don't change any measurement.
# For full 3-constraint connections: translate + rotate the whole graph.
# With partial loop closure: only translation in x,y is unobservable.
# 3. Pin pose 1 at origin
# Remove first 3 columns (x1, y1, theta1 = 0)
J_reduced = J[:, 3:] # 8x6
b = np.array([0.5, 0.3, 0.1, -0.2, 0.4, 0.05, 0.3, 0.1])
x_reduced = np.linalg.lstsq(J_reduced, b, rcond=None)[0]
x_full = np.zeros(9)
x_full[3:] = x_reduced
print(f"\nSolution with pose 1 pinned:")
print(f"Pose 1: x={x_full[0]:.3f}, y={x_full[1]:.3f}, θ={x_full[2]:.3f}")
print(f"Pose 2: x={x_full[3]:.3f}, y={x_full[4]:.3f}, θ={x_full[5]:.3f}")
print(f"Pose 3: x={x_full[6]:.3f}, y={x_full[7]:.3f}, θ={x_full[8]:.3f}")
**Physical interpretation of null space:** Since the loop closure only constrains $x$ and $y$ (not $\theta$), the null space has dimension ≥ 1 containing a mode where all $\theta$ values shift by the same amount. The global translation is also unobservable (3 DOF). With the partial loop closure, some of this gauge freedom is broken.
H2. ⭐⭐⭐⭐ Matrix conditioning in Levenberg-Marquardt. Implement a simplified LM solver that:
Test on a simple 2D curve fitting: $y = a \cdot e^{-bx} + c$ with noisy data.
import numpy as np
import matplotlib.pyplot as plt
def levenberg_marquardt(residual_fn, jacobian_fn, x0, max_iter=50,
lam0=1e-3, tol=1e-8):
"""Simplified LM solver with conditioning diagnostics."""
x = x0.copy()
lam = lam0
n = len(x)
history = {"cost": [], "lambda": [], "kappa_H": [], "kappa_damped": [], "x": []}
for iteration in range(max_iter):
r = residual_fn(x)
J = jacobian_fn(x)
cost = 0.5 * r @ r
# Gauss-Newton Hessian
H = J.T @ J
g = J.T @ r
# Condition numbers
kappa_H = np.linalg.cond(H)
kappa_damped = np.linalg.cond(H + lam * np.eye(n))
history["cost"].append(cost)
history["lambda"].append(lam)
history["kappa_H"].append(kappa_H)
history["kappa_damped"].append(kappa_damped)
history["x"].append(x.copy())
# Solve damped system
dx = np.linalg.solve(H + lam * np.eye(n), -g)
# Evaluate new cost
x_new = x + dx
cost_new = 0.5 * residual_fn(x_new) @ residual_fn(x_new)
# Accept/reject and update lambda
rho = (cost - cost_new) / (0.5 * dx @ (lam * dx - g) + 1e-30)
if rho > 0:
x = x_new
lam = max(lam / 3, 1e-15)
if np.linalg.norm(dx) < tol:
print(f"Converged at iteration {iteration}")
break
else:
lam = min(lam * 2, 1e10)
if iteration % 5 == 0:
print(f"Iter {iteration:3d}: cost={cost:.6e}, λ={lam:.2e}, "
f"κ(H)={kappa_H:.2e}, κ(H+λI)={kappa_damped:.2e}")
return x, history
# Test: fit y = a * exp(-b*x) + c
np.random.seed(42)
x_data = np.linspace(0, 5, 50)
a_true, b_true, c_true = 3.0, 1.5, 0.5
y_data = a_true * np.exp(-b_true * x_data) + c_true + 0.1 * np.random.randn(50)
def residual(params):
a, b, c = params
return a * np.exp(-b * x_data) + c - y_data
def jacobian(params):
a, b, c = params
J = np.zeros((len(x_data), 3))
J[:, 0] = np.exp(-b * x_data) # dr/da
J[:, 1] = -a * x_data * np.exp(-b * x_data) # dr/db
J[:, 2] = np.ones(len(x_data)) # dr/dc
return J
# Start from poor initial guess
x0 = np.array([1.0, 0.5, 0.0])
x_opt, hist = levenberg_marquardt(residual, jacobian, x0)
print(f"\nOptimized: a={x_opt[0]:.4f}, b={x_opt[1]:.4f}, c={x_opt[2]:.4f}")
print(f"True: a={a_true}, b={b_true}, c={c_true}")
# Plot diagnostics
fig, axes = plt.subplots(2, 2, figsize=(12, 8))
axes[0, 0].semilogy(hist["cost"])
axes[0, 0].set_title("Cost")
axes[0, 0].set_xlabel("Iteration")
axes[0, 1].semilogy(hist["lambda"])
axes[0, 1].set_title("λ (damping)")
axes[0, 1].set_xlabel("Iteration")
axes[1, 0].semilogy(hist["kappa_H"], label="κ(H)")
axes[1, 0].semilogy(hist["kappa_damped"], label="κ(H+λI)")
axes[1, 0].set_title("Condition Number")
axes[1, 0].set_xlabel("Iteration")
axes[1, 0].legend()
axes[1, 1].scatter(x_data, y_data, s=10, alpha=0.5, label="Data")
axes[1, 1].plot(x_data, x_opt[0]*np.exp(-x_opt[1]*x_data) + x_opt[2],
'r-', linewidth=2, label="Fit")
axes[1, 1].set_title("Curve Fit Result")
axes[1, 1].legend()
plt.tight_layout()
plt.savefig('lm_diagnostics.png', dpi=150)
plt.show()
**Key observations:**
- Early iterations: large $\lambda$ → well-conditioned (gradient descent-like)
- Late iterations: small $\lambda$ → near-Newton, higher $\kappa$ but faster convergence
- $\kappa(H + \lambda I)$ is always much smaller than $\kappa(H)$
H3. ⭐⭐⭐⭐ Comprehensive matrix diagnosis. Write a "matrix doctor" function that takes any matrix $A$ and outputs:
import numpy as np
from scipy import sparse
def matrix_doctor(A, verbose=True):
"""Comprehensive matrix diagnosis with solver recommendation."""
m, n = A.shape
is_square = (m == n)
report = {}
# --- Structure ---
report["shape"] = (m, n)
report["square"] = is_square
report["symmetric"] = is_square and np.allclose(A, A.T, atol=1e-12)
report["skew_symmetric"] = is_square and np.allclose(A, -A.T, atol=1e-12)
report["diagonal"] = is_square and np.allclose(A, np.diag(np.diag(A)), atol=1e-12)
report["upper_triangular"] = np.allclose(A, np.triu(A), atol=1e-12)
report["lower_triangular"] = np.allclose(A, np.tril(A), atol=1e-12)
nnz_total = m * n
nnz_actual = np.count_nonzero(np.abs(A) > 1e-15)
report["sparsity"] = 1 - nnz_actual / nnz_total
report["sparse"] = report["sparsity"] > 0.5
if is_square:
report["orthogonal"] = np.allclose(A.T @ A, np.eye(n), atol=1e-10)
# --- SVD ---
U, s, Vt = np.linalg.svd(A, full_matrices=False)
tol = max(m, n) * s[0] * np.finfo(float).eps
rank = int(np.sum(s > tol))
kappa = s[0] / s[rank-1] if rank > 0 else np.inf
report["rank"] = rank
report["full_rank"] = (rank == min(m, n))
report["condition_number"] = kappa
report["singular_values"] = s
energy = np.cumsum(s**2) / np.sum(s**2) if np.sum(s**2) > 0 else np.zeros_like(s)
k_99 = int(np.searchsorted(energy, 0.99) + 1)
report["rank_for_99pct_energy"] = min(k_99, len(s))
# --- Definiteness ---
if report["symmetric"]:
eigvals = np.linalg.eigvalsh(A)
report["eigenvalues"] = eigvals
if np.all(eigvals > tol):
report["definiteness"] = "positive definite"
elif np.all(eigvals > -tol):
report["definiteness"] = "positive semidefinite"
elif np.all(eigvals < -tol):
report["definiteness"] = "negative definite"
elif np.all(eigvals < tol):
report["definiteness"] = "negative semidefinite"
else:
report["definiteness"] = "indefinite"
# --- Warnings ---
warnings = []
if kappa > 1e12:
warnings.append(f"SEVERELY ILL-CONDITIONED (κ={kappa:.1e})")
elif kappa > 1e6:
warnings.append(f"ILL-CONDITIONED (κ={kappa:.1e})")
elif kappa > 1e3:
warnings.append(f"Moderately conditioned (κ={kappa:.1e})")
if not report["full_rank"]:
warnings.append(f"RANK DEFICIENT: rank={rank}, expected={min(m,n)}")
if report.get("definiteness") == "indefinite":
warnings.append("INDEFINITE: not a valid Hessian at a minimum")
report["warnings"] = warnings
# --- Solver Recommendation ---
if not is_square or m != n:
if rank < min(m, n):
solver = "SVD / pseudoinverse (rank-deficient)"
elif m > n:
solver = "QR factorization (overdetermined least-squares)"
else:
solver = "SVD (underdetermined, minimum-norm)"
elif report.get("definiteness") == "positive definite":
if report["sparse"] and n > 100:
solver = "Sparse Cholesky (CHOLMOD) with AMD ordering"
else:
solver = "Dense Cholesky (fastest for dense PD)"
elif report.get("definiteness") == "positive semidefinite":
solver = "LDLᵀ or regularized Cholesky (H + εI)"
elif report.get("definiteness") == "indefinite":
solver = "Modified Cholesky or LDLᵀ with pivoting"
elif report["symmetric"]:
solver = "LDLᵀ with pivoting"
else:
solver = "LU with partial pivoting"
report["recommended_solver"] = solver
# --- Print ---
if verbose:
print("=" * 60)
print("MATRIX DOCTOR REPORT")
print("=" * 60)
props = []
if report["symmetric"]: props.append("Symmetric")
if report["skew_symmetric"]: props.append("Skew-symmetric")
if report.get("orthogonal"): props.append("Orthogonal")
if report["diagonal"]: props.append("Diagonal")
if report["upper_triangular"]: props.append("Upper triangular")
if report["lower_triangular"]: props.append("Lower triangular")
if report["sparse"]: props.append(f"Sparse ({report['sparsity']*100:.0f}%)")
print(f"Shape: {m}×{n}")
print(f"Properties: {', '.join(props) if props else 'None special'}")
if "definiteness" in report:
print(f"Definiteness: {report['definiteness']}")
print(f"Rank: {rank}/{min(m,n)} | κ = {kappa:.2e}")
print(f"99% energy at rank {report['rank_for_99pct_energy']}")
if warnings:
print(f"\n⚠ WARNINGS:")
for w in warnings:
print(f" • {w}")
else:
print("\n✓ No issues detected")
print(f"\n→ Recommended solver: {solver}")
print("=" * 60)
return report
# Demo
A1 = np.array([[4, 2, 1], [2, 5, 3], [1, 3, 6]])
matrix_doctor(A1)
print()
# Ill-conditioned Jacobian
J = np.random.randn(10, 5) * np.array([100, 10, 1, 0.01, 0.0001])
matrix_doctor(J)
| Exercise | Key Result |
|---|---|
| A1 | $A$: symmetric, $B$: skew-symmetric, $C$: orthogonal |
| A2 | PD when $\sigma_{\max}(J) < \sqrt{10}$ |
| A3 | $\kappa = 8000$, QR preferred over normal equations |
| B1 | $P$ is PD, $Q$ is indefinite |
| B3 | $\hat{\Sigma}$ not PD, fix via eigenvalue clipping |
| B4 | Schur complement is $6c \times 6c$, vastly smaller than full system |
| C1 | $\sigma_1 = 3$, $\sigma_2 = 2$, sign absorbed into $U$ |
| C2 | Rank 1, null space = $\text{span}\{(1,-1)^T\}$ |
| C3 | $\kappa = 2500$, rank 3 for 99% energy |
| D1 | $\|A\|_1 = 5$, $\|A\|_\infty = 6$, $\|A\|_F \approx 5.48$, $\|A\|_2 \approx 4.52$ |
| E1 | $\kappa(J) = 10^6$, numerical rank = 4, null space = gauge freedom |
| E2 | True $x = (1,1)$, $\kappa \approx 4/\epsilon$, fails for $\epsilon = 10^{-12}$ |
| F1 | All rotation properties verified |
| F2 | Eigenvalues $\{0, 2, 2, 4\}$, zero = gauge freedom |
Next: Implement these exercises in Python → code/matrix_mastery/