Phase II — Unconstrained Optimization | Week 4 | 2.5 hours Mastering the Swiss Army knife of scientific optimization in Python.
scipy.optimize is the default Python solver for quick prototyping and offline analysis at Rapyuta. When analyzing OKS logs, you'll minimize residuals with least_squares, calibrate sensor models with minimize, and fit distributions with curve_fit. For production, Ceres/OSQP are preferred — but scipy is where you prototype, validate, and debug. Knowing every method and when to use each saves hours of trial and error.
minimize Interfacescipy.optimize.minimize(fun, x0, method=..., jac=..., hess=...,
bounds=..., constraints=..., callback=..., options=...)
Returns OptimizeResult with .x, .fun, .success, .nfev, .nit, .jac, .hess_inv.
| Method | Order | Gradient? | Hessian? | Bounds? | Best for |
|---|---|---|---|---|---|
Nelder-Mead |
0 | No | No | No | Small $n$, noisy, no derivatives |
Powell |
0 | No | No | No | Moderate $n$, no derivatives |
CG |
1 | Yes | No | No | Large $n$, smooth |
BFGS |
quasi-2 | Yes | No | No | General purpose, $n < 1000$ |
L-BFGS-B |
quasi-2 | Yes | No | Yes | Large $n$ with bounds |
TNC |
quasi-2 | Yes | No | Yes | Large $n$ with bounds |
Newton-CG |
2 | Yes | Yes | No | Accurate Hessian available |
trust-constr |
2 | Yes | Optional | Yes | Constraints + bounds |
trust-ncg |
2 | Yes | Yes | No | Sparse Hessian, positive definite |
trust-krylov |
2 | Yes | Yes | No | Very large $n$ |
dogleg |
2 | Yes | Yes | No | Small $n$, positive definite Hessian |
Decision tree:
1. Can you compute gradients? → No: Nelder-Mead (small $n$) or Powell
2. Can you compute Hessians? → No: BFGS or L-BFGS-B
3. Need bounds? → L-BFGS-B or TNC
4. Need constraints? → trust-constr or SLSQP
5. Very large $n$ ($>10^4$)? → L-BFGS-B (memory-limited)
6. Least-squares structure? → Use least_squares instead
Three options:
1. Analytic Jacobian: jac=my_gradient_function — fastest, most accurate
2. Finite differences: jac='2-point' or '3-point' — automatic, $O(n)$ extra evaluations
3. Automatic differentiation: Use JAX's jax.grad then pass to scipy
Hessian options: hess=callable, hessp=callable (Hessian-vector product), or BFGS approximation.
def callback(xk):
"""Called at each iteration with current x."""
print(f" x = {xk}, f(x) = {fun(xk)}")
Useful for: logging convergence, implementing early stopping, saving intermediate results.
scipy.optimize.least_squares(fun, x0, method='trf', jac=..., bounds=...)
Solves: $\min \sum r_i(x)^2$ — exploits the least-squares structure (Gauss-Newton internally).
Methods:
- 'trf' (Trust Region Reflective) — handles bounds, default
- 'dogbox' — handles rectangular bounds efficiently
- 'lm' — Levenberg-Marquardt, no bounds but fastest
Week 5 goes deep on this. Today: just know when to use least_squares vs minimize.
Nelder-Mead for $n > 20$ — scales terriblysuccess flag — always check result.successL-BFGS-B without bounds when needed# See: code/week04/scipy_optimize_deep_dive.py
Run all applicable scipy methods on the Rosenbrock function. Compare iterations, function evaluations, and time.
Implement a callback that logs $f(x_k)$, $\|x_k - x^*\|$, and $\|\nabla f(x_k)\|$ at each iteration. Plot convergence curves for BFGS vs CG vs Newton-CG.
Solve a problem where variables have different scales ($x_1 \in [0, 1]$, $x_2 \in [0, 10000]$). Show that without scaling, BFGS converges slowly. With proper scaling, convergence is fast.
Compare jac=None (finite differences), jac='3-point', and jac=analytic for Rosenbrock. Measure total function evaluations and wall time.
Wrap your Week 3 BFGS implementation in a scipy-compatible interface. Compare with scipy's built-in BFGS on the same problem.
Practical: estimate wheel radius from encoder ticks and measured distances. Use least_squares for best results. Compare with minimize on the same problem.
BFGS and L-BFGS-B? When would you choose each?Nelder-Mead is derivative-free. Why is it still useful despite being slow?OptimizeResult.nfev tell you that .nit doesn't?result.success == False, what should you do? (List 3 concrete steps.)least_squares instead of minimize? What's the advantage?trust-constr handle both equality and inequality constraints?m parameter (history size) affect it?minimize default to BFGS when gradients are provided?least_squares is better than minimizeL-BFGS-B is the default workhorse — bounded, memory-efficient, gradient-based.result.success — silent failures are the most dangerous.least_squares exploits problem structure — use it when you have residuals.