Goal: Add constraints to the picture — equality and inequality constraints, Lagrange multipliers, KKT conditions, duality, and practical algorithms.
$$\min_{x \in \mathbb{R}^n} f(x)$$ $$\text{subject to} \quad g_i(x) \leq 0, \quad i = 1, \ldots, m$$ $$\phantom{\text{subject to}} \quad h_j(x) = 0, \quad j = 1, \ldots, p$$
Feasible set: $\mathcal{F} = \{x \mid g_i(x) \leq 0, \; h_j(x) = 0 \; \forall i, j\}$
| Problem | Objective | Constraints |
|---|---|---|
| Path planning | Minimize path length | Obstacle avoidance $g_i(x) \leq 0$, kinematics $h_j(x) = 0$ |
| MPC | Minimize tracking error | Velocity/acceleration limits, actuator bounds |
| Calibration | Minimize residual | Known geometric constraints (e.g., unit quaternion) |
| Robot arm IK | Minimize joint motion | Joint angle limits, collision avoidance |
$$\min_x f(x) \quad \text{s.t.} \quad h(x) = 0$$
At the constrained optimum $x^*$: - $\nabla f(x^*)$ must be parallel to $\nabla h(x^*)$ - Otherwise, you could move along the constraint surface and decrease $f$
So there exists a scalar $\lambda^*$ such that:
$$\nabla f(x^*) + \lambda^* \nabla h(x^*) = 0$$
$$\mathcal{L}(x, \lambda) = f(x) + \lambda^T h(x)$$
The necessary conditions for optimality:
$$\nabla_x \mathcal{L} = \nabla f + \sum_j \lambda_j \nabla h_j = 0$$ $$\nabla_\lambda \mathcal{L} = h(x) = 0$$
These are $n + p$ equations in $n + p$ unknowns ($x$ and $\lambda$).
$$\min_{x} x_1 + x_2 + x_3 \quad \text{s.t.} \quad x_1^2 + x_2^2 + x_3^2 = 1$$
Lagrangian: $\mathcal{L} = x_1 + x_2 + x_3 + \lambda(x_1^2 + x_2^2 + x_3^2 - 1)$
$$\frac{\partial \mathcal{L}}{\partial x_i} = 1 + 2\lambda x_i = 0 \implies x_i = -\frac{1}{2\lambda}$$
From constraint: $3 \cdot \frac{1}{4\lambda^2} = 1 \implies \lambda = \pm\frac{\sqrt{3}}{2}$
Minimum at $x^* = (-\frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}})$.
The Karush-Kuhn-Tucker conditions are the necessary conditions for constrained optimality (and sufficient when the problem is convex).
Given the Lagrangian:
$$\mathcal{L}(x, \lambda, \mu) = f(x) + \sum_j \lambda_j h_j(x) + \sum_i \mu_i g_i(x)$$
At the optimum $(x^*, \lambda^*, \mu^*)$:
| Condition | Formula | Meaning |
|---|---|---|
| Stationarity | $\nabla_x \mathcal{L} = 0$ | Gradient balance |
| Primal feasibility | $g_i(x^*) \leq 0$, $h_j(x^*) = 0$ | Solution is feasible |
| Dual feasibility | $\mu_i^* \geq 0$ | Multipliers for inequalities are non-negative |
| Complementary slackness | $\mu_i^* g_i(x^*) = 0 \; \forall i$ | Either constraint inactive or multiplier zero |
For each inequality constraint: - Active ($g_i(x^*) = 0$): constraint is "tight", $\mu_i^* \geq 0$ (constraint is pushing) - Inactive ($g_i(x^*) < 0$): constraint doesn't matter, so $\mu_i^* = 0$
You never need to check inactive constraints — their multipliers are simply zero.
$\mu_i^*$ tells you: "How much would the objective improve if constraint $i$ were relaxed by $\epsilon$?"
$$\frac{\partial f^*}{\partial b_i} = -\mu_i^*$$
where $g_i(x) \leq b_i$ and $b_i = 0$ at the original problem.
OKS relevance: In MPC, the multiplier on the velocity constraint tells you how much the tracking error would decrease if the robot could go faster. Active constraints at their limits → the robot is saturating.
$$q(\lambda, \mu) = \min_x \mathcal{L}(x, \lambda, \mu)$$
For any $\lambda$ and $\mu \geq 0$, $q(\lambda, \mu) \leq f^*$ (weak duality).
$$\max_{\lambda, \mu} q(\lambda, \mu) \quad \text{s.t.} \quad \mu \geq 0$$
| Property | Primal | Dual |
|---|---|---|
| Direction | Minimize | Maximize |
| Variables | $x$ (original) | $\lambda, \mu$ (multipliers) |
| Constraints | $g_i \leq 0, h_j = 0$ | $\mu \geq 0$ |
| Bound | Upper bound on $f^*$ | Lower bound on $f^*$ |
For convex problems satisfying a constraint qualification (e.g., Slater's condition: there exists a strictly feasible point):
$$d^* = p^* \quad \text{(zero duality gap)}$$
This means: 1. Solving the dual gives the exact primal optimum 2. KKT conditions are both necessary and sufficient 3. We can solve whichever (primal or dual) is easier
Replace $\min f(x)$ s.t. $h(x) = 0$ with:
$$\min_x f(x) + \frac{\rho}{2} \|h(x)\|^2$$
As $\rho \to \infty$, the solution approaches the constrained optimum.
Problem: Large $\rho$ → ill-conditioned Hessian → numerical issues.
Replace $\min f(x)$ s.t. $g_i(x) \leq 0$ with:
$$\min_x f(x) - \frac{1}{t} \sum_i \log(-g_i(x))$$
The barrier $-\log(-g_i)$ approaches $+\infty$ as $g_i \to 0^-$, keeping $x$ strictly inside the feasible region.
As $t \to \infty$, the barrier vanishes and the solution approaches the constrained optimum.
This is the basis for interior-point methods (Chapter 05).
def barrier_method(f, g_list, x0, t=1.0, mu=10.0, tol=1e-6):
"""
f: objective, g_list: list of g_i(x) <= 0 constraints
"""
x = x0.copy()
m = len(g_list)
while m / t > tol:
# Minimize: t * f(x) + barrier
def phi(x):
barrier = -sum(np.log(-g(x)) for g in g_list)
return t * f(x) + barrier
x = minimize_unconstrained(phi, x) # Use Newton/BFGS
t *= mu
return x
Generalize Newton's method to constrained problems by solving a QP subproblem at each step:
$$\min_{\Delta x} \nabla f_k^T \Delta x + \frac{1}{2} \Delta x^T H_k \Delta x$$ $$\text{s.t.} \quad g_i(x_k) + \nabla g_i(x_k)^T \Delta x \leq 0$$ $$\phantom{\text{s.t.}} \quad h_j(x_k) + \nabla h_j(x_k)^T \Delta x = 0$$
This linearizes the constraints and uses a quadratic model of the objective.
1. At x_k, form QP subproblem (linearize constraints, quadratic objective)
2. Solve QP → get step Δx and multiplier estimates λ, μ
3. Line search or trust region on a merit function
4. Update x_{k+1} = x_k + α Δx
5. Update Hessian estimate (BFGS on Lagrangian Hessian)
SQP is the workhorse for medium-sized NLPs. It's what SNOPT, IPOPT (in SQP mode), and many MPC solvers use.
$$\mathcal{L}_A(x, \lambda, \rho) = f(x) + \lambda^T h(x) + \frac{\rho}{2}\|h(x)\|^2$$
1. Minimize L_A(x, λ_k, ρ_k) over x (unconstrained subproblem)
2. Update multipliers: λ_{k+1} = λ_k + ρ_k h(x_{k+1})
3. Optionally increase ρ_k
4. Repeat until convergence
Advantage over pure penalty: The multiplier estimate $\lambda_k$ means you don't need $\rho \to \infty$ — finite $\rho$ suffices, avoiding ill-conditioning.
| Solver | Method | Best for | Language |
|---|---|---|---|
| IPOPT | Interior-point | Large NLP | C++ / Python (via pyomo, casadi) |
| SNOPT | SQP | Medium NLP, sparse | Fortran / MATLAB |
| OSQP | ADMM | Convex QP | C / Python |
| ECOS | Interior-point | Convex SOCP | C / Python |
| qpOASES | Active-set QP | Real-time MPC | C++ |
| FORCES | Code-gen QP/NLP | Embedded MPC | C |
scipy.optimize.minimize |
SQP/trust-constr | General Python | Python |
OKS relevance:
- MPC uses QP solvers (OSQP, qpOASES) at the inner loop
- Trajectory optimization uses NLP solvers (IPOPT, SNOPT)
- Ceres doesn't handle general constraints (it's NLLS only), but you can add bounds via SetParameterLowerBound/SetParameterUpperBound
Ceres is primarily for unconstrained/bound-constrained least squares, but you can handle constraints:
problem.SetParameterLowerBound(params, 0, -M_PI); // θ ≥ -π
problem.SetParameterUpperBound(params, 0, M_PI); // θ ≤ π
For unit quaternions ($\|q\| = 1$), don't add a constraint — use a manifold:
// Ceres 2.x
problem.SetManifold(quaternion, new ceres::EigenQuaternionManifold);
This tells Ceres to update the quaternion on the 3-sphere rather than in $\mathbb{R}^4$.
Turn a constraint into a penalty:
// Soft constraint: x[0] ≈ target_value
struct SoftConstraint {
template <typename T>
bool operator()(const T* x, T* residual) const {
residual[0] = T(weight) * (x[0] - T(target));
return true;
}
double weight, target;
};
| Concept | Where it's used next |
|---|---|
| KKT conditions | 05 (optimality in convex programs, interior-point methods) |
| Duality | 05 (LP/QP duals, SDP) |
| Log-barrier | 05 (interior-point algorithm) |
| SQP | 05 (QP subproblems), MPC formulation in control |
| Penalty methods | 06 (graph optimization with soft constraints) |
| Ceres bounds + manifolds | 06 (rotation manifolds in SLAM) |