← Back to Optimization

04 — Constrained Optimization

Goal: Add constraints to the picture — equality and inequality constraints, Lagrange multipliers, KKT conditions, duality, and practical algorithms.


1. Problem Formulation

Standard Form

$$\min_{x \in \mathbb{R}^n} f(x)$$ $$\text{subject to} \quad g_i(x) \leq 0, \quad i = 1, \ldots, m$$ $$\phantom{\text{subject to}} \quad h_j(x) = 0, \quad j = 1, \ldots, p$$

  • $f$ — objective function
  • $g_i$ — inequality constraints (≤ 0 by convention)
  • $h_j$ — equality constraints

Feasible set: $\mathcal{F} = \{x \mid g_i(x) \leq 0, \; h_j(x) = 0 \; \forall i, j\}$

Examples in Robotics

Problem Objective Constraints
Path planning Minimize path length Obstacle avoidance $g_i(x) \leq 0$, kinematics $h_j(x) = 0$
MPC Minimize tracking error Velocity/acceleration limits, actuator bounds
Calibration Minimize residual Known geometric constraints (e.g., unit quaternion)
Robot arm IK Minimize joint motion Joint angle limits, collision avoidance

2. Lagrange Multipliers (Equality Constraints Only)

Setup

$$\min_x f(x) \quad \text{s.t.} \quad h(x) = 0$$

Geometric Insight

At the constrained optimum $x^*$: - $\nabla f(x^*)$ must be parallel to $\nabla h(x^*)$ - Otherwise, you could move along the constraint surface and decrease $f$

So there exists a scalar $\lambda^*$ such that:

$$\nabla f(x^*) + \lambda^* \nabla h(x^*) = 0$$

The Lagrangian

$$\mathcal{L}(x, \lambda) = f(x) + \lambda^T h(x)$$

The necessary conditions for optimality:

$$\nabla_x \mathcal{L} = \nabla f + \sum_j \lambda_j \nabla h_j = 0$$ $$\nabla_\lambda \mathcal{L} = h(x) = 0$$

These are $n + p$ equations in $n + p$ unknowns ($x$ and $\lambda$).

Example: Minimize on a Sphere

$$\min_{x} x_1 + x_2 + x_3 \quad \text{s.t.} \quad x_1^2 + x_2^2 + x_3^2 = 1$$

Lagrangian: $\mathcal{L} = x_1 + x_2 + x_3 + \lambda(x_1^2 + x_2^2 + x_3^2 - 1)$

$$\frac{\partial \mathcal{L}}{\partial x_i} = 1 + 2\lambda x_i = 0 \implies x_i = -\frac{1}{2\lambda}$$

From constraint: $3 \cdot \frac{1}{4\lambda^2} = 1 \implies \lambda = \pm\frac{\sqrt{3}}{2}$

Minimum at $x^* = (-\frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}})$.


3. KKT Conditions (Inequality + Equality Constraints)

The Karush-Kuhn-Tucker conditions are the necessary conditions for constrained optimality (and sufficient when the problem is convex).

The Conditions

Given the Lagrangian:

$$\mathcal{L}(x, \lambda, \mu) = f(x) + \sum_j \lambda_j h_j(x) + \sum_i \mu_i g_i(x)$$

At the optimum $(x^*, \lambda^*, \mu^*)$:

Condition Formula Meaning
Stationarity $\nabla_x \mathcal{L} = 0$ Gradient balance
Primal feasibility $g_i(x^*) \leq 0$, $h_j(x^*) = 0$ Solution is feasible
Dual feasibility $\mu_i^* \geq 0$ Multipliers for inequalities are non-negative
Complementary slackness $\mu_i^* g_i(x^*) = 0 \; \forall i$ Either constraint inactive or multiplier zero

Complementary Slackness — The Key Insight

For each inequality constraint: - Active ($g_i(x^*) = 0$): constraint is "tight", $\mu_i^* \geq 0$ (constraint is pushing) - Inactive ($g_i(x^*) < 0$): constraint doesn't matter, so $\mu_i^* = 0$

You never need to check inactive constraints — their multipliers are simply zero.

Interpretation of Multipliers

$\mu_i^*$ tells you: "How much would the objective improve if constraint $i$ were relaxed by $\epsilon$?"

$$\frac{\partial f^*}{\partial b_i} = -\mu_i^*$$

where $g_i(x) \leq b_i$ and $b_i = 0$ at the original problem.

OKS relevance: In MPC, the multiplier on the velocity constraint tells you how much the tracking error would decrease if the robot could go faster. Active constraints at their limits → the robot is saturating.


4. Duality

The Lagrangian Dual Function

$$q(\lambda, \mu) = \min_x \mathcal{L}(x, \lambda, \mu)$$

For any $\lambda$ and $\mu \geq 0$, $q(\lambda, \mu) \leq f^*$ (weak duality).

The Dual Problem

$$\max_{\lambda, \mu} q(\lambda, \mu) \quad \text{s.t.} \quad \mu \geq 0$$

Property Primal Dual
Direction Minimize Maximize
Variables $x$ (original) $\lambda, \mu$ (multipliers)
Constraints $g_i \leq 0, h_j = 0$ $\mu \geq 0$
Bound Upper bound on $f^*$ Lower bound on $f^*$

Strong Duality

For convex problems satisfying a constraint qualification (e.g., Slater's condition: there exists a strictly feasible point):

$$d^* = p^* \quad \text{(zero duality gap)}$$

This means: 1. Solving the dual gives the exact primal optimum 2. KKT conditions are both necessary and sufficient 3. We can solve whichever (primal or dual) is easier


5. Penalty and Barrier Methods

Penalty Method (Equality Constraints)

Replace $\min f(x)$ s.t. $h(x) = 0$ with:

$$\min_x f(x) + \frac{\rho}{2} \|h(x)\|^2$$

As $\rho \to \infty$, the solution approaches the constrained optimum.

Problem: Large $\rho$ → ill-conditioned Hessian → numerical issues.

Log-Barrier Method (Inequality Constraints)

Replace $\min f(x)$ s.t. $g_i(x) \leq 0$ with:

$$\min_x f(x) - \frac{1}{t} \sum_i \log(-g_i(x))$$

The barrier $-\log(-g_i)$ approaches $+\infty$ as $g_i \to 0^-$, keeping $x$ strictly inside the feasible region.

As $t \to \infty$, the barrier vanishes and the solution approaches the constrained optimum.

This is the basis for interior-point methods (Chapter 05).

def barrier_method(f, g_list, x0, t=1.0, mu=10.0, tol=1e-6):
    """
    f: objective, g_list: list of g_i(x) <= 0 constraints
    """
    x = x0.copy()
    m = len(g_list)
    while m / t > tol:
        # Minimize: t * f(x) + barrier
        def phi(x):
            barrier = -sum(np.log(-g(x)) for g in g_list)
            return t * f(x) + barrier
        x = minimize_unconstrained(phi, x)  # Use Newton/BFGS
        t *= mu
    return x

6. Sequential Quadratic Programming (SQP)

Idea

Generalize Newton's method to constrained problems by solving a QP subproblem at each step:

$$\min_{\Delta x} \nabla f_k^T \Delta x + \frac{1}{2} \Delta x^T H_k \Delta x$$ $$\text{s.t.} \quad g_i(x_k) + \nabla g_i(x_k)^T \Delta x \leq 0$$ $$\phantom{\text{s.t.}} \quad h_j(x_k) + \nabla h_j(x_k)^T \Delta x = 0$$

This linearizes the constraints and uses a quadratic model of the objective.

Algorithm

1. At x_k, form QP subproblem (linearize constraints, quadratic objective)
2. Solve QP → get step Δx and multiplier estimates λ, μ
3. Line search or trust region on a merit function
4. Update x_{k+1} = x_k + α Δx
5. Update Hessian estimate (BFGS on Lagrangian Hessian)

SQP is the workhorse for medium-sized NLPs. It's what SNOPT, IPOPT (in SQP mode), and many MPC solvers use.


7. Augmented Lagrangian Method (ALM)

Combine Penalty + Lagrange Multipliers

$$\mathcal{L}_A(x, \lambda, \rho) = f(x) + \lambda^T h(x) + \frac{\rho}{2}\|h(x)\|^2$$

Algorithm

1. Minimize L_A(x, λ_k, ρ_k) over x (unconstrained subproblem)
2. Update multipliers: λ_{k+1} = λ_k + ρ_k h(x_{k+1})
3. Optionally increase ρ_k
4. Repeat until convergence

Advantage over pure penalty: The multiplier estimate $\lambda_k$ means you don't need $\rho \to \infty$ — finite $\rho$ suffices, avoiding ill-conditioning.


8. Solver Landscape

Solver Method Best for Language
IPOPT Interior-point Large NLP C++ / Python (via pyomo, casadi)
SNOPT SQP Medium NLP, sparse Fortran / MATLAB
OSQP ADMM Convex QP C / Python
ECOS Interior-point Convex SOCP C / Python
qpOASES Active-set QP Real-time MPC C++
FORCES Code-gen QP/NLP Embedded MPC C
scipy.optimize.minimize SQP/trust-constr General Python Python

OKS relevance: - MPC uses QP solvers (OSQP, qpOASES) at the inner loop - Trajectory optimization uses NLP solvers (IPOPT, SNOPT) - Ceres doesn't handle general constraints (it's NLLS only), but you can add bounds via SetParameterLowerBound/SetParameterUpperBound


9. Constrained Optimization in Ceres

Ceres is primarily for unconstrained/bound-constrained least squares, but you can handle constraints:

Parameter Bounds

problem.SetParameterLowerBound(params, 0, -M_PI);   // θ ≥ -π
problem.SetParameterUpperBound(params, 0,  M_PI);   // θ ≤ π

Local Parameterizations (Manifold Constraints)

For unit quaternions ($\|q\| = 1$), don't add a constraint — use a manifold:

// Ceres 2.x
problem.SetManifold(quaternion, new ceres::EigenQuaternionManifold);

This tells Ceres to update the quaternion on the 3-sphere rather than in $\mathbb{R}^4$.

Soft Constraints (as Cost Terms)

Turn a constraint into a penalty:

// Soft constraint: x[0] ≈ target_value
struct SoftConstraint {
    template <typename T>
    bool operator()(const T* x, T* residual) const {
        residual[0] = T(weight) * (x[0] - T(target));
        return true;
    }
    double weight, target;
};

Summary: What to Carry Forward

Concept Where it's used next
KKT conditions 05 (optimality in convex programs, interior-point methods)
Duality 05 (LP/QP duals, SDP)
Log-barrier 05 (interior-point algorithm)
SQP 05 (QP subproblems), MPC formulation in control
Penalty methods 06 (graph optimization with soft constraints)
Ceres bounds + manifolds 06 (rotation manifolds in SLAM)