Phase VI — Robot Learning: RL, Diffusion & Data | Week 12 | 2.5 hours "The way you represent actions determines what your policy can learn. Get this wrong and no architecture will save you."
| Representation | Example | Pros | Cons |
|---|---|---|---|
| Joint positions (absolute) | $q = [q_1, \ldots, q_7]$ | Precise, reproducible | Task-specific, not transferable |
| Joint velocities | $\dot{q} = [\dot{q}_1, \ldots, \dot{q}_7]$ | Smooth motion | Drift over time |
| Joint position deltas | $\Delta q = q_{t+1} - q_t$ | Small values, stable | Still joint-specific |
| End-effector pose (abs) | $[x,y,z,\text{quat}]$ | Task-relevant | Needs IK, singularities |
| End-effector delta | $[\Delta x, \Delta y, \Delta z, \Delta\text{rot}]$ | Transferable | Cumulative error |
| Discrete tokens | bin index: $[128, 64, 200]$ | Language model compatible | Resolution limited |
Absolute: "Move to position (0.5, 0.3, 0.1)"
+ Repeatable trajectories
+ No drift
- Must learn full pose from scratch
- Distribution differs per task/setup
Delta: "Move +0.01 in x, -0.02 in y"
+ Small values → easier to learn
+ Transferable across setups
- Error accumulates over time
- Sensitive to control frequency
In practice: delta end-effector actions with a compliance controller are the most common choice for manipulation VLAs.
End-Effector space: Joint space:
"Move gripper to (x,y,z)" "Set joints to (q1,...,q7)"
│ │
▼ ▼
Inverse Kinematics (IK) Direct motor commands
│ │
▼ ▼
Joint commands Motor execution
| Factor | End-Effector | Joint |
|---|---|---|
| Transferability | High (across robot morphologies) | Low (robot-specific) |
| Precision | Lower (IK errors) | Higher (direct control) |
| VLA compatibility | Better (observation-aligned) | Harder (abstract) |
| Action dimension | 6-7 (pose + gripper) | 6-12+ (per joint) |
Representing 3D rotations is surprisingly tricky:
| Representation | Dim | Continuity | Used By |
|---|---|---|---|
| Euler angles | 3 | Discontinuous (gimbal lock) | Avoid for learning |
| Quaternion | 4 | Double cover (q = -q) | RT-1, Octo |
| Axis-angle | 3 | Singularity at 0 | Some, with care |
| Rotation matrix | 9 | Redundant but continuous | Preferred for learning |
| 6D representation | 6 | Continuous, minimal+ | Zhou et al. (2019) |
The 6D rotation from Zhou et al. (2019) extracts the first two columns of the rotation matrix, then reconstructs the third via cross product:
$$R = [\mathbf{a}_1, \mathbf{a}_2, \mathbf{a}_1 \times \mathbf{a}_2], \quad \text{after Gram-Schmidt}$$
| Method | Description | Trade-offs |
|---|---|---|
| Continuous | Gripper width: 0.0 (closed) to 1.0 (open) | Smooth but mode-averaging |
| Binary | 0 (close) or 1 (open) | Simple, natural for pick-place |
| Velocity | Open/close speed | Responsive but drifts |
| Frequency | Latency Budget | Suitable For |
|---|---|---|
| 50 Hz | 20ms | Low-level joint control |
| 10 Hz | 100ms | End-effector delta control |
| 5 Hz | 200ms | VLA inference (current limit) |
| 2 Hz | 500ms | High-level planning |
Action chunking mitigates low frequency: predict 10 actions at 5 Hz inference = 50 Hz effective control.
import numpy as np
import torch
class ActionConverter:
"""Convert between action representations."""
@staticmethod
def absolute_to_delta(actions):
"""Convert absolute positions to deltas."""
deltas = np.diff(actions, axis=0)
return deltas
@staticmethod
def delta_to_absolute(deltas, start_pos):
"""Convert deltas back to absolute positions."""
positions = [start_pos]
for d in deltas:
positions.append(positions[-1] + d)
return np.array(positions)
@staticmethod
def euler_to_rotation_matrix(euler):
"""Convert euler angles (roll, pitch, yaw) to rotation matrix."""
r, p, y = euler
Rx = np.array([[1,0,0],[0,np.cos(r),-np.sin(r)],[0,np.sin(r),np.cos(r)]])
Ry = np.array([[np.cos(p),0,np.sin(p)],[0,1,0],[-np.sin(p),0,np.cos(p)]])
Rz = np.array([[np.cos(y),-np.sin(y),0],[np.sin(y),np.cos(y),0],[0,0,1]])
return Rz @ Ry @ Rx
@staticmethod
def rotation_matrix_to_6d(R):
"""Extract 6D continuous rotation representation."""
return R[:, :2].T.flatten() # First two columns, flattened
@staticmethod
def sixd_to_rotation_matrix(sixd):
"""Reconstruct rotation matrix from 6D representation."""
a1 = sixd[:3]
a2 = sixd[3:]
# Gram-Schmidt
b1 = a1 / np.linalg.norm(a1)
b2 = a2 - np.dot(b1, a2) * b1
b2 = b2 / np.linalg.norm(b2)
b3 = np.cross(b1, b2)
return np.column_stack([b1, b2, b3])
# --- Experiment: delta accumulation error ---
def simulate_delta_drift(n_steps=200, noise_std=0.001):
"""Show how small errors in delta actions accumulate."""
true_trajectory = np.sin(np.linspace(0, 4*np.pi, n_steps)).reshape(-1, 1)
true_trajectory = np.hstack([true_trajectory, np.cos(np.linspace(0, 4*np.pi, n_steps)).reshape(-1, 1)])
# Perfect deltas
deltas = np.diff(true_trajectory, axis=0)
# Add noise to deltas
noisy_deltas = deltas + np.random.randn(*deltas.shape) * noise_std
# Reconstruct
noisy_trajectory = ActionConverter.delta_to_absolute(noisy_deltas, true_trajectory[0])
# Measure drift
errors = np.linalg.norm(true_trajectory - noisy_trajectory, axis=1)
print(f"Final drift: {errors[-1]:.4f}")
print(f"Max drift: {errors.max():.4f}")
return true_trajectory, noisy_trajectory, errors
true_traj, noisy_traj, errors = simulate_delta_drift()
class ActionNormalizer:
"""Normalize actions to [-1, 1] for stable learning."""
def __init__(self, low, high):
self.low = np.array(low, dtype=np.float32)
self.high = np.array(high, dtype=np.float32)
self.range = self.high - self.low
def normalize(self, actions):
return 2.0 * (actions - self.low) / self.range - 1.0
def denormalize(self, normalized):
return (normalized + 1.0) * self.range / 2.0 + self.low
# Example: 7-DOF robot arm
normalizer = ActionNormalizer(
low=[-0.05, -0.05, -0.05, -0.1, -0.1, -0.1, 0.0], # delta xyz, delta rot, gripper
high=[0.05, 0.05, 0.05, 0.1, 0.1, 0.1, 1.0],
)
Drift experiment: Generate a trajectory with simulate_delta_drift for noise levels $\sigma \in \{0.0001, 0.001, 0.01\}$. Plot drift over time. At what noise level does delta representation become impractical?
Rotation continuity: Sample 1000 random rotations. Convert to Euler, quaternion, and 6D. Perturb slightly and convert back. Which representation is most stable (smallest reconstruction error)?
Action space for OKS: Your OKS AMR robot has $(v, \omega)$ velocity commands. Design the action representation: continuous delta? Discrete bins? What chunk size? Write a specification.
Normalization impact: Train a BC policy on unnormalized vs normalized actions. Compare learning curves.
Continuous actions work for diffusion/flow-based policies. But what if you want to use a language model to generate actions? Tomorrow we tokenize: discretize continuous actions into a finite vocabulary so transformers can predict them with cross-entropy loss — exactly how RT-2 turns a VLM into a VLA.