Phase VI: Robot Learning (RL, Diffusion & Data) | Week 12 | 2.5 hours

"From diffusing pixels to diffusing robot actions. Same math, different space. The abstraction layer is what makes intelligence general."
This is a reflection day. No new implementation. Consolidate the conceptual leap from generative models for images to generative models for robot actions. Write, think, connect.
You've now seen four approaches to generating robot actions:
| Approach | Key Idea | Strengths | Weaknesses |
|---|---|---|---|
| BC (Day 78) | Supervised regression | Simple | Mode averaging, compounding error |
| Decision Transformer (Day 80) | Sequence prediction | Return conditioning | No stitching, limited |
| ACT (Day 79) | CVAE + chunking | Multimodal, temporal | Training instability (KL) |
| Diffusion Policy (Day 81) | Denoise actions | Full distribution, expressive | Slow inference |
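
The "mode averaging" weakness listed for BC can be made concrete with a toy case. The sketch below assumes a hypothetical one-dimensional steering task in which half the demonstrations pass an obstacle on the left (about -1.0) and half on the right (about +1.0). The names `mse_optimal_action` and `demos` are illustrative and not from any earlier day. Drawing a random demonstration is only a stand-in for sampling from a policy that models the full distribution. It is not a diffusion model.

```python
import random


def mse_optimal_action(actions):
    """The constant prediction that minimises mean squared error is the sample mean."""
    return sum(actions) / len(actions)


rng = random.Random(0)

# Hypothetical demonstrations: steer left (-1.0) or right (+1.0) around an
# obstacle sitting at 0.0, with a little execution noise on top.
demos = [rng.choice([-1.0, 1.0]) + rng.gauss(0.0, 0.05) for _ in range(1000)]

# Regression-style BC collapses both modes into their average, which points
# straight at the obstacle.
bc_action = mse_optimal_action(demos)

# Stand-in for a generative policy: draw one sample from the demonstrated
# distribution, which always commits to one side.
sampled_action = rng.choice(demos)

print(f"BC (mean) action: {bc_action:+.3f}")
print(f"Sampled action:   {sampled_action:+.3f}")
```

The averaged action lands near zero even though no demonstrator ever chose it. ACT and Diffusion Policy address this failure by modelling the distribution rather than its mean.
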
Write 500+ words answering: "Why is generating robot actions harder than generating images, and what properties of diffusion models make them well-suited for both?"
Consider:
| Dimension | Images | Robot Actions |
|---|---|---|
| Dimensionality | High (512×512×3) | Low (7-20 DOF) |
| Temporal structure | Single frame | Sequential, causal |
| Physical constraints | None (any pixel valid) | Joint limits, collisions |
| Evaluation | Visual quality (FID) | Task success (binary) |
| Multimodality | "Draw a cat" → many valid cats | "Pick up mug" → multiple grasps |
| Safety | Bad image = harmless | Bad action = crash |
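
Two rows of the table above can be checked with a few lines of arithmetic: the dimensionality gap and the physical-constraint gap. This is a minimal sketch under stated assumptions. The 16-step horizon, the ±2.9 rad joint limits, and the helper names `chunk_dims` and `within_joint_limits` are hypothetical values chosen for illustration, not properties of any particular robot.

```python
IMAGE_DIMS = 512 * 512 * 3  # one RGB frame, as in the table


def chunk_dims(dof, horizon):
    """Number of scalar values in a chunk of `horizon` consecutive actions."""
    return dof * horizon


def within_joint_limits(action, lower, upper):
    """True only if every joint command lies inside its [lower, upper] range."""
    return all(lo <= a <= hi for a, lo, hi in zip(action, lower, upper))


# Hypothetical 7-DOF arm with symmetric limits of +/- 2.9 rad on every joint.
LOWER = [-2.9] * 7
UPPER = [2.9] * 7

safe = [0.0, 0.5, -1.0, 1.2, 0.0, 0.3, -0.2]
unsafe = safe[:6] + [3.5]  # last joint commanded past its limit

print("image dims:", IMAGE_DIMS)
print("action chunk dims (7 DOF x 16 steps):", chunk_dims(7, 16))
print("safe within limits:", within_joint_limits(safe, LOWER, UPPER))
print("unsafe within limits:", within_joint_limits(unsafe, LOWER, UPPER))
```

Even with chunking, the action space is thousands of times smaller than the image space. The difficulty comes from the constraint and safety rows, not from raw size.
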
The field has split into two camps:
Camp 1: Tokenize actions → language model generates them
- RT-2, OpenVLA: discretize actions, predict with cross-entropy
- Advantage: leverage massive LLM pre-training
- Risk: discretization loses precision

Camp 2: Keep actions continuous → diffusion/flow head generates them
- Diffusion Policy, π₀: separate action generation module
- Advantage: full continuous distribution
- Risk: more complex architecture
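
The Camp 1 risk, "discretization loses precision", can be quantified with a round trip through a uniform binning scheme. The sketch below is a generic illustration rather than the tokenizer of any specific model. The function names, the 256-bin default, and the [-1, 1] action range are assumptions made for the example.

```python
def tokenize(a, low, high, n_bins=256):
    """Map a continuous action value to an integer bin index in [0, n_bins - 1]."""
    a = min(max(a, low), high)  # clip to the representable range
    width = (high - low) / n_bins
    return min(int((a - low) / width), n_bins - 1)


def detokenize(token, low, high, n_bins=256):
    """Map a bin index back to the centre of its bin."""
    width = (high - low) / n_bins
    return low + (token + 0.5) * width


LOW, HIGH, N_BINS = -1.0, 1.0, 256
BIN_WIDTH = (HIGH - LOW) / N_BINS

# Sweep the action range and record the worst round-trip error.
grid = [LOW + (HIGH - LOW) * i / 9999 for i in range(10000)]
max_error = max(
    abs(detokenize(tokenize(a, LOW, HIGH, N_BINS), LOW, HIGH, N_BINS) - a)
    for a in grid
)

print(f"bin width:            {BIN_WIDTH:.6f}")
print(f"max round-trip error: {max_error:.6f}")
```

The worst-case error is half a bin width per dimension. Doubling the bin count halves that error but enlarges the vocabulary the language model has to predict over. That trade-off is what a continuous head avoids.
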
Trace this from Day 5 through today:
| Day | Concept | Connection |
|---|---|---|
| 5 | Cross-entropy = compression | |
| 10-14 | Attention = selective compression | |
| 22 | Tokenization = lossless encoding | |
| 25 | Scaling laws = compression efficiency | |
| 74 | DDPM = learning to reverse entropy | |
| 77 | Flow matching = optimal transport | |
| 81 | Diffusion Policy = compressing action distributions | |
| 83 | Action tokenization = discretizing the action signal | |
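
As a starting point for filling in the Connection column, the first row ("Cross-entropy = compression") can be checked numerically: cross-entropy in base 2 is the average number of bits needed to encode samples from the true distribution `p` using a code built for the model distribution `q`. The function name and the four-outcome distributions below are illustrative assumptions.

```python
import math


def cross_entropy_bits(p, q):
    """Average code length in bits when data drawn from p is coded with a model q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical true distribution over four discrete outcomes.
p = [0.5, 0.25, 0.125, 0.125]
uniform = [0.25, 0.25, 0.25, 0.25]

entropy = cross_entropy_bits(p, p)                # best achievable code length
uniform_cost = cross_entropy_bits(p, uniform)     # cost of an uninformed model

print(f"entropy of p:           {entropy:.2f} bits")
print(f"cross-entropy vs uniform: {uniform_cost:.2f} bits")
print(f"wasted bits (KL):       {uniform_cost - entropy:.2f}")
```

Lowering the cross-entropy loss means the model wastes fewer bits. The later rows of the table revisit the same idea in different spaces.
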
Write: "How does the compression/prediction thread explain why diffusion models work for robot actions? Is a diffusion policy 'compressing' the space of valid actions?"
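
One way to ground this question is the standard DDPM forward process, which progressively replaces an action with Gaussian noise: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. The sketch below assumes a linear beta schedule with 1000 steps and endpoints 1e-4 and 0.02, plus a hypothetical 7-dimensional action. It shows only the noising direction, not a trained denoiser, and the helper names are illustrative.

```python
import math
import random


def alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) under a linear beta schedule."""
    out, prod = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        prod *= 1.0 - beta
        out.append(prod)
    return out


def noise_action(x0, alpha_bar, rng):
    """Sample x_t from q(x_t | x_0) in closed form."""
    signal, noise = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


rng = random.Random(0)
abars = alpha_bars()
x0 = [0.1, -0.4, 0.3, 0.0, 0.2, -0.1, 0.5]  # hypothetical 7-DOF action

for t in (0, 250, 500, 999):
    xt = noise_action(x0, abars[t], rng)
    print(f"t={t:4d}  signal fraction sqrt(alpha_bar)={math.sqrt(abars[t]):.4f}  x_t[0]={xt[0]:+.3f}")
```

By the final step almost none of the original action survives. Whatever structure reappears during sampling must therefore come from the learned denoiser, which is one reading of "compressing the space of valid actions".
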
Before proceeding to Week 13, verify you can answer:
Phase VI gave you the tools: RL foundations, diffusion models, flow matching, imitation learning, action representations, and tokenization. Next week: the practical realities of data collection, policy evaluation, and debugging — then the Phase VI capstone. After that, Phase VII applies everything to build actual VLAs.