
Day 91: Phase VI Capstone — Day 3: Integration & Checkpoint

Phase VI | Robot Learning: RL, Diffusion & Data | Week 13 | 3 hours

"The tools are built. The pipeline works. Now consolidate before we apply everything to VLAs."


Part 1: Pipeline Integration (60 min)

Complete Pipeline Summary

Consolidate your capstone work into a single clean document:

YOUR ROBOT LEARNING PIPELINE
=============================

Task: _______________________
Environment: ________________

Data:
  - Episodes collected: ___
  - Success rate in demos: ___%
  - Total transitions: ___
  - Augmentations used: ___

Models trained:
  1. Baseline BC:        SR = ___%  [CI: ___, ___]
  2. Advanced (___):     SR = ___%  [CI: ___, ___]
  3. Expert upper bound: SR = ___%

Key ablation findings:
  - Data quantity: ___
  - Chunk size: ___
  - Augmentation impact: ___

Failure analysis:
  - Dominant failure mode: ___
  - Fix applied: ___
  - Improvement: ___ → ___%
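
The template asks for a confidence interval next to each success rate. One common choice for a binomial success rate is the Wilson score interval. The sketch below is a minimal illustration, not the course's prescribed method; the function name `wilson_interval` and the default `z = 1.96` (roughly 95% coverage) are assumptions made for this example.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial success rate.

    Assumption: each evaluation rollout is an independent Bernoulli trial.
    z = 1.96 corresponds to roughly 95% coverage.
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")

    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    # Clamp to [0, 1] to guard against floating-point drift at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


if __name__ == "__main__":
    # Hypothetical numbers: 40 successful rollouts out of 50.
    low, high = wilson_interval(40, 50)
    print(f"SR = {40 / 50:.0%}  [CI: {low:.1%}, {high:.1%}]")
```

With only a few dozen rollouts the interval is wide, which is why two policies whose point estimates differ by a few percent often cannot be ranked with confidence.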

Architecture Map

Draw the complete architecture of your best policy:

Input → [Obs Encoder] → [Policy Network] → [Action Decoder] → Output
  │         │                  │                   │              │
  │    What type?         What type?          What type?     How executed?
  │    (MLP/CNN/ViT)    (BC/GMM/Diff)      (chunk/token)   (direct/IK)
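
If you prefer to record the architecture map in code alongside the drawing, a small validated record is one option. This is only a sketch: the class name `ArchitectureMap`, its field names, and the decision to restrict each stage to the example options shown in the diagram are assumptions made for illustration. Extend the allowed sets if your policy uses something else.

```python
from dataclasses import dataclass

# Allowed values are taken from the examples in the diagram above;
# treating them as an exhaustive list is an assumption of this sketch.
OBS_ENCODERS = {"MLP", "CNN", "ViT"}
POLICY_TYPES = {"BC", "GMM", "Diffusion"}
ACTION_DECODERS = {"chunk", "token"}
EXECUTION_MODES = {"direct", "IK"}


@dataclass(frozen=True)
class ArchitectureMap:
    """One row of answers to the four 'What type?' questions."""

    obs_encoder: str
    policy: str
    action_decoder: str
    execution: str

    def __post_init__(self) -> None:
        checks = [
            ("obs_encoder", self.obs_encoder, OBS_ENCODERS),
            ("policy", self.policy, POLICY_TYPES),
            ("action_decoder", self.action_decoder, ACTION_DECODERS),
            ("execution", self.execution, EXECUTION_MODES),
        ]
        for name, value, allowed in checks:
            if value not in allowed:
                raise ValueError(f"{name}={value!r} not in {sorted(allowed)}")

    def describe(self) -> str:
        """Render the map in the same left-to-right order as the diagram."""
        return (
            f"Input -> [{self.obs_encoder}] -> [{self.policy}] -> "
            f"[{self.action_decoder}] -> Output ({self.execution})"
        )


if __name__ == "__main__":
    # Hypothetical best policy: CNN encoder, diffusion policy, chunked actions, IK.
    print(ArchitectureMap("CNN", "Diffusion", "chunk", "IK").describe())
```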

Part 2: Phase VI Reflection (60 min)

The Knowledge Architecture

You've now built three stacks of knowledge:

Stack 1: Generative Models (Weeks 11-12)
  ├── RL foundations (MDP, policy gradient, PPO)
  ├── DDPM, DDIM, classifier-free guidance
  ├── Latent diffusion, flow matching
  └── Connection: same math generates images AND robot actions

Stack 2: Imitation Learning (Week 12)
  ├── BC → DAgger → ACT → Decision Transformer → Diffusion Policy
  ├── Action representations (joint/EE, absolute/delta, rotation)
  ├── Action tokenization (uniform bins, VQ-VAE)
  └── Connection: transformers can predict actions like tokens

Stack 3: Data & Evaluation (Week 13)
  ├── Data collection, quality, mixing
  ├── Policy evaluation with statistical rigor
  ├── Systematic debugging
  └── Connection: data quality > model architecture

Writing Prompt

Write 500+ words: "What is the single most important lesson from Phase VI, and how does it change your understanding of what VLAs will need to succeed?"

Consider:

  • Why data matters more than architecture
  • Why multimodality is the core challenge
  • How diffusion/flow models solve the right problem
  • What evaluation rigor means for VLA deployment


Part 3: Phase VI Checkpoint (60 min)

8-Question Checkpoint

Answer each question in 3-5 sentences with mathematical detail:

Q1. DDPM Training Objective
Write the DDPM loss function. Explain what $\epsilon_\theta$, $\alpha_t$, and $\bar{\alpha}_t$ represent. Why does predicting noise work?

Q2. Diffusion Policy vs BC
You have a task where the robot can grasp a mug from the left or right. Explain with a diagram why BC fails and Diffusion Policy succeeds.

Q3. Behavioral Cloning vs DAgger
BC suffers from compounding errors. Explain the mechanism (with the $T^2$ error bound) and how DAgger fixes it.

Q4. Action Tokenization
RT-2 uses 256 bins for action discretization. For a robot arm with $\Delta x \in [-5\text{cm}, 5\text{cm}]$, what's the resolution per bin? Is this sufficient for manipulation?

Q5. PPO and RLHF
Write the PPO clipped objective. Explain why clipping prevents catastrophic policy updates. How does RLHF use PPO?

Q6. Flow Matching vs Diffusion
Name three advantages of flow matching over DDPM. What does "straight paths in probability space" mean geometrically?

Q7. Data Quality
You have 1000 demonstrations but your policy only achieves 40% success rate. List 5 potential data quality issues and how to diagnose each.

Q8. Policy Debugging
Your policy reaches for the correct object but overshoots by 2cm every time. Categorize this failure, hypothesize the root cause, and propose a fix.
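
After answering Q4 by hand, you can check the arithmetic with a small uniform-binning sketch. It assumes evenly spaced bins over the stated range and decoding to bin centers; the function names are hypothetical, and a real system's normalization and clipping may differ from this.

```python
def bin_width(low: float, high: float, num_bins: int) -> float:
    """Width of one bin when [low, high] is split into num_bins equal bins."""
    return (high - low) / num_bins


def tokenize(x: float, low: float, high: float, num_bins: int) -> int:
    """Map a continuous value to a bin index in [0, num_bins - 1]."""
    x = min(max(x, low), high)  # clip out-of-range values to the boundary
    idx = int((x - low) / (high - low) * num_bins)
    return min(idx, num_bins - 1)  # x == high would otherwise land in bin num_bins


def detokenize(idx: int, low: float, high: float, num_bins: int) -> float:
    """Map a bin index back to the center of its bin."""
    return low + (idx + 0.5) * bin_width(low, high, num_bins)


if __name__ == "__main__":
    low_cm, high_cm, bins = -5.0, 5.0, 256  # the setting described in Q4
    width = bin_width(low_cm, high_cm, bins)
    x = 1.2345  # an arbitrary test displacement in cm
    x_hat = detokenize(tokenize(x, low_cm, high_cm, bins), low_cm, high_cm, bins)
    print(f"bin width: {width} cm")
    print(f"round-trip error: {abs(x - x_hat)} cm (at most half a bin width)")
```

Decoding to the bin center bounds the quantization error by half a bin width, which is the number to compare against the precision your manipulation task requires.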

Grading Rubric

Score    Meaning
8/8      Ready for Phase VII
6-7/8    Review weak areas, then proceed
4-5/8    Re-study relevant days before continuing
<4/8     Repeat Week 12 exercises

Phase VI → Phase VII Transition

What Changes

Aspect          Phase VI             Phase VII
Focus           Building blocks      Complete systems
Scale           Single-task          Multi-task, multi-embodiment
Architecture    Policy networks      VLMs + action heads
Data            Hundreds of demos    Millions of episodes
Evaluation      Simulation           Real-world deployment

What Carries Forward

Everything. Phase VII VLAs are built from Phase VI components:

  • RT-1 = FiLM-conditioned EfficientNet encoder + Transformer backbone + action tokenization (Days 22, 83)
  • RT-2 = VLM + action tokens (Days 36-40, 83)
  • Diffusion Policy = denoising action head (Day 81)
  • π₀ = VLM + flow matching (Days 77, 81)
  • OpenVLA = open-source VLM + action tokens (Days 36-40, 83)


Key Takeaways

  1. Phase VI gave you the full toolkit: RL, diffusion, IL, data, evaluation
  2. Data quality > architecture: the most consistent finding across all experiments
  3. Multimodality is the core challenge: a policy trained with MSE loss averages across modes, while diffusion, flow matching, and GMM heads can represent each mode
  4. Statistical rigor matters: report confidence intervals, not just point estimates
  5. Phase VII applies these tools to build actual VLAs that understand language, see images, and generate actions
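
Takeaway 3 can be seen in a toy one-dimensional example. The sketch below uses hypothetical demonstration data in which half the demos approach from the left (action -1) and half from the right (action +1). It shows that the constant prediction minimizing MSE is the mean of the targets, which matches neither mode. This illustrates the averaging effect only and is not a model of any particular policy.

```python
def mse(prediction: float, targets: list[float]) -> float:
    """Mean squared error of a single constant prediction against all targets."""
    return sum((prediction - t) ** 2 for t in targets) / len(targets)


# Hypothetical bimodal demonstrations for one fixed observation:
# 50 demos approach from the left (-1.0), 50 from the right (+1.0).
demos = [-1.0] * 50 + [1.0] * 50

# For a constant predictor, the MSE minimizer is the mean of the targets.
mse_optimal = sum(demos) / len(demos)

if __name__ == "__main__":
    print(f"MSE-optimal action: {mse_optimal}")
    print(f"MSE at the optimum: {mse(mse_optimal, demos)}")
    print(f"MSE when committing to the left mode: {mse(-1.0, demos)}")
    print(f"Optimum appears in the demos: {mse_optimal in demos}")
```

The regression optimum has lower loss than committing to either mode, yet it is an action no demonstrator ever took. A model that samples from the action distribution can instead return one of the two demonstrated modes.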

Connection to the Thread

Phase VII begins tomorrow with RT-1, the first Robotics Transformer. It is a 35M-parameter model that takes images and language instructions as input and outputs tokenized actions. It is simple, effective, and the foundation everything else builds on. The transition from "robot learning components" to "complete VLA systems" starts now.


Further Reading

  • Review all Phase VI papers: DDPM, DDIM, Flow Matching, Diffusion Policy, ACT, Decision Transformer
  • Preview: Brohan et al. (2022), "RT-1: Robotics Transformer for Real-World Control at Scale"