Phase VII · Days 92–98 · 17.5 hours
The culmination of the curriculum: study every major Vision-Language-Action model from RT-1 through π₀.₅ and GR00T N1. Understand how VLMs become VLAs, how actions are generated, and how these systems scale.
| Day | Topic | Focus |
|---|---|---|
| 92 | RT-1 | FiLM-conditioned EfficientNet, tokenized actions |
| 93 | RT-2: VLM to VLA | PaLI-X / PaLM-E backbones, actions as text tokens, co-fine-tuning |
| 94 | Octo | Generalist policy, multi-embodiment |
| 95 | OpenVLA | Open-source VLA, Prismatic backbone |
| 96 | π₀ | Flow matching for robot actions |
| 97 | π₀.₅ | Open-world generalization via co-training on heterogeneous data |
| 98 | GR00T N1 + PaLM-E | NVIDIA humanoid model, embodied reasoning |
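
As a preview of the "tokenized actions" idea from Day 92, here is a minimal sketch of uniform action discretization. RT-1 and RT-2 map each action dimension to one of 256 bins; the normalized range of [-1, 1], the function names, and the bin-centre decoding are illustrative assumptions, not code from either paper.

```python
from typing import List

NUM_BINS = 256  # RT-1 and RT-2 discretize each action dimension into 256 bins


def tokenize_action(action: List[float], low: float = -1.0, high: float = 1.0) -> List[int]:
    """Map each continuous action dimension to a bin index in [0, NUM_BINS - 1].

    The [low, high] range is an assumed normalization, chosen for illustration.
    """
    tokens = []
    for value in action:
        clipped = min(max(value, low), high)       # keep the value inside the range
        fraction = (clipped - low) / (high - low)  # rescale to [0, 1]
        # The upper edge (fraction == 1.0) would land in bin 256, so cap it.
        tokens.append(min(int(fraction * NUM_BINS), NUM_BINS - 1))
    return tokens


def detokenize_action(tokens: List[int], low: float = -1.0, high: float = 1.0) -> List[float]:
    """Map bin indices back to the centre of each bin (hypothetical decoding choice)."""
    width = (high - low) / NUM_BINS
    return [low + (t + 0.5) * width for t in tokens]


if __name__ == "__main__":
    action = [-1.0, 0.0, 0.3, 1.0]
    tokens = tokenize_action(action)
    print(tokens)
    print(detokenize_action(tokens))
```

Once actions are integer tokens, a language-model-style decoder can predict them with an ordinary classification loss. The cost is quantization error of up to half a bin width per dimension, which is one motivation for the continuous flow-matching action heads covered on Day 96.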