Week 14: VLA Architectures

Phase VII · Days 92–98 · 17.5 hours

The culmination of the curriculum: study the major Vision-Language-Action (VLA) models, from RT-1 through π₀.₅ and GR00T N1. Understand how vision-language models (VLMs) become VLAs, how actions are generated (discrete action tokens versus flow matching), and how these systems scale.

Daily Lessons

Day | Topic | Focus
92 | RT-1 | FiLM-conditioned EfficientNet, tokenized actions
93 | RT-2: VLM to VLA | PaLM-E + actions, co-fine-tuning
94 | Octo | Generalist policy, multi-embodiment
95 | OpenVLA | Open-source VLA, Prismatic backbone
96 | π₀ | Flow matching for robot actions
97 | π₀.₅ | Scaling Physical Intelligence
98 | GR00T N1 + PaLM-E | NVIDIA humanoid model, embodied reasoning

Study Notes Reference