Week 14: VLA Architectures

Phase VII · Days 92–98 · 17.5 hours

The culmination of the curriculum: study the major Vision-Language-Action (VLA) models from RT-1 through π₀.₅ and GR00T N1. Understand how vision-language models (VLMs) become VLAs, how actions are generated (discrete action tokens versus flow matching), and how these systems scale.

Daily Lessons

Day	Topic	Focus
92	RT-1	FiLM-conditioned EfficientNet, tokenized actions
93	RT-2 — VLM to VLA	PaLM-E + actions, co-fine-tuning
94	Octo	Generalist policy, multi-embodiment
95	OpenVLA	Open-source VLA, Prismatic backbone
96	π₀	Flow matching for robot actions
97	π₀.₅	Scaling Physical Intelligence
98	GR00T N1 + PaLM-E	NVIDIA humanoid model, embodied reasoning

Study Notes Reference

15 — VLA Architectures
13 — Imitation Learning