Days 64–70 · 17.5 hours
This week surveys the open vision-language model (VLM) landscape, tackles spatial reasoning and grounding, reflects on the gap between VLMs and vision-language-action models (VLAs), then closes Phase V with a two-day capstone and two days of hands-on LoRA fine-tuning.
| Day | Topic | Phase | Focus |
|---|---|---|---|
| 64 | Open VLM Landscape | V | InternVL, Qwen-VL, Phi-3-Vision |
| 65 | Spatial Reasoning & Grounding | V | Visual grounding, referring expressions |
| 66 | Stop & Reflect #4 | V | From seeing to acting |
| 67 | Phase V Capstone Day 1 | V | VLM inference pipeline |
| 68 | Phase V Capstone Day 2 | V | Evaluation + checkpoint |
| 69 | VLM Fine-Tuning Day 1 | V | LoRA fine-tuning on custom data |
| 70 | VLM Fine-Tuning Day 2 | V | Evaluation vs base model |