Phase VI · Days 71–77 · 17.5 hours
This week builds the mathematical and practical foundations for robot learning. It starts with reinforcement learning theory (MDPs, policy gradients, PPO), then moves to diffusion models, a generative framework behind many modern robot action-prediction policies, and closes with flow matching.
| Day | Topic | Focus |
|---|---|---|
| 71 | RL Foundations Day 1 | MDP, policy, value function, Bellman |
| 72 | RL Foundations Day 2 | Actor-critic, GAE |
| 73 | PPO & RLHF Connection | Clipped objective, RLHF link |
| 74 | Diffusion Models — DDPM | Forward & reverse process |
| 75 | Diffusion — DDIM + CFG | Deterministic sampling, guidance |
| 76 | Diffusion — Latent Diffusion | VAE + latent space diffusion |
| 77 | Flow Matching | CNFs, ODE formulation, π₀ link |