Week 11: RL + Diffusion Foundations

Phase VI · Days 71–77 · 17.5 hours

This week builds the mathematical and practical foundations for robot learning. It starts with reinforcement learning theory (MDPs, policy gradients, PPO), then moves on to diffusion models, the generative framework behind modern robot action prediction.

Daily Lessons

Day  Topic                         Focus
71   RL Foundations Day 1          MDP, policy, value function, Bellman
72   RL Foundations Day 2          Actor-critic, GAE
73   PPO & RLHF Connection         Clipped objective, RLHF link
74   Diffusion Models — DDPM       Forward & reverse process
75   Diffusion — DDIM + CFG        Deterministic sampling, guidance
76   Diffusion — Latent Diffusion  VAE + latent space diffusion
77   Flow Matching                 CNFs, ODE formulation, π₀ link

Study Notes Reference