Week 11: RL + Diffusion Foundations

Phase VI · Days 71–77 · 17.5 hours

This week builds the mathematical and practical foundations for robot learning. It starts with reinforcement learning theory (MDPs, policy gradients, PPO), then moves to diffusion models and flow matching, the generative frameworks behind many recent robot action-prediction policies.
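To make the RL half concrete, here is a minimal sketch of two formulas from Days 72–73. The first is Generalized Advantage Estimation, computed backward as A_t = δ_t + γλ·A_{t+1} with δ_t = r_t + γ·V(s_{t+1}) − V(s_t). The second is PPO's clipped surrogate objective. It assumes a single trajectory segment stored as plain Python lists, with no episode boundaries inside it. The function names `compute_gae` and `ppo_clip_objective` are hypothetical and do not come from any library.

```python
import math
from typing import List


def compute_gae(rewards: List[float], values: List[float],
                gamma: float = 0.99, lam: float = 0.95) -> List[float]:
    """Generalized Advantage Estimation over one trajectory segment.

    values holds V(s_0), ..., V(s_T): one more entry than rewards. The last
    entry is the bootstrap value; pass 0.0 if s_T is terminal.
    Assumption: the segment contains no episode boundaries.
    """
    assert len(values) == len(rewards) + 1
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def ppo_clip_objective(new_logp: List[float], old_logp: List[float],
                       advantages: List[float], eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective (maximized; negate it for a loss)."""
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new(a|s) / pi_old(a|s)
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (smaller) of the unclipped and clipped terms
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Usage: three unit rewards, zero value estimates, gamma = lam = 1
print(compute_gae([1.0, 1.0, 1.0], [0.0, 0.0, 0.0, 0.0], gamma=1.0, lam=1.0))
# [3.0, 2.0, 1.0]
```

Setting `lam=0` reduces GAE to one-step TD residuals. Setting `lam=1` gives the bootstrapped discounted return minus V(s_t). In the PPO objective, the outer `min` makes clipping pessimistic. A ratio far above 1 + eps earns no extra credit when the advantage is positive, but it is still penalized in full when the advantage is negative.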

Daily Lessons

Day	Topic	Focus
71	RL Foundations Day 1	MDP, policy, value function, Bellman
72	RL Foundations Day 2	Actor-critic, GAE
73	PPO & RLHF Connection	Clipped objective, RLHF link
74	Diffusion Models — DDPM	Forward & reverse process
75	Diffusion — DDIM + CFG	Deterministic sampling, guidance
76	Diffusion — Latent Diffusion	VAE + latent space diffusion
77	Flow Matching	CNFs, ODE formulation, π₀ link
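The diffusion and flow-matching days center on a few short update rules. This sketch collects them for plain-list data. It covers the closed-form DDPM forward process x_t = √ᾱ_t·x_0 + √(1 − ᾱ_t)·ε (Day 74). It also includes a deterministic DDIM step with η = 0 and the classifier-free guidance combination (Day 75). Finally, it builds a linear-interpolation flow-matching training pair (Day 77). Papers differ on several details, so the following are assumptions here: the linear β schedule (1e-4 to 0.02 over 1000 steps), the guidance form ε_u + w·(ε_c − ε_u), and placing noise at t = 0 in the flow-matching path. All helper names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Vec = List[float]


def linear_alpha_bars(T: int = 1000, beta_start: float = 1e-4,
                      beta_end: float = 0.02) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for an assumed linear beta schedule."""
    alpha_bars, prod = [], 1.0
    for t in range(T):
        beta = beta_start + (beta_end - beta_start) * t / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0: Vec, eps: Vec, alpha_bar: float) -> Vec:
    """DDPM forward process in closed form: sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + b * e for x, e in zip(x0, eps)]


def ddim_step(x_t: Vec, eps_hat: Vec, ab_t: float, ab_prev: float) -> Vec:
    """Deterministic DDIM update (eta = 0) from level ab_t to an earlier level ab_prev."""
    # Estimate x0 from the predicted noise, then re-noise it to the earlier level
    x0_hat = [(x - math.sqrt(1.0 - ab_t) * e) / math.sqrt(ab_t)
              for x, e in zip(x_t, eps_hat)]
    return [math.sqrt(ab_prev) * x0 + math.sqrt(1.0 - ab_prev) * e
            for x0, e in zip(x0_hat, eps_hat)]


def cfg_combine(eps_uncond: Vec, eps_cond: Vec, w: float) -> Vec:
    """Classifier-free guidance: eps_uncond + w * (eps_cond - eps_uncond)."""
    return [u + w * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def flow_matching_pair(x_noise: Vec, x_data: Vec, t: float) -> Tuple[Vec, Vec]:
    """Point on the path x_t = (1 - t) * noise + t * data and its target velocity."""
    x_t = [(1.0 - t) * n + t * d for n, d in zip(x_noise, x_data)]
    v_target = [d - n for n, d in zip(x_noise, x_data)]
    return x_t, v_target


# Usage: with an oracle noise prediction (the true eps), one DDIM step from
# t = 500 to t = 250 reproduces the forward-process sample at t = 250.
random.seed(0)
alpha_bars = linear_alpha_bars()
x0 = [0.5, -1.0]
eps = [random.gauss(0.0, 1.0) for _ in x0]
x_t = q_sample(x0, eps, alpha_bars[500])
x_s = ddim_step(x_t, eps, alpha_bars[500], alpha_bars[250])
# Largest deviation from q_sample at t = 250; near zero (float rounding only)
print(max(abs(a - b) for a, b in zip(x_s, q_sample(x0, eps, alpha_bars[250]))))
```

The DDIM step first recovers an x_0 estimate and then re-noises it. With a perfect noise prediction it therefore lands exactly on the forward-process sample at the earlier level, which makes a handy sanity check when writing a sampler. Under this guidance convention, w = 1 gives the plain conditional prediction and w > 1 extrapolates past it.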

Study Notes Reference

11 — RL Foundations
12 — Diffusion & Flow Matching