ML Systems & Compilers
70-day curriculum — from GPU architecture to TVM, Triton, and distributed training
📅 10 weeks
⏱ 175 hours
🎯 5 phases
⚡ 70 daily lessons
Phases
Phase I
Hardware & Compute Foundations: GPU Architecture → PyTorch Internals
Days 1–14
Phase II
Compiler Infrastructure: IRs, Passes, Triton & torch.compile
Days 15–28
Phase III
Apache TVM Deep Dive: Relay → TIR → Tuning → MLIR & XLA
Days 29–49
Phase IV
Inference Optimization: Quantization, TensorRT & LLM Serving
Days 50–63
Phase V
Training at Scale: Distributed Training & Capstone Project
Days 64–70
Weekly Schedule
Phase I
Week 1: GPU Architecture & CUDA
Day 1: Why ML Needs Compilers
Day 2: GPU Architecture
Day 3: CUDA Programming Basics
Day 4: Memory Coalescing & Shared Memory
Day 5: CUDA Profiling & Roofline
Day 6: Matrix Multiply — Naive to Tiled
Day 7: Mini-Project — GEMM
Phase I
Week 2: PyTorch Internals
Day 8: PyTorch Under the Hood
Day 9: Memory Management in PyTorch
Day 10: Custom C++ Extensions
Day 11: torch.profiler & Trace Analysis
Day 12: Eager vs Graph Mode
Day 13: Operator Fusion Fundamentals
Day 14: Stop & Reflect #1
Phase II
Week 3: IR & Compiler Passes
Day 15: Compiler 101 for ML Engineers
Day 16: Computation Graphs as IR
Day 17: Graph-Level Optimizations
Day 18: The Polyhedral Model
Day 19: Loop Tiling & Scheduling
Day 20: Auto-Tuning & Search Spaces
Day 21: Mini-Project — FX Transform Pass
Phase II
Week 4: Triton & Kernel Engineering
Day 22: Triton Language Basics
Day 23: Triton Matrix Multiplication
Day 24: Triton Flash Attention
Day 25: torch.compile Internals
Day 26: TorchInductor Code Generation
Day 27: Custom Triton Backend
Day 28: Stop & Reflect #2
Phase III
Week 5: TVM Foundations
Day 29: TVM Architecture Overview
Day 30: Relay IR
Day 31: Relay Optimization Passes
Day 32: Tensor Expression (TE)
Day 33: TIR & Schedule Primitives
Day 34: TVM Runtime & Deployment
Day 35: Mini-Project — TVM Compilation
Phase III
Week 6: TVM Tuning & Backends
Day 36: AutoTVM & AutoScheduler
Day 37: MetaSchedule
Day 38: BYOC — Bring Your Own Codegen
Day 39: Quantization in TVM
Day 40: TVM for Edge Devices
Day 41: TVM Unity & Relax
Day 42: Stop & Reflect #3
Phase III
Week 7: TVM Advanced & MLC
Day 43: MLIR for ML
Day 44: XLA & StableHLO
Day 45: ONNX Runtime Deep Dive
Day 46: MLC-LLM
Day 47: Contributing to Apache TVM
Day 48: Compiler Testing & Verification
Day 49: Stop & Reflect #4
Phase IV
Week 8: Model Formats & Runtimes
Day 50: Model Formats & ONNX
Day 51: Weight Compression & Pruning
Day 52: Knowledge Distillation
Day 53: TensorRT Optimization
Day 54: Inference on CPU
Day 55: Inference on Edge Devices
Day 56: Mini-Project — Optimization Pipeline
Phase IV
Week 9: LLM Serving Systems
Day 57: LLM Inference Challenges
Day 58: KV Cache Optimization
Day 59: vLLM & PagedAttention
Day 60: Speculative Decoding
Day 61: LLM Quantization — GPTQ, AWQ, GGUF
Day 62: Serving Frameworks Comparison
Day 63: Stop & Reflect #5
Phase V
Week 10: Distributed Training & Capstone
Day 64: Distributed Training Basics
Day 65: Data Parallel & FSDP
Day 66: Tensor & Pipeline Parallelism
Day 67: Compiler's Role in Training
Day 68: Capstone — Design
Day 69: Capstone — Implementation
Day 70: Capstone — Evaluation