Phase VII — VLAs: Architecture to Deployment | Week 16 | 3 hours "Design a complete VLA system. Not a toy — a production-viable architecture with every component specified."
Over three days, you'll design, implement, and evaluate a complete VLA system.
Scenario: Design a VLA for a kitchen robot that can:
- Follow natural language cooking instructions
- Manipulate diverse kitchen objects (utensils, containers, food)
- Operate safely around humans
- Learn from human corrections during deployment
- Run at ≥15 Hz on edge hardware (Jetson Orin)
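The 15 Hz requirement is a time budget, and it helps to compute that budget before picking a backbone. The sketch below shows one way to do the arithmetic. The function names and the chunk size of 8 are illustrative assumptions, and the chunked budget assumes inference runs asynchronously while the previous chunk is still executing, which the scenario does not mandate.

```python
def control_period_ms(control_hz: float) -> float:
    """Time available per control step, in milliseconds."""
    if control_hz <= 0:
        raise ValueError("control_hz must be positive")
    return 1000.0 / control_hz


def chunk_inference_budget_ms(control_hz: float, chunk_size: int) -> float:
    """Upper bound on model latency when one forward pass yields
    `chunk_size` actions and the whole chunk is executed before the next
    one is needed. Assumes inference overlaps with execution (an
    assumption of this sketch, not a requirement of the scenario)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return chunk_size * control_period_ms(control_hz)


if __name__ == "__main__":
    # One action per forward pass: the model must finish within one period.
    print(f"{control_period_ms(15):.1f} ms per step at 15 Hz")            # 66.7 ms
    # Hypothetical chunk of 8 actions per forward pass.
    print(f"{chunk_inference_budget_ms(15, 8):.1f} ms per 8-step chunk")  # 533.3 ms
```

Use the per-step figure for the "Target latency" field if you plan single-step inference, and the chunked figure if you plan action chunking. Sensor capture, preprocessing, and the safety checks all consume part of the same budget.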
Complete this design document:
VLA SYSTEM DESIGN DOCUMENT
═══════════════════════════
1. MODEL ARCHITECTURE
1.1 VLM Backbone:
Model: _______________
Parameters: ___________
Why this choice: ______
1.2 Vision Encoder:
Architecture: __________
Input resolution: ______
Output dim: ____________
Freeze strategy: _______
1.3 Action Representation:
Type: [tokens / continuous / flow / hybrid]
Dimension: _____________
Chunk size: ____________
Frequency: _____________
Why this choice: _______
1.4 Action Head:
Architecture: __________
Parameters: ____________
Conditioning: __________
2. TRAINING PIPELINE
2.1 Data:
Robot data source: ____
Web data source: ______
Robot/web ratio: ______
Total training examples: ___
Data augmentation: _____
2.2 Training Schedule:
Stage 1 (align): ______
Stage 2 (co-fine-tune): ___
Stage 3 (specialize): ___
Total compute: ________
2.3 Multi-task Strategy:
Task list: ____________
Sampling temperature: __
Loss weighting: _______
3. DEPLOYMENT STACK
3.1 Hardware:
Compute: ______________
Sensors: ______________
Robot: ________________
3.2 Inference:
Quantization: _________
Target latency: _______
Throughput: ___________
Caching strategy: _____
3.3 Safety:
Layer 1 (physical): ___
Layer 2 (hardware): ___
Layer 3 (motion): _____
Layer 4 (behavior): ___
Layer 5 (task): _______
3.4 Monitoring:
Key metrics: __________
Alert thresholds: _____
Dashboard: ____________
4. ADAPTATION
4.1 Online learning:
Strategy: _____________
Buffer size: __________
Fine-tune frequency: __
4.2 Fleet management:
Rollout strategy: _____
A/B testing: __________
Rollback plan: ________
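Sections 1.1 and 3.2 constrain each other: parameter count and quantization together determine how much memory the weights occupy on the edge device. The following is a rough weights-only estimate, not a profiler. The `budget_gb` value and the `headroom` fraction are hypothetical placeholders to replace with figures from your own hardware datasheet and measurements. Activations, caches, and sensor buffers are covered only by that crude headroom term.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Weights-only footprint in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


def fits_budget(num_params: float, precision: str,
                budget_gb: float, headroom: float = 0.5) -> bool:
    """True if the weights fit in the share of memory left after reserving
    `headroom` (a hypothetical fraction) for everything else."""
    return weight_memory_gb(num_params, precision) <= budget_gb * (1.0 - headroom)


if __name__ == "__main__":
    budget_gb = 16.0  # placeholder: set this from your module's datasheet
    for params, label in [(7e9, "7B"), (3e9, "3B"), (300e6, "300M")]:
        for precision in ("fp16", "int8", "int4"):
            gb = weight_memory_gb(params, precision)
            ok = fits_budget(params, precision, budget_gb)
            print(f"{label:>4} {precision}: {gb:5.2f} GB  fits={ok}")
```

A weights-only estimate is a lower bound on memory use. If a configuration fails this check, it cannot work. If it passes, you still need to measure latency on the device.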
Draw a detailed system diagram showing data flow:
Fill in:
Camera   ──→ [___________] ──→ [___________] ──→ [___________]
                                                       │
Proprio  ──→ [___________] ────────────────────→ [___________]
                                                       │
Language ──→ [___________] ────────────────────→ [___________]
                                                       │
                                                       ▼
                                                 [___________]
                                                       │
                                                       ▼
                                                 [___________]
                                                       │
                                                       ▼
                                               Robot Controller
Identify the top 5 risks and mitigations:
| Rank | Risk | Probability | Impact | Mitigation |
|---|---|---|---|---|
| 1 | | | | |
| 2 | | | | |
| 3 | | | | |
| 4 | | | | |
| 5 | | | | |
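One common way to fill the Rank column is to score each risk as probability times impact and sort by the result. The sketch below assumes 1 to 5 ordinal scales for both columns, which this worksheet does not prescribe. The three example risks and their scores are illustrative placeholders, not recommended values.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int       # assumed scale: 1 (negligible) to 5 (severe)
    mitigation: str = ""

    @property
    def score(self) -> int:
        """Simple risk-matrix score: probability times impact."""
        return self.probability * self.impact


def rank_risks(risks: list[Risk]) -> list[Risk]:
    """Highest score first; ties keep their input order (sorted is stable)."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Placeholder entries for illustration only.
    risks = [
        Risk("Inference misses the control deadline", 3, 4),
        Risk("Unsafe motion near a person", 2, 5),
        Risk("Failure on an unseen utensil", 4, 2),
    ]
    for rank, r in enumerate(rank_risks(risks), start=1):
        print(f"| {rank} | {r.name} | {r.probability} | {r.impact} | {r.mitigation} |")
```

A product score treats a rare, severe risk the same as a frequent, minor one. For safety risks around humans, you may want to rank by impact first regardless of the score.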
For each design decision, document:
Decision 1: Action representation
- Option A: Tokenized (256 bins)
  - Pro: _, _
  - Con: _, _
- Option B: Flow matching
  - Pro: _, _
  - Con: _, _
- Option C: Hybrid coarse-to-fine
  - Pro: _, _
  - Con: _, _
- Chosen: _ because ___

Decision 2: Model size
- Option A: 3B VLM + LoRA
- Option B: 7B VLM frozen + small action head
- Option C: 300M end-to-end
- Chosen: _ because ___

Decision 3: Training data strategy
- Option A: Real-only (expensive but high quality)
- Option B: Sim-to-real (cheap but gap)
- Option C: Web pretrain + real fine-tune
- Chosen: _ because ___
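When weighing Option A in Decision 1, it helps to see what 256 bins cost in precision. The sketch below assumes uniform bins over a fixed per-dimension range `[low, high]` and decodes each token to its bin center. Both choices are assumptions of this example; other binning schemes, such as quantile-based ranges, are possible.

```python
def tokenize(value: float, low: float, high: float, n_bins: int = 256) -> int:
    """Map a continuous action value to a bin index in [0, n_bins - 1].
    Values outside [low, high] are clipped to the range first."""
    clipped = min(max(value, low), high)
    index = int((clipped - low) / (high - low) * n_bins)
    return min(index, n_bins - 1)  # value == high would otherwise land in bin n_bins


def detokenize(token: int, low: float, high: float, n_bins: int = 256) -> float:
    """Map a bin index back to the center of its bin."""
    return low + (token + 0.5) * (high - low) / n_bins


def max_quantization_error(low: float, high: float, n_bins: int = 256) -> float:
    """Worst-case round-trip error for in-range values: half a bin width."""
    return (high - low) / (2 * n_bins)


if __name__ == "__main__":
    low, high = -1.0, 1.0  # hypothetical normalized action range
    token = tokenize(0.3, low, high)
    print(token, detokenize(token, low, high))
    print(max_quantization_error(low, high))  # 0.00390625
```

To relate this to the 5 mm accuracy metric, multiply the worst-case error by the physical span each dimension covers. A position axis spanning 1 m, for example, has a worst-case quantization error of about 2 mm at 256 bins.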
Define success metrics:
Offline Metrics:
- Action prediction loss: target < ___
- Action accuracy (within 5mm): target > ___%
- Language grounding accuracy: target > ___%
Online Metrics (simulation):
- Task success rate: target > ___%
- Mean episode length: target < ___ steps
- Safety violation rate: target < ___%
Online Metrics (real):
- Task success rate: target > ___%
- Human correction rate: target < ___%
- Control frequency: target ≥ ___ Hz
- Time to first failure: target > ___ minutes
Generalization Tests:
- Novel objects success: target > ___%
- Novel instructions success: target > ___%
- Distractor robustness: target > ___%
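Two of these metrics need a definition before a target is meaningful. The sketch below treats "within 5mm" as Euclidean distance between predicted and reference end-effector positions in meters, which is one possible reading, and attaches a Wilson score interval to a success rate so that a target measured on a few real-robot trials is not over-interpreted. The function names are illustrative.

```python
import math


def within_tolerance_rate(pred, target, tol_m: float = 0.005) -> float:
    """Fraction of predictions whose Euclidean distance to the reference
    position is at most `tol_m` meters (0.005 m = 5 mm)."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    hits = sum(math.dist(p, t) <= tol_m for p, t in zip(pred, target))
    return hits / len(pred)


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p = successes / trials
    denom = 1 + z * z / trials
    center = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials ** 2)) / denom
    return center - half, center + half


if __name__ == "__main__":
    pred = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.010)]
    ref = [(0.0, 0.0, 0.004), (0.0, 0.0, 0.0)]
    print(within_tolerance_rate(pred, ref))  # 0.5
    low, high = wilson_interval(16, 20)      # hypothetical: 16 successes in 20 trials
    print(f"success rate 0.80, interval [{low:.2f}, {high:.2f}]")
```

With only 20 trials the interval is wide, so when you set a real-robot success target, also state how many trials will be run to check it.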
By end of session, you should have:
- [ ] Complete architecture specification (all fields filled)
- [ ] System diagram with all components and data flows
- [ ] Risk analysis with mitigations
- [ ] Documented trade-off rationale for 3+ key decisions
- [ ] Evaluation plan with concrete numeric targets
Architecture designed. Tomorrow (Day 111): implement the core components — the VLA model, training loop, and basic evaluation. Day 112: full integration, evaluation, and final reflection.