Learning Modules

| Module | Topic | Lab | Readings | Deliverables |
|---|---|---|---|---|
| Module 1 | From AlphaZero to RLHF | Alignment cleaning robot challenge warmup activity | Textbook: Other: | — |
| Module 2 | Reinforcement Learning Foundations | Tabular value iteration in GridWorld | Textbook: | — |
| Module 3 | Deep Reinforcement Learning: Value-based Agents | CartPole with DQN variations using Stable-Baselines3 | Textbook: Other: | — |
| Module 4 | Deep Reinforcement Learning: Policy Gradients and PPO | Use PPO on MiniGrid | Textbook: | — |
| Module 5 | Safety, Generalization, and Exploration | TBD | Textbook: | Project 1 |
| Module 6 | Human-in-the-Loop RL | TBD | | — |
| Module 7 | RLHF Pipeline | Collect binary preferences, train a reward model, and fine-tune toy LMs | | — |
| Module 8 | Offline and Batch RL | Decision Transformer implementation | Papers: | Project 2 |
| Module 9 | Model-based RL and World Models | DreamerV3 introduction | Textbook: | — |
| Module 10 | Hierarchical RL | Hierarchical RL experiments | | — |
| Module 11 | Inverse RL and Reward Inference | Toy Car IRL implementation | | Project 3 |
| Module 12 | Multi-agent RL and Emergence | PettingZoo or Melting Pot multi-agent experiment | | — |
| Module 13 | RL in Structured & Constrained Domains | TBD | | — |
| Module 14 | Frontiers in Aligned RL | TBD | | Project 4 |

Learning Path
The course is a 14-module progression from foundational concepts to advanced applications. Each module builds on the ones before it and combines a hands-on lab with assigned readings. Self-paced learners should plan on 8-10 hours per module, or roughly 112-140 hours for the full course.