veRL
Preparation
1. Installation
2. Prepare Data (Parquet) for Post-Training
3. Implement Reward Function for Dataset
PPO Example
1. PPO Example Architecture
2. Config Explanation
3. GSM8K Example
PPO Trainer and Workers
PPO Ray Trainer
PyTorch FSDP Backend
Megatron-LM Backend
Advanced Usage and Extension
Ray API Design Tutorial
Extend to other RL(HF) algorithms
Add models to FSDP backend
Add models to Megatron-LM backend
Index