veRL

Preparation

1. Installation
2. Prepare Data (Parquet) for Post-Training
3. Implement Reward Function for Dataset

PPO Example

1. PPO Example Architecture
2. Config Explanation
3. GSM8K Example

PPO Trainer and Workers

PPO Ray Trainer
PyTorch FSDP Backend
Megatron-LM Backend

Advanced Usage and Extension

Ray API Design Tutorial
Extend to other RL(HF) algorithms
Add models to FSDP backend
Add models to Megatron-LM backend

Index

© Copyright 2024 ByteDance Seed Foundation MLSys Team.
