veRL

Preparation

  • 1. Installation
  • 2. Prepare Data (Parquet) for Post-Training
  • 3. Implement Reward Function for Dataset

PPO Example

  • 1. PPO Example Architecture
  • 2. Config Explanation
  • 3. GSM8K Example

PPO Trainer and Workers

  • PPO Ray Trainer
  • PyTorch FSDP Backend
  • Megatron-LM Backend

Advanced Usage and Extension

  • Ray API Design Tutorial
  • Extend to other RL(HF) algorithms
  • Add models to FSDP backend
  • Add models to Megatron-LM backend

© Copyright 2024 ByteDance Seed Foundation MLSys Team.
