
Scale RL training with VeRL

verl-rl-training skill setup L49,423
Orchestra-Research/AI-Research-SKILLs
What it does

Train LLMs with RL at scale using verl

Best for

Production math and reasoning tasks (GSM8K, MATH) that call for established RL algorithms, such as PPO or GRPO, run at scale.

Inputs
  • Base model
  • Training dataset (parquet with prompts)
  • Reward function or model
  • Config YAML
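
To make the "reward function" input concrete, here is a minimal sketch of a rule-based scorer for GSM8K-style answers. It assumes the custom reward hook takes `(data_source, solution_str, ground_truth, extra_info=None)` and returns a float, and that final answers are marked with `#### <number>`; check both assumptions against the verl version you install. `extract_answer` is a hypothetical helper, not part of verl.

```python
import re

# Assumed answer marker: GSM8K-style "#### 1,234" or "#### -3.5".
_ANSWER_RE = re.compile(r"####\s*(-?[\d,]*\.?\d+)")


def extract_answer(text):
    """Return the last '#### <number>' answer in text with commas removed, or None."""
    matches = _ANSWER_RE.findall(text)
    if not matches:
        return None
    return matches[-1].replace(",", "")


def compute_score(data_source, solution_str, ground_truth, extra_info=None):
    """Return 1.0 if the extracted answer equals ground_truth numerically, else 0.0.

    data_source and extra_info are accepted for signature compatibility
    (an assumption about the hook) and are ignored in this sketch.
    """
    answer = extract_answer(solution_str)
    if answer is None:
        return 0.0
    try:
        expected = float(str(ground_truth).replace(",", ""))
        return 1.0 if float(answer) == expected else 0.0
    except ValueError:
        # Non-numeric ground truth or malformed answer: no reward.
        return 0.0
```

A binary exact-match reward like this is easy to audit, which helps when checking for the misaligned-reward failure mode: score a handful of known-good and known-bad completions by hand before launching a run.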
Outputs
  • Trained policy model
  • Training curves/metrics
Requires
  • verl>=0.3.0
  • torch>=2.0.0
  • ray>=2.41.0
  • vllm>=0.8.2
  • transformers>=4.40.0
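
The minimum versions above can be checked before launching a job. The following is a rough sketch using only the standard library; `version_tuple` and `check_requirements` are hypothetical helper names, and the comparison looks only at the leading numeric release components, so a pre-release such as `0.8.2rc1` is treated as `0.8.2`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the Requires list.
MINIMUMS = {
    "verl": "0.3.0",
    "torch": "2.0.0",
    "ray": "2.41.0",
    "vllm": "0.8.2",
    "transformers": "4.40.0",
}


def version_tuple(v):
    """Parse the leading numeric release, e.g. '2.5.1+cu121' -> (2, 5, 1).

    Pads to three components so that '2.0' compares equal to '2.0.0'.
    """
    match = re.match(r"\d+(?:\.\d+)*", v)
    parts = tuple(int(p) for p in match.group().split(".")) if match else ()
    return parts + (0,) * max(0, 3 - len(parts))


def check_requirements(minimums=MINIMUMS):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (need >={minimum})")
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems.append(f"{name}: found {installed}, need >={minimum}")
    return problems


if __name__ == "__main__":
    for line in check_requirements() or ["all requirements satisfied"]:
        print(line)
```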
Preconditions

At least 8 GPUs (H100 or A100 recommended). Models must be Hugging Face compatible.

Failure modes
  • Memory explosion on 671B+ models
  • Misaligned reward function
  • Distributed training synchronization failures
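
A back-of-envelope calculation shows why memory is the first failure mode at the 671B scale. The sketch below assumes 80 GB of memory per GPU, 2 bytes per parameter for bf16 weights, and the commonly cited figure of roughly 16 bytes per parameter for mixed-precision Adam training (weights, gradients, and fp32 optimizer state). It ignores activations, KV cache, and any separate rollout-engine copy of the weights, so real requirements are higher. `min_gpus` is a hypothetical helper.

```python
import math


def min_gpus(params, bytes_per_param, gpu_mem_gb=80.0):
    """Lower bound on GPUs needed just to hold params * bytes_per_param bytes."""
    total_gb = params * bytes_per_param / 1e9
    return math.ceil(total_gb / gpu_mem_gb)


PARAMS = 671_000_000_000  # 671B parameters

# bf16 weights only: 1,342 GB in total.
weights_only = min_gpus(PARAMS, 2)

# Assumed ~16 bytes/param for weights + gradients + Adam state: 10,736 GB in total.
full_training = min_gpus(PARAMS, 16)

print(weights_only, full_training)  # prints "17 135"
```

Under these assumptions a single 8-GPU node cannot hold even the weights of a 671B model, so runs at that scale depend on sharding across many nodes and on offloading.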
Trust signals
  • Powers Doubao-1.5-pro
  • EuroSys 2025 HybridFlow paper
  • Tested up to 671B parameters