Scale RL training with VeRL
verl-rl-trainingskillsetup L4★9,423
Orchestra-Research/AI-Research-SKILLs ↗What it does
Train LLMs with RL at scale using verl
Best for
Production math/reasoning tasks (GSM8K, MATH) where you need proven RL algorithms at scale.
Inputs
- · Base model
- · Training dataset (parquet with prompts)
- · Reward function or model
- · Config YAML
Outputs
- · Trained policy model
- · Training curves/metrics
Requires
- · verl>=0.3.0
- · torch>=2.0.0
- · ray>=2.41.0
- · vllm>=0.8.2
- · transformers>=4.40.0
Preconditions
8+ GPUs; H100 or A100 recommended; models must be HuggingFace compatible
Failure modes
- · Memory explosion on 671B+ models
- · Misaligned reward function
- · Distributed training synchronization failures
Trust signals
- · Powers Doubao-1.5-pro
- · EuroSys 2025 HybridFlow paper
- · Tested up to 671B parameters