cyberneticlibrary

Train large MoE models efficiently

miles-rl-training · skill · setup · L49,423
Orchestra-Research/AI-Research-SKILLs
What it does

Train large MoE models with low-precision RL

Best for

Training 1TB+ MoE models with speculative RL for a 25%+ rollout speedup.

Inputs
  • MoE model (DeepSeek/Qwen)
  • Prompt dataset (JSONL)
  • Reward function
  • GPU cluster config
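
As a rough illustration of the prompt dataset and reward function inputs, here is a minimal sketch. The field names `prompt` and `label`, the helper `iter_prompts`, and the `exact_match_reward` signature are all assumptions made for this example; they are not taken from the skill or from any library API.

```python
import json
from typing import Iterator

# Hypothetical JSONL content: one JSON object per line.
# The "prompt" and "label" keys are assumed for illustration only.
SAMPLE = (
    '{"prompt": "What is 2 + 3?", "label": "5"}\n'
    '{"prompt": "What is 7 * 6?", "label": "42"}\n'
)

def iter_prompts(jsonl_text: str) -> Iterator[dict]:
    """Yield one record per non-empty JSONL line."""
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            yield json.loads(line)

def exact_match_reward(response: str, label: str) -> float:
    """Hypothetical reward: 1.0 if the last non-empty line of the
    response equals the label, otherwise 0.0."""
    lines = [ln.strip() for ln in response.splitlines() if ln.strip()]
    return 1.0 if lines and lines[-1] == label.strip() else 0.0

records = list(iter_prompts(SAMPLE))
```

A real reward function would likely be task-specific (for example a verifier or a learned reward model); exact match is used here only because it is easy to check by hand.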
Outputs
  • Trained MoE checkpoint (FP8/INT4)
  • Training metrics
  • Rollout trajectories
Requires
  • SGLang>=0.2.3
  • Ray
  • torch>=2.0.0
  • Megatron-LM
  • Docker
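
The version floors listed above can be checked before launching a job. The sketch below uses only the Python standard library. It assumes the distributions are installed under the names `sglang`, `ray`, and `torch`; Megatron-LM and Docker are left out because they are typically not installed as plain pip packages. The helper names are hypothetical.

```python
import re
from importlib import metadata

# Minimum versions from the card; None means "any version".
MINIMUMS = {"sglang": "0.2.3", "torch": "2.0.0", "ray": None}

def version_tuple(version: str) -> tuple:
    """Reduce a version string to its first three numeric components.
    Pre-release and local suffixes (e.g. '+cu121') are ignored."""
    parts = [int(p) for p in re.findall(r"\d+", version)[:3]]
    return tuple(parts + [0] * (3 - len(parts)))

def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare versions numerically, component by component."""
    return version_tuple(installed) >= version_tuple(minimum)

def check_requirements(minimums=MINIMUMS) -> dict:
    """Report 'ok', 'missing', or a 'too old' message per package."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = minimum is None or meets_minimum(installed, minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report

print(check_requirements())
```

The comparison is deliberately simple; for strict PEP 440 ordering, a dedicated version parser would be a better fit.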
Preconditions
  • H100/H200 cluster
  • SGLang router
  • 1TB+ MoE model
  • FP8 block scaling enabled
Failure modes
  • Expert routing inconsistent
  • FP8 rounding NaN
  • Stale partial rollout cache
  • CUDA OOM
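
Two of the failure modes above (non-finite values from low-precision rounding, and stale partial rollouts) can be guarded against with simple checks. The sketch below is a generic illustration, not the skill's own mechanism: `PartialRollout`, `all_finite`, `drop_stale`, and the `max_lag` threshold are hypothetical names chosen for this example.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List

@dataclass
class PartialRollout:
    prompt_id: int
    policy_version: int  # weights version that produced the tokens so far

def all_finite(values: Iterable[float]) -> bool:
    """True only if no value is NaN or +/-inf; a training loop could
    skip the optimizer step when this returns False."""
    return all(math.isfinite(v) for v in values)

def drop_stale(rollouts: List[PartialRollout],
               current_version: int,
               max_lag: int = 1) -> List[PartialRollout]:
    """Keep partial rollouts generated at most `max_lag` policy
    versions ago; older ones would be regenerated."""
    return [r for r in rollouts
            if current_version - r.policy_version <= max_lag]

cache = [PartialRollout(0, 3), PartialRollout(1, 5), PartialRollout(2, 6)]
fresh = drop_stale(cache, current_version=6)
```

With `current_version=6` and the default `max_lag=1`, only the rollouts from versions 5 and 6 are kept.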
Trust signals
  • Orchestra production fork
  • Enterprise-grade stability
  • Megatron parallelism
  • Zero-copy CUDA IPC