cyberneticlibrary

Train large MoE models efficiently

miles-rl-training · skill · setup · L4 · ★ 9,423

Orchestra-Research/AI-Research-SKILLs

What it does

Trains large mixture-of-experts (MoE) models with low-precision reinforcement learning (RL).

Best for

Training MoE models of 1 TB or more with speculative RL, for rollout speedups of 25% or more.

Inputs

· MoE model (DeepSeek/Qwen)
· Prompt dataset (JSONL)
· Reward function
· GPU cluster config
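
As a rough illustration of the prompt dataset and reward function inputs, the sketch below parses JSONL records and scores a response against a reference answer. The field names `prompt` and `answer`, and both function names, are assumptions made for this example; the schema and reward interface the skill actually expects are not stated here.

```python
import json
from typing import Iterable


def load_prompts(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL input: one JSON object per non-empty line."""
    records = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        if "prompt" not in record:  # "prompt" is an assumed field name
            raise ValueError(f"line {line_no}: missing 'prompt' field")
        records.append(record)
    return records


def exact_match_reward(record: dict, response: str) -> float:
    """Hypothetical reward: 1.0 on an exact match with the reference answer."""
    expected = record.get("answer")  # "answer" is an assumed field name
    if expected is None:
        return 0.0  # no reference available, so no reward
    return 1.0 if response.strip() == str(expected).strip() else 0.0
```

To read from disk, pass an open file handle, for example `load_prompts(open("prompts.jsonl", encoding="utf-8"))`, where the file name is a placeholder.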

Outputs

· Trained MoE checkpoint (FP8/INT4)
· Training metrics
· Rollout trajectories

Requires

· SGLang>=0.2.3
· Ray
· torch>=2.0.0
· Megatron-LM
· Docker
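
A minimal sketch for checking the Python requirements above before launching a run. It uses only the standard library. The distribution names `sglang`, `torch`, and `ray` are assumed to match what is installed. Megatron-LM and Docker are left out because they are usually not installed as pip packages under a fixed name.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the list above; None means "any version".
MINIMUMS = {"sglang": "0.2.3", "torch": "2.0.0", "ray": None}


def parse_version(text: str) -> tuple:
    """Keep the leading numeric components, e.g. '2.1.0+cu121' -> (2, 1, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum) -> bool:
    """Compare numeric components, padding the shorter version with zeros."""
    if minimum is None:
        return True
    have, need = parse_version(installed), parse_version(minimum)
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def check_requirements(minimums=MINIMUMS) -> list:
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed")
            continue
        if not meets_minimum(installed, minimum):
            problems.append(f"{name}: {installed} < {minimum}")
    return problems
```

This simplified comparison ignores pre-release and local version suffixes, so treat it as a quick sanity check rather than a full version resolver.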

Preconditions

· H100/H200 cluster
· SGLang router
· 1TB+ MoE model
· FP8 block scaling enabled
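
To make the "FP8 block scaling" precondition concrete, here is a pure-Python sketch of the general idea: each block of values gets its own scale so that its largest magnitude maps to 448, the largest finite value in the FP8 E4M3 format. The block size of 128 and the function names are assumptions for illustration. The sketch does not perform the actual FP8 cast and does not reflect how the skill or its kernels implement scaling.

```python
import math

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3


def block_scales(values, block_size=128):
    """Compute one scale per block so that amax / scale == FP8_E4M3_MAX."""
    scales = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        if not all(math.isfinite(v) for v in block):
            # Non-finite inputs would poison the scale for the whole block.
            raise ValueError(f"non-finite value in block starting at {start}")
        amax = max(abs(v) for v in block)
        # An all-zero block needs no scaling; 1.0 avoids division by zero.
        scales.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
    return scales


def scale_blocks(values, scales, block_size=128):
    """Divide each value by its block's scale, bringing it into FP8 range."""
    return [v / scales[i // block_size] for i, v in enumerate(values)]
```

Because each block is scaled independently, one outlier only reduces precision within its own block instead of across the whole tensor.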

Failure modes

· Inconsistent expert routing
· NaNs from FP8 rounding
· Stale partial rollout cache
· CUDA OOM

Trust signals

· Orchestra production fork
· Enterprise-grade stability
· Megatron parallelism
· Zero-copy CUDA IPC