Train large MoE models efficiently
miles-rl-training · skill · setup L4 · ★9,423
Orchestra-Research/AI-Research-SKILLs
What it does
Trains large MoE models with low-precision (FP8/INT4) reinforcement learning (RL)
Best for
Training 1TB+ MoE models with speculative RL (speculative decoding during rollouts), which speeds up rollout generation by 25% or more.
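A rough way to reason about speculative rollout speedups is the textbook speculative-decoding estimate. It assumes a draft of k tokens per step and that each drafted token is accepted independently with probability alpha. Both k and alpha are illustrative assumptions, not documented parameters of this skill. Under that model, the expected number of tokens emitted per target-model verification pass is (1 - alpha^(k+1)) / (1 - alpha).

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Simplified model (an assumption): each of the k drafted tokens is
    accepted independently with probability alpha. The target pass always
    contributes at least one token, so the result lies in [1, k + 1].
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        # Every drafted token is accepted, plus the bonus token from the target.
        return float(k + 1)
    # Geometric series 1 + alpha + ... + alpha^k.
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)
```

This counts tokens per verification pass only. Wall-clock speedup also depends on draft-model cost and batching, so the formula does not by itself reproduce the 25% figure above.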
Inputs
- · MoE model (DeepSeek/Qwen)
- · Prompt dataset (JSONL)
- · Reward function
- · GPU cluster config
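The prompt dataset and reward function inputs might look like the following minimal sketch. The field names `prompt` and `label` and the exact-match reward are hypothetical placeholders. They are not this skill's documented schema or reward API.

```python
import json
from typing import Iterable


def read_prompts(lines: Iterable[str]) -> list[dict]:
    """Parse JSONL: one JSON object per non-empty line."""
    records = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # "prompt" is a hypothetical required field for this sketch.
        if "prompt" not in record:
            raise ValueError(f"line {lineno}: missing 'prompt'")
        records.append(record)
    return records


def exact_match_reward(response: str, label: str) -> float:
    """Toy reward: 1.0 if the stripped response equals the label, else 0.0."""
    return 1.0 if response.strip() == label.strip() else 0.0
```

In practice the reward is usually task-specific, for example a math-answer checker or a unit-test runner. Exact match only keeps the example small.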
Outputs
- · Trained MoE checkpoint (FP8/INT4)
- · Training metrics
- · Rollout trajectories
Requires
- · SGLang>=0.2.3
- · Ray
- · torch>=2.0.0
- · Megatron-LM
- · Docker
Preconditions
- · H100/H200 cluster
- · SGLang router
- · 1TB+ MoE model
- · FP8 block scaling enabled
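FP8 block scaling gives each tile of a tensor its own scale, so that one outlier cannot crush the precision of the whole tensor. The sketch below computes one scale per tile from the tile's absolute maximum and the E4M3 limit of 448. It then divides each element by its tile's scale. The 128 x 128 tile size is an assumption borrowed from common practice (e.g., DeepSeek-V3 weight quantization). The final cast to an FP8 dtype is omitted, so this covers only the scaling step.

```python
FP8_E4M3_MAX = 448.0  # largest finite value of the FP8 E4M3 format


def block_scales(matrix: list[list[float]], block: int = 128) -> list[list[float]]:
    """Return one scale per (block x block) tile so that tile / scale fits in E4M3."""
    rows, cols = len(matrix), len(matrix[0])
    scales = []
    for r0 in range(0, rows, block):
        row_scales = []
        for c0 in range(0, cols, block):
            amax = max(
                abs(matrix[r][c])
                for r in range(r0, min(r0 + block, rows))
                for c in range(c0, min(c0 + block, cols))
            )
            # All-zero tiles get scale 1.0 to avoid dividing by zero later.
            row_scales.append(amax / FP8_E4M3_MAX if amax > 0 else 1.0)
        scales.append(row_scales)
    return scales


def scale_into_range(matrix: list[list[float]], scales: list[list[float]],
                     block: int = 128) -> list[list[float]]:
    """Divide each element by its tile's scale (the FP8 cast itself is omitted)."""
    return [
        [matrix[r][c] / scales[r // block][c // block] for c in range(len(matrix[0]))]
        for r in range(len(matrix))
    ]
```

Dequantization multiplies by the same per-tile scale. A zero or non-finite scale is one way NaNs can appear downstream.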
Failure modes
- · Inconsistent expert routing (e.g., between rollout and training)
- · NaNs from FP8 rounding
- · Stale partial-rollout cache
- · CUDA out-of-memory (OOM) errors
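Some of these failure modes can be caught with cheap diagnostics. The following is a generic sketch, not this skill's built-in checks. It measures how often the rollout engine and the trainer pick the same top-k experts per token, flags non-finite values, and drops partial rollouts produced by an outdated policy. The input layouts and the `policy_version` field are hypothetical.

```python
import math


def routing_agreement(rollout_topk: list[list[int]], train_topk: list[list[int]]) -> float:
    """Fraction of tokens whose top-k expert sets match between rollout and training.

    Each argument holds one list of selected expert indices per token.
    """
    if len(rollout_topk) != len(train_topk):
        raise ValueError("token counts differ")
    if not rollout_topk:
        return 1.0
    same = sum(set(a) == set(b) for a, b in zip(rollout_topk, train_topk))
    return same / len(rollout_topk)


def has_non_finite(values: list[float]) -> bool:
    """True if any value is NaN or +/-inf (e.g., after FP8 overflow or a bad scale)."""
    return any(not math.isfinite(v) for v in values)


def drop_stale(partial_rollouts: list[dict], current_version: int, max_lag: int = 0) -> list[dict]:
    """Keep partial rollouts whose hypothetical 'policy_version' is at most max_lag behind."""
    return [r for r in partial_rollouts if current_version - r["policy_version"] <= max_lag]
```

An agreement well below 1.0 suggests that the inference engine and the trainer route tokens differently. In that case, the log-probabilities used for RL updates may not match the rollouts that were actually sampled.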
Trust signals
- · Orchestra production fork
- · Enterprise-grade stability
- · Megatron parallelism
- · Zero-copy CUDA IPC
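Megatron-style parallelism splits the GPU count across tensor (TP), pipeline (PP), context (CP), and data (DP) parallel groups. The minimal sketch below derives DP under the assumption that world size = TP x PP x CP x DP. Expert parallelism for MoE layers adds further divisibility constraints that are not modeled here.

```python
def data_parallel_size(world_size: int, tp: int, pp: int, cp: int = 1) -> int:
    """Derive the data-parallel size, assuming world_size = TP * PP * CP * DP."""
    model_parallel = tp * pp * cp
    if world_size % model_parallel != 0:
        raise ValueError(
            f"world_size={world_size} is not divisible by TP*PP*CP={model_parallel}"
        )
    return world_size // model_parallel
```

For example, 64 GPUs with TP=8 and PP=4 leave a data-parallel size of 2.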