Train GLM models with SLIME
slime-rl-training · skill · setup · L4 · ★9,423
Orchestra-Research/AI-Research-SKILLs
What it does
Train large language models, such as GLM, with reinforcement learning using the slime framework.
Best for
Research-grade RL training with flexible reward functions and algorithm variants.
Inputs
- Base model
- Prompt dataset
- Reward function
- Training config (batch size, learning rate, epochs)
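As a rough illustration of the last two inputs, the sketch below shows one way a reward function and a training config might be written down before they are wired into a trainer. Everything in it is hypothetical: `TrainingConfig`, its field names and default values, and `exact_match_reward` are illustrative names chosen for this example and are not slime's API or config schema.

```python
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    """Hypothetical container for batch size, learning rate, and epochs.

    This is NOT slime's config schema; the defaults are placeholders.
    """
    batch_size: int = 32
    learning_rate: float = 1e-6
    epochs: int = 1

    def validate(self) -> None:
        # Reject obviously invalid settings before any GPU time is spent.
        if self.batch_size <= 0:
            raise ValueError("batch_size must be positive")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")
        if self.epochs <= 0:
            raise ValueError("epochs must be positive")


def exact_match_reward(prompt: str, response: str, reference: str) -> float:
    """Hypothetical reward: 1.0 if the response equals the reference answer
    after trimming whitespace and ignoring case, otherwise 0.0."""
    return 1.0 if response.strip().lower() == reference.strip().lower() else 0.0
```

A binary exact-match reward is only one possible choice. It is easy to audit, but it gives no partial credit, so tasks with graded answers would need a different scoring rule.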
Outputs
- RL-trained checkpoint
- Loss curves
- Rollout samples
Requires
- slime library
- torch
- transformers
- Ray (optional)
Preconditions
- Model in supported format
- Reward function defined
- GPU with at least 40 GB of memory
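The GPU precondition can be checked before launching a run. The following is a minimal sketch, assuming the 40 GB figure means decimal gigabytes of device memory and that only the first visible CUDA device matters. `gpu_memory_ok` and `check_local_gpu` are hypothetical helper names. The torch calls used (`torch.cuda.is_available`, `torch.cuda.get_device_properties`) are standard PyTorch, and the import is deferred so the pure check still works where torch is not installed.

```python
# Assumption: "40GB" is read as 40 decimal gigabytes of device memory.
MIN_GPU_MEMORY_BYTES = 40 * 10**9


def gpu_memory_ok(total_bytes: int, minimum: int = MIN_GPU_MEMORY_BYTES) -> bool:
    """Return True if a device with `total_bytes` of memory meets the minimum."""
    return total_bytes >= minimum


def check_local_gpu(device_index: int = 0) -> bool:
    """Return True if the given CUDA device exists and meets the minimum.

    Returns False when torch is not installed or no CUDA device is available.
    """
    try:
        import torch  # deferred so the module imports without torch
    except ImportError:
        return False
    if not torch.cuda.is_available():
        return False
    props = torch.cuda.get_device_properties(device_index)
    return gpu_memory_ok(props.total_memory)
```

Calling `check_local_gpu()` at the top of a launch script gives an early failure instead of an out-of-memory error partway through model loading.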
Failure modes
- Reward hacking
- Mode collapse
- Training divergence
- Reward noise too high
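Some of these failure modes leave simple statistical traces in the loss curves and rollout samples. The sketch below shows illustrative heuristics for three of them, using only the Python standard library. The function names, the `factor` threshold, and the heuristics themselves are assumptions made for this example and are not part of slime. Reward hacking is not covered, since detecting it generally requires inspecting rollouts against the intended task rather than a summary statistic.

```python
import math
import statistics


def distinct_ratio(samples: list[str]) -> float:
    """Fraction of unique rollout samples; values near 0 may indicate mode collapse."""
    if not samples:
        return 0.0
    return len(set(samples)) / len(samples)


def reward_std(rewards: list[float]) -> float:
    """Population standard deviation of rewards obtained by scoring the same
    response repeatedly; a large value suggests a noisy reward signal."""
    return statistics.pstdev(rewards) if len(rewards) > 1 else 0.0


def loss_diverged(losses: list[float], factor: float = 10.0) -> bool:
    """Return True if the latest loss is non-finite, or exceeds `factor` times
    the median of the earlier finite losses (only when that median is positive)."""
    if not losses:
        return False
    latest = losses[-1]
    if not math.isfinite(latest):
        return True
    earlier = [x for x in losses[:-1] if math.isfinite(x)]
    if not earlier:
        return False
    baseline = statistics.median(earlier)
    return baseline > 0 and latest > factor * baseline
```

Thresholds for acting on these values depend on the task and the algorithm variant, so they are best calibrated against a run that is known to be healthy.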
Trust signals
- slime ecosystem
- Flexible backend
- Published research base