cyberneticlibrary

Train GLM models with SLIME

slime-rl-training · skill · setup
Orchestra-Research/AI-Research-SKILLs
What it does

Train large language models, such as GLM models, with reinforcement learning using slime.

Best for

Research-grade RL training with flexible reward functions and algorithm variants.
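The algorithm variants differ mainly in how they turn rewards into an update signal. As one illustration, and not necessarily slime's exact implementation, GRPO-style training samples several completions per prompt and normalizes each reward against the other completions in its group. This is a minimal pure-Python sketch; the function name `group_relative_advantages` is hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, group_size, eps=1e-6):
    """Normalize rewards within each group of completions for the same prompt.

    `rewards` is a flat list ordered prompt by prompt: the first `group_size`
    entries belong to prompt 0, the next `group_size` to prompt 1, and so on.
    """
    if group_size <= 0 or len(rewards) % group_size != 0:
        raise ValueError("len(rewards) must be a positive multiple of group_size")
    advantages = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mu = mean(group)
        sigma = pstdev(group)
        # Completions above their group's mean get a positive advantage;
        # a group where every completion scores the same contributes zeros.
        advantages.extend((r - mu) / (sigma + eps) for r in group)
    return advantages


# Two prompts, four completions each: the second group carries no signal.
print(group_relative_advantages([1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0], 4))
```

Because a group with uniform rewards yields zero advantage everywhere, prompts that are always solved or always failed stop contributing gradient, which is one reason prompt difficulty matters for this family of methods.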

Inputs
  • Base model
  • Prompt dataset
  • Reward function
  • Training config (batch size, learning rate, epochs)
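A reward function and a training config might look roughly like this. It is a minimal sketch, not slime's API: `TrainingConfig` and `exact_match_reward` are hypothetical names, the reward assumes a prompt dataset that stores a numeric reference answer per prompt, and the config fields stand in for whatever batch size, learning rate, and epoch options the actual launcher exposes.

```python
import re
from dataclasses import dataclass


@dataclass
class TrainingConfig:
    # Hypothetical field names; map them onto the trainer's real options.
    rollout_batch_size: int = 32
    learning_rate: float = 1e-6
    num_epochs: int = 1

    def __post_init__(self):
        # Fail fast on obviously invalid settings before any GPU time is spent.
        if self.rollout_batch_size <= 0:
            raise ValueError("rollout_batch_size must be positive")
        if not 0 < self.learning_rate < 1:
            raise ValueError("learning_rate should be in (0, 1)")
        if self.num_epochs <= 0:
            raise ValueError("num_epochs must be positive")


def exact_match_reward(response: str, reference: str) -> float:
    """Return 1.0 if the last number in the response equals the reference, else 0.0."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == reference.strip() else 0.0


config = TrainingConfig(rollout_batch_size=64, learning_rate=1e-6, num_epochs=2)
print(config, exact_match_reward("So the answer is 42.", "42"))
```

A deterministic, rule-based reward like this is cheap and has no noise, but it is also easy to game (for example, by listing many numbers), so it is worth pairing with the failure-mode checks further down.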
Outputs
  • RL-trained checkpoint
  • Loss curves
  • Rollout samples
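RL loss and reward curves are noisy from step to step, so they are usually easier to read after smoothing. A minimal sketch of exponential moving average smoothing, with the hypothetical helper name `ema_smooth`:

```python
def ema_smooth(values, alpha=0.1):
    """Exponential moving average of a curve; smaller alpha gives a smoother line."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for v in values:
        if not smoothed:
            smoothed.append(v)  # seed with the first raw value
        else:
            smoothed.append(alpha * v + (1 - alpha) * smoothed[-1])
    return smoothed


print(ema_smooth([1.0, 3.0, 2.0], alpha=0.5))
```

Smoothing hides single-step spikes, so divergence checks should still look at the raw values.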
Requires
  • slime library
  • torch
  • transformers
  • Ray (optional)
Preconditions
  • Model in a supported format
  • Reward function defined
  • GPU with at least 40 GB of memory
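The GPU and reward preconditions can be checked before launching a run. This sketch assumes only the standard PyTorch calls `torch.cuda.is_available`, `torch.cuda.device_count`, and `torch.cuda.get_device_properties(i).total_memory`; the helper names are hypothetical, and it falls back to an empty device list when torch or CUDA is missing. Checking the model format is left out because it depends on slime's own conversion tooling.

```python
def detect_gpu_memory():
    """Total memory in bytes per visible CUDA device, or [] without torch/CUDA."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [torch.cuda.get_device_properties(i).total_memory
            for i in range(torch.cuda.device_count())]


def gpus_meeting_floor(total_memory_bytes, min_gb=40):
    """Indices of devices with at least `min_gb` decimal gigabytes (10**9 bytes)."""
    floor = min_gb * 10 ** 9
    return [i for i, mem in enumerate(total_memory_bytes) if mem >= floor]


def check_reward_fn(reward_fn, response, reference):
    """Call the reward once on a known example and require a numeric result."""
    value = reward_fn(response, reference)
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError(f"reward must be a number, got {type(value).__name__}")
    return float(value)


devices = detect_gpu_memory()
print("usable GPUs:", gpus_meeting_floor(devices))
print("reward on sample:", check_reward_fn(lambda r, ref: float(r == ref), "4", "4"))
```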
Failure modes
  • Reward hacking
  • Mode collapse
  • Training divergence
  • Excessive reward noise
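Three of these failure modes can be watched with simple heuristics during training. The sketch below is illustrative only: the helper names and thresholds (`window`, `spike_factor`, the n-gram size) are hypothetical and would need tuning per run. Divergence is flagged on a non-finite loss or a sudden spike, mode collapse on low n-gram diversity across rollouts, and reward noise by rescoring the same response several times. Reward hacking usually needs task-specific checks, such as reading rollout samples, and is not covered here.

```python
import math
from statistics import mean, pstdev


def diverged(losses, window=20, spike_factor=5.0):
    """Flag a non-finite latest loss or one far above the recent mean magnitude."""
    if not losses:
        return False
    latest = losses[-1]
    if not math.isfinite(latest):
        return True
    recent = losses[-window - 1:-1]
    if not recent:
        return False
    # RL losses can be negative, so compare magnitudes.
    baseline = mean([abs(x) for x in recent])
    return abs(latest) > spike_factor * max(baseline, 1e-8)


def distinct_ratio(responses, n=2):
    """Fraction of unique word n-grams across rollouts; near 0 suggests mode collapse."""
    ngrams = []
    for text in responses:
        words = text.split()
        ngrams.extend(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0


def reward_noise(repeated_scores):
    """Std of rewards for one response scored several times; 0 if deterministic."""
    return pstdev(repeated_scores) if len(repeated_scores) > 1 else 0.0


print(diverged([1.0] * 10 + [10.0]), distinct_ratio(["a b c", "a b c"]), reward_noise([0.0, 1.0]))
```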
Trust signals
  • slime ecosystem
  • Flexible backend
  • Published research base