Score and gate skill improvements

os-eval-runner skill setup L33
richfrem/agent-plugins-skills
What it does

Run stateless skill-evaluation iterations.

Best for

Iterating on skill improvements autonomously, with empirical scoring gates in place of human feedback.

Inputs
  • phase-number
  • optimization-metric
Outputs
  • task-completion-status
Requires
  • Task/Subagent
  • git
  • Edit
  • Python
  • Bash
  • Read
Preconditions
  • Valid git repository initialized
  • Python 3.8+ available
  • Evaluation harness installed
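
The preconditions above could be checked before a run with something like the following. This is a minimal sketch, not the skill's own setup code: the function names are hypothetical, and because the card does not name the evaluation harness, the caller has to supply the module name to look for.

```python
import importlib.util
import subprocess
import sys


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


def in_git_repo(path="."):
    """Return True if `path` is inside a git work tree."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "--is-inside-work-tree"],
            cwd=path,
            capture_output=True,
            text=True,
            check=False,
        )
    except (FileNotFoundError, NotADirectoryError):
        # git is not on PATH, or `path` does not exist.
        return False
    return result.returncode == 0 and result.stdout.strip() == "true"


def harness_installed(module_name):
    """Return True if a top-level module called `module_name` is importable.

    ASSUMPTION: the harness is a Python package; its real name is not
    given in the card, so it is passed in by the caller.
    """
    return importlib.util.find_spec(module_name) is not None


def check_preconditions(harness_module, repo_path="."):
    """Collect all three checks into one report."""
    return {
        "python_3_8_plus": python_ok(),
        "git_repository": in_git_repo(repo_path),
        "harness_installed": harness_installed(harness_module),
    }
```

A caller would typically refuse to start an iteration unless every value in the returned dictionary is True.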
Failure modes
  • Lab and master versions have diverged
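
One way to detect this failure mode early, assuming the lab and master versions live on git branches, is to count the commits unique to each side. The sketch below is illustrative only: the branch names `lab` and `master` and all function names are assumptions, since the card does not say how the two versions are stored.

```python
import subprocess


def parse_left_right_count(output):
    """Parse the 'LEFT<TAB>RIGHT' output of `git rev-list --left-right --count`."""
    left, right = output.split()
    return int(left), int(right)


def classify(only_master, only_lab):
    """Name the relationship given the commits unique to each side."""
    if only_master and only_lab:
        return "diverged"
    if only_lab:
        return "lab-ahead"
    if only_master:
        return "lab-behind"
    return "in-sync"


def lab_master_state(master="master", lab="lab", repo_path="."):
    """Compare two branches (ASSUMED names) in the repository at `repo_path`."""
    out = subprocess.run(
        ["git", "rev-list", "--left-right", "--count", f"{master}...{lab}"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return classify(*parse_left_right_count(out))
```

Only the "diverged" state corresponds to the failure mode listed here; "lab-ahead" is the state one would expect after a normal improvement iteration.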
Trust signals
  • Explicit verification gates
  • Multi-phase execution with gates
  • Empirical evaluation scoring
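
To illustrate what an empirical scoring gate might look like, here is a minimal sketch that accepts a candidate only when its score beats the baseline by more than a threshold. The function name, the `min_delta` threshold, and the `higher_is_better` flag are all hypothetical; the card does not describe the skill's actual gating rule or its optimization-metric.

```python
def passes_gate(baseline, candidate, min_delta=0.0, higher_is_better=True):
    """Return True if `candidate` improves on `baseline` by more than `min_delta`.

    ASSUMPTION: scores are plain numbers and the comparison is strict,
    so an unchanged score does not pass the gate.
    """
    if higher_is_better:
        improvement = candidate - baseline
    else:
        improvement = baseline - candidate
    return improvement > min_delta
```

For example, `passes_gate(0.70, 0.75)` accepts the change, while `passes_gate(0.70, 0.72, min_delta=0.05)` rejects it because the gain is smaller than the required margin. For a metric where lower is better, such as latency, `higher_is_better=False` flips the comparison.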