Audit AI agent skills for safety and cost

skill-eval · skill · setup · L39
aws-samples/sample-agent-skill-eval
What it does

Evaluates an AI agent skill for production readiness and capability fit.

Best for

When vetting skills for inclusion in agent harnesses or production use.

Inputs
  • skill definition
  • test case coverage
Outputs
  • readiness score
  • failure mode analysis
  • recommendation
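A readiness score like the one listed above is often computed as a weighted average of per-criterion rubric scores, which a cut-off then maps to a recommendation. The following is a minimal sketch under assumptions. The criterion names, weights, 0 to 1 score scale, and the 0.8 and 0.5 cut-offs are all hypothetical and are not taken from the aws-samples/sample-agent-skill-eval repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric line item (hypothetical names and weights)."""
    name: str
    weight: float  # relative importance; weights need not sum to 1


def readiness_score(rubric: list[Criterion], scores: dict[str, float]) -> float:
    """Weighted average of per-criterion scores, each assumed to be in [0, 1]."""
    total_weight = sum(c.weight for c in rubric)
    if total_weight <= 0:
        raise ValueError("rubric weights must sum to a positive value")
    missing = [c.name for c in rubric if c.name not in scores]
    if missing:
        raise KeyError(f"no score for criteria: {missing}")
    return sum(c.weight * scores[c.name] for c in rubric) / total_weight


def recommend(score: float, ready_at: float = 0.8, revise_at: float = 0.5) -> str:
    """Map a readiness score to a recommendation (illustrative cut-offs)."""
    if score >= ready_at:
        return "adopt"
    if score >= revise_at:
        return "revise"
    return "reject"


# Hypothetical rubric echoing the card's title (safety and cost).
rubric = [
    Criterion("trigger_precision", 3.0),
    Criterion("safety", 4.0),
    Criterion("cost_efficiency", 2.0),
    Criterion("docs_quality", 1.0),
]
scores = {
    "trigger_precision": 0.9,
    "safety": 0.8,
    "cost_efficiency": 0.6,
    "docs_quality": 1.0,
}
score = readiness_score(rubric, scores)
print(f"{score:.2f}", recommend(score))
```

Dividing by the total weight means the weights only have to express relative importance. Publishing them makes the score reproducible.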
Requires
  • evaluation rubric
  • test harness
Preconditions
  • skill has a README
  • triggers defined
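Both preconditions can be checked mechanically before any scoring runs. The following is a minimal sketch, assuming the skill definition has already been loaded into a plain dict with hypothetical `readme` and `triggers` keys. A real skill package may store these differently, for example as files on disk.

```python
from typing import Any


def check_preconditions(skill: dict[str, Any]) -> list[str]:
    """Return unmet preconditions; an empty list means evaluation can start.

    The "readme" and "triggers" keys are a hypothetical layout for illustration.
    """
    problems: list[str] = []

    # A README must exist and contain more than whitespace.
    readme = skill.get("readme")
    if not isinstance(readme, str) or not readme.strip():
        problems.append("skill has no README")

    # Triggers must be a non-empty list of non-blank phrases.
    triggers = skill.get("triggers")
    if (
        not isinstance(triggers, list)
        or not triggers
        or not all(isinstance(t, str) and t.strip() for t in triggers)
    ):
        problems.append("triggers not defined")

    return problems


ok_skill = {
    "readme": "# pdf-tables\nExtracts tables from PDF files.",
    "triggers": ["extract tables from a pdf"],
}
bad_skill = {"readme": "   ", "triggers": []}

print(check_preconditions(ok_skill))   # []
print(check_preconditions(bad_skill))  # ['skill has no README', 'triggers not defined']
```

Running this gate first keeps a skill that fails its preconditions from receiving a misleading readiness score.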
Failure modes
  • score misses a critical gap
  • insufficient test coverage
  • false-positive readiness
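A weighted average on its own invites two of the failure modes listed above. First, high scores on most criteria can hide one critical failure. Second, scores on untested criteria look as trustworthy as tested ones, which can yield a false-positive readiness verdict. One common mitigation is to gate the verdict so that a weak critical criterion or thin test coverage blocks readiness regardless of the average. The sketch below assumes hypothetical values throughout: the criterion names, the 0.5 floor, the 80% coverage minimum, and the 0.8 readiness bar. Here "coverage" simply means the fraction of rubric criteria with at least one test case.

```python
def gated_verdict(
    weighted_score: float,
    criterion_scores: dict[str, float],
    critical: set[str],
    tested: set[str],
    critical_floor: float = 0.5,
    min_coverage: float = 0.8,
    ready_at: float = 0.8,
) -> tuple[bool, list[str]]:
    """Return (ready, reasons); ready is True only when no gate fails.

    All thresholds are illustrative defaults, not values from any real rubric.
    """
    reasons: list[str] = []

    # Critical gap: one weak critical criterion blocks readiness even when
    # strong scores elsewhere pull the average above the bar.
    for name in sorted(critical):
        if criterion_scores.get(name, 0.0) < critical_floor:
            reasons.append(f"critical gap: {name}")

    # Coverage: fraction of scored criteria backed by at least one test case.
    if criterion_scores:
        coverage = len(tested.intersection(criterion_scores)) / len(criterion_scores)
    else:
        coverage = 0.0
    if coverage < min_coverage:
        reasons.append(f"test coverage {coverage:.0%} below {min_coverage:.0%}")

    if weighted_score < ready_at:
        reasons.append(f"score {weighted_score:.2f} below {ready_at:.2f}")

    return (not reasons, reasons)


scores = {
    "safety": 0.3,
    "trigger_precision": 1.0,
    "cost_efficiency": 1.0,
    "docs_quality": 1.0,
    "error_handling": 1.0,
}
# An unweighted mean (about 0.86) stands in for the weighted score here.
mean = sum(scores.values()) / len(scores)
ready, reasons = gated_verdict(
    weighted_score=mean,
    criterion_scores=scores,
    critical={"safety"},
    tested={"safety", "trigger_precision", "docs_quality"},
)
print(ready, reasons)
```

The average clears the 0.8 bar, but the verdict is still "not ready" because of the safety gap and 60% coverage. Returning every failed gate, rather than stopping at the first, gives the failure mode analysis all the reasons at once.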
Trust signals
  • rubric weights specified
  • failure mode categories exhaustive