cyberneticlibrary

Evaluate agent skill quality

skill-eval · skill · setup · L10
fede0089/skill-eval
What it does

Evaluate agent skill quality and correctness

Best for

Quality-gating skills before agent deployment when you need consistency and safety.

Requires
  • LLM for semantic evaluation
Preconditions
  • Skill syntactically valid
  • Test cases represent realistic usage
Failure modes
  • Overfitting to test cases
  • LLM unable to evaluate domain-specific correctness
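One common way to watch for the overfitting failure mode listed above is to keep some test cases out of the tuning loop and compare pass rates. The sketch below is illustrative only: `split_cases`, `pass_rate`, and `overfit_gap` are hypothetical helper names, not part of skill-eval, and the 30% holdout fraction is an arbitrary assumption.

```python
import random


def split_cases(cases, holdout_fraction=0.3, seed=0):
    """Shuffle test cases deterministically and split into (tuning, holdout).

    Hypothetical helper; the 0.3 default is an arbitrary assumption.
    """
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    holdout_size = round(len(shuffled) * holdout_fraction)
    cut = len(shuffled) - holdout_size
    return shuffled[:cut], shuffled[cut:]


def pass_rate(results):
    """Fraction of True values in a list of pass/fail booleans."""
    return sum(results) / len(results) if results else 0.0


def overfit_gap(tuning_results, holdout_results):
    """Tuning pass rate minus holdout pass rate.

    A large positive gap suggests the skill was fitted to the cases
    it was iterated against rather than to realistic usage.
    """
    return pass_rate(tuning_results) - pass_rate(holdout_results)
```

A skill would be revised only against the tuning cases, with the holdout cases run once at the end. What counts as a worrying gap depends on the number of cases and is left to the reader.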
Trust signals
  • Automated test case generation
  • Semantic similarity checks
  • Hallucination detection
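As a rough illustration of what a semantic similarity check can look like, the sketch below scores a skill's output against a reference answer and applies a pass/fail threshold. It is a stand-in under stated assumptions: it uses bag-of-words cosine similarity, which captures only word overlap, where a real check would more likely use embeddings or an LLM judge. The function names and the 0.8 threshold are hypothetical and do not come from skill-eval.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between the token-count vectors of two strings.

    Word overlap only; a stand-in for an embedding-based comparison.
    """
    va, vb = bag_of_words(a), bag_of_words(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    norm = norm_a * norm_b
    return dot / norm if norm else 0.0


def passes_gate(output, reference, threshold=0.8):
    """Return True when the output is close enough to the reference.

    The 0.8 default is an arbitrary assumption, not a recommended value.
    """
    return cosine_similarity(output, reference) >= threshold
```

For example, `passes_gate("the skill ran", "The skill ran.")` returns `True`, because case and punctuation are ignored. Two strings that share no tokens score 0.0 and fail the gate.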