cyberneticlibrary

Evaluate agent skill quality

skill-eval · skill · setup · L1 · ★0

fede0089/skill-eval

What it does

Evaluates the quality and correctness of agent skills.

Best for

Quality-gating skills before agent deployment when you need consistency and safety.
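
A quality gate of this kind can be sketched as a check on per-test-case scores. The snippet below is only an illustration, not skill-eval's actual interface: the function name `passes_gate`, the 0-to-1 score scale, and both thresholds are assumptions. It requires a high average and also a minimum score on every case, so one badly failing case blocks deployment even when the average looks fine.

```python
from statistics import mean


def passes_gate(scores, min_mean=0.8, min_score=0.5):
    """Hypothetical gate: True only if scores are high on average
    and no single test case falls below the floor."""
    if not scores:
        return False  # no evidence, so do not deploy
    return mean(scores) >= min_mean and min(scores) >= min_score


print(passes_gate([0.9, 0.85, 0.95]))      # True
print(passes_gate([1.0, 1.0, 1.0, 0.4]))   # False: mean is 0.85, but one case is below 0.5
```

The per-case floor is what expresses the "consistency" requirement; a mean-only gate would let the second example through.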

Inputs


Outputs


Requires

· LLM for semantic evaluation

Preconditions

· Skill is syntactically valid
· Test cases represent realistic usage

Failure modes

· Overfitting to test cases
· LLM unable to evaluate domain-specific correctness
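
One common way to watch for the first failure mode is to keep some test cases out of the tuning loop and compare pass rates. The sketch below assumes results are plain lists of booleans and that a gap above a chosen tolerance is suspicious. The names `pass_rate` and `overfit_gap` and the 0.15 tolerance are hypothetical and not taken from skill-eval.

```python
def pass_rate(results):
    """Fraction of passing cases; 0.0 for an empty list."""
    return sum(results) / len(results) if results else 0.0


def overfit_gap(tuning_results, held_out_results):
    """Pass rate on cases used while tuning the skill minus
    pass rate on cases it has never been tuned against."""
    return pass_rate(tuning_results) - pass_rate(held_out_results)


tuning = [True] * 9 + [False]        # 9 of 10 pass
held_out = [True] * 6 + [False] * 4  # 6 of 10 pass

gap = overfit_gap(tuning, held_out)
TOLERANCE = 0.15  # assumed threshold, tune per project
print("possible overfitting" if gap > TOLERANCE else "ok")
```

A large positive gap suggests the skill was tuned to the specific test cases rather than to the underlying task.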

Trust signals

· Automated test case generation
· Semantic similarity checks
· Hallucination detection
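
To give a rough idea of what a similarity check between an expected and an actual answer looks like, here is a minimal sketch using bag-of-words cosine similarity. This is a lexical stand-in chosen so the example runs with the standard library alone. A semantic check would typically compare embedding vectors instead, and nothing here reflects how skill-eval itself computes similarity. The function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity of word-count vectors, in [0, 1].
    Returns 0.0 when either text has no tokens."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


expected = "Reset the password from the account settings page"
actual = "Open the account settings page and reset the password"
print(round(cosine_similarity(expected, actual), 2))
```

Word overlap rewards shared vocabulary, not shared meaning, so a paraphrase with different wording scores low here. That limitation is the reason an embedding model or an LLM judge is usually preferred for this check.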