Evaluate agent skill quality
fede0089/skill-eval
What it does
Evaluates the quality and correctness of agent skills.
Best for
Gating skills on quality before agent deployment, when consistent and safe behavior is required.
Requires
- · LLM for semantic evaluation
Preconditions
- · Skill is syntactically valid
- · Test cases represent realistic usage
Failure modes
- · Overfitting to test cases
- · LLM unable to evaluate domain-specific correctness
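One way to watch for the first failure mode, overfitting to test cases, is to compare the pass rate on the cases used while tuning a skill against the pass rate on cases held back from tuning. The sketch below is illustrative only: `EvalCase`, `pass_rate`, `overfit_gap`, and the `skill` and `judge` callables are hypothetical names, not part of skill-eval, and the exact-match judge stands in for whatever LLM-based check is actually used.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class EvalCase:
    """Hypothetical test case: a prompt and the answer we expect back."""
    prompt: str
    expected: str


# Hypothetical type aliases: a skill maps a prompt to an output,
# a judge decides whether an output is acceptable for the expected answer.
Skill = Callable[[str], str]
Judge = Callable[[str, str], bool]


def exact_match(actual: str, expected: str) -> bool:
    """Placeholder judge; a real setup would likely call an LLM here."""
    return actual.strip() == expected.strip()


def pass_rate(skill: Skill, cases: Sequence[EvalCase], judge: Judge) -> float:
    """Fraction of cases whose output the judge accepts."""
    if not cases:
        raise ValueError("at least one test case is required")
    passed = sum(1 for case in cases if judge(skill(case.prompt), case.expected))
    return passed / len(cases)


def overfit_gap(
    skill: Skill,
    tuning_cases: Sequence[EvalCase],
    held_out_cases: Sequence[EvalCase],
    judge: Judge = exact_match,
) -> float:
    """Tuning pass rate minus held-out pass rate.

    A large positive gap suggests the skill was fitted to its test cases.
    """
    return pass_rate(skill, tuning_cases, judge) - pass_rate(skill, held_out_cases, judge)
```

A gap near zero means the skill does about as well on unseen cases as on the ones it was tuned against. What counts as too large a gap is a judgment call that depends on how many cases are in each set.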
Trust signals
- · Automated test case generation
- · Semantic similarity checks
- · Hallucination detection
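The card does not say how the semantic similarity checks are implemented. As a rough illustration of the general idea, the sketch below scores a skill's output against a reference answer with bag-of-words cosine similarity. This is a crude lexical stand-in, not true semantic similarity, which would more plausibly use embeddings or an LLM judge. The function names and the 0.8 threshold are assumptions made for the example.

```python
import math
import re
from collections import Counter


def _token_counts(text: str) -> Counter:
    """Lowercase the text and count alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the token-count vectors of two strings.

    Returns 0.0 when either string has no tokens.
    """
    va, vb = _token_counts(a), _token_counts(b)
    dot = sum(va[token] * vb[token] for token in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(count * count for count in va.values()))
    norm_b = math.sqrt(sum(count * count for count in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_similar(actual: str, expected: str, threshold: float = 0.8) -> bool:
    """Accept the output if it is close enough to the reference (assumed threshold)."""
    return cosine_similarity(actual, expected) >= threshold
```

A lexical score like this misses paraphrases and will accept outputs that reuse the right words in the wrong order, so it is best treated as a cheap first filter ahead of a stronger check.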