Cross-model review strategy memos
cross-eval · skill · setup · L2 · ★17,464
alirezarezvani/claude-skills ↗
What it does
Cross-evaluate LLM prompts and benchmark their performance
Best for
Choosing between prompt strategies when objective metrics matter more than intuition.
Inputs
- · Prompt variants
- · Test cases
- · Evaluation criteria (correctness, speed, cost)
Outputs
- · Comparative benchmark results
- · Statistical significance analysis
- · Variance metrics
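A minimal sketch of how the significance and variance outputs could be computed, assuming each prompt variant has been scored on the same test cases. The function names `paired_permutation_test` and `summarize`, the resample count, and the choice of a sign-flip permutation test are illustrative assumptions, not the skill's actual implementation.

```python
import random
import statistics


def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-case scores.

    Hypothetical helper: returns an estimated p-value for the null
    hypothesis that the two variants score the same on average.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired by test case")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(statistics.fmean(diffs))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_resamples):
        # Under the null, the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= observed:
            hits += 1
    # Add-one smoothing keeps the estimate strictly above zero.
    return (hits + 1) / (n_resamples + 1)


def summarize(scores):
    """Mean and sample standard deviation across repeated runs."""
    return {
        "mean": statistics.fmean(scores),
        "stdev": statistics.stdev(scores),
    }
```

Pairing by test case removes between-case difficulty from the comparison, which tends to matter when the test set is small.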
Requires
- · LLM evaluation framework
Preconditions
Prompt variants ready; test cases representative; evaluation criteria quantifiable; LLM access configured
Failure modes
Too few test cases; evaluation criteria that cannot be measured; high variance across runs; cost delta dominated by token-count variation
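One way to spot the last failure mode is to compare the mean cost difference between two variants with its standard error. The sketch below is an assumption-laden illustration: `cost_delta_is_noise` is a hypothetical helper, and the `z=2.0` cutoff is a rough normal-approximation threshold chosen for the example.

```python
import math
import statistics


def cost_delta_is_noise(costs_a, costs_b, z=2.0):
    """Return True when the mean cost gap is within z standard errors.

    costs_a and costs_b are per-run costs for two prompt variants.
    The z threshold is an assumed cutoff, not a calibrated one.
    """
    delta = statistics.fmean(costs_a) - statistics.fmean(costs_b)
    # Standard error of the difference between two independent means.
    se = math.sqrt(
        statistics.variance(costs_a) / len(costs_a)
        + statistics.variance(costs_b) / len(costs_b)
    )
    return abs(delta) <= z * se
```

When this returns True, the reported cost difference is more likely to reflect run-to-run token variation than a real gap between the variants, and more runs are needed before drawing a conclusion.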
Trust signals
- · Statistical significance testing
- · Variance analysis across runs
- · Cost/latency/quality tradeoff matrix
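A cost/latency/quality tradeoff matrix can be sketched as a small table that also marks which variants are Pareto-optimal. Everything here is hypothetical: the `VariantStats` fields, the units, and the helper names are assumptions for illustration, not the skill's output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariantStats:
    name: str
    quality: float    # higher is better
    latency_s: float  # lower is better
    cost_usd: float   # lower is better


def dominates(x, y):
    """True if x is no worse than y on every axis and better on at least one."""
    no_worse = (
        x.quality >= y.quality
        and x.latency_s <= y.latency_s
        and x.cost_usd <= y.cost_usd
    )
    better = (
        x.quality > y.quality
        or x.latency_s < y.latency_s
        or x.cost_usd < y.cost_usd
    )
    return no_worse and better


def pareto_front(variants):
    """Variants that no other variant dominates."""
    return [v for v in variants if not any(dominates(o, v) for o in variants)]


def format_matrix(variants):
    """Render one row per variant with a Pareto-optimality flag."""
    front = {v.name for v in pareto_front(variants)}
    header = f"{'variant':<12}{'quality':>9}{'latency_s':>11}{'cost_usd':>10}  pareto"
    rows = [
        f"{v.name:<12}{v.quality:>9.2f}{v.latency_s:>11.2f}{v.cost_usd:>10.4f}  "
        f"{'yes' if v.name in front else 'no'}"
        for v in variants
    ]
    return "\n".join([header, *rows])
```

A variant flagged `no` is beaten or matched on all three axes by some other variant, so it can usually be dropped before weighing the remaining tradeoffs.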