cyberneticlibrary

Cross-model review strategy memos

cross-eval · skill · setup · L217,464
alirezarezvani/claude-skills
What it does

Cross-evaluate LLM prompts and benchmark their performance

Best for

Choosing between prompt strategies when objective metrics matter more than intuition.

Inputs
  • Prompt variants
  • Test cases
  • Evaluation criteria (correctness, speed, cost)
Outputs
  • Comparative benchmark results
  • Statistical significance analysis
  • Variance metrics
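
A minimal sketch of what the comparative results and variance metrics might look like, assuming each prompt variant has been run several times against the same test cases and scored as the fraction of cases passed. The function name `summarize`, the variant names, and the scores are all hypothetical illustrations, not part of the skill itself.

```python
from statistics import mean, stdev

def summarize(runs_by_variant):
    """Return {variant: (mean, stdev)} of scores across repeated runs."""
    summary = {}
    for name, scores in runs_by_variant.items():
        # Sample standard deviation needs at least two runs.
        spread = stdev(scores) if len(scores) > 1 else 0.0
        summary[name] = (mean(scores), spread)
    return summary

# Hypothetical scores (fraction of test cases passed) from five runs each.
runs = {
    "variant_a": [0.80, 0.82, 0.78, 0.81, 0.79],
    "variant_b": [0.85, 0.70, 0.90, 0.75, 0.80],
}
summary = summarize(runs)
for name, (avg, spread) in summary.items():
    print(f"{name}: mean={avg:.3f} stdev={spread:.3f}")
```

In this made-up data both variants have the same mean score, but variant_b swings much more between runs, which is the kind of difference a variance metric is meant to surface.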
Requires
  • LLM evaluation framework
Preconditions

Prompt variants ready; test cases representative; eval criteria quantifiable; LLM access configured

Failure modes

Test set too small; evaluation criteria unmeasurable; high variance across runs; cost delta dominated by token-count variation

Trust signals
  • Statistical significance testing
  • Variance analysis across runs
  • Cost/latency/quality tradeoff matrix
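
One way significance testing between two prompt variants could be done is a paired sign-flip permutation test over per-test-case scores. The sketch below assumes both variants were scored on the same test cases. It is a generic statistical technique using only the Python standard library, not necessarily the method this skill uses, and `paired_permutation_test` and the sample scores are hypothetical.

```python
import random
from statistics import mean

def paired_permutation_test(scores_a, scores_b, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on paired per-test-case scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired: one per test case per variant")
    diffs = [b - a for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(mean(flipped)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_resamples + 1)

# Hypothetical per-test-case scores (1.0 = pass, 0.0 = fail) on 12 shared cases.
variant_a = [0.0] * 12
variant_b = [1.0] * 12
p_different = paired_permutation_test(variant_a, variant_b)
p_identical = paired_permutation_test(variant_a, variant_a)
print(f"a vs b: p={p_different:.4f}; a vs a: p={p_identical:.4f}")
```

A small p-value suggests the score gap is unlikely to come from noise alone. With only a handful of test cases the test has little power, which matches the small-test-set failure mode listed above.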