cyberneticlibrary

Evaluate LLM agent responses against rubrics

evaluating-llms · skill · setup · L20
ionmidori/SYDBioedilizia
What it does

Write and run Google ADK evaluation rubrics that score agent responses using an LLM-as-judge pattern.

Best for

Validating agent responses against business rules without manual test review.

Inputs
  • Test case (input/expected pair)
  • Rubric definition
Outputs
  • Eval metric scores in the range 0.0–1.0
Requires
  • Google ADK AgentEvaluator
  • gemini-3.1-flash-lite
Preconditions
  • Test data is valid JSON
  • Each rubric ID is unique
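
Both preconditions can be checked before an eval run. The following is a minimal sketch, not part of the skill itself. It assumes a hypothetical config shape, a top-level "rubrics" array whose entries carry a "rubric_id" field, and the helper name duplicate_rubric_ids is invented for illustration. Adapt the field names to whatever layout your rubric file actually uses.

```python
import json
from collections import Counter


def duplicate_rubric_ids(config_text: str) -> list[str]:
    """Return the rubric IDs that occur more than once, sorted.

    Assumed (hypothetical) layout: {"rubrics": [{"rubric_id": "..."}, ...]}.
    json.loads raises json.JSONDecodeError if the text is not valid JSON,
    which covers the first precondition.
    """
    config = json.loads(config_text)
    ids = [rubric["rubric_id"] for rubric in config.get("rubrics", [])]
    return sorted(rid for rid, count in Counter(ids).items() if count > 1)


if __name__ == "__main__":
    sample = '{"rubrics": [{"rubric_id": "tone"}, {"rubric_id": "tone"}]}'
    duplicates = duplicate_rubric_ids(sample)
    if duplicates:
        print("Duplicate rubric IDs:", duplicates)
```

An empty result means every rubric ID is unique. A non-empty result lists the IDs to rename before running the evaluator.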
Failure modes
  • Rubric threshold set too strict, so acceptable responses fail
  • Judge model out of quota
Trust signals
  • Rubric-based criterion with threshold
  • 3-sample judge mode
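
To make the two trust signals concrete, here is a hedged sketch of how three judge samples per rubric might be combined into a 0.0–1.0 score and compared against a threshold. The aggregation rule (majority vote per rubric, then the mean across rubrics) is an assumption made for illustration, not a statement of how ADK computes its metric. The function names are hypothetical and no ADK calls are used.

```python
from statistics import mean


def rubric_verdict(samples: list[bool]) -> float:
    """Majority vote over judge samples: 1.0 if most samples say 'pass'."""
    return 1.0 if sum(samples) * 2 > len(samples) else 0.0


def score_response(
    verdicts_by_rubric: dict[str, list[bool]], threshold: float
) -> tuple[float, bool]:
    """Average the per-rubric verdicts and compare against the threshold.

    Assumption: every rubric is weighted equally. statistics.mean raises
    StatisticsError when no rubrics are supplied.
    """
    score = mean(rubric_verdict(v) for v in verdicts_by_rubric.values())
    return score, score >= threshold


if __name__ == "__main__":
    # Hypothetical judge output: three samples for each of two rubrics.
    verdicts = {
        "cites_price_list": [True, True, False],
        "no_legal_advice": [False, False, True],
    }
    print(score_response(verdicts, threshold=0.8))
    print(score_response(verdicts, threshold=0.5))
```

With these sample verdicts, the response fails at a threshold of 0.8 but passes at 0.5, which is the "threshold too strict" failure mode in miniature. Sampling the judge an odd number of times avoids tied votes.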