Grade adversarial test corpus with consensus

ix-adversarial-llm-panel · workflow · setup · L3 · ★0

What it does

Grades the LLM-deferred tier of the ga-chatbot adversarial corpus with a multi-lens judge panel and hexavalent consensus, scoring the result against expected_verdict.
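
The grading flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the workflow's actual implementation: the `Case` shape, the lens names, the verdict labels ("T", "F", "U") and the strict-majority rule are all hypothetical, and only the `expected_verdict` field name comes from the description. The real hexavalent consensus presumably distinguishes six verdict values and may combine them differently.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class Case:
    case_id: str
    expected_verdict: str            # field name from the description above
    panel_verdicts: dict[str, str]   # lens name -> verdict label (hypothetical shape)


def consensus(verdicts: dict[str, str], min_share: float = 0.5) -> str | None:
    """Return the most common verdict if its vote share exceeds min_share, else None.

    Assumption: a strict-majority rule stands in for the real consensus logic.
    """
    if not verdicts:
        return None
    label, votes = Counter(verdicts.values()).most_common(1)[0]
    return label if votes / len(verdicts) > min_share else None


def score(cases: list[Case]) -> dict[str, float]:
    """Compare each case's consensus verdict with its expected_verdict."""
    results = [consensus(c.panel_verdicts) for c in cases]
    matched = sum(1 for c, r in zip(cases, results) if r == c.expected_verdict)
    undecided = sum(1 for r in results if r is None)
    return {
        "total": len(cases),
        "matched": matched,
        "undecided": undecided,  # cases where the panel reached no majority
        "accuracy": matched / len(cases) if cases else 0.0,
    }


# Hypothetical cases: three lenses each, with placeholder verdict labels.
cases = [
    Case("c1", "F", {"safety": "F", "grounding": "F", "tone": "U"}),
    Case("c2", "T", {"safety": "T", "grounding": "F", "tone": "U"}),
]
report = score(cases)
```

In this sketch a case with no majority is counted as undecided and never matches expected_verdict, which keeps panel disagreement visible in the report instead of forcing a verdict.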

Best for

Complex workflows requiring parallel agents, synthesized judgment, or multi-phase triage.

Inputs

· structured data

Outputs

· analysis results

Requires

· Claude Code agent runtime (parallel/fan-out)

Preconditions

· Claude Code workflow harness

Failure modes

· Subagent produces low-quality output → iteration needed

Trust signals

· Adversarial cross-check phase

Capability

semantic-evaluation → compare alternatives