cyberneticlibrary

Run ablation study on hardest problems

mhpp-10-ablation · workflow · setup · L3 · ★4

ejentum/benchmarks

What it does

Ablates the 10 hardest MHPP tasks across 3 conditions (B, D, A), with solver agents calling the Ejentum harness themselves during the run

Best for

Ablation studies in which solver agents call external tools themselves (the agentic-tool pattern), rather than receiving tool output generated before the run.
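A minimal sketch of the agentic-tool pattern, assuming the harness can be wrapped as a plain Python function. The solver decides at run time whether to call it, and a call that exceeds the timeout returns nothing so the agent carries on, as in the timeout failure mode this card lists. `fake_harness`, `slow_harness`, `call_with_timeout` and `solve` are hypothetical names; the real Ejentum /harness/ request and response format is not shown in this card.

```python
import concurrent.futures
import time
from typing import Callable, Optional

# Hypothetical stand-ins for the Ejentum /harness/ API; the real request and
# response format is not described in this card.
def fake_harness(query: str) -> str:
    return "guidance for: " + query

def slow_harness(query: str) -> str:
    time.sleep(0.5)  # simulates a call that outlasts the timeout
    return "too late"

def call_with_timeout(tool: Callable[[str], str], query: str,
                      timeout_s: float) -> Optional[str]:
    """Call the tool once; return None instead of raising if it times out."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(tool, query)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return None
    finally:
        # Don't block on a stuck call; the worker thread finishes on its own.
        pool.shutdown(wait=False)

def solve(task_prompt: str, tool: Callable[[str], str], use_tool: bool,
          timeout_s: float = 30.0) -> str:
    """Hypothetical solver step: the agent, not the pipeline, decides to call the tool."""
    context = task_prompt
    if use_tool:
        guidance = call_with_timeout(tool, task_prompt, timeout_s)
        if guidance is not None:
            context += "\n[harness] " + guidance
        # On timeout the agent simply continues without harness output.
    return context  # stand-in for the model call that writes the solution
```

In the pre-generation alternative, `fake_harness` would be called once per task before any agent starts and its output pasted into every prompt; here nothing is fetched unless the agent asks for it.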

Inputs

· MHPP dataset (Hugging Face)
· 10 selected tasks
· 3 conditions: B/D/A
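The card does not say how the 10 "hardest" tasks are chosen. One plausible rule, shown here purely as an assumption, is to rank tasks by a prior pass rate and keep the lowest ten, breaking ties by task ID so the selection is deterministic and can be written into the pre-registration. `hardest_tasks` and the task IDs below are hypothetical.

```python
from typing import Dict, List

def hardest_tasks(prior_pass_rates: Dict[str, float], k: int = 10) -> List[str]:
    """Return the k task IDs with the lowest prior pass rate.

    Assumption: "hardest" means lowest pass rate in some earlier run; the card
    does not state the selection rule. Ties are broken by task ID.
    """
    ranked = sorted(prior_pass_rates.items(), key=lambda item: (item[1], item[0]))
    return [task_id for task_id, _ in ranked[:k]]

# Made-up numbers and IDs, for illustration only.
example = {"t1": 0.9, "t2": 0.1, "t3": 0.1, "t4": 0.5}
print(hardest_tasks(example, k=2))  # ['t2', 't3']
```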

Outputs

· 30 solutions (10 tasks × 3 conditions)
· per-condition pass rates
· results committed to GitHub
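Per-condition pass rates follow directly from the 30 graded solutions. A short sketch, assuming each result is recorded as a hypothetical `(task_id, condition, passed)` tuple with one attempt per task per condition, so each condition's rate in the full run is out of 10:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def per_condition_pass_rates(results: Iterable[Tuple[str, str, bool]]) -> Dict[str, float]:
    """Fraction of tasks whose solution passed, computed separately per condition."""
    passed: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for _task_id, condition, ok in results:
        total[condition] += 1
        passed[condition] += int(ok)
    return {cond: passed[cond] / total[cond] for cond in total}

# Made-up outcomes for two tasks, for illustration only.
demo = [("t1", "B", True), ("t2", "B", False),
        ("t1", "D", False), ("t2", "D", False),
        ("t1", "A", True), ("t2", "A", True)]
print(per_condition_pass_rates(demo))  # {'B': 0.5, 'D': 0.0, 'A': 1.0}
```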

Requires

· Hugging Face datasets library
· Ejentum /harness/ API
· GitHub CLI (gh)
· hidden test harness
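The card does not describe how the hidden test harness is built. A generic sketch, assuming Python solutions and assert-style test code that is never shown to the solver agents: write both to a temporary script, run it in a fresh interpreter, and count a non-zero exit or a timeout as a failure. `run_hidden_tests` is a hypothetical name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def run_hidden_tests(solution_code: str, test_code: str, timeout_s: float = 10.0) -> bool:
    """Run a candidate solution plus its hidden tests in a fresh Python process.

    Returns True only if the script exits with status 0 within the timeout.
    The test code is assumed to be plain `assert` statements (hypothetical format).
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "check.py"
        script.write_text(solution_code + "\n\n" + test_code + "\n")
        try:
            proc = subprocess.run([sys.executable, str(script)], cwd=tmp,
                                  capture_output=True, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            return False  # a hang counts as a failure
        return proc.returncode == 0
```

Running in a separate process keeps a crashing or looping solution from taking down the grader, and `subprocess.run` kills the child when the timeout expires.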

Preconditions

MHPP dataset fetchable; PRE_REGISTRATION.md committed; Ejentum API key available
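Two of these preconditions can be checked mechanically before anything starts. A sketch, assuming the key lives in a hypothetical `EJENTUM_API_KEY` environment variable and that PRE_REGISTRATION.md sits at the repository root. `git ls-tree -r --name-only HEAD` lists only files committed at HEAD, so a file that is merely staged does not count.

```python
import subprocess
from typing import Iterable, List, Mapping

def git_committed_files(repo: str = ".") -> List[str]:
    """Paths committed at HEAD (staged-but-uncommitted files are not included)."""
    out = subprocess.run(["git", "ls-tree", "-r", "--name-only", "HEAD"],
                         cwd=repo, capture_output=True, text=True, check=True)
    return out.stdout.splitlines()

def missing_preconditions(env: Mapping[str, str], committed: Iterable[str],
                          key_var: str = "EJENTUM_API_KEY",  # hypothetical name
                          prereg_path: str = "PRE_REGISTRATION.md") -> List[str]:
    """Return human-readable problems; an empty list means it is safe to start."""
    problems = []
    if not env.get(key_var):
        problems.append(f"{key_var} is not set")
    if prereg_path not in set(committed):
        problems.append(f"{prereg_path} is not committed at HEAD")
    return problems
```

Inside a checkout this would be called as `missing_preconditions(os.environ, git_committed_files())`; dataset reachability is better left to the fetch step itself, since it already has fallbacks.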

Failure modes

· Dataset fetch fails (tries 2 sources + web search)
· Harness call times out → agent continues
· Hidden test fails → quarantine task
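The dataset-fetch fallback amounts to trying an ordered list of sources and stopping at the first one that returns data. A sketch, assuming each source is wrapped as a zero-argument callable; in a real run the first might wrap Hugging Face `datasets.load_dataset(...)` and the last the web-search step, but those wrappers, and the dataset ID, are not given in this card. `hub_down` and `mirror` are hypothetical.

```python
from typing import Callable, List, Sequence

class AllSourcesFailed(RuntimeError):
    """Raised when every configured source fails or returns nothing."""

def fetch_with_fallback(sources: Sequence[Callable[[], List[dict]]]) -> List[dict]:
    """Try each source in order and return the first non-empty result."""
    errors = []
    for fetch in sources:
        name = getattr(fetch, "__name__", repr(fetch))
        try:
            rows = fetch()
        except Exception as exc:  # network errors, missing files, parse errors
            errors.append(f"{name}: {exc}")
            continue
        if rows:
            return rows
        errors.append(f"{name}: empty result")
    raise AllSourcesFailed("; ".join(errors))

# Hypothetical sources, for illustration only.
def hub_down() -> List[dict]:
    raise OSError("hub unreachable")

def mirror() -> List[dict]:
    return [{"task_id": "t1"}]

print(fetch_with_fallback([hub_down, mirror]))  # [{'task_id': 't1'}]
```

Collecting every error before giving up makes the final failure message say why each source was skipped.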

Trust signals

· PRE_REGISTRATION.md committed to repo before solve agents run
· Hidden tests enforce protocol
· 30 parallel agents amortize harness cost
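The fan-out of one solver per (task, condition) cell, gated on the pre-registration, can be sketched with a thread pool. This assumes a hypothetical `solve(task_id, condition)` callable that returns solution code; threads suit the job because each solver mostly waits on API calls.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Callable, Dict, Sequence, Tuple

Job = Tuple[str, str]  # (task_id, condition)

def run_solvers(solve: Callable[[str, str], str], task_ids: Sequence[str],
                conditions: Sequence[str], preregistered: bool,
                max_workers: int = 30) -> Dict[Job, str]:
    """Launch one solver per (task, condition) cell concurrently and collect the code."""
    if not preregistered:
        # Refuse to start: the pre-registration must be committed first.
        raise RuntimeError("commit PRE_REGISTRATION.md before running solvers")
    jobs = list(product(task_ids, conditions))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {job: pool.submit(solve, *job) for job in jobs}
        return {job: fut.result() for job, fut in futures.items()}
```

With 10 task IDs and the conditions B, D and A this yields 30 jobs, one per solution listed under Outputs.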