cyberneticlibrary

Backfill public contracts data to BigQuery

contracts-finder-backfill · workflow · setup · L40
chrisns/uk-tenders-mcp
What it does

Resilient, month-sharded ingestion of UK Contracts Finder OCDS data (2016-11 to 2026-05) into BigQuery, with resumable partial loads.

Best for

Large-scale procurement time-series ingestion where idempotency and resumability are critical.

Inputs
  • Contracts Finder (CF) API
  • month window (2016-11 to 2026-05)
  • BigQuery project
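
The month window can be enumerated into one shard per calendar month. The following is a minimal sketch, assuming shards are labelled as `YYYY-MM` strings; the function name `month_shards` is hypothetical and is not taken from `uk_tenders_ingest`.

```python
def month_shards(start: str, end: str) -> list[str]:
    """Return every 'YYYY-MM' label from start to end, inclusive."""
    start_year, start_month = (int(part) for part in start.split("-"))
    end_year, end_month = (int(part) for part in end.split("-"))
    if (start_year, start_month) > (end_year, end_month):
        raise ValueError("start must not be later than end")

    shards = []
    year, month = start_year, start_month
    while (year, month) <= (end_year, end_month):
        shards.append(f"{year:04d}-{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return shards


shards = month_shards("2016-11", "2026-05")
print(len(shards), shards[0], shards[-1])  # 115 2016-11 2026-05
```

The window given above covers 115 monthly shards, each of which can be loaded and retried independently.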
Outputs
  • BigQuery releases + processes tables
  • cross-source dedup via process_group
Requires
  • BigQuery
  • UK Contracts Finder API
  • Python ingestion module (uk_tenders_ingest)
Preconditions

GCP project with BigQuery initialized; CF API reachable; Python .venv available; PYTHONPATH configured.

Failure modes
  • Per-month agent timeout (600s) → status=partial_or_timeout, resumable
  • Dedup call fails → return match result anyway
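
The per-month timeout behaviour can be sketched as a week-by-week loop with a time budget. This is an illustration only: it models the timeout as a cooperative deadline checked between weeks, which is an assumption, since the card does not say how the agent enforces its 600s limit. The names `ingest_month`, `load_week`, `already_done` and the `complete` status are hypothetical; only the 600s budget and the `partial_or_timeout` status come from the card.

```python
import time
from typing import Callable, Iterable


def ingest_month(
    month: str,
    weeks: Iterable[str],
    load_week: Callable[[str], None],
    already_done: set[str],
    budget_s: float = 600.0,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Load one month week by week, stopping once the time budget is spent."""
    deadline = clock() + budget_s
    for week in weeks:
        if week in already_done:
            continue  # loaded by an earlier attempt, so skip it on resume
        if clock() >= deadline:
            # Weeks loaded so far stay loaded; a later run picks up the rest.
            return {
                "month": month,
                "status": "partial_or_timeout",
                "weeks_done": sorted(already_done),
            }
        load_week(week)
        already_done.add(week)
    return {"month": month, "status": "complete", "weeks_done": sorted(already_done)}
```

Calling `ingest_month` again with the same `already_done` set resumes at the first week that has not been loaded, which is what makes a `partial_or_timeout` month safe to retry.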
Trust signals
  • Streaming per-week within month (partial success survives timeout)
  • Idempotent UPSERT (no duplicates on retry)
  • Cross-source dedup via process_group matcher
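
One common way to get an idempotent UPSERT in BigQuery is a `MERGE` from a staging table into the target table, keyed on a unique identifier. The card does not say how the workflow implements its UPSERT, so the sketch below is an assumption, and the helper `build_merge_sql`, the table names, and the column names are all hypothetical.

```python
def build_merge_sql(target: str, staging: str, key: str, columns: list[str]) -> str:
    """Build a BigQuery MERGE that updates matching rows and inserts new ones."""
    # Every non-key column is overwritten from the staging row on a match.
    updates = ", ".join(f"{col} = S.{col}" for col in columns if col != key)
    return (
        f"MERGE `{target}` T\n"
        f"USING `{staging}` S\n"
        f"ON T.{key} = S.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {updates}\n"
        f"WHEN NOT MATCHED THEN INSERT ROW"
    )


# Hypothetical table and column names, for illustration only.
print(
    build_merge_sql(
        target="my-project.tenders.releases",
        staging="my-project.tenders.releases_staging",
        key="release_id",
        columns=["release_id", "ocid", "published_date", "payload"],
    )
)
```

Re-running the same statement against the same staging rows updates existing rows in place instead of inserting them again, so a retried week adds no duplicates. This assumes the staging table holds at most one row per key and has the same columns as the target, which `INSERT ROW` requires.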