Load bulk contract data into BigQuery
cf-parallel-harvester-rawloadworkflowsetup
chrisns/uk-tenders-mcp
What it does
Parallel, append-only bulk load into BigQuery from 2-month shards
Best for
Append-loading UK Contracts Finder historical data (2016-2026) into BigQuery in resumable 2-month shards.
Inputs
- · GCP project, BQ location, Python venv
- · Shard date ranges (from-to, 2-month windows)
- · Bulk harvester data source
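
The shard date ranges listed above can be generated rather than typed by hand. The following is a minimal sketch, assuming half-open [from, to) windows that start on the first of a month; the function names `add_months` and `two_month_shards` are hypothetical and not taken from the repository, and the exact shard count depends on the boundary convention the real workflow uses.

```python
from datetime import date


def add_months(d: date, months: int) -> date:
    """Return the first day of the month that is `months` after d's month."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


def two_month_shards(start: date, end: date) -> list[tuple[date, date]]:
    """Split [start, end) into consecutive 2-month windows (half-open)."""
    shards = []
    cursor = date(start.year, start.month, 1)
    while cursor < end:
        # Clip the final window so it never runs past `end`.
        upper = min(add_months(cursor, 2), end)
        shards.append((cursor, upper))
        cursor = upper
    return shards


# Assumed range: 2016-11 through the end of 2026-05 (exclusive upper bound).
shards = two_month_shards(date(2016, 11, 1), date(2026, 6, 1))
```

Half-open windows avoid loading the boundary day twice, which matters here because the load is append-only.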
Outputs
- · RAWLOAD report (seen=N, appended=M per shard)
- · Error text if failed/timed out
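
To collect the per-shard counts, the report line can be parsed from the script output. This is a sketch only: the card states that the report carries `seen=N` and `appended=M`, but the exact line layout assumed below (a line containing `RAWLOAD` followed by both fields) is a guess, and `ShardReport` and `parse_rawload_report` are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ShardReport(NamedTuple):
    seen: int
    appended: int


# Assumed layout: "RAWLOAD ... seen=<int> ... appended=<int>" on one line.
_REPORT_RE = re.compile(r"RAWLOAD\b.*?seen=(\d+).*?appended=(\d+)")


def parse_rawload_report(output: str) -> Optional[ShardReport]:
    """Return the first RAWLOAD report found in `output`, or None."""
    for line in output.splitlines():
        match = _REPORT_RE.search(line)
        if match:
            return ShardReport(int(match.group(1)), int(match.group(2)))
    return None
```

Returning `None` when no report line is present lets the caller treat the output as error text instead.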
Requires
- · BigQuery (WRITE_APPEND, no compile/DML)
- · Contracts Finder bulk harvester API
- · Python script runner (600s timeout per shard)
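
The 600s per-shard timeout can be enforced with the standard library. The sketch below assumes each shard is loaded by running a command line; the command itself is left as a parameter because the actual script name and flags are not given here, and `run_shard` is a hypothetical helper.

```python
import subprocess
import sys

SHARD_TIMEOUT_S = 600  # per-shard limit stated in the requirements


def run_shard(cmd: list[str], timeout_s: float = SHARD_TIMEOUT_S) -> tuple[bool, str]:
    """Run one shard command; return (ok, report text or error text)."""
    try:
        proc = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so the shard
        # may have been only partially appended at this point.
        return False, f"timed out after {timeout_s}s"
    if proc.returncode != 0:
        return False, proc.stderr.strip() or f"exit code {proc.returncode}"
    return True, proc.stdout.strip()


# Example with a stand-in command; the real loader invocation is not shown here.
ok, text = run_shard([sys.executable, "-c", "print('RAWLOAD seen=3, appended=3')"])
```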
Preconditions
- · BigQuery credentials (GCP_PROJECT env var)
- · PYTHONPATH set to ingestion/src
- · Multi-phase workflow; this is the raw-load phase (parallel 2-month shards)
- · 2016-11 through 2026-05 (~59 shards)
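
The environment preconditions can be checked before any shard is started. This is a minimal sketch, assuming that a non-empty `GCP_PROJECT` and a `PYTHONPATH` entry ending in `ingestion/src` are sufficient; `check_preconditions` is a hypothetical helper, and it does not verify that the BigQuery credentials themselves are valid.

```python
import os
from typing import Mapping


def check_preconditions(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems; empty means OK."""
    problems = []
    if not env.get("GCP_PROJECT"):
        problems.append("GCP_PROJECT is not set")
    entries = env.get("PYTHONPATH", "").split(os.pathsep)
    # Normalise separators so the check also works for Windows-style paths.
    normalised = [e.replace("\\", "/").rstrip("/") for e in entries]
    if not any(e.endswith("ingestion/src") for e in normalised):
        problems.append("PYTHONPATH does not include ingestion/src")
    return problems


problems = check_preconditions(os.environ)
```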
Failure modes
- · Partial appends OK (resumable on retry)
- · A 600s timeout may truncate a shard
- · No compile/dedup in this phase (separate step)
- · Concurrency-safe via WRITE_APPEND (no conflicts)
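
The parallel-load-and-retry pattern implied by these failure modes could look roughly like the sketch below. It assumes a caller-supplied `load_one` function that returns `True` on success; `load_shards`, the worker count, and the attempt limit are illustrative choices, not values from the repository. Because loads are append-only, retrying a truncated shard would presumably re-append rows that already landed, which is why deduplication is left to the separate later step.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence


def load_shards(
    shards: Sequence[str],
    load_one: Callable[[str], bool],
    max_workers: int = 4,
    max_attempts: int = 3,
) -> dict[str, bool]:
    """Load shards in parallel, retrying only the ones that failed."""
    status = {shard: False for shard in shards}
    for _ in range(max_attempts):
        pending = [shard for shard, ok in status.items() if not ok]
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map preserves input order, so results line up with `pending`.
            for shard, ok in zip(pending, pool.map(load_one, pending)):
                status[shard] = ok
    return status
```

Shards that still fail after the last attempt stay marked `False`, so a later run can pick up exactly those.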
Trust signals
- · Concurrency-safe WRITE_APPEND strategy
- · 2-month sharding (manageable context per shard)
- · Timeout + resumability pattern
- · Per-shard reporting (seen/appended counts)