Debug scraper-to-analyst pipeline
debug-pipeline · command · setup · L1 · ★136
colossus-lab/openarg_backend
What it does
Diagnoses failures across the four-stage data pipeline: scraper → embedding → collector → analyst.
Best for
Rapid triage of production failures in a multi-stage data pipeline, with a separate diagnostic checklist for each stage.
Inputs
- · Problem description ($ARGUMENTS)
- · Current system state (logs, database, Redis)
Outputs
- · Diagnostics checklist results
- · Identified failure point (Scraper/Embedding/Collector/Analyst)
- · Remediation steps
Requires
- · PostgreSQL
- · Redis
- · Flower (Celery monitoring)
- · OpenAI embedding API
- · HTTPX (HTTP client, for request timeouts)
- · Pandas (file parsing)
- · GPT-4o-mini / GPT-4o (LLM analysis)
Preconditions
- · PostgreSQL running and migrations applied
- · Redis on port 6381
- · Celery workers active
- · Environment variables set (OPENAI_API_KEY, DATABASE_URL, etc.)
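The preconditions above can be checked before any stage-level debugging. The following is a minimal sketch, not the command's own implementation: it uses only the Python standard library, the helper names `missing_env_vars` and `port_open` are hypothetical, and `localhost` as the Redis host is an assumption. A successful TCP connection only shows that something is listening on the port; it does not confirm that migrations are applied or that Celery workers are active.

```python
import os
import socket
from typing import Iterable, Mapping

# The card names these two variables and ends with "etc.";
# extend the tuple with whatever else the deployment needs.
REQUIRED_ENV_VARS = ("OPENAI_API_KEY", "DATABASE_URL")


def missing_env_vars(required: Iterable[str], env: Mapping[str, str]) -> list[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    missing = missing_env_vars(REQUIRED_ENV_VARS, os.environ)
    print("missing env vars:", missing or "none")
    # Port 6381 comes from the preconditions; the host is an assumption.
    print("redis reachable:", port_open("localhost", 6381))
```

Running the script on the host that runs the workers gives a quick yes/no on the environment before looking at individual pipeline stages.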
Failure modes
- · PostgreSQL down → pipeline stalls at dataset storage
- · Redis unavailable → Celery broker fails
- · Missing OpenAI key → embedding task fails
- · CKAN portal unresponsive → scraper finds no catalogs
- · No cached datasets → analyst cannot analyze
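The failure modes above can be read as a lookup from a failed health check to its consequence. The sketch below encodes that table and returns the first failed check. The check names (`postgres_up` and so on) and the function `first_failure` are hypothetical, and the ordering (infrastructure first, then pipeline order) is one reasonable choice rather than something the command prescribes.

```python
from typing import Mapping, Optional

# (check name, consequence) pairs taken from the failure modes listed above.
# Infrastructure checks come first because a broken database or broker
# masks every failure that would otherwise show up downstream.
FAILURE_MODES = [
    ("postgres_up", "pipeline stalls at dataset storage"),
    ("redis_up", "Celery broker fails"),
    ("ckan_responsive", "scraper finds no catalogs"),
    ("openai_key_set", "embedding task fails"),
    ("datasets_cached", "analyst cannot analyze"),
]


def first_failure(checks: Mapping[str, bool]) -> Optional[tuple[str, str]]:
    """Return (check, consequence) for the first check reported as False.

    Checks absent from `checks` are treated as not run and are skipped.
    """
    for name, consequence in FAILURE_MODES:
        if checks.get(name) is False:
            return name, consequence
    return None


if __name__ == "__main__":
    observed = {"postgres_up": True, "redis_up": True, "openai_key_set": False}
    print(first_failure(observed))
```

Reporting only the earliest failure keeps the triage focused: later stages often fail purely as a side effect of the first broken dependency.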
Trust signals
- · Dedicated subsection per stage with specific failure checklist
- · SQL query examples to verify data presence
- · Flower URL provided for real-time task monitoring
- · Explicit step sequence: vector search → data gathering → analysis
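The data-presence queries mentioned above usually amount to counting rows at each storage step and finding the first step that produced nothing. The sketch below illustrates the idea under loud assumptions: the table names `datasets`, `dataset_chunks`, and `cached_datasets` are hypothetical placeholders, not the project's schema, and an in-memory SQLite database stands in for PostgreSQL so the example runs on its own.

```python
import sqlite3
from typing import Optional

# Hypothetical table names, one per storage step; replace with the real schema.
STAGE_TABLES = {
    "scraper": "datasets",
    "embedding": "dataset_chunks",
    "collector": "cached_datasets",
}


def row_counts(conn: sqlite3.Connection) -> dict[str, int]:
    """Count rows in each stage's table."""
    counts = {}
    for stage, table in STAGE_TABLES.items():
        # Table names come from the fixed mapping above, never from user input.
        counts[stage] = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    return counts


def first_empty_stage(counts: dict[str, int]) -> Optional[str]:
    """Return the earliest stage whose table is empty, or None if all have rows."""
    for stage in STAGE_TABLES:
        if counts[stage] == 0:
            return stage
    return None


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # stand-in for the PostgreSQL database
    for table in STAGE_TABLES.values():
        conn.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY)")
    conn.execute("INSERT INTO datasets DEFAULT VALUES")
    counts = row_counts(conn)
    print(counts, first_empty_stage(counts))
```

An empty table at one step with rows in the step before it points at the stage in between as the likely failure point.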