cyberneticlibrary

Enrich data via web search

web_search_enricher (skill, setup, L20)
TrevorMann/AIDataCleansing
What it does

Resolve missing or ambiguous data fields via web search

Best for

Filling postal code → municipality gaps in data records when deterministic database lookups fail and confidence is low.

Inputs
  • Record with gaps: postal_code, city, address, state_province, _triage_route, _triage_data_confidence, _unknown_fsa, _municipality_confidence, _gap_hints
Outputs
  • Enriched record with resolved fields: municipality, _web_search_evidence (audit trail), _decisions
Requires
  • Tavily API (web search, rate-limited and billed)
Preconditions
  • _triage_route == 'needs_review'
  • _triage_data_confidence < trigger_below (default 0.70)
  • At least one identifiable gap
  • Per-batch budget not exhausted
  • Deterministic skills run first
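
The preconditions above can be read as a single gate that must pass before any billed search is issued. The following is a minimal sketch, not the skill's actual code: `BatchBudget`, `has_identifiable_gap`, and `should_search` are hypothetical names, and the rule for what counts as an "identifiable gap" is an assumption. The ordering requirement (deterministic skills run first) is assumed to be enforced by the calling pipeline and is not checked here.

```python
from dataclasses import dataclass


@dataclass
class BatchBudget:
    """Hypothetical per-batch search allowance (name and shape are assumptions)."""
    remaining_searches: int


def has_identifiable_gap(record: dict) -> bool:
    # Assumption: a gap is signalled by an empty city, a truthy _unknown_fsa
    # flag, or a non-empty _gap_hints list.
    return (
        not record.get("city")
        or bool(record.get("_unknown_fsa"))
        or bool(record.get("_gap_hints"))
    )


def should_search(record: dict, budget: BatchBudget,
                  trigger_below: float = 0.70) -> bool:
    """Return True only when every gating precondition holds."""
    if record.get("_triage_route") != "needs_review":
        return False
    # Assumption: a missing confidence value is treated as fully confident,
    # so the record is not searched.
    confidence = record.get("_triage_data_confidence", 1.0)
    if confidence >= trigger_below:
        return False
    if not has_identifiable_gap(record):
        return False
    return budget.remaining_searches > 0
```

Checking the cheap conditions (route, confidence, gap) before the budget keeps the gate free of side effects; the caller would decrement the budget only after a search is actually performed.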
Failure modes
  • Tavily API rate limit reached or budget exhausted → no search performed
  • Parser plugin returns no useful data → low-confidence signal only
  • Web search results incorrect → output treated as a low-confidence signal, not ground truth
  • Gap hints ambiguous → skill may over-search
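
One way to handle the first three failure modes is sketched below, under stated assumptions. The search and parser functions are injected as plain callables, so no real Tavily client call is shown. `SearchUnavailable`, `LOW_CONFIDENCE`, `enrich`, the query format, and the shape of the `_decisions` and `_web_search_evidence` entries are all hypothetical.

```python
import copy

# Assumption: web-derived values carry a fixed low confidence so that
# downstream steps never treat them as ground truth.
LOW_CONFIDENCE = 0.5


class SearchUnavailable(Exception):
    """Hypothetical error for a rate limit or an exhausted budget."""


def enrich(record: dict, search_fn, parse_fn) -> dict:
    """Return an enriched copy of record; the input is left unchanged."""
    out = copy.deepcopy(record)
    evidence = out.setdefault("_web_search_evidence", [])
    decisions = out.setdefault("_decisions", [])

    parts = [record.get("postal_code"), record.get("state_province"), "municipality"]
    query = " ".join(p for p in parts if p)

    try:
        results = search_fn(query)
    except SearchUnavailable as exc:
        # Rate limit or budget exhausted: no search performed, nothing resolved.
        decisions.append({"action": "skipped", "reason": str(exc)})
        return out

    # Every search that ran is logged, whether or not it resolved the gap.
    evidence.append({"query": query, "results": results})

    candidate = parse_fn(results)
    if not candidate:
        # Parser found nothing useful: keep the evidence, resolve nothing.
        decisions.append({"action": "no_resolution",
                          "reason": "parser returned no useful data"})
        return out

    out["municipality"] = candidate
    out["_municipality_confidence"] = LOW_CONFIDENCE
    decisions.append({"action": "resolved_low_confidence", "value": candidate})
    return out
```

Because a wrong search result cannot be detected at this step, the sketch marks every web-derived municipality as low confidence rather than trying to judge correctness.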
Trust signals
  • Gating logic prevents wasteful searches (route + confidence + gap checks)
  • Audit trail logged in _web_search_evidence
  • Domain-agnostic core with per-domain parser plugins
  • Web cache reuses results (TTL-managed)
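
The cache behaviour in the last trust signal could look roughly like the sketch below. It is an in-memory illustration only: `TTLCache` and `cached_search` are hypothetical names, and the source does not say how the real cache is stored or what TTL it uses.

```python
import time

_MISS = object()  # sentinel so that cached empty results still count as hits


class TTLCache:
    """Minimal in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable for testing
        self._entries = {}

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return _MISS
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop the entry so the next call searches again.
            del self._entries[key]
            return _MISS
        return value

    def put(self, key, value):
        self._entries[key] = (self._clock() + self._ttl, value)


def cached_search(query: str, search_fn, cache: TTLCache):
    """Reuse a cached result when one is fresh; otherwise search and store."""
    hit = cache.get(query)
    if hit is not _MISS:
        return hit
    results = search_fn(query)
    cache.put(query, results)
    return results
```

Since the search API is rate-limited and billed, a cache hit saves both a request and part of the per-batch budget for records that share the same query.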