cyberneticlibrary

Audit and clean CRM data quality

crm-hygiene-scanner · skill · setup · L240
Othmane-Khadri/gtm-engineer-playbook
What it does

Audits CRM data quality: detects duplicate records, flags stale records, and scores record completeness.

Best for

CRM admins and marketing ops teams cleaning up legacy data before a migration or merge campaign: measure data health, detect junk records, and plan cleanup sprints.

Inputs
  • CSV export file path (contacts, companies, or deals)
  • Data type (contacts/companies/deals)
  • CRM system (HubSpot, Salesforce, Pipedrive, other)
  • Critical fields to measure (email, phone, company name, deal stage, last activity, owner)
Outputs
  • Data profile: total records, columns, data types, fill rate per column, date range, unique vs. total values
  • Duplicate groups with confidence levels (HIGH: exact email match; MEDIUM: fuzzy name match or cross-field match)
  • Stale record flags (based on a last-activity threshold)
  • Row completeness percentage
  • CRM hygiene quality score (0–100)
  • Prioritized cleanup plan with merge/delete recommendations
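The profile, completeness, and staleness outputs above reduce to per-column and per-row counts. Below is a minimal sketch using only Python's standard `csv` module. It assumes ISO `YYYY-MM-DD` dates, a 365-day staleness threshold, and hypothetical column names (`email`, `company`, `last_activity`); the function names are illustrative, not the skill's own code.

```python
import csv
import io
from datetime import datetime, timedelta

# Hypothetical sample export; a real run would open the CSV path given as input.
SAMPLE = """email,first_name,company,last_activity
ann@acme.com,Ann,Acme Inc.,2024-05-01
bob@globex.com,Bob,,2022-01-15
,Carol,Initech,
"""

def profile(rows, columns):
    """Fill rate (share of non-empty cells) and unique-value count per column."""
    total = len(rows)
    report = {}
    for col in columns:
        values = [r[col].strip() for r in rows if r.get(col) and r[col].strip()]
        report[col] = {
            "fill_rate": len(values) / total if total else 0.0,
            "unique": len(set(values)),
        }
    return report

def row_completeness(row, critical_fields):
    """Percentage of the critical fields that are filled in for one record."""
    filled = sum(1 for f in critical_fields if (row.get(f) or "").strip())
    return 100.0 * filled / len(critical_fields)

def is_stale(row, date_field, now, max_age_days=365):
    """True/False for stale/fresh; None when the record has no activity date."""
    raw = (row.get(date_field) or "").strip()
    if not raw:
        return None  # cannot decide without a date (assumed handling)
    # Assumes ISO YYYY-MM-DD values; other formats would raise ValueError.
    return now - datetime.strptime(raw, "%Y-%m-%d") > timedelta(days=max_age_days)

reader = csv.DictReader(io.StringIO(SAMPLE))
rows = list(reader)
print(profile(rows, reader.fieldnames))
```

Returning `None` for a missing date keeps "unknown" separate from "stale", so records without activity data can be reported on their own rather than silently counted as stale.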
Requires
  • Bash + Python one-liners for CSV parsing and fuzzy matching
Preconditions
  • CSV export exists at the provided path
  • At least 10 rows of data (large CSVs may be sampled to their first 10K rows)
  • Column names follow standard, recognizable CRM conventions
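A minimal sketch of these precondition checks. The 10-row minimum and 10K-row sample size come from the text above; the alias table `COLUMN_ALIASES` and the function names are hypothetical, chosen only to show how non-standard export headers could be mapped to canonical names.

```python
import csv
import io
from itertools import islice

MIN_ROWS = 10
SAMPLE_LIMIT = 10_000

# Hypothetical alias table: canonical name -> lowercase header variants.
COLUMN_ALIASES = {
    "email": {"email", "email address", "e-mail", "work email"},
    "company": {"company", "company name", "account name", "organization"},
    "last_activity": {"last activity", "last activity date", "last_activity"},
}

def map_columns(headers):
    """Return {canonical_name: original_header} for headers matching an alias."""
    mapping = {}
    for header in headers:
        key = header.strip().lower()
        for canonical, aliases in COLUMN_ALIASES.items():
            if key in aliases and canonical not in mapping:
                mapping[canonical] = header
    return mapping

def load_sample(handle, limit=SAMPLE_LIMIT):
    """Read up to `limit` rows and report whether the file was truncated."""
    reader = csv.DictReader(handle)
    rows = list(islice(reader, limit + 1))  # one extra row reveals truncation
    truncated = len(rows) > limit
    rows = rows[:limit]
    if len(rows) < MIN_ROWS:
        raise ValueError(f"need at least {MIN_ROWS} rows, got {len(rows)}")
    return reader.fieldnames, rows, truncated

print(map_columns(["Email Address", "Company Name", "Phone"]))
```

When `truncated` is true, counts taken from the sample would be scaled up to the full file, following the extrapolation rule listed under Trust signals.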
Failure modes
  • CSV encoding mismatch (UTF-8 vs. Latin-1) → garbled data
  • Wrong column mapping (e.g., email stored under 'Email Address', non-standard column names) → duplicates missed
  • No date columns → stale records cannot be detected
  • No critical-field list provided → falls back to generic columns and misses business context
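Two of these failure modes can be caught before any scoring runs. This sketch assumes a UTF-8-then-Latin-1 decoding fallback and a small, hypothetical list of accepted date formats (`DATE_FORMATS`), plus an assumed 80% threshold for calling a column a date column; all three choices are illustrative.

```python
from datetime import datetime

# Assumed formats; real exports may use others (e.g. day-first dates).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y-%m-%dT%H:%M:%S")

def decode_export(raw):
    """Decode CSV bytes as UTF-8 (BOM tolerated), falling back to Latin-1."""
    try:
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a character, so it never raises, but it
        # can still mis-render files saved in another code page.
        return raw.decode("latin-1"), "latin-1"

def looks_like_date(value):
    """True if the value parses with any of the assumed DATE_FORMATS."""
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(value, fmt)
            return True
        except ValueError:
            pass
    return False

def date_columns(rows, headers, min_share=0.8):
    """Headers whose non-empty values mostly parse as dates."""
    found = []
    for h in headers:
        values = [r[h].strip() for r in rows if (r.get(h) or "").strip()]
        if values and sum(map(looks_like_date, values)) / len(values) >= min_share:
            found.append(h)
    return found
```

If `date_columns` returns an empty list, stale-record detection would be skipped and reported as such rather than guessed.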
Trust signals
  • Four-step methodology with explicit skip warnings
  • Three types of duplicates with confidence levels and normalization rules
  • Fuzzy match criteria: company name normalization (strip legal suffixes and punctuation), name variants (Bob/Robert), edit distance ≤ 2
  • Large-CSV sampling rule (for files with 10K or more rows, analyze the first 10K rows and extrapolate the results)
  • Per-group recommendation using a "keep the most complete and most recent record" rule
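The duplicate rules above can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: the suffix and nickname tables are tiny hypothetical samples, a MEDIUM match is assumed to mean normalized full names within edit distance 2 at the same normalized company, candidate pairs are blocked by email and by company so that not every row is compared with every other, and the recency tie-break assumes ISO-format `last_activity` strings.

```python
import re
from collections import defaultdict
from itertools import combinations

# Assumed, deliberately short tables; a real run needs much fuller lists.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "corporation", "co", "gmbh"}
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william", "liz": "elizabeth"}

def normalize_company(name):
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    words = re.sub(r"[^\w\s]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in LEGAL_SUFFIXES)

def normalize_person(first, last):
    """Map nicknames to a canonical first name, e.g. Bob -> robert."""
    f = first.strip().lower()
    return f"{NICKNAMES.get(f, f)} {last.strip().lower()}"

def edit_distance(a, b):
    """Levenshtein distance using a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def find_duplicate_pairs(records, max_edits=2):
    """(i, j, level) pairs: HIGH = same email, MEDIUM = close names, same company."""
    by_email, by_company = defaultdict(list), defaultdict(list)
    for i, r in enumerate(records):
        email = r["email"].strip().lower()
        if email:
            by_email[email].append(i)
        company = normalize_company(r["company"])
        if company:
            by_company[company].append(i)
    pairs, seen = [], set()
    for idxs in by_email.values():
        for i, j in combinations(idxs, 2):
            pairs.append((i, j, "HIGH"))
            seen.add((i, j))
    # Compare names only within the same normalized company (blocking).
    for idxs in by_company.values():
        for i, j in combinations(idxs, 2):
            a, b = records[i], records[j]
            if (i, j) not in seen and edit_distance(
                normalize_person(a["first_name"], a["last_name"]),
                normalize_person(b["first_name"], b["last_name"]),
            ) <= max_edits:
                pairs.append((i, j, "MEDIUM"))
    return pairs

def pick_survivor(group, fields):
    """Keep the most complete record; break ties by latest ISO last_activity."""
    def rank(r):
        filled = sum(1 for f in fields if (r.get(f) or "").strip())
        return (filled, r.get("last_activity") or "")
    return max(group, key=rank)
```

Pairs that share a record would still need to be merged into groups (for example with union-find) before one survivor is chosen per group.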