Audit and clean CRM data quality
crm-hygiene-scanner · skill · setup · L2 · ★40
Othmane-Khadri/gtm-engineer-playbook
What it does
Audits CRM data quality: detects duplicates, flags stale records, and scores completeness.
Best for
A CRM admin or marketing ops lead cleaning up legacy data before a migration or merge campaign: measure data health, detect garbage records, and plan cleanup sprints.
Inputs
- · CSV export file path (contacts, companies, or deals)
- · Data type (contacts/companies/deals)
- · CRM system (HubSpot, Salesforce, Pipedrive, other)
- · Critical fields to measure (email, phone, company name, deal stage, last activity, owner)
Outputs
- · Data profile: total records, columns, data types, fill rate per column, date range, unique vs. total
- · Duplicate groups with confidence levels (HIGH: exact email match; MEDIUM: fuzzy name match or cross-field match)
- · Stale record flags (based on a last-activity threshold)
- · Row completeness percentage
- · CRM hygiene quality score (0-100)
- · Prioritized cleanup plan with merge/delete recommendations
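Several of the outputs above can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the skill's actual implementation: the function names, the sample rows, the rule that whitespace-only cells count as empty, and the 365-day stale threshold are all hypothetical choices (the source does not give a threshold or a scoring formula).

```python
from collections import defaultdict
from datetime import date

def is_filled(value):
    """Assumption: None and whitespace-only cells count as empty."""
    return bool((value or "").strip())

def fill_rates(rows, columns):
    """Fraction of rows with a non-empty value, per column."""
    total = len(rows)
    if total == 0:
        return {col: 0.0 for col in columns}
    return {col: sum(is_filled(r.get(col)) for r in rows) / total for col in columns}

def row_completeness(row, critical_fields):
    """Percentage of critical fields that are filled in one row."""
    filled = sum(is_filled(row.get(f)) for f in critical_fields)
    return 100.0 * filled / len(critical_fields)

def exact_email_groups(rows, email_col="email"):
    """HIGH-confidence duplicates: row indexes sharing a case-insensitive email."""
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = (row.get(email_col) or "").strip().lower()
        if key:
            groups[key].append(index)
    return {email: idxs for email, idxs in groups.items() if len(idxs) > 1}

def is_stale(last_activity, today, threshold_days=365):
    """True if the ISO date is older than the threshold; None if no date is present."""
    if not is_filled(last_activity):
        return None
    return (today - date.fromisoformat(last_activity.strip())).days > threshold_days

# Hypothetical sample rows standing in for a parsed contacts export.
rows = [
    {"email": "ana@example.com", "phone": "555-0100", "company": "Acme", "last_activity": "2024-12-01"},
    {"email": "ANA@example.com ", "phone": "", "company": "Acme Inc.", "last_activity": "2023-01-15"},
    {"email": "", "phone": "555-0101", "company": "", "last_activity": ""},
    {"email": "li@example.com", "phone": "", "company": "Globex", "last_activity": "2024-06-30"},
]
critical = ["email", "phone", "company"]

rates = fill_rates(rows, critical)
completeness = [row_completeness(r, critical) for r in rows]
dupes = exact_email_groups(rows)
stale = [is_stale(r["last_activity"], today=date(2025, 1, 1)) for r in rows]
```

Passing `today` explicitly keeps the stale check reproducible; a real run would likely use the export date instead.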
Requires
- · Bash + Python one-liners for CSV parsing and fuzzy matching
Preconditions
- · CSV export exists at provided path
- · At least 10 rows of data (for large CSVs, the scan can optionally sample only the first 10K rows)
- · Column names are recognizable as standard CRM fields
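The row-count precondition and the 10K-row sampling option could be checked while loading, roughly as follows. This is a sketch only: `load_sample`, `make_csv`, and the constants are hypothetical names, and the demo builds its CSV in memory rather than reading a real export.

```python
import csv
import io
from itertools import islice

MIN_ROWS = 10        # minimum from the preconditions above
SAMPLE_CAP = 10_000  # "first 10K rows" sampling limit

def load_sample(handle, cap=SAMPLE_CAP, min_rows=MIN_ROWS):
    """Read at most `cap` data rows; report whether the file had more."""
    reader = csv.DictReader(handle)
    rows = list(islice(reader, cap))
    if len(rows) < min_rows:
        raise ValueError(f"need at least {min_rows} rows, found {len(rows)}")
    # If one more row can be read, the result is a sample, not the full file.
    sampled = next(reader, None) is not None
    return rows, sampled

def make_csv(n):
    """Build an in-memory CSV with n data rows (demo helper, not part of the skill)."""
    lines = ["email,company"] + [f"user{i}@example.com,Company {i}" for i in range(n)]
    return io.StringIO("\n".join(lines))

# A small cap is used here so the demo stays tiny.
rows, sampled = load_sample(make_csv(25), cap=20)
full_rows, full_sampled = load_sample(make_csv(15), cap=20)
```

With a real file, the handle would come from `open(path, newline="")`, as the `csv` module documentation recommends.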
Failure modes
- · CSV encoding mismatches (UTF-8 vs. Latin-1) → garbled data
- · Wrong column mapping (e.g. email stored under 'Email Address', or other loosely named columns) → duplicates missed
- · No date columns → cannot detect stale records
- · Critical field list not provided → defaults to generic columns, missing business context
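Two of these failure modes can be reduced with small guards. The sketch below is illustrative only: `decode_with_fallback`, `map_columns`, and the alias table are hypothetical, and a real alias list would depend on the CRM's export format.

```python
def decode_with_fallback(raw):
    """Try UTF-8 first (tolerating a BOM), then fall back to Latin-1.

    Latin-1 accepts any byte sequence, so the fallback never raises, but the
    result can still be wrong if the file uses another encoding.
    """
    try:
        return raw.decode("utf-8-sig"), "utf-8"
    except UnicodeDecodeError:
        return raw.decode("latin-1"), "latin-1"

# Hypothetical alias table: canonical field -> header spellings seen in exports.
ALIASES = {
    "email": {"email", "email address", "e-mail", "work email"},
    "company": {"company", "company name", "account name"},
    "last_activity": {"last activity", "last activity date", "last contacted"},
}

def map_columns(headers):
    """Map canonical field names to the actual header found in the CSV."""
    mapping = {}
    for header in headers:
        normalized = header.strip().lower().replace("_", " ")
        for canonical, names in ALIASES.items():
            if normalized in names:
                mapping[canonical] = header
    return mapping

text_utf8, enc_utf8 = decode_with_fallback("José".encode("utf-8"))
text_latin, enc_latin = decode_with_fallback("José".encode("latin-1"))

mapping = map_columns(["Email Address", "Last_Activity_Date", "Notes"])
no_dates = map_columns(["Email", "Company Name"])
can_detect_stale = "last_activity" in no_dates
```

Checking `"last_activity" in mapping` before the stale scan turns the "no date columns" case into an explicit skip instead of a silent gap.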
Trust signals
- · Four-step methodology with explicit skip warnings
- · Three types of duplicates with confidence levels and normalization rules
- · Fuzzy match criteria: company name normalization (strip legal suffixes, punctuation), name variants (Bob/Robert), edit distance ≤2
- · Large CSV sampling rule (first 10K rows for CSVs with ≥ 10K rows, then extrapolate results)
- · Recommendation per duplicate group with 'keep most complete + recent' rule
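The fuzzy-match criteria and the keep rule above might look like this in plain Python. Everything here is an assumption made for illustration: the suffix list, the nickname table, the helper names, and the choice to rank by completeness first and recency second (the source only says 'keep most complete + recent').

```python
import re

# Hypothetical, non-exhaustive lookup tables.
LEGAL_SUFFIXES = {"inc", "llc", "ltd", "limited", "corp", "corporation", "co", "gmbh"}
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}

def normalize_company(name):
    """Lowercase, drop punctuation, and strip trailing legal suffixes."""
    tokens = re.sub(r"[^\w\s]", " ", name.lower()).split()
    while tokens and tokens[-1] in LEGAL_SUFFIXES:
        tokens.pop()
    return " ".join(tokens)

def edit_distance(a, b):
    """Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def canonical_first_name(name):
    """Map known nicknames (Bob) to their formal variant (robert)."""
    lowered = name.strip().lower()
    return NICKNAMES.get(lowered, lowered)

def names_match(a, b, max_distance=2):
    """MEDIUM-confidence match: edit distance <= 2 after nickname mapping."""
    return edit_distance(canonical_first_name(a), canonical_first_name(b)) <= max_distance

def pick_survivor(records, date_field="last_activity"):
    """Keep the record with the most filled fields; break ties by latest date.

    Assumes dates are ISO strings (YYYY-MM-DD), which sort chronologically.
    """
    def rank(record):
        filled = sum(1 for value in record.values() if (value or "").strip())
        return (filled, record.get(date_field) or "")
    return max(records, key=rank)

same_company = normalize_company("Acme, Inc.") == normalize_company("ACME Co., Ltd.")
group = [
    {"email": "bob@acme.com", "phone": "", "last_activity": "2024-05-01"},
    {"email": "bob@acme.com", "phone": "555-0100", "last_activity": "2023-02-10"},
]
keeper = pick_survivor(group)
```

An edit distance of 2 is generous for short names, so pairs such as "Dan" and "Jan" would also match under this sketch, which fits the MEDIUM confidence level for fuzzy matches.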