Audit dataset quality and completeness

data-quality-auditor · plugin · setup · L217,464
alirezarezvani/claude-skills
What it does

Audits datasets for completeness, consistency, accuracy, and validity, and reports a data quality score (DQS) for each dimension.

Best for

Preparing data for analysis or ML when you need a systematic audit of quality issues rather than a spot-check of a few rows.

Inputs
  • · dataset (CSV, Parquet, or JSON file, or a database query result)
Outputs
  • · data quality score (DQS) per dimension (completeness, consistency, accuracy, validity)
  • · missing-value analysis: MCAR/MAR/MNAR classification (missing completely at random, missing at random, missing not at random)
  • · outlier detection: multi-method (IQR, Isolation Forest, Z-score, etc.)
  • · audit report with recommendations
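As a rough illustration of a per-dimension score, the sketch below computes completeness as the fraction of non-missing cells in each column, using only the standard library. The function name `completeness_by_column`, the `MISSING_TOKENS` set, and the sample data are assumptions made for this example; the skill's own DQS formula is not documented here and may differ.

```python
import csv
import io

# Assumption: these tokens count as "missing". The real tool may use another set.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "nan"}


def completeness_by_column(rows):
    """Return {column: fraction of non-missing cells} for a list of dict rows."""
    if not rows:
        return {}
    scores = {}
    for column in rows[0]:
        present = sum(
            1
            for row in rows
            # row.get() can be None when a row has fewer fields than the header
            if (row.get(column) or "").strip().lower() not in MISSING_TOKENS
        )
        scores[column] = present / len(rows)
    return scores


# Hypothetical sample data, inlined so the example is self-contained.
SAMPLE_CSV = """id,age,city
1,34,Oslo
2,,Bergen
3,NA,
4,51,Oslo
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
scores = completeness_by_column(rows)
for column, score in scores.items():
    print(f"{column}: {score:.2f}")
```

On this sample, `id` is fully populated, `age` is half populated, and `city` is three-quarters populated. How such per-column fractions roll up into a single dimension score is a design choice that this sketch leaves open.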
Requires
  • · 3 stdlib-only Python tools (data profiler, missing-value analyzer, multi-method outlier detector)
  • · DQS framework (Gartner reference)
  • · statistical methods (no external ML deps)
Preconditions
  • · dataset is structured (tabular, not unstructured text)
  • · column types identified (numeric, categorical, datetime, etc.)
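If column types are not yet known, a simple heuristic can propose them before the audit runs. The sketch below is one possible approach, not the skill's own profiler: `infer_column_type` is a hypothetical helper that treats a column as numeric when every non-empty value parses as a float, as datetime when every value parses as an ISO 8601 date, and as categorical otherwise.

```python
from datetime import datetime


def _is_number(text):
    try:
        float(text)  # note: float() also accepts "nan" and "inf"
        return True
    except ValueError:
        return False


def _is_iso_date(text):
    try:
        datetime.fromisoformat(text)
        return True
    except ValueError:
        return False


def infer_column_type(values):
    """Classify string values as 'numeric', 'datetime', 'categorical', or 'unknown'."""
    # Empty and whitespace-only cells are ignored when deciding the type.
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "unknown"
    if all(_is_number(v) for v in non_empty):
        return "numeric"
    if all(_is_iso_date(v) for v in non_empty):
        return "datetime"
    return "categorical"


print(infer_column_type(["1", "2.5", ""]))
print(infer_column_type(["2024-01-05", "2023-12-31"]))
print(infer_column_type(["Oslo", "Bergen"]))
```

A heuristic like this should be reviewed by hand: numeric-looking identifiers such as postal codes would be classified as numeric even though they are better treated as categorical.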
Failure modes
  • · An aggregate DQS can mask important outliers in small datasets
  • · MCAR/MAR/MNAR classification is unreliable with fewer than 50 observations
  • · Multi-method outlier detection can produce conflicting flags that need human judgment to resolve
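The conflict between detection methods is easy to reproduce on a small sample. The sketch below compares a 1.5 × IQR fence with a z-score threshold of 3 on seven hypothetical readings. The function names, thresholds, and data are assumptions for illustration, not the skill's implementation, and `statistics.quantiles` requires Python 3.8 or later. With the sample standard deviation, the largest possible |z| among n points is (n − 1)/√n, so a threshold of 3 cannot flag anything when n ≤ 10, no matter how extreme a value is.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Flag values whose |z| exceeds the threshold (sample standard deviation)."""
    mean = statistics.mean(values)
    spread = statistics.stdev(values)
    if spread == 0:
        return []
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical sample: six similar readings and one extreme value.
readings = [10, 12, 11, 13, 12, 11, 100]

iqr_flags = iqr_outliers(readings)
z_flags = zscore_outliers(readings)
# Values flagged by exactly one method are the ones needing review.
disputed = sorted(set(iqr_flags) ^ set(z_flags))

print("IQR flags:", iqr_flags)
print("Z-score flags:", z_flags)
print("Needs review:", disputed)
```

Here the IQR fence flags 100 while the z-score method flags nothing, because the extreme value inflates the very standard deviation it is measured against. Reporting the disputed set rather than a union or intersection keeps the final call with a reviewer.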
Trust signals
  • · 3 stdlib-only Python tools (no external deps)
  • · DQS framework (Gartner reference)
  • · MCAR/MAR/MNAR missing-data classification
  • · Multi-method outlier detection (IQR, Isolation Forest, Z-score)