cyberneticlibrary

Track data provenance for publication

data-provenance · skill · setup · L235
ammawla/encode-toolkit
What it does

Logs the provenance of every ENCODE analysis operation: tool versions, reference files, parameters, and file accessions.

Best for

Publishing genomics workflows where "we aligned with STAR" is too vague. The skill records exact tool versions, genome builds, parameters, and accessions so the methods section is publication-ready.

Inputs
  • ENCODE file accessions
  • tool, version, and reference metadata
  • script source code
  • parameters and timestamps
Outputs
  • experiment_log.json (audit trail)
  • reproducible methods section (auto-generated)
  • operation logs (per-step provenance)
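
As a rough illustration of how the audit trail and the generated methods text could relate, the sketch below appends one per-step record to a JSON file and renders it as a sentence. The function names, field names, and sentence template are assumptions made for this example, not the toolkit's actual schema or API.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_operation(log_path, step, tool, version, reference, parameters, accessions):
    """Append one per-step provenance record to a JSON audit trail (hypothetical schema)."""
    path = Path(log_path)
    # Load the existing trail, or start a new one if the file is absent.
    records = json.loads(path.read_text()) if path.exists() else []
    record = {
        "step": step,
        "tool": tool,
        "version": version,
        "reference": reference,
        "parameters": parameters,
        "accessions": accessions,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    records.append(record)
    path.write_text(json.dumps(records, indent=2))
    return record


def methods_sentence(record):
    """Render one record as a sentence for a methods section (illustrative template)."""
    params = " ".join(f"{k} {v}" for k, v in sorted(record["parameters"].items()))
    files = ", ".join(record["accessions"])
    return (
        f"{record['step']} was performed with {record['tool']} {record['version']} "
        f"against {record['reference']} (parameters: {params}) on {files}."
    )
```

Calling `log_operation("experiment_log.json", "Alignment", "STAR", "2.7.10a", "GRCh38 / GENCODE v38", {"--outSAMtype": "BAM SortedByCoordinate"}, ["ENCFF000XXX"])` would append one record; the version string and the accession here are placeholders, not values taken from a real run.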
Requires
  • ENCODE portal
  • recorded versions of bedtools, STAR, and liftOver
  • custom script storage
Preconditions

An ENCODE experiment is selected; the tools are installed and their versions are known; script paths are defined.

Failure modes
  • tool versions not logged → generated methods text is vague
  • reference file version missing → analysis is not reproducible
  • parameters omitted from the log → methods section is incomplete
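
One way to catch these failure modes early is to check each record before any methods text is generated. The following is a minimal sketch under the assumption that records are dictionaries with the hypothetical keys shown; the card does not specify the toolkit's real field names.

```python
# Hypothetical field names; the real log schema may differ.
REQUIRED_FIELDS = ("tool", "version", "reference", "parameters")


def missing_fields(record):
    """Return the required provenance fields that are absent or empty."""
    # An empty string, empty dict, or None all count as missing here.
    return [name for name in REQUIRED_FIELDS if not record.get(name)]
```

Because an empty parameter dictionary is treated as missing, a step run entirely with defaults would need to state that explicitly in its record; that is a design choice of this sketch rather than a documented rule.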
Trust signals
  • GENCODE v38 vs v39 annotation difference example
  • MD5 checksum tracking per file
  • Publication-grade methods writing standard
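
Per-file MD5 tracking can be sketched with Python's standard `hashlib`. The helper name `md5_of_file` is an assumption for this example; how the toolkit actually computes or stores checksums is not stated in this card.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading in chunks so large files fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The resulting digest can be stored alongside the accession in the log and compared against the checksum published with the file's metadata, so that a silently changed or truncated download is detected before analysis.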