Track data provenance for publication
data-provenance · skill · setup · L2 · ★35
ammawla/encode-toolkit
What it does
Log comprehensive provenance for every ENCODE analysis operation
Best for
Publishing genomics workflows where 'we aligned with STAR' is too vague: lock down exact tool versions, genome builds, parameters, and accessions for publication-ready methods.
Inputs
- · ENCODE file accessions
- · tool/version/reference metadata
- · script source code
- · parameters and timestamps
Outputs
- · experiment_log.json (audit trail)
- · reproducible methods section (auto-generated)
- · operation logs (per-step provenance)
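As a rough illustration of how a per-step record could feed both the audit trail and the generated methods text, here is a minimal Python sketch. The field names, the `make_operation` and `methods_sentence` helpers, and the `ENCFF000XXX` placeholder accession are all assumptions made for this example; the toolkit's actual `experiment_log.json` schema is not shown in this card.

```python
import json
from datetime import datetime, timezone


def make_operation(step, tool, version, reference, parameters, inputs):
    """Build one per-step provenance record (hypothetical field names)."""
    return {
        "step": step,
        "tool": tool,
        "version": version,
        "reference": reference,
        "parameters": parameters,
        "inputs": inputs,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


def methods_sentence(op):
    """Render one record as a methods-style sentence."""
    params = " ".join(f"{flag} {value}" for flag, value in op["parameters"].items())
    inputs = ", ".join(op["inputs"])
    return (
        f"{op['step'].capitalize()} of {inputs} used {op['tool']} {op['version']} "
        f"with reference {op['reference']}; parameters: {params}."
    )


def dump_log(operations):
    """Serialize the operations as JSON text (assumed top-level layout)."""
    return json.dumps({"operations": operations}, indent=2)


if __name__ == "__main__":
    # Illustrative values only; ENCFF000XXX is a placeholder, not a real accession.
    op = make_operation(
        step="alignment",
        tool="STAR",
        version="2.7.10a",
        reference="GRCh38/GENCODE v38",
        parameters={"--outSAMtype": "BAM SortedByCoordinate"},
        inputs=["ENCFF000XXX"],
    )
    print(dump_log([op]))
    print(methods_sentence(op))
```

Keeping the methods sentence as a pure function of the logged record means the text cannot mention a version or parameter that was never recorded.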
Requires
- · ENCODE portal
- · bedtools, STAR, and liftOver versions tracked
- · custom script storage
Preconditions
ENCODE experiment selected; tool versions installed; script paths defined
Failure modes
- · vague methods text generated if tool versions not logged
- · missing reference file version breaks reproducibility
- · parameters omitted from log → methods incomplete
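The failure modes above all come down to a required field missing from a logged operation, so a check before methods generation can catch them early. The sketch below assumes each operation is a dict with `tool`, `version`, `reference`, and `parameters` keys; those names and the `missing_fields` helper are hypothetical, not the toolkit's API.

```python
# Fields whose absence leads to vague or incomplete methods text
# (hypothetical names chosen for this sketch).
REQUIRED_FIELDS = ("tool", "version", "reference", "parameters")


def missing_fields(op):
    """Return the required fields that are absent or empty in one record.

    An empty string or empty dict counts as missing, so defaults must be
    recorded explicitly rather than left blank.
    """
    return [field for field in REQUIRED_FIELDS if not op.get(field)]


def check_log(operations):
    """Map each incomplete operation's index to its list of missing fields."""
    problems = {}
    for index, op in enumerate(operations):
        gaps = missing_fields(op)
        if gaps:
            problems[index] = gaps
    return problems
```

A caller could refuse to generate the methods section whenever `check_log` returns a non-empty dict, which turns a silent omission into a visible error.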
Trust signals
- · GENCODE v38 vs v39 annotation difference example
- · MD5 checksum tracking per file
- · Publication-grade methods writing standard
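Per-file MD5 tracking can be sketched with Python's standard `hashlib`. The `md5_of` and `verify` helpers are hypothetical names for this example, and the expected checksum is assumed to come from wherever the file's published value is recorded.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute a file's MD5 hex digest, reading in chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_md5):
    """Return True if the file's checksum matches the expected value."""
    return md5_of(path) == expected_md5.strip().lower()
```

Storing the computed digest next to each accession in the log lets a later reader confirm they are working from byte-identical files.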