cyberneticlibrary

Analyze genomic intervals at scale

gtars · skill · setup · L327,559
K-Dense-AI/scientific-agent-skills
What it does

High-performance genomic interval analysis: overlap queries, coverage tracks, region tokenization, and single-cell fragment processing
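To make "overlap" concrete, here is a minimal pure-Python sketch of an overlap test on BED-style half-open intervals. It is not the gtars API; the names `overlaps` and `find_overlaps` are hypothetical, and the linear scan only illustrates the semantics that an indexed implementation would speed up.

```python
from typing import List, Tuple

# (chrom, start, end) with BED-style coordinates: 0-based start, exclusive end.
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if a and b are on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def find_overlaps(query: Interval, regions: List[Interval]) -> List[Interval]:
    """Linear scan over all regions; an index avoids this O(n) cost per query."""
    return [r for r in regions if overlaps(query, r)]


regions = [("chr1", 100, 200), ("chr1", 200, 300), ("chr2", 100, 200)]
hits = find_overlaps(("chr1", 150, 200), regions)
print(hits)
```

Because the end coordinate is exclusive, the query `chr1:150-200` touches `chr1:200-300` but does not overlap it.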

Best for

Genomic interval analysis where performance matters (millions of regions), or tokenization of regions for genomic ML models. The listing claims orders-of-magnitude speedups over bedtools and over pure-Python implementations; treat this as unbenchmarked here.

Inputs
  • · BED file (genomic intervals)
  • · FASTA reference
  • · fragment TSV (single-cell)
  • · universe BED file (vocabulary for tokenization models)
Outputs
  • · overlap index (IGD)
  • · BigWig coverage track
  • · ML tokens (TreeTokenizer)
  • · fragment quality scores
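As a rough illustration of what "ML tokens" means for genomic regions, the sketch below maps a query region to the indices of the vocabulary regions it overlaps. This is an assumption about the general idea, not the TreeTokenizer implementation or its API; `build_vocab` and `tokenize` are hypothetical names.

```python
from typing import Dict, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), half-open


def build_vocab(universe: List[Region]) -> Dict[str, List[Tuple[int, int, int]]]:
    """Group vocabulary regions by chromosome as (start, end, token_id)."""
    vocab: Dict[str, List[Tuple[int, int, int]]] = {}
    for token_id, (chrom, start, end) in enumerate(universe):
        vocab.setdefault(chrom, []).append((start, end, token_id))
    return vocab


def tokenize(region: Region, vocab: Dict[str, List[Tuple[int, int, int]]]) -> List[int]:
    """Return the token ids of every vocabulary region overlapping `region`."""
    chrom, start, end = region
    return [tid for (s, e, tid) in vocab.get(chrom, []) if s < end and start < e]


universe = [("chr1", 0, 100), ("chr1", 100, 200), ("chr2", 0, 100)]
vocab = build_vocab(universe)
print(tokenize(("chr1", 50, 150), vocab))
print(tokenize(("chr3", 0, 10), vocab))
```

A region on a chromosome that is absent from the vocabulary yields no tokens, which is one way a vocabulary/genome mismatch shows up.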
Requires
  • · Rust (Cargo)
  • · Python bindings
  • · UCSC liftOver (optional)
  • · geniml (ML integration)
Preconditions

Genomic regions in BED format; reference genome available; Python 3.8+ for bindings
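A minimal sketch of checking the BED precondition before handing data to any interval tool. It assumes tab-separated lines whose first three columns are chromosome, start, and end; `parse_bed_line` is a hypothetical helper, not part of gtars.

```python
from typing import Tuple


def parse_bed_line(line: str) -> Tuple[str, int, int]:
    """Parse the first three BED columns and reject malformed coordinates."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 tab-separated columns, got {len(fields)}")
    chrom = fields[0]
    start, end = int(fields[1]), int(fields[2])  # int() raises ValueError on non-numeric input
    if start < 0 or end < start:
        raise ValueError(f"invalid coordinates: start={start}, end={end}")
    return chrom, start, end


print(parse_bed_line("chr1\t100\t200\tpeak1\n"))
```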

Failure modes
  • · overlap query on unsorted intervals → may miss overlaps at interval boundaries
  • · coverage resolution too fine → very large output file
  • · tokenizer trained on a different genome assembly → token coordinates misaligned
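The first two failure modes can be guarded against with cheap pre-checks. The sketch below is pure Python with hypothetical helper names; it assumes chromosomes are ordered lexicographically and that coverage is written as one value per fixed-size bin.

```python
import math
from typing import Dict, Iterable, Tuple


def is_sorted_bed(intervals: Iterable[Tuple[str, int, int]]) -> bool:
    """True if intervals are ordered by (chrom, start), chromosomes compared as strings."""
    prev = None
    for chrom, start, _end in intervals:
        key = (chrom, start)
        if prev is not None and key < prev:
            return False
        prev = key
    return True


def estimated_bins(chrom_sizes: Dict[str, int], bin_size: int) -> int:
    """Number of fixed-size bins needed to cover every chromosome."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    return sum(math.ceil(size / bin_size) for size in chrom_sizes.values())


sizes = {"chr1": 1000, "chr2": 250}
print(is_sorted_bed([("chr1", 5, 8), ("chr1", 10, 20), ("chr2", 0, 5)]))
print(estimated_bins(sizes, 100), estimated_bins(sizes, 1))
```

Comparing the bin count at the intended resolution against a coarser one gives a quick sense of how much the output would grow before any track is written.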
Trust signals
  • · Rust + Python bindings (gtars-cli for CLI, gtars package for Python)
  • · TreeTokenizer documented
  • · BigWig format output (UCSC standard)