Analyze genomic intervals at scale
gtars · skill · setup L3 · ★27,559
K-Dense-AI/scientific-agent-skills
What it does
High-performance interval analysis: overlaps, coverage, tokenization, fragments
Best for
Genomic interval analysis where performance matters (millions of regions), or tokenization of genomic regions for ML models. Reported to be orders of magnitude faster than pure-Python implementations of bedtools-style operations.
Inputs
- BED file (genomic intervals)
- FASTA reference
- fragment TSV (single-cell)
- pixel art image (tokenization models)
Outputs
- overlap index (IGD)
- BigWig coverage track
- ML tokens (TreeTokenizer)
- fragment quality scores
Requires
- Rust (Cargo)
- Python bindings
- UCSC liftOver (optional)
- geniml (ML integration)
Preconditions
Genomic regions in BED format; reference genome available; Python 3.8+ for bindings
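The BED precondition can be checked before a file is handed to any interval tool. The following is a minimal pure-Python sketch and does not use the gtars API; `parse_bed` is a hypothetical helper name, and it assumes only the first three BED columns (chrom, start, end; 0-based, half-open) matter for the check.

```python
from typing import List, Tuple

# (chrom, start, end) with 0-based, half-open coordinates, as in BED
Interval = Tuple[str, int, int]


def parse_bed(text: str) -> List[Interval]:
    """Parse the first three BED columns; skip blank, comment, and header lines.

    Hypothetical helper for illustration only, not part of gtars.
    """
    intervals: List[Interval] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        # Skip empty lines, comments, and UCSC "track"/"browser" header lines.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            continue
        if len(fields) < 3:
            raise ValueError(f"line {lineno}: expected at least 3 columns")
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        # BED requires non-negative coordinates with start <= end.
        if start < 0 or end < start:
            raise ValueError(f"line {lineno}: invalid coordinates {start}-{end}")
        intervals.append((chrom, start, end))
    return intervals
```

Running a check like this first turns a malformed row into an error with a line number, rather than a silent downstream mismatch.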
Failure modes
- overlap query on unsorted intervals → may miss edges
- coverage resolution too fine → output file huge
- tokenizer trained on different genome → tokens misaligned
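The first two failure modes can be guarded against with simple checks before running a query or generating a track. This is a rough pure-Python sketch under stated assumptions, not gtars code: `is_sorted`, `sort_intervals`, and `estimate_bins` are hypothetical names, sorting is lexicographic by chromosome name (so `chr10` sorts before `chr2`), and the bin count is only a proxy for output size because real BigWig files are compressed.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end)


def is_sorted(intervals: List[Interval]) -> bool:
    """True if intervals are ordered by (chrom, start, end)."""
    return all(a <= b for a, b in zip(intervals, intervals[1:]))


def sort_intervals(intervals: List[Interval]) -> List[Interval]:
    """Return a new list ordered by chrom (lexicographic), then start, then end."""
    return sorted(intervals)


def estimate_bins(chrom_sizes: Dict[str, int], bin_size: int) -> int:
    """Count fixed-width bins needed to tile every chromosome at this resolution."""
    if bin_size <= 0:
        raise ValueError("bin_size must be positive")
    # Ceiling division: a partial bin at the chromosome end still counts.
    return sum(-(-size // bin_size) for size in chrom_sizes.values())


if __name__ == "__main__":
    regions = [("chr1", 500, 900), ("chr1", 100, 300), ("chr2", 0, 50)]
    if not is_sorted(regions):
        regions = sort_intervals(regions)
    print(regions)

    sizes = {"chr1": 1000, "chr2": 250}  # toy chromosome lengths
    print(estimate_bins(sizes, 100), estimate_bins(sizes, 1))
```

Comparing the bin count at two resolutions before writing a track gives a quick sense of how much larger the finer output will be.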
Trust signals
- Rust + Python bindings (gtars-cli for CLI, gtars package for Python)
- TreeTokenizer documented
- BigWig format output (UCSC standard)