cyberneticlibrary

Standardize and harmonize gene names

chansigit/stangene
What it does

Harmonize gene identifiers across single-cell transcriptomics datasets

Best for

Cross-dataset gene alignment in single-cell RNA-seq when feature naming standards differ.

Inputs
  • .h5ad, .tsv, or .csv files with feature names; target species (human or mouse)
Outputs
  • harmonization_table.tsv, summary.json, conflicts.tsv, unmapped.tsv, and optionally *_harmonized.h5ad
Requires
  • stangene (Python package); Ollama references (downloads of ~15 MB for human, ~7 MB for mouse)
Preconditions
  • stangene installed
  • References built for the target species
  • Species specified explicitly
Failure modes
  • Ambiguities auto-resolved without user approval
  • Original identifiers overwritten
  • Cross-species harmonization attempted without a separate pass per species
Trust signals
  • Conservative merge is optional (policy: strict or symbol)
  • Never auto-resolves; surfaces conflicts for user decision
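
The behavior described above (original identifiers kept, ambiguities surfaced instead of resolved, unmapped names reported separately) can be sketched in a few lines of plain Python. This is a minimal illustration only, not stangene's API: the `harmonize` function, the `ALIASES` table, and its toy entries are hypothetical stand-ins for a real per-species reference.

```python
# Hypothetical toy alias table for ONE species: upper-cased name -> candidate symbols.
# Illustrative entries only; a real reference would be built per species.
ALIASES = {
    "TP53": {"TP53"},
    "P53": {"TP53"},
    "KMT2D": {"KMT2D"},
    "MLL2": {"KMT2D", "KMT2B"},  # alias with more than one candidate: ambiguous
}


def harmonize(features, aliases):
    """Split feature names into mapped rows, conflicts, and unmapped names.

    The original name is always kept next to the harmonized one, and an
    ambiguous name is never resolved automatically.
    """
    table, conflicts, unmapped = [], [], []
    for name in features:
        candidates = aliases.get(name.upper(), set())
        if len(candidates) == 1:
            # Exactly one candidate: safe to map, original preserved.
            table.append({"original": name, "harmonized": next(iter(candidates))})
        elif candidates:
            # Several candidates: surface for a user decision.
            conflicts.append({"original": name, "candidates": sorted(candidates)})
        else:
            unmapped.append(name)
    return table, conflicts, unmapped


features = ["TP53", "p53", "MLL2", "LOC999"]
table, conflicts, unmapped = harmonize(features, ALIASES)
```

In this sketch the three return values loosely correspond to the rows one might expect in harmonization_table.tsv, conflicts.tsv, and unmapped.tsv; the actual file schemas are not specified here and may differ.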