cyberneticlibrary

Run protein folding and embedding models

esm · skill · setup · 227,559
K-Dense-AI/scientific-agent-skills
What it does

Generate protein embeddings and search protein sequence databases
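The card does not say how the search works. One common approach is nearest-neighbour lookup over pooled per-sequence embeddings. The sketch below assumes that approach and uses cosine similarity. The `database` dict, the sequence IDs, and the toy 3-dimensional vectors are hypothetical stand-ins for real ESM embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query, database, k=3):
    """Return the k (seq_id, score) pairs most similar to the query embedding.

    `database` (hypothetical) maps a sequence ID to its pooled embedding.
    """
    scored = [(seq_id, cosine(query, emb)) for seq_id, emb in database.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

# Toy 3-dim vectors stand in for real 1280-dim ESM embeddings.
db = {"P1": [1.0, 0.0, 0.0], "P2": [0.7, 0.7, 0.0], "P3": [0.0, 0.0, 1.0]}
print(search([1.0, 0.1, 0.0], db, k=2))  # P1 ranks first, then P2
```

A linear scan like this is fine for small collections. Large databases would normally use an approximate nearest-neighbour index instead.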

Best for

Computing per-residue embeddings for downstream ML tasks (docking, binding prediction, active-site detection) when you need a pretrained protein encoder.

Inputs
  • protein sequence(s) in FASTA format
  • ESM model variant (ESM1b/ESM2/ESMFold)
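fair-esm's batch converter takes a list of `(label, sequence)` tuples. A minimal FASTA reader that produces that shape is sketched below. It assumes each label is the first whitespace-delimited word of the header line. The function name `parse_fasta` and the toy records are hypothetical.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (label, sequence) tuples.

    Multi-line sequences are concatenated and upper-cased; blank lines are skipped.
    """
    records = []
    label = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if label is not None:
                records.append((label, "".join(chunks)))
            header = line[1:].strip()
            label = header.split()[0] if header else ""
            chunks = []
        elif label is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if label is not None:
        records.append((label, "".join(chunks)))
    return records

example = """>seq1 toy example
MKTAYIAKQR
QISFVKSHFSRQ
>seq2
mktvrqerlk
"""
print(parse_fasta(example))
```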
Outputs
  • per-token (per-residue) embeddings whose width depends on the variant (1280-dim for ESM-1b and ESM2 650M; 320 to 5120 across the ESM2 family)
  • folded 3D structure (ESMFold)
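In fair-esm, token 0 of each row is a beginning-of-sequence token and an end-of-sequence token follows the last residue. Per-residue vectors are therefore the slice `1 : len + 1`, and averaging them gives a per-sequence embedding. The sketch below illustrates that indexing with plain Python lists in place of tensors. The helper names and the toy 2-dim vectors are hypothetical.

```python
def per_residue(token_reprs, seq_len):
    """Drop the BOS token (index 0), the EOS token, and any padding."""
    return token_reprs[1 : seq_len + 1]

def mean_pool(residue_reprs):
    """Average per-residue vectors into one per-sequence vector."""
    n = len(residue_reprs)
    dim = len(residue_reprs[0])
    return [sum(vec[j] for vec in residue_reprs) / n for j in range(dim)]

# Toy rows for a 3-residue sequence: BOS, r1, r2, r3, EOS, PAD
tokens = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0], [0.0, 0.0]]
residues = per_residue(tokens, seq_len=3)
print(len(residues))        # 3
print(mean_pool(residues))  # [3.0, 4.0]
```

With real tensors the same slice is applied per row of the representation returned for the requested layer, then `.mean(0)` replaces `mean_pool`.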
Requires
  • PyTorch
  • fair-esm package
  • optional: CUDA GPU (CPU works, but a GPU is roughly 10x faster)
Preconditions

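Python 3.8+; ESM model weights auto-download on first use and are cached locally; download size scales with the variant (from tens of MB for the smallest ESM2 model to several GB for the 650M-parameter and larger models); CUDA optional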
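Weight size follows from parameter count times bytes per parameter: 4 bytes in float32, 2 in float16. The rough estimator below uses nominal parameter counts, so real checkpoint files will differ somewhat. It also ignores the extra memory that activations need at inference time. The function name is hypothetical.

```python
def weight_size_gb(n_params, bytes_per_param=4):
    """Approximate size of model weights in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bytes_per_param / 1e9

# Nominal parameter counts for three ESM2 sizes.
for name, params in [("ESM2 8M", 8e6), ("ESM2 650M", 650e6), ("ESM2 15B", 15e9)]:
    fp32 = weight_size_gb(params)
    fp16 = weight_size_gb(params, bytes_per_param=2)
    print(f"{name}: ~{fp32:.2f} GB fp32, ~{fp16:.2f} GB fp16")
```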

Failure modes

Embeddings come from frozen pretrained weights (fine-tuning is out of scope here); ESM-1b accepts at most 1024 tokens (1022 residues plus the BOS and EOS tokens), so longer sequences need chunking; memory-hungry on CPU
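Chunking can be done with windows of at most 1022 residues. Each window keeps its start offset so per-residue outputs can be mapped back onto the full sequence. The sketch below assumes overlapping windows, which give residues near a boundary some context from both sides. How overlapping outputs are merged is left open. Averaging the overlap is one option. The function name and `overlap` parameter are hypothetical.

```python
def chunk_sequence(seq, max_residues=1022, overlap=0):
    """Split seq into (start_offset, subsequence) windows of <= max_residues.

    Consecutive windows share `overlap` residues.
    """
    if not 0 <= overlap < max_residues:
        raise ValueError("overlap must satisfy 0 <= overlap < max_residues")
    if len(seq) <= max_residues:
        return [(0, seq)]
    step = max_residues - overlap
    chunks = []
    start = 0
    while True:
        end = min(start + max_residues, len(seq))
        chunks.append((start, seq[start:end]))
        if end == len(seq):  # last window reaches the end of the sequence
            break
        start += step
    return chunks

chunks = chunk_sequence("A" * 2500, overlap=100)
print([(start, len(sub)) for start, sub in chunks])
# [(0, 1022), (922, 1022), (1844, 656)]
```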

Trust signals
  • State-of-the-art protein language model
  • ESMFold included for structure prediction
  • Token-level embeddings for fine-grained analysis
  • ESM2 variants from 8M up to 15B parameters
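The ESM2 checkpoints in fair-esm encode their layer count in the name, for example `t33` in `esm2_t33_650M_UR50D`. That number is the index to pass as `repr_layers` when you want final-layer embeddings. The table below lists the ESM2 sizes with their embedding widths. The parsing helper `final_layer` is a hypothetical convenience, not part of fair-esm.

```python
# ESM2 checkpoints in fair-esm: name -> (layers, nominal params, embedding dim)
ESM2_VARIANTS = {
    "esm2_t6_8M_UR50D": (6, "8M", 320),
    "esm2_t12_35M_UR50D": (12, "35M", 480),
    "esm2_t30_150M_UR50D": (30, "150M", 640),
    "esm2_t33_650M_UR50D": (33, "650M", 1280),
    "esm2_t36_3B_UR50D": (36, "3B", 2560),
    "esm2_t48_15B_UR50D": (48, "15B", 5120),
}

def final_layer(model_name):
    """Parse the layer count from a name such as 'esm2_t33_650M_UR50D'."""
    for part in model_name.split("_"):
        if part.startswith("t") and part[1:].isdigit():
            return int(part[1:])
    raise ValueError(f"no layer count found in {model_name!r}")

for name, (layers, params, dim) in ESM2_VARIANTS.items():
    print(f"{name}: {params} params, repr_layers=[{final_layer(name)}], {dim}-dim")
```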