Run protein folding and embedding models

esmskillsetup L2★27,559

What it does

Generate protein embeddings and search protein sequence databases

Best for

Computing per-residue embeddings for downstream ML tasks (docking, binding prediction, active-site detection) when you need a pretrained protein encoder.

Inputs

· protein FASTA sequence
· ESM model variant (ESM1b/ESM2/ESMFold)
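
A minimal sketch of turning FASTA text into the (label, sequence) pairs that fair-esm's batch converter takes as input. The helper name `parse_fasta` and the example record are hypothetical, introduced only for illustration; a dedicated parser such as Biopython's would be the more robust choice for real files.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (label, sequence) tuples."""
    records = []
    label, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if label is not None:
                records.append((label, "".join(parts)))
            label, parts = line[1:].strip(), []
        elif label is not None:
            # Sequence lines may wrap; join them and normalize to upper case.
            parts.append(line.upper())
    if label is not None:
        records.append((label, "".join(parts)))
    return records


# Hypothetical two-line record, used only to show the output shape.
example = ">demo_protein\nMKTVRQERLK\nSIVRILERSK\n"
records = parse_fasta(example)
```

Here `records` is a one-element list holding the label `demo_protein` and the joined 20-residue sequence.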

Outputs

· per-residue (per-token) embeddings; width depends on the model variant (e.g. 768-dim or 1280-dim)
· folded 3D structure (ESMFold)
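
The embedding output can be sketched as follows, assuming the fair-esm package and its `esm.pretrained` loaders. The function name `embed_per_residue` is hypothetical, and the smallest ESM-2 checkpoint is chosen as the default only to keep the download light. The table lists layer count and embedding width for the ESM-2 checkpoints; other ESM families use different widths.

```python
# (number of layers, embedding width) for the ESM-2 checkpoints in fair-esm.
ESM2_VARIANTS = {
    "esm2_t6_8M_UR50D": (6, 320),
    "esm2_t12_35M_UR50D": (12, 480),
    "esm2_t30_150M_UR50D": (30, 640),
    "esm2_t33_650M_UR50D": (33, 1280),
    "esm2_t36_3B_UR50D": (36, 2560),
    "esm2_t48_15B_UR50D": (48, 5120),
}


def embed_per_residue(label, sequence, model_name="esm2_t6_8M_UR50D"):
    """Return a (len(sequence), width) tensor of final-layer embeddings."""
    # Imported lazily so the table above is usable without PyTorch installed.
    import torch
    import esm

    num_layers, _width = ESM2_VARIANTS[model_name]
    # The loader downloads the weights on first use.
    model, alphabet = getattr(esm.pretrained, model_name)()
    model.eval()  # disable dropout for deterministic output

    batch_converter = alphabet.get_batch_converter()
    _, _, tokens = batch_converter([(label, sequence)])

    with torch.no_grad():
        out = model(tokens, repr_layers=[num_layers], return_contacts=False)
    reps = out["representations"][num_layers]

    # Position 0 is the beginning-of-sequence token; drop it and the
    # trailing end-of-sequence token to keep one row per residue.
    return reps[0, 1 : len(sequence) + 1]
```

Calling `embed_per_residue("demo", "MKTVRQERLK")` would be expected to return one row per residue; averaging over rows gives a single per-sequence vector if a downstream model needs a fixed-size input.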

Requires

· PyTorch
· fair-esm package
· optional: GPU (CPU works, but a GPU is roughly 10x faster)

Preconditions

Python 3.8+; ESM model weights download automatically on first use (~500MB, more for larger variants); CUDA optional

Failure modes

Embeddings are frozen at model release (no fine-tuning in this context); sequences longer than 1024 tokens must be chunked; inference is memory-hungry on CPU
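
One way to handle the length limit is to split a long sequence into overlapping windows and embed each window separately. The sketch below assumes a 1022-residue window (1024 tokens minus the begin and end tokens) and an overlap of 128 residues. The function name `chunk_sequence` and both defaults are illustrative choices, not values prescribed by the package.

```python
def chunk_sequence(seq, max_len=1022, overlap=128):
    """Split seq into (start_offset, chunk) windows of at most max_len residues."""
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    # Short sequences fit in a single window.
    if len(seq) <= max_len:
        return [(0, seq)]

    step = max_len - overlap  # how far each window advances
    chunks = []
    start = 0
    while True:
        chunks.append((start, seq[start : start + max_len]))
        if start + max_len >= len(seq):
            break  # this window reached the end of the sequence
        start += step
    return chunks


# A synthetic 2000-residue sequence, used only to show the windowing.
windows = chunk_sequence("A" * 2000)
starts = [start for start, _ in windows]
```

The start offsets let per-residue embeddings from each window be mapped back to positions in the full sequence. How to merge the overlapping regions (for example by averaging) is a design choice left open here.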

Trust signals

· State-of-the-art protein language model
· ESMFold included for structure prediction
· Token-level embeddings for fine-grained analysis
· ESM-2 variants from 8M up to 15B parameters