Optimize PyTorch model training

ml-training-recipes · skill · setup L29,423
Orchestra-Research/AI-Research-SKILLs
What it does

Executes battle-tested PyTorch training recipes across LLM, vision, diffusion, medical imaging, and other domains

Best for

Starting model training quickly with expert-vetted defaults instead of tuning hyperparameters from scratch

Inputs
  • domain type (LLM, vision, diffusion, etc.)
  • training config (lr, batch_size, epochs, etc.)
  • dataset
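The three inputs could be bundled into one validated object before a run starts. A minimal sketch, assuming a hypothetical `TrainingConfig` dataclass: the domain labels, the `dataset_path` field, and the accepted ranges (0 < lr < 1, batch_size and epochs at least 1) are illustrative placeholders, not the skill's own schema or the recipes' defaults.

```python
from dataclasses import dataclass

# Hypothetical domain labels for illustration; the skill's real identifiers may differ.
KNOWN_DOMAINS = {"llm", "vision", "diffusion", "medical", "protein", "spatial", "genomics"}


@dataclass(frozen=True)
class TrainingConfig:
    """Illustrative bundle of the skill's inputs: domain, hyperparameters, dataset."""

    domain: str
    lr: float
    batch_size: int
    epochs: int
    dataset_path: str  # hypothetical: location of the prepared dataset

    def __post_init__(self) -> None:
        # Fail fast on obviously bad values instead of partway through a run.
        # The bounds below are placeholder sanity checks, not recipe defaults.
        if self.domain.lower() not in KNOWN_DOMAINS:
            raise ValueError(f"unknown domain: {self.domain!r}")
        if not 0.0 < self.lr < 1.0:
            raise ValueError(f"lr outside (0, 1): {self.lr}")
        if self.batch_size < 1:
            raise ValueError(f"batch_size must be >= 1, got {self.batch_size}")
        if self.epochs < 1:
            raise ValueError(f"epochs must be >= 1, got {self.epochs}")


# Example values are arbitrary, chosen only to show the call shape.
cfg = TrainingConfig(domain="vision", lr=3e-4, batch_size=64, epochs=10, dataset_path="data/train")
```

Validating up front turns a bad hyperparameter into an immediate, readable error rather than a NaN loss or crash several minutes into training.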
Outputs
  • trained model checkpoint
  • training metrics (loss, accuracy, etc.)
  • validation results
Requires
  • torch
  • pytorch-lightning
  • transformers
  • domain-specific libraries
Preconditions

GPU available (NVIDIA CUDA or Apple Metal via MPS); datasets prepared; hyperparameters within sane ranges; enough GPU memory for the chosen batch_size
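The hardware precondition can be checked before training starts. A minimal sketch, relying only on the standard `torch.cuda.is_available()`, `torch.backends.mps.is_available()`, and `torch.cuda.mem_get_info()` calls; the helper names `pick_device` and `free_gpu_memory_gb` are hypothetical. The torch import is optional so the check falls back to CPU where torch is not installed.

```python
from typing import Optional

try:
    import torch  # optional, so the check still runs where torch is missing
except ImportError:
    torch = None


def pick_device() -> str:
    """Return "cuda", "mps", or "cpu", preferring an available accelerator."""
    if torch is None:
        return "cpu"
    if torch.cuda.is_available():  # NVIDIA GPU through CUDA
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # absent in old torch builds
    if mps is not None and mps.is_available():  # Apple GPU through Metal (MPS)
        return "mps"
    return "cpu"


def free_gpu_memory_gb() -> Optional[float]:
    """Free memory on the current CUDA device in GiB, or None without CUDA."""
    if torch is None or not torch.cuda.is_available():
        return None
    free_bytes, _total_bytes = torch.cuda.mem_get_info()
    return free_bytes / 1024**3


device = pick_device()
if device == "cpu":
    print("warning: no GPU found; training will fall back to CPU")
```

The returned string can be passed to `torch.device(...)` and `model.to(...)`. Free memory is only a rough signal: whether a given batch_size fits also depends on model size, input resolution or sequence length, and numeric precision.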

Failure modes

NaN loss if the learning rate is too high; underfitting if there are too few epochs; overfitting if regularization is insufficient; out-of-memory (OOM) errors if batch_size is too large
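Three of these failure modes can be caught programmatically. A minimal sketch in plain Python, assuming training and validation losses are available as floats: `LossGuard` raises on a non-finite loss and implements patience-based early stopping against overfitting, and `largest_fitting_batch_size` halves the batch size after an OOM. All names, the default patience, and the `try_step`/`is_oom` callables are hypothetical; in recent PyTorch releases `is_oom` could test `isinstance(exc, torch.cuda.OutOfMemoryError)`.

```python
import math
from typing import Callable, Optional


class LossGuard:
    """Flags divergence and stalled validation loss; defaults are illustrative."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta
        self.best_val = math.inf
        self.bad_epochs = 0

    def check_train_loss(self, loss: float) -> None:
        # NaN/inf loss usually signals divergence, e.g. a learning rate that is too high.
        if not math.isfinite(loss):
            raise FloatingPointError(f"non-finite training loss: {loss}; try lowering lr")

    def should_stop(self, val_loss: float) -> bool:
        # Early stopping: count epochs without an improvement larger than min_delta
        # and stop once that count reaches `patience`.
        if val_loss < self.best_val - self.min_delta:
            self.best_val = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def largest_fitting_batch_size(
    try_step: Callable[[int], None],
    start: int,
    is_oom: Callable[[BaseException], bool],
) -> Optional[int]:
    """Halve the batch size until one step succeeds; None if even 1 does not fit."""
    batch_size = start
    while batch_size >= 1:
        try:
            try_step(batch_size)  # hypothetical: runs one training step
            return batch_size
        except Exception as exc:
            if not is_oom(exc):
                raise  # unrelated errors should not be swallowed
            batch_size //= 2
    return None
```

Underfitting has no comparable automatic check here; it typically shows up as training and validation loss both plateauing at a high value.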

Trust signals
  • Covers 8+ domains, including LLM, vision, diffusion, medical imaging, protein, spatial, and genomics
  • "Battle-tested" means the recipes have been used in multiple successful deployments
  • Recipes include a checkpointing strategy
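One possible checkpointing strategy, sketched under the assumption that each epoch writes one checkpoint and reports a validation loss: keep the newest checkpoint (for resuming) plus the `keep_best` lowest-loss ones (for evaluation), and delete the rest. `CheckpointPolicy` is a hypothetical helper, not the recipes' actual strategy.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class CheckpointPolicy:
    """Hypothetical retention rule: newest checkpoint plus the keep_best lowest val_loss."""

    keep_best: int = 2
    # Retained checkpoints as (epoch, val_loss, path) tuples.
    history: List[Tuple[int, float, str]] = field(default_factory=list)

    def register(self, epoch: int, val_loss: float, path: str) -> List[str]:
        """Record a freshly saved checkpoint and return the paths safe to delete."""
        self.history.append((epoch, val_loss, path))
        latest = max(self.history, key=lambda c: c[0])  # needed to resume training
        best = sorted(self.history, key=lambda c: c[1])[: self.keep_best]
        keep = {latest[2]} | {c[2] for c in best}
        to_delete = [c[2] for c in self.history if c[2] not in keep]
        self.history = [c for c in self.history if c[2] in keep]
        return to_delete
```

In a PyTorch loop, each path would usually be written first with `torch.save({"model": model.state_dict(), "optimizer": optimizer.state_dict(), "epoch": epoch}, path)`, and the returned paths removed with `os.remove`. Saving optimizer state alongside the weights is what makes an interrupted run resumable rather than restartable.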