Implement GPT from scratch

Orchestra-Research/AI-Research-SKILLs
What it does

Train GPT models from scratch in readable, hackable PyTorch: a ~300-line model definition and a ~300-line training loop.

Best for

Learning transformer internals, quick prototyping, or experimenting with variants without framework overhead.

Inputs
  • text dataset
  • config (batch_size, learning_rate, n_layer, n_head, n_embd)
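A minimal sketch of how the config fields listed above might be grouped and sanity-checked before training. The field names come from the Inputs list; the `TrainConfig` class, its `validate` method, and the default values are hypothetical and shown only for illustration (the layer, head, and embedding sizes match the published GPT-2 124M shape).

```python
from dataclasses import dataclass


@dataclass
class TrainConfig:
    # Field names follow the Inputs list. The defaults are illustrative
    # assumptions, not settings taken from the skill itself.
    batch_size: int = 12
    learning_rate: float = 6e-4
    n_layer: int = 12   # number of transformer blocks
    n_head: int = 12    # attention heads per block
    n_embd: int = 768   # embedding (model) width

    def validate(self) -> None:
        # Multi-head attention splits n_embd evenly across the heads.
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")
        if min(self.batch_size, self.n_layer, self.n_head) <= 0:
            raise ValueError("batch_size, n_layer and n_head must be positive")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")

    @property
    def head_dim(self) -> int:
        # Width of each attention head.
        return self.n_embd // self.n_head
```

Calling `TrainConfig().validate()` before launching a run catches a mismatched `n_embd`/`n_head` pair early instead of failing inside the attention layer.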
Outputs
  • model checkpoint
  • generated text samples
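Text samples are typically produced by repeatedly sampling the next token from the model's output logits. The sketch below shows temperature scaling with optional top-k filtering in plain Python; `sample_next_token` and its parameters are hypothetical names for illustration, and the skill's own generation code may differ.

```python
import math
import random
from typing import List, Optional


def sample_next_token(
    logits: List[float],
    temperature: float = 1.0,
    top_k: Optional[int] = None,
    rng: Optional[random.Random] = None,
) -> int:
    """Sample one token index from unnormalized logits."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    rng = rng or random.Random()
    # Lower temperature sharpens the distribution; higher flattens it.
    scaled = [x / temperature for x in logits]
    if top_k is not None:
        k = max(1, min(top_k, len(scaled)))
        cutoff = sorted(scaled, reverse=True)[k - 1]
        # Mask everything below the k-th largest logit
        # (ties at the cutoff are all kept).
        scaled = [x if x >= cutoff else float("-inf") for x in scaled]
    # Softmax weights, shifted by the max for numerical stability.
    m = max(scaled)
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

With `top_k=1` this reduces to greedy decoding; larger `top_k` and `temperature` values trade coherence for variety.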
Requires
  • torch
  • numpy
  • transformers
  • datasets
  • tiktoken
  • wandb
Preconditions
  • PyTorch installed
  • CPU or GPU available
Failure modes
  • CUDA out-of-memory (OOM) error if batch_size is too large for the available GPU memory
  • poor generation quality if max_iters is too low (the model is undertrained)
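One common way to avoid the OOM failure without shrinking the effective batch is gradient accumulation: run several smaller micro-batches and step the optimizer once. The helper below is a sketch of the arithmetic only; `split_batch` and `max_micro_batch` (the largest batch that fits in memory, found by trial) are hypothetical names, and the source does not state that the skill handles OOM this way.

```python
from typing import Tuple


def split_batch(target_batch_size: int, max_micro_batch: int) -> Tuple[int, int]:
    """Return (micro_batch_size, accumulation_steps).

    The product of the two always equals target_batch_size, so the
    effective batch seen by the optimizer is unchanged.
    """
    if target_batch_size <= 0 or max_micro_batch <= 0:
        raise ValueError("both sizes must be positive")
    # Pick the largest divisor of the target that still fits in memory.
    for micro in range(min(target_batch_size, max_micro_batch), 0, -1):
        if target_batch_size % micro == 0:
            return micro, target_batch_size // micro
    # Unreachable: 1 divides every positive integer.
    raise RuntimeError("no valid split found")
```

For example, a target batch of 64 with at most 12 sequences fitting in memory gives 8 micro-batches of 8, since 8 is the largest divisor of 64 not exceeding 12.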
Trust signals
  • compact codebase (~300-line model definition plus ~300-line training loop, minimal abstractions)
  • reproduces GPT-2 (124M parameters)
  • follows Andrej Karpathy's nanoGPT design ethos