cyberneticlibrary

Extend model context windows to 32k-128k tokens

long-context · skill · setup · L39,423
Orchestra-Research/AI-Research-SKILLs
What it does

Extends an LLM's context window beyond the length it was pre-trained on.

Best for

Processing long documents (32k to 128k+ tokens) by extending a pre-trained model's context window with RoPE scaling methods such as position interpolation or YaRN.

Inputs
  • base model
  • rope_scaling config
  • long documents (32k+ tokens)
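
As a minimal sketch of the rope_scaling input listed above, the helper below derives a scaling factor from the original and target context lengths. The function name `build_rope_scaling` and the example lengths are hypothetical. The dictionary keys (`rope_type`, `factor`, `original_max_position_embeddings`) are assumed to follow the convention of recent Hugging Face transformers releases; older releases used `type` instead of `rope_type`, so check the version in use.

```python
def build_rope_scaling(original_len: int, target_len: int, method: str = "yarn") -> dict:
    """Build a rope_scaling-style dict (hypothetical helper, not a library API)."""
    if target_len <= original_len:
        raise ValueError("target_len must be larger than original_len")
    if method not in {"linear", "dynamic", "yarn"}:
        raise ValueError(f"unsupported method: {method}")

    # The scaling factor is the ratio of the desired to the pre-trained context length.
    config = {"rope_type": method, "factor": target_len / original_len}

    # Assumption: YaRN-style configs also record the pre-trained length.
    if method == "yarn":
        config["original_max_position_embeddings"] = original_len
    return config


# Example: extend a model pre-trained on 8,192 tokens to 32,768 tokens.
print(build_rope_scaling(8192, 32768))
```

The resulting dict would typically be set on the model config before loading; whether a given base model accepts it depends on the preconditions listed further down.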
Outputs
  • extended-context model
  • RoPE/YaRN-scaled position embeddings
Requires
  • transformers
  • torch
  • flash-attn (optional)
Preconditions
  • base model is compatible
  • position encoding can be updated
Failure modes
  • extrapolation artifacts
  • attention OOM on very long sequences
  • incompatible config
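
To make the attention OOM failure mode concrete, here is a rough back-of-the-envelope estimate of the memory needed if the full attention score matrix is materialized for one layer. The head count (32) and fp16 storage (2 bytes per element) are illustrative assumptions, not properties of any particular model, and the function name is hypothetical.

```python
def attention_scores_bytes(seq_len: int, num_heads: int,
                           batch_size: int = 1, bytes_per_elem: int = 2) -> int:
    """Bytes for a naive (batch, heads, seq, seq) attention score tensor."""
    return batch_size * num_heads * seq_len * seq_len * bytes_per_elem


for n in (32_768, 131_072):
    gib = attention_scores_bytes(n, num_heads=32) / 2**30
    print(f"{n} tokens: {gib:,.0f} GiB per layer")
```

Under these assumptions the score matrix alone takes 64 GiB at 32,768 tokens and 1,024 GiB at 131,072 tokens, since the cost grows with the square of the sequence length. Memory-efficient kernels such as flash-attn avoid materializing this matrix, which is why that optional dependency matters at these lengths.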
Trust signals
  • RoPE: inter-token dependency decays with relative distance
  • YaRN: NTK-aware interpolation
  • ALiBi: extrapolates to longer inputs without fine-tuning (model must be trained with ALiBi)
  • position interpolation: validated in prior work
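
The idea behind position interpolation can be sketched in a few lines: RoPE rotates each pair of dimensions by an angle proportional to the token position, and linear interpolation divides positions by the scaling factor so that extended positions map back into the range seen during pre-training. This is an illustrative sketch only. The function name, the tiny head dimension, and the base of 10000 are assumptions, and YaRN's per-frequency scaling is not shown.

```python
import math


def rope_angles(position: int, head_dim: int = 8,
                base: float = 10000.0, scale: float = 1.0) -> list:
    """Rotation angles for one position; scale > 1 applies linear position interpolation."""
    scaled_pos = position / scale
    # One angle per pair of dimensions, with geometrically decreasing frequency.
    return [scaled_pos * base ** (-2 * i / head_dim) for i in range(head_dim // 2)]


trained_max, factor = 8192, 4.0

# Position 32768 with a 4x scale yields the same angles as position 8192 unscaled,
# so the model only sees rotation angles it encountered during pre-training.
extended = rope_angles(trained_max * 4, scale=factor)
original = rope_angles(trained_max)
print(all(math.isclose(a, b) for a, b in zip(extended, original)))
```

The trade-off is that neighbouring positions become closer together in angle space, which is one reason some fine-tuning at the longer length is usually applied after scaling.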