Fine-tune 70B models with <1% trainable parameters

peft-fine-tuning | skill | setup | 39,423
Orchestra-Research/AI-Research-SKILLs
What it does

Fine-tune large language models with <1% trainable parameters via LoRA

Best for

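Fine-tuning large models (7B-70B) on limited GPU memory by training only a small fraction of the parameters (for example 0.17%) and storing them in a compact adapter (6MB-100MB), enabling cost-effective task-specific customization without full-model training. Consumer GPUs are realistic for the smaller models, especially with QLoRA; a 70B model still needs a high-memory GPU even at 4-bit.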
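The small trainable fraction follows from LoRA's shape: for each adapted weight matrix of size d_out x d_in, it trains two low-rank factors holding r * (d_in + d_out) parameters in total. The sketch below works through that arithmetic. The model dimensions (hidden size 4096, 32 layers, about 6.7B total parameters) are illustrative assumptions for a 7B Llama-style model, not figures from this listing, and the helper names are hypothetical.

```python
def lora_param_count(r, d_in, d_out, modules_per_layer, n_layers):
    """Trainable parameters added by LoRA: A is (r x d_in), B is (d_out x r)."""
    per_module = r * (d_in + d_out)
    return per_module * modules_per_layer * n_layers


def adapter_size_mb(n_params, bytes_per_param=2):
    """Approximate adapter file size; 2 bytes/param assumes 16-bit storage."""
    return n_params * bytes_per_param / 1e6


# Assumed 7B Llama-style shape: four square 4096x4096 attention projections, 32 layers.
trainable = lora_param_count(r=16, d_in=4096, d_out=4096, modules_per_layer=4, n_layers=32)
total = 6.7e9  # assumed base-model parameter count

print(f"trainable params: {trainable:,}")
print(f"trainable fraction: {100 * trainable / total:.2f}%")
print(f"adapter size (16-bit): {adapter_size_mb(trainable):.1f} MB")
```

Lowering the rank or adapting fewer modules shrinks both the parameter count and the adapter file proportionally.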

Inputs
  • · Pretrained base model (7B-70B parameter LLM)
  • · Training dataset (instruction-response pairs or domain text)
  • · LoRA hyperparameters (rank r, alpha, target modules, dropout)
  • · Training arguments (learning rate, batch size, epochs, optimizer)
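A minimal sketch of how these inputs might be written down. The keys follow the parameter names of peft's LoraConfig and transformers' TrainingArguments; r=16 and alpha=32 come from this listing, while the dropout, learning rate, batch size, and epoch values are illustrative assumptions, and the target module names assume a Llama-style architecture. The import is guarded so the snippet still runs where peft is not installed.

```python
try:
    from peft import LoraConfig
except ImportError:
    LoraConfig = None  # the dictionaries below are still usable as a reference

lora_kwargs = {
    "r": 16,                  # rank of the low-rank update
    "lora_alpha": 32,         # scaling numerator; effective scale is alpha / r
    "target_modules": [       # Llama-style layer names (assumed)
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "lora_dropout": 0.05,     # illustrative value
    "bias": "none",
    "task_type": "CAUSAL_LM",
}

# Illustrative values; key names match transformers.TrainingArguments.
training_kwargs = {
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "num_train_epochs": 3,
    "optim": "adamw_torch",
}

scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(f"LoRA scaling (alpha / r): {scaling}")

if LoraConfig is not None:
    lora_config = LoraConfig(**lora_kwargs)
    print(lora_config)
```

Keeping alpha at twice the rank fixes the update scale at 2, so changing r alone does not also change how strongly the adapter output is weighted.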
Outputs
  • · LoRA adapter weights (6MB-100MB, not full model)
  • · Merged model (base + adapter) for inference or deployment
  • · Training logs and loss curves
  • · Quantized adapter weights (if using QLoRA)
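The merged-model output can be produced from a saved adapter roughly as follows. This is a sketch under assumptions: the function name, model id, and paths are hypothetical placeholders, and the base model is loaded in its default precision rather than 4-bit, since merging is usually done against unquantized weights.

```python
def merge_adapter(base_model_id: str, adapter_dir: str, output_dir: str) -> None:
    """Load a base model, apply a saved LoRA adapter, and save the merged weights."""
    # Imports are local so that defining the function does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM

    base = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base, adapter_dir)  # base + adapter, still separate
    merged = model.merge_and_unload()  # folds the LoRA update into the base weights
    merged.save_pretrained(output_dir)


# Hypothetical usage (model id and paths are placeholders):
# merge_adapter("your-org/your-base-model", "./lora-adapter", "./merged-model")
```

The merged directory loads like any ordinary checkpoint, at the cost of storing a full copy of the model instead of a small adapter.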
Requires
  • · peft>=0.13.0 (HuggingFace Parameter-Efficient Fine-Tuning)
  • · transformers>=4.45.0
  • · torch>=2.0.0
  • · bitsandbytes>=0.43.0 (for QLoRA 4-bit quantization)
  • · accelerate
  • · datasets
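One way to verify these requirements before training is a small check built on the standard library. The helper names are hypothetical, and the version parsing is deliberately simple (it compares only the first three numeric components), so treat it as a sketch rather than a full version resolver.

```python
import re
from importlib import metadata

# Minimum versions taken from the list above; None means "any version".
REQUIREMENTS = {
    "peft": "0.13.0",
    "transformers": "4.45.0",
    "torch": "2.0.0",
    "bitsandbytes": "0.43.0",
    "accelerate": None,
    "datasets": None,
}


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); ignores local and pre-release suffixes."""
    return tuple(int(n) for n in re.findall(r"\d+", text.split("+")[0])[:3])


def check_requirements(requirements):
    """Return {package: status} where status is 'ok', 'too old (x)', or 'missing'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        if minimum and parse_version(installed) < parse_version(minimum):
            report[name] = f"too old ({installed})"
        else:
            report[name] = "ok"
    return report


for package, status in check_requirements(REQUIREMENTS).items():
    print(f"{package}: {status}")
```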
Preconditions
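  • · GPU memory: LoRA keeps the frozen base weights in 16-bit, so it needs roughly 2 bytes of VRAM per parameter (about 14 GB for a 7B model) plus activations and adapter optimizer state; QLoRA's 4-bit base cuts the weight footprint to ~25% of that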
  • · Base model available from HuggingFace or local path
  • · Training data in standard format (HF datasets, CSV, JSONL)
  • · Python environment with pip and CUDA 11.8+ (for GPU acceleration)
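The GPU memory precondition can be checked with back-of-the-envelope arithmetic. The sketch below counts only the memory for the base model weights (16-bit for LoRA, 4-bit for QLoRA); activations, adapter gradients, optimizer state, and quantization overhead come on top, so the results are lower bounds. The function name is hypothetical.

```python
def weight_memory_gb(params_billion, bits_per_param):
    """Memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9


for size in (7, 13, 70):
    fp16 = weight_memory_gb(size, 16)  # LoRA with a frozen 16-bit base
    nf4 = weight_memory_gb(size, 4)    # QLoRA with a 4-bit base
    print(f"{size}B: 16-bit weights ~{fp16:.1f} GB, 4-bit weights ~{nf4:.1f} GB")
```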
Failure modes
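  • · OOM (out of memory) → lower per_device_train_batch_size and raise gradient_accumulation_steps to compensate, enable gradient checkpointing, or use QLoRA
  • · Rank too low (e.g., r=4) → adapter capacity may be insufficient for complex tasks, giving poor downstream accuracy
  • · Rank too high (e.g., r=64 on a 7B model) → risk of overfitting on small datasets, slower training, often marginal gains
  • · Wrong target_modules (e.g., missing gate_proj for Llama) → the omitted layers get no adapter, so key parameters stay frozen
  • · Adapter not merged before inference → the adapter must be loaded on top of the base model through PEFT at inference time
  • · Quantization artifacts (QLoRA) → some quality loss relative to 16-bit LoRA is possible (up to ~5%); Dettmers et al. 2023 report that 4-bit QLoRA largely preserves 16-bit fine-tuning performance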
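For the OOM case, raising gradient accumulation only helps when the per-device batch shrinks with it: peak activation memory scales with the per-device batch, while the effective batch is the product of the two. A small sketch of that trade, with a hypothetical helper name and illustrative starting values:

```python
def rebalance_batch(per_device_batch, grad_accum_steps):
    """Halve the per-device batch and double accumulation, preserving the product."""
    if per_device_batch < 2 or per_device_batch % 2:
        raise ValueError("per-device batch must be an even number >= 2 to halve")
    return per_device_batch // 2, grad_accum_steps * 2


batch, accum = 8, 2  # illustrative starting point
print(f"start: batch={batch}, accum={accum}, effective={batch * accum}")

# Each step models one retry after an out-of-memory error.
while batch > 1:
    batch, accum = rebalance_batch(batch, accum)
    print(f"retry: batch={batch}, accum={accum}, effective={batch * accum}")
```

The effective batch size stays constant across retries, so the learning rate and schedule do not need to change; only throughput per optimizer step drops.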
Trust signals
  • · Backed by HuggingFace PEFT library (actively maintained, 10K+ GitHub stars)
  • · LoRA paper (Hu et al. 2021) reports that training far fewer than 1% of the parameters matches or comes close to full fine-tuning on the downstream tasks it evaluates
  • · QLoRA paper (Dettmers et al. 2023) validates 4-bit quantization on 65B models
  • · Similar parameter-efficient techniques are reportedly used in production by major AI companies (e.g., Anthropic, OpenAI)
  • · Rank/alpha selection guide based on empirical benchmarks (r=16, alpha=32 is the recommended default)
  • · Target modules automatically selected per architecture (Llama, Mistral, Qwen, GPT-2, Falcon)