Fine-tune 70B models with <1% parameters
peft-fine-tuning · skill · setup · L3 · ★9,423
Orchestra-Research/AI-Research-SKILLs
What it does
Fine-tune large language models with <1% trainable parameters via LoRA
Best for
Fine-tuning large models (up to 70B) on limited GPU hardware by training only a small fraction of the parameters (for example, 0.17% of parameters in a 6MB adapter), enabling cost-effective task-specific customization without full-model training.
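The "well under 1%" figure follows from how LoRA is built: each adapted linear layer of shape d_out x d_in gains two small matrices, B (d_out x r) and A (r x d_in). A minimal sketch of that arithmetic follows. The layer count, hidden size, base parameter count, and the choice of two adapted projections per layer are illustrative assumptions for a 7B-class model, not values taken from this skill.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in linear layer.

    LoRA learns B (d_out x r) and A (r x d_in), so the count is r * (d_in + d_out).
    """
    return r * (d_in + d_out)


def adapter_summary(layers, hidden, r, modules_per_layer, base_params, bytes_per_param=2):
    """Estimate adapter size when every adapted module is a square hidden x hidden projection."""
    per_module = lora_param_count(hidden, hidden, r)
    trainable = layers * modules_per_layer * per_module
    return {
        "trainable": trainable,
        "fraction": trainable / base_params,
        # Size on disk if the adapter is stored at bytes_per_param (2 = 16-bit).
        "size_mb": trainable * bytes_per_param / 1e6,
    }


# Assumed shapes (hypothetical 7B-class model): 32 layers, hidden size 4096,
# about 6.7e9 base parameters, two adapted projections per layer, rank 16.
summary = adapter_summary(
    layers=32, hidden=4096, r=16, modules_per_layer=2, base_params=6_700_000_000
)
print(f"trainable params: {summary['trainable']:,}")
print(f"fraction of base: {summary['fraction']:.4%}")
print(f"adapter size:     {summary['size_mb']:.1f} MB")
```

Adapting more modules per layer or raising r scales the count linearly, which is why adapter sizes span a wide range.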
Inputs
- · Pretrained base model (7B-70B parameter LLM)
- · Training dataset (instruction-response pairs or domain text)
- · LoRA hyperparameters (rank r, alpha, target modules, dropout)
- · Training arguments (learning rate, batch size, epochs, optimizer)
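As a rough illustration of how these inputs fit together, the sketch below collects the hyperparameters as plain dictionaries. The keys mirror field names of `peft.LoraConfig` and `transformers.TrainingArguments`; every value (including the hypothetical `output_dir`) is an assumption chosen for illustration, not a setting prescribed by this skill.

```python
# Keys follow peft.LoraConfig field names; values are illustrative assumptions.
lora_kwargs = {
    "r": 16,                # adapter rank
    "lora_alpha": 32,       # scaling numerator; the update is scaled by lora_alpha / r
    "lora_dropout": 0.05,
    # Llama-style projection names; other architectures use different names.
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "bias": "none",
    "task_type": "CAUSAL_LM",
}

# Keys follow transformers.TrainingArguments field names; values are assumptions.
training_kwargs = {
    "output_dir": "lora-out",  # hypothetical path
    "learning_rate": 2e-4,
    "per_device_train_batch_size": 4,
    "gradient_accumulation_steps": 4,
    "num_train_epochs": 3,
    "optim": "adamw_torch",
}

# Standard LoRA scaling applied to the low-rank update.
scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]
print(f"LoRA scaling factor: {scaling}")
```

In an environment with the libraries installed, these dictionaries could be unpacked as `LoraConfig(**lora_kwargs)` and `TrainingArguments(**training_kwargs)`.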
Outputs
- · LoRA adapter weights (6MB-100MB, not full model)
- · Merged model (base + adapter) for inference or deployment
- · Training logs and loss curves
- · Adapter trained against a 4-bit quantized base model (if using QLoRA); the adapter itself is not quantized
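The relationship between the adapter and the merged model can be shown with toy matrices: merging folds the low-rank update into the base weight as W + (alpha / r) * B A, so the merged layer gives the same output as the base layer plus the adapter path. The sketch below uses pure Python and made-up 2x3 weights; it illustrates the arithmetic only and is not PEFT's implementation.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w, a, b, alpha, r):
    """Return W + (alpha / r) * (B @ A), the merged weight matrix."""
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


# Toy shapes: d_out = 2, d_in = 3, rank r = 1 (all values are made up).
w = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]   # base weight, 2 x 3
a = [[0.5, -0.5, 1.0]]                    # LoRA A, r x d_in
b = [[2.0], [4.0]]                        # LoRA B, d_out x r
x = [[1.0], [2.0], [3.0]]                 # input column vector
alpha, r = 2, 1

# Unmerged path: base output plus the scaled adapter output.
base_out = matmul(w, x)
lora_out = matmul(b, matmul(a, x))
unmerged_out = [
    [base + (alpha / r) * lora for base, lora in zip(b_row, l_row)]
    for b_row, l_row in zip(base_out, lora_out)
]

# Merged path: a single matrix, no adapter needed at inference time.
merged_out = matmul(merge_lora(w, a, b, alpha, r), x)

print(unmerged_out == merged_out)
```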
Requires
- · peft>=0.13.0 (HuggingFace Parameter-Efficient Fine-Tuning)
- · transformers>=4.45.0
- · torch>=2.0.0
- · bitsandbytes>=0.43.0 (for QLoRA 4-bit quantization)
- · accelerate
- · datasets
Preconditions
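- · GPU memory: LoRA keeps the frozen base weights in 16-bit, about 2 bytes per parameter (roughly 14GB for a 7B model), plus activations and adapter optimizer state; QLoRA loads the base in 4-bit, cutting weight memory to ~25% of that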
- · Base model available from HuggingFace or local path
- · Training data in standard format (HF datasets, CSV, JSONL)
- · Python environment with pip and CUDA 11.8+ (for GPU acceleration)
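The GPU memory precondition can be sanity-checked with a back-of-the-envelope estimate of weight memory alone. The sketch below assumes 16-bit base weights for LoRA and 4-bit base weights for QLoRA, and deliberately ignores activations, adapter optimizer state, and quantization overhead, so real usage will be higher. The helper name is hypothetical.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory needed just to hold the base weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


for n_params, label in [(7e9, "7B"), (70e9, "70B")]:
    lora_gb = weight_memory_gb(n_params, 16)   # frozen 16-bit base
    qlora_gb = weight_memory_gb(n_params, 4)   # 4-bit quantized base
    print(f"{label}: ~{lora_gb:.1f} GB weights (16-bit), ~{qlora_gb:.1f} GB (4-bit)")
```

The 4-bit figure is a quarter of the 16-bit one by construction; whether a given GPU is sufficient also depends on sequence length and batch size.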
Failure modes
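- · OOM (out of memory) → reduce per_device_train_batch_size and raise gradient_accumulation_steps to keep the effective batch size, enable gradient checkpointing, or use QLoRA
- · Rank too low (e.g., r=4 on a complex task) → adapter capacity may be insufficient, hurting downstream accuracy
- · Rank too high (e.g., r=64 on a 7B model with a small dataset) → risk of overfitting, slower training, marginal gains
- · Wrong target_modules (e.g., missing gate_proj for Llama) → the adapter skips layers that matter for the task
- · Adapter not merged before inference → the base model must be loaded first and the adapter attached with PeftModel.from_pretrained; call merge_and_unload() to produce a standalone model
- · Quantization artifacts (QLoRA) → a small quality loss (around 5%) can occur; verify on a held-out evaluation set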
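For the OOM item above, note that gradient accumulation saves memory only when the per-device batch shrinks with it: peak activation memory follows the micro-batch size, while the optimizer sees the effective batch. A minimal sketch of that trade, with a hypothetical helper name:

```python
def rebalance_for_memory(per_device_batch: int, grad_accum: int, factor: int = 2):
    """Shrink the micro-batch by `factor` and grow accumulation steps to match.

    The effective batch size (per_device_batch * grad_accum) is unchanged,
    but each forward/backward pass processes fewer samples at once.
    """
    if per_device_batch % factor != 0:
        raise ValueError("per_device_batch must be divisible by factor")
    return per_device_batch // factor, grad_accum * factor


batch, accum = 8, 2
new_batch, new_accum = rebalance_for_memory(batch, accum)
print(f"before: batch={batch}, accum={accum}, effective={batch * accum}")
print(f"after:  batch={new_batch}, accum={new_accum}, effective={new_batch * new_accum}")
```

The two returned values map onto the `per_device_train_batch_size` and `gradient_accumulation_steps` training arguments.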
Trust signals
- · Backed by HuggingFace PEFT library (actively maintained, 10K+ GitHub stars)
- · LoRA paper (Hu et al. 2021) shows that training <1% of parameters can match full fine-tuning on downstream tasks
- · QLoRA paper (Dettmers et al. 2023) validates 4-bit quantization on 65B models
- · Parameter-efficient fine-tuning techniques of this kind are widely used in industry
- · Rank/alpha selection guide based on empirical benchmarks (r=16, alpha=32 as the recommended starting point)
- · Target modules automatically selected per architecture (Llama, Mistral, Qwen, GPT-2, Falcon)