Moderate LLM outputs with LlamaGuard
llamaguardskillsetup L2★9,423
Orchestra-Research/AI-Research-SKILLs ↗What it does
Classify text for 6 safety categories
Best for
Production input/output filtering where you need a specialized 7B moderation model instead of general LLM.
Inputs
- · Conversation turns
- · User prompts or bot responses
- · Safety context
Outputs
- · Classification (safe/unsafe)
- · Category (S1-S6)
- · Confidence
Requires
- · transformers
- · torch
- · vllm (optional)
Preconditions
HuggingFace auth token; 8GB VRAM for 7B model
Failure modes
- · False positives (over-blocking)
- · Category ambiguity (S3 vs. S4)
- · Truncated text loss
Trust signals
- · Meta's specialized safety model
- · 94-95% accuracy on safety benchmarks