Detect prompt injection attacks
prompt-guardskillsetup L2★9,423
Orchestra-Research/AI-Research-SKILLs ↗What it does
Detect adversarial prompts and jailbreaks
Best for
When you need lightweight client-side jailbreak detection before sending to LLM.
Inputs
- · User prompt
- · System message context
Outputs
- · Classification (benign/jailbreak)
- · Risk score
- · Attack pattern
Preconditions
Prompt text; model-specific training data
Failure modes
- · False positives on legitimate edge-case queries
- · Evasion via paraphrasing
Trust signals
- · Specialized for adversarial prompt detection
- · Fast inference