Detect prompt injection attacks

prompt-guardskillsetup L29,423
Orchestra-Research/AI-Research-SKILLs
What it does

Detect adversarial prompts and jailbreaks

Best for

When you need lightweight client-side jailbreak detection before sending to LLM.

Inputs
  • · User prompt
  • · System message context
Outputs
  • · Classification (benign/jailbreak)
  • · Risk score
  • · Attack pattern
Preconditions

Prompt text; model-specific training data

Failure modes
  • · False positives on legitimate edge-case queries
  • · Evasion via paraphrasing
Trust signals
  • · Specialized for adversarial prompt detection
  • · Fast inference