cyberneticlibrary

Reduce LLM latency and cost with caching

prompt-cachingskillsetup L20
Sheshiyer/skill-clusters
What it does

Cache LLM prompts and responses to reduce cost and latency

Best for

Repeated queries with stable context where latency and cost matter more than freshness.

Inputs
  • · Prompt text
  • · System instructions
  • · Context documents
Outputs
  • · Cached tokens metadata
  • · Cost savings estimate
Requires
  • · Anthropic API
  • · Redis
  • · OpenAI API
Preconditions
  • · Stable prompt prefix or repeated queries
  • · API access key
Failure modes
  • · Cache invalidation on semantic drift
  • · Stale cached responses
Trust signals
  • · Anthropic native cache_control API
  • · Code examples for CAG pattern
  • · Redis integration patterns