cyberneticlibrary

Reduce LLM latency and cost with caching

prompt-cachingskillsetup L2★0

Sheshiyer/skill-clusters ↗

What it does

Cache LLM prompts and responses to reduce cost and latency

Best for

Repeated queries with stable context where latency and cost matter more than freshness.

Inputs

· Prompt text
· System instructions
· Context documents

Outputs

· Cached tokens metadata
· Cost savings estimate

Requires

· Anthropic API
· Redis
· OpenAI API

Preconditions

· Stable prompt prefix or repeated queries
· API access key

Failure modes

· Cache invalidation on semantic drift
· Stale cached responses

Trust signals

· Anthropic native cache_control API
· Code examples for CAG pattern
· Redis integration patterns