Run local inference without cloud costs

local-llm-bridgeskillsetup L33
richfrem/agent-plugins-skills
What it does

Route bounded tasks to local Gemma LLM

Best for

Sub-second bounded tasks using local Gemma without cloud latency

Inputs
  • · Task prompt
  • · Persona
Outputs
  • · Task output file
Requires
  • · llama-server
  • · Python
Preconditions

llama-server running on localhost:8089

Failure modes

llama-server not running; command fails

Trust signals
  • · Measured 2s latency for typical tasks
  • · KV cache orchestration included