cyberneticlibrary

Monitor LLM traces and evals in production

phoenix-observability · skill · setup · L2 · ★ 9,423

Orchestra-Research/AI-Research-SKILLs

What it does

Trace, evaluate, and monitor LLM applications with Arize Phoenix, an observability tool built on OpenTelemetry.

Best for

Production LLM systems needing detailed observability without vendor lock-in or cost overhead.

Inputs

· LLM framework code (OpenAI, LangChain, LlamaIndex, Anthropic)
· Evaluation datasets
· Custom evaluator functions
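
A custom evaluator can be as small as a function that maps a model output to a score. The sketch below is plain Python and does not call Phoenix. The name `keyword_coverage` and its arguments are hypothetical, and how such a function is attached to an evaluation run depends on the installed arize-phoenix version.

```python
from typing import Sequence


def keyword_coverage(output: str, required_keywords: Sequence[str]) -> float:
    """Return the fraction of required keywords found in the output (0.0 to 1.0)."""
    if not required_keywords:
        # Nothing is required, so the output trivially satisfies the check.
        return 1.0
    text = output.lower()
    # Case-insensitive substring match for each keyword.
    hits = sum(1 for kw in required_keywords if kw.lower() in text)
    return hits / len(required_keywords)
```

A deterministic scorer like this is cheap to run over an evaluation dataset and is easy to unit-test before it is wired into any tooling.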

Outputs

· Trace visualizations in web UI
· Evaluation scorecards
· Real-time monitoring dashboards
· Experiment comparison reports

Requires

· arize-phoenix (12.0+)
· OpenTelemetry
· PostgreSQL or SQLite backend
· OpenAI/LangChain/LlamaIndex SDKs
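
The requirements above might be wired together roughly as follows. This is a minimal sketch, assuming the `arize-phoenix-otel` and `openinference-instrumentation-openai` packages are installed and that a Phoenix server listens on its default local port 6006. The helper names `otlp_http_endpoint` and `setup_tracing` are illustrative and not part of any library.

```python
def otlp_http_endpoint(host: str = "localhost", port: int = 6006) -> str:
    """Build the OTLP/HTTP traces URL for a Phoenix collector (assumed default port 6006)."""
    return f"http://{host}:{port}/v1/traces"


def setup_tracing(project_name: str = "default") -> object:
    """Register a tracer provider with Phoenix and instrument the OpenAI SDK.

    Imports are local so this module still loads when the packages are missing.
    """
    from phoenix.otel import register
    from openinference.instrumentation.openai import OpenAIInstrumentor

    # register() returns an OpenTelemetry tracer provider that exports to Phoenix.
    tracer_provider = register(
        project_name=project_name,
        endpoint=otlp_http_endpoint(),
    )
    # Attach the instrumentor so OpenAI client calls emit spans to that provider.
    OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
    return tracer_provider
```

Call `setup_tracing()` once at application startup, before the first LLM request, so that later calls are captured.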

Preconditions

· Python 3.8+
· GPU optional
· Self-hosted or cloud deployment

Failure modes

· Traces are lost if the Phoenix server is not running
· Missing or incorrect instrumentation skips spans
· High-volume tracing may slow inference
· Trace database must be scaled for production load
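
One way to guard against the first failure mode is a preflight check that the collector port accepts connections before the application starts sending spans. The sketch below uses only the standard library. The function name `collector_reachable` and the default port 6006 are assumptions, and an open port only suggests, rather than proves, that Phoenix is the process listening.

```python
import socket


def collector_reachable(host: str = "localhost", port: int = 6006, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection raises OSError on refusal, timeout, or DNS failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

An application could log a warning or fail fast at startup when this returns False, instead of silently dropping traces.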

Trust signals

· OpenTelemetry standard instrumentation
· Self-hosted control
· MIT licensed
· Used for framework-wide tracing integration