Monitor LLM traces and evals in production
phoenix-observability · skill · setup · L2 · ★ 9,423
Orchestra-Research/AI-Research-SKILLs
What it does
Trace, evaluate, and monitor LLM applications with Arize Phoenix, a self-hostable observability tool built on OpenTelemetry instrumentation.
Best for
Production LLM systems needing detailed observability without vendor lock-in or cost overhead.
Inputs
- · LLM framework code (OpenAI, LangChain, LlamaIndex, Anthropic)
- · Evaluation datasets
- · Custom evaluator functions
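To make "custom evaluator functions" concrete, here is a minimal sketch in plain Python of what such a function and a small evaluation dataset might look like. The names `EvalResult`, `contains_expected`, and the dataset fields are hypothetical and are not part of the Phoenix API; an actual integration would adapt this shape to whatever evaluator interface the installed Phoenix version expects.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EvalResult:
    """Hypothetical result container: numeric score, label, and explanation."""
    score: float
    label: str
    explanation: Optional[str] = None


def contains_expected(output: str, expected: str) -> EvalResult:
    """Score 1.0 when the reference answer appears in the output (case-insensitive)."""
    hit = expected.strip().lower() in output.lower()
    return EvalResult(
        score=1.0 if hit else 0.0,
        label="correct" if hit else "incorrect",
        explanation=f"expected substring {'found' if hit else 'not found'} in output",
    )


# A tiny evaluation dataset: each row pairs a model output with a reference answer.
dataset = [
    {"output": "The capital of France is Paris.", "expected": "Paris"},
    {"output": "I am not sure.", "expected": "Berlin"},
]

# Run the evaluator over every row and aggregate into a single scorecard number.
results = [contains_expected(row["output"], row["expected"]) for row in dataset]
mean_score = sum(r.score for r in results) / len(results)
```

Returning a score, a label, and an explanation together keeps each result readable on its own when it is later shown next to a trace.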
Outputs
- · Trace visualizations in web UI
- · Evaluation scorecards
- · Real-time monitoring dashboards
- · Experiment comparison reports
Requires
- · arize-phoenix (12.0+)
- · OpenTelemetry
- · PostgreSQL or SQLite backend
- · OpenAI/LangChain/LlamaIndex SDKs
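The requirements above might be wired together roughly as follows. This is a sketch under assumptions, not the skill's own setup: it assumes the `arize-phoenix-otel` and `openinference-instrumentation-openai` packages are installed, that a Phoenix server is listening on its default port 6006, and the project name `my-llm-app` is a placeholder. The third-party imports are deferred into the function so the module loads even where those packages are absent.

```python
import os


def collector_endpoint() -> str:
    """Return the Phoenix collector base URL, defaulting to a local server."""
    url = os.environ.get("PHOENIX_COLLECTOR_ENDPOINT", "http://localhost:6006")
    return url.rstrip("/")


def enable_openai_tracing(project_name: str = "my-llm-app"):
    """Register an OpenTelemetry tracer provider with Phoenix and instrument OpenAI.

    Assumes the phoenix.otel and openinference packages are installed.
    """
    from phoenix.otel import register
    from openinference.instrumentation.openai import OpenAIInstrumentor

    # Send spans to the Phoenix HTTP trace endpoint under the given project.
    tracer_provider = register(
        project_name=project_name,
        endpoint=f"{collector_endpoint()}/v1/traces",
    )
    # Patch the OpenAI SDK so each API call is recorded as a span.
    OpenAIInstrumentor().instrument(tracer_provider=tracer_provider)
    return tracer_provider
```

Calling `enable_openai_tracing()` once at application startup, before the first LLM call, is the usual pattern; calls made before instrumentation is enabled would not produce spans.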
Preconditions
- · Python 3.8+
- · GPU optional
- · Self-hosted or cloud deployment
Failure modes
- · Traces lost if server not running
- · Incorrect instrumentation skips spans
- · Large-scale tracing may slow inference
- · Database scaling required for production
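The first failure mode (traces lost when the server is down) can be guarded against with a startup check. The sketch below uses only the standard library to test whether something is accepting TCP connections at the collector address; the function name `collector_is_reachable` is hypothetical, and the default URL assumes a local Phoenix server on port 6006. A successful connection shows only that the port is open, not that the service behind it is healthy.

```python
import socket
from urllib.parse import urlparse


def collector_is_reachable(
    endpoint: str = "http://localhost:6006", timeout: float = 1.0
) -> bool:
    """Return True if a TCP connection to the endpoint's host and port succeeds."""
    parsed = urlparse(endpoint)
    host = parsed.hostname or "localhost"
    # Fall back to the scheme's default port when the URL does not name one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


if __name__ == "__main__":
    if not collector_is_reachable():
        print("Warning: trace collector unreachable; spans may be dropped.")
```

Running this check before enabling instrumentation turns a silent loss of traces into a visible warning at startup.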
Trust signals
- · OpenTelemetry standard instrumentation
- · Self-hosted control
- · MIT licensed
- · Tracing integrates across frameworks (OpenAI, LangChain, LlamaIndex, Anthropic)