Benchmark ATOM vLLM Performance
atom-vllm-benchmark-guide · command · setup · L1 · ★104
ROCm/ATOM
What it does
Runs and analyzes ATOM vLLM benchmarks, reporting throughput and latency metrics at each concurrency level.
Best for
Performance A/B testing of the ATOM vLLM plugin when precise, reproducible metrics across concurrency levels and throughput profiles are required.
Inputs
- · Model path (HuggingFace ID or local)
- · Hardware shape (GPU type/count, tensor_parallel_size)
- · Request mix (input/output sequence lengths, concurrency levels)
- · Comparison target (plugin on/off, upstream vLLM)
Outputs
- · Median throughput, TTFT, TPOT, E2EL metrics per concurrency point
- · Comparative analysis table (candidate vs baseline)
- · Reproducible commands and environment state
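The comparative table listed above can be sketched roughly as follows: repeated runs at one concurrency point are collapsed to a median per metric, then the candidate is expressed as a percent change against the baseline. The metric names (`throughput_tok_s`, `ttft_ms`) and all numbers are illustrative assumptions, not output from the actual skill or measured results.

```python
from statistics import median


def summarize(runs):
    """Collapse repeated runs at one concurrency point into one median per metric."""
    metrics = runs[0].keys()
    return {m: median(r[m] for r in runs) for m in metrics}


def compare(baseline, candidate):
    """Percent change of the candidate relative to the baseline, per metric."""
    return {
        m: 100.0 * (candidate[m] - baseline[m]) / baseline[m]
        for m in baseline
    }


# Illustrative numbers only; these are NOT measured results.
baseline_runs = [
    {"throughput_tok_s": 1000.0, "ttft_ms": 50.0},
    {"throughput_tok_s": 1020.0, "ttft_ms": 48.0},
    {"throughput_tok_s": 980.0, "ttft_ms": 52.0},
]
candidate_runs = [
    {"throughput_tok_s": 1100.0, "ttft_ms": 45.0},
    {"throughput_tok_s": 1090.0, "ttft_ms": 44.0},
    {"throughput_tok_s": 1120.0, "ttft_ms": 46.0},
]

baseline = summarize(baseline_runs)
candidate = summarize(candidate_runs)
delta = compare(baseline, candidate)

for metric, pct in delta.items():
    print(f"{metric}: baseline={baseline[metric]} candidate={candidate[metric]} ({pct:+.1f}%)")
```

For throughput a positive delta is an improvement, while for latency metrics such as TTFT, TPOT, and E2EL a negative delta is the improvement, so the sign should be read per metric.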
Requires
- · vllm serve
- · vllm bench serve
- · Docker (for concurrency isolation)
- · rocm-smi (VRAM verification)
Preconditions
- · ATOM vLLM or upstream vLLM container image available
- · GPU with sufficient VRAM for model + batch size
- · localhost:8000 free before launch (no other server bound to the port) and reachable once the server is up
- · Concurrency isolation via fresh containers per point
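One way to check the port precondition before launching a server is a plain TCP connection attempt. This is a minimal sketch using only the standard library; the helper name `port_is_free` is hypothetical, and the default port 8000 is taken from the preconditions above.

```python
import socket


def port_is_free(host="localhost", port=8000, timeout_s=1.0):
    """Return True when nothing accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout_s)
        # connect_ex returns 0 on a successful connection, an errno otherwise.
        return s.connect_ex((host, port)) != 0


if __name__ == "__main__":
    if port_is_free():
        print("localhost:8000 is free, safe to start the server")
    else:
        print("localhost:8000 is already in use, stop the other process first")
```

The check only tells you whether something is listening right now; it does not reserve the port, so it is best run immediately before starting the container.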
Failure modes
- · VRAM exhaustion = OOM during benchmark, incomplete results
- · Server not ready = curl http://localhost:8000/v1/models fails
- · Container reuse across concurrency points = timing artifacts, unreliable comparison
- · Log level too verbose = rocm-smi output polluted, metrics hard to parse
- · Recipe not found = fallback to standard config, may not be model-optimal
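The "server not ready" failure can be guarded against by polling the `/v1/models` endpoint before the benchmark starts. The sketch below assumes the URL from the list above; the function names, attempt count, and polling interval are hypothetical choices, and the `probe` and `sleep` parameters exist only so the loop can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request


def http_probe(url):
    """Return True when the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error: not ready yet.
        return False


def wait_until_ready(url="http://localhost:8000/v1/models",
                     attempts=150, interval_s=2.0,
                     probe=http_probe, sleep=time.sleep):
    """Poll the endpoint up to `attempts` times; True once a probe succeeds."""
    for _ in range(attempts):
        if probe(url):
            return True
        sleep(interval_s)
    return False


if __name__ == "__main__":
    if not wait_until_ready():
        raise SystemExit("server never became ready; aborting benchmark")
    print("server is ready")
```

Aborting when the wait fails keeps a half-started server from producing partial results that look like a slow run.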
Trust signals
- · Enforces fresh container per concurrency point (eliminates state carryover)
- · Cites recipe-first policy (use model-specific settings if available)
- · Includes environment checklist (cache clear, VRAM verification, log levels)
- · Provides full benchmark matrix (6 concurrency levels × 3 ISL/OSL combinations)
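The 6 × 3 benchmark matrix can be enumerated as in the following sketch. The specific concurrency levels, ISL/OSL pairs, model ID, and the prompts-per-concurrency ratio are placeholder assumptions, since the source gives only the counts. The `vllm bench serve` flags shown are the ones I understand the vLLM benchmark CLI to accept; confirm them with `vllm bench serve --help` on your image before relying on them.

```python
from itertools import product

# Placeholder values: the source states 6 levels and 3 pairs, not which ones.
CONCURRENCY_LEVELS = [1, 2, 4, 8, 16, 32]
ISL_OSL_PAIRS = [(1024, 1024), (1024, 8192), (8192, 1024)]


def bench_command(model, concurrency, isl, osl, prompts_per_slot=10):
    """Build the argv for one benchmark point (flags assumed, see lead-in)."""
    return [
        "vllm", "bench", "serve",
        "--model", model,
        "--dataset-name", "random",
        "--random-input-len", str(isl),
        "--random-output-len", str(osl),
        "--max-concurrency", str(concurrency),
        # Assumed convention: scale the prompt count with concurrency.
        "--num-prompts", str(concurrency * prompts_per_slot),
    ]


def build_matrix(model):
    """One command per (concurrency, ISL/OSL) point, concurrency-major order."""
    return [
        bench_command(model, c, isl, osl)
        for c, (isl, osl) in product(CONCURRENCY_LEVELS, ISL_OSL_PAIRS)
    ]


# "my-org/my-model" is a hypothetical model ID.
matrix = build_matrix("my-org/my-model")
print(len(matrix))  # 18 benchmark points
print(" ".join(matrix[0]))
```

Each generated command is meant to run against a freshly started container, in line with the fresh-container-per-point rule above, rather than being looped against one long-lived server.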