
Benchmark ATOM vLLM Performance

atom-vllm-benchmark-guide · command · setup · L1104
ROCm/ATOM
What it does

Runs ATOM vLLM benchmarks and analyzes throughput and latency metrics across concurrency levels.

Best for

A/B performance testing of the ATOM vLLM plugin when precise, reproducible metrics are needed across concurrency levels and throughput profiles.

Inputs
  • Model path (HuggingFace ID or local)
  • Hardware shape (GPU type/count, tensor_parallel_size)
  • Request mix (input/output sequence lengths, concurrency levels)
  • Comparison target (plugin on/off, upstream vLLM)
Outputs
  • Median throughput, TTFT (time to first token), TPOT (time per output token), and E2EL (end-to-end latency) per concurrency point
  • Comparative analysis table (candidate vs baseline)
  • Reproducible commands and environment state
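
The comparative table can be assembled from per-run metrics with a few lines of Python. The following is a minimal sketch, not the skill's own tooling: the helper names `summarize` and `compare` and the metric keys (`throughput_tok_s`, `ttft_ms`) are hypothetical, and the input is assumed to be one dict of numeric metrics per repeated run at a single concurrency point.

```python
from statistics import median


def summarize(runs):
    """Return the median of each metric across repeated runs.

    `runs` is a non-empty list of dicts that all share the same keys.
    """
    return {key: median(run[key] for run in runs) for key in runs[0]}


def compare(baseline, candidate):
    """Build rows of (metric, baseline, candidate, percent change)."""
    rows = []
    for metric, base in baseline.items():
        cand = candidate[metric]
        change_pct = (cand - base) / base * 100.0
        rows.append((metric, base, cand, change_pct))
    return rows


if __name__ == "__main__":
    # Illustrative numbers only, not measured results.
    baseline_runs = [
        {"throughput_tok_s": 100.0, "ttft_ms": 40.0},
        {"throughput_tok_s": 104.0, "ttft_ms": 44.0},
        {"throughput_tok_s": 96.0, "ttft_ms": 36.0},
    ]
    candidate_runs = [
        {"throughput_tok_s": 150.0, "ttft_ms": 30.0},
        {"throughput_tok_s": 148.0, "ttft_ms": 31.0},
        {"throughput_tok_s": 152.0, "ttft_ms": 29.0},
    ]
    for metric, base, cand, change in compare(
        summarize(baseline_runs), summarize(candidate_runs)
    ):
        print(f"{metric:>18} {base:>10.2f} {cand:>10.2f} {change:>+8.1f}%")
```

When reading the percent change, a positive value is an improvement for throughput, while a negative value is an improvement for the latency metrics.
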
Requires
  • vllm serve
  • vllm bench serve
  • Docker (for concurrency isolation)
  • rocm-smi (VRAM verification)
Preconditions
  • ATOM vLLM or upstream vLLM container image available
  • GPU with sufficient VRAM for model + batch size
  • localhost:8000 free and reachable
  • A fresh container for each concurrency point (concurrency isolation)
Failure modes
  • VRAM exhaustion: OOM during the benchmark, incomplete results
  • Server not ready: curl http://localhost:8000/v1/models fails
  • Container reuse across concurrency points: timing artifacts, unreliable comparison
  • Log level too verbose: rocm-smi output polluted, metrics hard to parse
  • Recipe not found: falls back to the standard config, which may not be optimal for the model
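
The "server not ready" failure can be avoided by polling the models endpoint before a run starts. This is a minimal sketch using only the standard library; the function name `wait_for_server` and its timeout and interval defaults are assumptions, and only the URL comes from the card above.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8000/v1/models", timeout_s=300.0,
                    interval_s=2.0, fetch=None, sleep=time.sleep):
    """Poll `url` until it answers with HTTP 200 or `timeout_s` elapses.

    `fetch` and `sleep` are injectable so the loop can be exercised
    without a live server.
    """
    if fetch is None:
        def fetch(target):
            with urllib.request.urlopen(target, timeout=5) as response:
                return response.status

    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if fetch(url) == 200:
                return True
        except (urllib.error.URLError, OSError):
            pass  # server is not accepting requests yet
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    ready = wait_for_server(timeout_s=10.0)
    print("server ready" if ready else "server not ready, aborting run")
```

Gating each benchmark point on this check keeps a slow model load from being recorded as a failed or truncated run.
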
Trust signals
  • Enforces fresh container per concurrency point (eliminates state carryover)
  • Cites recipe-first policy (use model-specific settings if available)
  • Includes environment checklist (cache clear, VRAM verification, log levels)
  • Provides full benchmark matrix (6 concurrency levels × 3 ISL/OSL combinations)
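
The benchmark matrix can be enumerated up front so that every point gets its own fresh container. The sketch below is illustrative only: the card states the counts (6 concurrency levels, 3 ISL/OSL combinations) but not the values, so the numbers here are placeholders, and `build_matrix` is a hypothetical helper.

```python
from itertools import product

# Placeholder values: only the counts (6 and 3) come from the card.
CONCURRENCY_LEVELS = [1, 2, 4, 8, 16, 32]
SEQ_LEN_PAIRS = [(1024, 1024), (1024, 8192), (8192, 1024)]  # (ISL, OSL)


def build_matrix(concurrency_levels, seq_len_pairs):
    """Return one dict per benchmark point, grouped by ISL/OSL pair."""
    return [
        {"concurrency": concurrency, "isl": isl, "osl": osl}
        for (isl, osl), concurrency in product(seq_len_pairs, concurrency_levels)
    ]


if __name__ == "__main__":
    matrix = build_matrix(CONCURRENCY_LEVELS, SEQ_LEN_PAIRS)
    print(f"{len(matrix)} benchmark points")
    for point in matrix:
        # Each point would be run in its own fresh container.
        print(point)
```

With 6 levels and 3 combinations the matrix has 18 points, which is also the number of container launches needed per comparison target.
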