Evaluate robot manipulation policies

evaluating-cosmos-policy (skill, setup)
Orchestra-Research/AI-Research-SKILLs
What it does

Evaluates NVIDIA Cosmos Policy on robot manipulation simulation tasks in headless mode.

Best for

Evaluating vision-language-action models on sim-to-real robotics tasks with detailed latency profiling.

Inputs
  • Cosmos Policy checkpoint path
  • LIBERO or RoboCasa task suite name
  • Dataset statistics JSON
  • T5 text embeddings pickle file
Outputs
  • Success rate per task
  • Inference latency profiles
  • Action prediction logs
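
As an illustration of how the first two outputs relate to raw episode logs, here is a minimal sketch that aggregates a per-task success rate and latency summary. The log format (a tuple of task name, success flag, and per-step latencies in seconds) and the function name `summarize_episodes` are assumptions made for this example, not the format the cosmos-policy evaluation script writes.

```python
from collections import defaultdict
from statistics import mean


def summarize_episodes(episodes):
    """Aggregate per-task success rate and inference latency.

    `episodes` is an iterable of (task_name, success, latencies_s) tuples,
    a hypothetical log format chosen for this sketch.
    """
    by_task = defaultdict(lambda: {"successes": 0, "episodes": 0, "latencies": []})
    for task_name, success, latencies_s in episodes:
        entry = by_task[task_name]
        entry["episodes"] += 1
        entry["successes"] += int(bool(success))
        entry["latencies"].extend(latencies_s)

    summary = {}
    for task_name, entry in by_task.items():
        latencies = entry["latencies"]
        summary[task_name] = {
            # Fraction of episodes for this task that ended in success.
            "success_rate": entry["successes"] / entry["episodes"],
            # Latency fields are None when no timings were recorded.
            "mean_latency_s": mean(latencies) if latencies else None,
            "max_latency_s": max(latencies) if latencies else None,
        }
    return summary
```

Reporting the maximum alongside the mean helps spot occasional slow inference calls that an average alone would hide.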
Requires
  • NVIDIA cosmos-policy, installed from its git repository
  • RoboCasa simulation environment
  • LIBERO task suite
  • CUDA-capable GPU
  • MuJoCo 3.0+
Preconditions
  • GPU with CUDA support
  • LIBERO or RoboCasa installed and calibrated
  • Pretrained model weights accessible
Failure modes
  • Rendering is headless-only via EGL; CPU rendering fails
  • A dataset statistics mismatch causes action scaling errors
  • Inference can time out on slow GPUs
  • Task environment randomization breaks determinism
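
To make the action scaling failure concrete, here is a minimal sketch of un-normalizing a predicted action with dataset statistics. It assumes actions are normalized to [-1, 1] with per-dimension min/max bounds; the keys `action_min` and `action_max` and the function name are hypothetical, and the actual cosmos-policy statistics JSON may use a different scheme or layout.

```python
def unnormalize_action(action, stats):
    """Map a normalized action in [-1, 1] back to the dataset's action range.

    `stats` uses the hypothetical keys "action_min" and "action_max";
    check the real statistics file for its actual layout.
    """
    lows, highs = stats["action_min"], stats["action_max"]
    # Fail loudly instead of silently scaling with the wrong statistics.
    if not (len(action) == len(lows) == len(highs)):
        raise ValueError(
            f"action has {len(action)} dims, "
            f"statistics have {len(lows)} and {len(highs)}"
        )
    return [
        low + (value + 1.0) * 0.5 * (high - low)
        for value, low, high in zip(action, lows, highs)
    ]
```

A dimension check only catches statistics with the wrong shape. Statistics from a different dataset with the same action dimension will still scale actions incorrectly, so it is worth confirming that the statistics file matches the checkpoint.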
Trust signals
  • Official evaluation script from public cosmos-policy repo
  • Headless GPU rendering via EGL
  • Deterministic mode for reproducibility
  • Configuration includes all hyperparameters
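
As a rough sketch of what headless EGL rendering and a deterministic run can involve on the Python side, the snippet below sets the `MUJOCO_GL` and `PYOPENGL_PLATFORM` environment variables and seeds the random number generators. The function name `configure_headless_run` is hypothetical, and the official evaluation script may configure rendering and seeding differently.

```python
import os
import random


def configure_headless_run(seed):
    """Select EGL rendering and seed RNGs (illustrative helper, not repo code)."""
    # These must be set before the simulator creates its rendering context,
    # so call this before importing or constructing the environment.
    os.environ["MUJOCO_GL"] = "egl"
    os.environ["PYOPENGL_PLATFORM"] = "egl"

    random.seed(seed)
    try:
        import numpy as np
    except ImportError:
        return  # NumPy not installed; only the stdlib RNG is seeded.
    np.random.seed(seed)
```

Seeding the deep learning framework and the environment itself would also be needed for a fully reproducible run; that step is omitted here because it depends on the specific setup.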