Evaluate robot manipulation policies
evaluating-cosmos-policy · skill · setup · L4 · ★9,423
Orchestra-Research/AI-Research-SKILLs
What it does
Evaluates NVIDIA Cosmos Policy headlessly on simulated robot manipulation tasks.
Best for
Evaluating vision-language-action models on sim-to-real robotics tasks with detailed latency profiling.
Inputs
- Cosmos Policy checkpoint path
- LIBERO or RoboCasa task suite name
- Dataset statistics JSON
- T5 text embeddings pickle file
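A minimal sketch of checking these four inputs before launching a run. The function name `check_inputs`, the suite-name prefixes, and the assumption that the statistics file holds a non-empty JSON object are illustrative only; the actual cosmos-policy loaders may expect something different.

```python
import json
import pickle
from pathlib import Path

# Assumption: suite names begin with the simulator name. Adjust as needed.
KNOWN_SUITE_PREFIXES = ("libero", "robocasa")


def check_inputs(checkpoint, suite, stats_json, t5_pickle):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []

    if not Path(checkpoint).exists():
        problems.append(f"checkpoint not found: {checkpoint}")

    if not suite.lower().startswith(KNOWN_SUITE_PREFIXES):
        problems.append(f"unrecognized task suite: {suite}")

    try:
        with open(stats_json) as f:
            stats = json.load(f)
        if not isinstance(stats, dict) or not stats:
            problems.append(f"dataset statistics are empty or not an object: {stats_json}")
    except (OSError, json.JSONDecodeError) as exc:
        problems.append(f"cannot read dataset statistics: {exc}")

    try:
        with open(t5_pickle, "rb") as f:
            pickle.load(f)  # only unpickle files from a source you trust
    except (OSError, pickle.UnpicklingError, EOFError) as exc:
        problems.append(f"cannot read T5 embeddings: {exc}")

    return problems
```

Running this first turns a missing or unreadable file into an immediate, readable message instead of a failure partway through an evaluation.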
Outputs
- Success rate per task
- Inference latency profiles
- Action prediction logs
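As a rough illustration of how the first two outputs could be derived from per-episode records, the sketch below aggregates a success rate per task and a small latency summary. The record layout (`task`, `success`, `latency_ms`) and the function name `summarize` are hypothetical and not taken from the cosmos-policy evaluation script.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(episodes):
    """Aggregate episode records into per-task success rates and latency stats.

    Each episode is a dict with keys 'task' (str), 'success' (bool) and
    'latency_ms' (list of per-step inference latencies in milliseconds).
    """
    results_by_task = defaultdict(list)
    latencies = []
    for ep in episodes:
        results_by_task[ep["task"]].append(bool(ep["success"]))
        latencies.extend(ep["latency_ms"])

    success_rate = {
        task: sum(results) / len(results)
        for task, results in results_by_task.items()
    }
    latency = {}
    if latencies:
        latency = {
            "mean_ms": mean(latencies),
            "median_ms": median(latencies),
            "max_ms": max(latencies),
        }
    return success_rate, latency
```

Reporting the maximum alongside the mean and median makes occasional slow steps visible, which matters when a control loop has a fixed time budget.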
Requires
- NVIDIA cosmos-policy, installed from its git repository
- RoboCasa simulation environment
- LIBERO task suite
- CUDA-capable GPU
- MuJoCo 3.0+
Preconditions
- GPU with CUDA support
- LIBERO or RoboCasa installed and calibrated
- Pretrained model weights accessible
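A small sketch of preparing a headless session before any simulator is imported. `MUJOCO_GL` and `PYOPENGL_PLATFORM` are the environment variables read by MuJoCo's Python bindings and by PyOpenGL respectively; the helper names are hypothetical, and looking for `nvidia-smi` on the `PATH` is only a coarse proxy for a working CUDA setup.

```python
import os
import shutil


def configure_headless_rendering(env=os.environ):
    """Select EGL for offscreen rendering unless a backend is already chosen.

    Call this before importing mujoco or any simulator built on it, because
    the rendering backend is chosen at import time.
    """
    env.setdefault("MUJOCO_GL", "egl")
    env.setdefault("PYOPENGL_PLATFORM", "egl")
    return env["MUJOCO_GL"]


def nvidia_driver_visible():
    """Coarse check: is the NVIDIA driver's command-line tool on the PATH?"""
    return shutil.which("nvidia-smi") is not None
```

Using `setdefault` leaves any backend the user has already exported untouched, so the helper does not silently override a deliberate choice.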
Failure modes
- Rendering is headless-only through EGL on the GPU; CPU rendering fails
- A dataset statistics mismatch causes action-scaling errors
- Inference times out on slow GPUs
- Task environment randomization breaks determinism
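To see why mismatched dataset statistics lead to wrongly scaled actions, consider the sketch below. It assumes actions are normalized to [-1, 1] with per-dimension min/max statistics; that scheme, the `min`/`max` keys, and the function name are assumptions for illustration and may not match how cosmos-policy stores or applies its statistics.

```python
def unnormalize_action(normalized, stats):
    """Map a normalized action in [-1, 1] back to environment units.

    Assumes stats = {"min": [...], "max": [...]} with one entry per action
    dimension (a hypothetical layout).
    """
    lo, hi = stats["min"], stats["max"]
    if not (len(normalized) == len(lo) == len(hi)):
        raise ValueError(
            f"action has {len(normalized)} dims but stats have "
            f"{len(lo)} min / {len(hi)} max entries"
        )
    return [0.5 * (x + 1.0) * (h - l) + l for x, l, h in zip(normalized, lo, hi)]


# The same model output yields different commands under different statistics.
output = [0.0, 1.0, -1.0]
right = unnormalize_action(output, {"min": [-2.0, 0.0, -1.0], "max": [2.0, 4.0, 1.0]})
wrong = unnormalize_action(output, {"min": [-1.0, 0.0, -1.0], "max": [1.0, 2.0, 1.0]})
```

A dimension mismatch is caught by the length check, but statistics with the right shape and the wrong values pass silently, so it is worth confirming that the statistics file belongs to the checkpoint being evaluated.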
Trust signals
- Official evaluation script from public cosmos-policy repo
- Headless GPU rendering via EGL
- Deterministic mode for reproducibility
- Configuration includes all hyperparameters
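One way to make the last two signals checkable is to seed the run from the configuration and record a fingerprint of that configuration with the results. This is a standard-library sketch under assumed names: `seed_run`, `config_fingerprint`, and the `seed` key are hypothetical, and a real run would also need to seed NumPy, the deep learning framework, and the simulator.

```python
import hashlib
import json
import random


def config_fingerprint(config):
    """Short, order-independent hash of a JSON-serializable configuration."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def seed_run(config):
    """Seed Python's RNG from config['seed'] and return the config fingerprint."""
    random.seed(config["seed"])
    return config_fingerprint(config)
```

Two result files with the same fingerprint were produced from identical hyperparameters, which makes a later comparison of success rates easier to trust.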