Optimize ML costs across cloud providers

skypilot-multi-cloud-orchestration · skill · setup · L39,423
Orchestra-Research/AI-Research-SKILLs
What it does

Orchestrate training jobs across AWS, GCP, and Azure, placing each job on the cheapest provider that meets its resource requirements.

Best for

Running large training jobs by automatically selecting the cheapest cloud provider and recovering from spot-instance preemption.
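Conceptually, selecting the cheapest provider means comparing the hourly price of every cloud and region that can satisfy the job's accelerator request, using spot prices when preemption is acceptable. A minimal sketch, assuming a hand-written price table: the `Offering` type, the `cheapest` helper, and all prices are hypothetical illustrations, not SkyPilot's catalog or optimizer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Offering:
    cloud: str
    region: str
    accelerator: str
    on_demand_per_hr: float  # USD/hour, hypothetical
    spot_per_hr: float       # USD/hour, hypothetical


def cheapest(offerings, accelerator, use_spot):
    """Return the lowest-priced offering that provides `accelerator`."""
    candidates = [o for o in offerings if o.accelerator == accelerator]
    if not candidates:
        raise ValueError(f"no offering provides {accelerator}")
    # Compare spot or on-demand prices depending on whether preemption is acceptable.
    return min(
        candidates,
        key=lambda o: o.spot_per_hr if use_spot else o.on_demand_per_hr,
    )


# Hypothetical prices for illustration only; real prices vary by region and over time.
OFFERINGS = [
    Offering("aws", "us-east-1", "A100", 32.8, 12.0),
    Offering("gcp", "us-central1", "A100", 29.4, 11.0),
    Offering("azure", "eastus", "A100", 27.2, 13.5),
]

if __name__ == "__main__":
    print(cheapest(OFFERINGS, "A100", use_spot=False).cloud)  # azure
    print(cheapest(OFFERINGS, "A100", use_spot=True).cloud)   # gcp
```

Note that the cheapest on-demand and cheapest spot offerings can sit on different clouds, which is why the choice depends on whether the job tolerates preemption.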

Inputs
  • training job YAML (code, resource requirements)
  • cloud budget constraints
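A budget constraint can be checked before launch by estimating total compute cost as price per node-hour × number of nodes × expected runtime. A minimal sketch, assuming a fixed hourly price and a runtime estimate supplied by the user; the helper names are hypothetical, and the estimate ignores storage, egress, and the extra runtime that spot preemptions add.

```python
def estimated_cost(price_per_node_hr: float, num_nodes: int, hours: float) -> float:
    """Estimated compute cost in USD (storage and egress not included)."""
    if price_per_node_hr < 0 or num_nodes < 1 or hours < 0:
        raise ValueError("invalid cost inputs")
    return price_per_node_hr * num_nodes * hours


def within_budget(price_per_node_hr: float, num_nodes: int, hours: float,
                  budget_usd: float) -> bool:
    """True if the estimated compute cost does not exceed the budget."""
    return estimated_cost(price_per_node_hr, num_nodes, hours) <= budget_usd


if __name__ == "__main__":
    # 4 nodes at a hypothetical $11.00/node-hour for 20 hours -> $880.00
    print(f"${estimated_cost(11.0, 4, 20):.2f}")
    print(within_budget(11.0, 4, 20, budget_usd=1000))  # True
```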
Outputs
  • job status/logs
  • cost report by cloud provider
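A per-provider cost report amounts to summing node-hours × price for each cloud. A minimal sketch, assuming usage records are already available as `(cloud, node_hours, usd_per_node_hour)` tuples; that record shape and the `cost_report` helper are hypothetical, not SkyPilot's own output format.

```python
from collections import defaultdict


def cost_report(runs):
    """Sum cost per cloud from (cloud, node_hours, usd_per_node_hour) records."""
    totals = defaultdict(float)
    for cloud, node_hours, usd_per_node_hour in runs:
        totals[cloud] += node_hours * usd_per_node_hour
    # Order providers by total spend, highest first.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    runs = [  # hypothetical usage records
        ("gcp", 40.0, 11.0),
        ("aws", 10.0, 12.0),
        ("gcp", 5.0, 11.0),
    ]
    for cloud, usd in cost_report(runs).items():
        print(f"{cloud:<6} ${usd:,.2f}")
```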
Requires
  • sky CLI (SkyPilot)
  • cloud SDKs and CLIs (e.g. boto3 for AWS, gcloud for GCP)
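The sky CLI can also be driven from a script. A minimal sketch, assuming the `-y`, `-c`, `--cloud`, and `--use-spot` options of `sky launch`; confirm them with `sky launch --help` for the installed SkyPilot version. The `build_launch_cmd` and `launch` helpers are hypothetical wrappers, not part of SkyPilot.

```python
import shutil
import subprocess


def build_launch_cmd(task_yaml, cluster, cloud=None, use_spot=False):
    """Assemble a `sky launch` command line for a task YAML file."""
    cmd = ["sky", "launch", "-y", "-c", cluster]  # -y skips the confirmation prompt
    if cloud:
        cmd += ["--cloud", cloud]  # pin a provider; omit to let SkyPilot choose
    if use_spot:
        cmd.append("--use-spot")
    cmd.append(task_yaml)
    return cmd


def launch(task_yaml, cluster, **kwargs):
    """Run `sky launch` if the CLI is on PATH; raises CalledProcessError on failure."""
    if shutil.which("sky") is None:
        raise RuntimeError("sky CLI not found; install SkyPilot first")
    subprocess.run(build_launch_cmd(task_yaml, cluster, **kwargs), check=True)


if __name__ == "__main__":
    print(" ".join(build_launch_cmd("train.yaml", "train-cluster", use_spot=True)))
    # sky launch -y -c train-cluster --use-spot train.yaml
```

Leaving `cloud` unset is the cost-oriented default: it lets SkyPilot pick the provider instead of pinning one.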
Preconditions

Cloud credentials configured (sky check reports which clouds are enabled), SkyPilot installed, job packaged as a task YAML.
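Packaging a job as YAML means declaring its resources and the commands to run. A minimal sketch that renders such a file as text, assuming SkyPilot's task fields `name`, `resources` (`accelerators`, `use_spot`), `num_nodes`, `setup`, and `run`; check the task YAML reference for your installed version. The `render_task_yaml` helper and the example commands are hypothetical placeholders.

```python
def render_task_yaml(name, accelerators, setup, run, use_spot=False, num_nodes=1):
    """Render a minimal SkyPilot-style task YAML as text."""

    def block(cmd):
        # Indent each command line under a YAML literal block scalar (`|`).
        return "\n".join("  " + line for line in cmd.strip().splitlines())

    return (
        f"name: {name}\n"
        "resources:\n"
        f"  accelerators: {accelerators}\n"
        f"  use_spot: {'true' if use_spot else 'false'}\n"
        f"num_nodes: {num_nodes}\n"
        f"setup: |\n{block(setup)}\n"
        f"run: |\n{block(run)}\n"
    )


if __name__ == "__main__":
    print(render_task_yaml(
        name="train",
        accelerators="A100:8",
        setup="pip install -r requirements.txt",
        run="python train.py",  # hypothetical entry point; should resume from checkpoints on spot
        use_spot=True,
    ))
```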

Failure modes
  • insufficient quota in the chosen region
  • spot instance preemption
  • network latency between clouds
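The first two failure modes call for different responses: a quota error means moving on to another region or cloud, while a preemption means relaunching and resuming from the last checkpoint. A minimal sketch of that control flow, assuming a hypothetical `try_launch(cloud, region)` callable and hypothetical `QuotaError`/`Preempted` exceptions; it illustrates the idea only and is not SkyPilot's own failover or recovery logic.

```python
import time


class QuotaError(Exception):
    """Hypothetical: no quota or capacity in the chosen region."""


class Preempted(Exception):
    """Hypothetical: the spot instance was reclaimed mid-run."""


def run_with_failover(try_launch, locations, max_preemptions=3, backoff_s=0.0):
    """Try each (cloud, region) in order; skip on QuotaError, retry on Preempted.

    `try_launch(cloud, region)` runs the job and returns its result; it should
    resume from the latest checkpoint when called again after a preemption.
    """
    preemptions = 0
    for cloud, region in locations:
        while True:
            try:
                return cloud, region, try_launch(cloud, region)
            except QuotaError:
                break  # no capacity here: fail over to the next location
            except Preempted:
                preemptions += 1
                if preemptions > max_preemptions:
                    raise  # give up after too many preemptions
                time.sleep(backoff_s * 2 ** (preemptions - 1))  # exponential backoff
    raise RuntimeError("no location had available quota")
```

Retrying in the same location after a preemption is one simple policy; moving to the next location instead is equally reasonable when a region is reclaiming spot capacity repeatedly.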
Trust signals
  • multi-cloud cost comparison shown
  • spot instance recovery documented
  • job portability examples