Deploy ML models serverless with GPUs
modal-serverless-gpuskillsetup L2★9,423
Orchestra-Research/AI-Research-SKILLs ↗What it does
Deploy ML workloads to Modal serverless GPU endpoints
Best for
Deploying inference-only ML models without managing containers or servers; pay-per-invocation.
Inputs
- · Python function (trained model, inference logic)
- · GPU type required
Outputs
- · HTTPS endpoint URL
- · invocation method (REST/webhook)
Requires
- · modal SDK
- · containerization (implicit)
Preconditions
Modal account with GPU quota, function packaged as Python module
Failure modes
- · cold start latency (spin-up time)
- · timeout on long inference
- · GPU memory exceeded
Trust signals
- · serverless cost model explained
- · cold-start time quantified
- · GPU endpoint example provided