Match images to text semantically
clip · skill · setup · L2 · ★9,423
Orchestra-Research/AI-Research-SKILLs
What it does
Classify images and match text to images using zero-shot visual semantics
Best for
Quick zero-shot image classification and semantic image search when no labeled training data is available and an open-source model is preferred.
Inputs
- · Images (JPEG/PNG)
- · Candidate text labels (3-100+ options)
Outputs
- · Classification probabilities across labels
- · Ranked image-text similarity scores
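The two outputs above are closely related: probabilities come from a softmax over scaled image-text similarities, and the ranking is those scores sorted. The following is a minimal sketch in plain Python, assuming the embeddings are already available as lists of floats. The names `cosine`, `label_probabilities`, and `rank_labels` are hypothetical, and the `100.0` scale mirrors the factor used in the CLIP README example rather than anything stated in this card.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def label_probabilities(image_vec, label_vecs, scale=100.0):
    """Softmax over scaled cosine similarities (scale is an assumption)."""
    logits = [scale * cosine(image_vec, v) for v in label_vecs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def rank_labels(labels, probs):
    """Return (label, probability) pairs, highest probability first."""
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
```

With real CLIP embeddings the same arithmetic runs on tensors; the list version only illustrates why near-identical labels end up with near-identical scores.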
Requires
- · OpenAI CLIP model
- · torch
- · torchvision
- · Pillow
Preconditions
- · GPU optional but recommended
- · Pre-trained model weights cached locally
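As a rough illustration of these preconditions, the sketch below picks a GPU when one is available and lets `clip.load` fetch or reuse cached weights. It assumes the `clip` package from the OpenAI CLIP repository is installed; the model name `"ViT-B/32"`, the prompt template, and the helper names `build_prompts` and `zero_shot_probs` are illustrative choices, not something this card specifies.

```python
from typing import List, Sequence


def build_prompts(labels: Sequence[str]) -> List[str]:
    """Wrap each label in a simple prompt template (an assumed template)."""
    return [f"a photo of a {label}" for label in labels]


def zero_shot_probs(
    image_path: str,
    labels: Sequence[str],
    model_name: str = "ViT-B/32",  # assumed model variant
) -> List[float]:
    """Return one probability per label for a single image."""
    # Heavy dependencies are imported lazily so this module can be
    # imported even where they are not installed.
    import clip
    import torch
    from PIL import Image

    # GPU is optional: fall back to CPU when CUDA is unavailable.
    device = "cuda" if torch.cuda.is_available() else "cpu"

    # Downloads the weights on first use, then reuses the local cache.
    model, preprocess = clip.load(model_name, device=device)

    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    tokens = clip.tokenize(build_prompts(labels)).to(device)

    with torch.no_grad():
        logits_per_image, _ = model(image, tokens)
        probs = logits_per_image.softmax(dim=-1)

    return probs[0].tolist()
```

A call such as `zero_shot_probs("photo.jpg", ["dog", "cat", "car"])` would return three probabilities in label order; `photo.jpg` is a placeholder path.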
Failure modes
- · Ambiguous labels produce tied scores
- · Zero-shot accuracy drops on domain-specific or rare concepts
- · Batch size limited by VRAM
- · Biases in the 400M-pair training corpus carry over into classifications
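Two of the failure modes above can be guarded against with very little code. The sketch below is one possible approach, not part of the skill itself: `batches` splits inputs into fixed-size chunks so each forward pass fits in VRAM, and `is_ambiguous` flags results whose top two probabilities are nearly tied. Both helper names and the `0.05` margin are hypothetical.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def is_ambiguous(probs: Sequence[float], margin: float = 0.05) -> bool:
    """True when the top two probabilities differ by less than margin."""
    if len(probs) < 2:
        return False
    top, runner_up = sorted(probs, reverse=True)[:2]
    return (top - runner_up) < margin
```

A reasonable starting point is to lower `batch_size` until out-of-memory errors stop, and to rewrite or merge labels whenever `is_ambiguous` fires often for the same pair.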
Trust signals
- · OpenAI-trained on 400M image-text pairs
- · Zero-shot accuracy on ImageNet matches a supervised ResNet-50
- · MIT licensed
- · 25,300+ GitHub stars