cyberneticlibrary

Match images to text semantically

Orchestra-Research/AI-Research-SKILLs
What it does

Classifies images and matches text to images zero-shot by comparing CLIP image and text embeddings.

Best for

Quick zero-shot image classification and semantic image search when no labeled training data is available and an open-source model is required.

Inputs
  • Images (JPEG/PNG)
  • Candidate text labels (3 to 100+ options)
Outputs
  • Classification probabilities across labels
  • Ranked image-text similarity scores
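The inputs and outputs above can be sketched as follows. This is a minimal illustration, not the skill's own implementation: it assumes the `clip` package from the openai/CLIP repository is installed alongside torch and Pillow, and the names `classify_image` and `rank_labels` as well as the default model `"ViT-B/32"` are choices made for this example.

```python
from typing import List, Sequence, Tuple


def rank_labels(labels: Sequence[str], probs: Sequence[float]) -> List[Tuple[str, float]]:
    """Pair each label with its probability, highest probability first."""
    if len(labels) != len(probs):
        raise ValueError("labels and probs must have the same length")
    return sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)


def classify_image(image_path: str, labels: Sequence[str],
                   model_name: str = "ViT-B/32") -> List[Tuple[str, float]]:
    """Zero-shot classify one image against candidate text labels (hypothetical helper)."""
    # Heavy dependencies are imported lazily so rank_labels works without them.
    import clip  # OpenAI CLIP package
    import torch
    from PIL import Image

    # Use the GPU when present; CPU works but is slower.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load(model_name, device=device)

    # One preprocessed image (batch of 1) and one tokenized prompt per label.
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    text = clip.tokenize(list(labels)).to(device)

    with torch.no_grad():
        # logits_per_image holds scaled image-text similarities, shape (1, len(labels)).
        logits_per_image, _ = model(image, text)
        probs = logits_per_image.softmax(dim=-1)[0].tolist()

    return rank_labels(labels, probs)
```

A call such as `classify_image("photo.jpg", ["a photo of a dog", "a photo of a cat", "a diagram"])` (the file name and prompts are placeholders) would return the labels ranked by probability. The softmax is taken over the supplied labels only, so the probabilities are relative to that candidate set.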
Requires
  • OpenAI CLIP model
  • torch
  • torchvision
  • Pillow
Preconditions
  • GPU optional but recommended
  • Pre-trained model weights cached locally
Failure modes
  • Ambiguous labels produce tied or near-tied scores
  • Zero-shot accuracy degrades on domain-specific or rare concepts
  • Batch size is limited by VRAM
  • Classifications inherit biases from the 400M-pair training corpus
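Two of the failure modes above can be guarded against with small helpers. The sketch below is illustrative only: `is_ambiguous`, `chunked`, and the 0.05 margin are hypothetical names and values introduced here, not part of CLIP or of this skill, and a suitable margin and batch size depend on the label set and the available VRAM.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def is_ambiguous(probs: Sequence[float], margin: float = 0.05) -> bool:
    """Return True when the top two probabilities are closer than `margin`."""
    if len(probs) < 2:
        return False
    top, runner_up = sorted(probs, reverse=True)[:2]
    return (top - runner_up) < margin


def chunked(items: Sequence[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most `size` items, e.g. to stay within VRAM."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

A near-tie flagged by `is_ambiguous` is a cue to reword or merge the competing labels, and `chunked(image_paths, 32)` would feed images to the model 32 at a time rather than all at once.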
Trust signals
  • Trained by OpenAI on 400M image-text pairs
  • Zero-shot ImageNet accuracy matches a supervised ResNet-50
  • MIT licensed
  • 25,300+ GitHub stars