cyberneticlibrary

Analyze and generate multimodal content

omnimediaskillsetup L20
vanducng/skills
What it does

Process and generate multimodal content

Best for

Processing audio, video, or images with Gemini, or generating images via Codex, when you need multimodal analysis that automatically cascades to a fallback when a rate limit is hit.

Inputs
  • media file (PNG/JPG/PDF/WAV/MP3)
  • task (transcribe|analyze|generate)
  • prompt
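The listed inputs imply a request check before any API call is made. Below is a minimal stdlib-only sketch, assuming format detection by file extension only. The names `SUPPORTED_FORMATS`, `TASKS`, and `validate_request` are hypothetical and are not taken from the skill's code.

```python
from pathlib import Path

# Hypothetical mapping from the formats listed under Inputs to a media kind.
# Detecting the format from the file extension is an illustrative simplification.
SUPPORTED_FORMATS = {
    ".png": "image",
    ".jpg": "image",
    ".pdf": "document",
    ".wav": "audio",
    ".mp3": "audio",
}
TASKS = {"transcribe", "analyze", "generate"}


def validate_request(media_path: str, task: str) -> str:
    """Return the media kind, or raise ValueError for an invalid request."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    suffix = Path(media_path).suffix.lower()  # case-insensitive: .MP3 == .mp3
    kind = SUPPORTED_FORMATS.get(suffix)
    if kind is None:
        raise ValueError(f"unsupported format: {suffix or '(none)'}")
    return kind
```

Rejecting an unsupported format up front turns the "Unsupported format → analysis fails" failure mode into an immediate, explicit error.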
Outputs
  • transcription text
  • analysis summary
  • generated image/video
Requires
  • Google Gemini API
  • Codex CLI
  • OpenRouter
  • MiniMax API
Preconditions
  • Gemini API key or Codex login
  • Media file accessible
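The two preconditions can be checked locally before any request is sent. Below is a sketch under stated assumptions: the Gemini key is read from a `GEMINI_API_KEY` environment variable, and a Codex login is detected only by the presence of a caller-supplied auth file path. Both choices, and the `check_preconditions` helper itself, are hypothetical and are not the skill's documented behavior.

```python
import os
from pathlib import Path


def check_preconditions(media_path, env=None, codex_auth_file=None):
    """Return a list of unmet preconditions (an empty list means ready).

    Assumptions: GEMINI_API_KEY as the key variable and a caller-supplied
    Codex auth file path are illustrative choices only.
    """
    env = os.environ if env is None else env
    problems = []
    has_gemini_key = bool(env.get("GEMINI_API_KEY"))
    has_codex_login = codex_auth_file is not None and Path(codex_auth_file).is_file()
    if not (has_gemini_key or has_codex_login):
        problems.append("need a Gemini API key or a Codex login")
    media = Path(media_path)
    # "Media file accessible": the file exists and the current user can read it.
    if not (media.is_file() and os.access(media, os.R_OK)):
        problems.append(f"media file not accessible: {media_path}")
    return problems
```

Returning every unmet condition at once, rather than failing on the first one, lets a caller report all setup problems in a single message.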
Failure modes
  • API rate limit → cascade fallback
  • Unsupported format → analysis fails
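The rate-limit cascade can be illustrated as an ordered list of providers, where a rate limit moves the request to the next provider and any other error stops immediately. This is a minimal sketch, not the skill's implementation. `RateLimitError`, `run_with_cascade`, and the stub providers are hypothetical, and an order such as Gemini → OpenRouter → MiniMax is an assumption.

```python
class RateLimitError(Exception):
    """Hypothetical error a provider wrapper raises on a rate-limit response."""


def run_with_cascade(providers, request):
    """Try each (name, call) pair in order and fall back only on rate limits.

    Any other exception (e.g. an unsupported format) is not retried on the
    next provider; it propagates to the caller immediately.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(request)
        except RateLimitError as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers rate-limited: " + "; ".join(errors))


# Usage with stub providers standing in for real API wrappers (hypothetical).
def gemini(req):
    raise RateLimitError("429")


def openrouter(req):
    return f"summary of {req}"


print(run_with_cascade([("gemini", gemini), ("openrouter", openrouter)], "clip.wav"))
# → ('openrouter', 'summary of clip.wav')
```

Only the rate-limit error triggers fallback. A bad input fails the same way on every provider, so retrying it down the cascade would waste calls without changing the outcome.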
Trust signals
  • Ships stdlib-only tools
  • Open source: vanducng/skills