Analyze and generate multimodal content
omnimedia · skill · setup · L2 · ★0
vanducng/skills
What it does
Process and generate multimodal content
Best for
Processing audio, video, and images with Gemini, or generating images via Codex, when you need multimodal analysis that automatically cascades to a fallback when a rate limit is hit.
Inputs
- · media file (PNG/JPG/PDF/WAV/MP3)
- · task (transcribe|analyze|generate)
- · prompt
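The inputs above can be checked before any API call is made. The following is a minimal sketch, not the skill's actual code: `TASKS`, `MEDIA_EXTENSIONS`, and `validate_request` are hypothetical names, and the accepted values are taken only from the task and format lists on this card.

```python
from pathlib import Path

# Hypothetical constants: values copied from the Inputs list on this card.
TASKS = {"transcribe", "analyze", "generate"}
MEDIA_EXTENSIONS = {".png", ".jpg", ".pdf", ".wav", ".mp3"}


def validate_request(media_file: str, task: str) -> list[str]:
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    if task not in TASKS:
        problems.append(f"unknown task: {task}")
    # Compare the extension case-insensitively, so "clip.MP3" is accepted.
    suffix = Path(media_file).suffix.lower()
    if suffix not in MEDIA_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single pass.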
Outputs
- · transcription text
- · analysis summary
- · generated image/video
Requires
- · Google Gemini API
- · Codex CLI
- · OpenRouter
- · MiniMax API
Preconditions
- · Gemini API key or Codex login
- · Media file accessible
Failure modes
- · API rate limit → cascade fallback
- · Unsupported format → analysis fails
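The rate-limit cascade can be pictured as trying providers in order and moving to the next one only when the current one is rate limited. This is a sketch under stated assumptions, not the skill's implementation: `RateLimited`, `run_with_cascade`, and the stub providers are hypothetical, and the card does not state the real provider order.

```python
class RateLimited(Exception):
    """Hypothetical error a provider wrapper raises on a rate-limit response."""


def run_with_cascade(providers, request):
    """Try each (name, call) pair in order; fall through only on RateLimited."""
    skipped = []
    for name, call in providers:
        try:
            return name, call(request)
        except RateLimited:
            skipped.append(name)
    raise RuntimeError("all providers rate limited: " + ", ".join(skipped))


# Stub providers for illustration only; real calls would go to the APIs listed under Requires.
def gemini_stub(request):
    raise RateLimited()


def openrouter_stub(request):
    return "summary of " + request


result = run_with_cascade(
    [("gemini", gemini_stub), ("openrouter", openrouter_stub)],
    "clip.mp3",
)
```

Only `RateLimited` triggers the fallback. Any other error, such as an unsupported format, propagates to the caller, which matches the two failure modes listed above.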
Trust signals
- · Ships stdlib-only tools
- · Open source: vanducng/skills