cyberneticlibrary

Transcribe audio and recognize speech in 99 languages

whisper · skill · setup · 29,423
Orchestra-Research/AI-Research-SKILLs
What it does

Transcribe speech to text in 99 languages

Best for

Robust multilingual ASR on noisy audio. The upstream project has 72.9k GitHub stars, is MIT licensed, and was trained on 680k hours of audio.

Inputs
  • audio file (mp3/wav/m4a)
  • language (optional)
  • task (transcribe|translate)
  • initial_prompt (optional)
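A minimal sketch of how these inputs might be passed to the openai-whisper Python API. The helper names `build_options` and `transcribe_file`, and the default model name "base", are assumptions made for illustration and are not part of the skill. The `whisper` import is kept inside the function so the option-building helper can be used without the package installed.

```python
VALID_TASKS = ("transcribe", "translate")


def build_options(language=None, task="transcribe", initial_prompt=None):
    """Collect the optional inputs into keyword arguments for transcribe()."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {VALID_TASKS}, got {task!r}")
    options = {"task": task}
    # Leaving language unset lets Whisper detect it from the audio.
    if language is not None:
        options["language"] = language
    if initial_prompt is not None:
        options["initial_prompt"] = initial_prompt
    return options


def transcribe_file(audio_path, model_name="base", **inputs):
    """Hypothetical wrapper: load a model and transcribe one audio file."""
    import whisper  # lazy import; requires openai-whisper and ffmpeg

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, **build_options(**inputs))
```

A call might look like `transcribe_file("meeting.mp3", language="de", task="translate")`, where the file name is a placeholder.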
Outputs
  • transcribed text
  • segments with timestamps
  • detected language
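The three outputs correspond to the `text`, `segments`, and `language` keys of the dictionary returned by `transcribe()` in openai-whisper. The sketch below formats such a dictionary for display. The `sample` data is hand-written for illustration, not real model output, and `format_result` is a hypothetical helper.

```python
def format_result(result):
    """Turn a Whisper-style result dict into printable lines."""
    lines = [
        f"language: {result['language']}",
        f"text: {result['text'].strip()}",
    ]
    for seg in result["segments"]:
        # Each segment carries start/end times in seconds plus its text.
        lines.append(
            f"[{seg['start']:.2f}s -> {seg['end']:.2f}s] {seg['text'].strip()}"
        )
    return lines


# Hand-written stand-in for a transcribe() result (illustrative only).
sample = {
    "language": "en",
    "text": " Hello world. How are you?",
    "segments": [
        {"start": 0.0, "end": 1.5, "text": " Hello world."},
        {"start": 1.5, "end": 3.2, "text": " How are you?"},
    ],
}

for line in format_result(sample):
    print(line)
```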
Requires
  • openai-whisper
  • ffmpeg
  • torch (GPU optional)
Preconditions
  • valid audio file
  • Python 3.8-3.11
  • ffmpeg installed
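The Python version and ffmpeg preconditions can be checked before any model is loaded. This is a sketch using only the standard library. The function names are hypothetical, and the version bounds are copied from the list above rather than from the package metadata. Audio validity is not checked here.

```python
import shutil
import sys

MIN_PY, MAX_PY = (3, 8), (3, 11)  # range stated in the preconditions


def find_problems(py_version, ffmpeg_path):
    """Return a list of unmet preconditions (empty when all are met)."""
    problems = []
    if not (MIN_PY <= tuple(py_version) <= MAX_PY):
        found = ".".join(str(part) for part in py_version)
        problems.append(f"Python {found} is outside the 3.8-3.11 range")
    if ffmpeg_path is None:
        problems.append("ffmpeg not found on PATH")
    return problems


def check_environment():
    """Check the interpreter and PATH this script is running with."""
    return find_problems(sys.version_info[:2], shutil.which("ffmpeg"))


if __name__ == "__main__":
    for problem in check_environment():
        print("precondition failed:", problem)
```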
Failure modes
  • corrupt audio file
  • unsupported format
  • memory exhaustion on large model
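One way to surface these failure modes as distinct errors is a guard around the transcription call. The sketch below makes several assumptions: that only the extensions named under Inputs are accepted, that decode failures arrive as `RuntimeError`, and that an out-of-memory condition can be recognized from the exception message. The last of these is a heuristic. `guarded_transcribe` and `classify_error` are hypothetical names, and `transcribe_fn` stands in for whatever callable performs the actual transcription.

```python
from pathlib import Path

# Extensions listed under Inputs; treating all others as unsupported
# is an assumption of this sketch.
CARD_FORMATS = {".mp3", ".wav", ".m4a"}


def classify_error(exc):
    """Map an exception to one of the failure modes (heuristic)."""
    if isinstance(exc, MemoryError) or "out of memory" in str(exc).lower():
        return "memory exhaustion"
    return "corrupt or undecodable audio"


def guarded_transcribe(audio_path, transcribe_fn):
    """Run transcribe_fn(path) and report failures as a small dict."""
    path = Path(audio_path)
    if path.suffix.lower() not in CARD_FORMATS:
        return {"ok": False, "error": "unsupported format"}
    if not path.is_file():
        return {"ok": False, "error": "file not found"}
    try:
        return {"ok": True, "result": transcribe_fn(str(path))}
    except (RuntimeError, MemoryError) as exc:
        return {"ok": False, "error": classify_error(exc)}
```

On a memory exhaustion result, retrying with a smaller model is a reasonable next step.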
Trust signals
  • 99 languages supported
  • 6 model sizes (39M-1550M params)
  • 10-20x speedup on GPU compared with CPU
  • MIT license
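Because memory use grows with model size, it can help to choose a size from the available GPU memory. The sketch below picks the largest model that fits a given budget. The parameter counts and memory figures are approximate values recalled from the upstream README and should be checked against the current documentation. Only the 39M-1550M range and the count of six sizes come from this card. `pick_model` is a hypothetical helper.

```python
# (name, parameters in millions, approximate VRAM in GB).
# Approximate figures; verify against the upstream README before relying on them.
MODELS = [
    ("tiny", 39, 1),
    ("base", 74, 1),
    ("small", 244, 2),
    ("medium", 769, 5),
    ("turbo", 809, 6),
    ("large", 1550, 10),
]


def pick_model(vram_gb):
    """Return the name of the largest model that fits, or None if none do."""
    fitting = [model for model in MODELS if model[2] <= vram_gb]
    if not fitting:
        return None
    # "Largest" here means most parameters among the models that fit.
    return max(fitting, key=lambda model: model[1])[0]


print(pick_model(2))
print(pick_model(10))
```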