Transcribe audio and recognize speech in 99 languages
whisper · skill · setup · L2 · ★9,423
Orchestra-Research/AI-Research-SKILLs
What it does
Transcribe speech to text in 99 languages
Best for
Robust multilingual ASR on noisy audio. Whisper has 72.9k GitHub stars, is MIT licensed, and was trained on 680k hours of audio.
Inputs
- · audio file (mp3/wav/m4a)
- · language (optional)
- · task (transcribe|translate)
- · initial_prompt (optional)
Outputs
- · transcribed text
- · segments with timestamps
- · detected language
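
The inputs and outputs above map fairly directly onto the openai-whisper Python API. The following is a minimal sketch, not the skill's actual implementation: `transcribe_file` and `summarize_result` are hypothetical helper names, and the `"base"` model default is an assumption. `whisper.load_model` and `model.transcribe` are the library calls; the result is a dict with `text`, `segments`, and `language` keys.

```python
from typing import Optional


def summarize_result(result: dict) -> dict:
    """Reduce a Whisper result dict to the three outputs listed on this card."""
    return {
        "text": result["text"].strip(),
        "language": result.get("language"),
        # Each segment carries start/end times in seconds plus its text.
        "segments": [
            (seg["start"], seg["end"], seg["text"].strip())
            for seg in result.get("segments", [])
        ],
    }


def transcribe_file(
    path: str,
    language: Optional[str] = None,        # None lets Whisper detect the language
    task: str = "transcribe",              # or "translate" (output in English)
    initial_prompt: Optional[str] = None,  # optional context to bias decoding
    model_name: str = "base",              # assumed default; pick per your hardware
) -> dict:
    """Hypothetical wrapper around openai-whisper for the inputs on this card."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"task must be 'transcribe' or 'translate', got {task!r}")

    import whisper  # imported lazily so argument validation works without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(
        path, language=language, task=task, initial_prompt=initial_prompt
    )
    return summarize_result(result)
```

A call such as `transcribe_file("meeting.mp3", task="translate")` would return the English translation, the timestamped segments, and the detected source language, assuming the file exists and ffmpeg is on the PATH.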
Requires
- · openai-whisper
- · ffmpeg
- · torch (GPU optional)
Preconditions
- · audio file valid
- · Python 3.8-3.11
- · ffmpeg installed
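
The preconditions above can be checked before any model is loaded. This is a sketch under stated assumptions: `check_preconditions` is a hypothetical helper, it only tests that the path is an existing file (not that the audio decodes), and the optional `python_version` and `ffmpeg_path` parameters exist only so the check can be exercised without touching the real environment.

```python
import os
import shutil
import sys
from typing import List, Optional, Tuple


def check_preconditions(
    audio_path: str,
    python_version: Optional[Tuple[int, int]] = None,
    ffmpeg_path: Optional[str] = None,
) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Python 3.8-3.11 is the range stated on this card.
    version = tuple(python_version) if python_version is not None else sys.version_info[:2]
    if not ((3, 8) <= version <= (3, 11)):
        problems.append(f"Python {version[0]}.{version[1]} is outside 3.8-3.11")

    # Whisper shells out to ffmpeg to decode audio, so it must be on the PATH.
    ffmpeg = ffmpeg_path if ffmpeg_path is not None else shutil.which("ffmpeg")
    if not ffmpeg:
        problems.append("ffmpeg not found on PATH")

    # Existence check only; a corrupt file is detected later, at decode time.
    if not os.path.isfile(audio_path):
        problems.append(f"audio file not found: {audio_path}")

    return problems
```

Calling `check_preconditions("talk.wav")` with no overrides inspects the running interpreter and the current PATH.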
Failure modes
- · corrupt audio file
- · unsupported format
- · memory exhaustion on large model
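
One way to handle the failure modes above is to reject unsupported extensions up front, let decode errors propagate, and retry with a smaller model on memory exhaustion. The sketch below is illustrative only: `transcribe_with_fallback`, `is_out_of_memory`, and the `transcribe_fn(path, model_name)` callable are hypothetical, the extension set is simply the one listed under Inputs, and detecting out-of-memory by matching the error message is an assumption about how the underlying runtime reports it.

```python
import os
from typing import Callable, Sequence

# Assumption: only the formats listed under Inputs are accepted.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def is_out_of_memory(exc: BaseException) -> bool:
    """Heuristic: treat MemoryError or an 'out of memory' message as memory exhaustion."""
    return isinstance(exc, MemoryError) or "out of memory" in str(exc).lower()


def transcribe_with_fallback(
    path: str,
    transcribe_fn: Callable[[str, str], object],
    model_names: Sequence[str] = ("large", "medium", "small"),
):
    """Try each model size in order, stepping down only on memory exhaustion."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        raise ValueError(f"unsupported format: {ext or '(no extension)'}")

    last_error = None
    for name in model_names:
        try:
            return transcribe_fn(path, name)
        except (MemoryError, RuntimeError) as exc:
            if not is_out_of_memory(exc):
                raise  # e.g. corrupt audio: a smaller model will not help
            last_error = exc
    raise RuntimeError("all model sizes ran out of memory") from last_error
```

In practice `transcribe_fn` would wrap the actual Whisper call; passing it in as a parameter keeps the retry logic independent of the library.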
Trust signals
- · 99 languages supported
- · 6 model sizes (39M-1550M params)
- · 10-20x faster on GPU than on CPU
- · MIT license