cyberneticlibrary

Transcribe audio and recognize speech in 99 languages

whisper · skill · setup · L2 · ★ 9,423

Orchestra-Research/AI-Research-SKILLs

What it does

Transcribes speech to text in 99 languages, and can optionally translate the speech into English.

Best for

Robust multilingual ASR on noisy audio. Whisper has 72.9k GitHub stars, is MIT licensed, and was trained on 680k hours of audio.

Inputs

· audio file (mp3/wav/m4a)
· language (optional)
· task (transcribe|translate)
· initial_prompt (optional)

Outputs

· transcribed text
· segments with timestamps
· detected language
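
The inputs and outputs listed above map onto the openai-whisper Python API roughly as follows. This is a minimal sketch, not the skill's own implementation: the helper names transcribe_file and format_segments are hypothetical, and the "base" model is an assumed default. The whisper import is deferred so the helpers can be loaded without the package installed.

```python
from typing import Optional


def format_segments(result: dict) -> list:
    """Render each segment of a Whisper result as '[start -> end] text'."""
    lines = []
    for seg in result.get("segments", []):
        text = seg["text"].strip()
        lines.append(f"[{seg['start']:.2f}s -> {seg['end']:.2f}s] {text}")
    return lines


def transcribe_file(path: str,
                    language: Optional[str] = None,
                    task: str = "transcribe",
                    initial_prompt: Optional[str] = None,
                    model_name: str = "base") -> dict:
    """Hypothetical wrapper around whisper's load_model() and transcribe()."""
    if task not in ("transcribe", "translate"):
        raise ValueError(f"task must be 'transcribe' or 'translate', got {task!r}")

    import whisper  # deferred import: requires openai-whisper and ffmpeg

    model = whisper.load_model(model_name)
    # language=None lets Whisper detect the spoken language itself.
    return model.transcribe(path, language=language, task=task,
                            initial_prompt=initial_prompt)
```

Calling transcribe_file("meeting.mp3") on a hypothetical file returns a dict whose "text", "segments", and "language" keys correspond to the three outputs above; format_segments(result) turns the segment list into printable lines.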

Requires

· openai-whisper
· ffmpeg
· torch (GPU optional)

Preconditions

· valid, readable audio file
· Python 3.8-3.11
· ffmpeg installed
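
These preconditions can be checked before any model is loaded. The sketch below is one possible preflight, using only the standard library; the function names and messages are illustrative. The extension check is a heuristic based on the formats listed under Inputs, so a file that fails it may still decode if ffmpeg supports the format.

```python
import os
import shutil
import sys

# Formats named under Inputs; this set is an assumption, not an exhaustive list.
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a"}


def python_version_ok(version=None) -> bool:
    """True if the (major, minor) version falls in the 3.8-3.11 range."""
    major_minor = tuple((version or sys.version_info)[:2])
    return (3, 8) <= major_minor <= (3, 11)


def preflight(path: str) -> list:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if not python_version_ok():
        problems.append("Python version is outside 3.8-3.11")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg is not on PATH")
    if not os.path.isfile(path):
        problems.append(f"audio file does not exist: {path}")
    elif os.path.getsize(path) == 0:
        problems.append(f"audio file is empty: {path}")
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unexpected file extension: {ext or '(none)'}")
    return problems
```

A caller might print the returned problems and stop early, since a missing ffmpeg binary or a missing file would otherwise only surface as an error during transcription.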

Failure modes

· corrupt audio file
· unsupported format
· memory exhaustion with the larger models
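
One way to handle these failure modes is to retry with a smaller model when memory runs out, and to fail immediately on audio errors, since a smaller model cannot fix a corrupt or unsupported file. The sketch below assumes that out-of-memory errors can be recognised by their message, which is a heuristic; transcribe_with_fallback, looks_like_oom, and run_whisper are hypothetical names, and the model order is an assumed preference.

```python
from typing import Callable, Sequence

# Assumed preference order, largest first; all are Whisper model names.
FALLBACK_ORDER = ("large", "medium", "small", "base", "tiny")


def looks_like_oom(exc: BaseException) -> bool:
    """Heuristic: treat MemoryError or an 'out of memory' message as OOM."""
    return isinstance(exc, MemoryError) or "out of memory" in str(exc).lower()


def transcribe_with_fallback(path: str,
                             run: Callable[[str, str], dict],
                             models: Sequence[str] = FALLBACK_ORDER) -> dict:
    """Try each model size in turn, moving to a smaller one only on OOM."""
    last_error = None
    for name in models:
        try:
            return run(name, path)
        except (RuntimeError, MemoryError) as exc:
            if not looks_like_oom(exc):
                raise  # e.g. audio that could not be decoded: do not retry
            last_error = exc
    raise RuntimeError("every model size ran out of memory") from last_error


def run_whisper(model_name: str, path: str) -> dict:
    """Runner that uses openai-whisper; imported lazily."""
    import whisper

    return whisper.load_model(model_name).transcribe(path)
```

With openai-whisper installed, transcribe_with_fallback("meeting.mp3", run_whisper) would start with the largest model and step down as needed. Passing the runner as an argument keeps the retry logic testable without loading a model.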

Trust signals

· 99 languages supported
· 6 model sizes (39M-1550M params)
· 10-20x faster on GPU than on CPU
· MIT license