cyberneticlibrary

Build real-time voice AI apps

voice-ai-development · skill · setup · L30
Sheshiyer/skill-clusters
What it does

Builds voice AI apps that combine speech transcription (speech-to-text) with speech synthesis (text-to-speech).

Best for

Building conversational agents or voice-first UIs without training custom models.

Inputs
  • audio file path
  • text for TTS
  • voice/model selection
Outputs
  • transcript text
  • synthesized audio
  • confidence scores
Requires
  • OpenAI Whisper
  • ElevenLabs/Google TTS
  • FFmpeg
Preconditions

Audio must be in a format the API supports: WAV, MP3, or OGG.
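
A minimal sketch of checking this precondition before an upload, assuming the format can be judged from the file extension alone. The names `SUPPORTED_EXTENSIONS`, `is_supported_audio`, and `ffmpeg_convert_command` are hypothetical and not part of the skill; the mono 16 kHz WAV target is an illustrative choice, not a documented requirement. The command is only built here, not executed.

```python
from pathlib import Path
from typing import List, Optional

# Formats named in the precondition; the exact set a given API accepts may differ.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg"}


def is_supported_audio(path: str) -> bool:
    """Return True if the extension is WAV, MP3, or OGG (case-insensitive)."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def ffmpeg_convert_command(src: str, dst: Optional[str] = None) -> List[str]:
    """Build (but do not run) an FFmpeg command converting src to mono 16 kHz WAV.

    -y overwrites the output, -ac sets the channel count, -ar the sample rate.
    """
    target = dst or str(Path(src).with_suffix(".wav"))
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", target]
```

An unsupported file could then be converted by passing the returned list to `subprocess.run(..., check=True)`, provided FFmpeg is installed and on the PATH.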

Failure modes
  • Transcription fails on audio shorter than 0.5 s or longer than 600 s
  • TTS API starts rate-limiting at 100+ requests/min
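
Both failure modes can be guarded against on the client side. The sketch below assumes the thresholds listed above (0.5 s, 600 s, 100 requests per minute) and uses hypothetical names (`check_duration`, `wav_duration_seconds`, `SlidingWindowLimiter`). Duration is read with the standard-library `wave` module, so it covers WAV input only; MP3 or OGG would need another tool such as ffprobe.

```python
import time
import wave
from collections import deque

# Thresholds taken from the failure modes listed above (assumed, not verified).
MIN_SECONDS = 0.5
MAX_SECONDS = 600.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file as frames divided by sample rate."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def check_duration(seconds: float) -> None:
    """Raise ValueError if the clip is outside the 0.5 s to 600 s window."""
    if seconds < MIN_SECONDS:
        raise ValueError(f"audio too short: {seconds:.2f}s < {MIN_SECONDS}s")
    if seconds > MAX_SECONDS:
        raise ValueError(f"audio too long: {seconds:.2f}s > {MAX_SECONDS}s")


class SlidingWindowLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls: int = 100, period: float = 60.0,
                 clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock          # injectable so the limiter can be tested
        self._calls = deque()       # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        """Record a call and return True, or return False if the window is full."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        if len(self._calls) >= self.max_calls:
            return False
        self._calls.append(now)
        return True
```

A caller would run `check_duration(wav_duration_seconds(path))` before sending audio for transcription, and call `try_acquire()` before each TTS request, waiting and retrying when it returns False.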
Trust signals
  • Whisper supports 99 languages
  • ElevenLabs offers voice cloning via its API