Build real-time voice AI apps
voice-ai-development · skill · setup L3 · ★0
Sheshiyer/skill-clusters
What it does
Build voice AI apps that combine speech transcription (speech-to-text) with speech synthesis (text-to-speech)
Best for
Build conversational agents or voice-first UIs without training custom models
Inputs
- · audio file path
- · text for TTS
- · voice/model selection
Outputs
- · transcript text
- · synthesized audio
- · confidence scores
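A minimal sketch of how the inputs above could map to the outputs, assuming the open-source `openai-whisper` Python package is the Whisper backend. The helper names `summarize_transcript` and `transcribe_file` are hypothetical. Whisper does not return a confidence score directly; using `exp(avg_logprob)` per segment as a rough proxy is an assumption of this sketch, not something the skill card specifies.

```python
import math
from typing import Any


def summarize_transcript(result: dict[str, Any]) -> dict[str, Any]:
    """Reduce a Whisper-style result dict to text plus per-segment confidence."""
    segments = []
    for seg in result.get("segments", []):
        # avg_logprob is a mean token log-probability (<= 0);
        # exp() maps it into (0, 1]. This is a rough proxy only.
        confidence = math.exp(seg["avg_logprob"])
        segments.append(
            {"text": seg["text"].strip(), "confidence": round(confidence, 3)}
        )
    return {"text": result["text"].strip(), "segments": segments}


def transcribe_file(path: str, model_name: str = "base") -> dict[str, Any]:
    """Transcribe an audio file (needs openai-whisper and FFmpeg installed)."""
    import whisper  # imported lazily so summarize_transcript works without it

    model = whisper.load_model(model_name)
    return summarize_transcript(model.transcribe(path))
```

Calling `transcribe_file("meeting.mp3")` (a placeholder path) would return the transcript text together with one confidence value per segment.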
Requires
- · OpenAI Whisper
- · ElevenLabs/Google TTS
- · FFmpeg
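A quick way to confirm the requirements before running anything is a small preflight check. The sketch below assumes the Python packages import as `whisper` and `elevenlabs` and that FFmpeg is on `PATH` as `ffmpeg`; Google TTS is left out because its client setup varies. Both function names are hypothetical.

```python
import importlib.util
import shutil


def check_dependencies() -> dict[str, bool]:
    """Report which of the assumed dependencies are available locally."""
    return {
        # FFmpeg is an external binary, so look it up on PATH.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # find_spec returns None when a top-level module is not installed.
        "whisper": importlib.util.find_spec("whisper") is not None,
        "elevenlabs": importlib.util.find_spec("elevenlabs") is not None,
    }


def missing_dependencies(status: dict[str, bool]) -> list[str]:
    """Return the names of unavailable dependencies, sorted alphabetically."""
    return sorted(name for name, ok in status.items() if not ok)
```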
Preconditions
Input audio must be in a format the API supports (WAV, MP3, or OGG)
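One way to enforce this precondition is to check the file extension and convert anything else with FFmpeg. This is a sketch under assumptions: the extension check is only a heuristic (it does not inspect the actual codec), and the 16 kHz mono WAV target is an illustrative choice, not a stated requirement. The function names are hypothetical.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".ogg"}


def needs_conversion(path: str) -> bool:
    """True when the file extension is not one of the supported formats."""
    return Path(path).suffix.lower() not in SUPPORTED_EXTENSIONS


def ffmpeg_to_wav_command(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command that converts src to mono 16 kHz audio at dst."""
    # -y overwrites dst, -i names the input,
    # -ac 1 downmixes to mono, -ar 16000 resamples to 16 kHz.
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", "16000", dst]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion.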
Failure modes
- · Transcription fails on audio shorter than 0.5 s or longer than 600 s
- · TTS API rate-limits requests above 100 requests/min
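Both failure modes can be guarded on the client side before a request is sent. The sketch below uses only the standard library: a duration check (shown for WAV files via the `wave` module; other formats would need a tool such as `ffprobe`) and a sliding-window limiter capped at 100 calls per 60 seconds. The thresholds come from the list above; the class and function names are hypothetical.

```python
import time
import wave
from collections import deque

MIN_SECONDS = 0.5
MAX_SECONDS = 600.0


def wav_duration_seconds(path: str) -> float:
    """Duration of a WAV file, computed as frame count / sample rate."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def duration_ok(seconds: float) -> bool:
    """True when the clip length is inside the transcribable range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


class SlidingWindowLimiter:
    """Allow at most max_calls within any window of `window` seconds."""

    def __init__(self, max_calls: int = 100, window: float = 60.0,
                 clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window
        self._clock = clock
        self._calls: deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits the budget, else False."""
        now = self._clock()
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self._window:
            self._calls.popleft()
        if len(self._calls) < self._max_calls:
            self._calls.append(now)
            return True
        return False
```

A caller would check `duration_ok(...)` before transcription and call `try_acquire()` before each TTS request, waiting and retrying when it returns `False`.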
Trust signals
- · Whisper supports 99 languages
- · ElevenLabs voice cloning via API