
Generate music and sound from text

audiocraft-audio-generation skill setup L39,423
Orchestra-Research/AI-Research-SKILLs
What it does

Generate music and audio from text descriptions at multiple scales

Best for

Applications requiring controllable, variable-length music generation without licensing constraints.

Inputs
  • Text descriptions (e.g., 'happy upbeat electronic dance music')
  • Optional melody conditioning (for MusicGen-melody)
  • Optional reference audio (for style transfer)
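
As a minimal sketch of the text-description input, the helper below wraps audiocraft's MusicGen calls (get_pretrained, set_generation_params, generate, audio_write). The function name generate_music, the out_prefix argument, and the choice of the facebook/musicgen-small checkpoint are assumptions made for illustration, and the 1-30 second check simply mirrors the duration range listed on this card.

```python
from typing import List


def generate_music(descriptions: List[str],
                   duration: float = 8.0,
                   checkpoint: str = "facebook/musicgen-small",
                   out_prefix: str = "sample") -> List[str]:
    """Generate one clip per text description and write each to a WAV file.

    Hypothetical helper: the name, defaults, and checks are illustrative.
    """
    if not descriptions:
        raise ValueError("need at least one text description")
    if not 1.0 <= duration <= 30.0:
        raise ValueError("duration should stay within 1-30 seconds")

    # Imported lazily so this module still loads where audiocraft is absent.
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained(checkpoint)
    model.set_generation_params(duration=duration)

    # Returns a tensor shaped [batch, channels, samples].
    wavs = model.generate(descriptions)

    paths = []
    for i, wav in enumerate(wavs):
        stem = f"{out_prefix}_{i}"
        # audio_write appends the ".wav" suffix to the stem by default.
        audio_write(stem, wav.cpu(), model.sample_rate, strategy="loudness")
        paths.append(stem + ".wav")
    return paths
```

Calling generate_music(['happy upbeat electronic dance music']) would be expected to write sample_0.wav, provided audiocraft is installed and the checkpoint can be downloaded. Melody conditioning goes through a different call (generate_with_chroma on a melody checkpoint) and is not covered by this sketch.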
Outputs
  • WAV audio files (32 kHz for MusicGen, 16 kHz for AudioGen); stereo output requires a MusicGen stereo checkpoint
  • Variable duration (1-30 seconds)
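
To make the output sizes concrete, the arithmetic below estimates sample counts and file sizes from the sample rates and durations listed above. It is a rough sketch that assumes 16-bit PCM and a canonical 44-byte WAV header; the function name wav_size_bytes is hypothetical, and real files can differ slightly depending on how they are written.

```python
def wav_size_bytes(duration_s: float, sample_rate: int,
                   channels: int = 2, bits_per_sample: int = 16):
    """Return (samples per channel, approximate file size in bytes).

    Assumes uncompressed PCM with a 44-byte header (an assumption,
    not something this card specifies).
    """
    num_samples = int(round(duration_s * sample_rate))
    data_bytes = num_samples * channels * (bits_per_sample // 8)
    return num_samples, data_bytes + 44


# 30 s of stereo audio at the MusicGen rate of 32 kHz
musicgen_samples, musicgen_bytes = wav_size_bytes(30, 32_000, channels=2)

# 10 s of mono audio at the AudioGen rate of 16 kHz
audiogen_samples, audiogen_bytes = wav_size_bytes(10, 16_000, channels=1)
```

Under these assumptions a 30-second stereo MusicGen clip comes to about 3.8 MB, so disk usage for outputs is small compared with the model weights.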
Requires
  • audiocraft library
  • PyTorch 2.0+
  • Hugging Face Transformers 4.30+
Preconditions
  • GPU required (CUDA/Metal)
  • 8GB+ VRAM for large models
  • Sufficient disk space for model weights (the largest model has 3.3B parameters)
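
The VRAM and disk preconditions can be sanity-checked with back-of-the-envelope arithmetic: parameter count times bytes per parameter. The sketch below covers the weights only and leaves out activations and any auxiliary models; the precision actually used at load time is an assumption here, and weight_gb is a hypothetical helper name.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_gb(num_params: int, dtype: str = "float16") -> float:
    """Approximate size of the model weights in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


small_fp16 = weight_gb(300_000_000, "float16")    # smallest size on this card
large_fp16 = weight_gb(3_300_000_000, "float16")  # largest size on this card
large_fp32 = weight_gb(3_300_000_000, "float32")
```

Under these assumptions the 3.3B-parameter model needs roughly 6.6 GB for half-precision weights and about 13.2 GB at full precision, which is consistent with 8GB+ being a floor rather than a comfortable margin for the large models.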
Failure modes
  • Memory overflow on small GPUs
  • Generated audio may not match description precisely
  • Melody conditioning requires careful audio preprocessing
  • Temperature/cfg_coef tuning needed per use case
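
Since temperature and cfg_coef need tuning per use case, one simple approach is a small grid sweep over both and a listening comparison of the results. The sketch below assumes a model object exposing audiocraft's set_generation_params and generate methods (for example a loaded MusicGen instance); the helper names sampling_grid and sweep and the grid values are illustrative choices, not recommended settings.

```python
from itertools import product


def sampling_grid(temperatures=(0.8, 1.0, 1.2),
                  cfg_coefs=(2.0, 3.0, 4.0),
                  top_k=250):
    """Build one keyword-argument dict per (temperature, cfg_coef) pair."""
    return [
        {"use_sampling": True, "top_k": top_k,
         "temperature": t, "cfg_coef": c}
        for t, c in product(temperatures, cfg_coefs)
    ]


def sweep(model, description, duration=8.0):
    """Generate one clip per grid point; returns a list of (params, waveform).

    `model` is assumed to behave like audiocraft's MusicGen.
    """
    results = []
    for params in sampling_grid():
        model.set_generation_params(duration=duration, **params)
        wav = model.generate([description])  # [1, channels, samples]
        results.append((params, wav[0].cpu()))
    return results
```

Short durations keep a sweep like this cheap; once a setting sounds right, the same parameters can be reused at the full target length.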
Trust signals
  • Meta-built
  • Production model sizes from 300M to 3.3B parameters
  • Stereo support
  • Multiple conditioning modes documented