Generate music and sound from text
audiocraft-audio-generation · skill · setup L3
Orchestra-Research/AI-Research-SKILLs
What it does
Generates music (MusicGen) and general audio (AudioGen) from text descriptions, using models at several parameter scales
Best for
Applications requiring controllable, variable-length music generation. Note that the pretrained model weights are released under CC-BY-NC 4.0, which restricts commercial use.
Inputs
- · Text descriptions (e.g., 'happy upbeat electronic dance music')
- · Optional melody conditioning (for MusicGen-melody)
- · Optional reference audio (for style transfer)
Outputs
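- · WAV audio files: 32 kHz for MusicGen (mono, or stereo with the stereo checkpoints), 16 kHz mono for AudioGen
- · Configurable duration (typically 1-30 seconds; MusicGen is trained on 30-second clips, so longer outputs are produced by extending generation window by window)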
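To make the output format concrete, here is a minimal stdlib-only sketch that encodes float samples as a 16-bit PCM WAV at the sample rates listed above. `SAMPLE_RATES`, `num_samples` and `encode_wav` are hypothetical helpers for illustration, not part of AudioCraft (which ships its own writer, `audiocraft.data.audio.audio_write`). The choice of 16-bit PCM is an assumption made for simplicity.

```python
import io
import math
import struct
import wave

# Output sample rates listed above: 32 kHz for MusicGen, 16 kHz for AudioGen.
SAMPLE_RATES = {"musicgen": 32_000, "audiogen": 16_000}


def num_samples(duration_s: float, sample_rate: int) -> int:
    """Samples per channel needed for a clip of `duration_s` seconds."""
    return round(duration_s * sample_rate)


def encode_wav(channels: list[list[float]], sample_rate: int) -> bytes:
    """Encode float samples in [-1, 1] as 16-bit PCM WAV bytes.

    `channels` holds one list per channel: 1 list for mono, 2 for stereo.
    (Hypothetical helper; not an AudioCraft API.)
    """
    n_frames = len(channels[0])
    if any(len(ch) != n_frames for ch in channels):
        raise ValueError("all channels must have the same length")
    pcm = bytearray()
    for i in range(n_frames):
        for ch in channels:  # WAV interleaves channels frame by frame
            x = max(-1.0, min(1.0, ch[i]))  # clip to avoid int16 overflow
            pcm += struct.pack("<h", round(x * 32767))
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(len(channels))
        wf.setsampwidth(2)  # 2 bytes = 16-bit samples
        wf.setframerate(sample_rate)
        wf.writeframes(bytes(pcm))
    return buf.getvalue()
```

For example, a 30-second MusicGen clip holds `num_samples(30, 32_000)` = 960,000 samples per channel. Passing two equal-length lists to `encode_wav` produces a stereo file, and the returned bytes can be written to disk with `open(path, "wb")`.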
Requires
- · audiocraft library
- · PyTorch 2.0+
- · Hugging Face Transformers 4.30+
Preconditions
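- · GPU strongly recommended (CUDA/Metal); CPU inference is possible but slow
- · 8GB+ VRAM; the larger checkpoints need more
- · Sufficient disk space for model weights (the largest checkpoint has 3.3B parameters)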
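The VRAM and disk figures above follow from simple arithmetic on parameter counts. The sketch below estimates weights-only memory, assuming 2 bytes per parameter in half precision and 4 in fp32. It deliberately ignores activations, the text encoder, the audio codec and the attention cache, so treat the results as lower bounds. `weight_gb` and the `small`/`medium`/`large` labels are illustrative; 1.5B is the intermediate MusicGen size between the 300M and 3.3B endpoints named in this card.

```python
# Weights-only memory estimate: parameters x bytes per parameter.
# Assumption: 2 bytes/param for fp16/bf16, 4 bytes/param for fp32.
# Activations, text encoder, audio codec and attention cache are NOT included.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

# Illustrative labels for the model sizes (300M to 3.3B parameters).
MODEL_PARAMS = {"small": 300e6, "medium": 1.5e9, "large": 3.3e9}


def weight_gb(n_params: float, dtype: str = "fp16") -> float:
    """Gigabytes (10**9 bytes) needed just to hold the weights."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n in MODEL_PARAMS.items():
    print(f"{name:>6}: {weight_gb(n, 'fp16'):.1f} GB fp16, "
          f"{weight_gb(n, 'fp32'):.1f} GB fp32")
```

Under these assumptions the 3.3B model needs roughly 6.6 GB for its weights in fp16 and 13.2 GB in fp32, which leaves little headroom on an 8 GB card once runtime memory is added.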
Failure modes
- · Out-of-memory errors on small GPUs
- · Generated audio may not match description precisely
- · Melody conditioning requires careful audio preprocessing
- · `temperature` and `cfg_coef` usually need tuning per use case
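`cfg_coef` and `temperature` act at every decoding step of the language model. The sketch below shows the general technique on plain Python lists: classifier-free guidance combines conditional and unconditional logits as `uncond + cfg_coef * (cond - uncond)`, and the guided logits are then sampled with temperature scaling and top-k filtering. AudioCraft's `MusicGen.set_generation_params` exposes `temperature`, `top_k` and `cfg_coef`, but this is an illustrative re-implementation, not AudioCraft's code, which operates on batched tensors across several codebooks. `cfg_logits` and `sample_top_k` are hypothetical names.

```python
import math
import random


def cfg_logits(cond: list[float], uncond: list[float], cfg_coef: float) -> list[float]:
    """Classifier-free guidance: scale the text-conditioned shift in the logits.

    cfg_coef = 1 returns the conditional logits unchanged; larger values
    push further away from the unconditional prediction.
    """
    return [u + cfg_coef * (c - u) for c, u in zip(cond, uncond)]


def sample_top_k(logits: list[float], temperature: float, top_k: int,
                 rng: random.Random) -> int:
    """Sample a token index from the top_k logits after temperature scaling."""
    if temperature <= 0:
        # Greedy decoding: pick the single highest-scoring token.
        return max(range(len(logits)), key=logits.__getitem__)
    # Keep only the top_k highest-scoring token indices.
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:top_k]
    scaled = [logits[i] / temperature for i in top]
    m = max(scaled)  # subtract the max before exp() for numerical stability
    weights = [math.exp(s - m) for s in scaled]  # unnormalized softmax
    return rng.choices(top, weights=weights, k=1)[0]
```

Raising `cfg_coef` amplifies the effect of the text prompt, typically trading diversity for adherence. Lowering `temperature` concentrates probability on the highest-scoring tokens, making output more predictable; in this sketch `temperature=0` falls back to greedy decoding.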
Trust signals
- · Built by Meta (AudioCraft)
- · Released model sizes from 300M to 3.3B parameters
- · Stereo support
- · Multiple conditioning modes documented