Generative Audio: State of the Art in Music AI
While AI image generation grabbed the headlines, generative audio has quietly made major progress of its own. We explore the models powering text-to-music and text-to-SFX (sound effects) generation.
Transformers in Audio
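Raw audio is a continuous signal sampled tens of thousands of times per second, which makes it a poor direct fit for token-based transformers. Models like MusicGen first compress the waveform into discrete audio tokens with a neural codec (EnCodec), so that a standard transformer architecture can model sound as a token sequence.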
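To make "discrete audio tokens" concrete, here is a minimal sketch of residual vector quantization, the general technique EnCodec-style codecs use to turn continuous latent frames into integer tokens. Everything in it is illustrative: the codebooks are random rather than learned, the frame and codebook sizes are arbitrary, the input frames stand in for the output of a real encoder network, and the function names `rvq_encode` and `rvq_decode` are hypothetical, not part of any library.

```python
import numpy as np

def rvq_encode(frames, codebooks):
    """Quantize each frame with a stack of codebooks, residual-style.

    frames:    (T, D) array of continuous latent frames
    codebooks: list of (K, D) arrays, one per quantizer stage
    returns:   (T, n_stages) integer token array
    """
    residual = frames.astype(np.float64)
    tokens = np.empty((frames.shape[0], len(codebooks)), dtype=np.int64)
    for stage, cb in enumerate(codebooks):
        # Squared distance from every residual frame to every codebook entry.
        dists = ((residual[:, None, :] - cb[None, :, :]) ** 2).sum(axis=-1)
        idx = dists.argmin(axis=1)
        tokens[:, stage] = idx
        # The next stage quantizes whatever this stage failed to capture.
        residual = residual - cb[idx]
    return tokens

def rvq_decode(tokens, codebooks):
    """Approximate the frames by summing the selected entries across stages."""
    return sum(cb[tokens[:, stage]] for stage, cb in enumerate(codebooks))

# Toy data: 50 frames of dimension 8, four stages of 16 entries each.
# Random codebooks are an assumption for illustration; real codecs learn them.
rng = np.random.default_rng(0)
frames = rng.normal(size=(50, 8))
codebooks = [rng.normal(size=(16, 8)) * 0.5 ** s for s in range(4)]

tokens = rvq_encode(frames, codebooks)   # integers a transformer can model
recon = rvq_decode(tokens, codebooks)    # continuous approximation
```

Each frame ends up as a short tuple of integers, one per stage, and it is these integers that the transformer predicts. A decoder network (not shown) would turn the reconstructed latents back into a waveform.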
Conditioning on Melody
Text alone is often too coarse to steer a musical result. Recent models also accept an existing melody as a conditioning input (a hummed recording or a MIDI file), giving producers precise structural control over generation.
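One common way to represent a melody for conditioning is a chromagram: a per-frame vector over the 12 pitch classes. MusicGen's melody variant, for instance, is described by its authors as conditioning on a chromagram extracted from reference audio. The sketch below builds a simplified one-hot version directly from MIDI-style notes. The helper `midi_to_chroma`, the note tuple layout, and the 50 frames-per-second rate are assumptions made for illustration, not any model's actual input format.

```python
import numpy as np

def midi_to_chroma(notes, n_frames, frame_rate):
    """Build a (n_frames, 12) one-hot chroma matrix from monophonic notes.

    notes: list of (midi_pitch, start_sec, end_sec) tuples
    """
    chroma = np.zeros((n_frames, 12), dtype=np.float32)
    for pitch, start, end in notes:
        first = int(round(start * frame_rate))
        last = min(int(round(end * frame_rate)), n_frames)
        # pitch % 12 discards the octave and keeps only the pitch class.
        chroma[first:last, pitch % 12] = 1.0
    return chroma

# A two-second arpeggio: C4, E4, G4 (MIDI 60, 64, 67).
melody = [(60, 0.0, 0.5), (64, 0.5, 1.0), (67, 1.0, 2.0)]
chroma = midi_to_chroma(melody, n_frames=100, frame_rate=50)
```

Because the octave is discarded, the representation captures the melodic contour while leaving timbre and register to the text prompt and the model.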
Licensing and Copyright
The legal landscape of generative music is complex. We stick to models trained strictly on licensed or public-domain datasets for our commercial projects.