Generative Audio: State of the Art in Music AI

While AI image generation grabbed the headlines, generative audio has quietly reached milestones of its own. This article looks at the models powering text-to-music and text-to-sound-effects (SFX) generation.

Transformers in Audio

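Raw audio is a continuous, high-sample-rate signal, which makes it expensive to model directly. Models like MusicGen sidestep this by compressing audio into discrete tokens with a neural codec (EnCodec), so that a standard transformer architecture can operate on sound much as it does on text.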
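To make the idea of discrete audio tokens concrete, here is a minimal sketch of residual vector quantization, the family of technique that neural codecs such as EnCodec build on. It is a toy, not EnCodec's implementation: the two-entry-per-axis codebooks, the 2-dimensional "latent frame", and the function names are all assumptions chosen for illustration.

```python
# Toy residual vector quantization (RVQ): each codebook quantizes
# whatever error the previous codebook left behind.
# The codebooks below are hand-picked illustrative values, not learned ones.

COARSE = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
FINE = [(-0.25, -0.25), (-0.25, 0.25), (0.25, -0.25), (0.25, 0.25)]
CODEBOOKS = [COARSE, FINE]


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rvq_encode(vec, codebooks):
    """Turn one latent frame into a list of token ids, one per codebook."""
    residual = list(vec)
    tokens = []
    for codebook in codebooks:
        # Pick the entry closest to what is still unexplained.
        idx = min(range(len(codebook)), key=lambda i: sq_dist(codebook[i], residual))
        tokens.append(idx)
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens


def rvq_decode(tokens, codebooks):
    """Rebuild an approximate latent frame by summing the chosen entries."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, tokens):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


tokens = rvq_encode((0.8, -1.2), CODEBOOKS)
approx = rvq_decode(tokens, CODEBOOKS)
print(tokens, approx)
```

The token ids are what a transformer would see in place of waveform samples; adding more codebooks shrinks the reconstruction error at the cost of more tokens per frame.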

Conditioning on Melody

Text alone is often too coarse to steer a composition. Recent models also accept an existing melody as conditioning (hummed audio or MIDI), giving producers precise structural control over generation.
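One way to represent a melody for conditioning is a chromagram: a per-frame vector over the 12 pitch classes that keeps the tune but discards octave and timbre. The sketch below is a simplified illustration, not any particular model's preprocessing. It assumes a pitch tracker has already produced a per-frame fundamental frequency in Hz (with `None` for silent frames); the function names and the one-hot encoding are assumptions.

```python
import math

PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def hz_to_pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch-class index (0 = C, 9 = A)."""
    # Standard MIDI note formula: A4 (440 Hz) is note 69, 12 notes per octave.
    midi_note = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi_note % 12


def melody_to_chroma(f0_track):
    """Convert a per-frame pitch track into one-hot chroma frames.

    Unvoiced frames (None or non-positive values) become all-zero vectors.
    """
    frames = []
    for freq in f0_track:
        frame = [0.0] * 12
        if freq is not None and freq > 0:
            frame[hz_to_pitch_class(freq)] = 1.0
        frames.append(frame)
    return frames


# Hypothetical hummed input: A4, a silent frame, then roughly middle C.
chroma = melody_to_chroma([440.0, None, 261.63])
for frame in chroma:
    active = [PITCH_CLASSES[i] for i, v in enumerate(frame) if v > 0]
    print(active or "silence")
```

Because octave information is folded away, the same chroma sequence can come from a hummed line or from MIDI notes, which is what makes it a convenient shared conditioning signal.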

Licensing and Copyright

The legal landscape of generative music is complex. For commercial projects, we use only models trained on licensed or public-domain datasets.
