The future is here. AI has infiltrated the audio industry from every angle, and no corner of our ecosystem will be impervious to its accelerating influence. The ramifications of this new dawn are far from clear, but it’s high time we took stock of where it’s leading us regardless: economically, artistically and culturally… if that’s even possible.
When your favourite mix can be picked apart like Lego and the vast bulk of every recording can be deconstructed back to its constituent parts, every element of every song ever recorded is potentially a sample library ready for harvesting (or very soon will be). When streaming services no longer play the music of artists from around the world but instead – without any disclosure or disclaimer – deploy AI to manufacture their own content that’s foisted upon an unwitting or otherwise passive or undiscerning audience, the music industry becomes fractured beyond recognition, leading us rapidly into unknown musical, economic and legal territory. And when platforms like ChatGPT and Udio encourage people to forgo the use of their own skills and human faculties, and instead use AI to churn out songs for them, a brave new world opens up that looks decidedly bleak indeed.
I could go on and on… and on.
So where are we now, and where are we headed? Do we know? Does anybody even care?
From where I stand in my corner of the audio industry as part of the wider Murky Way entertainment galaxy, grasping the ramifications of AI is like trying to solve a Rubik’s Cube with an unknown number of sides and colours, blindfolded. You could twist it this way or that, up or down, left or right, but in the end, it remains a scrambled mess.
I don’t pretend to know how many sides or colours this new AI puzzle comprises, nor do I have the time to sit in a corner like a Zen master and contemplate the future of our industry in any meaningful or informed way. Like everyone else, I am only partially informed, superficially aware… and yet, like everyone else, I’m adopting AI tools myself, regardless.
Is this a good thing? I’m not sure. There’s an upside, without doubt, but something tells me there’s also a downside that potentially runs deep and dark – one we will have no way of reversing later if we decide we don’t especially like where we’ve ended up. It will be far too little, too late by then. It already is… (and I suspect AI is quietly chuckling to itself in a cloud somewhere right now at the very prospect of being told by humans: “Sorry, no, we’ve changed our mind about you…”)
The Upside
From the narrowest professional audio perspective, the adoption of countless fantastic new AI-powered tools that can separate out sounds and remove unwanted spill, room acoustics and noise from tracks seems like a no-brainer. Tools that time- and phase-align, pitch correct, and balance tone and dynamics in ways that only a few years ago were the stuff of science fiction are impossible to ignore or resist. They make the job easier, better, faster and more enjoyable.
To that end, DAW platforms and plug-ins are flooding our audio ecosystems with new AI-powered products in a veritable tsunami of ground-breaking options. Apple Logic 11, for example, offers a new tool called Stem Splitter that’s available by simply right-clicking on an audio file, whereupon the sounds in that file – an old stereo mix perhaps, for which there is no longer a multi-track session – can be deconstructed into Vocals, Drums, Bass and ‘Other’ in mere seconds. And right out of the gate in Version 1, Logic 11 performs this magic trick very well. This is a massive advance (and if you make karaoke backing tracks for a living, now might be the time to apply for a job in local government).
No doubt, in months, weeks or mere days from now – if it hasn’t happened already, such is the whiplash-inducing accelerated response time of AI – every audio software developer will have such a tool, and the ramifications of this single process will be far-reaching.
Plug-ins like Supertone Clear and Black Box Audio Silencer abound. Software of this ilk uses AI to perform veritable magic tricks with background noise.
Silencer, for instance, takes your messy, spill-laden, phase-incoherent multi-track drum recordings and quasi ‘isolates’ each individual drum from the others. From the snare track you can remove that ugly, overbearing hi-hat spill and horrible kick drum sound in ways you never could before; from the tom mics you can ditch the snare, cymbals and kick drum; from the kick you can lose that phase-incoherent snare sound, and so on. Silencer does this effortlessly, opening up a whole new world of drum miking, recording and mixing potential. And I suspect there are 100 plug-ins like it that work just as well.
Another similar plug-in, Supertone Clear (formerly known as Goyo), does the same sort of thing with room ambience and background noise: namely, it turns down (or off altogether) the background ambience behind any form of vocal or dialogue at the mere turn of a virtual knob. It does it well, does it effortlessly, and represents a game-changer for anyone who lacks a million-dollar studio in their backyard or suffers from dodgy room acoustics… i.e., everyone. This single concept alone entirely recalibrates my thinking about tracking a live band in a room. Spill begone; the power of this little plug-in potentially opens up whole new ways of recording, and new expectations around what a mix can later achieve.
This sort of control over separation and spill was pure science fiction until very recently. Who needs walls for isolation anymore when you can simply dial things out? It raises the question…
If tools this powerful are already here, only time will tell what looms from AI in the coming days, weeks and months. But a word of warning: if you look ahead at what’s coming down the road and you see a shiny robot with menacing red eyes… run!
The Downsides
The known downsides are already taking hold, and they are widespread. What we can’t know, or yet see coming, I’ll get to in a moment (assuming my laptop doesn’t become self-aware and electrocute me).
An artist’s copyright, in the broadest of terms, includes the way they express themselves via the various artforms in which they work. The songs they pen, the paintings they produce, the recordings they make, the sound of their voice, etc., are all implicitly owned by the artist. It is precisely these things that any artist – and the mechanisms that might support them – trades upon to derive an income. When these are removed or usurped by some other entity, it is to the extreme disadvantage of the artist, their supporting industries and, ultimately, the wider culture they inhabit.
This is what is happening right now thanks to the seemingly innocuous and yet ferociously monstrous power of AI. Singers are having their voices mimicked by AI and then finding themselves in a battle over who owns the sound of their voice, in a truly brain-melting metaphysical twist of logic. Visual artists are having their work ‘harvested’ by giant, amoral digital platforms like Facebook and Instagram, which then use these same artists – their human skills, idiosyncrasies and artistry – as learning fodder for their AI algorithms, without the artists’ consent or compensation. If it were bank notes Meta was harvesting, it would be described as the crime of the century, making Ocean’s Eleven look like someone breaking into their little sister’s piggy bank.
While we’re all celebrating AI’s new power with respect to our workflows, efficiencies and artistic endeavours, so much is shifting in the background that by the time we look up from our computer screens after a day at the controls, the very framework inside which we’re working is metamorphosing and metastasising around us. If we don’t make some new efforts globally to place guardrails around the ownership of our work, our art and our products, we may find that, while AI provides us all these fancy new tools to work with, everything we make with them will be valueless.
Inside this cheery economic reality, the edifice of art and culture as we know it collapses entirely, or at the very least becomes unrecognisable.
The Other Sides
Without wanting to sound like a fearmonger, it’s not hyperbolic or beyond the scope of one’s imagination to suggest that artificial intelligence may turn out to be the 21st century’s equivalent of splitting the atom.
AI may change everything, and rapidly. So much so that clichéd phrases like ‘change occurs generationally’ may have to be superseded by a far shorter timeframe. We may have to describe our youngest generation as Generation Next (Week), or Gen Y (Bother), and terms like ‘back in my day’ will have to be replaced by ‘earlier today’.
And what jobs will remain in the arts and entertainment industries is anyone’s guess. Will AI replace mix engineers in the studio the same way it’s replacing songwriters? Will mastering engineers be replaced by a fifty-dollar plug-in?
Standing on the edge of this AI tipping point, it’s hard to tell which jobs in our industry will disappear and which roles will survive. But I strongly suspect that the saviour of music in general – the place from which a new counter-culture of sound will surely emerge – will be live performance: a place where, for the foreseeable future at least, AI has no influence. The only problem with this theory, unfortunately, is that the number of venues young bands can access to showcase their music is shrinking.
Meanwhile, in the studio, computers are seemingly taking over everything, and in this environment, AI is untrustworthy and dangerous. We certainly can’t have much faith in computers showing moral constraint… and weirdly this one is suddenly getting very hot, and….
> > >
Andy Stewart once owned and operated The Mill in Victoria, a world-class production, mixing and mastering facility. He has recently been taken to a place where he can no longer harm AI or cast aspersions against robots generally. Any pleas for pro audio help should now be directed to borg.org, or you could still try your luck with: andy@themill.net.au