AI Futures: how artificial intelligence will change music

Artificial intelligence is at the heart of a fundamental shift in music’s role in our lives, and for electronic music, the transition will be seismic. But will it result in a harmonious and utopian new landscape for creators and fans, or is intelligent automation the beginning of a new deepfake culture war? In part one of a three-part series running on DJ Mag digital this week, our online tech editor Declan McGlynn looks into how AI has become one of the most exciting developments in music since the advent of sampling.

Declan McGlynn

Tuesday, October 5, 2021 - 11:33

For most people, artificial intelligence brings to mind a futuristic, sci-fi scenario of autonomous robots or machines capable of making their own decisions and, more often than not, bringing about the demise of their human counterparts. For now, the applications of AI are less apocalyptic, like helping drones spot dog poo on footpaths, turning Robert De Niro German and proving who wrote the Dead Sea Scrolls. WIRED’s excellent AI Database is a good place to look for hundreds of examples: some novelty, some sinister, all fascinating.

In this three-part series, which will run over the next three days, we’re going to explore the impact AI is having, and will have, on modelling an artist’s likeness, on how producers and engineers work in the studio and on DJing, and how the hyper-personalisation of our online experience could soon migrate to the way we experience music.

Though many of the concepts touched on in this series are already in motion, their cultural impact is largely yet to be felt, leaving us staring down the barrel of a contradiction. Inevitability usually implies certainty, but the nature of the tech means it’s almost impossible to accurately predict the consequences. What we do know is that the genie is out of the bottle.

In music, AI, and more specifically machine learning (ML), is quietly emerging as the black box behind almost all of our interactions with music online. In fact, most of us have unknowingly been using AI- and ML-powered technology for years. Music listening platforms like YouTube, Spotify, Apple Music and Pandora use AI to refine our experience on their respective services, for example by recommending the perfect track to play next, eliminating dead air and adjusting volume in real time.

Machine learning is a branch of AI in which a machine learns from examples rather than following hand-written rules. It spots patterns in ‘training data’ and builds a model from those patterns. Deep learning is a subset of machine learning that relies on large artificial neural networks to build these models, and it’s behind most of the breakthroughs we see in modern AI.

Spotify’s Discover Weekly is the most obvious example of this. Apple’s voice assistant Siri is another; its synthesised speech is learned from real-world recordings, and it better recognises your voice over time. (For the purposes of this article, we’ll use ML as a catch-all term for both machine learning and deep learning.)
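For readers who want to see the idea of learning from training data in miniature, here is a toy sketch in Python. It is an illustration only, not how Spotify, Apple or any artist mentioned in this series builds their systems: the three short "melodies" are made-up training data, and the function names `train` and `predict_next` are hypothetical. The "model" simply counts which note tends to follow which, then uses those counts to guess the next note.

```python
from collections import Counter, defaultdict


def train(melodies):
    """Build a model by counting how often each note is followed by each other note."""
    transitions = defaultdict(Counter)
    for melody in melodies:
        # Pair every note with the note that comes directly after it.
        for current, following in zip(melody, melody[1:]):
            transitions[current][following] += 1
    return transitions


def predict_next(model, note):
    """Return the follower seen most often in training, or None for an unseen note."""
    followers = model.get(note)
    if not followers:
        return None
    return followers.most_common(1)[0][0]


# Made-up training data: three short melodies written as note names.
training_data = [
    ["C", "E", "G", "E", "C"],
    ["C", "E", "G", "A", "G"],
    ["E", "G", "E", "C", "E"],
]

model = train(training_data)
print(predict_next(model, "C"))  # E: in the training data, C is always followed by E
print(predict_next(model, "B"))  # None: B never appears in the training data
```

The same pattern (gather examples, extract patterns, predict) sits underneath far larger systems, which swap the simple counts for neural networks trained on vastly more data.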

But it’s not just recommendations that ML can attempt to master. A form of ML has been used to generate music since the 1980s, when composer David Cope trained a computer on Bach’s catalogue in an attempt to beat writer’s block. More recently, artists like Actress and Holly Herndon have been training models and building data sets on their own music, vocals, stems and style, in an attempt to create a virtual collaborator modelled in their own likeness.

Plugins and music-making software have also begun to adopt ML, with iZotope’s Neutron ushering in a new era for how we produce music in the studio. More recently, Splice, Loopmasters and others have been using ML to recommend new samples to improve your track, and to let you scan their libraries of millions of sounds based on more abstract attributes like harmonic profile and tone. Apps like Endel and AIMI have used stems from collaborators like Richie Hawtin, Grimes, Black Loops and Shanti Celeste to create personalised generative music that never repeats, and never ends.

DJing hasn’t been left untouched by ML, either. VirtualDJ and Algoriddim’s djay software have introduced real-time stem separation powered by AI. DJ software’s AutoMix functions also use ML to understand how a song blends with the next, offering the perfect automated DJ for your next after party. Much like Spotify, Splice and Loopmasters, advanced recommendations and search are becoming more commonplace in all DJ software, while Pioneer DJ’s rekordbox recently introduced an AI-assisted vocal detector, to avoid dreaded vocal clashes. Sensorium Galaxy, a new virtual reality platform, has recruited Carl Cox, Eric Prydz and Charlotte de Witte to perform inside its virtual space, with support from AI DJs trained on hundreds of hours of electronic music.

The implications of this technology for how we make, perform and listen to music will be wide-ranging and, at times, dramatic. New, unprecedented legal and ethical questions will arise and deepfakes will alter our perception of what’s real. Music-making instruments, DAWs and tools will be completely re-thought and re-designed. Fundamental mixing and production skills will be automated, virtual DJs will master mixing and track selection, and collaboration, or identity theft, will be possible with any artist, dead or alive.

To explore these issues, we spoke to key artists, developers, start-ups and experts to find out how AI will shape electronic music’s future.

The music-making industry is about to face a storm of deepfakes. Deepfakes are synthesised media (video, audio or image) that use AI to replace one person’s likeness with another’s. There are many examples on YouTube, largely of celebrities or politicians morphing into other celebrities or politicians with staggeringly accurate results. While Forbes proclaimed that “Deepfakes are going to wreak havoc on society” and “we are not prepared”, for music it’s slightly less ominous, but still a concern.

Where sampling introduced an ethical and legal shitshow from the mid-’80s onwards, deepfakes and ML models are likely to trigger the next wave of debates, controversies and lawsuits throughout the 2020s and beyond. With sampling, tools can pitch-shift, change the tempo of, warp and otherwise manipulate audio, but the output is always derived from the original sample.

Models built on ML, be it of a vocalist, guitarist, drummer or even a full mix, can identify and replicate highly personal idiosyncrasies in that artist or producer’s style, reproducing every detail of a lifetime of musical and technical mannerisms and characteristics. When we can’t tell the difference between a real Beyoncé vocal and a fake one, the can of worms is well and truly opened.

“I think this is going to be one of the biggest ethical conversations in music going forward — and we’re so not ready,” says Holly Herndon. She’s been heavily invested in AI and ML for years, releasing albums and projects that use cutting-edge technology. For her 2019 album ‘PROTO’, she created a voice model named SPAWN. This July, she released Holly+, a project that lets anyone upload audio and have it processed and recreated in the Holly+ algorithm’s interpretation of her voice, all based on hours of machine learning.

“On the last album we made voice models, and we thought it would be really interesting to open that up and let other people play with my voice,” she says, “because we see this as something that’s very much on the horizon.”

How that horizon unfolds is still uncertain. “This could go a number of different ways — it could be a total nightmare scenario where people are making work with your vocal likeness in a way where you’re not really comfortable and you could try and control it and go really DRM-heavy [Digital Rights Management] on it. We decided to go the other way and make the model freely available to everyone.”

Herndon’s take is not to try and restrict the technology, but to embrace its creative potential.

“It’s almost like trying to stop people from Photoshopping your image,” Herndon says. “The cat’s out of the bag.”

“If tomorrow, we could make a model of Beyoncé’s voice and then release a song credited to ‘Holly Herndon featuring Beyoncé’, we’re probably going to have a lawsuit,” says Mat Dryhurst. He’s a musician, researcher and lecturer at New York University’s Clive Davis Institute of Recorded Music, as well as Herndon’s long-time collaborator. “There’s going to be people who do that shit just for the attention too, which is annoying. It’s way cooler to jump out in front of it and act responsibly, but also celebrate what’s rad about it at the same time.”

“How cool would it be too if Beyoncé actually embraced it,” adds Herndon playfully, “and when you went to your local karaoke bar you could sing ‘Single Ladies’ as Beyoncé?”

Whether the artist in question welcomes the new technology or not, moral, ethical and legal questions remain.

Professor Joe Bennett, a forensic musicologist at Berklee College of Music, told Billboard earlier in 2021, “The sound of the voice itself is not covered by copyright law. The reason you can impersonate someone with deepfake audio is because there are only two protected objects in copyright law – the musical work and the sound recording. The musical work refers to the song – notes, chords and lyrics. And the sound recording protection can only be applied to a specific track. This means deepfake audio is a grey area, as the voice isn’t considered [by law] as a part of the composition.”

There is, however, legal precedent for suing over a vocal likeness, as Herndon explained. When US car company Ford wanted Bette Midler to sing one of her hits for an advert in the mid-1980s and she refused, its agency hired a soundalike – her own backing singer – to re-sing it in her style and tone. Midler sued, and in 1988 a US appeals court ruled in her favour. The case is likely to be significant as the inevitable legal fightback against deepfakes grows.

While a model trained to imitate a single singer may be relatively easy to spot, the waters become muddied when it’s less clear what data was used to train a model.

“People describe this ‘black box’ and it’s really hard to pinpoint why an algorithm will output what it does,” says Cherie Hu, an award-winning journalist and researcher, a leading voice on music technology trends and the author of the excellent Water and Music newsletter and community. “It’s a general issue in machine learning, and for music, there are very particular implications for copyright ownership and royalty payments.”

If you wanted to create a model based on Bicep, Herbie Hancock and Slipknot melodies, who, if anyone, is compensated for your use of those artists’ intellectual property? While a sample can be obscured into oblivion, it’s at least possible to identify certain aspects of a sound. With modelling, there’s no start or end point, no clear references and no way to know what music was used in the training. Even if you intend to compensate the artists, it’s impossible to quantify which chord progression or melody is attributable to which artist.

“There is a real concern — unless the publishing rightsholders and the artists themselves gain some fluency over this, we could find ourselves with a problem,” Dryhurst continues. “The analogy we always use is: Google made one of the world’s most powerful companies by scraping the internet with no one’s permission. Google was like, ‘We’re going to index all of it and then sell services on top of the ability to navigate that information’. In this new paradigm of machine learning, the same opportunity exists for creating the largest models. One could imagine a new service, or a new DAW, that ties into one of these mega models, where you can just search for who you want your song to sound like.”

Recently, the conversation around fair compensation for musicians and songwriters has accelerated. This summer, the UK Parliament’s Digital, Culture, Media and Sport Committee published a report concluding that the music streaming model needs a “total reset”. As well as being a new creative opportunity, AI brings with it a chance to improve artist compensation across the board.

“I do think we have to come up with some kind of framework to right some of the wrongs of the past,” says Herndon. “We often use the example of Gregory C. Coleman. He played the Amen break. He never saw any kickback from that and he died homeless, which is just horrible. Electronic music specifically is fraught with this kind of history — how can we do better this time?”

Where sampling failed so many, could modelling be an opportunity to rewrite the rulebook on remuneration? One innovative solution is Herndon’s own concept around Holly+ and its ownership. The model is owned and governed by a decentralised autonomous organisation (DAO). Part of the profits earned through approved works made with Holly+ will then be funnelled back into the DAO to fund more tools in the future. This July, Herndon published an essay on how this works through a website, holly.mirror.xyz.

Of course, AI isn’t just about creating models capable of identity theft, and the ‘Beyoncé button’ isn’t going to arrive tomorrow. But as Dryhurst and Herndon suggest, educating people now could avoid potential problems in the future. PRS for Music, the UK’s performing rights organisation, agrees. As Michelle Escoffery, President of the PRS Members’ Council, told DJ Mag: “To the extent that machine learning requires a reproduction of an original work, it’s clear that this is subject to the authorisation of the rightsholder. As such, it is for those rightsholders to decide on the terms that they are willing to allow their works to be used for these purposes. The industry can certainly do more to educate those innovating with machine learning and music about the need to obtain a licence, and where the necessary rights can be obtained.”

Important as it is to raise these points around intellectual property, it’s just as important to celebrate the creative opportunities afforded by these models. “The ability to collaborate with other people using only your voice, or perform as other people with their permission, that’s pretty new and that’s pretty cool,” says Dryhurst.

“Or perform as other physical forms,” continues Herndon. “A person who plays the trombone makes very specific musical decisions based on the strange shape of the trombone, and you would make a very different decision if you were playing a violin. So it opens up these physical resonance bodies to everyone else. I could sing through a trombone, or a trombonist could sing through me. I think there’s something really weird and interesting to that.”

If you want to hear more of Herndon and Dryhurst’s enthusiasm about emerging music technologies, you can check out their podcast Interdependence.

For an example of where we are with modelling technology in 2021, you can also check out the OpenAI Jukebox project’s attempts to model certain artists, including Frank Sinatra, Katy Perry and Elvis Presley. It’s not perfect, but as a proof of concept, it’s remarkable.

In part two, which runs tomorrow (6th October), we’ll explore the impact of AI and ML on music production and engineering, from assisted mixing and search to complete AI DAWs.