Spotify Data Scrape: 86 Million Tracks Allegedly Stolen for AI Training

An activist group has made a startling claim that it has successfully scraped and copied tens of millions of music tracks from the streaming giant Spotify, an act that experts warn could provide a vast new dataset for artificial intelligence companies.

The Scale of the Alleged Scrape

The group, known as Anna's Archive, stated in a blog post that it had obtained 86 million individual music files from Spotify's platform. In addition to the audio itself, the group says it collected a staggering 256 million rows of metadata, which includes crucial information like artist names, album titles, and track listings.

Spotify, which is based in Stockholm and boasts over 700 million users globally, confirmed it had suffered unauthorised access. The company clarified that the leak did not encompass its entire library of more than 100 million tracks. In a statement, Spotify said it had "identified and disabled the nefarious user accounts that engaged in unlawful scraping."

The platform's investigation found that a third party had scraped public metadata and used "illicit tactics to circumvent DRM" – digital rights management protections – to access the audio files. Spotify has since implemented new safeguards and is monitoring for further suspicious activity.

Preservation or Piracy? The Group's Motive

Anna's Archive, which is best known for providing access to pirated books, framed its actions as a cultural preservation effort. The group stated its desire to create a "'preservation archive' for music" to protect humanity's musical heritage from disasters, wars, or budget cuts.

The group claimed the scraped files represent 99.6% of all music listened to by Spotify users and announced plans to share the data via torrents, a common method for distributing large files online. "Of course Spotify doesn't have all the music in the world, but it's a great start," the group remarked.

AI Industry Implications and Copyright Fears

The immediate concern raised by campaigners is that this trove of copyrighted music will be used to train generative AI models. Composer and copyright campaigner Ed Newton-Rex issued a stark warning: "Training on pirated material is sadly common in the AI industry, so this stolen music is almost certain to end up training AI models."

He argued this incident underscores why governments must force AI companies to disclose their training data sources. The link to AI training is not theoretical; Anna's Archive itself references LibGen, a pirated book archive allegedly used by Meta to train its AI models despite internal warnings it was pirated.

Yoav Zimmerman, co-founder of AI startup Third Chair, noted on LinkedIn that this data could, in theory, allow someone to create a personal version of Spotify or enable tech firms to "train on modern music at scale." He concluded that copyright law and the threat of enforcement are the primary deterrents.

The Wider Copyright Battleground

This incident intensifies the ongoing conflict between creatives and the AI sector. AI tools like chatbots and music generators are trained on massive datasets scraped from the web, frequently containing copyrighted works without permission or compensation.

In the UK, this tension is playing out in policy debates. The government recently proposed a controversial exception that would let AI companies use copyrighted work unless the owner explicitly opts out. This was met with overwhelming opposition from creative professionals in a public consultation.

Liz Kendall, the Secretary of State for Science, Innovation and Technology, told Parliament there was "no clear consensus" and pledged to release policy proposals on AI and copyright by 18 March 2025.

As Spotify works to secure its platform, the alleged breach by Anna's Archive has thrown a spotlight on the vulnerable position of artistic works in the digital age and the pressing need for clear, enforceable copyright frameworks in the era of artificial intelligence.