Recent revelations about Meta’s use of copyrighted material for AI training have reignited concerns about how our creative works are being exploited. While Meta’s actions are troubling, the deeper issue isn’t new. It’s one I’ve spoken about before, and one I’ll continue to emphasize.
The problem lies not just in what corporations like Meta are doing, but in how easily this exploitation is enabled. For decades, authors, artists, content creators, and consumers of our work have been unknowingly, or carelessly, feeding the beast. The right hand (AI) has our attention right now, but the left hand is still stealing our wallet.
The Problem We Helped Create
I’ve long cautioned about the risks of uploading books, documents, and images to websites and software platforms without fully understanding the terms of service. Many creators have willingly, or inadvertently, handed over their intellectual property without considering how it might be used.
From cloud storage platforms to file-sharing services, we've collectively built an environment where content is freely available, easily scraped, and frequently exploited. Even well-intentioned sharing, like uploading a manuscript draft to a collaboration tool, can leave content vulnerable to misuse.
And yet, this practice continues. For decades, users raced to get everything online in the spirit of democratizing information, often without considering the long-term consequences. Creators attempting to respect copyright frequently rely on images sourced from websites that make no effort to verify whether the uploader had the right or authority to share them.
Meanwhile, AI hunters, perhaps with good intentions, often upload images and documents into verification software without permission from the original copyright holder. Some verification platforms have recently updated their policies to discourage this practice, but these changes are far from universal. More importantly, if users don’t read or fully understand those terms (and most don’t), these improved policies become little more than window dressing. The problem isn’t just what the policies say; it’s whether users recognize what they’re agreeing to in the first place.
Creators must take responsibility for understanding where their work is stored and who has access to it. This isn’t victim-blaming; it’s recognizing that digital complacency has fueled the very systems now threatening our intellectual rights.

LibGen and the Rise of Digital Bootleggers
The recent Atlantic investigation into LibGen shed light on how sites like this fuel AI data scraping. But let’s be clear: LibGen was never a Meta project. It was (and still is) a pirate site that has long existed in legal grey zones.
LibGen, which started as an academic resource, expanded to host copyrighted fiction and non-fiction without permission. While legal actions have targeted LibGen multiple times, with U.S. courts ordering its shutdown in 2015 and a $30 million judgment in 2024, the site continues to resurface.
However, LibGen’s presence is far from stable. Its primary domains have been seized or disabled repeatedly, with some remaining offline for extended periods. Yet mirrors and alternative access points persist, creating an ongoing challenge for authorities and rights holders alike. Many LibGen mirrors now rely on encrypted networks, like Tor, further complicating enforcement efforts.
And LibGen isn’t alone. Other pirate sites operate under similar tactics, shifting domains, hosting offshore, or leveraging encrypted networks to stay ahead of authorities. It’s the digital equivalent of someone selling stolen electronics or meat out of the back of a van, appearing on one corner one day and another the next. The goods may be accessible to buyers, but the sellers stay just out of reach of the law.
The Consequences for Creators
The combination of piracy and AI scraping creates a perfect storm for exploitation. As long as pirated books remain easy to find, large-scale scraping operations will continue to harvest these works for unauthorized use, whether by AI developers, content aggregators, or unscrupulous publishers.
Meta’s actions deserve scrutiny, but if we focus solely on AI ethics without addressing the rampant accessibility of pirated books, we’re only fighting half the battle.
What Can We Do?
- Be Mindful of What You Upload: Before uploading your work to a platform, read the terms of service. Understand what permissions you’re granting and whether your content may be scraped or shared.
- Report Pirate Sites: If you discover your books on pirate platforms, take action. Reporting them to search engines, ISPs, and web hosts can help limit their reach.
- Educate Others: Encourage fellow creators to be vigilant about their content. Awareness is key to slowing the cycle of exploitation.
- Support Ethical Platforms: Advocate for services that protect creators’ rights and refuse to scrape or exploit copyrighted content.
- Lobby for Forward-Thinking Laws: Push for legislation that holds search engines and ISPs accountable for enabling access to pirate sites. Harsh penalties for companies that knowingly facilitate piracy could significantly reduce the ease with which these sites operate and thrive.
  While over 800 new AI-related laws have been introduced in the U.S., many of them focus on broader issues like data privacy, algorithmic bias, ethics, transparency, and security. Far fewer address the urgent need to protect creative works from unauthorized AI scraping. Worse, by the time many of these laws take effect, they are already behind the pace of technological advancement. Future laws must be proactive rather than reactive, addressing both the misuse of AI and the ease with which pirated content is exploited for AI training. Without this, new legislation risks being little more than a bandage on an ever-growing wound.
- Advocate for Automatic Penalties for Piracy Downloads: Support the development of a system that automatically penalizes individuals downloading pirated content. Unlike industries with centralized resources and capital to pursue piracy cases, publishing, graphic arts, and research are far more fragmented. Without a scalable deterrent, there’s little consequence for users who access and exploit stolen material. An automated system would create accountability where traditional legal action falls short.
The Fight Isn’t Over
AI developers may bear responsibility for misusing pirated content, but the underlying problem is far more complex. Until we address the digital black market for creative works, and recognize our own role in feeding it, the exploitation will continue.
The right hand may be drawing our focus with AI developments, but the left hand has been quietly stealing from us for years. It’s time to stop ignoring both.