- NEW: LibGen contains millions of pirated books and research papers, built over nearly two decades. From court documents, we know that Meta torrented a version of it to build its AI. Today, @theatlantic.com presents an analysis of the data set by @alexreisner.bsky.social. Search through it yourself:
- Meta initially considered licensing books. New legal docs show that a Llama senior manager felt it was “really important for [Meta] to get books ASAP,” as “books are actually more important than web data.” Paying and waiting wouldn't do. So they sought, and received, permission to torrent LibGen. (Mar 20, 2025, 11:42)
- At this point, everyone suspects that generative AI has been built with stolen material—and in fact, since September 2023 when we published our analysis of Books3 (linked below), we've known for sure that 183,000 books had been pirated for this purpose. But LibGen dramatically increases the scale.
- Proof is different from suspicion. And generative AI remains an obscure technology: Companies do not like to talk about how it works, or what they've taken for it to work, even as they place it into products that millions or billions of people use every day and insist it should change the world.