
OpenAI Accused of Training GPT-4o on Unlicensed O'Reilly Books

Wednesday, April 2, 2025, 06:45 AM, from Slashdot
A new paper [PDF] from the AI Disclosures Project claims OpenAI likely trained its GPT-4o model on paywalled O'Reilly Media books without a licensing agreement. The nonprofit organization, co-founded by O'Reilly Media CEO Tim O'Reilly himself, used a method called DE-COP to detect copyrighted content in language model training data.

Researchers analyzed 13,962 paragraph excerpts from 34 O'Reilly books, finding that GPT-4o 'recognized' significantly more paywalled content than older models like GPT-3.5 Turbo. The technique, a type of 'membership inference attack,' tests whether a model can reliably distinguish human-authored texts from paraphrased versions of the same passages.
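
To make the idea concrete, here is a minimal Python sketch of that kind of test. It is an illustration only, not the paper's code: the function names, the multiple-choice framing, and the `choose` callback (a stand-in for querying a language model) are all assumptions. Each original passage is shuffled in among its paraphrases, the "model" picks one option, and the share of correct picks is compared with the rate expected from blind guessing.

```python
import random
from typing import Callable, Sequence


def build_quiz(original: str, paraphrases: Sequence[str],
               rng: random.Random) -> tuple[list[str], int]:
    """Shuffle the verbatim passage among its paraphrases.

    Returns the shuffled options and the index of the original.
    Assumes no paraphrase is identical to the original.
    """
    options = [original, *paraphrases]
    rng.shuffle(options)
    return options, options.index(original)


def recognition_rate(items: Sequence[tuple[str, Sequence[str]]],
                     choose: Callable[[list[str]], int],
                     seed: int = 0) -> float:
    """Fraction of quizzes in which `choose` picks the verbatim passage.

    `choose` is a hypothetical stand-in for the model under test: it
    receives the shuffled options and returns the index it believes
    is the original text.
    """
    rng = random.Random(seed)
    hits = 0
    for original, paraphrases in items:
        options, answer = build_quiz(original, paraphrases, rng)
        if choose(options) == answer:
            hits += 1
    return hits / len(items)


def chance_level(num_paraphrases: int) -> float:
    """Accuracy expected from a model guessing uniformly at random."""
    return 1 / (1 + num_paraphrases)
```

A recognition rate well above the chance level on passages published before a model's training cutoff is the kind of signal such a test looks for; a rate near chance suggests the model has no particular familiarity with the text.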

'GPT-4o [likely] recognizes, and so has prior knowledge of, many non-public O'Reilly books published prior to its training cutoff date,' wrote the co-authors, who include O'Reilly, economist Ilan Strauss, and AI researcher Sruly Rosenblat.

Read more of this story at Slashdot.
https://news.slashdot.org/story/25/04/02/0440222/openai-accused-of-training-gpt-4o-on-unlicensed-ore...
