Image Credits: Moor Studio / Getty Images

To what extent has OpenAI trained its models on copyrighted content? And is that training protected as fair use? These are big questions that will be debated as the company faces lawsuits from authors, publishers, and other rights holders. A new study co-authored by researchers from the University of Washington, the University of Copenhagen, and Stanford may at least shed some light on the issue, Kyle Wiggers reports. The researchers argue that by asking GPT-4 to guess "high surprisal" words (that is, words that are uncommon in the context of a larger work), they found evidence that the model was trained on popular fiction e-books as well as New York Times articles.

Keep reading to see what else TechCrunch is covering this weekend.

Be part of AI's future at TechCrunch Sessions: AI, taking place June 5 in Berkeley, California. Connect with 1,200+ leaders, VCs, and experts, and gain insights from top AI minds like Logan Kilpatrick (Google DeepMind), Oliver Cameron (Odyssey), Jae Lee (Twelve Labs), and more. Join main stage talks, roundtables, and top-tier networking. Don't wait — save $210 and register now!

What else we're reading 📗

Featured jobs from CrunchBoard

Copyright © 2024 TechCrunch, All rights reserved. Yahoo Inc., 680 Folsom Street, San Francisco, CA
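The high-surprisal probing idea can be sketched in miniature: mask the rarest word in a passage and check whether a model reproduces it exactly, which would suggest it has seen that passage before. This is only an illustrative sketch, not the study's actual method. The corpus-frequency proxy for surprisal, the `guess_fn` callback, and all function names here are assumptions for demonstration.

```python
import math

def surprisal(word, freq, total):
    """Surprisal in bits, -log2 P(word), using raw corpus frequency as a
    crude stand-in for a language model's contextual probability."""
    return -math.log2(freq / total)

def pick_mask_target(words, freqs):
    """Choose the highest-surprisal (i.e., rarest) word in the passage.

    A correct guess for a common word is unsurprising; a correct guess
    for a rare word is weak evidence the passage was in training data.
    """
    total = sum(freqs.values())
    return max(words, key=lambda w: surprisal(w, freqs[w], total))

def membership_score(passage_words, freqs, guess_fn):
    """Mask the rarest word, ask the model (via guess_fn) to fill it in,
    and return 1.0 on an exact match, else 0.0. In practice this would
    be averaged over many passages from the same work."""
    target = pick_mask_target(passage_words, freqs)
    masked = ["[MASK]" if w == target else w for w in passage_words]
    return 1.0 if guess_fn(masked) == target else 0.0
```

In a real probe, `guess_fn` would prompt GPT-4 with the masked passage, and scores would be compared against passages the model could not have memorized to establish a baseline.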