What Next: TBD | Tech, power, and the future

Will A.I. Close Off the Internet?

Google's C4 Dataset Is Used to Train Large Language Models

The Washington Post looked at Google's C4 dataset, which is used to train many large language models. It showed that the training data drew on Wikipedia, news articles, and copyrighted material. This moment with Reddit is another step in that change, as datasets that were once open are no longer free for people to read.

Play episode from 09:14