
Will A.I. Close Off the Internet?
What Next: TBD | Tech, power, and the future
Google's C4 Dataset Is Used to Train Large Language Models
The Washington Post looked at Google's C4 dataset, which is used to train a lot of large language models. It documented what those models were training on: Wikipedia, news articles, and copyrighted material. This moment with Reddit is another step in that change, as data sets that were once open are no longer free for people to read.
Transcript


