Justified Posteriors

Seth Benzell and Andrey Fradkin
Nov 4, 2025 • 1h 4min

Evaluating GDPVal, OpenAI's Eval for Economic Value

Dive into the intriguing world of AI evaluations with a focus on OpenAI's new GDPVal metric. This innovative approach contrasts sharply with traditional macro frameworks, assessing AI's economic impact on specific tasks. Surprising findings reveal AI models like Claude achieving near human parity in various tasks. The discussion also uncovers the complexities of task design and the role of prompt engineering in AI performance. Expect insights on potential economic value automation could bring, alongside the challenges of automating knowledge work.
Oct 21, 2025 • 52min

Will Super-Intelligence's Opportunity Costs Save Human Labor?

Seth and Andrey dive into how AGI might reshape labor, referencing Pascual Restrepo's intriguing paper. They debate whether humans will remain essential in a future dominated by super-intelligences, likening people to ants compared to AGIs. The discussion touches on labor share potentially collapsing to zero and the nature of human tasks as bottlenecks or accessories. They also contemplate the implications of abundant compute and automation, raising concerns about rapid growth and the future of real wages. Plus, there's a light-hearted detour into monetary history involving sheep!
Oct 7, 2025 • 58min

Can political science contribute to the AI discourse?

Economists generally see AI as a production technology, or an input into production. But maybe AI is actually more impactful as something that unlocks a new way of organizing society. Finish this story:

* The printing press unlocked the Enlightenment — along with both liberal democracy and France’s Reign of Terror
* Communism is primitive socialism plus electricity
* The radio was an essential prerequisite for fascism
* AI will unlock ????

We read “AI as Governance” by Henry Farrell in order to understand whether and how political scientists are thinking about this question.

Concepts or other books discussed:

* E. Glen Weyl, coauthor of Radical Markets: Uprooting Capitalism and Democracy for a Just Society and a key figure in the Plurality Institute, was brought up by Seth as an example of an economist-political science crossover figure who is thinking about using technology to radically reform markets and governance.
* Cybernetics: a “science” that studies human-technological systems from an engineering perspective. Historically, it has been implicated in some fantastic social mistakes, such as China’s one-child policy.
* Arrow’s Impossibility Theorem: the economic result that society may not have rational preferences — if true, “satisfying social preferences” may not be a possible goal to maximize.
* GovAI - Centre for the Governance of AI
* Papers on how much people/communication is already being distorted by AI
* Previous episode mentioned in the context of AI for social control
* Simulacra and Simulation (Baudrillard): Baudrillard (to the extent that any particular view can be attributed to someone so anti-reality) believed that society lives in “simulacra”: artificially, technologically, or socially constructed realities that may have some pretense of connection to ultimate reality (i.e., a simulation) but are in fact completely untethered fantasy worlds at the whim of techno-capitalist power.
A Keynesian economic model might be a simulation, whereas Dwarf Fortress is a simulacrum (a simulation of something that never existed). Whenever Justified Posteriors hears “governance as simulation”, it thinks: simulation or simulacrum?

Episode Timestamps

[00:00:00] Introductions and the hosts’ backgrounds in political science.
[00:04:45] Introduction of the core essay: Henry Farrell’s “AI as Governance.”
[00:05:30] Stating our priors on AI as governance.
[00:15:30] Defining governance (information processing and social coordination).
[00:19:45] Governance as “lossy simulations” (markets, democracy, bureaucracy).
[00:25:30] AI as a tool for democratic consensus and preference extraction.
[00:28:45] The debate on algorithmic bias and cultural bias in LLMs.
[00:33:00] AI as a cultural technology and the political battles over information.
[00:39:45] Low-cost signaling and the degradation of communication (AI-generated resumes).
[00:43:00] Speculation on automated cultural battles (AI vs. AI).
[00:51:30] Justifying Posteriors: updating beliefs on the need for a new political science.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit empiricrafting.substack.com
Sep 22, 2025 • 55min

Should AI Read Without Permission?

Many of today’s thinkers and journalists worry that AI models are eating their lunch: hoovering up these authors’ best ideas and giving them away for free or nearly free. Beyond fairness, there is a worry that these authors will stop producing valuable content if they can’t be compensated for their work. On the other hand, making lots of data freely accessible makes AI models better, potentially increasing the utility of everyone using them. Lawsuits over AI and property rights are working their way through the courts as we speak. Society needs a better understanding of the harms and benefits of different AI property rights regimes.

A useful first question is “How much is the AI actually remembering about specific books it is illicitly reading?” To find out, co-hosts Seth and Andrey read “Cloze Encounters: The Impact of Pirated Data Access on LLM Performance”. The paper cleverly measures this through how often the AI can recall proper names from the dubiously legal “Books3” darkweb data repository — although Andrey raises some experimental concerns. Listen in to hear more about what our AI models are learning from naughty books, and how Seth and Andrey think that should inform AI property rights moving forward.

Also mentioned in the podcast:

* Joshua Gans’s paper on AI property rights, “Copyright Policy Options for Generative Artificial Intelligence”, accepted at the Journal of Law and Economics
* Fair Use
* The Anthropic lawsuit discussed in the podcast about illegal use of books, which reached a tentative settlement after the podcast was recorded. The headline summary: “Anthropic, the developer of the Claude AI system, has agreed to a proposed $1.5 billion settlement to resolve a class-action lawsuit, in which authors and publishers alleged that Anthropic used pirated copies of books — sourced from online repositories such as Books3, LibGen, and Pirate Library Mirror — to train its Large Language Models (LLMs). Approximately 500,000 works are covered, with compensation set at approximately $3,000 per book. As part of the settlement, Anthropic has also agreed to destroy the unlawfully obtained files.”
* Our previous Scaling Law episode

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit empiricrafting.substack.com
Sep 9, 2025 • 52min

EMERGENCY POD: Is AI already causing youth unemployment?

In our first ever EMERGENCY PODCAST, co-host Seth Benzell is summoned out of paternity leave by Andrey Fradkin to discuss the AI automation paper that’s making headlines around the world. The paper is Canaries in the Coal Mine? Six Facts about the Recent Employment Effects of Artificial Intelligence by Erik Brynjolfsson, Bharat Chandar, and Ruyu Chen. The paper is being heralded as the first evidence that AI is negatively impacting employment for young workers in certain careers. Seth and Andrey dive in and ask: what do we believe about AI’s effect on youth employment going in, and what can we learn from this new evidence?

Related recent paper on AI and job postings: Generative AI as Seniority-Biased Technological Change: Evidence from U.S. Résumé and Job Posting Data.

Also related to our discussion is the China Shock literature, which Nick Decker summarizes in his blog post.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit empiricrafting.substack.com
Aug 25, 2025 • 57min

AI and its labor market effects in the knowledge economy

In this episode, we discuss a new theoretical framework for understanding how AI integrates into the economy. We read the paper Artificial Intelligence in the Knowledge Economy (Ide & Talamas, JPE) and debate whether AI will function as a worker, a manager, or an expert. Read on to learn more about the model, our thoughts, and timestamps; at the end, you can spoil yourself on Andrey and Seth’s prior beliefs and posterior conclusions. Thanks to Abdullahi Hassan for compiling these notes to make this possible.

The Ide & Talamas Model

Our discussion was based on the paper Artificial Intelligence in the Knowledge Economy by Enrique Ide and Eduard Talamas. It is a theoretical model of organizational design in the age of AI. Here’s the basic setup:

* The Setting: A knowledge economy where firms’ central job is solving a continuous stream of problems.
* The Players: We have Workers (human or AI) and a higher-level Solver (human manager/expert or AI). Crucially, the human players are vertically differentiated—they have different skill levels.
* The Workflow: It’s a two-step process: a worker gets the first shot at solving the problem. If they fail, the problem gets escalated up the hierarchy to the Solver for a second attempt.
* The Core Question: Given this hierarchy, what’s the most efficient organizational arrangement as AI gets smarter? Do we pair human workers with an AI manager, or go for the AI worker/human manager combo?
* There are also possibilities not considered in the paper, such as chains of alternating managers and employees, something more network-y, etc.

Key Debates & Critiques

Here are the most interesting points of agreement, disagreement, and analysis we wrestled with:

* Is a Solver Really a Manager? We spent a lot of time critiquing the paper’s terminology. The “manager” in this model is really an Expert who handles difficult exceptions.
We argued that this role doesn’t capture the true human elements of management, like setting strategic direction, building team culture, or handling hiring and firing.
* My Desire vs. Societal Growth: Andrey confessed that while he personally wants an AI worker to handle all the tedious stuff (like coding and receipts), the economy might see better growth and reduced inequality from having AI experts and managers who can unlock new productivity at the highest levels.
* The Uber Driver Problem: We debated how to classify jobs like Uber driving. Is this already an example of AI managing the human (high-frequency algorithmic feedback), or is the driver still an entrepreneur who will manage their own fleet of smaller AI agents for administrative tasks?

Go Deeper

Check out the sources we discussed for a deeper dive:

* Main Paper: Artificial Intelligence in the Knowledge Economy (Ide & Talamas, JPE)
* Mentioned Research: Generative AI at Work (Brynjolfsson, Li, & Raymond on AI in call centers)

Timestamps

* [00:00] Worker, Manager, or Expert?
* [00:06] Who manages the AI agents?
* [00:15] Will AI worsen inequality?
* [00:25] The Ide & Talamas model explained
* [00:40] Limitations and critiques
* [00:55] Posteriors: updated beliefs

The Bets: Priors & Predictions

We pinned down our initial beliefs on two key questions about the future impact of AI agents, the foundation of our “Justified Posteriors.”

Prediction 1: Will Managing AI Agents Become a Common Job? What percentage of U.S. workers will have “managing or creating teams of AI agents” as their main job within 5 years?

Prediction 2: Will LLM-based Agents Exacerbate Wage Polarization?

* Seth’s Prior: 25% chance it WILL exacerbate. Reasoning: emerging evidence (like the call center study).
* Andrey’s Prior: 55% chance it WILL exacerbate.
Reasoning: skeptical of short-term studies; believes historical technology trends favor high-skill workers who capture the largest gains.

Our Final Posteriors

Prediction 2: Will LLM-based Agents Exacerbate Wage Polarization? The model slightly convinced Seth that the high-skill vertical differentiation story might be stronger than he initially believed, leading to a small increase in his posterior for exacerbation. This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit empiricrafting.substack.com
Aug 12, 2025 • 1h 2min

One LLM to rule them all?

In this special episode of the Justified Posteriors Podcast, hosts Seth Benzell and Andrey Fradkin dive into the competitive dynamics of large language models (LLMs). Using Andrey’s working paper, Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming, they explore how quickly new models gain market share, why some cannibalize predecessors while others expand the user base, and how apps often integrate multiple models simultaneously.

Host’s note: this episode was recorded in May 2025, and things have been rapidly evolving. Look for an update sometime soon.

Transcript

Seth: Welcome to the Justified Posteriors Podcast, the podcast that updates beliefs about the economics of AI and technology. I'm Seth Benzell, possessing a highly horizontally differentiated intelligence—not saying that's a good thing—coming to you from Chapman University in sunny Southern California.

Andrey: And I'm Andrey Fradkin, multi-homing across many different papers I'm working on, coming to you from sunny—in this case—Cambridge, Massachusetts.

Seth: Wow… a rare sunny day in Cambridge, Mass. But I guess sunlight is kind of a theme for our talk today, because we're going to try to shed some light on some surprising features of AI—some important features, and yet not discussed at all. Why don't people write papers about the important part of AI? Andrey, what's this paper about?

Andrey: I agree that not enough work has been done on this very important topic. Look, we can think about the big macroeconomic implications of AI—that's really fun to talk about—but it's also fun to talk about the business of AI. Specifically, who's going to win out? Which models are better than others? And how can we measure these things as they're happening at the moment? And so that's really what this paper is about.
It's trying to study how different model providers compete with each other.

Seth: Before we get deep into that—I do want to push back on the idea that this isn't macroeconomically important. I think understanding how the industry structure for AI will work will have incredible macroeconomic implications, right? If only for diversity—for equality across countries, right? We might end up in a world where there's just one country or a pair of countries that dominate AI, versus a world where the entire world is involved in the AI supply chain and plugging in valuable pieces, and those are two very different worlds.

Andrey: Yeah. So, you're speaking my book, Seth. Being an industrial organization economist, you know, we constantly have this belief that macroeconomists, by thinking so big-picture, are missing the important details about specific industries that are actually important for the macroeconomy.

Seth: I mean—not every specific industry; there's one or two specific industries that I would pay attention to.

Andrey: Have you heard of the cereal industry, Seth?

Seth: The cereal industry?

Andrey: It's important how mushy the cereal is.

Seth: Well, actually, believe it or not, I do have a breakfast cereal industry take that we will get to before the end of this podcast. So, viewers and listeners at home, you gotta stay to the end for the breakfast cereal AI economics take.

Andrey: Yeah. And listeners at home, the reason that I'm mentioning cereal is that it's of course the favorite—the fruit fly of industrial organization, for estimating demand specifically. So a lot of papers have been written about estimating cereal demand and other such things.

Seth: Ah—I thought it was cars. I guess cars and cereal are the two things.

Andrey: Cars and cereal are the classic go-tos.

Introducing the paper

Seth: Amazing.
So, what [REDACTED] wrote the paper we're reading today, Andrey?

Andrey: Well, you know—it was me, dear reader—I wrote the paper.

Seth: So we know who's responsible.

Andrey: All mistakes are my fault, but I should also mention that I wrote it in a week and it's all very much in progress. And so I hope to learn from this conversation—let's say my priors are diffuse enough so that I can still update.

Seth: Oh dude, I want you to have a solid prior so we can get at it. But I will say I was very, very inspired by this project, Andrey. I also want to follow in your footsteps. Well, maybe we'll talk about that at the end of the podcast as well. But maybe you can just tell us the title of your paper, Andrey.

Andrey: The title of the paper is Demand for LLMs, and now you're forcing me to remember the rest of the title—

Seth: If you were an AI, you would remember the title of the paper, maybe.

Andrey: The title of the paper is Demand for LLMs: Descriptive Evidence on Substitution, Market Expansion, and Multihoming. So, I will state three claims, which I do make in the paper.

Seth: Ooh, ooh.

Andrey: And you can tell me your priors.

Seth: A prior on each one. Okay, so give me the abstract; claim number one.

Andrey: So point number one is that when a new, good model gets released, it gets adopted very quickly. Within a few weeks, it achieves kind of a baseline level of adoption. So I think that's fact number one. And that's very interesting because not all industries have such quick adoption cycles.

Seth: Right? It looks more like the movie or the media industry, where you have a release and then boom, everybody flocks to it. That's the sense that I got before reading this paper. So I would put my probability on—a hot new model comes out; everybody starts trying it—I mean, a lot of these websites just push you towards the new model anyway. I know we're going to be looking at a very specific context, but if we're just thinking overall.
Man, 99%: when a hot new model comes out, people try it.

Andrey: So I'll push back on that. The claim is that it's not just about trying it; these models achieve an equilibrium level of market penetration. It's not—

Seth: How long? How long is—how long is just trying it? Weeks, months?

Andrey: How long are—sorry, can you repeat that question?

Seth: So you're pushing back on the idea that this is, quote unquote, “just trying the new release.” Right. But what is the timeline you're looking over?

Andrey: It's certainly a few months, but it doesn't take a long time to just try it. So, if it was just trying, we'd see a blip over a week, and then it would go back down. And I don't—

Seth: If they were highly horizontally differentiated. But if they were just very slightly horizontally differentiated, you might need a long time to figure it out.

Andrey: You might—that's fair. Okay, so the second claim is: different models have very different patterns of either substituting away from existing models or expanding the market. And I think two models that really highlight that are Claude 3.7 Sonnet, which primarily cannibalizes from Claude 3.5 Sonnet—

Seth: New Coke.

Andrey: Yes, and it's—well, New Coke failed in this regard.

Seth: Diet Coke.

Andrey: Yeah. And then another model is Google's Gemini 2.0 Flash, which really expanded the market on this platform. A lot of people started using it a lot, and it didn't seem to have noticeable effects on other model usage.

Seth: Right?

Andrey: So this is kind of showing that models are competing in this interesting space.

Seth: My gosh. Andrey, do you want me to evaluate the claim that you made, or are you now just vaguely appealing to competition? Which of the two do you want me to put a prior on?

Andrey: No no no. Go for it.
Yeah.

Seth: All right, so the first one is: do I think that if I look at, you know, a website with a hundred different models, some of them will steal from the same company and some of them will lead to new customers? Right? I mean—I, I'm a little bit… Suppose we asked this question about products and you said, “Professor Benzell, will my product steal from other demands, or will it lead to new customers?” I guess at a certain level, it doesn't even make sense, right? There's a general equilibrium problem here where you always have to draw from something else. I know we're drawing from other AIs, which would mean that there would have to be some kind of substitution. So I mean, yes, I believe sometimes there's going to be substitution, and yes, I believe sometimes, for reasons that are not necessarily directly connected to the AI model, the rollout of a new model might bring new people into the market. Right. So I guess I agree. At the empirical level, I would say I'm 95% certain that models differ in whether they steal from other models or bring in new people. If you're telling me now there's a subtler claim here—that the fact that some models bring in new people is suggestive of horizontal differentiation, and is further evidence for strong horizontal differentiation—I'm a little bit, I don't know, I'll put a probability on that, but that seems to be going a little bit beyond the scope of the description.

Andrey: Well, we can discuss that in the discussion session. And I think the final part that I make a claim about is that apps, and the users of apps as well, tend to multi-home across models. So it's not that people are using just one model, and it's not like app developers are using just one model for each application. And that's once again pointing to the fact that there isn't just one superior model, even within a given model class. And, Seth, go for it.

Seth: Andrey, you did the thing again.
You did the thing again where you said, “Here, Seth, do you want to evaluate this empirical finding?” Or do you want me to now say, “This tells us something about the future of competition in AI”?

Andrey: Yes, yes, yes. All right, go for it.

Seth: The empirical claim, right? Give me the narrow claim one more time. Give it to me.

Andrey: The apps are multihoming.

Seth: The people multi-home. Okay. The narrow claim is we've got these apps; maybe we'll give the listeners a little bit of context on what a sample app would be.

Andrey: Yeah, so I think about two types of apps here. One is a coding app—so Cline and Roo Code are two quite popular coding apps. And we see that users of those apps are multi-homing. And then—those apps are multi-homing; we don't know as much about the users—and then we have various chat-persona apps. And then we have some kind of utility apps.

Seth: Yeah. We'll call them, like—let's call that second group role-play apps.

Andrey: Yeah, yeah. We have, like, a PDF extractor and apps like that, that are also on the—

Seth: Very tool-ly. Okay, cool. All right, so we've got all these apps out, and now you're going to tell me, “Professor Benzell, I think you would be surprised to find out that Roo Code, for example, has both a Claude model powering it and an OpenAI model powering it.” And that one is probably the thing I'm most surprised by. Right? I definitely would not be surprised at all to know that Roo Code can send its Claude tokens to one data center versus another data center; that makes perfect sense. But the fact that you would sustainably have many different contemporaneous models on the same platform feels like a stage in a process rather than where we're going to end up. What do I mean by that? So why would you want to keep an old legacy model inside of your Roo Code? So I've got—Silly Tavern is one that I like.
So Silly Tavern is just—you can do role play and talk to characters and pretend you're going on adventures. Right? It seems like Claude 3.7 should just be better than 3.5 at that, right? I really don't—my intuition is that they're not strongly horizontally differentiated. Why would you keep both? It would be for legacy reasons, for backward compatibility. Maybe there's a specific interaction or scenario that you had working in the old version of the app, and you want to make sure that that's still around for new users. So, how would I think about this? If you want to say that this multi-homing is evidence of competition—because the same app wants to use multiple versions—I kind of disagree, right? The way I think about it is maybe more like, you know, you're a car, and you can either use the old muffler or the new muffler, and some people have upgraded to the new muffler, but some people are still using the old muffler, and so that car has two different kinds of mufflers.

Andrey: Yeah, we can discuss that claim as well. I guess, do you want me to address what I think?

Seth: Well, give me a taste, and then let's go to the evidence. Give me a taste.

Andrey: The multi-homing is not happening on an old and a new version of a model. It's happening on, let's say, 3.7 and Gemini 2.5, which are both relatively new models. The other thing I'd say is that if you read Reddit, there are some users that still like 3.5 better than 3.7.

Seth: On the internet, they will prefer one plain white cotton T-shirt to another plain white cotton T-shirt.

Andrey: Who are you to question the preferences of the consumer?

Seth: Right? But I guess—all right, so this is my last comment on the priors, and then we'll get into the evidence, which is:
This sort of speculation about what people will actually want in the long run is the bridge that gets us from this cross-sectional evidence about April 20, 2025, to what the world's going to look like in 2027 and 2028. So that's why I'm pushing back a little bit.

Andrey: Yeah, I don't want to make claims that are too great about 2027 based on this cross section. Yes.

Seth: You know, GDP growth's gonna be at 30%—

Andrey: That's true.

Seth: And all of your labor will be automated.

Andrey: There is going to be a lot of market expansion, I hear.

Seth: Oh, babe, listen to our Epoch AI episode. We'll post that before this one so you can see what we're laughing at.

Andrey: All right.

Seth: So tell me, Andrey—I can think of no one better suited to walk us through the evidence of this paper than Professor Fradkin of Boston University.

Andrey: Look, it's a very simple paper. It's essentially a few graphs, and the graphs are event studies, where we see what happens to a selected set of models around the time of the release of one of the new models. So for the release of Claude 3.7, we see a very obvious drop in the usage of 3.5. You know, if you ballpark it, it's about 80% cannibalization. And the adoption happens within a few weeks, so it's fairly fast. We also look at Flash 2.0. We see very fast adoption, and in terms of tokens used, Flash 2.0 is the biggest model very quickly. And then Gemini Pro is another model that gets released in this time period, and it also sees a very fast adoption curve that doesn't seem to cannibalize other models in this time period. So that's the evidence on cannibalization and market expansion, and then there's the evidence on multi-homing. There are some intricacies with the scraping of the data here. So, actually—let's take a step back. Where does this data come from?

Seth: What is OpenRouter?

Andrey: We haven't discussed what OpenRouter is. All right.
Look, one of the challenges with studying these issues is that a lot of the data sits in these fortresses of data that you cannot extract anything from.

Seth: And we're trying, for you listeners; we're banging at that gate. We're banging at that gate every day trying to get in for you.

Andrey: Yes. Yes. So for people who are using OpenAI through the chat app, or through direct OpenAI API calls, we're not going to get a lot of visibility into that data. We might get some auxiliary data from credit card providers, payment processors, and the like, but it's hard to know how usage is changing, and how specific model usage is changing in particular. One thing that exists is this service called OpenRouter, and there are other companies that are similar to it. And it's built for, I'd say, a sophisticated user, who might be a software developer who knows that, hey, I want to use a mix of models, or I might want to change my code to use a different model as—

Seth: Andrey, what's the S word that I'm thinking of here?

Andrey: Substitution; what?

Seth: Selection. You see, we're looking under the light of the lamppost—right at the people who want to multi-home.

Andrey: Yes. 100%. But I will say—let me just explain what OpenRouter is, and then we'll talk about selection and whether we care about that or not.

Seth: Oops.

Andrey: Okay. So it's a very handy service that allows you to call many different types of models. It also allows you to set rules, too, like which model to use as a function of things that you might not be thinking about if you're just a chat user: latency, throughput, uptime, specific pricing and how it affects prompt tokens versus reasoning tokens versus completion tokens. So it's just a really useful service for this—for the app developer.

Seth: I mean, can I—just to interrupt for a split second here, right?
Honestly, I feel like you gave me more evidence for horizontal differentiation in this market just by listing those four different features than you did with almost anything else, right? Because, all right, I could see why you would need to balance between latency, price, throughput, quality, et cetera, et cetera.

Andrey: Yeah. And there is actually an interesting feature of this market that many do not know: there are multiple companies that serve specific models. This is obviously true with open-source models, where anyone can serve them. So we have a lot of servers of your Llamas and your DeepSeeks. But it's also true of the closed-source models. For example, Microsoft might serve an OpenAI model, and OpenAI might serve the same OpenAI model, and there might be differences in how well they're serving these models.

Seth: Does that mean that Microsoft has to know the model weights, or are they hidden in some way from them?

Andrey: That's above my pay grade. I—

Seth: We will find out for you.

Andrey: I mean, Microsoft owns a lot of OpenAI, so they have some access.

Seth: Okay.

Andrey: Yeah. So, that's kind of an interesting feature of—

Seth: Mm-hmm.

Andrey: Anyway. One thing that this company does is they publish a lot of data about model usage and how the model usage is changing over time, and also about how specific apps use different models. In particular, for each model, they put the top 20 apps using that model and their usage numbers. So you piece these together, and you can get some pretty good information about popular apps, what models they're using, and how much they're using them.

Seth: Mm-hmm.

Andrey: And even over time, if you're scraping it continuously—

Seth: Do we know if this is just for the apps that list themselves on OpenRouter? Is this the universe of tokens going through those apps? Do we know that?

Andrey: I think it's the universe of tokens going through those apps, but not all apps are—

Seth: Obviously? Yeah.

Andrey: publicly disclosing it.
Even if they are using OpenRouter.

Seth: Well, it's a fascinating data set. It's going to show us the price of tokens; it's going to show us which apps are using which tokens; and we're going to get dynamics on that over time. So it seems like a perfect data set. Andrey, your next big contribution is just noticing the data set.

Andrey: To be clear, the ML community knows about this data set as well. You know, in this question of how do we evaluate which models are good and which are not, what we all love is revealed preference.

Seth: Oh, ooh.

Andrey: Usage. And OpenRouter has one such ranking, right? That's publicly available. It seems pretty hard to game it, although we can talk about ways one could try to game it. And that should tell us something about which model is better—at the very least, which model is on the Pareto frontier. Um. And so the machine learning community, the AI community, has been noticing this. So yeah.

Seth: And then they told you, so then your contribution was the translation to economics.

Andrey: I don't know who told me. The other thing I should say is that now certain companies are releasing stealth models on OpenRouter as a way to test them.

Seth: Oh.

Andrey: That's also an interesting dynamic to explore. In particular, OpenAI has stealth-released some models through there.

Seth: And these would be—so if I was running Silly Tavern, it would become apparent to me that there's a GPT-4o version too, and I could play around with it as an option.

Andrey: And there's a new model called Optimus Alpha.

Seth: Oh God, did they let Elon Musk name this one? Oh my God. Somebody took too much testosterone this morning.

Andrey: Yeah. So, all right. That model gets released for a few weeks. People play around with it, and then it's the new OpenAI model.

Seth: Got it, got it.
And then, theoretically, normal app users of Silly Tavern might be interacting with this model for a little bit before the official release is there.Andrey: Yeah.Seth: Got it. Okay. Cool.Andrey: Yeah, so what questions do you have, Seth?Seth: What questions do I have? Andrey, it occurs to me this population of LLM users might not be representative of the market as a whole. How do you respond to that limitation?Andrey: So, I acknowledge it, but let me push back a little bit. There are different populations of, what shall we say, heavy LLM users that we can think about. One type of user is your basic consumer, and that type might have a ChatGPT subscription or might even use the free version, or Claude, even though really most of the action is in ChatGPT. We're not talking about that; I think that's very clear. It's a consumer product, and we know consumers suffer from very large default effects.Seth: Right?Andrey: They're not going to be switching very actively in aggregate. So I don't think this paper is about that at all. The second type of use case that we know a lot about, or we're aware that there's a big use case for, is programming. Right?Seth: Mm-hmm.Andrey: And here I think this is a bit more of a representative sample in a lot of ways. Cline and Roo Code are serious programming apps.Seth: Even though they have silly names.Andrey: Yes, 100%, and they have features that are essentially at parity with Copilot in VS Code and with Cursor, even though, as far as I'm aware, Cursor and Copilot use their own software to route the model calls. You can do the same things in those apps. So I'd say the coverage is comparable.
And the user bases of these apps are quite similar; you might say Cline and Roo Code users are a little more sophisticated, but I actually don't think it's that big of a difference.Seth: They're just a little weirder.Andrey: They're a little weirder.Seth: So you think this is very representative of the market for AI tokens? For coding?Andrey: Yes, with an exception—Seth: Mm-hmm.Andrey: The exception is that some companies place severe limitations on the types of models their employees can use. So imagine you're working at Google. I imagine if you're working at Google,Seth: Gotta use it; you gotta eat your own dog food.Andrey: You cannot use o3 for programming, I assume.Seth: You cannot generate images of German Nazis. They have to be alt-right. That's a callback joke, guys. All right?Andrey: So then there are these other apps, and there, you know, it's hard to say. Look, if you're an app developer and you have a single-use app, like a PDF text extractor or something like that, I imagine that you are actively considering different models, especially trying to optimize your costs.Seth: Mm-hmm.Andrey: And you may or may not use OpenRouter. I'm not sure; certainly, there might be some selection, and if there are developers who are less sensitive to these issues, they might not feel the need to use OpenRouter.Seth: But for freelance coding, we think this is representative. All right. Now talk about these other settings, like the tools and the role-playing.Andrey: Talking about this example, let's say you have a service where you send it a PDF, and it gives you back the structured text.Seth: Mm-hmm. Mm-hmm.Andrey: Which is a type of app that you can find on OpenRouter. I doubt that whoever's writing these types of apps is very different whether they use OpenRouter or not. I imagine they're considering many models.Seth: Right.
Well, I mean, I guess we're kind of in the talk-about-it section, but you could see a lot of this stuff getting built backward into the platform, right? There's this story, you know, about iPhones. When you started off with an iPhone, there was like a light bulb app that you had to install to get the light to go, but then they built it in as a feature, right? So, I mean, in the long run, is there even a place for something like OpenRouter, or are these all features that are going to be built right into OpenAI or built right into Anthropic?Andrey: I guess the ability to use the other models is a feature. I doubt that they'll build that in, but you know, who knows, right?Seth: Right, but they might give you different versions. There would be the within-OpenAI version and then the within-Claude version, and they could give you a selection of models.Andrey: Sure, sure. And I think a lot of big companies do this: if they sign an enterprise contract with OpenAI or Google or Anthropic, they're going to use their models. They might even have forward-deployed engineers that kind of show them how to use the model in the best possible way, how to fine-tune it, and so on. So I think, if an application requires really close cooperation between the foundation model provider and the application layer, we'll see that essentially the different competitors are splitting off into cooperating with different model providers.Seth: Right. So you think that is one possible future, which is that we end up with much more fragmentation than OpenRouter. So there would be, in that universe, multi-homing across models, but not multi-homing across companies.Andrey: Yeah. I think multi-homing across models versus multi-homing across providers—yeah, we should be kind of clearer about that.
And I think the evidence that I have is at least—it's not just multi-homing within, you know, within OpenAI or within Llama or—Seth: Ooh. Ooh. We'll have to see about that. All right. Okay. Alright. Other questions I have about this are, you know, not all tokens are created equal, either. I mean, how large a range in prices are people paying for these tokens? What I know is you have a little table of a maximum and minimum, but give the audience a sense of how expensive intelligence can get and how cheap it can get.Andrey: How expensive and how cheap can it get? So it can be close to free, especially for pretty small models. And it can get pretty expensive. There's an output price of $18 per million tokens that exists on this platform, at the time I was looking at it, for example.Seth: It's still cheaper than my ghostwriter.Andrey: Yeah, I mean, a million tokens is not nothing, for sure. And then there are differences in input prices and output prices. And there's also something that I haven't captured very well in this data, which is that there might be discounts for something called NGS. Things get more complicated the more I look at it in detail.Seth: Right. And the question is, do these kinds of details suggest concentration, or do the details suggest dispersion and horizontal differentiation?Andrey: Yeah.Seth: Hmm.Andrey: Let's talk a little bit about just some very basic economics of—Seth: What the f**k is competition? Why do we want it?Andrey: Yeah. So I think first let's think about the utility, the consumer and app-developer utility part of this, right? Let's imagine that they have some utility for the different models, but they also have to, you know, pay a price for it. So, the way we think about it is, how much are people willing to pay for the better model? And if we think that things are pretty vertically differentiated, everyone will want to pay more for the same types of models.
If we think that things are horizontally differentiated, then different developers will want to pay more for different types of models. And then there's also this question about the scaling thing. Like, yeah, maybe there's a model that's a little bit better than the other model, but it's a lot more expensive, and people are not willing to pay for that. So that might be something going on.Seth: Hmm.Andrey: Prices, obviously, are a very important variable to think about, especially when you can think about them in the following way. Say you have a hard problem. One way to approach it is you throw it to the best model. Another way to approach it is to call a slightly worse model 10 times and then pick the best answer, right? So there's some implicit kind of substitutability that might be present in this.Seth: But that. Oh man. So now that's so interesting, because the story you just told is not a story about horizontal differentiation. Right.Andrey: Yes.Seth: But it is a reason why you might want lots of different vertically differentiated models.Andrey: Yes. Yeah.Seth: Ah huh. So maybe we don't have direct evidence on horizontal differentiation here.Andrey: For what it's worth, I'm not sure how often this pattern is being used, but it's—Seth: Okay.Andrey: It's certainly possible. Yeah. And then there's another kind of thing to mention, which is this famous Jevons paradox, which is a paradox.Seth: I mean, no paradox is really a paradox, according to my book, Sleight of Mind, about why paradoxes are dumb and you should just know all the right answers.Andrey: Yes. Alright. So, let's say we have an efficiency improvement in our model serving, and we lower our prices by a bit.
The response to that might be so large that the total number of tokens used might go up.Seth: Right?Andrey: Essentially, the total revenue can go up.Seth: And so, I mean, it seems like that's happening constantly in this data, where we're releasing better and better models and demand just goes up.Andrey: Yeah. Yeah.Seth: Which provides another challenge for thinking about substitutability, because we don't have individual-level data. This is not a static market. People are entering this market all the time. I mean, the figures you make are quite compelling; stuff is happening the instant these models are released. But it's also the case that, you know, compositionally, who's in this data is changing and pretty fluid.Andrey: Yeah. Yeah. It's something I do hope to have more to say about, since I've been scraping over time, because at least within an app, you might say that the—Seth: It's homogeneous within an app. Yeah. Or maybe you lump together all the coding apps and all the, you know, Silly Taverns. Okay, cool. Alright. I mean, how much do you feel like you have to make a claim about horizontal differentiation here?Andrey: Look, it's hard for me to see multi-homing and think that there is no horizontal differentiation here.Seth: Other than price-quantity or price-quality differentiation?Andrey: No, no, sure. But I guess a point that you can see in these figures is that these are pretty similarly priced models, in many ways, that are being multi-homed.Seth: The latency is a little bit different. Maybe I'm going to switch back and forth based on latency. There are a lot of different little things here, right?Andrey: Sure, sure. That's fair. Without having the individual usage data, it's really hard for me to make these fine-grained claims.
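Two of the mechanisms in this exchange can be made concrete with a little arithmetic. The sketch below is illustrative only: the prices, success probabilities, and demand elasticity are assumptions for the example, not figures from the OpenRouter data. The first part works out the best-of-n substitution Andrey describes (ten calls to a cheaper model, keep the best answer); the second works out the Jevons dynamic, where a price cut raises token demand enough that revenue rises, which requires demand elasticity greater than one.

```python
# Illustrative numbers only; none of these prices or probabilities come from OpenRouter.

# 1) Best-of-n substitution: n independent tries at a cheaper model
#    succeed if at least one try succeeds.
def best_of_n_success(p_single: float, n: int) -> float:
    return 1 - (1 - p_single) ** n

frontier_price = 15.0  # assumed $ per million output tokens, frontier model
cheap_price = 0.5      # assumed $ per million output tokens, cheap model
p_frontier = 0.90      # assumed chance the frontier model solves the task in one call
p_cheap = 0.40         # assumed chance the cheap model solves it in one try

p_cheap_10 = best_of_n_success(p_cheap, 10)  # about 0.994
cost_cheap_10 = 10 * cheap_price             # 5.0, a third of the frontier price
print(f"Frontier: success {p_frontier:.2f} at relative cost {frontier_price:.2f}")
print(f"Cheap x10: success {p_cheap_10:.2f} at relative cost {cost_cheap_10:.2f}")

# 2) Jevons dynamic under constant-elasticity demand Q = A * P^(-e):
#    a price cut raises revenue P * Q whenever e > 1.
def revenue(price: float, elasticity: float, scale: float = 1.0) -> float:
    quantity = scale * price ** (-elasticity)
    return price * quantity

e = 1.5  # assumed elasticity of token demand
r_before = revenue(2.0, e)
r_after = revenue(1.0, e)  # a 50% price cut
print(f"Revenue ratio after 50% price cut: {r_after / r_before:.2f}")  # prints 1.41
```

Under these assumed numbers, ten cheap calls beat one frontier call on both success rate and cost, which is why a stack of vertically differentiated models can coexist on the platform even without any horizontal taste differences.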
I certainly have begged for this data from the CEO of OpenRouter, but so far no cigar.Seth: Okay, let me push. Let's talk about that a little bit more, right? If the multi-homing is driven by fluctuations in latency, let's say, right? Like, I don't have strong preferences between Claude and ChatGPT; I just want to call the one that's lower latency. You can definitely get multi-homing there without it being driven by any difference amongst the models.Andrey: Sure. I guess this is very empirically testable. I haven't done it, but the latency data is at a five-second level, and you could just see how much it changes over time.Seth: There we go.Andrey: Yes.Seth: Ooh, ooh. I've given you some more homework, it sounds like.Andrey: So, I guess if we think that the latency is highly variable across time, or the throughput is highly variable over time, then we might see that sort of pattern. If we don't see it being very highly variable over time, then maybe that's some evidence that it's not quite what's driving it. But yeah.Seth: Let me tell you what my prior is, so maybe this is like the key part here, right? I have this really strong prior that I was not born with, but that I have been trained into by talking to AI experts.Andrey: Mm-hmm.Seth: There's no such thing as the AI that's good at military stuff versus the AI that's good at writing humanities papers. It's all intelligence: you get more of it or less of it. Sure, at the margin there's fine-tuning, there's vibes, but with the right sort of prompt and a sufficiently unlocked model, it should be just pure vertical differentiation. When I've been in rooms with technologists, that's the claim they make.Now, maybe that's because they're at OpenAI and they're at Anthropic, and it's their incentive for this to be a universe where there's only two big boys.
But serious people I've talked to have suggested there isn't such a thing as significant LLM horizontal differentiation.Andrey: Yeah. I don't believe that. Let's see what they actually do.Seth: Mm-hmm.Andrey: OpenAI is constantly updating its default model in ChatGPT. And sometimes they're optimizing for one metric, and then they realize that they face a trade-off. So, for example, if your ChatGPT is a little too nice to you, that might lead you to use ChatGPT more, but it might feel ethically dubious for ChatGPT to be encouraging your addiction, given that you totally deserve to be addicted to your phone. So, there's clearly a Pareto frontier of different things that these models can be made to do. Right? So a lot of experimentation by the companies takes the form of: how do we play on this Pareto frontier? The existence of a Pareto frontier suggests that there isn't just one dimension on which things differ.Seth: Right. But I guess where I come at this from is, okay, imagine there's like a continuum of steps of delivering the token to the consumer, right? The first step is a $500 billion pre-training run. We, you know, make the giant pre-trained model. The second step is we're going to fine-tune it. We do the RLHF and give my model its particular personality, and it knows it's not allowed to work for terrorists or whatever. And then there's the third step, which is we're now going to plug that fine-tuned model into an app, and it's going to be deployed in something functional that a consumer can interact with. I guess the way I see it is, as we move down that continuum, this becomes more and more horizontally differentiated. At the beginning it seems really not horizontally differentiated, and by the end it really is very much so—you know, you don't want the Silly Tavern AI helping you convert PDFs. Right.
So I guess when I hear LLMs are horizontally differentiated, I'm thinking about that pre-training step.Andrey: Mm-hmm.Seth: Maybe you want to make a claim about how the usage of AI in apps is horizontally differentiated, which is at the far other end.Andrey: Sure. Yeah. I think that's true. And, you know, we've talked about unhobbling on the show before, and I certainly believe that lots of these models have capabilities that we haven't figured out how to get out of them. Right. They know so—Seth: Right. I've tried really hard to make OpenAI do some of those things, and it's not as nice as Grok when you ask him to, or—Andrey: Yeah. So I think that's right, right? How these models are used in the application layer can be differentiated even if we think that at the foundational level it's just a ball of clay, and some of these balls are bigger clay balls than other balls.Seth: Oh, right. And when you have smaller clay balls, you can't build the Mona Lisa of clay balls. Right. So it's like a capacity thing. Yeah, I mean, it just brings us back to there being a vertical aspect and a horizontal aspect, and the question is, in the market competition for AIs, where do those two come in? Right? Because in terms of app deployment, you wouldn't expect vertical—I mean, everyone's just going to use the best; they're going to use models that are on the Pareto frontier. So you'd expect the vertical differentiation to be less apparent in that last stage. Right?Andrey: Yeah. I mean, it seems to me that models like Gemini 2.5 Pro and Claude 3.7 Sonnet are both on the frontier, but some people just like one, and some people like the other. And that is horizontal differentiation to me.Seth: Right.
And now you're referring to, like—Andrey: Maybe there's a cost difference, and there might be latency differences, and that's really what's driving, you know, the usage patterns.Seth: Or maybe the prices are identical, and I'm epsilon horizontally differentiated, and that's enough.Andrey: Yeah.Seth: I guess the last thing is that I think my instinct is that horizontal differentiation will become less important over time. Right. So if you think about these balls of clay getting bigger and bigger and bigger, right? Sculpting them exactly the way you want is going to get easier and easier as you have more and more clay to discard. Do you buy that argument?Andrey: I think we'll get better at sculpting things over time; that's certainly true. And I think that comes back to your question about whether we are going to have horizontal differentiation in the sculpting step. And then the question is, who's going to be sculpting it? Is it going to be app developers sculpting it? Is it still going to be the big labs that sculpt it in various specific ways?Seth: Right. I mean, if we're doing the sculpting at the app stage, there's just a lot more room for horizontal differentiation, right? Because there are a lot more players who are going to be involved, and that's the domain where, yeah, it does matter to a consumer whether the interface is blue versus pink, and even stupid s**t like that can support an industry. No offense to, you know, app developers out there.Okay. So one question that is kind of like the implicit background question in this paper, in my opinion—Andrey: Okay.Seth: It is a prior, which we did not put a probability on, but I just kind of want to ask you: can you come at this having done this research?
It doesn't—you don't have to do it in a prior way, which is like, do you think the market for AI will be, you know, relatively competitive or relatively concentrated in four or five years?Because I mean, my reading of this paper was like, it's a shot for, it's going to be less concentrated and more competitive than you think.Andrey: I think it depends a lot on the complementarity of other things.Seth: There you go. There you go. Speaking of Catherine Tucker, we had her asking her about AI competition. She's like, "Well, you know, I'm Catherine Tucker." Catherine Tucker thing.Andrey: That is not how she talks.Seth: She does not talk like that. So I'm not going to try to do my Catherine Tucker voice. But like, her point was like, we know how to do antitrust. It has to do with networks of complementarities and substitution abilities. There's nothing special about AIs. Is that kind of your take?Andrey: I don't think I'm going to make the claim that we know how to do antitrust of AI. That seems premature, to say the least. I will say that the concentration of the industry is very likely to be determined by complementary integration assets. So how important is it to have that Anthropic engineer sitting at, you know, SAP, the specific molded version of Claude, or a particular application or not? Is it something where. at SAP will just call Open Router, and it's just going to be good enough that way. And they don't have to do specific SaaS contracts with Anthropic or anything like that. and that's hard for me to answer right now. But you know, if I had, if I were a betting man, I would say that there'd be a handful of models that are pretty competitive with each other.But I don't think there'll be like a thousand models that are competitive with each other.Seth: Right. That frontier, there's just not, there's not enough room at the top, at the frontier. Just because these trading runs will be so, so expensive. 
I guess that's kind of—as I was reading this paper, in the back of my head, I'm thinking, you know, like, how many people are going to come up with $500 billion to pre-train their own models?Right. It—it just seems like there's a maximum of how competitive this industry can get.Andrey: But I guess so. I would say like five; five is often enough to get a very competitive dynamic. Why do we want competition? It's not just because we want a bunch of competitors, for competitors' sake. We actually want there to be the correct incentives to innovate and then to price fairly, right?So those are kind of the two things we're trading off. And in industrial organization, there are some results that in certain cases where you want even less than five competitors for the incentives. So that still seems quite competitive, even if there is a lot of concentration.Seth: Right. I—it's all maybe another way of thinking about this is, suppose we could wave a magic wand and either make AI more horizontally differentiated or make it less horizontally differentiated. Right. We could choose which world we're in.Andrey: Mm-hmm.Seth: A world where they're less horizontally differentiated is probably one with faster growth and, you know, fewer implementation costs and less friction. Right.Andrey: Yeah, I'm not sure. It depends; it depends on how we think about, like, the specific innovation production function. Don't; it's not obvious to me that there's, like, one answer, right? Because you can imagine that in a horizontally differentiated world, more players are going to be able to try to innovate, and because there are more, there are going to be more rents. But if you think that it's all about just that huge run, that one big run,Seth: Right,Andrey: Maybe it's that you want it to be vertically differentiated and kind of a winner-take-all dynamic. But, one where the winner can change to from time to time.Seth: Right. 
So then we're in a universe where it's competition for the market rather than competition in the market. And that brings its own set of antitrust concerns. Andrey, you know, believe it or not, I took a minute to look at the same data and ask questions right along these lines: like, how concentrated is this market, exactly?Because reading your paper, it's a paper that's supposed to give me some hints about the competitiveness of the industry, and the first thing people ask about an industry is, well, how concentrated is it? Right? So Andrey, what's your sense? Are these models more or less concentrated than a typical industry?Andrey: Um.Seth: And actually, I want you to tell me, all right? I'll lay my cards on the table here. I've got three HHI indices I'm looking at right now, from OpenRouter, for the first week of May. We've got the number of tokens called at the AI company level, so it aggregates up to companies.We've got the number of tokens called at the AI app level, so that's like a Silly Tavern, et cetera, et cetera. Then we've got the number of tokens called at the model level. And then I would like you to compare these to concentration in motor vehicles and breakfast cereals. So I want you to rank those five from most equal to least equal.Andrey: Yeah, so I will push back on one thing. You count, like, the Meta Llamas as being Meta's, right? But Meta is not the one who's serving them.Seth: Ooh. Ooh. Well, I could do providers too. That would be a fourth way to split it.Andrey: Yes. But generally, yeah. Look, it's more concentrated than these other industries.Seth: It's pretty concentrated.Andrey: I'd say more so for all of them, with the exception of the model-specific one. Even with that, I'd say it's probably more concentrated than the—Seth: That one is actually pretty low. So the model, so just, I'll put some numbers out there.
Just, ballpark, motor vehicles have an HHI of about 2,500; breakfast cereals are just below that.Andrey: Mm-hmm.Seth: The number of tokens at the company level has an HHI of 2,960, so it's a little bit higher than those guys. But if we go to the app level, we're at 2,160, so that's kind of more competitive than motor vehicles and breakfast cereals, which we think have a decent amount of competition.And then the model level, where we're going to treat 3.5 and 3.7 differently: we're pretty equal. We're at the 1,500 level, which is considered pretty, pretty competitive.Andrey: Competitive. Yeah.Seth: All right. Does that change your priors, Andrey?Andrey: Well, I guess I wouldn't have used those industries as a comparison set.Right? Like, I think a lot of digital infrastructure types of industries have a lot more concentration. So you think about cloud computing or search or phones, right?Seth: Mm-hmm.Andrey: So relative to those kinds of industries, it is less concentrated. But certainly compared to physical goods products, it seems more concentrated, I guess. I assume that you didn't calculate that HHI per car model, right? So it's kind—Seth: No, it was not. That was at the company level.Andrey: Yeah. I mean, you know, full disclosure, this definitely has been on my to-do list. I just have not gotten around to it. But I don't—Seth: All right,Andrey: I don't think that this changes my priors very much, if—Seth: Okay, well, I've got a second fact for you. A second stylized fact. All right, so now I want you to imagine—oh man, I don't know if we have time to start talking. We'll save the power law and probability distributions for the next episode. But let me give you four different things that might be more or less concentrated.Right? Here's another four things to think about. One is 2023 US Compustat companies. Another is OpenRouter AI at the company level. The third is Hugging Face.
You know, Hugging Face is another website where people post AI models. This is for free downloads, so these are like public models.So I have downloads of Hugging Face AI models. And then finally I have all-time movie box office. So you tell me which of these is going to be the most concentrated: Hugging Face AI downloads, OpenRouter AI tokens, 2023 US publicly traded companies, or all-time movie box office.Andrey: This is the OpenRouter one by the model creator?Seth: I believe that, yeah, at the company level.Andrey: Okay. Um. I think OpenRouter is the most concentrated of these.Seth: Correct. Second most?Andrey: Hugging Face?Seth: Hugging Face, second most. Third most?Andrey: I don't know how to think about a Compustat HHI. That seems like—what's the product market? Sorry.Seth: Oh, Compustat. It's publicly traded corporations. So it's everything together.Andrey: Oh, you're just combining all the—?Seth: Yeah, yeah, yeah.Andrey: Just by revenue?Seth: No, it's market value. So, you know, implied market—Andrey: Yeah, I think that'll be three. And then the movies are four.Seth: Dude, you don't even need data. You got this down.Andrey: How about those priors?Seth: Who needs evidence? But okay. You see what I'm trying to get at here, Andrey? Right? Which is, you can give me evidence that people are willing to move back and forth, but if it's the most concentrated industry I can find, it seems pretty concentrated.Andrey: I named you a bunch of industries that are more concentrated.Seth: Alright? Okay, so now we go. All right, so listen, this is going to be a special two-part episode of Justified Posteriors. In the next episode, Professor Benzell will bring his own evidence and analysis to bear on the data from OpenRouter, and you'll be the judge. Is AI competitive? Is it not competitive?It's the future you're going to have to live with one way or the other.
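For listeners who want to check these comparisons themselves, the Herfindahl-Hirschman index quoted above is just the sum of squared market shares expressed in percentage points: a monopoly scores 10,000, and the conventional "highly concentrated" threshold is 2,500. A minimal sketch, using made-up token counts rather than the actual OpenRouter figures:

```python
# HHI = sum of squared market shares, with shares in percentage points.
# The token counts below are hypothetical, for illustration only.
def hhi(quantities: list[float]) -> float:
    total = sum(quantities)
    return sum((100 * q / total) ** 2 for q in quantities)

# Hypothetical tokens served, by company
tokens_by_company = [40, 25, 20, 10, 5]
print(round(hhi(tokens_by_company)))  # prints 2750: above the 2,500 "highly concentrated" line

# Splitting the same usage across more units lowers the index:
# an equal 7-way split gives 10000 / 7, near the model-level figure of about 1,500
print(round(hhi([1] * 7)))  # prints 1429
```

With the real data, `tokens_by_company` would be the weekly token counts aggregated at whichever level you choose (company, app, or model), which is why the three levels in the discussion give three different numbers from the same underlying tokens.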
Andrey, are we ready to talk about our priors a little bit?Seth: All right. What's yours? So tell us—you had three claims here. I guess you're a hundred percent convinced of all the claims, given you wrote them down.Andrey: Look, my claims are empirical, right?Seth: Right.Andrey: No, I'm not saying that they're right, but, you know, I think—Seth: They're descriptive.Andrey: They're quite descriptive. Unless I made a scraping error or something like that, I think they are what they are, but the interpretation is obviously up for debate.Seth: Mm-hmm. Do you want to take a shot at it? Do you want to give me a percentage chance that in two years—I don't know how to say this—let's say the AI industry will be more or less competitive than the average tech sub-industry? Is that a fair comparison?Andrey: I don't know what an average tech sub-industry is.Seth: I know, or choose one: search. Let's just do search. That's really unequal. Alright. Alright. So yeah, that's the question.Andrey: It's going to be more competitive than search. I have no doubt.Seth: Okay. All right. Let's check that in a couple of years.Andrey: And also more competitive than phone operating systems.Seth: Yeah, we got two big boys there. That's fair. Okay.Andrey: Is it going to be more concentrated two years from now than today? I think that's an interesting question.Seth: You want to take a—is that 50/50 for you? For me, I'd put 90—ninety's too strong—85% that it's more concentrated in the future than now.Andrey: So it depends on whether we're measuring by revenue or by tokens.Seth: Let's do tokens at the company level. Oh, I guess we should do revenue, right? Revenue's the more economic measure. We could do either one.Andrey: The reason I was asking is, like, I still imagine there's going to be a ton of use cases for small, cheap models, and—Seth: Yeah.Andrey: A very competitive market, right?
Like, in the sense that people are going to be able to, in principle, roll out a very good small model.It's the big models that we're really worried about, right?Seth: Right, right. So yeah, it's like the value-weighted one is where you'd be really worried about concentration, given that there might be a lot of small toy ones that people f**k around with. But I think—Andrey: I'm not even talking about f*****g around. There are so many—Seth: Yeah.Andrey: Like, you could have a model call for, you know, every email you're writing in Gmail—Seth: Mm-hmm.Andrey: For every line of code that you're going through, why not call a cheap model just as a first pass? That might even be the model used to determine whether you want a, you know, more fancy model or something like that.Seth: Right, right. And you can imagine a universe in which those super low-level intelligence calls aren't even captured in data, because I might be running that locally on my own laptop, right? So yeah, maybe there's some sort of size cutoff above which this becomes interesting and tractable.Andrey: Yeah. I don't have strong priors on this, I have to say. I could see arguments either way. Maybe 60/40 towards becoming more concentrated in terms of revenue.Seth: All right. Well, I'm going to try to get Andrey's answer up in the next half of this two-part episode on Concentration and Competition in the AI Industry: Evidence from OpenRouter. This time it's personal.Andrey: All right.Seth: All right. Like, share, and subscribe.Andrey: Yeah. If you have better data, we're very—Seth: Give it to us, please. Yo, we'll be your friend. We'll co-author you.Andrey: Yeah. You'll get such great exposure for your company on this podcast.Seth: Mm-hmm. Right? We will. And we'll also use your AI to write copy if you have an AI model yourself. This is a public episode.
If you would like to discuss this with other subscribers or get access to bonus episodes, visit empiricrafting.substack.com
Jul 28, 2025 • 1h 12min

What can we learn from AI exposure measures?

In a Justified Posteriors first, hosts Seth Benzell and Andrey Fradkin sit down with economist Daniel Rock, assistant professor at Wharton and AI2050 Schmidt Science Fellow, to unpack his groundbreaking research on generative AI, productivity, exposure scores, and the future of work. Through a wide-ranging and insightful conversation, the trio examines how exposure to AI reshapes job tasks and why the difference between exposure and automation matters deeply. Links to the referenced papers, as well as a lightly edited transcript of our conversation with timestamps, are below.

Timestamps:
[00:08] – Meet Daniel Rock
[02:04] – Why AI? The MIT Catalyst Moment
[04:27] – Breaking Down “GPTs are GPTs”
[09:37] – How Exposed Are Our Jobs?
[14:49] – What This Research Changes
[16:41] – What Exposure Scores Can and Can’t Tell Us
[20:10] – How LLMs Are Already Being Used
[27:31] – Scissors, Wage Gaps & Task Polarization
[38:22] – Specialization, Modularity & the New Tech Workplace
[43:43] – The Productivity J-Curve
[53:11] – Policy, Risk & Regulation
[1:09:54] – Final Thoughts + Call to Action

Show Notes/Media Mentioned:
* “GPTs are GPTs” – Rock et al.
* https://arxiv.org/abs/2303.10130
* “The Future of Employment: How Susceptible Are Jobs to Computerisation?” – Frey and Osborne (2013)
* https://www.oxfordmartin.ox.ac.uk/publications/the-future-of-employment
* “AI exposure predicts unemployment risk: A new approach to technology-driven job loss” – Morgan Frank et al.
* https://academic.oup.com/pnasnexus/article/4/4/pgaf107/8104152
* “The Simple Macroeconomics of AI” – Daron Acemoglu
* https://economics.mit.edu/sites/default/files/2024-04/The%20Simple%20Macroeconomics%20of%20AI.pdf
* “The Dynamo and the Computer” – Paul A. David
* https://www.almendron.com/tribuna/wp-content/uploads/2018/03/the-dynamo-and-the-computer-an-historical-perspective-on-the-modern-productivity-paradox.pdf
* “The Productivity J-Curve” – Erik Brynjolfsson, Daniel Rock, and Chad Syverson
* https://www.nber.org/system/files/working_papers/w25148/w25148.pdf
* “Generative AI for Economic Research: Use Cases and Implications for Economists” – Anton Korinek
* https://www.newyorkfed.org/medialibrary/media/research/conference/2023/FinTech/400pm_Korinek_Paper_LLMs_final.pdf
* Kremer’s O-ring theory
* https://fadep.org/wp-content/uploads/2024/03/D-63_THE_O-RING_THEORY.pdf
* 12 Monkeys (film) – directed by Terry Gilliam
* “Generative AI for Economic Research” – Anton Korinek
* https://www.aeaweb.org/content/file?id=21904

Transcript:

Seth: Welcome to the Justified Posteriors Podcast, the podcast that updates its beliefs about the economics of AI and technology. I'm Seth Benzell, exposed to and exposing myself to the AI since 2015, coming to you from Chapman University in sunny Southern California.

Andrey: I'm Andrey Fradkin, riding the J-curve of productivity into infinity, coming to you from Cambridge, Massachusetts. Today, we're delighted to have a friend of the show, Daniel Rock, as our inaugural interview guest.

Daniel: Hey, guys.

Andrey: Daniel is an assistant professor of operations, information, and decisions at the Wharton School, University of Pennsylvania, and is also an AI2050 Schmidt Science Fellow. He is considered one of the bright young minds in the AI world, and it's a real pleasure to get to talk to him about his work and spicy takes, if you will.

Daniel: Well, it's a pleasure to get to be here. I'm a big fan of what you guys are doing. If I had my own intro, I'd say I've been enthusiastic about getting machines to do linear algebra for about a decade.

Andrey: All right, let's get started with some questions. I think before—

Seth: Firstly, how do you pronounce the acronym?
O-I-D? (Note: OID is the operations, information, and decisions group at Wharton.)

Daniel: This is a big debate between the students and the faculty. We always say O-I-D, and the students say “oid.”

Seth: So our very own OID boy. All right, you can ask the serious question.

Andrey: Before we get into any of the specific papers, I think one of the things that distinguishes Daniel from many other academics in our circle is that he took AI very seriously as a subject of inquiry for the social sciences very early, before almost anyone else. So, what led you to that? Why were you so ahead of everyone else?

Daniel: I'm not sure. Well, it's all relative, I suppose, but there's the very-far-back answer, which we can talk about later when we get to labor and AI. And then there is the sort of core catalyst day. I remember it: back at the MIT IDE, where we've all spent time and gotten to know each other, in 2013—

Seth: What is the MIT IDE?

Daniel: The MIT Initiative on the Digital Economy, Erik Brynjolfsson's research group. I was one of Erik's PhD students. My first year, we had a seminar speaker from the Computer Science and Artificial Intelligence Lab, CSAIL. John Leonard was talking about self-driving cars, and he came out there and said, “Look, Google's cheating. They're putting sensors in the road. We're building the real deal: cars that can drive themselves in all sorts of different circumstances. And let me be real with all of you. This is not going to be happening anytime soon. It will be decades.”

And there were other people who were knowledgeable about the subject saying, “No, it's coming in like 5 to 10 years.”

And at that point I thought to myself, “Well, if all these really brilliant people can disagree about what's going to happen, surely there's something cool here to try to understand.”

As you're going through econometrics classes—I wouldn't say econometrics is the same thing as AI.
We could debate that, but there's enough of an overlap that I could kind of get my head around the optimization routines and things going on in the backend of the AI models, and I thought, “Well, this is a cool place to learn a lot and, at the same time, maybe say something that other people haven't dug into yet.”

Andrey: Yeah. Very cool. So, with that, maybe you can tell us a little bit about your paper “GPTs are GPTs,” which has had an enormous amount of attention over the years and I think has been quite influential.

Daniel: Yeah, we've been lucky in that sense.

Seth: In two years.

Andrey: That's not—I mean—some version of it was out earlier… No… Or is it? Has it only really been two years?

Daniel: It has been the longest two years, Andrey. If you and I weren't already sort of bald, it might've been a time period for us to go bald. Yeah, we put it out in March of 2023. I had a little bit of early access to GPT-4. My co-authors can attest to the fact that I rather annoyingly tried to get GPT-4 to delete itself for the first week or two that I had it, rather than doing the research we needed to. But yeah, it's only been about two and a half.

Okay, so the paper, as I describe it, at least recently, has kind of got a Dickensian quality to it. There is a pessimistic component, there's an optimistic component, and there's a realistic component to it.

So I'll start with the pessimistic—or, why don't I just start with what we do here first? We go through O*NET's list of tasks. There are 20,000 tasks in O*NET, and for each one of those tasks, we ask a set of humans working with OpenAI, who understand what large language models in general are capable of doing: could you cut the time to do this task in half with a large language model, with no drop in quality? And there are three answers. One answer is “of course not”; that's like flipping a burger or something.
Maybe we get large language models imbued into robotics technologies at some point in the future, but it's not quite there yet.

Another answer is “of course you can.” This would be like writing an email or processing billing details or an invoice.

And then there's the middle one, which we call E2. So, E0 is no, E1 is yes, and E2 is yes, you could, but we're going to need to build some additional software and systems around it.

So there's a gain to be had there, but it's not like LLMs are the only component of the system. And the reason we pick other software is because there's a pretty deep literature on how software and information technologies generally require a lot of co-invention, a lot of additional processes and intangible capital. That makes it difficult to deploy those technologies fruitfully.

And we figured, okay, by comparing that E1 category—the “yes, you can, with an LLM out of the box”—to the E2 category, and seeing how much additional systems and innovation get us, we could say something about whether generative pre-trained transformers, GPTs, are general-purpose technologies: they're pervasive, they improve over time, and they necessitate that kind of complementary innovation. They change the direction of innovation.

If we can say yes to those three things, then we get to the pessimistic version of the story: you just can't know what the long-term equilibrium is going to be across different markets as a result of these tools.

So the prognostications that “AI is coming to annihilate all the jobs,” that the Machine God is imminent—or at least the Economic Machine God is imminent—I think those are a bit premature if you look and say this is a general-purpose technology, because historically, general-purpose technologies have been hard to predict at the outset.

The optimistic side of things is that that impact potential is pervasive. There's a lot of benefit to be had in changing how people work.
We use this exposure measure—I'm sure we'll get into this—but exposure is not automation. Exposure is potential for change, and if there's potential for fruitful change, we get more value in lots of different places in the economy.

That's a good story we found—and if the reviewer is listening to this, thank you very much. One of our reviewers suggested looking at science and innovation tasks and research and development tasks and seeing how those compare to other areas. We found high levels of exposure in those areas, which means there's potential to turbocharge growth, at least temporarily—hopefully longer term—in the economy.

That's the optimistic component. On the realistic component: we compare the “yes, you can do it faster and better with an LLM” set to the “yes, you can, but you need more building” set—the set of tasks that get exposed if you build additional systems, if you were to snap your fingers and say, “Hey, we've got everything we need.”

That's much, much bigger than the stuff that's just exposed to LLMs on its own. So the realistic story is that we have a lot of work to do as a society and in the global economy to bring about the gains of these tools. And it'll probably take a few decades for it all to play out. As much as we think that the changes have been very quick, it has been a fast two years—or slow, depending on who you ask.

Seth: This has been great. Andrey and I are both bursting with questions. I'll let Andrey go first.

Andrey: I just want a quantification. What percentage of tasks are exposed according to the first definition? What percentage according to the second definition, approximately?

Daniel: Yeah, if I recall correctly, about 14% or 15% of tasks, depending on whether you're looking at the human ratings or the GPT-4 ones. GPT-4 and humans tend to agree, by the way.
There's some noise there, but if you look at the GPT-4 ones, it's about 14% of tasks for E1, the level where it's just LLMs that can help. Now, if you snapped your fingers again and combined E2 and E1, that's about 46% of tasks. I might have my numbers slightly off there, but that's roughly what they were.

Andrey: And did you calculate what share of occupations have 100% of their tasks exposed?

Daniel: There were very few, if any, occupations that were a hundred percent exposed. I think data scientist was up there, and it depends on the measure—we actually have three different combinations of these scores. The most conservative says it's just E1, and that's it, and the least conservative is E1 and E2.

We score each task that has either one of those labels as one and E0 as zero. And then there's this kind of intermediate one that I like, but my co-authors don't like as much, where E1 gets a one and E2 gets a 0.5. So it depends on what you look at. Mathematicians were highly exposed. My co-author, Pamela, has gotten some angry emails from mathematicians saying, “No, that can't be.”

I will say I use it for building theory now. I use the language models for building theoretical models, and they do a pretty good job. They make some pretty terrible mistakes occasionally, so you do have to check their work, but to go from a verbal sketch of what you're trying to prove to some math that roughly shows what the setup should be—it makes it easier to be a reviewer instead of a doer, as they say.

Seth: Sure. All right. Okay, a couple of questions from me. The first question is: when we are doing these E1 ratings, are we talking literally about GPT-4, or are we talking generally about LLMs of approximately that quality? Or are we projecting forward to near-future LLMs?

Daniel: Yeah, it was more the latter. We had a sense of where LLM tools were going to go.
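(Editor's aside: the three score combinations Daniel describes — conservative E1-only, the intermediate weighting where E2 counts half, and the broadest E1-plus-E2 — amount to a simple weighted average over a occupation's tasks. A minimal sketch; the task labels below are made-up illustrations, not actual O*NET ratings.)

```python
# Sketch of the three exposure aggregations described above.
# Task labels here are hypothetical, not real O*NET data.
from collections import defaultdict

tasks = [
    ("data scientist", "E1"), ("data scientist", "E2"), ("data scientist", "E1"),
    ("line cook", "E0"), ("line cook", "E0"), ("line cook", "E2"),
]

WEIGHTS = {
    "conservative": {"E0": 0.0, "E1": 1.0, "E2": 0.0},   # E1 only
    "intermediate": {"E0": 0.0, "E1": 1.0, "E2": 0.5},   # E2 counts half
    "broad":        {"E0": 0.0, "E1": 1.0, "E2": 1.0},   # E1 and E2 alike
}

def occupation_exposure(task_labels, weights):
    """Average the per-task weights within each occupation."""
    totals, counts = defaultdict(float), defaultdict(int)
    for occ, label in task_labels:
        totals[occ] += weights[label]
        counts[occ] += 1
    return {occ: totals[occ] / counts[occ] for occ in totals}

for name, w in WEIGHTS.items():
    print(name, occupation_exposure(tasks, w))
```

Under the broad weighting, every E1 or E2 task counts fully (the hypothetical data-scientist rows score 1.0); under the conservative weighting, only E1 rows count.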
I think even looking at the set of tools we have now versus GPT-4, they're very similar. There are expanded capabilities—it's been a deepening of their capabilities—but we were going off the somewhat foreseeable future, especially for my colleagues and co-authors who had been in the weeds with this.

But that does bring up an important weakness of this approach, which is that as soon as you see something really qualitatively different, or new capabilities showing up, you have to update the rubrics and the method; you have to rerun stuff. I think arguably the reasoning-model paradigm is getting to the point where you probably have to rerun things.

Andrey: Are you considering rerunning things? Is this an ongoing endeavor, or—

Daniel: I'm not sure I'm going to return to writing an academic paper. I feel like I've gone to the well one too many times already with this. But if someone else wants to do it, I'm happy to help them out. Erik, Tom Mitchell, and I did something in roughly 2016 looking at supervised machine learning and reached some slightly different conclusions, but now that I've been through this twice, I'm not sure that I want to do it just yet.

Andrey: So this is a question that I wanted to raise, 'cause certainly you guys are not the first to do this sort of exercise—you've done it before, Frey and Osborne have done it. I remember when I first saw these exercises back in 2017–2018, I was like, “This is an accounting exercise. Is this actually useful?” How do you determine in what sense this type of work—

Seth: To throw another critique of this whole research agenda out there: we talk about Frey and Osborne coming out with one of these a decade ago. You talk about your own SML experiences. I know Morgan Frank has a new paper out at PNAS Nexus that compares about 10 different people's exposure measures.

Daniel: Mm-hmm. Which all do different things.
Yeah.

Seth: And they're all completely different. How should I think about the diversity of these indices?

Daniel: Well, there are different principal components underlying a lot of these different measures. Certainly SML and the GPT scores are very different. And Frey and Osborne—the way they constructed that, effectively, was—

Seth: Basically.

Daniel: Educated-guess vibes, with CS professors for a training set. I think their goal was to measure which jobs, as a whole, could be computerized. Actually, let me answer Andrey's question a little more directly: when you look at these, what are they useful for? Let me start by saying what they're not useful for, because some folks have put words in the authors' mouths on this.

Seth: Including Nobel laureates.

Daniel: No Nobel laureates that I know of, but there are some places and some folks who have said things like, “If you're exposed, you're hosed.” And that is not what the authors intended, I will say—

Seth: With the word “hosed,” you set them up for that.

Daniel: It's possible that that is the case, but I have not seen any data to conclude that it is.

So let me state clearly for the record the things you do not want to predict with exposure scores—things that exposure scores are not designed to do: economically meaningful outcomes like wages or employment. I'm not trying to say exposure will create unemployment. I'm not saying it'll cause wage loss. I view it as a risk measure—I'm a recovering finance guy—and risk can be good or bad. We don't really know. It just means there's an opportunity, technically speaking, to change the types of tasks that people are doing and how they do them. So “exposed” and “hosed” are possibly orthogonal ideas.

Nevertheless, I think it's worth tracking. Now, what else is it not useful for, besides failing to predict labor market equilibrium?
It's not useful for—

Seth: Breakfast?

Daniel: Can what make you breakfast?

Seth: You're—

Daniel: The scores?

Seth: Do you want to list all the things it's not useful for? Excuse me.

Daniel: Exhaustively, yes, we should. You can't eat the scores either. I wouldn't say they're especially useful for saying for sure that something is going to happen, right? A technical thing that could help someone do a role does not necessarily mean it's appropriate socially, legally, or politically.

There's a whole bunch of places where using LLMs might be inappropriate. One famous example is Geoff Hinton, who predicted that radiology demand would drop. And I think radiology is an example of where a multimodal model would be helpful.

It could probably pick up a broken bone, but radiologists, as data-enabled doctors, have a lot of other components to their work, and they interpret difficult cases. If you're going to tell someone about a condition they have, it's challenging. That's not the sort of thing where you want an LLM just spitting out, “You have this wrong with you.” That would be terrible bedside manner.

So even if it's theoretically possible, that doesn't necessarily mean it's going to happen. Turning now to where they are useful: one is for testing this hypothesis—are we limited in what we can say?—which is my favorite application of them. In the sense that we see pervasiveness, and complementarity-necessitating exposure, throughout the economy.

So we should dial back our confidence in predictions of what will happen. I think they were useful for answering a very specific hypothesis that we had. But then, underneath that—

Seth: So you were able to—the hypothesis is that GPTs are GPTs? They're going to affect everything.

Daniel: Yeah. So the only one of the three conditions that we punt on is whether they improve over time, because that one was obvious.
We do have some evidence, but we are mostly getting beyond that. I think about the first-order changes and where they're most likely to happen. I didn't know that this would be the case when we wrote the paper, but I think the measures we built tended to predict where people would start adopting large language models, and there have been a few papers validating that empirically.

Seth: That makes perfect sense, right? So it's maybe not a good model of what's going to happen to your job, but it's a good model of where the OpenAI salesman should show up and knock on the door?

Daniel: Yeah, potentially. So—you guys discussed this paper earlier on the podcast—the Anthropic Economic Index, the areas where they showed people were using Claude, lined up reasonably well with the areas where we thought GPTs and LLMs would show up.

Andrey: Except managerial tasks.

Daniel: Except managerial tasks. Those are happening; it's just not clear. I'm not sure what's going on in that dataset. In my work as a startup co-founder, I use all sorts of large language models for managerial tasks all the time. So we'll see what happens there.

Andrey: I used a large language model for managerial tasks earlier today, so I agree with you.

Daniel: Mm-hmm.

Seth: Right. It seems like these AIs are being used. If you look at the Anthropic index, it really does focus on people using it in these kinds of hobby contexts, which was one of our big takeaways from that episode. People don't manage as a hobby, so if a lot of Claude usage is hobby usage, you would expect managerial tasks to be underrepresented.

Daniel: You're saying that, with the exception of the technical folks—software engineers and data scientists—it's just like ripping with this stuff, right? Because that's not necessarily a hobby.

Andrey: Ripping with it in Cursor, I mean. Now we're getting—

Daniel: Sure. Yeah. API use, yeah.
Yeah.

Seth: Right, that's the giant use case right now.

Daniel: Yeah, and that one's a great one. It's kind of ironic given our focus on software, but to some extent you can keep doing what you were doing, just way better, in software development with these tools. You don't actually have to transform the structure of software engineering too much to get a very quick benefit. But I think there is a new mode of working and developing with AI-driven tools that has an analogy in that famous Dynamo and the Computer paper about electric power conversion. Think of the steam engine, right? For the listeners who aren't aware: this giant thing sits in the middle of the factory, all these pulleys, levers, and belts come off of that thing, and it powers the whole factory. And then over the next few decades, they realized: let's modularize that power. When they converted to electric power, the first thing they did with it was the same thing, but a little bit better.

Take a giant dynamo, stick it in the middle of the room, and we're off and running. But eventually they were like, “Well, what if we make that really small?” And then we have lots of little machines, each powered by its own little engine. Somewhat similarly—and I'm seeing this with some large companies—you start with a really monolithic, large technology function in the middle of the company that builds technology for lots of subgroups, and then something kind of magical happens with these AI models.

You can sit down with a subject matter expert, a product person, or a senior developer to make sure that these people don't hurt themselves as they're building something. And you create these modular, Jeff-Bezos-two-pizza-team versions of work where people have input into a process, rather than throwing that process over the wall to the dev team, waiting three weeks, and seeing them come back with something that doesn't fit.
You just develop together and watch the models go, and it really ups your cadence. But it opens up all sorts of best-practice shortfalls that can happen—like, have you hardened for security properly? The devs know what questions to ask there. So going from a specification to a finished product can be way, way quicker if you redesign how the work goes. It's similar to that steam-power-to-electric thing.

Andrey: I guess maybe a natural place to go here: there's this distinction between micro-level, task-level exposure and its macro-level implications. How should we be thinking about that? Certainly people have used your micro-level exposure metrics in macroeconomic models, and so…

Seth: Tell us what that experience was like.

Daniel: People use them in different ways. There are papers that you guys have discussed on the podcast before. If you look at the Simple Macroeconomics of AI paper by Daron Acemoglu, he uses our sort of experimental automation score, which is not “could you use an LLM to improve your task output?” Here it's: could you use an LLM to just straight-up do this task without a person involved? That's a really small proportion of tasks in the economy—it's a five-point scale, and these are the fourth- and fifth-level, most intensive automation-risk scores. I don't love those scores, to be honest, but they cover a pretty narrow area.

So it's not surprising that we find—or that we read in his paper, I should say—a seven-basis-point-a-year outcome. The OECD has a version where they use the exposure scores, and they get to something like 70 basis points of productivity growth per year. So that's an order-of-magnitude difference right there.

But, you know, I think these scores are a public good in some sense, and people bring their models and their priors too; they're trying to discipline what they believe will happen with the economy with these scores. And they're noisy.
I wish there were something more useful for these people to deploy in their models. But to the extent that we can be helpful, we're really happy that this thing is out there. I just caution folks against viewing exposure as automation, which is a common failure mode, or even leaning on automation versus augmentation as the choice we have ahead of us at the macro level.

And Andrey, to your point about macro-level conclusions: yes, labor markets are primarily how we share the gains from economic activity across society. But when you get down to a micro-level task and you ask a worker, or a manager-and-worker combo, “Are you upset if we automate this task or augment this task?”—either one, it's anything goes. It's about the labor market and the unit of work that's being purchased in the labor market. I could automate something I hate doing and be thrilled with it, 'cause I could go spend my time doing other stuff. I could automate my whole job and make myself really sad—well, maybe really sad, but I'd have to find another job.

I could augment someone and make them thrilled and pay them more, or I could augment them such that they do the work of 10 different people, and then nine people get fired. So I think this augmentation-versus-automation micro-question really does boil down to just exposure and changing work.

And we can't say much more than that. Even though automation and augmentation are an elegant mathematical framing in these models, I don't think it's something we can lean on from a policy perspective at the micro level. It's just: you're going to change what people do.

Seth: Yeah, I'm going to push back on the idea that it's an elegant micro idea, right? Because for exactly the reasons you—

Daniel: Macro idea, I should say. It's an elegant macro idea. I don't think it's an elegant micro idea. Yeah.

Seth: Right. But even then, let me put it this way.
To me, when people want to distinguish between augmenting and automating technologies, they want to talk about them as somehow separate from the rest of the economy. But as you've been implying, the real reason you can't say a certain technology is automating or augmenting is because that production is embedded in an entire economy.

And that's going to tell you whether, as productivity goes up, you want more or less of that thing. The way I would put it is with the metaphor of Marshall's scissors, right? There's a story told of the famous economist Alfred Marshall of the University of Cambridge, who was the advisor of John Maynard Keynes. Somebody asked him one day whether it was supply or demand that was more important in setting the price of a certain good.

Seth: Marshall said it's like asking which blade of the scissors is doing the cutting, right?

Daniel: Mm-hmm.

Seth: You can't talk about one without talking about the other. If you want to know what the outcome is—and this is what I see—your paper is one blade of the scissors, right? It's the blade that comes in telling you this job can be changed, but you need to know everything else about the rest of the economy to understand how the job will be changed.

Daniel: That's right.

Seth: And we've talked about examples. There are countless famous examples, from ATMs to—I like the example of the cotton gin—of jobs getting automated and then demand for that form of labor going up.

Daniel: Right. Yeah. Couldn't agree more.

Seth: Now Dan, I do have a micro take, and I'm interested in whether you buy this prediction about what exposure scores will do to an occupation. This is a somewhat out-of-equilibrium take.
This is a partial-equilibrium, dynamic take, and maybe it'll be smoothed out in the long run, but in the short run, my prediction is that in occupations that are more exposed, there will be more wage polarization at middle-tier firms for that job and less wage polarization at extremely good or extremely bad firms that use that job. All right, so I've got a framework here. Are you ready? Can you see where I'm going with this, or should I give the reason why?

Daniel: I have some hypotheses about how that could work, but—yeah—don't leave me hanging here.

Seth: Right. Okay. Should I start with the general equilibrium first, or the micro level first? Let's work from the bottom. So imagine you've got a job that uses two tasks, right? Task one and task two. They can be gross complements in production, but that's actually not important.

They can be gross complements, as long as they're not perfect substitutes, right? They can be gross substitutes; that's also fine. I'm a doctor: I need to spend so much time having bedside manner, so much time reading the x-ray. I know that's not a perfect example, right? Okay, imagine a technology comes out that allows you to automate one of the two tasks. Well then, obviously, people who are worse than the technology at the automatable task automate it, and the people who are better than the technology don't automate. I know this is already going to get a little bit off of the way you maybe think about how things are, but grant me that for a second.

Okay, what happens? People who are bad at task one but good at task two see a big improvement, whereas people who are good at task one and bad at task two see no improvement. Right? And if you're equally good at both, it kind of depends on how good the technology is. Okay. All right, so that's the first step. So where would you get wage polarization from automation?
You would tend to get it in jobs where people's skills are anti-correlated. Right? Because, as we just said, if you're good at task one and bad at task two, and we automate task one, it doesn't help you. But if you're bad at one and good at two, and we automate one, it helps you a lot. So you would expect to see wage polarization—wage distribution expansion—for jobs where people's skill levels are anti-correlated. Okay? So now you might say, “Sure, Professor Benzell, that sounds cool, but why would we ever expect, in certain settings, skill levels to be anti-correlated?”

Okay, and now I'm going to bring in the O-ring, right? So Kremer has a general equilibrium theory of the economy: the productivity of a firm, or whatever, is bounded by the worst agent in the system. This comes from the space shuttle Challenger explosion: the space shuttle explodes, and we think it's because of one faulty part, the faulty O-ring. Okay. What's the general equilibrium implication of this model?

It's basically that you should get people of similar skill levels all concentrated at the same type of firm. So there should be super-good firms that have all the high-skilled people, mediocre firms that have all the mediocre people, and bad firms that have all the bad people. And how do you get a mediocre person? Most mediocre people are mediocre 'cause they're good at one thing and bad at another thing. So now we come back to my hypothesis, which is that exposure should lead to polarization at those middle-tier firms. And in fact, I'd love to bring this to some experimental evidence—I'm working with Kyle Myers, a great economist friend of the show at HBS, on this. Can we predict the experimental outcomes if you introduce AI to a workplace and it's exposed to some of the tasks? Do you get that polarization in productivity and wages, and when do you just seem to boost everyone by the same amount?

Daniel: Okay. So some quick reactions there.
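(Editor's aside: Seth's two-task story can be put in toy numbers. A minimal sketch with an assumed tool quality and made-up skill levels — the tool replaces task-one skill only when it beats the worker's own, so the worker who is bad at task one and good at task two gains the most, while the reverse type gains nothing.)

```python
# Toy version of the two-task automation story. All numbers are
# illustrative assumptions, not estimates from the episode.
A = 0.6  # quality of the automation technology on task 1

def wage(s1, s2, tool=A):
    # Output = sum of the two task productivities; task 1 is performed
    # by whichever is better, the worker or the tool.
    return max(s1, tool) + s2

# Anti-correlated skills: good at one task, bad at the other,
# plus one balanced worker.
workers = [(0.9, 0.1), (0.1, 0.9), (0.5, 0.5)]
before = [s1 + s2 for s1, s2 in workers]   # every worker starts at 1.0
after = [wage(s1, s2) for s1, s2 in workers]
print(before)
print(after)   # dispersion appears: the (0.1, 0.9) worker gains the most
```

With identical total skill, wages were equal before; after automation, the bad-at-task-one worker jumps ahead, which is the polarization mechanism Seth describes.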
First, on the immediate hop from automation to exposure. Folks, I'm going to ask you a question that, funnily enough, I was asked by Joe Stiglitz as a grad student; I was lucky enough to get to sit next to him at a lunch. He was like, why do jobs exist? Why are certain tasks bundled together? And honestly, I don't have a great answer other than to gesture vaguely at coordination costs. But within the task shifting you're discussing, you've got this mediocrity, this middling productivity, that comes from the fact that some of the things they're good at and some they're not, and it's still really hard to blow apart the job and then reconstitute it with specialization. So I think where it's coming from is: people are overall high productivity, then there's a low productivity component, and then there's this middle thing where you've got some CES aggregator that says, "This person is going to be slightly worse than the average of their components." Exposure might lift them in some cases and might not affect them in others. So I kind of buy that piece. To move it to the equilibrium framing, though, I think what'll probably happen in a lot of cases is a mini Baumol cost disease across everything that we do. The areas where we're least productive are going to be the ones that absorb most of our time. And in the beginning, there'll be a lot of confusion about that, because LLMs will make it unclear what the least productive thing is: you might have been really bad at something that you aren't bad at anymore. Right now, I know I'm really bad at writing spec docs for software. Well, now I have a process with Claude where I can write much better spec docs, and I'm not as terrible at it. But once you get out of this disequilibrium condition, you might end up in a situation that looks a lot like the one we have right now as things settle. But then the job boundaries have changed.
And there are new names for things. I'll give you a small example. There's a new hot job in Silicon Valley called the Forward Deployed Engineer, where we've got some of these—

Seth: Hazard pay?

Daniel: This is a role at Helix; we've got a forward-deployed engineer (looking for more Win Ma shout-outs, she just started).

Seth: Are they waiting for them to call in air support? What's going on?

Daniel: You send them to the customer's site, and they work with customers. You need really strong interpersonal skills, but you also need engineering skills. That's a new configuration of work.

Seth: Wasn't that called being a consultant?

Daniel: No, no.

Andrey: No, no. If they were a consultant, then you wouldn't be able to pay them as a forward-deployed engineer. Seth, what do you mean? This has nothing to do with what McKinsey would ever do.

Daniel: I'm not sure which end of that ends up being cheaper for the firm. But the critical thing here is that it's a different mixture of work. Those are some initial reactions.

Andrey: I have reactions too. On one level, I'm always a little skeptical of intricate theories like this, when—

Seth: It just has two parts! You have to give me two parts.

Andrey: No, no, I mean more that the first-order question about income inequality is already hard to answer, and then you're trying to answer this even more sub-sub question. And where I'll push back is on what the highest firms are, right? Production could be an O-ring within a person, or production could be an O-ring across people, right?

Seth: It turns out the prediction does not rely on whether the O-ring is within people. As long as the tasks aren't perfect substitutes, what I just described goes through.

Andrey: But here's what I would think: if we have specialists in 10 different tasks at a high-end firm, and one of those tasks gets automated, surely one of those people's jobs will get fully automated. And I know Daniel doesn't like "automation" already, but that person's—

Daniel: I do believe it exists.

Andrey: That person's wage will go down, right? Creating inequality.

Seth: Yeah. But I have a theory of one of your tasks being automated, not a theory of all of your tasks being automated.

Andrey: That's where my point is. It's an interesting question. High-end firms have a lot of specialization, perhaps more specialization than lower-end firms. And if a person is so specialized that their specialty is exactly what gets hit, then we might expect a bigger labor market effect for them.

Seth: You might imagine that if tasks were organized differently at large firms, this theory would run into issues. Of course, there are omitted variable problems up the wazoo, but I'm intrigued by the idea of looking at whether people's skills in the subtasks that make up their task bundle, which is their job, are positively or negatively correlated. And I do think that will tell you a lot about what happens when you automate part of the job. Now, bringing that to the data is complicated, but that's my insight.

Andrey: One more thing: how much do we expect new firm entry to be the key margin in all of this? We know that organizations are very friction-filled, and adoption decisions even—

Seth: New organizations, new jobs, right? If you slice out half of the tasks from a job, in the long run it is probably a new job.

Andrey: Yeah, I think both of those. So then, in terms of thinking about existing firms, I'm a little skeptical in general.
Or at least, and maybe I'll be wrong, I expect a lot more entry and growth from new companies that are taking advantage of this new production process from the ground up. That's kind of the lesson of the supply-side disruption theory.

Daniel: Yeah, I'd agree with that. I think one of the reasons it takes such a long time for the benefits of sufficiently transformative technologies to show up is that it usually takes a while for the firms that are deploying them well to become economically meaningful. And then they sort of set a standard.

Seth: Right, and that's another margin on your margin: the firms that figure out how to do it grow faster.

Daniel: And, agreeing with Andrey, a lot of them are new entrants. It's not like an incumbent will always figure out the answer, though they do a lot of the time. Here's where I would ask you a question, Seth, on the idea that bundled tasks sit on some spectrum from super negatively correlated to perfectly correlated individual task productivities. Why do you think those tasks are bundled together? Because there's some coordination cost benefit? And do you think there's probably some lower bound on how negatively correlated your productivity can be across these different tasks? Because if you really suck at half your job, you probably can't do that job. I think you probably need weak positive correlation everywhere.

Seth: Ooh, man. I think for the sorting to happen, let's take a thousand people who are all doctors. I agree that you want to think about the step before that, before we get the thousand doctors, but I'm saying: now that we have a thousand doctors good at task one, some of them are going to be better at task two. And then you're going to get negative correlation across those abilities in the mediocre firms. Now, you're right, there might be some censoring.
If you're too bad at one of the tasks, you don't become a doctor; but I'm saying conditional on your having become one—

Daniel: Oh, okay. I could see that. The thinking is like a Dr. House situation: everybody hates him, but he is really, really good at the diagnostic side of things. But if he weren't, no one would put up with that; he would've just been fired.

Seth: Right. He'd have a higher-paying job and be more productive if he were able to be nice for 10 minutes.

Daniel: He'd probably be an investment banker or something.

Andrey: There's a mirroring here with a general phenomenon in digitization, which is the ability for specialization, for more niche content, to do really well. So if you're only good at one task, and all the complementary tasks have been automated away, then you shouldn't be bound by your firm anymore. You should be able to essentially create your own small business, or join the most productive firm as the specialist in that specific area, because all your other characteristics don't really matter that much anymore. So Dr. House would be able to essentially run his own business, even though he is really bad at organizational things, because all that stuff comes out of the box.

Seth: I think that's why I talked about this theory as a short-term, partial equilibrium theory, 'cause in the long run you're reinventing businesses. But you said something really interesting, Dan, and maybe I'll start to transition us now: the idea that it's going to take time for people to figure out how to use these GPTs. The general purpose technologies, excuse me, not the chatbots or LLMs. What sort of macroeconomic implications does that have? I understand you've written a little bit on this topic.

Daniel: Yeah, right. Erik Brynjolfsson, Chad Syverson, and I call this the productivity J-curve.
I think the dynamic is: with pretty much any kind of investment, there's an initial outlay period where things are expensive, and then there's a harvesting period later. There's the famous Robert Solow quote: you can see computers everywhere except in the productivity statistics. People are already saying that about AI; I've seen a number of news articles claiming there's no ROI for it. I think the way you square the circle is this: at the beginning of a new technology, when everyone realizes, okay, we're going to take the plunge, you're actually going to invest in this, you spend a lot of time reconfiguring work, building new business processes, figuring out what new products to build, and collecting information. That's a whole bunch of really expensive stuff that's really hard to quantify, so it doesn't end up in GDP to the extent that it could, but it's building up a capital asset. So output is going to be understated, and in the meantime it's going to look like we're putting in more to get less out. Then later, that intangible asset is actually there but not measured, and now it's an input instead of an output. And when it starts to spit off money, everyone's going to say, "Oh, hey, look at how productive we're being," because it looks like you're getting more output for less input. Really, it's just that earlier investment paying off. So the tension between the growth rate of investment in this new type of capital and the growth rate of the capital stock that you're missing, that difference, depending on its share in the overall economy, can be meaningful. And we use the stock market to measure it, because investors aren't dumb: on average they price these assets, or companies wouldn't invest in them, under a roughly efficient-markets-hypothesis view of the world.
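The mismeasurement logic Daniel describes can be made concrete with a stylized calculation. The numbers here are invented for illustration, not estimates from the J-curve paper: while a firm diverts resources into an unmeasured intangible asset, measured productivity sags below its true level, then overshoots once the asset starts paying off.

```python
# Stylized J-curve sketch (all parameters are hypothetical).
periods = 10
build_until = 5        # assume intangible building happens in periods 0-4
intangible_spend = 20  # resources diverted to the unmeasured asset each period
payoff_rate = 0.3      # assumed output return per unit of intangible stock

stock = 0.0
measured = []
for t in range(periods):
    inputs = 100.0
    # The intangible stock genuinely raises output once it exists.
    base_output = 100.0 + payoff_rate * stock
    if t < build_until:
        # The spend is a real use of resources, but the asset it creates is
        # invisible: measured output excludes the investment, inputs do not.
        measured_output = base_output - intangible_spend
        stock += intangible_spend
    else:
        measured_output = base_output
    measured.append(measured_output / inputs)

# Measured productivity dips below 1.0 during the build, then jumps above it.
print([round(p, 2) for p in measured])
```

The printed series starts below 1.0 and ends around 1.3: the "bad news now, good news later" shape that gives the J-curve its name.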
But if you're pricing those assets, then you can back out roughly the magnitude of the adjustment you should be making to productivity growth. So it's kind of a fun spin on growth accounting, which I know isn't the reason everybody gets out of bed in the morning, to go account for where the growth is. But—

Seth: Don't underestimate our audience, Dan.

Andrey: Look, big political debates hinge on the measured rate of GDP growth. So it's important. How big of an effect did you find in that paper?

Daniel: Oh, I don't remember the exact numbers anymore. It's been a little while; I should look it up. But it's a lot. If I recall correctly, it might be something like 75 basis points a year for some period of time. The overall view is: look, we have good news and bad news. The good news is that the productivity growth rate is actually a bit higher than we had thought once you account for these hidden assets. The bad news is that the slowdown from 2005 is even bigger than we thought, because they were building intangible assets back then too.

Andrey: Well, how do you measure the intangible asset investment? I think this is kind of the key—

Seth: Yeah. What's bigger, the invisible teapot or the invisible elephant?

Andrey: Because right now we're getting a lot of intangible investment into learning new production processes with AI. Or is the answer just to look at how much the stock market has gone up? Is that the answer?

Daniel: Oh, that's basically it, Seth; you're not too far off. We do a hedonic regression. If we were to look at, say, R&D assets (because that one's mature, you don't really see too much from R&D on its own), we can see whether a dollar of R&D investment, capitalized, is actually worth a dollar and 10 cents in market value.
If so, we assume there is 10 cents of intangible correlate value there. Or, if you really want to be pedantic about it, it's 10 cents of intangible correlate combined with quasi-rents from the fact that you can integrate R&D investment for productive purposes better than your competitors can. And then I'm going to wave my hands and say: but that capability is an asset too, so it's an intangible asset.

Seth: Right. I remember us spending lots of time back in the day in the MIT IDE break room, having a cup of coffee, looking out over the Charles, locked in these intense conversations about just how you measure these intangible assets. They seem so essential to everything, yet they are literally latent vaporware. They're our generation's TFP, if you will.

Andrey: I don't know. The principle I obviously agree with, right? You have these investments that are not easily measurable, and they surely should be counted in some way. But it's not obvious to me. If the rate of intangible investment were constant over time, then it's a constant adjustment, and we don't really have to think very hard about how the world works. But measuring the intangibles is tricky, because I think about market cap, where not only are you already talking about rents, but to me competition is so important there, right? You don't gain market cap just because you're doing investment. You gain market cap because you have market power in the future.

Seth: Yeah, but now you have to think about it: why would you ever pay an adjustment cost in a perfectly competitive economy?
You'd never pay the adjustment cost, right?

Andrey: Well, I would say there are different degrees of market power. You can have your standard monopolistic competition model where everyone's investing to keep up, but then you have companies like your Googles, who clearly don't think that's the right model of the world. And I guess the other thing is, I'll say I'm always a little skeptical of firm value regressions. I think the endogeneity issues are fatal. But I don't know.

Daniel: Yeah, I disagree with you there—

Seth: You just died. You were just killed.

Daniel: I feel so devastated. No, I think where I disagree is, I think Tim Bresnahan put it this way: everything's an asset here, including the capacity to generate rents, so it's an interpretation question more than anything else. And you can bound things. When you run some of these regressions, you're not saying, "I think an additional unit of AI investment causes this market cap." The endogeneity is there; it's predictive. It's "here's a price on this thing," not a causal claim.

Seth: Here's a model: there's only room for one social media platform. So whoever got there first planted their flag on that land. They didn't make an intangible investment. They just planted their flag first.

Daniel: Right, and that's what I'm saying too. They planted the flag first, and now it's worth 10 bucks. But I'm not saying if you were to just go up—

Seth: 10 bucks... which seems marginal…

Daniel: Oh, yeah. You're talking about the marginal versus inframarginal difference.
And the way you deal with that, as in many structural models, is you assume it away and say that marginal equals average Q for some of these assets. But it's not like when you run these regressions you get coefficients of a thousand; you get coefficients of somewhere between 4 and 12. So, is it unsatisfying—

Seth: You get 4 and 12 of what?

Daniel: Oh, if I were to regress market value on measures of IT capital, the multiplier I get, and this has been weirdly stable for 20 years, is somewhere between 1 dollar of IT investment being correlated with 4 dollars of market value on the low end and 12 dollars on the high end. And it's that which bounds the debate. It's not saying this is infinitely valuable, that there's this enormous intangible asset that's the entire economy; and it's also not saying it's nothing. So I think imposing some assumptions, which you can absolutely question (and I think we all should, to try to get better models), and doing the best you can is a way to learn something, as opposed to just throwing our hands up. But yeah, I agree with you that the causal interpretation of these things is not correct.

Seth: Okay, then the useful question: are we in the bad part of the J-curve?

Daniel: Which part's good and which part's bad?

Seth: The good part is when you're going to get more growth down the line than it looks like you have now.

Daniel: We are in the hard-work, investment stage of the J-curve. I don't think we're anywhere close to the harvesting side yet, at least not for AI.

Seth: Okay. But you think GDP is on the underestimated side, which is what I mean by the good side.

Daniel: Yeah, I would say very modestly, GDP is underestimated right now.

Seth: Very modestly. 1%? 2%?

Daniel: I think even 1% is probably ambitious.
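The market-value regression Daniel describes can be sketched on synthetic data. Everything here (the multiplier, sample size, and noise) is invented for illustration and is not the actual estimate: regress firm market value on IT capital, and read a slope well above one dollar per dollar as IT investment traveling with correlated intangible assets.

```python
import random

random.seed(1)

n = 500
TRUE_MULTIPLIER = 8.0  # assumed: each $1 of IT travels with $7 of intangibles

# Synthetic firms: IT capital, other (tangible) assets, and noisy market value.
it_capital = [random.uniform(1, 100) for _ in range(n)]
other_assets = [random.uniform(0, 500) for _ in range(n)]
market_value = [
    TRUE_MULTIPLIER * it + 1.0 * a + random.gauss(0, 20)
    for it, a in zip(it_capital, other_assets)
]

# Properly one would run a multiple regression controlling for other assets;
# here other_assets is generated independently of it_capital, so a simple
# bivariate OLS slope recovers the multiplier.
mean_x = sum(it_capital) / n
mean_y = sum(market_value) / n
slope = sum(
    (x - mean_x) * (y - mean_y) for x, y in zip(it_capital, market_value)
) / sum((x - mean_x) ** 2 for x in it_capital)

print(round(slope, 1))  # lands near the assumed multiplier of 8
```

The point of the exercise is the interpretation step, not the arithmetic: a slope of 8 "bounds the debate" in exactly the sense Daniel describes, somewhere between nothing and the entire economy.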
But what's—

Seth: Order of magnitude, 1%.

Daniel: Yeah. Where it's tough is that the parts of AI investment happening right now, I think, are actually fairly well captured by GDP. We're seeing a huge amount of CapEx, data centers, GPUs, and those things are priced pretty well. But eventually people are going to have to figure out things like how you make someone responsible for hallucinations the models might make, or come up with good policies that produce good outcomes there. That's hard to do, and I don't think we're anywhere close to scratching the surface on it.

Andrey: I guess the intangible investments now are more about, say, how we go about teaching using ChatGPT, because that's not going to be measured in a change in labor inputs, and it's not going to materialize until we actually figure out how to teach people more effectively. It's not clear that was ever going to show up as investment; but if the university were a regular for-profit firm, that's—

Daniel: Yeah. So that stuff will take a while. I don't think even if we stopped—

Seth: Of all the people who actually do work in the economy—

Daniel: Right, and in particular the AI researchers: even if AI researchers stopped building new LLM tools and making these things better today, we would still have quite a while to go to actually integrate them and put them to their best use. So that's kind of a bummer.

Seth: Then let me ask it that way. If you don't want to give me a percentage rate of intangible investment either: how much is in front of us? Do we need to spend a hundred percent of GDP over the course of the next 20 years to take these advantages cumulatively? Do you have a sense of the order of magnitude?

Daniel: I don't know how deep the well goes. No.
But it might be quite a lot.

Seth: One thing related to this, which I was thinking about when we were talking about part one: you've got these two measures of jobs' AI exposure, one of which is "just the LLM" and one of which is the LLM plus software tools. Didn't you tell us that you can use LLMs to make software tools?

Daniel: Oh yeah, it's totally recursive. But the reason we pick out software tools is because using them also requires changing business practices and these organizational things.

Seth: So can we play that game, then? Can we look at the wedge between E1 and E2 as telling us something about the size of the adjustment costs needed, or the intangible assets needed?

Daniel: I don't think it gives you that, to be honest. Sorry, Seth. I know my tools are unsatisfying here. That's a good research question, though. Actually, I think the market value regressions that Andrey hates are more likely to get you a ballpark for that.

Seth: Do any policies or ideas come out of the J-curve? Should we be somehow subsidizing intangible investment? Do you think it's happening at a socially suboptimal rate? As with any innovation, you'd expect there to be positive externalities as people copy and learn from each other.

Daniel: I don't have any evidence to suggest there's an externality here that needs some sort of correction. Where I could see some policy considerations, and obviously I'm not in charge of any of these things, so take what I have to say with a grain of salt, as you would anything else I say: I think when it comes to monetary policy and judging how hot or cold the economy is, it may be helpful to know how much intangible asset creation is happening, because it's a compositional shift.
And you might think the economy is in a recession when it's actually doing quite well, at least in certain pockets. There's a distribution-of-gains question here that's pretty important: who creates the intangible capital, versus who benefits from it, versus who's just shut out of that part of the economy altogether. But on average you might want to know whether your real growth rate is actually two-and-a-half percent versus one-and-a-quarter percent or something.

Andrey: And I guess you would look at the stock market. So if we have this case where the stock market is going up but GDP is not going up as much, maybe you'd say, "That's okay on some margin."

Daniel: The stock market is an increasingly less useful tool, sadly, because there are fewer public firms, and there are other reasons those large firms would differ from the rest of the economy. It's just a quick thing to do; it's easy to get those market values and start to pull that info. But the ideal would be an actual sense of how these assets are priced. You could look at M&A prices for whole software firms. Sadly, you can't shave off a tiny piece of your digital culture, market it, and sell it to someone to get a little bit of a value indication. But much more complete data would give you a sense of what these assets are being valued at. That could be helpful if you're willing to buy into the premise, which I more or less do, that on the margin these asset markets or securities markets are doing a pretty good job. Now, you might think there's some sort of bias in them that prevents you from sorting that out. I mean, obviously there's no hype factor in crypto, but yeah.
Make the wild assumption, for a second, that crypto is not priced at its actual long-term fundamental value, and you were using crypto prices to back out the value of all illicit trade around the world. You might mistake illicit trade assets as being super valuable in that case, if those crypto coins are a claim on future illicit trade value, so—

Andrey: What—what?

Daniel: I'm probably saying too much.

Seth: The stock market may look really good, but the companies are building evil products, so don't—

Daniel: Right.

Seth: —call that welfare growth.

Andrey: Well, this is—

Daniel: Yeah.

Andrey: Daron Acemoglu has the point of view that all the AI innovation is for making social media more addictive.

Daniel: Which is, in my view of the world, an asset. All right.

Andrey: What about the "GPTs are GPTs" work? Does that have any policy implications, or any follow-on work?

Seth: I understand you've looked at how firms differ by these exposure measures.

Daniel: One of the conclusions there: if you look at the exposure of firms against their quantities of tech workers, there's a bit of a mechanical relationship, because tech workers are highly exposed. But there is a difference across companies, whatever exposure measure you want to use. And the reason we do that, Seth, is precisely because of what you brought up: you can use these tools to build better technology, so in some sense those companies might have a good reason to run away in performance. But the differences from low- to high-exposure firms are not nearly as big as the differences from E1 to E2 to E1 + E2. Those are really big. So every company could benefit if they started actually trying to transform, if they knew what a good direction to transform in would be. That was one of the points. From a policy perspective—
I have a hard time separating what Tyler Cowen calls mood affiliation from what I think are good policies, but I'll just spit out some things I think are good to do. There are a few risks with these tools that scare me. The virology community, I think, should be fairly concerned about people using turbocharged models to manufacture COVID or something. Or, God forbid, some degrowth person decides they want to kill half of humanity and go full Thanos.

Seth: That's the plot of 12 Monkeys.

Daniel: It is, and it would be a bad reality to face. But aside from that, I think there's just so much drudgery, so much additional work these things could do for us, and a lot of gains to be had. So my preference is not to regulate these models in any aggressive way; it's to figure out what they're good for and to develop with them. That's not to say you can't mitigate other risks, like bias. That MechaHitler thing with Grok was terrible. There are going to be bumps in the road along the way, but they're not the kind that say to me, "Oh, we should do a six-month pause of development." None of that really scares me yet.

Seth: Not in favor of bombing the data centers?

Daniel: No, I'm not. But I'm not a fan of Harry Potter fanfiction either, so I don't know; maybe it's just correlated beliefs.

Seth: So you brought up bioterror in particular—

Daniel: Yeah.

Seth: As we speak, AI is being used en masse in warfare: for identifying targets, for terminal guidance and target acquisition by missiles and drones. Increasingly in Ukraine, we're seeing automated ground vehicles used for transporting resources to the front and for evac. People often go to these super—I'm not going to say 12 Monkeys is bizarre, but it's a pretty weird movie if you've ever seen it. Why do we have to appeal to that rather than just AI being used to make murder bots?

Daniel: I mean, to some extent the murder bot thing doesn't scare me that much.
Human beings doing those things is also bad. I think the issue people often have with those applications is the scaling of evil individuals, which is a serious concern, or just issues with war in general, which I understand. But if it's going to happen, we're kind of caught in a prisoner's dilemma there, which is what freaks me out.

Seth: My near-term AI worry is: I have a suicide drone that just hangs out somewhere in downtown Manhattan and waits for a particular person to walk out, and then I target-assassinate people untraceably, right? That seems like it's here, as opposed to "I use AI to build a lab to make a super disease, blah, blah, blah." That's got a lot of steps in it.

Andrey: Untraceably, Seth? My presumption is these sorts of actions do tend to be traced. In fact, AI is a way to trace people, right? So this is one where, as with many AI questions, it's both a defensive and an offensive technology.

Seth: So does it favor the offense or the defense? Intuitively, you would think AI favors the offense, right? We think about these super weapons like Daniel brought up. But if you actually look at Ukraine, it seems to create this transparent battlefield where no one can even march to the front, and in some ways it seems to favor the defense. It's going to take a long, long time to play out.

Daniel: Yeah, you guys would know the answer to this. I'm going to butcher the quote, but who's that sci-fi writer who said the job of a sci-fi storyteller is not to predict the automobile but to predict the traffic jam?

Andrey: I don't remember who it is.

Daniel: I think that's the idea here: we want to predict what the traffic jams are. I think the—

Seth: Frederik Pohl.

Daniel: There we go. I should have remembered that.
The reason the bio-risk stuff scares me so much is that we just had a test of what one virus does to society and how damaging that can be. And Seth, what you're bringing up is what I alluded to: the scaling. One really bad long-term trend in technology is making individuals more powerful.

Seth: Andrey and I just read a sci-fi novel that's masquerading as political economy. Its argument is that AI is all about individual disempowerment: that we're going to get the God machine that's built by the state in the project, and it's going to 1984 us constantly. That's radical human disempowerment.

Daniel: Right. So if our response to individuals becoming much more powerful with technology is to expand the surveillance and control capacities of the state, and we get a loss of freedom, I think that's a genuine worry. In a general equilibrium framework, those things do freak me out for sure. But writing emails with LLMs just does not. There's somewhere in between where we should start worrying, and I don't think I'm at that point yet.

Andrey: What about things like the transparency requirements you often hear written about? Reporting requirements, registrations with the state? Do you have any opinions about those types of policies?

Daniel: I don't like 'em. I'll talk my book here a little bit: they're terrible for startups. Any compliance burden you stick on startups, even if the big firms might be okay, means the ecosystem specifically suffers, and startups do a lot of the work of discovering things. So there's a big trade-off, and this happens in the privacy debate too, with GDPR and what Europe's trying to do. Politically, no one's willing to acknowledge that there is a compliance-burden-versus-competition trade-off. If you're willing to hold firms to account in really expensive ways, you're going to get monopoly power. And that may be okay.
You may decide we don't want competition over super-private data that could get out to everybody. Likewise with LLMs or AI regulation: if you don't want this to be an oligopoly situation, you probably need to make it easy for people to build and develop. And I'm fine with whatever choice policymakers want to make, so long as they're taking that trade-off into account. They're elected officials trying to make those choices on behalf of all of us. If we don't like them, we can vote them out.

Seth: Using the AI to manipulate us into having the beliefs they want us to have.

Andrey: Is there anything you want to tell us before we wrap up?

Daniel: No, I thought this was a great discussion with you guys, as always. It's a pleasure to get to join you, especially as your first conversation-based guest. As a fan, it's kind of exciting for me as well. So please keep it up, and listen to Justified Posteriors, folks. The message I'd have for listeners, and maybe economists in the audience as well, is that I think these tools are really valuable in our work. I kind of joke: I've got a model I'm building which shows that lower types are going to use LLMs more for assignments. And then, of course, I'm using LLMs to help me build the model. So infer what you want about my type from that.

Seth: You're assuming everybody has to be equally good at everything, but you can just be good at one thing and bad at another.

Daniel: Yeah, I would never claim to be a good modeler, but it does help me get my thoughts straight.

Seth: I think you could be a modeler.

Daniel: I'll leave that one alone. But I would encourage folks to be their own R&D department. As Ethan Mollick says, play around with these things. When I talk with computer scientists, they get upset with me because I'm a little too pessimistic about what the models will do long-term.
When I talk with economists, the modal disagreement is in the other direction, where folks don't think it's gonna be a big enough deal. So I would say, get out there, play with these things, and learn how they work. And Anton Korinek has got a great paper on using AI in your own work, so check that one out too.

Andrey: All right. Well, awesome.

Seth: I can't think of a better place to end it.

Andrey: Listeners, please do comment and subscribe, and stay tuned for more exciting episodes.

Daniel: Thanks, guys.

Seth: And if you are a super fan, you too might one day be a guest on the Justified Posteriors podcast.

This is a public episode. If you would like to discuss this with other subscribers or get access to bonus episodes, visit empiricrafting.substack.com
Jul 14, 2025 • 1h 10min

A Resource Curse for AI?

In this episode of Justified Posteriors, we tackle the provocative essay “The Intelligence Curse” by Luke Drago and Rudolf Laine. What if AI is less like a productivity booster and more like oil in a failed state? Drawing from economics, political theory, and dystopian sci-fi, we explore the analogy between AI-driven automation and the classic resource curse.

* [00:03:30] Introducing “The Intelligence Curse”: a speculative essay that blends LessWrong rationalism, macroeconomic theory, and political pessimism.
* [00:07:55] Running through the six economic mechanisms behind the curse, including volatility, Dutch disease, and institutional decay.
* [00:13:10] Prior #1: Will AI-enabled automation make elites less responsive to ordinary people by 2050?
* [00:21:00] Prior #2: Will we get a new social contract (e.g., large-scale UBI or constitutional change) by 2050?
* [00:26:31] Chapter-by-chapter breakdown.
* [00:43:50] What about property rights? Can they insulate us from AI-induced tyranny, or will they be eroded in the name of efficiency?
* [00:46:01] Critiques.
* [00:52:00] Policy “solutions”.
* [01:04:44] Final posteriors and Seth’s economic-philosophical reflections: Can immortality + perfect patience = AI capital monopolies?

Mentioned in the episode:
📖 “The Intelligence Curse” by Luke Drago and Rudolf Laine
📚 I Have No Mouth and I Must Scream
📚 There Is No Antimemetics Division
📚 The Naked Sun by Isaac Asimov
🎮 The 90s point-and-click horror game based on “I Have No Mouth...”
📈 Sachs & Warner (1995) and Frankel (2012) on the resource curse
🔁 The Great Gatsby Curve
📽️ Gattaca, 1984, Gulliver’s Travels

Support the show: Please like, share, subscribe!
Jun 30, 2025 • 1h 1min

Robots for the retired?

In this episode of Justified Posteriors, we examine the paper “Demographics and Automation” by economists Daron Acemoglu and Pascual Restrepo. The central hypothesis of the paper is that aging societies, facing a scarcity of middle-aged labor for physical production tasks, are more likely to invest in industrial automation.

Going in, we were split. One of us thought the idea made basic economic sense, while the other was skeptical, worrying that a vague trend of “modernity” might be the real force driving both aging populations and rising automation. The paper throws a mountain of data at the problem, from international robot counts to US patent filings. Listen to find out how we updated our priors!

Timestamps:
(01:45) The Central Question
(04:10) Stating the Priors
(10:45) Looking to the Future
(22:30) What Is a Robot, Anyway?
(25:20) Reading the Footnotes
(30:45) The Most Compelling Evidence
(42:00) The Mechanism at Work
(52:20) The Final Verdict (Backward-Looking)
(57:30) The Future of Automation & AI

🗞️ Subscribe for upcoming episodes, post-podcast notes, and Andrey’s posts.
💻 Follow us on Twitter:
@AndreyFradkin https://x.com/andreyfradkin?lang=en
@SBenzell https://x.com/sbenzell?lang=en
