

EA Forum Podcast (Curated & popular)
EA Forum Team
Audio narrations from the Effective Altruism Forum, including curated posts and posts with 125+ karma.
If you'd like more episodes, subscribe to the "EA Forum (All audio)" podcast instead.
Episodes

Feb 2, 2026 • 15min
[Linkpost] “Are the Costs of AI Agents Also Rising Exponentially?” by Toby_Ord
This is a link post. There is an extremely important question about the near-future of AI that almost no-one is asking. We’ve all seen the graphs from METR showing that the length of tasks AI agents can perform has been growing exponentially over the last 7 years. While GPT-2 could only do software engineering tasks that would take someone a few seconds, the latest models can (50% of the time) do tasks that would take a human a few hours. As this trend shows no signs of stopping, people have naturally taken to extrapolating it out, to forecast when we might expect AI to be able to do tasks that take an engineer a full work-day, or week, or year. But we are missing a key piece of information — the cost of performing this work. Over those 7 years AI systems have grown exponentially. The size of the models (parameter count) has grown by 4,000x and the number of times they are run in each task (tokens generated) has grown by about 100,000x. AI researchers have also found massive efficiencies, but it is eminently plausible that the cost for the peak performance measured by METR has been [...]
---
Outline:
(13:02) Conclusions
(14:05) Appendix
(14:08) METR has a similar graph on their page for GPT-5.1 codex. It includes more models and compares them by token counts rather than dollar costs.
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/AbHPpGTtAMyenWGX8/are-the-costs-of-ai-agents-also-rising-exponentially
Linkpost URL: https://www.tobyord.com/writing/hourly-costs-for-ai-agents
---
Narrated by TYPE III AUDIO.
---
Images from the article: Apple Podcasts and Spotify do not show images in the episode description. Try Pocket Casts, or another podcast app.
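The cost question in the episode above can be turned into a back-of-the-envelope calculation. A minimal sketch, assuming per-task compute scales roughly with parameter count times tokens generated (a simplification): the 4,000x and 100,000x factors come from the episode description, while the efficiency figure is purely hypothetical.

```python
# Back-of-the-envelope sketch of the episode's cost question.
# Assumption: compute (and hence cost) per task scales roughly with
# parameter count x tokens generated, before efficiency gains.

param_growth = 4_000     # growth in model size over ~7 years (from the episode)
token_growth = 100_000   # growth in tokens generated per task (from the episode)

naive_compute_growth = param_growth * token_growth
print(f"Naive per-task compute growth: {naive_compute_growth:,}x")  # 400,000,000x

# Even large efficiency gains may not cancel this out. The divisor here is
# purely hypothetical, chosen only to show the shape of the argument:
efficiency_gain = 100_000
residual_cost_growth = naive_compute_growth / efficiency_gain
print(f"Residual cost growth after efficiency gains: {residual_cost_growth:,.0f}x")
```

Whether the residual is flat, shrinking, or still exponential is exactly the open question the episode poses.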

Feb 2, 2026 • 10min
[Linkpost] “Evidence that Recent AI Gains are Mostly from Inference-Scaling” by Toby_Ord
This is a link post. In the last year or two, the most important trend in modern AI came to an end. The scaling-up of computational resources used to train ever-larger AI models through next-token prediction (pre-training) stalled out. Since late 2024, we’ve seen a new trend of using reinforcement learning (RL) in the second stage of training (post-training). Through RL, the AI models learn to do superior chain-of-thought reasoning about the problem they are being asked to solve. This new era involves scaling up two kinds of compute: the amount of compute used in RL post-training, and the amount of compute used every time the model answers a question. Industry insiders are excited about the first new kind of scaling, because the amount of compute needed for RL post-training started off being small compared to the tremendous amounts already used in next-token prediction pre-training. Thus, one could scale the RL post-training up by a factor of 10 or 100 before even doubling the total compute used to train the model. But the second new kind of scaling is a problem. Major AI companies were already starting to spend more compute serving their models to customers than in the training [...]
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/5zfubGrJnBuR5toiK/evidence-that-recent-ai-gains-are-mostly-from-inference
Linkpost URL: https://www.tobyord.com/writing/mostly-inference-scaling
---
Narrated by TYPE III AUDIO.

Feb 2, 2026 • 15min
[Linkpost] “The Extreme Inefficiency of RL for Frontier Models” by Toby_Ord
This is a link post. The new scaling paradigm for AI reduces the amount of information a model can learn from per hour of training by a factor of 1,000 to 1,000,000. I explore what this means and its implications for scaling. The last year has seen a massive shift in how leading AI models are trained. 2018–2023 was the era of pre-training scaling. LLMs were primarily trained by next-token prediction (also known as pre-training). Much of OpenAI's progress from GPT-1 to GPT-4 came from scaling up the amount of pre-training by a factor of 1,000,000. New capabilities were unlocked not through scientific breakthroughs, but through doing more-or-less the same thing at ever-larger scales. Everyone was talking about the success of scaling, from AI labs to venture capitalists to policy makers. However, there's been markedly little progress in scaling up this kind of training since (GPT-4.5 added one more factor of 10, but was then quietly retired). Instead, there has been a shift to taking one of these pre-trained models and further training it with large amounts of Reinforcement Learning (RL). This has produced models like OpenAI's o1, o3, and GPT-5, with dramatic improvements in reasoning (such as solving [...]
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/64iwgmMvGSTBHPdHg/the-extreme-inefficiency-of-rl-for-frontier-models
Linkpost URL: https://www.tobyord.com/writing/inefficiency-of-reinforcement-learning
---
Narrated by TYPE III AUDIO.
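The factor-of-1,000-to-1,000,000 claim in the episode above can be illustrated with a toy bits-per-compute comparison. All concrete numbers below are illustrative assumptions, not figures from the episode: pre-training gets a learning signal from every token, while an RL episode often yields only a single scalar reward.

```python
# Toy comparison of learning signal per unit of training compute.
# Pre-training: every generated token is checked against the correct next
# token, worth on the order of a bit of signal each. RL: an entire
# multi-thousand-token episode may yield only one binary reward (~1 bit).
# All numbers here are illustrative assumptions.

tokens_per_episode = 10_000      # assumed length of one RL episode
bits_per_pretrain_token = 1.0    # order-of-magnitude signal per token
bits_per_rl_episode = 1.0        # one binary success/failure reward

pretrain_bits = tokens_per_episode * bits_per_pretrain_token
ratio = pretrain_bits / bits_per_rl_episode
print(f"Pre-training yields ~{ratio:,.0f}x more bits for the same compute")
```

With these toy numbers the gap is 10,000x, squarely inside the 1,000 to 1,000,000 range the episode describes; longer episodes or richer per-token signals move it toward either end.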

Feb 2, 2026 • 35min
[Linkpost] “Inference Scaling Reshapes AI Governance” by Toby_Ord
This is a link post. The shift from scaling up the pre-training compute of AI systems to scaling up their inference compute may have profound effects on AI governance. The nature of these effects depends crucially on whether this new inference compute will primarily be used during external deployment or as part of a more complex training programme within the lab. Rapid scaling of inference-at-deployment would: lower the importance of open-weight models (and of securing the weights of closed models), reduce the impact of the first human-level models, change the business model for frontier AI, reduce the need for power-intense data centres, and derail the current paradigm of AI governance via training compute thresholds. Rapid scaling of inference-during-training would have more ambiguous effects that range from a revitalisation of pre-training scaling to a form of recursive self-improvement via iterated distillation and amplification.

The end of an era — for both training and governance

The intense year-on-year scaling up of AI training runs has been one of the most dramatic and stable markers of the Large Language Model era. Indeed it had been widely taken to be a permanent fixture of the AI landscape and the basis of many approaches to [...]
---
Outline:
(01:06) The end of an era -- for both training and governance
(05:24) Scaling inference-at-deployment
(06:42) Reducing the number of simultaneously served copies of each new model
(08:45) Reducing the value of securing model weights
(09:30) Reducing the benefits and risks of open-weight models
(10:05) Unequal performance for different tasks and for different users
(12:08) Changing the business model and industry structure
(12:50) Reducing the need for monolithic data centres
(17:16) Scaling inference-during-training
(28:07) Conclusions
(30:17) Appendix. Comparing the costs of scaling pre-training vs inference-at-deployment
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/RnsgMzsnXcceFfKip/inference-scaling-reshapes-ai-governance
Linkpost URL: https://www.tobyord.com/writing/inference-scaling-reshapes-ai-governance
---
Narrated by TYPE III AUDIO.

Feb 2, 2026 • 20min
[Linkpost] “Is there a Half-Life for the Success Rates of AI Agents?” by Toby_Ord
This is a link post. Building on the recent empirical work of Kwa et al. (2025), I show that within their suite of research-engineering tasks the performance of AI agents on longer-duration tasks can be explained by an extremely simple mathematical model — a constant rate of failing during each minute a human would take to do the task. This implies an exponentially declining success rate with the length of the task and that each agent could be characterised by its own half-life. This empirical regularity allows us to estimate the success rate for an agent at different task lengths. And the fact that this model is a good fit for the data is suggestive of the underlying causes of failure on longer tasks — that they involve increasingly large sets of subtasks where failing any one fails the task. Whether this model applies more generally on other suites of tasks is unknown and an important subject for further work.

METR's results on the length of tasks agents can reliably complete

A recent paper by Kwa et al. (2025) from the research organisation METR has found an exponential trend in the duration of the tasks that frontier AI agents can [...]
---
Outline:
(05:33) Explaining these results via a constant hazard rate
(14:54) Upshots of the constant hazard rate model
(18:47) Further work
(19:25) References
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/qz3xyqCeriFHeTAJs/is-there-a-half-life-for-the-success-rates-of-ai-agents-3
Linkpost URL: https://www.tobyord.com/writing/half-life
---
Narrated by TYPE III AUDIO.
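The constant-hazard-rate model the episode above describes can be written down in a few lines. A minimal sketch: the 60-minute half-life is an illustrative assumption, not a value fitted by Kwa et al.

```python
import math

# Constant hazard rate model: the agent fails with a fixed probability during
# each minute a human would need for the task, so the success rate declines
# exponentially with (human) task length t:
#   S(t) = exp(-lam * t),  where the half-life is T_half = ln(2) / lam

def success_rate(t_minutes: float, half_life_minutes: float) -> float:
    """Probability of completing a task that takes a human t_minutes."""
    lam = math.log(2) / half_life_minutes
    return math.exp(-lam * t_minutes)

# Illustrative agent with a 60-minute half-life (an assumption):
print(success_rate(60, 60))    # 0.5 at exactly one half-life
print(success_rate(120, 60))   # 0.25 at two half-lives
print(success_rate(15, 60))    # ~0.84 on a short 15-minute task
```

This is why a single number (the half-life) suffices to characterise an agent under the model: every other success rate follows from it.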

Feb 2, 2026 • 17min
[Linkpost] “Inference Scaling and the Log-x Chart” by Toby_Ord
This is a link post. Improving model performance by scaling up inference compute is the next big thing in frontier AI. But the charts being used to trumpet this new paradigm can be misleading. While they initially appear to show steady scaling and impressive performance for models like o1 and o3, they really show poor scaling (characteristic of brute force) and little evidence of improvement between o1 and o3. I explore how to interpret these new charts and what evidence for strong scaling and progress would look like.

From scaling training to scaling inference

The dominant trend in frontier AI over the last few years has been the rapid scale-up of training — using more and more compute to produce smarter and smarter models. Since GPT-4, this kind of scaling has run into challenges, so we haven’t yet seen models much larger than GPT-4. But we have seen a recent shift towards scaling up the compute used during deployment (aka ‘test-time compute’ or ‘inference compute’), with more inference compute producing smarter models. You could think of this as a change in strategy from improving the quality of your employees’ work via giving them more years of training in which to acquire [...]
---
First published:
February 2nd, 2026
Source:
https://forum.effectivealtruism.org/posts/zNymXezwySidkeRun/inference-scaling-and-the-log-x-chart
Linkpost URL: https://www.tobyord.com/writing/inference-scaling-and-the-log-x-chart
---
Narrated by TYPE III AUDIO.
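The episode's point about log-x charts can be made concrete with a toy calculation; the accuracy and compute numbers below are invented for illustration, not taken from any benchmark.

```python
# On a log-x chart, equally spaced points on the x-axis each represent a
# constant *multiplicative* increase in inference compute. A line climbing
# steadily on such a chart can therefore hide exponentially growing cost per
# unit of performance. Toy numbers, invented for illustration:

compute = [1, 10, 100, 1_000]   # inference compute (arbitrary units)
accuracy = [40, 50, 60, 70]     # looks like a steady +10 points per step

for i in range(1, len(compute)):
    gain = accuracy[i] - accuracy[i - 1]
    extra = compute[i] - compute[i - 1]
    print(f"{compute[i-1]:>5} -> {compute[i]:>5}: +{gain} pts "
          f"for {extra} extra compute units")
# Each +10 points costs ~10x more compute than the previous +10 points:
# the visual steadiness of the log-x chart masks brute-force scaling.
```

Evidence of genuinely strong scaling would look different: accuracy gains that hold up, or grow, per fixed *absolute* increase in compute.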

Jan 30, 2026 • 16min
[Linkpost] “The Scaling Paradox” by Toby_Ord
This is a link post. AI capabilities have improved remarkably quickly, fuelled by the explosive scale-up of resources being used to train the leading models. But if you examine the scaling laws that inspired this rush, they actually show extremely poor returns to scale. What's going on?

AI Scaling is Shockingly Impressive

The era of LLMs has seen remarkable improvements in AI capabilities over a very short time. This is often attributed to the AI scaling laws — statistical relationships which govern how AI capabilities improve with more parameters, compute, or data. Indeed AI thought-leaders such as Ilya Sutskever and Dario Amodei have said that the discovery of these laws led them to the current paradigm of rapid AI progress via a dizzying increase in the size of frontier systems. Before the 2020s, most AI researchers were looking for architectural changes to push the frontiers of AI forwards. The idea that scale alone was sufficient to provide the entire range of faculties involved in intelligent thought was unfashionable and seen as simplistic. A key reason it worked was the tremendous versatility of text. As Turing had noted more than 60 years earlier, almost any challenge that one could pose to [...]
---
First published:
January 30th, 2026
Source:
https://forum.effectivealtruism.org/posts/742xJNTqer2Dt9Cxx/the-scaling-paradox
Linkpost URL: https://www.tobyord.com/writing/the-scaling-paradox
---
Narrated by TYPE III AUDIO.
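The "extremely poor returns to scale" claim can be illustrated with a toy power-law calculation. The exponent below is an assumption in the rough ballpark of published compute scaling laws, used purely for illustration, not a figure from the episode.

```python
# Power-law scaling: loss falls roughly as L(C) ~ C**(-alpha) with training
# compute C. Published compute-scaling exponents are small, on the order of
# 0.05; the value here is illustrative.
alpha = 0.05

# Compute multiplier k needed to merely halve the loss: solve k**(-alpha) = 1/2
k = 2 ** (1 / alpha)
print(f"Compute needed to halve the loss: about {k:,.0f}x")  # ~1,048,576x
```

Under this toy law, a roughly million-fold scale-up (comparable to the GPT-1 to GPT-4 jump described elsewhere in this feed) only halves the loss, which is the paradox: shockingly impressive capabilities emerging from shockingly weak returns to scale.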

Jan 29, 2026 • 2min
“If EA ruled the world, career advisors would tell some people to work for the postal service” by Toby Tremlett🔹
EA thinking is thinking on the margin. When EAs prioritise causes, they are prioritising causes given the fact that they only control their one career, or, sometimes, given that they have some influence over a community of a few thousand people, and the distribution of some millions or billions of dollars. Some critiques of EA act as if statements about cause prioritisation are absolute rather than relative. I.e. that EAs are saying that literally everyone should be working on AI Safety, or, the flipside, that EAs are saying that no one should be working on [insert a problem which is pressing, but not among the most urgent to commit the next million dollars to]. In conversations that sound like this, I've often turned to the idea that, if EAs controlled all the resources in the world, career advisors at the hypothetical world government's version of 80,000 Hours would be advising some people to be postal workers. Given that the EA world government will have long ago filled the current areas of direct EA work, it could be the single most impactful thing a person could do with their skillset, given the comparative neglectedness of work in the [...] ---
First published:
January 16th, 2026
Source:
https://forum.effectivealtruism.org/posts/MZ5g33fXuxd6bSgJW/if-ea-ruled-the-world-career-advisors-would-tell-some-people
---
Narrated by TYPE III AUDIO.

Jan 29, 2026 • 9min
“Why Isn’t EA at the Table When $121 Billion Gets Allocated to Biodiversity Every Year?” by David Goodman
There is an insane amount of money being thrown around by international organizations and agreements. Nobody with any kind of power over these agreements is asking basic EA questions like: "What are the problems we're trying to solve?" "What are the most neglected aspects of those problems?" and "What is the most cost-effective way to address those neglected areas?" As someone coming from an EA background reading through plans for $200-700 billion in annual funding commitments that focus on unimaginative and ineffective interventions, it makes you want to tear your hair out. So much good could be done with that money. EA focuses a lot on private philanthropy, earning-to-give (though less so post-SBF), and the usual pots of money. But why don't we have delegations who are knowledgeable in international diplomacy going to COPs and advocating for more investment in lab-grown meat, alternative proteins, or lithium recycling? It seems like there would be insane alpha in such a strategy.

An example: The Global Biodiversity Framework

The Kunming-Montreal Global Biodiversity Framework (GBF) was adopted in 2022 to halt biodiversity loss. It has 23 targets, commitments of $200 billion annually by 2030 and $700 billion by 2050, and near-universal adoption from [...]
---
Outline:
(01:12) An example: The Global Biodiversity Framework
(02:13) What Is That Money Actually Being Spent On?
(03:02) The Elephant in the Room Literally Nobody is Talking About: Beef
(04:21) The Absolutely Insane Funding Gap
(05:26) The Leverage Point We're Ignoring
(06:47) What Would EA Engagement Look Like?
---
First published:
January 20th, 2026
Source:
https://forum.effectivealtruism.org/posts/Peaq4HNhn8agsZY3z/why-isn-t-ea-at-the-table-when-usd121-billion-gets-allocated
---
Narrated by TYPE III AUDIO.

Jan 27, 2026 • 15min
“5 ways to better charity work in 2026” by NickLaing
I've started a substack, so a few more people might encounter my spicy takes - I'll still mostly be here. USAID is gone. Direct country aid to low income countries is down 25%. So now's a great time to share five ways I think development charity can be done better in 2026. To state the obvious... none of these ideas will be the best approach all of the time, there's plenty of grey area and nuance. I start a little playful, then get a little more serious.

1. Ditch the Cars

Close your eyes and picture the first thing that comes into your head when I say “NGO”. It might be… a shiny white Landcruiser.

[Image: the view from the front window of my hut]

But owning cars doesn’t usually make economic sense in low income countries. The ‘real’ market makes this clear. Businesses rarely buy cars; instead they use public transport or motorbikes. When companies do own cars, it's more Corolla than Landcruiser as well. Cars are often more expensive dollar-for-dollar than in richer countries, fuel costs are high, and many NGOs hire drivers, all while public transport is dirt cheap. To move 100km in Uganda [...]
---
Outline:
(00:43) 1. Ditch the Cars
(02:49) 2. Fund Solutions not Projects
(07:07) 3. Fund cost effective solutions
(08:06) 4. Fund Bimodal - Test and Scale
(11:59) 5. Pay workers less
---
First published:
January 19th, 2026
Source:
https://forum.effectivealtruism.org/posts/LvE3s6kCJk4Jck2ww/5-ways-to-better-charity-work-in-2026
---
Narrated by TYPE III AUDIO.


