

The Nonlinear Library
The Nonlinear Fund
The Nonlinear Library allows you to easily listen to top EA and rationalist content on your podcast player. We use text-to-speech software to create an automatically updating repository of audio content from the EA Forum, Alignment Forum, LessWrong, and other EA blogs. To find out more, please visit us at nonlinear.org
Episodes
Mentioned books

Mar 18, 2024 • 34min
LW - Measuring Coherence of Policies in Toy Environments by dx26
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Measuring Coherence of Policies in Toy Environments, published by dx26 on March 18, 2024 on LessWrong.
This post was produced as part of the Astra Fellowship under the Winter 2024 Cohort, mentored by Richard Ngo. Thanks to Martín Soto, Jeremy Gillien, Daniel Kokotajlo, and Lukas Berglund for feedback.
Summary
Discussions around the likelihood and threat models of AI existential risk (x-risk) often hinge on some informal concept of a "coherent", goal-directed AGI in the future maximizing some utility function unaligned with human values. Whether and how coherence may develop in future AI systems, especially in the era of LLMs, has been a subject of considerable debate.
In this post, we provide a preliminary mathematical definition of the coherence of a policy: how likely the policy is to have been sampled via uniform reward sampling (URS), i.e. uniformly sampling a reward function and then sampling from the set of policies optimal for that reward function, versus via uniform policy sampling (UPS). We provide extensions of the model for sub-optimality and for "simple" reward functions via uniform sparsity sampling (USS).
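As a rough illustration of the URS-versus-UPS idea, here is a minimal sketch in Python. The MDP, rewards, and all numbers are my own toy assumptions, not taken from the post: it samples reward functions uniformly, finds the policies optimal for each via value iteration, and compares the Monte Carlo frequency of each policy under URS against the uniform-policy baseline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy MDP (my own example, not from the post): 3 states,
# 2 actions, deterministic transitions T[s, a] -> next state, with a
# reward attached to each state.
T = np.array([[1, 0],
              [2, 1],
              [0, 2]])
n_states, n_actions = T.shape
gamma = 0.9

def sample_urs_policy():
    """Uniform reward sampling: draw a reward function uniformly at
    random, then sample uniformly from the policies optimal for it."""
    r = rng.uniform(0.0, 1.0, n_states)
    V = np.zeros(n_states)
    for _ in range(500):                       # value iteration
        V = r + gamma * V[T].max(axis=1)
    # an action is optimal iff it leads to a next state of maximal value
    next_vals = V[T]                           # shape (n_states, n_actions)
    best = np.isclose(next_vals, next_vals.max(axis=1, keepdims=True))
    return tuple(int(rng.choice(np.flatnonzero(row))) for row in best)

# Monte Carlo estimate of P(pi | URS) for each deterministic policy,
# compared against the uniform baseline P(pi | UPS) = 1 / n_actions^n_states.
n_samples = 20_000
counts = {}
for _ in range(n_samples):
    pi = sample_urs_policy()
    counts[pi] = counts.get(pi, 0) + 1

p_ups = 1 / n_actions ** n_states
for pi, c in sorted(counts.items(), key=lambda kv: -kv[1]):
    ratio = (c / n_samples) / p_ups            # "coherence" likelihood ratio
    print(pi, round(ratio, 2))
```

Policies with a ratio above 1 are more likely under reward sampling than under policy sampling, which is the intuition the post's coherence metric builds on.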
We then build a classifier for the coherence of policies in small deterministic MDPs, and find that properties of the MDP and policy, like the number of self-loops that the policy takes, are predictive of coherence when used as features for the classifier. Moreover, coherent policies tend to preserve optionality, navigate toward high-reward areas of the MDP, and have other "agentic" properties.
We hope that our metric can be iterated upon to achieve better definitions of coherence and a better understanding of what properties dangerous AIs will have.
Introduction
Much of the current discussion about AI x-risk centers around "agentic", goal-directed AIs having misaligned goals. For instance, one of the most dangerous possibilities being discussed is of mesa-optimizers developing within superhuman models, leading to scheming behavior and deceptive alignment. A significant proportion of current alignment work focuses on detecting, analyzing (e.g. via analogous case studies of model organisms), and possibly preventing deception.
Some researchers in the field believe that intelligence and capabilities are inherently tied with "coherence", and thus any sufficiently capable AI will approximately be a coherent utility function maximizer.
In their paper "Risks From Learned Optimization" formally introducing mesa-optimization and deceptive alignment, Evan Hubinger et al. discuss the plausibility of mesa-optimization occurring in RL-trained models. They analyze the possibility of a base optimizer, such as a hill-climbing local optimization algorithm like stochastic gradient descent, producing a mesa-optimizer model that internally does search (e.g.
Monte Carlo tree search) in pursuit of a mesa-objective (in the real world, or in the "world-model" of the agent), which may or may not be aligned with human interests. This is in contrast to a model containing many complex heuristics that is not well-defined internally as a consequentialist mesa-optimizer; one extreme example is a tabular model/lookup table that matches observations to actions, which clearly does not do any internal search or have any consequentialist cognition.
They speculate that mesa-optimizers may be selected for because they generalize better than other models and/or are more compressible in an information-theoretic sense, and may thus be favored by the inductive biases of the training process.
Other researchers believe that scheming and other mesa-optimizing behavior is implausible with the most common current ML architectures, and that the inductive bias argument and other arguments for getting misaligned mesa-optimizers by default (like the counting argument, which suggests that there are many more ...

Mar 18, 2024 • 14min
LW - Community Notes by X by NicholasKees
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Community Notes by X, published by NicholasKees on March 18, 2024 on LessWrong.
I did an exploration into how Community Notes (formerly Birdwatch) from X (formerly Twitter) works, and how its algorithm decides which notes get displayed to the wider community. In this post, I'll share and explain what I found, as well as offer some comments.
Community Notes is a fact-checking tool available to US-based users of X/Twitter which allows readers to attach notes to posts to give them clarifying context. It uses an open-source bridging-based ranking algorithm intended to promote notes which receive cross-partisan support, and demote notes with a strong partisan lean. The tool seems to be pretty popular overall, and most of the criticism aimed toward it seems to be about how Community Notes fails to be a sufficient replacement for other, more top-down moderation systems.[1]
This seems interesting to me as an experiment in social technology that aims to improve group epistemics, and understanding how it works seems like a good place to start before trying to design other group epistemics algorithms.
How does the ranking algorithm work?
The full algorithm, while open-source, is quite complicated and I don't fully understand every facet of it, but I've done a once-over read of the original Birdwatch paper, gone through the Community Notes documentation, and read this summary/commentary by Vitalik Buterin. Here's a summary of the "core algorithm" as I understand it (to which much extra logic gets attached):
Users are the people who have permission to rate community notes. To get permission, a person needs to have had an account on X for more than 6 months, be verified, and have committed no violations of X's rules. The rollout of community notes is slow, however, and so eligible account holders are only added to the Community Notes user pool periodically, and at random.
New users don't immediately get permission to write their own notes; they first have to earn a "rating impact" by rating existing notes (explained later).
Notes are short comments written by permitted users on posts they felt needed clarification. These are not immediately made publicly visible on X, first needing to be certified as "helpful" by aggregating ratings by other Community Notes users using their ranking algorithm.
Users are invited to rate notes as either "not helpful," "somewhat helpful," or "helpful." The results of all user-note pairs are recorded in a matrix r, where each element r_un ∈ {0, 0.5, 1, null} corresponds to how user u rated note n. Users only rate a small fraction of notes, so most elements in the matrix are "null." Non-null elements are called "observed" ratings, and values of 0, 0.5, and 1 correspond to the qualitative ratings of "not helpful," "somewhat helpful," and "helpful" respectively.
This rating matrix is then used by their algorithm to compute a helpfulness score for each note. It does this by learning a model of the ratings matrix which explains each observed rating as a sum of four terms:
r̂_un = μ + i_u + i_n + f_u · f_n
Where:
μ: Global intercept (shared across all ratings)
i_u: User intercept (shared across all ratings by user u)
i_n: Note intercept (shared across all ratings of note n). This is the term which will eventually determine a note's "helpfulness."
f_u, f_n: Factor vectors for u and n. The dot product of these vectors is intended to describe the "ideological agreement" between a user and a note. These vectors are currently one-dimensional, though the algorithm is in principle agnostic to the number of dimensions.
For U users and N notes that gets us 1 + 2U + 2N free parameters making up this model. These parameters are estimated via gradient descent every hour, minimizing the following squared error loss function (for observed ratings only):
∑_{observed r_un} (r_un − r̂_un)^2 + λ_i (i_u^2 + i_n^2 + μ^2) + λ_f (||f_u||^2 + ||f_n||^2)
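To make the model concrete, here is a minimal Python sketch of fitting this kind of intercept-plus-factor model by plain gradient descent. The ratings are made up for illustration, and the real Community Notes implementation differs in many details (optimizer, regularization schedule, thresholds); this only shows the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy ratings (made up for illustration): a sparse U x N
# matrix with entries in {0, 0.5, 1} and NaN where user u never rated
# note n.
U, N, DIM = 6, 4, 1
R = np.full((U, N), np.nan)
for u, n, v in [(0, 0, 1.0), (1, 0, 1.0), (2, 0, 0.0), (3, 1, 0.5),
                (4, 1, 1.0), (5, 2, 0.0), (0, 2, 0.0), (2, 3, 1.0)]:
    R[u, n] = v
obs = ~np.isnan(R)

# Parameters of the model r_hat_un = mu + i_u + i_n + f_u . f_n
mu = 0.0
i_u, i_n = np.zeros(U), np.zeros(N)
f_u = rng.normal(0, 0.1, (U, DIM))
f_n = rng.normal(0, 0.1, (N, DIM))
lam_i, lam_f, lr = 0.15, 0.03, 0.05

for _ in range(5000):  # gradient descent on the regularized squared error
    pred = mu + i_u[:, None] + i_n[None, :] + f_u @ f_n.T
    err = np.where(obs, pred - R, 0.0)         # only observed ratings count
    mu  -= lr * (err.sum() + lam_i * mu)
    i_u -= lr * (err.sum(axis=1) + lam_i * i_u)
    i_n -= lr * (err.sum(axis=0) + lam_i * i_n)
    f_u -= lr * (err @ f_n + lam_f * f_u)
    f_n -= lr * (err.T @ f_u + lam_f * f_n)

# i_n plays the role of each note's "helpfulness" score
print(np.round(i_n, 2))
```

The key design point survives even in this toy version: because the factor term f_u · f_n can absorb ideologically-predictable agreement, the note intercept i_n is pushed toward capturing only the agreement that remains after partisanship is accounted for.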

Mar 18, 2024 • 2min
AF - AtP*: An efficient and scalable method for localizing LLM behaviour to components by Neel Nanda
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: AtP*: An efficient and scalable method for localizing LLM behaviour to components, published by Neel Nanda on March 18, 2024 on The AI Alignment Forum.
Authors: János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda
A new paper from the Google DeepMind mechanistic interpretability team, from core contributors János Kramár and Tom Lieberum
Tweet thread summary, paper
Abstract:
Activation Patching is a method of directly computing causal attributions of behavior to model components. However, applying it exhaustively requires a sweep with cost scaling linearly in the number of model components, which can be prohibitively expensive for SoTA Large Language Models (LLMs). We investigate Attribution Patching (AtP), a fast gradient-based approximation to Activation Patching and find two classes of failure modes of AtP which lead to significant false negatives.
We propose a variant of AtP called AtP*, with two changes to address these failure modes while retaining scalability. We present the first systematic study of AtP and alternative methods for faster activation patching and show that AtP significantly outperforms all other investigated methods, with AtP* providing further significant improvement. Finally, we provide a method to bound the probability of remaining false negatives of AtP* estimates.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org.

Mar 18, 2024 • 1min
EA - Carlo: uncertainty analysis in Google Sheets by ProbabilityEnjoyer
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Carlo: uncertainty analysis in Google Sheets, published by ProbabilityEnjoyer on March 18, 2024 on The Effective Altruism Forum.
I've been working on Carlo, a tool that lets you do uncertainty and sensitivity analysis with Google Sheets spreadsheets.
Please note Carlo is an (expensive) commercial product. The pricing is aimed at professionals making important decisions.
Some of the key features that set Carlo apart are:
Works with your existing Google Sheets calculations
Gold-standard sensitivity analysis
Our sensitivity analysis offers a true metric of variable importance: it can tell you what fraction of the output variance is due to each of the inputs and their interactions.
Unusually flexible input
Inputs can be given using novel, convenient probability distributions that flexibly match your beliefs.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Mar 18, 2024 • 6min
EA - EA "Worldviews" Need Rethinking by Richard Y Chappell
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: EA "Worldviews" Need Rethinking, published by Richard Y Chappell on March 18, 2024 on The Effective Altruism Forum.
I like Open Phil's worldview diversification. But I don't think their current roster of worldviews does a good job of justifying their current practice. In this post, I'll suggest a reconceptualization that may seem radical in theory but is conservative in practice.
Something along these lines strikes me as necessary to justify giving substantial support to paradigmatic Global Health & Development charities in the face of competition from both Longtermist/x-risk and Animal Welfare competitor causes.
Current Orthodoxy
I take it that Open Philanthropy's current "cause buckets" or candidate worldviews are typically conceived of as follows:
neartermist - incl. animal welfare
neartermist - human-only
longtermism / x-risk
We're told that how to weigh these cause areas against each other "hinge[s] on very debatable, uncertain questions." (True enough!) But my impression is that EAs often take the relevant questions to be something like, should we be speciesist? and should we only care about present beings? Neither of which strikes me as especially uncertain (though I know others disagree).
The Problem
I worry that the "human-only neartermist" bucket lacks adequate philosophical foundations. I think Global Health & Development charities are great and worth supporting (not just for speciesist presentists), so I hope to suggest a firmer grounding. Here's a rough attempt to capture my guiding thought in one paragraph:
Insofar as the GHD bucket is really motivated by something like sticking close to common sense, "neartermism" turns out to be the wrong label for this. Neartermism may mandate prioritizing aggregate shrimp over poor people; common sense certainly does not. When the two come apart, we should give more weight to the possibility that (as-yet-unidentified) good principles support the common-sense worldview.
So we should be especially cautious of completely dismissing commonsense priorities in a worldview-diversified portfolio (even as we give significant weight and support to a range of theoretically well-supported counterintuitive cause areas).
A couple of more concrete intuitions that guide my thinking here: (1) fetal anesthesia as a cause area intuitively belongs with 'animal welfare' rather than 'global health & development', even though fetuses are human. (2) It's a mistake to conceive of global health & development as purely neartermist: the strongest case for it stems from positive, reliable flow-through effects.
A Proposed Solution
I suggest that we instead conceive of (1) Animal Welfare, (2) Global Health & Development, and (3) Longtermist / x-risk causes as respectively justified by the following three "cause buckets":
Pure suffering reduction
Reliable global capacity growth
Speculative moonshots
In terms of the underlying worldview differences, I think the key questions are something like:
(i) How confident should we be in our explicit expected value estimates? How strongly should we discount highly speculative endeavors, relative to "commonsense" do-gooding?
(ii) How does the total (intrinsic + instrumental) value of improving human lives & capacities compare to the total (intrinsic) value of pure suffering reduction?
[Aside: I think it's much more reasonable to be uncertain about these (largely empirical) questions than about the (largely moral) questions that underpin the orthodox breakdown of EA worldviews.]
Hopefully it's clear how these play out: greater confidence in EEV lends itself to supporting moonshots to reduce x-risk or otherwise seek to improve the long-term future in a highly targeted, deliberate way. Less confidence here may support more generic methods of global capacity-building, such as improving health and (were there any ...

Mar 18, 2024 • 17min
LW - On Devin by Zvi
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: On Devin, published by Zvi on March 18, 2024 on LessWrong.
Introducing Devin
Is the era of AI agents writing complex code systems without humans in the loop upon us?
Cognition is calling Devin 'the first AI software engineer.'
Here is a two minute demo of Devin benchmarking LLM performance.
Devin has its own web browser, which it uses to pull up documentation.
Devin has its own code editor.
Devin has its own command line.
Devin uses debugging print statements and uses the log to fix bugs.
Devin builds and deploys entire stylized websites without even being directly asked.
What could possibly go wrong? Install this on your computer today.
Padme.
The Real Deal
I would by default assume all demos were supremely cherry-picked. My only disagreement with Austen Allred's statement here is that this rule is not new:
Austen Allred: New rule:
If someone only shows their AI model in tightly controlled demo environments we all assume it's fake and doesn't work well yet
But in this case Patrick Collison is a credible source and he says otherwise.
Patrick Collison: These aren't just cherrypicked demos. Devin is, in my experience, very impressive in practice.
Here we have Mckay Wrigley using it for half an hour. This does not feel like a cherry-picked example, although of course some amount of selection is there, if only via the publication effect.
He is very much a maximum acceleration guy, for whom everything is always great and the future is always bright, so calibrate for that, but still yes this seems like evidence Devin is for real.
This article in Bloomberg from Ashlee Vance has further evidence. It is clear that Devin is a quantum leap over known past efforts in terms of its ability to execute complex multi-step tasks, to adapt on the fly, and to fix its mistakes or be adjusted and keep going.
For once, when we wonder 'how did they do that, what was the big breakthrough that made this work' the Cognition AI people are doing not only the safe but also the smart thing and they are not talking.
They do have at least one serious rival, as Magic.ai has raised $100 million from the venture team of Daniel Gross and Nat Friedman to build 'a superhuman software engineer,' including training their own model. The article seems strangely interested in whether AI is 'a bubble,' as opposed to this amazing new technology.
This is one of those 'helps until it doesn't situations' in terms of jobs:
vanosh: Seeing this is kinda scary. Like there is no way companies won't go for this instead of humans.
Should I really have studied HR?
Mckay Wrigley: Learn to code! It makes using Devin even more useful.
Devin makes coding more valuable, until we hit so many coders that we are coding everything we need to be coding, or the AI no longer needs a coder in order to code. That is going to be a ways off. And once it happens, if you are not a coder, it is reasonable to ask yourself: What are you even doing? Plumbing while hoping for the best will probably not be a great strategy in that world.
The Metric
Devin can sometimes (13.8% of the time?!) do actual real jobs on Upwork with nothing but a prompt to 'figure it out.'
Aravind Srinivas (CEO Perplexity): This is the first demo of any agent, leave alone coding, that seems to cross the threshold of what is human level and works reliably. It also tells us what is possible by combining LLMs and tree search algorithms: you want systems that can try plans, look at results, replan, and iterate till success. Congrats to Cognition Labs!
Andres Gomez Sarmiento: Their results are even more impressive when you read the fine print. All the other models were guided whereas Devin was not. Amazing.
Deedy: I know everyone's talking about it, but Devin's 13% on SWE Bench is actually incredible.
Just take a look at a sample SWE Bench problem: this is a task for a human! Shout out to Car...

Mar 18, 2024 • 2min
EA - Personal fit is different from the thing that you already like by Joris P
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Personal fit is different from the thing that you already like, published by Joris P on March 18, 2024 on The Effective Altruism Forum.
This is a Draft Amnesty Week draft. It may not be polished, up to my usual standards, fully thought through, or fully fact-checked. This draft lacks the polish of a full post, but the content is almost there. The kind of constructive feedback you would normally put on a Forum post is very welcome.
I wrote most of this last year. I also think I'm making a pretty basic point and don't think I'm articulating it amazingly, but I'm trying to write more and can imagine people (especially newer to EA) finding this useful - so here we go
Last week[1] I was at an event with a lot of people relatively new to EA - lots of them had recently finished the introductory fellowship. Talking through their plans for the future, I noticed that many of them used the concept 'personal fit' to justify their plans to work on a problem they had already found important before learning about EA.
They would say they wanted to work on combating climate change or increasing gender equality, because
They had studied this and felt really motivated to work on it
Therefore, their 'personal fit' was really good for working on this topic
Therefore surely, it was the highest impact thing they could be doing.
I think a lot of them were likely mistaken, in one or more of the following ways:
They overestimated their personal fit for roles in these (broad!) fields
They underestimated the differences in impact between career options and cause areas
They thought that they were motivated to do the most good they could, but in fact they were motivated by a specific cause
To be clear: the ideal standard here is probably unattainable, and I surely don't live up to it. However, if I could stress one thing, it would be that people scoping out their career options could benefit from first identifying high-impact career options, and only second thinking about which ones they might have a great personal fit for - not the other way around.
^
This was last year
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Mar 18, 2024 • 3min
EA - Ways in which I'm not living up to my EA values by Joris P
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Ways in which I'm not living up to my EA values, published by Joris P on March 18, 2024 on The Effective Altruism Forum.
This is a Draft Amnesty Week draft. It may not be polished, up to my usual standards, fully thought through, or fully fact-checked.
When I was pretty new to EA, I was way too optimistic about how Wise and Optimized and Ethical and All-Knowing experienced EAs would be.
I thought Open Phil would have some magic spreadsheets with the answers to all questions in the universe
I thought that, surely, experienced EAs had for 99% figured out what they thought was the biggest problem in the world
I imagined all EAs to have optimized almost everything, and to basically endorse all their decisions: their giving practices, their work-life balance, the way they talked about EA to others, etc.
I've now been around the community for a few years. I'm still really grateful for and excited about EA ideas, and I love being around the people inspired by EA ideas (I even work on growing our community!). However, I now also realize that today, I am far from how Wise and Optimized and Ethical and All-Knowing Joris-from-4-years-ago expected future Joris and his peers to be.
There's two things that caused me to not live up to those ideals:
I was naive about how Wise and Optimized and Ethical and All-Knowing someone could realistically be
There are good things I could reasonably do, or should reasonably have done, in the past 4 years - and haven't
To make this concrete, I wanted to share some ways in which I think I'm not living up to my EA values or expectations from a few years ago. I think Joris-from-4-years-ago would've found this list helpful.[1]
I'm still not fully vegan
Donating:
I just default to the community norm of donating 10%, without having thought about it hard
I haven't engaged for more than 30 minutes with arguments around e.g. patient philanthropy
I left my GWWC donations to literally the last day of the year and didn't spend more than one hour on deciding where to donate
I have a lot less certainty over the actual positive impact of the programs we run than I expected when I started this job
I'm still as bad at math as I was in uni, meaning my BOTECs (back-of-the-envelope calculations) are just not that great
It's so, so much harder than I expected to account for counterfactuals and to find things you can measure that are robustly good
I still find it really hard to pitch EA
I hope this inspires some people (especially those who I (and others) might look up to) to share how they're not perfect. What are some ways in which you're not living up to your values, or to what you-from-the-past maybe expected you would be doing by now?
^
I'll leave it up to you whether these fall in category 1 (basically unattainable) or 2 (attainable). I also do not intend to turn this into a discussion of what things EAs "should" do, which things are actually robustly good, etc.
Thanks for listening. To help us out with The Nonlinear Library or to learn more, please visit nonlinear.org

Mar 17, 2024 • 8min
EA - The Scale of Fetal Suffering in Late-Term Abortions by Ariel Simnegar
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Scale of Fetal Suffering in Late-Term Abortions, published by Ariel Simnegar on March 17, 2024 on The Effective Altruism Forum.
This is a draft amnesty post.
Summary
It seems plausible that fetuses can suffer from 12 weeks of age, and quite reasonable that they can suffer from 24 weeks of age.
Some late-term abortion procedures seem like they might cause a fetus excruciating suffering.
Over 35,000 of these procedures occur each year in the US alone.
Further research is needed on interventions to reduce this suffering, such as mandating fetal anesthesia for late-term abortions.
Background
Most people agree that a fetus has the capacity to suffer at some point. If a fetus has the capacity to suffer, then we ought to reduce that suffering when possible. Fetal anesthesia is standard practice for fetal surgery,[1] but I am unaware of it ever being used during late-term abortions. If the fetus can suffer, these procedures likely cause the fetus extreme pain.
I think the cultural environment EAs usually live in tends to minimize concern for fetal suffering. Some worry that promoting care for fetal welfare will play into the hands of abortion opposers. However, as Brian Tomasik has pointed out, one can certainly support abortion as an option while recognizing the potential for fetal consciousness during late-term abortion procedures.
Surgical Abortion Procedures
LI (Labor Induction)[2]
Gestational age: 20+ weeks.
Method: The fetus is administered a lethal injection with no anesthesia, often of potassium chloride, which causes cardiac arrest and death within a minute.
Human Rights Watch calls the use of potassium chloride for the death penalty without anesthesia "excruciatingly painful" because it "inflames the potassium ions in the sensory nerve fibers, literally burning up the veins as it travels to the heart."[3] The American Veterinary Medical Association considers the use of potassium chloride without anesthesia "unacceptable" when euthanizing vertebrate animals.[4]
D&E (Dilation and Evacuation)[5]
Gestational age: 13-24 weeks.
Method: The fetus's limbs are torn off before the fetus's head is crushed. The procedure takes several minutes.
When Can a Fetus Suffer?
The traditional view of fetal sentience has been that "the cortex and intact thalamocortical tracts," which develop after 24 weeks, "are necessary for pain experience."[6] However, mounting evidence of suffering from adults with disabled cortices and animals without cortices has cast doubt on the traditional view.[7] "Overall, the evidence, and a balanced reading of that evidence, points towards an immediate and unreflective pain experience mediated by the developing function of the nervous system from as early as 12 weeks."[8] 12 weeks is when the first projections are made into the fetus's cortical subplate,[9] which will eventually grow into the cortex.
I am a layperson who doesn't have the expertise to evaluate these studies. However, I don't see a good reason to have substantially less concern for 24+ week fetuses than for infants. Though the arguments for 12-24 week fetuses are weaker, it still seems plausible that they have some capacity to suffer. Given the potential scale of fetal suffering due to late-term abortions, it seems that this evidence is worth seriously examining.
Scale in US and UK
2021 UK[10]
The following is a selection from the UK abortion data tables:
7a: Weeks from Gestation             13 to 14   15 to 19     20+
Total Abortions                         5,322      5,528   2,686
D&E (%)                                   25%        74%     44%
LI with surgical evacuation (%)            0%         1%     18%
LI with medical evacuation (%)             0%         0%     20%
Assuming the given percentages are exact, this gives us:
Abortion Procedure    Abortions per Year (UK)
D&E                   6,603
LI                    1,076
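As a quick arithmetic check, the yearly UK totals follow from the gestational-age totals and procedure percentages above (a small Python sketch, treating the reported percentages as exact and rounding to whole procedures):

```python
# Recomputing the yearly UK totals from table 7a above, treating the
# reported percentages as exact; LI combines surgical and medical
# evacuation.
totals  = {"13 to 14": 5322, "15 to 19": 5528, "20+": 2686}
d_and_e = {"13 to 14": 0.25, "15 to 19": 0.74, "20+": 0.44}
li      = {"13 to 14": 0.00, "15 to 19": 0.01, "20+": 0.18 + 0.20}

d_and_e_total = round(sum(totals[k] * d_and_e[k] for k in totals))
li_total = round(sum(totals[k] * li[k] for k in totals))
print(d_and_e_total, li_total)  # 6603 1076
```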
2020 USA[11]
36,531 surgical abortions at >13 weeks and 4,382 abortions at 21 weeks were reported.
In 2021 UK, 38% of the 20 we...

Mar 17, 2024 • 6min
LW - The Worst Form Of Government (Except For Everything Else We've Tried) by johnswentworth
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The Worst Form Of Government (Except For Everything Else We've Tried), published by johnswentworth on March 17, 2024 on LessWrong.
Churchill famously called democracy "the worst form of Government except for all those other forms that have been tried from time to time" - referring presumably to the relative success of his native Britain, the US, and more generally Western Europe and today most of the first world.
I claim that Churchill was importantly wrong. Not (necessarily) wrong about the relative success of Britain/US/etc, but about those countries' governments being well-described as simple democracy. Rather, I claim, the formula which has worked well in e.g. Britain and the US diverges from pure democracy in a crucial load-bearing way; that formula works better than pure democracy both in theory and in practice, and when thinking about good governance structures we should emulate the full formula rather than pure democracy.
Specifically, the actual governance formula which is "worst except for everything else we've tried" is:
Give a de-facto veto to each major faction
Within each major faction, do pure democracy.
A Stylized Tale of Democracy
Let's start with the obvious failure mode of pure democracy: suppose a country consists of 51% group A, 49% group B, and both groups hate each other and have centuries-long blood feuds. Some first world country decides to invade, topple the local dictator, and hold democratic elections for a new government. Group A extremist candidate wins with a 51% majority, promising to enact divine vengeance upon the B's for their centuries of evil deeds.
Group B promptly rebels, and the country descends into civil war.
This is obviously a stylized, oversimplified picture, but… well, according to Wikipedia the three largest ethnic groups in Iraq are the Shiites (14 million), Sunni Arabs (9 million), and Sunni Kurds (4.7 million), which would make the Shiites just over 50% (excluding the various smaller groups)[1]. In the 2005 elections, the Shiites claimed 48% of the seats - not quite a majority but close enough to dominate political decisions in practice. Before long, the government was led by a highly sectarian Shiite, who generally tried to limit the power of Sunnis and Kurds. In response, around 2013/2014, outright Sunni rebellion coalesced around ISIL and Iraq plunged into civil war.
Now, I'm not about to claim that this was democracy at its purest - the US presumably put its thumb on the scales, the elections were presumably less than ideal, Iraq's political groups presumably don't perfectly cleave into two camps, etc. But the outcome matches the prediction of the oversimplified model well enough that I expect the oversimplified model captures the main drivers basically-correctly.
So what formula should have been applied in Iraq, instead?
The Recipe Which Works In Practice
In its infancy, the US certainly had a large minority which was politically at odds with the majority: the old North/South split. The solution was a two-house Congress. Both houses of Congress were democratically elected, but the votes were differently weighted (one population-weighted, one a fixed number of votes per state), in such a way that both groups would have a de-facto veto on new legislation. In other words: each major faction received a de-facto veto.
That was the key to preventing the obvious failure mode.
Particularly strong evidence for this model came later on in US history. As new states were added, the Southern states were at risk of losing their de-facto veto. This came to a head with Kansas: by late 1860 it became clear that Kansas was likely to be added as a state and would align with the Northern faction, fully eliminating the Southern veto.
In response, South Carolina formally seceded in December 1860, followed by five more Southern states ...


