
The Nonlinear Library EA - Survey on the acceleration risks of our new RFPs to study LLM capabilities by Ajeya
Nov 14, 2023
14:00
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Survey on the acceleration risks of our new RFPs to study LLM capabilities, published by Ajeya on November 14, 2023 on The Effective Altruism Forum.
My team at Open Philanthropy just launched two requests for proposals:
1. Proposals to create benchmarks measuring how well LLM agents (like AutoGPT) perform on difficult real-world tasks, similar to recent work by ARC Evals.[1]
2. Proposals to study and/or forecast the near-term real-world capabilities and impacts of LLMs and systems built from LLMs more broadly.
I think creating a shared scientific understanding of where LLMs are at has large benefits, but it can also accelerate AI capabilities: for example, it might demonstrate possible commercial use cases and spark more investment, or it might allow researchers to more effectively iterate on architectures or training processes. Other things being equal, I think acceleration is harmful because we're not ready for very powerful AI systems - but I believe the benefits outweigh these costs in expectation, and think better measurements of LLM capabilities are net-positive and important.
To get a sense for whether acting on this belief by launching these two RFPs would constitute falling prey to the unilateralist's curse, I sent a survey about whether funding this work would be net-positive or net-negative to 47 relatively senior people who have been working full-time on AI x-risk reduction for multiple years and have likely thought about the risks and benefits of sharing information about AI capabilities.
Out of the 47 people who received the survey, 30 (64%) responded. Of those, 25 said they were "Positive" or "Lean positive" on the RFPs, and only 1 person said they were "Lean negative," with no one saying they were "Negative." The remaining four said they had "No idea," meaning that 29 out of 30 respondents (97%) would not vote to stop the RFPs from happening. With that said, many respondents (~37%) felt torn about the question or considered it complicated.
The rest of this post provides more detail on the information that the survey-takers received and on the survey results (including answers from those respondents who gave permission to share).
The information that was sent to the survey-takers
The survey-takers received the below email, which links to a one-pager on the risks and benefits of these RFPs and a four-pager (written in late July and early August) about the sorts of projects I expected to fund. After the survey, the latter document evolved into the public-facing RFPs here and here.
Subject: [by Sep 8] Survey on whether measuring AI capabilities is harmful
Hi,
I want to launch a request for proposals asking researchers to produce better measurements of the real-world capabilities of systems composed out of LLMs (similar to the recent work done by ARC Evals).
I expect this work to shorten timelines to superhuman AI, but I think the harm from this is outweighed by the benefits of convincing people of short timelines (if that's true) and enabling a regime of precautions gated to capabilities. See this 1-pager for more discussion. You can also skim my project description (~4 pages) to get a better idea of the kinds of grants we might fund, though it's not essential reading (especially if you're broadly familiar with ARC Evals).
Please fill out this short survey on whether you think this project is net-positive or net-negative by EOD Fri Sep 8.
I'm sending this survey to a large number of relatively senior people who have been working full-time on AI x-risk reduction for multiple years and have likely thought about the risks and benefits of sharing information about AI capabilities. The primary intention of this survey is to check whether going ahead with this RFP would constitute falling prey to the unilateralist's curse (i.e., to check ...
