
The Nonlinear Library LW - Palisade is hiring Research Engineers by Charlie Rogers-Smith
Nov 11, 2023
05:12
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Palisade is hiring Research Engineers, published by Charlie Rogers-Smith on November 11, 2023 on LessWrong.
Palisade is looking to hire Research Engineers. We are a small team consisting of Jeffrey Ladish (Executive Director), Charlie Rogers-Smith (Chief of Staff), and Kyle Scott (part-time Treasurer & Operations). In joining Palisade, you would be a founding member of the team, and would have substantial influence over our strategic direction. Applications are rolling, and you can fill out our short (~10-20 minutes) application form here.
Palisade's mission
We research dangerous AI capabilities to better understand misuse risks from current systems, and how advances in hacking, deception, and persuasion will affect the risk of catastrophic AI outcomes. We create concrete demonstrations of dangerous capabilities to advise policy makers and the public on AI risks.
We are working closely with government agencies, policy think tanks, and media organizations to inform relevant decision makers. For example, our work demonstrating that it is possible to effectively undo Llama 2-Chat 70B's safety fine-tuning for less than $200 has been used to confront Mark Zuckerberg in the first of Chuck Schumer's Insight Forums, was cited by Senator Hassan in a Senate hearing on threats to national security, and has been used to advise the UK AI Safety Institute.
We plan to study dangerous capabilities in both open source and API-gated models in the following areas:
Automated hacking. Current AI systems can already automate parts of the cyber kill chain. We've demonstrated that GPT-4 can leverage known vulnerabilities to achieve remote code execution on unpatched Windows 7 machines. We plan to explore how AI systems could conduct reconnaissance, compromise target systems, and use information from compromised systems to pivot laterally through corporate networks or carry out social engineering attacks.
Spear phishing and deception. Preliminary research suggests that LLMs can be effectively used to phish targets. We're currently exploring how well AI systems can scrape personal information and leverage it to craft scalable spear-phishing campaigns. We also plan to study how well conversational AI systems could build rapport with targets to convince them to reveal information or take actions contrary to their interests.
Scalable disinformation. Researchers have begun to explore how LLMs can be used to create targeted disinformation campaigns at scale. We've demonstrated to policymakers how a combination of text, voice, and image generation models can be used to create a fake reputation-smearing campaign against a target journalist. We plan to study the cost, scalability, and effectiveness of AI-disinformation systems.
We are looking for
People who excel at:
Working with language models. We're looking for somebody who is or could quickly become very skilled at working with frontier language models. This includes supervised fine-tuning, using reward models/functions (RLHF/RLAIF), building scaffolding (e.g. in the style of AutoGPT), and prompt engineering / jailbreaking.
Software engineering. Alongside working with LMs, much of your work will benefit from a strong foundation in software engineering - such as designing APIs, working with training data, or doing front-end development. Moreover, strong SWE experience will help you get up to speed with LMs, hacking, or new areas we want to pivot to.
Technical communication. By writing papers, blog posts, and internal documents; and by speaking with the team and external collaborators about your research.
While it's advantageous to excel at all three of these skills, we will strongly consider people who are either great at working with language models or at software engineering, while being able to communicate their work well.
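To give a concrete picture of the "scaffolding" skill mentioned above: AutoGPT-style scaffolding wraps a language model in a loop that asks it for the next action, executes that action as a tool call, and feeds the result back. The sketch below is purely illustrative - the `call_model` stub, the tool names, and the JSON action format are invented for this example and are not Palisade's actual tooling.

```python
import json

def call_model(history):
    # Stand-in for a real LLM API call that would see `history` and decide
    # the next action. Here it returns a canned response so the sketch runs.
    return json.dumps({"tool": "finish", "input": "done"})

# Hypothetical tool registry the model can invoke by name.
TOOLS = {
    "search": lambda q: f"results for {q!r}",  # stand-in web search
    "finish": lambda x: x,                      # signals task completion
}

def run_agent(task, max_steps=5):
    """Ask the model for tool calls in a loop until it signals 'finish'."""
    history = [f"Task: {task}"]
    for _ in range(max_steps):
        action = json.loads(call_model(history))
        result = TOOLS[action["tool"]](action["input"])
        history.append(f"{action['tool']} -> {result}")
        if action["tool"] == "finish":
            return result
    return None  # step budget exhausted

print(run_agent("summarize a webpage"))
```

The essential design choice is the model-in-a-loop structure: the model never executes anything itself; the scaffold mediates every action and decides when to stop.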
Competenci...
