#045 RAG As Two Things - Prompt Engineering and Search

13 snips

Mar 6, 2025

In this discussion, John Berryman, an expert who transitioned from aerospace engineering to search and machine learning, explores the dual nature of retrieval-augmented generation (RAG). He emphasizes separating search from prompt engineering for optimal performance. Berryman shares insights on effective prompting strategies using familiar structures, testing human evaluations, and managing token limits. He dives into the differences between chat and completion models and highlights practical techniques for tackling AI applications and workflows. It's a deep dive into enhancing interactions with AI!

Ask episode

AI Snips

Chapters

Books

Transcript

Episode notes

INSIGHT

RAG Splits Into Two Distinct Problems

RAG is two separate problems: retrieval and prompt engineering.
Treat and optimize search and prompting independently to find where failures occur.

ADVICE

Stay On The Model's Familiar Path

Mimic formats and structures the model saw in training when you prompt.
Use Markdown, docstrings, or domain report formats so the model recognizes the pattern.

ADVICE

Start With Vibe Tests, Then Quantify

Start prompt tuning with human 'vibe testing' to detect obvious failures fast.
Then build systematic tests and use token probabilities to measure when few-shot examples stop adding value.

Get the Snipd Podcast app to discover more snips from this episode

Get the app

John Berryman moved from aerospace engineering to search, then to ML and LLMs. His path: Eventbrite search → GitHub code search → data science → GitHub Copilot. He was drawn to more math and ML throughout his career.

RAG Explained

"RAG is not a thing. RAG is two things." It breaks into:

Search - finding relevant information
Prompt engineering - presenting that information to the model

These should be treated as separate problems to optimize.

The Little Red Riding Hood Principle

When prompting LLMs, stay on the path of what models have seen in training. Use formats, structures, and patterns they recognize from their training data:

For code, use docstrings and proper formatting
For financial data, use SEC report structures
Use Markdown for better formatting

Models respond better to familiar structures.

Testing Prompts

Testing strategies:

Start with "vibe testing" - human evaluation of outputs
Develop systematic tests based on observed failure patterns
Use token probabilities to measure model confidence
For few-shot prompts, watch for diminishing returns as examples increase

Managing Token Limits

When designing prompts, divide content into:

Static elements (boilerplate, instructions)
Dynamic elements (user inputs, context)

Prioritize content by:

Must-have information
Nice-to-have information
Optional if space allows

Even with larger context windows, efficiency remains important for cost and latency.

Completion vs. Chat Models

Chat models are winning despite initial concerns about their constraints:

Completion models allow more flexibility in document format
Chat models are more reliable and aligned with common use cases
Most applications now use chat models, even for completion-like tasks

Applications: Workflows vs. Assistants

Two main LLM application patterns:

Assistants: Human-in-the-loop interactions where users guide and correct
Workflows: Decomposed tasks where LLMs handle well-defined steps with safeguards

Breaking Down Complex Problems

Two approaches:

Horizontal: Split into sequential steps with clear inputs/outputs
Vertical: Divide by case type, with specialized handling for each scenario

Example: For SOX compliance, break horizontally (understand control, find evidence, extract data, compile report) and vertically (different audit types).

On Agents

Agents exist on a spectrum from assistants to workflows, characterized by:

Having some autonomy to make decisions
Using tools to interact with the environment
Usually requiring human oversight

Best Practices

For building with LLMs:

Start simple: API key + Jupyter notebook
Build prototypes and iterate quickly
Add evaluation as you scale
Keep users in the loop until models prove reliability

John Berryman:

Nicolay Gerold:

⁠LinkedIn⁠
⁠X (Twitter)
00:00 Introduction to RAG: Retrieval and Generation
00:19 Optimizing Retrieval Systems
01:11 Introducing John Berryman
02:31 John's Journey from Search to Prompt Engineering
04:05 Understanding RAG: Search and Prompt Engineering
05:39 The Little Red Riding Hood Principle in Prompt Engineering
14:14 Balancing Static and Dynamic Elements in Prompts
25:52 Assistants vs. Workflows: Choosing the Right Approach
30:15 Defining Agency in AI
30:35 Spectrum of Assistance and Workflows
34:35 Breaking Down Problems Horizontally and Vertically
37:57 SOX Compliance Case Study
40:56 Integrating LLMs into Existing Applications
44:37 Favorite Tools and Missing Features
46:37 Exploring Niche Technologies in AI
52:52 Key Takeaways and Future Directions