MLOps.community

RAG Quality Starts with Data Quality // Adam Kamor // #262

Sep 20, 2024
Adam Kamor, co-founder of Tonic, discusses creating mock data while preserving data privacy. He explains why high-quality data matters for Retrieval-Augmented Generation (RAG) systems and covers challenges such as data documentation and chunking. He also describes strategies for managing sensitive information and maintaining retrieval accuracy, and touches on building effective data pipelines and the role of database tools in today’s AI landscape.
ADVICE

Chunking Algorithms

  • There's no one-size-fits-all chunking algorithm for RAG systems.
  • Tonic Textual allows integrating custom chunking algorithms for specific data needs.
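One way to picture "no one-size-fits-all" is a registry that maps each document type to its own chunking function. The sketch below is a generic illustration only: it does not use the Tonic Textual API, and the names `CHUNKERS`, `chunk_document`, `paragraph_chunker`, and `fixed_size_chunker`, along with the document types "article" and "log", are hypothetical.

```python
from typing import Callable, Dict, List

# A chunker is any function that turns raw text into a list of chunks.
Chunker = Callable[[str], List[str]]


def fixed_size_chunker(text: str, size: int = 40) -> List[str]:
    """Split text into consecutive windows of at most `size` characters."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunker(text: str) -> List[str]:
    """Split text on blank lines, one chunk per non-empty paragraph."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


# Hypothetical registry: each document type gets the strategy that suits it.
CHUNKERS: Dict[str, Chunker] = {
    "article": paragraph_chunker,
    "log": fixed_size_chunker,
}


def chunk_document(doc_type: str, text: str) -> List[str]:
    """Use the chunker registered for doc_type, else fixed-size windows."""
    chunker = CHUNKERS.get(doc_type, fixed_size_chunker)
    return chunker(text)
```

Adding a bespoke strategy for a new kind of data then means registering one more function rather than changing the pipeline.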
ANECDOTE

Chunking FAQs

  • Chunking FAQs by question-answer pairs is effective in RAG systems.
  • This bespoke approach highlights the need for varied chunking strategies.
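Chunking an FAQ by question-answer pair can be sketched as follows. This is a minimal illustration, assuming each question starts a line with `Q:` and each answer starts a line with `A:`; that layout, the function name `chunk_faq`, and the output fields are assumptions, not details from the episode.

```python
import re
from typing import Dict, List

# Assumed layout: a line starting with "Q:" followed by a line starting
# with "A:". The answer runs until the next "Q:" line or the end of text.
FAQ_PAIR = re.compile(r"^Q:\s*(.+?)\n^A:\s*(.+?)(?=^Q:|\Z)", re.M | re.S)


def chunk_faq(text: str) -> List[Dict[str, str]]:
    """Return one chunk per question-answer pair."""
    chunks = []
    for match in FAQ_PAIR.finditer(text):
        question = match.group(1).strip()
        answer = match.group(2).strip()
        chunks.append({
            "question": question,
            "answer": answer,
            # Text to embed: the question and its answer stay together.
            "text": f"Q: {question}\nA: {answer}",
        })
    return chunks
```

Keeping each pair in a single chunk means a retrieved question always arrives with its own answer, which a fixed-size splitter cannot guarantee.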
ADVICE

Data Pipeline Prep

  • Consider document types and storage locations when building RAG pipelines.
  • Convert diverse formats to Markdown to preserve formatting and enable easier processing.
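Normalizing formats to Markdown can be sketched as a dispatch on file extension. The example below is a rough illustration using only the Python standard library; it covers just CSV and plain text, leaves storage locations out of scope, and the names `CONVERTERS` and `to_markdown` are hypothetical. A real pipeline would need converters for formats such as PDF or DOCX.

```python
import csv
import io
from pathlib import PurePath


def csv_to_markdown(content: str) -> str:
    """Render CSV text as a Markdown table (cells are not escaped)."""
    rows = list(csv.reader(io.StringIO(content)))
    if not rows:
        return ""
    header, body = rows[0], rows[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def text_to_markdown(content: str) -> str:
    """Plain text and Markdown pass through with outer whitespace trimmed."""
    return content.strip()


# Hypothetical mapping from file extension to converter.
CONVERTERS = {
    ".csv": csv_to_markdown,
    ".txt": text_to_markdown,
    ".md": text_to_markdown,
}


def to_markdown(filename: str, content: str) -> str:
    """Convert a document to Markdown based on its file extension."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in CONVERTERS:
        raise ValueError(f"no converter registered for {suffix!r}")
    return CONVERTERS[suffix](content)
```

Once every source is Markdown, headings and tables survive as structure that later chunking steps can rely on.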