The Reasoning Show

The Evolution of Data Platforms

Jul 31, 2024
Jeff Denworth, Co-Founder of VAST Data, discusses how AI demands are reshaping data platforms. He highlights the impact of GPUs on data center architectures, which raises questions about storage performance and price in an AI-driven world. Denworth explores real-world adoption of AI in enterprises and the infrastructure modernization needed to support it. He also emphasizes the transition to all-flash storage systems and addresses the evolving skills that data management requires amid these shifts.
INSIGHT

Dataset Quality Shapes LLM Behavior

  • Data curation defines model behavior more than model architecture.
  • Jeff Denworth notes that different LLMs trained on the same dataset tend to converge, so dataset quality drives outcomes.
INSIGHT

RAG Keeps Sensitive Data Outside Models

  • Retrieval Augmented Generation (RAG) keeps large corpora external to the model by indexing them and running a vector search at query time.
  • Denworth explains that LLMs query a vector database for the nearest matching context because GPU memory can't hold massive corpora.
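
The retrieval step described above can be sketched in a few lines. This is a minimal illustration, not anything shown in the episode: the passages, the three-dimensional vectors, and the names `TOY_INDEX`, `top_k`, and `build_prompt` are all hypothetical. A real system would get its vectors from an embedding model and use a vector database rather than a linear scan.

```python
import math

# Hypothetical toy index: (passage, embedding) pairs. The vectors are
# hand-written stand-ins for real embedding-model output.
TOY_INDEX = [
    ("Flash arrays serve random reads with low latency.", [0.9, 0.1, 0.0]),
    ("GPU clusters need high-throughput storage.", [0.7, 0.6, 0.1]),
    ("The cafeteria menu changes on Fridays.", [0.0, 0.1, 0.9]),
]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k passages whose embeddings are closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


def build_prompt(question, passages):
    """Put retrieved passages in the prompt; the corpus stays outside the model."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    query_vec = [1.0, 0.2, 0.0]  # assumed embedding of the question
    passages = top_k(query_vec, TOY_INDEX, k=2)
    print(build_prompt("What storage suits AI workloads?", passages))
```

Only the handful of retrieved passages reaches the model at query time, which is why the full corpus never has to fit in GPU memory or be baked into model weights.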
INSIGHT

Embeddings Multiply Storage Requirements

  • Making data usable by LLMs multiplies storage needs because the embeddings can be much larger than the raw items they represent.
  • Jeff Denworth reports customers sometimes need ~10x more storage after creating embeddings for RAG workflows.
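
A rough estimate shows how such a multiplier can arise. The numbers below are illustrative assumptions, not figures from the episode: text split into 512-byte chunks, each embedded as a 1024-dimension vector of 4-byte floats. The function name `embedding_overhead` is likewise hypothetical.

```python
import math


def embedding_overhead(raw_bytes, chunk_bytes, dims, bytes_per_value=4):
    """Estimate embedding storage relative to the raw data it is derived from.

    All parameters are assumptions chosen by the caller; actual chunk sizes,
    vector dimensions, and precision vary by embedding model and pipeline.
    """
    num_chunks = math.ceil(raw_bytes / chunk_bytes)
    embedding_bytes = num_chunks * dims * bytes_per_value
    return embedding_bytes, embedding_bytes / raw_bytes


if __name__ == "__main__":
    raw = 1_024_000  # about 1 MB of raw text (assumed)
    emb_bytes, ratio = embedding_overhead(raw, chunk_bytes=512, dims=1024)
    print(f"raw: {raw} B, embeddings: {emb_bytes} B, ratio: {ratio:.1f}x")
```

Under these assumptions each 512-byte chunk produces a 4,096-byte vector, so the embeddings alone take 8x the raw size before any index structures or replicas are counted. Smaller chunks or higher-dimension vectors push the ratio higher.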