
The Reasoning Show: The Evolution of Data Platforms
Jul 31, 2024
Jeff Denworth, Co-Founder of VAST Data, discusses the dynamic landscape of data platforms shaped by AI demands. He highlights the impact of GPUs on data center architectures, raising questions about storage performance and price in an AI-driven world. Denworth explores the real-world adoption of AI in enterprises and the modernization of infrastructure to meet these demands. He also emphasizes the transition to all-flash storage systems and addresses the evolving skills needed in data management amid these technological shifts.
AI Snips
Dataset Quality Shapes LLM Behavior
- Data curation defines model behavior more than model architecture.
- Jeff Denworth notes that different LLMs trained repeatedly on the same dataset converge toward similar behavior, so dataset quality drives outcomes.
RAG Keeps Sensitive Data Outside Models
- Retrieval-Augmented Generation (RAG) keeps large corpora external to the model by indexing them and retrieving relevant passages with vector search at query time.
- Denworth explains that LLMs query a vector database for the nearest matching context, since GPU memory cannot hold massive corpora.
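The retrieval step described above can be sketched in a few lines. This is a toy illustration, not the system discussed in the episode: `embed`, `VectorIndex`, and `build_prompt` are hypothetical names, and the bag-of-words "embedding" stands in for a real embedding model only so the example runs with the standard library.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class VectorIndex:
    """Hypothetical in-memory index; the corpus lives here, not in the model."""

    def __init__(self):
        self.items = []  # list of (vector, original text)

    def add(self, doc):
        self.items.append((embed(doc), doc))

    def search(self, query, k=1):
        """Return the k documents whose vectors are closest to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


def build_prompt(index, question, k=1):
    """Attach retrieved context to the question before it reaches the LLM."""
    context = "\n".join(index.search(question, k))
    return f"Context:\n{context}\n\nQuestion: {question}"


index = VectorIndex()
index.add("flash storage lowers read latency for training data")
index.add("quarterly revenue grew in the retail segment")
index.add("vector search finds the nearest embeddings to a query")

question = "how does vector search find nearest embeddings"
top = index.search(question, k=1)
prompt = build_prompt(index, question)
```

Only the retrieved passage is placed in the prompt, so the corpus itself never has to fit in GPU memory or be baked into model weights. A production setup would swap the brute-force sort for an approximate nearest-neighbor index.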
Embeddings Multiply Storage Requirements
- Making data usable by LLMs multiplies storage needs because the embeddings can take up far more space than the raw items they are derived from.
- Jeff Denworth reports customers sometimes need ~10x more storage after creating embeddings for RAG workflows.
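A rough back-of-the-envelope calculation shows how embeddings can outgrow the source data. The chunk size, vector dimension, and value width below are illustrative assumptions, not figures from the episode, and `embedding_overhead` is a hypothetical helper; real deployments also add index structures, metadata, and overlapping chunks.

```python
def embedding_overhead(chunk_bytes, dims, bytes_per_value=4):
    """Ratio of embedding size to raw text size for a single chunk.

    chunk_bytes:     size of the raw text chunk being embedded (assumed)
    dims:            embedding vector dimension (assumed)
    bytes_per_value: 4 for float32 values
    """
    return (dims * bytes_per_value) / chunk_bytes


# Assumed example: 512-byte text chunks, 1024-dimension float32 vectors.
ratio = embedding_overhead(chunk_bytes=512, dims=1024)

# Total footprint relative to the raw data alone (raw + embeddings).
total_multiplier = 1 + ratio

print(ratio)             # 8.0
print(total_multiplier)  # 9.0
```

Under these assumptions each vector is eight times the size of the text it represents, which is in the same range as the roughly 10x growth reported above. Smaller chunks or higher-dimensional vectors push the ratio higher.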
