Super Data Science: ML & AI Podcast with Jon Krohn

914: Data Lakes 101 (and Why They’re Key for AI Models), with Oz Katz

Aug 15, 2025
Oz Katz, cofounder and CTO of lakeFS, shares his expertise on data lakes, which are essential for modern AI applications. He highlights the differences between data lakes and data warehouses and the role each plays in managing complex data infrastructure. Katz discusses lakeFS's collaboration with Legofest, the challenges of handling multimodal data, and how version control can improve team collaboration. He also explores the shift toward object storage and the integration of vector databases to make data more accessible and efficient to work with.
ANECDOTE

Time Zone Travel Anecdote

  • Oz Katz mentions that at the time of recording he was in San Francisco, en route to Brisbane.
  • He points out that the International Date Line is not a straight line, using it to illustrate how odd time zones can be.
INSIGHT

AI Pushes Beyond Tabular Data

  • Modern AI extracts value from many modalities beyond tabular data, like images, audio, and embeddings.
  • AI workflows demand infrastructure that supports multimodal and unstructured formats.
INSIGHT

Multimodal Data Creates Fragmented Truths

  • Multimodal pipelines create fragmented storage: raw files, embeddings, features, and labels often live in different places.
  • This fragmentation leads to multiple sources of truth that quickly fall out of sync.
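To make the fragmentation point concrete, here is a minimal Python sketch, assuming three hypothetical stores (raw files, embeddings, and labels) modeled as in-memory dicts, where each derived record keeps a hash of the raw file it was computed from. The names `raw_files`, `embeddings`, `labels`, and `find_out_of_sync` are illustrative only and are not part of lakeFS or any other tool.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """Return a short SHA-256 digest identifying a file's contents."""
    return hashlib.sha256(data).hexdigest()[:12]


# Hypothetical stores; in a real pipeline each would live in a different system.
raw_files = {
    "img_001": b"pixels-v1",
    "img_002": b"pixels-v2-edited",  # re-uploaded after its embedding was computed
}

# Derived records remember the hash of the raw file they were built from.
embeddings = {
    "img_001": {"source_hash": content_hash(b"pixels-v1"), "vector": [0.1, 0.3]},
    "img_002": {"source_hash": content_hash(b"pixels-v2"), "vector": [0.7, 0.2]},
}
labels = {
    "img_001": {"source_hash": content_hash(b"pixels-v1"), "label": "cat"},
    # img_002 was never labeled
}


def find_out_of_sync(raw, derived_stores):
    """Return (store_name, sample_id, reason) for every raw sample whose
    derived record is missing or was computed from different raw bytes."""
    problems = []
    for store_name, store in derived_stores.items():
        for sample_id, raw_bytes in raw.items():
            record = store.get(sample_id)
            if record is None:
                problems.append((store_name, sample_id, "missing"))
            elif record["source_hash"] != content_hash(raw_bytes):
                problems.append((store_name, sample_id, "stale"))
    return problems


for problem in find_out_of_sync(raw_files, {"embeddings": embeddings, "labels": labels}):
    print(problem)
```

A check like this only detects drift after it has happened; it does not stop the stores from diverging in the first place, which is the gap that the versioning approach discussed in the episode is meant to address.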