
Super Data Science: ML & AI Podcast with Jon Krohn
914: Data Lakes 101 (and Why They’re Key for AI Models), with Oz Katz
Aug 15, 2025
Oz Katz, cofounder and CTO of lakeFS, shares his expertise on data lakes, which are essential for modern AI applications. He highlights the differences between data lakes and data warehouses, emphasizing their roles in managing complex data infrastructures. Katz discusses lakeFS's collaboration with Legofest, the challenges of handling multimodal data, and how version control can enhance team collaboration. He also explores the revolutionary shift towards object storage and the integration of vector databases to improve data accessibility and efficiency.
AI Snips
Time Zone Travel Anecdote
- Oz Katz mentions that, after the recording, he was in San Francisco en route to Brisbane.
- He points out that the International Date Line is not a straight line, using it to illustrate how odd time zones can be.
AI Pushes Beyond Tabular Data
- Modern AI extracts value from many modalities beyond tabular data, like images, audio, and embeddings.
- AI workflows demand infrastructure that supports multimodal and unstructured formats.
Multimodal Data Creates Fragmented Truths
- Multimodal pipelines create fragmented storage: raw files, embeddings, features, and labels often live in different places.
- This fragmentation leads to multiple sources of truth that quickly fall out of sync.
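To make the "fall out of sync" point concrete, here is a minimal sketch, assuming each derived artifact (embedding, feature row, or label) records the content hash of the raw file it was computed from. The names `content_hash`, `DerivedRecord`, and `find_out_of_sync` are hypothetical illustrations, not lakeFS APIs or the approach described in the episode. The sketch flags derived records that are stale (the raw file changed), missing (never computed), or orphaned (the raw file is gone).

```python
import hashlib
from dataclasses import dataclass


def content_hash(data: bytes) -> str:
    """Return a SHA-256 hex digest identifying one version of a raw file."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class DerivedRecord:
    """Hypothetical derived artifact plus the hash of the raw file it came from."""
    source_hash: str
    value: object


def find_out_of_sync(
    raw_store: dict[str, bytes],
    derived_store: dict[str, DerivedRecord],
) -> dict[str, str]:
    """Compare two separately stored 'sources of truth' keyed by file path."""
    report: dict[str, str] = {}
    for path, data in raw_store.items():
        record = derived_store.get(path)
        if record is None:
            # Raw file exists but nothing was derived from it yet.
            report[path] = "missing"
        elif record.source_hash != content_hash(data):
            # Raw file changed after the derived artifact was computed.
            report[path] = "stale"
    for path in derived_store:
        if path not in raw_store:
            # Derived artifact points at a raw file that no longer exists.
            report[path] = "orphaned"
    return report


if __name__ == "__main__":
    raw = {
        "img/cat.jpg": b"cat v2",
        "img/dog.jpg": b"dog v1",
        "audio/clip.wav": b"wav v1",
    }
    derived = {
        "img/cat.jpg": DerivedRecord(content_hash(b"cat v1"), [0.1, 0.2]),
        "img/dog.jpg": DerivedRecord(content_hash(b"dog v1"), [0.3, 0.4]),
        "img/old.jpg": DerivedRecord(content_hash(b"old"), [0.0, 0.0]),
    }
    print(find_out_of_sync(raw, derived))
```

Running the script reports `img/cat.jpg` as stale, `audio/clip.wav` as missing, and `img/old.jpg` as orphaned. Pinning raw files and their derived artifacts to a single versioned snapshot, rather than reconciling them after the fact, is the kind of problem the episode's discussion of version control addresses.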


