How AI Is Built

Lance v2: Rethinking Columnar Storage for Faster Lookups, Nulls, and Flexible Encodings | changelog 2

Apr 29, 2024
Weston Pace discusses Lance V2, the new file format behind the LanceDB vector database, which improves columnar storage for multimodal datasets. Its goals include null value support, multimodal data handling, and strong search performance. Lance V2 can write large values efficiently without buffering them all in memory. The episode also covers the benefits of Arrow integration and of writing custom encodings in Python for experimentation.
INSIGHT

Arrow Round-Trip And Custom Encodings

  • Lance V2 restores Arrow compatibility and adds null support so data can fully round-trip through the format.
  • It also introduces custom encodings and varbinary layouts to reduce IO size for specific workloads.
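To make the null round-trip idea concrete, here is a minimal sketch of an Arrow-style validity bitmap in plain Python. It is an illustration only: the function names `encode_nullable` and `decode_nullable` are hypothetical, and the episode does not say that Lance V2 stores nulls this way on disk.

```python
from typing import List, Optional, Tuple


def encode_nullable(values: List[Optional[int]]) -> Tuple[bytes, List[int]]:
    """Split a nullable column into a validity bitmap and a value buffer.

    Hypothetical helper: bit i of the bitmap (least-significant bit first)
    is 1 when row i holds a value and 0 when it is null.
    """
    bitmap = bytearray((len(values) + 7) // 8)
    slots: List[int] = []
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
            slots.append(v)
        else:
            slots.append(0)  # placeholder keeps row positions aligned
    return bytes(bitmap), slots


def decode_nullable(bitmap: bytes, slots: List[int]) -> List[Optional[int]]:
    """Rebuild the original column, restoring None wherever the bit is 0."""
    return [
        v if bitmap[i // 8] & (1 << (i % 8)) else None
        for i, v in enumerate(slots)
    ]


column = [7, None, 0, None, 42]
bitmap, slots = encode_nullable(column)
restored = decode_nullable(bitmap, slots)
```

Because the bitmap is stored separately from the values, a real zero (row 2) and a null (row 1) stay distinguishable after the round trip.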
INSIGHT

Columnar Container, Not Just Tables

  • Lance V2 treats columns like a flexible container format, not just table columns, enabling unconventional layouts.
  • The format records where encodings live and lets developers create custom codecs independently of the container.
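The separation between container and codec can be sketched as a small registry, shown below. Everything here is an assumption made for illustration (the `CODECS` registry, the JSON footer, and the byte layout are all hypothetical), not the actual Lance V2 file layout.

```python
import json
import zlib
from typing import Callable, Dict, Tuple

Codec = Tuple[Callable[[bytes], bytes], Callable[[bytes], bytes]]

# Hypothetical codec registry: name -> (encode, decode).
# The container below never inspects how a codec works.
CODECS: Dict[str, Codec] = {
    "plain": (lambda b: b, lambda b: b),
    "zlib": (zlib.compress, zlib.decompress),
}


def write_container(columns: Dict[str, Tuple[str, bytes]]) -> bytes:
    """Pack columns given as name -> (codec name, raw bytes).

    Layout (illustrative): [column buffers][JSON footer][4-byte footer length].
    The footer records which codec each column used and where its bytes live.
    """
    body = bytearray()
    footer = {}
    for name, (codec, raw) in columns.items():
        encoded = CODECS[codec][0](raw)
        footer[name] = {"codec": codec, "offset": len(body), "length": len(encoded)}
        body += encoded
    footer_bytes = json.dumps(footer).encode("utf-8")
    return bytes(body) + footer_bytes + len(footer_bytes).to_bytes(4, "little")


def read_column(blob: bytes, name: str) -> bytes:
    """Look up one column in the footer and decode it with the recorded codec."""
    footer_len = int.from_bytes(blob[-4:], "little")
    footer = json.loads(blob[-4 - footer_len:-4])
    entry = footer[name]
    encoded = blob[entry["offset"]:entry["offset"] + entry["length"]]
    return CODECS[entry["codec"]][1](encoded)


# Registering a new codec requires no change to the container code.
CODECS["reverse"] = (lambda b: b[::-1], lambda b: b[::-1])

blob = write_container({
    "ids": ("plain", b"\x01\x02\x03"),
    "text": ("zlib", b"hello " * 100),
    "tags": ("reverse", b"abc"),
})
```

The design point is that the container only stores a codec name plus an offset and length, so new encodings can be developed and tested without touching the file structure.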
INSIGHT

Three Core Design Goals

  • Lance V2 targets three goals: null support, efficient multimodal data writes, and balanced point-lookup vs full-scan performance.
  • It aims to write large images/embeddings without huge memory buffering while retaining read performance.
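The tension between bounded write memory and fast reads can be sketched with a writer that flushes a page whenever its buffer reaches a size limit while recording each value's offset. This is a simplified illustration under stated assumptions: `write_values`, `point_lookup`, and the `page_bytes` threshold are hypothetical names, and the real Lance V2 writer is not described in this detail in the episode summary.

```python
import io
from typing import BinaryIO, Iterable, Iterator, List, Tuple

Index = List[Tuple[int, int]]  # (byte offset, length) per row


def write_values(out: BinaryIO, values: Iterable[bytes], page_bytes: int = 1024) -> Tuple[Index, int]:
    """Stream values to `out`, flushing once the buffer reaches page_bytes.

    Memory use is bounded by roughly one page plus one value, no matter how
    many values are written. Returns the row index and the number of flushes.
    """
    index: Index = []
    buffer = bytearray()
    position = out.tell()
    pages = 0
    for value in values:
        index.append((position + len(buffer), len(value)))
        buffer += value
        if len(buffer) >= page_bytes:
            out.write(buffer)
            position += len(buffer)
            buffer.clear()
            pages += 1
    if buffer:  # flush the final partial page
        out.write(buffer)
        pages += 1
    return index, pages


def point_lookup(src: BinaryIO, index: Index, row: int) -> bytes:
    """Fetch one row with a single seek and read."""
    offset, length = index[row]
    src.seek(offset)
    return src.read(length)


def full_scan(src: BinaryIO, index: Index) -> Iterator[bytes]:
    """Read every row in order; values are contiguous, so no seeks between rows."""
    src.seek(index[0][0] if index else 0)
    for _, length in index:
        yield src.read(length)


f = io.BytesIO()
# A generator stands in for large images or embeddings arriving one at a time.
index, pages = write_values(f, (bytes([i]) * 600 for i in range(5)), page_bytes=1024)
row3 = point_lookup(f, index, 3)
scanned = list(full_scan(f, index))
```

In this sketch the same offset index serves both access patterns: a point lookup costs one seek, while a full scan reads the file front to back.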