The Reasoning Show

Composable Data Analytics

Feb 8, 2023
ANECDOTE

Data Movement Crippled Our Cybersecurity Cluster

  • Josh Patterson describes a cybersecurity project where teams used Spark, Hadoop, Kudu, Cassandra, and other systems, and spent most of their compute just moving data between them.
  • Adding GPUs made it worse: moving data into a GPU graph engine took far longer than running the graph algorithms themselves, exposing the heavy cost of serialization and deserialization.
INSIGHT

Arrow Is A Universal Columnar Data Layer

  • Apache Arrow standardizes an in-memory and on-disk columnar representation so systems avoid repeated serialization and deserialization.
  • Arrow is language- and hardware-agnostic and also defines transport standards, enabling direct data sharing across CPUs, GPUs, and languages.
INSIGHT

Arrow Inverts The Usual Abstraction Layer

  • Arrow standardizes the data layout 'under the hood' so different languages and systems can interoperate without custom conversions.
  • This inverts the typical abstraction: user-facing APIs differ, but the internal binary layout is consistent across systems.
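
As a rough sketch of that inversion, the snippet below puts two deliberately different APIs on top of one agreed byte layout. The layout (little-endian 32-bit integers), the `IndexedColumn` class, and the `stream_values` function are all hypothetical and chosen for illustration; none of them come from Arrow.

```python
import struct

# Hypothetical agreed layout: a column of little-endian 32-bit signed ints.
LAYOUT = "<i"
WIDTH = struct.calcsize(LAYOUT)

# The shared bytes that both "systems" below read without any conversion.
buffer = b"".join(struct.pack(LAYOUT, v) for v in (7, -3, 12))


class IndexedColumn:
    """One hypothetical API: random access by index."""

    def __init__(self, buf):
        self._buf = buf

    def get(self, i):
        # Decode the i-th value straight from its offset in the buffer.
        return struct.unpack_from(LAYOUT, self._buf, i * WIDTH)[0]


def stream_values(buf):
    """A different hypothetical API: streaming iteration over the same bytes."""
    for (value,) in struct.iter_unpack(LAYOUT, buf):
        yield value


indexed = IndexedColumn(buffer)
streamed = list(stream_values(buffer))
```

The two front ends share no code and expose different calling styles, yet neither needs a converter because both rely on the same layout definition.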